Shutting down BGP neighbor causes high CPU and many session flaps.
Javor Kliachev
2014-09-18 12:27:21 UTC
Hello,

We use BIRD 1.4.2 as a route server with multiple RIBs and ~100 active
BGP sessions.
Over one of these sessions we receive ~360k prefixes, which we
re-announce to all other sessions.

By my calculations, the total number of prefixes across all RIBs is
about 3600000, and until now everything was OK.

*But today we experienced the following issue:*

When we stopped the session over which we receive the ~360k prefixes,
the BIRD daemon went to 100% CPU usage and stayed there for about
5-6 minutes. This caused a lot of (but not all) sessions to start
flapping.

Here is the output taken from our log during the event:

Sep 18 10:40:51 rs2 bird: R0_69: Received: Hold timer expired
Sep 18 10:40:51 rs2 bird: R0_69: BGP session closed
Sep 18 10:40:52 rs2 bird: R0_69: Down
Sep 18 10:41:11 rs2 bird: R0_69: Startup delayed by 60 seconds
Sep 18 10:41:45 rs2 bird: R0_69: Incoming connection from 10.0.0.69 (port 13073) rejected
Sep 18 10:41:59 rs2 bird: R0_69: Started
Sep 18 10:42:36 rs2 bird: R0_69: Incoming connection from 10.0.0.69 (port 24222) accepted
Sep 18 10:42:36 rs2 bird: R0_69: BGP session established

The above lines were repeated for all other affected sessions.
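
For anyone who wants to quantify the flaps from a log like the one
above, here is a minimal Python sketch that measures how long each
session stayed down, from "BGP session closed" to "BGP session
established". The function name session_outages, the regular expression
and the year argument are my own assumptions (syslog lines carry no
year), and the month parsing assumes an English/C locale. It has only
been checked against the excerpt above, not against a full BIRD log.

```python
import re
from datetime import datetime

# Sample lines copied from the log excerpt above (wrapped lines rejoined).
LOG = """\
Sep 18 10:40:51 rs2 bird: R0_69: Received: Hold timer expired
Sep 18 10:40:51 rs2 bird: R0_69: BGP session closed
Sep 18 10:40:52 rs2 bird: R0_69: Down
Sep 18 10:41:11 rs2 bird: R0_69: Startup delayed by 60 seconds
Sep 18 10:41:45 rs2 bird: R0_69: Incoming connection from 10.0.0.69 (port 13073) rejected
Sep 18 10:41:59 rs2 bird: R0_69: Started
Sep 18 10:42:36 rs2 bird: R0_69: Incoming connection from 10.0.0.69 (port 24222) accepted
Sep 18 10:42:36 rs2 bird: R0_69: BGP session established
"""

# "<Mon> <day> <time> <host> bird: <protocol>: <message>"
LINE_RE = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d) \S+ bird: (\S+): (.*)$")


def session_outages(log_text, year=2014):
    """Return {protocol: [seconds_down, ...]} for each closed/established pair.

    The year is an assumption passed in by the caller, because the
    syslog timestamp format used here does not include one.
    """
    closed_at = {}   # protocol -> time of the last "BGP session closed"
    outages = {}     # protocol -> list of outage durations in seconds
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        stamp, proto, msg = m.groups()
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        if msg == "BGP session closed":
            closed_at[proto] = when
        elif msg == "BGP session established" and proto in closed_at:
            delta = when - closed_at.pop(proto)
            outages.setdefault(proto, []).append(int(delta.total_seconds()))
    return outages


if __name__ == "__main__":
    for proto, seconds in session_outages(LOG).items():
        print(proto, seconds)
```

Run over the complete log, this would give one list of outage durations
per affected session, which makes it easier to see which sessions
flapped and for how long.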

After the CPU peak of ~5 min, all affected sessions came back up.

Our hardware is 2 x Intel(R) Xeon(R) CPU E5310 1.6 GHz with 4 cores.

I know that BIRD can still use only 1 CPU core, but are there any plans
to use more?

We would highly appreciate any suggestions or shared experience on how
to prevent such an event in the future.

Thanks in advance!

Best~
--
*Javor Kliachev*
IP Engineer

Neterra Ltd.
Telephone: +359 2 975 16 16
Fax: +359 2 975 34 36
www.neterra.net <http://www.neterra.net>
Ondrej Zajicek
2014-09-19 10:50:01 UTC
Hello

Thanks for the bug report. Could you send me the config file, the whole
log, and information on when exactly BIRD went to 100% and then back?

Even under permanent 100% CPU load, BIRD shouldn't miss timers for
sending keepalive packets.
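
To illustrate why missed keepalives matter here, a rough Python sketch
of the hold-timer rule from RFC 4271: a peer resets its hold timer on
every KEEPALIVE or UPDATE it receives and drops the session when
nothing arrives for the negotiated hold time (a hold time of 0 disables
the mechanism). The function names and the example values of 90 s hold
time and 30 s keepalive interval are assumptions taken from the RFC's
suggested defaults, not from the configuration discussed in this
thread. The sketch only models the hypothetical case where a daemon
sends nothing at all for some period, which is exactly what should not
happen.

```python
def missed_keepalives(stall_seconds, keepalive_interval):
    """Keepalives that were due but not sent during a stall of the sender."""
    return stall_seconds // keepalive_interval


def hold_timer_expires(stall_seconds, hold_time):
    """True if a peer that hears nothing for stall_seconds drops the session.

    hold_time == 0 means keepalives and the hold timer are disabled
    (RFC 4271), so the session never expires for this reason.
    """
    return hold_time > 0 and stall_seconds >= hold_time


if __name__ == "__main__":
    hold_time = 90           # assumed: RFC 4271 suggested value, in seconds
    keepalive_interval = 30  # assumed: one third of the hold time
    for stall in (60, 330):  # 330 s is roughly the 5-6 minute CPU peak
        print(stall,
              missed_keepalives(stall, keepalive_interval),
              hold_timer_expires(stall, hold_time))
```

Under these assumed values a 60-second stall would be survivable, while
a stall lasting the whole 5-6 minute peak would not, which is why the
exact timing of the CPU load relative to the session drops is relevant.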
--
Elen sila lumenn' omentielvo

Ondrej 'Santiago' Zajicek (email: ***@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."