Discussion:
Route aggregation in BIRD - how?
Maciej Wierzbicki
2012-01-23 16:01:08 UTC
Hello.

Case study:
* importing full BGP table from various uplinks
* some routes received by BGP are being exported via OSPF to core1,
using filters:
(source = RTS_BGP && bgp_path ~ [= * ASNXYZ * =])

Question: how to aggregate routes (whenever possible) before exporting
them via OSPF to core?

Example: let's say that I've received a.b.c.d/22 and a.b.c.d/24 from
asnXYZ via BGP, and that I want to export routes with asnXYZ in the
AS path via OSPF to the core1 switch. Obviously, a.b.c.d/24 is inside
a.b.c.d/22, so I would like to export only a.b.c.d/22. Is it doable in
BIRD? If yes, any hints/keywords in the docs?
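
To make the requested result concrete, here is a minimal sketch in Python (not BIRD configuration, and not something BIRD offered at the time of this thread). It drops every prefix that is already covered by a less specific prefix in the same set. The function name drop_covered and the example prefixes are assumptions made for illustration, since the post only uses the placeholder a.b.c.d.

```python
import ipaddress

def drop_covered(prefixes):
    """Keep only prefixes that are not contained in a less specific prefix."""
    nets = sorted(
        {ipaddress.ip_network(p) for p in prefixes},
        key=lambda n: (n.version, n.prefixlen, int(n.network_address)),
    )
    kept = []
    for net in nets:
        # Shorter prefixes are processed first, so any covering prefix
        # is already in `kept` when a more specific one is examined.
        covered = any(
            net.version == k.version and net.subnet_of(k) for k in kept
        )
        if not covered:
            kept.append(net)
    return [str(n) for n in kept]

# Hypothetical routes matching the filter: a /24 inside a /22, plus one
# unrelated /24.
routes = ["192.0.2.0/24", "192.0.0.0/22", "198.51.100.0/24"]
print(drop_covered(routes))
```

Note that this only suppresses covered more-specifics; it does not merge adjacent prefixes into a shorter one, which is a different (and riskier) kind of aggregation.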
--
* Maciej Wierzbicki * At paranoia's poison door *
* VOO1-RIPE *
Alexander V. Chernikov
2012-01-23 16:14:28 UTC
Post by Maciej Wierzbicki
Hello.
* importing full BGP table from various uplinks
* some routes received by BGP are being exported via OSPF to core1,
(source = RTS_BGP && bgp_path ~ [= * ASNXYZ * =])
Question: how to aggregate routes (whenever possible) before exporting
them via OSPF to core?
It is not possible currently.

I'm working on BGP route aggregation and I plan to get more or less
working code at the end of this week.
Ondrej Zajicek
2012-01-23 18:00:22 UTC
Post by Alexander V. Chernikov
Post by Maciej Wierzbicki
Hello.
* importing full BGP table from various uplinks
* some routes received by BGP are being exported via OSPF to core1,
(source = RTS_BGP && bgp_path ~ [= * ASNXYZ * =])
Question: how to aggregate routes (whenever possible) before exporting
them via OSPF to core?
It is not possible currently.
I'm working on BGP route aggregation and I plan to get more or less
working code at the end of this week.
Do you plan to integrate it into the BGP protocol? I don't think it is
a good idea. It would be easy to make generic route aggregation: a
'virtual' protocol similar to static, which generates aggregate routes
based on its config and received routes.
--
Elen sila lumenn' omentielvo

Ondrej 'SanTiago' Zajicek (email: ***@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
Alexander V. Chernikov
2012-01-23 18:30:48 UTC
Post by Ondrej Zajicek
Post by Alexander V. Chernikov
Post by Maciej Wierzbicki
Hello.
* importing full BGP table from various uplinks
* some routes received by BGP are being exported via OSPF to core1,
(source = RTS_BGP && bgp_path ~ [= * ASNXYZ * =])
Question: how to aggregate routes (whenever possible) before exporting
them via OSPF to core?
It is not possible currently.
I'm working on BGP route aggregation and I plan to get more or less
working code at the end of this week.
Do you plan to integrate it to the BGP protocol? I don't think it is
This is a separate protocol, of course.
Post by Ondrej Zajicek
a good idea. It would be easy to make generic route aggregation -
My first idea was to implement a generic aggregation protocol.
However, do we really need it to be generic?
Currently we have a bunch of link-state protocols (ISIS / OSPF) which are
pure singletons, and even if they were not, we probably wouldn't want to
make summary routes between instances. RIP[ng] is RIP(c). There are also
some multicast protocols, but that is far, far away. I'm not sure if we
should permit route aggregation from different protocol types.
Post by Ondrej Zajicek
'virtual' protocol similar to static, which generates aggregate routes
based on its config and received routes
Yes. I personally see this as follows:

protocol abgp agg1 {
    aggregate address 1.2.3.0/24;
    # Aggregate as many attributes as possible (see RFC 4271 9.2.2.2.)
    # http://www.cisco.com/en/US/tech/tk365/technologies_tech_note09186a0080094826.shtml
    aggregate address 1.2.4.0/24 save attributes; # (or other keywords)
    aggregate address 2.3.4.0/24 summary only;
    aggregate address 3.4.5.0/24 mandatory list { 3.4.5.1/32, 3.4.5.8/29 };
    import filter somefilter; # Change summary route attributes
}

protocol bgp bgp1 {
..
aggregator agg1;
}

Rtes matching 'summary only' instances have to be modified by the
aggregator (the no-export community has to be added to the community
attribute) before passing them to rte_update.

The mandatory list is a list of routes which have to exist before the
summary route is announced.
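
The mandatory-list rule can be sketched as a small predicate. This is a Python illustration, not code from the patch. It assumes that mandatory routes must be present as exact prefixes and that at least one route inside the summary must exist; the name should_announce is hypothetical.

```python
import ipaddress

def should_announce(summary, mandatory, table):
    """Decide whether a summary route may be originated.

    Assumptions (not confirmed by the thread): a mandatory route must be
    present as an exact prefix, and at least one route falling inside
    the summary must exist.
    """
    summary_net = ipaddress.ip_network(summary)
    present = {ipaddress.ip_network(p) for p in table}
    has_contributor = any(
        p.version == summary_net.version and p.subnet_of(summary_net)
        for p in present
    )
    has_all_mandatory = all(
        ipaddress.ip_network(m) in present for m in mandatory
    )
    return has_contributor and has_all_mandatory

mandatory = ["3.4.5.1/32", "3.4.5.8/29"]
print(should_announce("3.4.5.0/24", mandatory, ["3.4.5.1/32", "3.4.5.8/29"]))
print(should_announce("3.4.5.0/24", mandatory, ["3.4.5.1/32"]))
```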

[BGP] protocols using aggregator will call rte_update_agg() instead of
usual rte_update()

The aggregator stores its summary and mandatory routes in a modified f_trie.
(I think there is no need to import/implement another tree if we can
modify the current implementation, e.g. use regular pools (flag passed to
f_new_trie, along with node_size) and add trie_remove_prefix.)


Possibly we have to implement some kind of lazy protocol name resolution.
I mean: add all "aggregator $proto_name" entries to a linked list with
file/line data, and do the symbol lookup after configuration parsing is
finished, calling a modified cf_error() if the lookup fails.
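
The deferred lookup described here can be illustrated outside of BIRD's C code. The Python sketch below is only an analogy: the Config class, its methods and ConfigError are hypothetical names, standing in for the parser state, the pending linked list and the modified cf_error().

```python
class ConfigError(Exception):
    """Stand-in for a cf_error() that reports a stored file and line."""

class Config:
    def __init__(self):
        self.protocols = {}   # protocol name -> protocol type
        self.pending = []     # (user, target, file, line), resolved later
        self.links = {}       # user protocol -> resolved aggregator name

    def define(self, name, kind):
        self.protocols[name] = kind

    def reference(self, user, target, file, line):
        # Do not look the name up yet: the target may be defined later
        # in the file. Remember where the reference was written.
        self.pending.append((user, target, file, line))

    def resolve(self):
        # Called once, after the whole configuration has been parsed.
        for user, target, file, line in self.pending:
            if target not in self.protocols:
                raise ConfigError(f"{file}:{line}: unknown aggregator '{target}'")
            self.links[user] = target
        self.pending.clear()

cfg = Config()
cfg.define("bgp1", "bgp")
cfg.reference("bgp1", "agg1", "bird.conf", 12)  # forward reference
cfg.define("agg1", "abgp")                      # defined after its use
cfg.resolve()
print(cfg.links)
```

The point of keeping file/line with each pending entry is that the error, when it comes, can still point at the line that contained the bad reference rather than at the end of the file.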


Btw, I've got a small patch from my previous approach; it moves the default
protocol preference to struct protocol and assigns it in
proto_config_new instead of assigning it in every protocol manually.
Maybe it is a good candidate for the next commit? :)

--
Alexander V. Chernikov
Yandex NOC
Ondrej Zajicek
2012-01-24 11:11:48 UTC
Post by Alexander V. Chernikov
My first idea was to implement generic aggregation protocol.
However, do we really need it generic?
Currently we have bunch of link-state protocols (ISIS / OSPF) which are
pure singletons, and, even if not we probably don't want to make summary
routes between instances. RIP[ng] is RIP(c). There are also some
multicast protocols but is is far-far away. Not sure if we should permit
route aggregation from different protocol types.
My idea is to make this independent of the source protocol of the
aggregated routes. So the question would be: is there any advantage in
making it specific? Generally, the aggregator would accept any routes and
generate a new one, without any protocol-specific attributes. There may
be an option for processing BGP attributes and generating proper BGP
attributes, but I guess this is not essential.

I guess most users would want either to aggregate received BGP routes to
generate the default (or some more specific) routes for IGP (for this
usage pattern it would be useful to have an option for a mandatory minimal
number of routes to originate the aggregate), or to aggregate their IGP
routes for origination to BGP.
Post by Alexander V. Chernikov
protocol abgp agg1 {
    aggregate address 1.2.3.0/24;
    # Aggregate as many attributes as possible (see RFC 4271 9.2.2.2.)
    # http://www.cisco.com/en/US/tech/tk365/technologies_tech_note09186a0080094826.shtml
    aggregate address 1.2.4.0/24 save attributes; # (or other keywords)
    aggregate address 2.3.4.0/24 summary only;
    aggregate address 3.4.5.0/24 mandatory list { 3.4.5.1/32, 3.4.5.8/29 };
    import filter somefilter; # Change summary route attributes
}
protocol bgp bgp1 {
..
aggregator agg1;
}
Rte's matching 'summary only' instances have to be modified (no-export
community have to be added to community attribute) by aggregator before
passing them to rte_update
What about just having an aggregator protocol connected directly to a
table? Routes would be received from a table and the aggregated ones put
into the same table. This would work well for BGP->IGP and IGP->BGP
aggregation, where we do not care about non-aggregated routes. I'm not sure
about BGP->BGP transit, but I guess it could also work, or the aggregator
could work like a pipe. I'm not sure if my explanation is clear enough; I
could add some examples.
Post by Alexander V. Chernikov
Aggregator stores its summary and mandatory routes in modified f_trie.
(
I think, there is no need to import/implement another tree if we can
e.g. use regular pools (flag passed to f_new_trie, along with node_size)
and add trie_remove_prefix
OK
Post by Alexander V. Chernikov
Btw, I've got small patch from my previous approach, it moves default
protocol preference to struct protocol and assigns it in
proto_config_new instead of assigning it in every protocol manually.
Maybe it is a good candidate for the next commit? :)
Merged.
Maciej Wierzbicki
2012-01-24 16:21:16 UTC
Post by Ondrej Zajicek
I guess most users would want either aggregate received BGP routes to
generate the default (or some more specific) routes for IGP (for this
usage pattern it would be useful to have option for mandatory minimal
number of routes to originate aggregate)
FWIW, that is exactly what I'd like to have - aggregation of some routes
(based on source/transit ASN) received from BGP, and pass them
aggregated via OSPF somewhere else.
Alexander V. Chernikov
2012-01-24 22:12:15 UTC
Post by Ondrej Zajicek
What about just having an aggregator protocol connected directly to a
table? Routes would be received from a table and aggregated ones are put
to the same table. This would work well for BGP->IGP and IGP->BGP
Yes, if we assume a protocol instance can aggregate any routes, we can use
a single instance per rtable and add a pointer to it inside struct rtable.
The first disadvantage is that the protocol loses the last bits of knowledge
about aggregation. I mean, in this approach we will do tree checks for every
best rte regardless of protocol type. We still have to add the
no-export community to matching rtes to support summary-only routes
(nearly the same task as in the discussion about the LDP.label attribute).
The second one is summarized attributes:
1) We have to announce the summary rte with some source (RTS_). If we're
aggregating BGP routes, it is BGP. But what if we get 3 BGP routes and
1 OSPF route? We can't determine it reliably.
2) If we try to save as many attributes as possible (AS-PATH / AS-SET,
even communities), we can end up with a route having both an OSPF metric,
for example, and BGP attributes.

So in this approach we have to change syntax to

proto aggregator agg1 {
aggregate-address bgp 1.2.3.0/24;
}

or even

proto aggregator agg1 {
bgp {
aggregate-address 1.2.3.0/24;
};
ospf {
..
}
}
Post by Ondrej Zajicek
aggregation, where we do not care for non-aggregated routes. Not sure
about BGP->BGP transit, but i guess it could also work, or aggegator
could work like a pipe. Not sure if my explanation is clear enough, i
could add some examples.
Post by Alexander V. Chernikov
Aggregator stores its summary and mandatory routes in modified f_trie.
(
I think, there is no need to import/implement another tree if we can
e.g. use regular pools (flag passed to f_new_trie, along with node_size)
and add trie_remove_prefix
OK
Post by Alexander V. Chernikov
Btw, I've got small patch from my previous approach, it moves default
protocol preference to struct protocol and assigns it in
proto_config_new instead of assigning it in every protocol manually.
Maybe it is a good candidate for the next commit? :)
Merged.
Thanks!

Maybe another simple patch? :)
It shows "UNNAMED" as the filter name instead of <NULL> where the protocol
filter is configured as input filter { ... };
(It is patch #2 from the general protocol limiting patch tale)

Btw #2,
there is also another patch that removes [nearly] all pipe hacks from the
core; can you take a look at it? (I sent both patches on 29 October
with the last limiting patch version.) (And I probably know now how I can
fix the last FIXME.)
Ondrej Zajicek
2012-02-27 10:56:08 UTC
Post by Maciej Wierzbicki
Post by Maciej Wierzbicki
Hello.
Hello.
Long story short, beta for aggregation protocol is attached.
Thanks, i will look at it.
Alexander V. Chernikov
2012-02-28 01:06:36 UTC
Post by Ondrej Zajicek
Post by Maciej Wierzbicki
Post by Maciej Wierzbicki
Hello.
Hello.
Long story short, beta for aggregation protocol is attached.
Thanks, i will look at it.
Next version. Various protocol filters seem to be handled correctly now.
Alexander V. Chernikov
2012-08-14 20:01:54 UTC
Post by Alexander V. Chernikov
Next version. Various protocol filters seems to be handled correctly now.
Another new version.

Major rewrite. Many bugfixes, cleaner code, documentation updated.
Ondrej Zajicek
2012-08-14 22:55:26 UTC
Post by Alexander V. Chernikov
Post by Alexander V. Chernikov
Next version. Various protocol filters seems to be handled correctly now.
Another new version.
Major rewrite. Many bugfixes, cleaner code, documentation updated.
Hello.

The patch is probably broken - containing just the modified files,
not the new ones.
Ondrej Zajicek
2013-01-16 00:55:21 UTC
Version 7 :)
Changes
* Fix bug with using 0/0 and :: in aggregation protocol
* Fix mandatory lists not working before second rehash
* Improve docs and configuration example a bit
* Some code cleanups
Thanks to Robers Hass for his bugreport and testing.
Hello

I would like to merge the aggregator, but I have three related
conceptual problems with your patch:

(1) About a third of the code is related to BGP attribute processing
according to RFC 4271 9.2.2.2 (esp. AS_PATH merging and AS_SET
construction) for the 'save attributes' option, but this behavior is IMHO
mostly useless, as it is deprecated: RFC 6472 recommends that
AS_SETs should not be generated, and there is an RFC draft [*] that
even prohibits them. As this feature adds significant and unnecessary
complexity, I would like to remove it.

[*] http://tools.ietf.org/html/draft-kumari-deprecate-as-set-confed-set-01

(2) If I understand it correctly, your aggregator keeps a copy of all
received and aggregated routes in an internal fib, which is useful for
implementing (1), but unnecessary if you just want to originate an
aggregate route when there are some matching routes (or the required number
of mandatory routes). In that case you just need some counters for summary
routes and flags for mandatory routes. The current implementation is
especially wasteful of memory in the common case 'aggregate full BGP feed
to get a default route'.

(3) Generated aggregate routes have attrs->proto->proto == proto_agg,
but attrs->source == RTS_BGP. What is the reason for that? It does not
make much sense to me, because such a route does not have true BGP route
behavior (for example, rte_better would not compare it with others because
of attrs->proto->proto), so I see no reason not to define RTS_AGGREGATOR
and use it, even if we would like to generate BGP-style aggregated routes
with BGP_AGGREGATOR and BGP_ATOMIC_AGGR attributes.


There are some minor issues (e.g. there is probably a bug in
bgp_update_sumroute() in BGP_ATOMIC_AGGR generation - even if not
received from aggregated routes, this attribute should be generated if
BGP routes with non-empty AS_PATH are aggregated and 'save attributes' is
disabled), but these probably aren't problematic.


If you do not have strong objections, I would remove (1), do some
readjustment for (2), fix some minor issues and merge it. The question is
whether there is any need to optionally keep a copy of child
routes in (2) for some sophisticated aggregation modes like (1), but
without (1), I currently don't see the need.
Alexander V. Chernikov
2013-01-16 11:50:00 UTC
This post might be inappropriate. Click to display it.
Ondrej Zajicek
2013-01-16 14:08:22 UTC
Post by Alexander V. Chernikov
Post by Ondrej Zajicek
(2) If i understand it correctly, your aggregator keeps a copy of all
received and aggregated routes in an internal fib, which is useful for
Actually, not all: only routes which are more-specific for any summary
or mandatory ones.
Yes, that is why i wrote 'received and aggregated' and not just 'received'.
Post by Alexander V. Chernikov
Post by Ondrej Zajicek
implementing (1), but unnecessarily if you just want to originate an
aggregate route if there are some matching routes (or required number of
mandatory routes). In that case you just need some counters for summary
routes and flags for mandatory routes. Current implementation is especially
memory wasteful in the common case 'aggregate full BGP feed to get a default
route'.
Well, aggregating default without any filters is a quick and wrong way
of generating a summary, since
1) you have to at least aggregate RTS_BGP routes only
I thought about aggregating just routes from one BGP feed/uplink.
Post by Alexander V. Chernikov
2) If the ISP fails, probably either the session goes down or the full view
disappears, leaving _some_ (IX or provider-own) routes.
3) Additionally, sometimes an ISP can lose transit to foreign countries,
leaving national routes intact.
So I assume someone would configure a filter with several stable prefixes
(the most valuable from the user's point of view) while doing default
aggregation.
In that case it is not so wasteful, but i think that something like
'import all from that uplink, check whether there are at least 100000 routes'
is probably simpler and better.
Post by Alexander V. Chernikov
In this case, memory usage is not so wasteful. Additionally, copying
attributes increments their refcount for the most cases (or am I wrong?).
You are right, but for routing tables, attributes consume about half of
the used memory; fib nodes and struct rte's are the other half.
Post by Alexander V. Chernikov
We can probably consider additionally implementing something like a 'min
count XXX' summary route attribute to ease aggregating a default route.
Yes.
Post by Alexander V. Chernikov
Post by Ondrej Zajicek
If you do not have strong objections, i would remove (1), do some
readjustment for (2), fix some minor issues and merge it. Question is
whether there is any need to have optional keeping a copy of child
routes in (2) for some sophisticated aggregation modes like (1), but
without (1), i currently don't see the need.
The only thing I see is generic FV compression to fit in small FIB, but
this requires additional amount of work to be done, so I'm not against
removing.
I think that for FIB compression completely different data structures
and algorithms would be needed, which is probably best handled by
completely different protocol implementation.
Post by Alexander V. Chernikov
Btw, I can do (1), (2) and implement this 'min count' stuff to further
move to the review-based approach instead of you constantly fixing my
mistakes yourself :)
OK. I just feel that some of these issues aren't real mistakes, but more
like me forcing my point of view; but if doing it is OK with you, it is
OK with me.

I have these suggestions/requests:

1) Do not keep copies of child routes, just several counters for a summary
route, updated on rt_notify().

2) Perhaps completely remove the fib. Mandatory routes could be kept
in a trie, which also allows some more sophisticated mandatory route
matching, like any route for 1.2.3.4/32. As tries have little
overhead, perhaps there could be a separate trie just for mandatory routes
for each aggregate route; that would probably simplify it significantly.

3) There is no need for nested blocks in aggregator configuration.
Things like AS and IP for BGP aggregation could be global protocol
options. If someone needs several different kinds of aggregation,
it is natural to just use separate aggregator protocols (which would
usually be necessary already because of different import filters).

4) There are some tricky parts in aggregator reconfiguration.
Configure soft with reload is probably currently unsafe w.r.t.
proper value of 'old' route and could cause counters to be
unsynchronized. Protocol restart is mostly OK, but would cause
unnecessary route flaps. Perhaps some kind of hybrid which resets
internal state before feed like restart, but gracefully
updates generated routes. Something like 'reconfiguration
by hidden restart and refeed', it is natural if most of your
inner state is already stored in tries allocated from config LP.

5) There would be two variants of aggregation, plain and BGP. The
difference is probably just in summary route attributes. Plain
aggregation generates routes with no optional attributes; BGP-style
aggregation generates routes with some BGP attributes (probably
BGP_ORIGIN, empty BGP_PATH, BGP_AGGREGATOR and possibly
BGP_ATOMIC_AGGR). BGP_ORIGIN should obviously depend on aggregated
routes; BGP_ATOMIC_AGGR should be there if at least one aggregated
route has non-zero BGP_PATH. This could also be handled by some
counters for separate classes of aggregated routes (IGP, unknown,
BGP, BGP with non-zero BGP_PATH).

6) Specifying ASN would be needed just for BGP variant. Specifying
IP for BGP_AGGREGATOR is probably unnecessary, could be based on router
ID.

7) As the BGP variant (and possible other future variants) depends on
aggregator data structures, it should be implemented in aggregator code,
not as a protocol callback, *_sumroute callbacks would be removed.

8) Note that there is a problem in bgp_import_control() that calls
bgp_create_attrs() for non-BGP routes, which overwrites BGP attributes
if they are already present. This could be considered a bug, we
should use current attributes if available.

9) It is a question whether unreachable routes should be ignored. It makes
sense for BGP routes with an unreachable nexthop, but not much for other
kinds of unreachable routes. I feel that perhaps the current recursive route
behavior should be changed so that routes with an unreachable nexthop
would generally be ignored more, but that is not related to the aggregator
issue.

10) Change 'aggregate address' to 'route'. It is shorter and it is essentially
a kind of triggered static route. Not to mention it is not address but prefix.

11) *_mroute symbols should be more like mand_route, _mroute symbols suggest
multicast routes

12) Perhaps change the prefix (and filenames) from agg_ to aggr_ ?
Not important, but aggr_ seems to me much more fitting.

13) Name in struct protocol should be "Aggregator", like "Static" or "Kernel".

14) There are some remnants of 'summary only' in config.h and agg.h

This is probably all i recall now.
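
Suggestions 1) and 5) above can be sketched together: per-summary counters, split by route class, updated from (new, old) pairs in the way rt_notify() reports changes. The Python below is an illustration only, not code from the patch. The Summary class, its method names and the min_count field are assumptions (min_count stands for the 'min count' idea discussed earlier in this message).

```python
from dataclasses import dataclass, field

# Route classes taken from suggestion 5: IGP, unknown, BGP with empty
# path, and BGP with non-empty path ("bgp_path" is a made-up label).
CLASSES = ("igp", "unknown", "bgp", "bgp_path")

@dataclass
class Summary:
    prefix: str
    min_count: int = 1
    counts: dict = field(default_factory=lambda: {c: 0 for c in CLASSES})

    def notify(self, new_class, old_class):
        """Update counters for one child route change.

        new_class is None for a withdrawal, old_class is None for a
        newly appearing route; both set means a replacement.
        """
        if old_class is not None:
            self.counts[old_class] -= 1
        if new_class is not None:
            self.counts[new_class] += 1

    @property
    def total(self):
        return sum(self.counts.values())

    @property
    def active(self):
        # Originate the summary only with enough contributing routes.
        return self.total >= self.min_count

    @property
    def atomic_aggregate(self):
        # Per suggestion 5: set if at least one aggregated route is a
        # BGP route with a non-empty path.
        return self.counts["bgp_path"] > 0

s = Summary("0.0.0.0/0", min_count=2)
s.notify("bgp", None)
s.notify("bgp_path", None)
print(s.active, s.atomic_aggregate)
s.notify(None, "bgp_path")
print(s.active, s.atomic_aggregate)
```

No child route is copied: memory use per summary is constant, which is the point of suggestion 1). The cost is that the counters are only correct if every (new, old) pair is reported accurately, which is why suggestion 4) worries about reconfiguration.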
Jérôme Nicolle
2012-08-15 17:59:18 UTC
Post by Ondrej Zajicek
I guess most users would want either aggregate received BGP routes
to generate the default (or some more specific) routes for IGP (for
this usage pattern it would be useful to have option for mandatory
minimal number of routes to originate aggregate), or aggregate
their IGP routes for origination to BGP.
I'd have great use for such an aggregator, mainly to re-use older
hardware routers with limited FIB/TCAM size.

Being too clueless to code it, I'd just give some spec:
- Take a full BGP table as an input
- Provide customizable aggregation to a target route count
- May aggregate on the following criteria:
* Same AS-path (with or without stripping prepending)
* Same next-hop
* Every attribute matching (only the prefix length differs but the
longer is contained in the shorter, therefore only the less specific
prefix must remain)

Aggregation could be destructive or not, meaning it will default to the
lowest possible route count without discarding any routing policy, but
it could be used to reduce the route count to less than 32k routes,
and therefore only match on the next-hop attribute, not taking care of
the originating AS (using upstream's), considering the best path
selection already occurred in the source table.

BGP feeds -> single BGP RIB -> aggregator -> minimized RIB -> iBGP to HWR
(best paths selected)

The 32k route limit is intended to use a routing switch as a faster
forwarding plane than a small X86 box could have. The session between the
aggregated table and the routing switch could be established using
either iBGP, eBGP on private AS, RIP or OSPF (for L3 switches with no
BGP support).

Some other decent hardware (Foundry BigIron or Cisco SUP32) may
accommodate up to approx. 256k routes and could really benefit from such a
feature, getting them back to a usable state for a quarter of their 1M
routes TCAM counterparts' price tag. It looks like a better solution
to me than filtering to /21, adding a few default routes and hoping
for the best.

The major issue with such a setup would be to accommodate upstream's
requirements (a BGP session is usually end-to-end, on-band, and this new
setup requires eBGP multihop or NAT on the data-plane to work).

Maybe this could also be a path to using BIRD as an OpenFlow
control-plane ?

best regards,

--
Jérôme Nicolle
+33 (0)6 19 31 27 14
Ondrej Zajicek
2012-08-15 20:04:22 UTC
Post by Jérôme Nicolle
* Same AS-path (with or without stripping prepending)
* Same next-hop
* Every attribute matching (only the prefix length differs but the
longer is contained in the shorter, therefore only the less specific
prefix must remain)
I don't really understand this. For that purpose you could aggregate
everything with the same next-hop; other attributes are irrelevant. We
could think about it not as BGP route aggregation, but as the
generation of a completely new table that is more compact but equivalent
w.r.t. packet forwarding.
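
One simple way to get a table that is smaller but equivalent w.r.t. packet forwarding is sketched below in Python: a route is dropped when the nearest remaining covering route already has the same next hop. This is an illustration under that single rule, not an optimal compression and not anything implemented in BIRD; the function name compress and the sample routes are made up.

```python
import ipaddress

def compress(routes):
    """Drop routes whose nearest kept covering route has the same next hop.

    `routes` maps prefix strings to next-hop labels. Lookups in the
    result return the same next hop as in the input for every address
    that matched some input route.
    """
    items = sorted(
        ((ipaddress.ip_network(p), nh) for p, nh in routes.items()),
        key=lambda r: (r[0].version, r[0].prefixlen, int(r[0].network_address)),
    )
    kept = {}
    for net, nh in items:
        parent_nh = None
        # Walk from the longest possible ancestor to the shortest; the
        # first kept one is what longest-prefix match would fall back to.
        for plen in range(net.prefixlen - 1, -1, -1):
            ancestor = net.supernet(new_prefix=plen)
            if ancestor in kept:
                parent_nh = kept[ancestor]
                break
        if parent_nh != nh:
            kept[net] = nh
    return {str(n): h for n, h in kept.items()}

table = {
    "10.0.0.0/8": "A",
    "10.1.0.0/16": "A",   # same next hop as the /8: redundant
    "10.1.1.0/24": "A",   # falls back to the /8 as well: redundant
    "10.2.0.0/16": "B",   # different next hop: must stay
    "10.2.3.0/24": "B",   # covered by the /16 via B: redundant
}
print(compress(table))
```

Because shorter prefixes are decided first, the fallback route for each prefix is already final when that prefix is examined, which is what keeps the result forwarding-equivalent. Routes with no covering route are always kept, so without a default route the savings depend on how much the input overlaps.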
Post by Jérôme Nicolle
Aggregation could be destructive or not, meaning it will default to the
lowest possible route count without discarding any routing policy, but
it could be used to reduce the route count to less than 32k routes,
and therefore only match on the next-hop attribute, not taking care of
the originating AS (using upstream's), considering the best path
selection already occurred in the source table.
BGP feeds -> single BGP RIB -> aggregator -> minimized RIB -> iBGP to HWR
(best paths selected)
The 32k route limit is intended to use a routing switch as a faster
forwarding plane than a small X86 box could have. The session between the
aggregated table and the routing switch could be established using
either iBGP, eBGP on private AS, RIP or OSPF (for L3 switches with no
BGP support).
The optimal solution would be iBGP to distribute the aggregated table, with
OSPF running on these routers/L3 switches as an IGP. OSPF could probably
also be used to distribute the table, but that would have some problems
(esp. OSPF does not work very well with too many LSAs: propagation is slow
because of the simple send-acknowledge model).

BTW, do you have an idea what capabilities these 'cheaper' L3 switches
usually have? (i.e. how many routes they support, and what routing
protocols they support - i have no experience with them and i heard
that these usually support just static routes)
Post by Jérôme Nicolle
Some other decent hardware (Foundry BigIron or Cisco SUP32) may
accommodate up to approx. 256k routes and could really benefit from such a
feature, getting them back to a usable state for a quarter of their 1M
routes TCAM counterparts' price tag. It looks like a better solution
to me than filtering to /21, adding a few default routes and hoping
for the best.
The major issue with such a setup would be to accommodate upstream's
requirements (a BGP session is usually end-to-end, on-band, and this new
setup requires eBGP multihop or NAT on the data-plane to work).
This could be probably handled by having BGP X86 router 'outside'
of network (i.e. on L2 network connecting upstream and 'fast'
router), that could be done if your 'fast' router / L3 switch
allows you to bridge ports to upstream and to BGP router.
Jérôme Nicolle
2012-08-15 21:17:27 UTC
This post might be inappropriate. Click to display it.
Alexander V. Chernikov
2012-08-16 07:40:04 UTC
Permalink
Post by Ondrej Zajicek
I don't really understand this. For that purpose you could aggregate
everything with the same next-hop; other attributes are irrelevant.
We could think about it not as BGP route aggregation, but as the
generation of a completely new table that is more compact but
equivalent w.r.t. packet forwarding.
You're right, the simpler approach would be to aggregate only on the
next-hop and strip every other attribute before sending to the
downstream router.
Still, it'd be useful to try not to strip the originating AS, in order
to use NetFlow aggregation for per-AS traffic statistics. Obviously this
isn't possible with a 32k maximum route limit, but it should still be
possible with a 200k route limit.
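
As a rough illustration of the next-hop-only idea discussed above, here is a minimal Python sketch (not BIRD code; the function name `drop_covered` and the sample routes are made up for the example). It removes every more-specific route whose nearest covering route has the same next hop, which should leave longest-prefix-match forwarding unchanged:

```python
import ipaddress

def drop_covered(routes):
    """routes: dict mapping prefix string -> next hop (hypothetical input format).

    Returns a new dict without the more-specifics that are redundant for
    forwarding, i.e. whose nearest covering prefix has the same next hop.
    """
    table = {ipaddress.ip_network(p): nh for p, nh in routes.items()}
    kept = {}
    for net, next_hop in table.items():
        # Walk up one bit at a time until a covering route is found.
        covering = None
        parent = net
        while parent.prefixlen > 0:
            parent = parent.supernet()
            if parent in table:
                covering = parent
                break
        if covering is None or table[covering] != next_hop:
            kept[str(net)] = next_hop
    return kept

sample = {
    "10.0.0.0/22": "upstream-a",
    "10.0.1.0/24": "upstream-a",  # covered by the /22, same next hop
    "10.0.2.0/24": "upstream-b",  # covered, but a different next hop
}
reduced = drop_covered(sample)
```

With the sample above, the /24 via `upstream-a` is dropped and the other two routes are kept. This only removes covered routes; it does not merge adjacent siblings into a shorter prefix.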
For 32k or less it seems that several static defaults for outgoing
traffic balancing are sufficient, since
it is not possible to do fine-grained route selection for all routes.
200k is much more promising:
11:19 [0] bmw# birdc 'show route primary where net.len < 24' | wc -l
198369

It is possible to fetch the RIPE/APNIC/etc. (or RADB) database of route
objects, filter out more-specific route objects,
and build a long Aggregator {} configuration with a "save attributes" part.
It will probably fit in less than, say, 150k, and the rest can be used
for IGP and local IX tables.
Some (how many?) of the aggregated routes will have different paths,
resulting in the origin AS being hidden in the AS_SET attribute.
If this number is significant I can improve the AS_PATH merging algorithm
to preserve the originating AS where possible.
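
To make the AS_SET concern concrete, here is a hypothetical Python sketch of one way to merge the AS paths of aggregated routes: keep the leading ASes shared by all paths as a sequence, put the remaining ASes in an unordered set, and report the origin AS separately when every contributing path agrees on it. This is only an illustration of the idea, not BIRD's actual merging code; `merge_as_paths` is an invented name and the AS numbers are documentation values.

```python
def merge_as_paths(paths):
    """Merge non-empty AS paths (tuples, neighbor AS first, origin AS last).

    Returns (common_sequence, as_set, origin), where origin is None
    when the contributing paths disagree on the originating AS.
    """
    paths = [tuple(p) for p in paths]
    if not paths or any(len(p) == 0 for p in paths):
        raise ValueError("at least one non-empty path is required")
    # Longest leading sequence shared by every path.
    common = []
    for hops in zip(*paths):
        if any(h != hops[0] for h in hops):
            break
        common.append(hops[0])
    # Everything after the shared part loses its order and goes into a set.
    as_set = set()
    for p in paths:
        as_set.update(p[len(common):])
    # The origin survives only if all paths end in the same AS.
    origins = {p[-1] for p in paths}
    origin = origins.pop() if len(origins) == 1 else None
    return tuple(common), as_set, origin

same_origin = merge_as_paths([(64500, 64501, 64510), (64500, 64502, 64510)])
mixed_origin = merge_as_paths([(64500, 64510), (64500, 64511)])
```

In the first call the origin AS64510 ends up inside the set but is still reported separately, which is the information one would want to keep for per-AS statistics; in the second call the origins differ, so none is reported.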

However,
1) it needs testing to determine exact number
2) In the future, the selling of IPv4 sub-blocks between ISPs (and
non-ISPs) will decrease the effectiveness of such an approach.
Even so, if the route limit is reached, then stub ASes could be
stripped, leaving only the source's transits in the path. A list of
transit ASes could be fed to the algorithm, in order to strip ASes
reached via these transits.
Post by Ondrej Zajicek
BTW, do you have an idea what capabilities these 'cheaper' L3
switches usually have? (i.e. how many routes they support, and which
routing protocols they support; I have no experience with them, and
I heard that these usually support just static routes)
A typical switch would be a Cisco 3560G or 3750G: approx. 32k routes and
BGP support. Lower-end switches (HP, BDCOM) would lack proper BGP
support (if they have any), but RIP could be used instead.
Just to clarify: a Cisco SUP32-8GE (intended for a 6500 series chassis)
is approx. 650€ from a decent broker. A SUP720-3B is 100€ more. It's
basically a cut-down SUP720-3BXL (approx. 2500€): same forwarding
capacity, only a smaller TCAM. With 8 integrated GE ports and 225€ for
a 16-port GBIC linecard, it gives us a 40-port, 32Gbps-capable router for
approx. 1500€.
Foundry's BigIron 4000 with JetCore route processors is still
capable of forwarding 20Gbps but can't handle more than approx. 200k
routes. I saw most of them used as door-stops recently; one with a 16x1G
SFP module can be found for less than 1000€.
So the goal could be to let small ISPs use these boxes to scale
bandwidth-wise from a software-routed network, without paying a few
extra k€ for up-to-date Cisco gear. Dissociating the data and control
planes is also the only way to protect against DDoS.
Post by Ondrej Zajicek
This could probably be handled by having the BGP x86 router 'outside'
the network (i.e. on the L2 network connecting the upstream and the
'fast' router); that could be done if your 'fast' router / L3 switch
allows you to bridge ports to the upstream and to the BGP router.
I see two possible setups, the proper one requiring a specific
configuration from the peer.
The "proper" way would be to establish two interconnections (at least two
subnets on the same L2): one for the session to the BIRD router, the
other for actual forwarding to the L3 switch. Most L3 switches are
either L2 or L3 on a given port, so it'll require a basic L2 switch
between the peer and the router and L3 switch.
The "dirty" way would be to use a NAPT-capable switch/router to
redirect TCP/179 originating from the peer to the software router. This
should be supported on the Cisco 6500 series. I need to get my hands on a
SUP32 to check this.
--
Jérôme Nicolle
06 19 31 27 14
Jérôme Nicolle
2012-08-16 10:14:55 UTC
Permalink
Post by Alexander V. Chernikov
For 32k or less it seems that several static defaults for outgoing
traffic balancing is sufficient, since it is not possible to do
fine-grained route selection for all routes. 200k is much more
promising: 11:19 [0] bmw# birdc 'show route primary where net.len
< 24' | wc -l 198369
bird> show route count where net.len < 24
197699 of 416648 routes for 416648 networks

Confirmed. More on that:
http://dedibox.nicolbolas.org/tmp/prefix-distribution.png

But I wouldn't count too much on prefix-length-based aggregation:
many /24s are stubs or PI space used to multi-home smaller networks.
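
For anyone wanting to reproduce this kind of count offline, here is a small Python sketch that builds a prefix-length histogram from a list of prefixes and counts those shorter than /24, mirroring the `net.len < 24` filter above. The function names and the sample prefixes are invented for the example; with the figures quoted above, 197699 of 416648 routes is roughly 47% of the table.

```python
import ipaddress
from collections import Counter

def prefix_length_histogram(prefixes):
    """Count how many prefixes exist for each prefix length."""
    return Counter(ipaddress.ip_network(p).prefixlen for p in prefixes)

def count_shorter_than(prefixes, length=24):
    """Number of prefixes strictly shorter than /length."""
    hist = prefix_length_histogram(prefixes)
    return sum(n for plen, n in hist.items() if plen < length)

# Made-up sample, not real table data.
sample = [
    "10.0.0.0/8",
    "172.16.0.0/12",
    "192.0.2.0/24",
    "198.51.100.0/24",
    "203.0.113.0/25",
]
histogram = prefix_length_histogram(sample)
shorter = count_shorter_than(sample)
```
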
Post by Alexander V. Chernikov
It is possible to fetch the RIPE/APNIC/etc. (or RADB) database of route
objects, filter out more-specific route objects, and build a long
Aggregator {} configuration with a "save attributes" part. It will
probably fit in less than, say, 150k, and the rest can be used for
IGP and local IX tables. Some (how many?) of the aggregated routes
will have different paths, resulting in the origin AS being hidden in
the AS_SET attribute. If this number is significant I can improve the
AS_PATH merging algorithm to preserve the originating AS where possible.
I wouldn't count on route objects: many are outdated, if they exist at
all. But cross-checking them could be a nice security feature, apart
from aggregation purposes. It could be linked to development aimed at
full RPKI+ROA validator support in BIRD.
Post by Alexander V. Chernikov
However, 1) it needs testing to determine exact number 2) In
future, IPv4 sub-blocks selling between ISPs (and non-ISPs) will
decrease effectiveness of such approach.
You're right. But I consider it to be of minimal importance
compared to poorly managed ASes (AS6389 and AS28573, anyone?)

On a purely algorithmic approach, I'd propose the following
strategies. Keep in mind this happens _after_ the best path selection
process in BIRD.

1) Strip prepending from AS_PATHs while copying routes into a B-tree
(here meaning a binary tree keyed on prefix bits).

2) When creating a child in the B-tree, if the more specific has the
same AS_path (including Origin) as its parent, don't create it.

(note: when creating a child with no direct parent, you'd have to
create a virtual parent, marked as a non-existent route)

3) If a virtual parent has two children with the same AS path, strip
the children (don't get me wrong on that) and move their attributes to
the virtual parent, making it a "real" route.

There you get a naturally aggregated B-tree representing all routes
with a non-destructive FIB approach.
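
Steps 1) to 3) can be sketched roughly as follows. This is a Python illustration under several assumptions, not BIRD code: it handles IPv4 only, it reads the "B-tree" as a binary tree with one level per prefix bit, it compares AS paths only (as in the description), and all names (`Node`, `aggregate`, and so on) are invented for the example.

```python
import ipaddress

class Node:
    """One bit position in the binary prefix tree."""
    def __init__(self):
        self.children = [None, None]
        self.as_path = None  # None marks a "virtual" node (no route)

def strip_prepending(as_path):
    """Step 1: collapse consecutive repeats of the same AS."""
    out = []
    for asn in as_path:
        if not out or out[-1] != asn:
            out.append(asn)
    return tuple(out)

def insert(root, prefix, as_path):
    """Add an IPv4 route, creating virtual parents where needed."""
    net = ipaddress.IPv4Network(prefix)
    bits = int(net.network_address)
    node = root
    for i in range(net.prefixlen):
        bit = (bits >> (31 - i)) & 1
        if node.children[bit] is None:
            node.children[bit] = Node()
        node = node.children[bit]
    node.as_path = strip_prepending(as_path)

def merge_siblings(node):
    """Step 3: bottom-up, lift two equal children into a virtual parent."""
    if node is None:
        return
    for child in node.children:
        merge_siblings(child)
    left, right = node.children
    if (node.as_path is None and left is not None and right is not None
            and left.as_path is not None and left.as_path == right.as_path):
        node.as_path = left.as_path
        left.as_path = right.as_path = None

def collect(node, inherited=None, bits=0, depth=0, out=None):
    """Step 2: keep a route only if it differs from its nearest real parent."""
    if out is None:
        out = {}
    if node is None:
        return out
    if node.as_path is not None:
        if node.as_path != inherited:
            net = ipaddress.IPv4Network((bits << (32 - depth), depth))
            out[str(net)] = node.as_path
        inherited = node.as_path
    for bit, child in enumerate(node.children):
        collect(child, inherited, (bits << 1) | bit, depth + 1, out)
    return out

def aggregate(routes):
    """routes: iterable of (prefix, as_path) pairs; returns the reduced table."""
    root = Node()
    for prefix, as_path in routes:
        insert(root, prefix, as_path)
    merge_siblings(root)
    return collect(root)

result = aggregate([
    ("10.0.0.0/22", (64502,)),
    ("10.0.0.0/24", (64500, 64501)),
    ("10.0.1.0/24", (64500, 64500, 64501)),  # prepended copy of the same path
    ("10.0.2.0/24", (64502,)),               # same path as the covering /22
])
```

In this made-up table the two sibling /24s collapse into 10.0.0.0/23 once prepending is stripped, and 10.0.2.0/24 disappears because the covering /22 already carries the same path, so four routes become two. The sketch applies step 2 as a final pass rather than at insertion time, which avoids depending on the order in which routes arrive.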

If the route count is still too high, you could do the following :

4) Build a secondary tree (not binary) representing AS relationships,
each node having a pointer to a list of pointers to the routes known in
the route B-tree.

5) For "blacklisted ASes", meaning "known transit ASes with no special
interest in their customer cone", strip AS paths so that routes appear to
originate from the listed AS, and aggregate the routes (hence the
pointer to a list of pointers: way faster than recursing the B-tree).

OR (if you don't want to handle a blacklist, or in addition to it)

5bis) Shorten the as_path to a maximum hop count (let's say 5 would be a
reasonable default limit for a well-connected network), and then proceed
as in 5). This strategy could be applied locally to large non-leaf
nodes in the AS-tree (ASes with more than, let's say, a few hundred
customer ASes) for maximum profit. Conversely, it could also
be used to force the aggregation of ASes with a smaller customer cone,
considering them as marginal.

5ter) Same strategy as 5 and 5bis could be applied based on AS-sets.

(at this point it's still not really destructive with respect to
the routing policy)

6) Recursing the AS-tree will let you find the ASes with the largest
route-pointer lists, hence the ASes with the largest route count. Counting
topmost entries in the B-tree will give you an approximate "potential
aggregation factor" disregarding the next-hop. There you may take the
step of overriding the next-hop to maximise aggregation for that AS.
This I consider destructive.

7) Discrepancy hunting: on a given route-B-tree branch, when two
"colors" (I first thought of it as coloring the tree based on the
next-hop attribute) are highly dominant, and both AS groups (given the
initial best path selection policy) are reachable through both
"colors" (here again, read "next-hop"), then take the topmost node in
the route-B-tree and split it in half, each half with one hardcoded
color. Then mark them with private ASNs and write matching AS-sets to
the log output for debugging and traffic statistics purposes.
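
The AS-path side of steps 4) to 6) might look roughly like this in Python. It is a simplified sketch with invented names: the AS "tree" is flattened into a plain index from AS number to prefixes, paths are tuples with the neighbor AS first and the origin AS last, and cutting at the first listed transit seen from our side is an assumption, since the description does not say which occurrence to use.

```python
from collections import defaultdict

def truncate_at_transit(as_path, listed_transits):
    """Step 5: cut the path after the first listed transit AS, so the
    route looks as if it originated there."""
    for i, asn in enumerate(as_path):
        if asn in listed_transits:
            return tuple(as_path[:i + 1])
    return tuple(as_path)

def cap_hops(as_path, max_hops=5):
    """Step 5bis: keep at most max_hops ASes, counted from our side."""
    return tuple(as_path[:max_hops])

def index_by_as(routes):
    """Step 4 (flattened): map each AS to the prefixes routed through it."""
    index = defaultdict(list)
    for prefix, as_path in routes.items():
        for asn in set(as_path):
            index[asn].append(prefix)
    return index

def largest_ases(index, top=3):
    """Step 6: rank ASes by how many routes point at them."""
    return sorted(index, key=lambda asn: (-len(index[asn]), asn))[:top]

# Made-up routes using documentation prefixes and AS numbers.
routes = {
    "192.0.2.0/24": (64500, 64501, 64510),
    "198.51.100.0/24": (64500, 64501, 64511),
    "203.0.113.0/24": (64502, 64512),
}
index = index_by_as(routes)
shortened = {p: truncate_at_transit(path, {64501}) for p, path in routes.items()}
```

After truncation the first two routes share the path (64500, 64501), so a prefix-tree pass that merges on equal AS paths gets more candidates to work with.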

Many other policies could be written; I think it'd be all about
trial and error regarding their aggregation efficiency.

About the data structures, I thought of a B-tree but it could be
quaternary too. The most important thing would be to implement a
copy-on-write approach to routes: the initial tree has to be built
with pointers to the routes as stored in the initial table; it'd be
faster and more conservative (memory-wise). When a modification
happens to a route (attribute modification in the aggregation process),
you will have to copy the modified route to a new memory space and
rewrite the attribute.
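
The copy-on-write idea can be illustrated with a short Python sketch. The `Route` and `AggregationView` names are invented, and Python object references stand in for the pointers mentioned above: the working view shares route objects with the source table and only allocates a new one when an attribute is rewritten.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop: str
    as_path: tuple

class AggregationView:
    """Working copy that shares Route objects with the source table."""
    def __init__(self, table):
        self._routes = dict(table)  # copies references, not routes

    def get(self, prefix):
        return self._routes[prefix]

    def rewrite(self, prefix, **changes):
        """Copy the route on modification; the source table keeps its own."""
        self._routes[prefix] = replace(self._routes[prefix], **changes)
        return self._routes[prefix]

# Made-up source table with a prepended path.
table = {
    "192.0.2.0/24": Route("192.0.2.0/24", "10.0.0.1", (64500, 64500, 64510)),
    "198.51.100.0/24": Route("198.51.100.0/24", "10.0.0.2", (64501, 64511)),
}
view = AggregationView(table)
view.rewrite("192.0.2.0/24", as_path=(64500, 64510))
```

Only the rewritten route is duplicated; the untouched one is still the very same object in both the source table and the view.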

AS_path attributes might be calculated from the AS-tree rather than
stored in literal form. I guess this could be faster than stripping
prepending from the attribute string.
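
As a sketch of what "calculated from the AS-tree" could mean, the Python snippet below rebuilds a path by walking parent links from the origin AS towards our side. It assumes a strict tree (each AS has exactly one upstream in the structure), which real AS relationships do not guarantee; `path_from_tree` and `upstream_of` are invented names.

```python
def path_from_tree(upstream_of, origin):
    """Rebuild an AS path by walking from the origin towards our side.

    upstream_of maps each AS to the next AS closer to us; ASes that are
    our direct neighbors are absent from the mapping.
    """
    path = []
    asn = origin
    while asn is not None:
        if asn in path:
            raise ValueError("loop in AS tree at AS%d" % asn)
        path.append(asn)
        asn = upstream_of.get(asn)
    path.reverse()  # neighbor AS first, origin AS last
    return tuple(path)

# Made-up tree: AS64510 is a customer of AS64501, itself behind AS64500.
upstream_of = {64510: 64501, 64501: 64500}
rebuilt = path_from_tree(upstream_of, 64510)
```

A path rebuilt this way contains each AS once, so there is no prepending left to strip.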

At this point, we don't seem to need any attribute other than prefix,
next-hop and AS_path. Community, MED and extended attributes might as
well be stripped in the modified instances of the routes. They could
also simply be ignored, and the resulting tree may be small enough that
the copy-on-write approach is not worth its added complexity for the
memory it saves.




--
Jérôme Nicolle
06 19 31 27 14
Jérôme Nicolle
2012-08-20 15:18:14 UTC
Permalink
By the way, if anyone can help me implement this (meaning writing
code while I do the testing and debugging), my usual broker (Alturna)
offered me a free test chassis to actually run the aggregation process
and see how a SUP32 or SUP720 reacts in the real world.

I have 2 to 4 possible transit upstreams to test with in my area, so
basically we have all the costly resources at our disposal. I just
really need a talented developer to finally create a functional
route aggregator.

Anyone ?
d***@epicup.com
2012-01-23 18:11:41 UTC
Permalink
Isn't there a case when you DON'T want the links aggregated? If you
have servers in 2 different data centers, both announcing a 1.1.1.1/20,
but one also announces 1.1.1.1/22, and the other also announces
1.1.5.1/22, so the one data center tends to serve 1.1.1.1-1.1.4.255, and
the other data center tends to serve 1.1.5.1-1.1.8.255, but if either
data center goes down, the whole address range will be served by the
other data center?

I don't have this setup, but was considering it at some point.

More of a curiosity question.


Jérôme Nicolle
2012-08-15 18:15:23 UTC
Permalink
Post by d***@epicup.com
Isn't there a case when you DON'T want the links aggregated? If you
have servers in 2 different data centers, both announcing a 1.1.1.1/20,
but one also announces 1.1.1.1/22, and the other also announces
1.1.5.1/22, so the one data center tends to serve 1.1.1.1-1.1.4.255, and
the other data center tends to serve 1.1.5.1-1.1.8.255, but if either
data center goes down, the whole address range will be served by the
other data center?
From a user's perspective, I guess it makes no difference whether they
hit one datacenter or the other. Seeing only the shortest prefix will
let your upstream choose which one to hit.
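
To show what is at stake in this exchange, here is a minimal longest-prefix-match sketch in Python. The prefixes are valid stand-ins chosen for the example (a shared 10.1.0.0/20 plus one /22 per site), not the exact ranges written in the question, and `lookup` is an invented helper.

```python
import ipaddress

def lookup(routes, address):
    """Longest-prefix match over (prefix, site) pairs; None if no match."""
    addr = ipaddress.ip_address(address)
    best = None
    for prefix, site in routes:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, site)
    return best[1] if best else None

# Both sites announce the /20; each also announces its own /22.
routes = [
    ("10.1.0.0/20", "dc1"), ("10.1.0.0/22", "dc1"),
    ("10.1.0.0/20", "dc2"), ("10.1.4.0/22", "dc2"),
]
# If dc1 goes down, its announcements disappear.
dc1_down = [(prefix, site) for prefix, site in routes if site != "dc1"]
```

With the full table, 10.1.1.1 goes to dc1 through its /22; once dc1's routes are withdrawn, the same address falls back to dc2 through the /20. Dropping the /22s and keeping only the /20 keeps everything reachable but hands the choice of site to whoever selects between the two equal-length routes.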
--
Jérôme Nicolle
+33 (0)6 19 31 27 14