HOWTO: Learning recursive routes from kernel protocol
Сергей Попович
2013-05-13 14:13:16 UTC
While deploying VLAN per user with an IP unnumbered scheme, using Linux as the
access server, we ran into the following problem:

  BIRD's kernel protocol does not learn routes whose next hop resolves through
  another route (recursive routes).

This is reproducible for both IPv4 and IPv6.
----------------------------------------------------------

For IPv4 this has minimal impact, since a customer typically gets a single /32 IP
address and the entire LAN side of the customer connection uses RFC 1918
addresses behind NAT.

For IPv6 this becomes more complicated:
  - there is no NAT in the IPv4 sense (ok)
  - as with IPv4, the customer gets one IPv6 address on its WAN interface (ok)
  - the customer gets an additional IPv6 block with a /64 prefix for its LAN
    interface (fail)

So for a typical IPv6 deployment with an IP unnumbered scheme we need:
# ip -6 route add fd11::2/128 dev vlan10 proto static src fd11::1
# ip -6 route add fd22::/64 dev vlan10 via fd11::2 proto static src fd11::1
# birdc 'show route filter { if proto = kernel254 then accept; reject; }'
BIRD 1.3.9 ready.
fd11::2/128 dev qinq22.226 [kernel254 12:57] * (10)

The last route (fd22::/64) MUST be advertised via a dynamic routing protocol
(BGP in our case), but as the output above shows, BIRD learns only the /128
host route.

Examples of such routes for both IPv4 and IPv6:
------------------------------------------------------------------

# ip -4 route add 192.168.1.2/32 dev vlan10 proto static src 192.168.1.1
# ip -4 route add 10.0.1.0/24 dev vlan10 via 192.168.1.2 proto static src 192.168.1.1
# birdc 'show route filter { if net = 10.0.1.0/24 then accept; reject; }'
BIRD 1.3.9 ready.

# ip -6 route add fd8e:579a:623c:826a::2/128 dev vlan10 proto static src fd8e:579a:623c:826a::1
# ip -6 route add fd8e:579a:623c:ffff::/64 dev vlan10
# birdc 'show route filter { if net = fd8e:579a:623c:ffff::/64 then accept; reject; }'

Kernel protocol configuration
---------------------------------------
protocol kernel kernel254 {
    persist no;
    scan time 120;
    learn yes;
    device routes no;
    kernel table ipt_main;
    import filter {
        # Import only 'static' routes
        if krt_source != ipp_static then
            reject;
        accept;
    };
    export all;
}

Workaround
-----------------
Use the static protocol for the routes that are not learned from the kernel.

protocol static static254 {
  route fd8e:579a:623c:ffff::/64 drop;
}

Any ideas about a solution or other workarounds for this problem are welcome.

--
SP5474-RIPE
Sergey Popovich
Ondrej Zajicek
2013-05-13 19:11:25 UTC
Post by Sergey Popovich
Deploying VLAN per user with IP unnumbered schema using Linux as access server
  BIRD's kernel protocol does not learn routes with nexthop, that resolves through
  another route (recursive routes).
..
Any ideas about solution/other workarounds for this problem are welcome.
Hello

This is essentially a sanity check for the validity of next hops. On IPv4,
you can bypass it by using the 'onlink' option for a route, but it seems
that Linux does not support the 'onlink' option for IPv6. The simplest
workaround would be just to disable the check with the attached patch.
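
For illustration, the IPv4 'onlink' variant might look like the sketch below.
It reuses the 10.0.1.0/24 prefix, the customer address 192.168.1.2, and vlan10
from the first post. The run helper and the APPLY variable are made up for
this example (they are not part of iproute2): by default the command is only
printed, and it is executed only with APPLY=1 as root. Whether BIRD then
learns the route is best confirmed with birdc.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the command, execute it only if APPLY=1.
run() {
    echo "$*"
    if [ "${APPLY:-0}" = "1" ]; then "$@"; fi
}

# IPv4 only: 'onlink' asks the kernel to treat the gateway as directly
# reachable on the given device, so no covering route is looked up for it.
# $1 = prefix, $2 = gateway, $3 = device
onlink_route() {
    run ip -4 route add "$1" via "$2" dev "$3" onlink proto static
}

onlink_route 10.0.1.0/24 192.168.1.2 vlan10
```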

BIRD does not really need the check; it is enough that the route explicitly
specifies an interface. But I guess there would be many compatibility issues
with such routes. For example, IPv6 on Linux (as I just tested) requires the
'parent' route to be inserted before the 'child' route, while BIRD's kernel
protocol does not enforce any order when exporting routes to the kernel.


BTW, it seems that such routes are not really recursive. If I run
this sequence of commands:

ip -6 route add fd11::2/128 dev eth0
ip -6 route add fd22::/64 via fd11::2
ip -6 route del fd11::2/128 dev eth0
ip -6 route add fd11::2/128 dev eth1

Then the kernel reports that fd22::/64 still points to eth0.
Post by Sergey Popovich
# ip -6 route add fd11::2/128 dev vlan10 proto static src fd11::1
# ip -6 route add fd22::/64 dev vlan10 via fd11::2 proto static src fd11::1
BTW, why not use link-local addresses as next hops? That would also
solve the problem in a cleaner way. If you don't want to track the automatic
(MAC-based) link-local address, you could use preconfigured link-local
addresses (fe80::1/64, fe80::2/64) on the 'ptp' VLAN. In that case, it
would be enough to assign a /64 prefix to a client (no need to assign a
separate /128 IP).
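
A minimal sketch of that link-local setup on the access-server side, assuming
the vlan10 interface and the fd22::/64 customer prefix from the earlier
examples and a CPE preconfigured with fe80::2. The run helper and the APPLY
variable are hypothetical dry-run scaffolding: commands are only printed
unless APPLY=1 is set (and the script runs as root).

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the command, execute it only if APPLY=1.
run() {
    echo "$*"
    if [ "${APPLY:-0}" = "1" ]; then "$@"; fi
}

# Access-server side of one customer VLAN.
# $1 = device, $2 = customer LAN prefix
ll_customer() {
    # Fixed link-local address on the server side of the 'ptp' VLAN.
    run ip -6 addr add fe80::1/64 dev "$1"
    # Customer prefix via the CPE's preconfigured link-local address.
    # A link-local gateway is on-link on the named device, so no separate
    # /128 host route has to exist first.
    run ip -6 route add "$2" via fe80::2 dev "$1" proto static
}

ll_customer vlan10 fd22::/64
```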
--
Elen sila lumenn' omentielvo

Ondrej 'SanTiago' Zajicek (email: ***@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
Sergey Popovich
2013-05-14 11:43:48 UTC
Post by Ondrej Zajicek
Hello
This is essentially a sanity check for validity of next hops. On IPv4,
you could disable it by using 'onlink' option for a route, but it seems
that Linux does not support 'onlink' option for IPv6. The simplest
workaround would be just to disable the check with attached patch.
"onlink" option for ip-route(8) works as expected with IPv4, but current IPv6
implementation does not handle this option, agreed.

For IPv6, the attached patch works as expected, and the route is learned into
BIRD's routing table from the kernel FIB. Thanks for a good workaround for this
weakness of the current Linux kernel IPv6 implementation.
Post by Ondrej Zajicek
BIRD does not really need the check, it is enough when there is an
explicitly specified iface in the route. But i guess there would be many
compatibility issues with such kinds of routes - for example, IPv6 Linux
(as i just tested) requires that 'parent' route have to be inserted
before 'child' route, while BIRD kernel protocol does not enforce any
order when exporting routes to kernel.
BTW, it seems that such routes are not really much recursive, if i do
ip -6 route add fd11::2/128 dev eth0
ip -6 route add fd22::/64 via fd11::2
ip -6 route del fd11::2/128 dev eth0
ip -6 route add fd11::2/128 dev eth1
Then kernel reports that fd22::/64 points still to eth0.
Oh, this (and many other things with IPv6) is true. Thank you again for
pointing out this weakness.
Post by Ondrej Zajicek
# ip -6 route add fd11::2/128 dev vlan10 proto static src fd11::1
# ip -6 route add fd22::/64 dev vlan10 via fd11::2 proto static src fd11::1
BTW, why not to use link-local addresses as a next-hop? That would also
solve the problem in a cleaner way. If you don't want to track automatic
(MAC-based) link-local address, you could use preconfigured link-local
addresses (fe80::1/64, fe80::2/64) on the 'ptp' vlan. In that case, it
would be enough to assign /64 prefix for a client (no need to assign a
separate /128 IP).
Good point.

As for me (and from the point of view of our helpdesk), this solution has one
big disadvantage:
traceroutes from external networks to customer network(s) will show a missing
hop: the customer gateway, configured with a link-local address on its WAN
interface (its ICMP Destination Unreachable is dropped by our access server).

There are other minor cases where using link-local addresses is not the best
choice:
- some users like to connect their PC directly to the Internet (:-)) (or at
least do this to test connectivity and speed).
- some "network" OSes in network equipment do not allow setting link-local
addresses (or make it a bit complicated).

Anyway, Ondrej, thank you for taking the time to answer my question and for
the good solution.
--
SP5474-RIPE
Sergey Popovich
Ondrej Zajicek
2013-05-14 19:21:06 UTC
Post by Sergey Popovich
"onlink" option for ip-route(8) works as expected with IPv4, but current IPv6
implementation does not handle this option, agree.
For IPv6, attached patch works as expected and route learned into BIRD's
routing table from kernel FIB. Thanks for good workaround for current
Linux kernel IPv6 implementation weakness.
Well, the problem is that the conceptual models of the Linux IPv4, Linux
IPv6, FreeBSD IPv4, ... routing tables differ slightly in some details, and
these details are probably not documented anywhere. BIRD tries to match its
own conceptual model of routing tables to these, but the match is probably
not exact.
Post by Sergey Popovich
Post by Ondrej Zajicek
BTW, why not to use link-local addresses as a next-hop? That would also
solve the problem in a cleaner way. If you don't want to track automatic
(MAC-based) link-local address, you could use preconfigured link-local
addresses (fe80::1/64, fe80::2/64) on the 'ptp' vlan. In that case, it
would be enough to assign /64 prefix for a client (no need to assign a
separate /128 IP).
Good point.
As for me (and from point of our helpdesk) this solution has one big
disadvantage:
traceroutes, from external networks to customer network(s) will indicate
missing hop - customer gateway, configured with link-local address on its
WAN interface (ICMP Destination unreachable dropped by our access server).
AFAIK this should not be a problem. In IPv6, the gateway should use some other
global address (such as one from the /64 used on the local network) as the
source address for ICMP replies (and its other traffic), so all hosts would
appear in the traceroute output.
Post by Sergey Popovich
- some users like to connect their PC directly to Internet (:-)) (or at
least do this to test connectivity and speed).
- some "network" OS'es in network equipment does not allow (or make things a
bit complicated) setting link-layer addresses.
BTW, choosing a CPE that properly supports IPv6 (including prefix delegation)
seems to be a nontrivial problem in itself. One of my ideas for providing
IPv6 in a small wireless ISP was to configure the clients' prefixes in the
CPEs and use RIPng on the CPEs to propagate them to an ISP router (to get
proper link-local next hops there), with some validation on the ISP router,
of course.
Sergey Popovich
2013-05-15 12:05:36 UTC
Post by Ondrej Zajicek
Post by Sergey Popovich
Good point.
As for me (and from point of our helpdesk) this solution has one big
disadvantage:
traceroutes, from external networks to customer network(s) will indicate
missing hop - customer gateway, configured with link-local address on its
WAN interface (ICMP Destination unreachable dropped by our access server).
AFAIK this should not be a problem - In IPv6, gateway should use some other
global address (like one from /64 used on local network) as a source addr
for ICMP answers (or other its traffic), so there would be all hosts in
the traceroute output.
Indeed, this is true (tested with the attached script).

One big advantage of LL addresses is that they work on a stock kernel, without
the extra patches that extend Proxy NDP functionality to the level of Proxy ARP
in IPv4.

The modifications made by these patches are very likely to be rejected
upstream, because at the very least they put the interface into a mode where
the network driver accepts all multicast traffic (see the allmulticast
interface flag in ip-link(8)) and make a few changes to the IPv6 stack to
bypass the check of the multicast IPv6 address used for ND. Later code
performs uRPF checks to prevent address spoofing, and with it the filling of
the neighbour cache with entries in STALE/DELAY state (this is still possible
with LL addresses, however), as well as other checks before replying as a
proxy (the behavior is mostly identical to Proxy ARP).
However, all of this is activated only when the proxy_ndp switch is on.

Without the patches, each proxied address must be configured explicitly using
ip-neighbour(8). Proxying the address used as the gateway on the CPE is no
problem, but connectivity between the addresses allocated to the CPEs' INET
interfaces is broken.
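
As a rough sketch of that per-address configuration on a stock kernel: the
interface vlan10 and the address fd11::3 below are placeholders, and which
addresses need an entry on which VLAN depends on the deployment. The run
helper and the APPLY variable are hypothetical dry-run scaffolding, so the
commands are only printed unless APPLY=1 is set (as root).

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the command, execute it only if APPLY=1.
run() {
    echo "$*"
    if [ "${APPLY:-0}" = "1" ]; then "$@"; fi
}

# Answer Neighbour Solicitations for one address on one interface.
# $1 = interface where the solicitations arrive, $2 = address to proxy
proxy_ndp_entry() {
    # Proxy NDP is off by default and is enabled per interface.
    run sysctl -w "net.ipv6.conf.$1.proxy_ndp=1"
    # Without the patches, every proxied address needs its own entry.
    run ip -6 neigh add proxy "$2" dev "$1"
}

proxy_ndp_entry vlan10 fd11::3
```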

However, some of the minor concerns remain valid (at least the customer who
connects a PC directly to test connectivity).

Using LL addresses looks like the more robust and convenient way. Thanks.
Post by Ondrej Zajicek
Post by Sergey Popovich
- some users like to connect their PC directly to Internet (:-)) (or at
least do this to test connectivity and speed).
- some "network" OS'es in network equipment does not allow (or make things a
bit complicated) setting link-layer addresses.
BTW, choosing CPE properly supporting IPv6 (including prefix delegation)
seems to be a nontrivial problem itself. One of my ideas about how to
provide IPv6 in a small wireless ISP was to configure clients' prefixes
in CPEs and use RIPng in CPEs to propagate it to an ISP router
(to get proper link-local next-hops here), with some validation in ISP
router, of course.
This looks workable (again, depending on the CPE equipment and its RIPng
implementation :-)). However, there are other concerns:
- security: RIPng is insecure, and I suppose there is no CPE that implements a
security option (HMAC-MD5, ...)
- filtering on the ISP router (even with BIRD this adds more overhead)
- additional multicast on the link (this produces unnecessary multicast on
large L2 segments)
- overall administrative and troubleshooting overhead from adding a dynamic
routing component to the customer connection.
--
SP5474-RIPE
Sergey Popovich