linux/net
David Ahern a6db4494d2 net: ipv4: Consider failed nexthops in multipath routes
Multipath route lookups should consider knowledge about next hops and not
select a hop that is known to be failed.

Example:

                     [h2]                   [h3]   15.0.0.5
                      |                      |
                     3|                     3|
                    [SP1]                  [SP2]--+
                     1  2                   1     2
                     |  |     /-------------+     |
                     |   \   /                    |
                     |     X                      |
                     |    / \                     |
                     |   /   \---------------\    |
                     1  2                     1   2
         12.0.0.2  [TOR1] 3-----------------3 [TOR2] 12.0.0.3
                     4                         4
                      \                       /
                        \                    /
                         \                  /
                          -------|   |-----/
                                 1   2
                                [TOR3]
                                  3|
                                   |
                                  [h1]  12.0.0.1

host h1 with IP 12.0.0.1 has 2 paths to host h3 at 15.0.0.5:

    root@h1:~# ip ro ls
    ...
    12.0.0.0/24 dev swp1  proto kernel  scope link  src 12.0.0.1
    15.0.0.0/16
            nexthop via 12.0.0.2  dev swp1 weight 1
            nexthop via 12.0.0.3  dev swp1 weight 1
    ...

If the link between tor3 and tor1 is down and the link between tor1
and tor2 then tor1 is effectively cut-off from h1. Yet the route lookups
in h1 are alternating between the 2 routes: ping 15.0.0.5 gets one and
ssh 15.0.0.5 gets the other. Connections that attempt to use the
12.0.0.2 nexthop fail since that neighbor is not reachable:

    root@h1:~# ip neigh show
    ...
    12.0.0.3 dev swp1 lladdr 00:02:00:00:00:1b REACHABLE
    12.0.0.2 dev swp1  FAILED
    ...

The failed path can be avoided by considering known neighbor information
when selecting next hops. If the neighbor lookup fails we have no
knowledge about the nexthop, so give it a shot. If there is an entry
then only select the nexthop if the state is sane. This is similar to
what fib_detect_death does.

To maintain backward compatibility use of the neighbor information is
based on a new sysctl, fib_multipath_use_neigh.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-11 15:16:13 -04:00
..
6lowpan 6lowpan: iphc: fix SAM/DAM bit comment 2016-03-10 19:51:29 +01:00
9p net/9p: convert to new CQ API 2016-03-10 20:54:09 -05:00
802
8021q vlan: propagate gso_max_segs 2016-03-17 21:05:01 -04:00
appletalk appletalk: fix erroneous return value 2016-02-18 14:59:34 -05:00
atm
ax25 ax25: add link layer header validation function 2016-03-09 22:13:01 -05:00
batman-adv batman-adv: clarify CFG80211 dependency 2016-03-02 13:45:47 -05:00
bluetooth Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2016-03-19 10:05:34 -07:00
bridge Revert "bridge: Fix incorrect variable assignment on error path in br_sysfs_addbr" 2016-04-06 15:42:45 -04:00
caif net: caif: fix misleading indentation 2016-03-14 13:09:50 -04:00
can sock: enable timestamping using control messages 2016-04-04 15:50:30 -04:00
ceph mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros 2016-04-04 10:41:08 -07:00
core Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-04-09 17:41:41 -04:00
dcb
dccp net: introduce lockdep_is_held and update various places to use it 2016-04-07 16:44:14 -04:00
decnet net: add validation for the socket syscall protocol argument 2015-12-14 16:09:30 -05:00
dns_resolver
dsa net: dsa: make the VLAN add function return void 2016-04-08 16:50:41 -04:00
ethernet eth: Pull header from first fragment via eth_get_headlen 2016-02-24 13:58:05 -05:00
hsr
ieee802154 ieee802154: 6lowpan: fix return of netdev notifier 2016-02-23 20:29:40 +01:00
ipv4 net: ipv4: Consider failed nexthops in multipath routes 2016-04-11 15:16:13 -04:00
ipv6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-04-09 17:41:41 -04:00
ipx
irda Merge 4.5-rc4 into tty-next 2016-02-14 14:36:04 -08:00
iucv af_iucv: Validate socket address length in iucv_sock_bind() 2016-01-19 14:21:08 -05:00
kcm kcm: Add receive message timeout 2016-03-09 16:36:15 -05:00
key
l2tp Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-04-09 17:41:41 -04:00
l3mdev net: l3mdev: address selection should only consider devices in L3 domain 2016-02-26 14:22:26 -05:00
lapb
llc af_llc: fix types on llc_ui_wait_for_conn 2016-02-17 16:12:13 -05:00
mac80211 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-04-09 17:41:41 -04:00
mac802154 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2016-03-19 10:05:34 -07:00
mpls mpls: find_outdev: check for err ptr in addition to NULL check 2016-04-08 12:43:20 -04:00
netfilter For the 4.6 cycle, we have a number of changes: 2016-04-08 16:42:31 -04:00
netlabel netlabel: do not initialise statics to NULL 2016-03-07 11:08:26 -05:00
netlink rhashtable: accept GFP flags in rhashtable_walk_init 2016-04-05 10:56:32 +02:00
netrom
nfc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2016-03-19 10:05:34 -07:00
openvswitch Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf 2016-03-28 15:38:59 -04:00
packet Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-04-09 17:41:41 -04:00
phonet sock: struct proto hash function may error 2016-02-11 03:54:14 -05:00
rds RDS: fix congestion map corruption for PAGE_SIZE > 4k 2016-04-07 16:58:28 -04:00
rfkill rfkill: Use switch to demux userspace operations 2016-04-05 10:48:53 +02:00
rose
rxrpc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2016-03-19 10:05:34 -07:00
sched cls_bpf: reset class and reuse major in da 2016-03-18 19:35:21 -04:00
sctp Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-04-09 17:41:41 -04:00
sunrpc mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usage 2016-04-04 10:41:08 -07:00
switchdev switchdev: fix typo in comments/doc 2016-03-24 14:51:24 -04:00
tipc tipc: stricter filtering of packets in bearer layer 2016-04-07 17:00:13 -04:00
unix Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-02-23 00:09:14 -05:00
vmw_vsock VSOCK: Detach QP check should filter out non matching QPs. 2016-04-06 16:39:09 -04:00
wimax
wireless cfg80211: Allow reassociation to be requested with internal SME 2016-04-06 15:09:28 +02:00
x25
xfrm xfrm: Fix crash observed during device unregistration and decryption 2016-03-24 14:29:36 -04:00
compat.c
Kconfig Make DST_CACHE a silent config option 2016-03-21 22:56:38 -04:00
Makefile kcm: Kernel Connection Multiplexor module 2016-03-09 16:36:14 -05:00
socket.c net: introduce lockdep_is_held and update various places to use it 2016-04-07 16:44:14 -04:00
sysctl_net.c