linux/net/ipv6
Josh Hunt 32b293a53d IPv6: Avoid taking write lock for /proc/net/ipv6_route
During some debugging I needed to look into how /proc/net/ipv6_route
operated and in my digging I found it is calling fib6_clean_all() which uses
"write_lock_bh(&table->tb6_lock)" before doing the walk of the table. I
found this on 2.6.32, but reading the code I believe the same basic idea
exists currently. Looking at the rtnetlink code they are only calling
"read_lock_bh(&table->tb6_lock);" via fib6_dump_table(). While I realize
reading from proc isn't the recommended way of fetching the ipv6 route
table, taking a write lock seems unnecessary and would probably cause
network performance issues.

To verify this I loaded up the ipv6 route table and then ran iperf in 3
cases:
  * doing nothing
  * reading ipv6 route table via proc
    (while :; do cat /proc/net/ipv6_route > /dev/null; done)
  * reading ipv6 route table via rtnetlink
    (while :; do ip -6 route show table all > /dev/null; done)

* Load the ipv6 route table up with:
  * for ((i = 0;i < 4000;i++)); do ip route add unreachable 2000::$i; done

* iperf commands:
  * client: iperf -i 1 -V -c <ipv6 addr>
  * server: iperf -V -s

* iperf results - 3 runs each (in Mbits/sec)
  * nothing: client: 927,927,927 server: 927,927,927
  * proc: client: 179,97,96,113 server: 142,112,133
  * iproute: client: 928,927,928 server: 927,927,927

lock_stat shows taking the write lock is causing the slowdown. Using this
info I decided to write a version of fib6_clean_all() which replaces
write_lock_bh(&table->tb6_lock) with read_lock_bh(&table->tb6_lock). With
this new function I see the same results as with my rtnetlink iperf test.

Signed-off-by: Josh Hunt <joshhunt00@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-30 17:07:33 -05:00
..
netfilter Merge branch 'nf-next' of git://1984.lsi.us.es/net-next 2011-12-25 02:21:45 -05:00
addrconf_core.c net: Add export.h for EXPORT_SYMBOL/THIS_MODULE to non-modules 2011-10-31 19:30:30 -04:00
addrconf.c ipv6: Kill rt6i_dev and rt6i_expires defines. 2011-12-28 20:19:20 -05:00
addrlabel.c rtnetlink: Compute and store minimum ifinfo dump size 2011-06-09 20:38:07 -07:00
af_inet6.c per-netns ipv4 sysctl_tcp_mem 2011-12-12 19:04:11 -05:00
ah6.c net: remove ipv6_addr_copy() 2011-11-22 16:43:32 -05:00
anycast.c ipv6: Kill rt6i_dev and rt6i_expires defines. 2011-12-28 20:19:20 -05:00
datagram.c net: remove ipv6_addr_copy() 2011-11-22 16:43:32 -05:00
esp6.c Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2011-05-05 14:59:02 -07:00
exthdrs_core.c ipv6: Add fragment reporting to ipv6_skip_exthdr(). 2011-12-03 09:35:10 -08:00
exthdrs.c net: remove ipv6_addr_copy() 2011-11-22 16:43:32 -05:00
fib6_rules.c net: remove ipv6_addr_copy() 2011-11-22 16:43:32 -05:00
icmp.c ipv6: Add fragment reporting to ipv6_skip_exthdr(). 2011-12-03 09:35:10 -08:00
inet6_connection_sock.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2011-11-26 14:47:03 -05:00
inet6_hashtables.c net: Compute protocol sequence numbers and fragment IDs using MD5. 2011-08-06 18:33:19 -07:00
ip6_fib.c IPv6: Avoid taking write lock for /proc/net/ipv6_route 2011-12-30 17:07:33 -05:00
ip6_flowlabel.c net: remove ipv6_addr_copy() 2011-11-22 16:43:32 -05:00
ip6_input.c ipv6: Add fragment reporting to ipv6_skip_exthdr(). 2011-12-03 09:35:10 -08:00
ip6_output.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2011-12-23 17:13:56 -05:00
ip6_tunnel.c ipv6: Kill rt6i_dev and rt6i_expires defines. 2011-12-28 20:19:20 -05:00
ip6mr.c net: remove ipv6_addr_copy() 2011-11-22 16:43:32 -05:00
ipcomp6.c inet: constify ip headers and in6_addr 2011-04-22 11:04:14 -07:00
ipv6_sockglue.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2011-12-02 13:49:21 -05:00
Kconfig ipv6: ip6mr: support multiple tables 2010-05-11 14:40:55 +02:00
Makefile
mcast.c ipv6: Kill rt6i_dev and rt6i_expires defines. 2011-12-28 20:19:20 -05:00
mip6.c net: remove ipv6_addr_copy() 2011-11-22 16:43:32 -05:00
ndisc.c ipv6: Kill rt6i_dev and rt6i_expires defines. 2011-12-28 20:19:20 -05:00
netfilter.c Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux 2011-11-06 19:44:47 -08:00
proc.c ipv6: reduce percpu needs for icmpv6msg mibs 2011-11-14 00:12:26 -05:00
protocol.c net: add __rcu annotations to protocol 2010-10-27 11:37:31 -07:00
raw.c net: remove ipv6_addr_copy() 2011-11-22 16:43:32 -05:00
reassembly.c net: remove ipv6_addr_copy() 2011-11-22 16:43:32 -05:00
route.c IPv6: Avoid taking write lock for /proc/net/ipv6_route 2011-12-30 17:07:33 -05:00
sit.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2011-12-16 02:11:14 -05:00
syncookies.c net: remove ipv6_addr_copy() 2011-11-22 16:43:32 -05:00
sysctl_net_ipv6.c net: Add export.h for EXPORT_SYMBOL/THIS_MODULE to non-modules 2011-10-31 19:30:30 -04:00
tcp_ipv6.c per-netns ipv4 sysctl_tcp_mem 2011-12-12 19:04:11 -05:00
tunnel6.c tunnels: add _rcu annotations 2010-10-25 13:09:45 -07:00
udp_impl.h net: Make setsockopt() optlen be unsigned. 2009-09-30 16:12:20 -07:00
udp.c udp: Export code sk lookup routines 2011-12-09 14:14:08 -05:00
udplite.c Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux 2011-11-06 19:44:47 -08:00
xfrm6_input.c netfilter: ipv6: use NFPROTO values for NF_HOOK invocation 2010-03-25 16:00:49 +01:00
xfrm6_mode_beet.c net: remove ipv6_addr_copy() 2011-11-22 16:43:32 -05:00
xfrm6_mode_ro.c
xfrm6_mode_transport.c
xfrm6_mode_tunnel.c net: remove ipv6_addr_copy() 2011-11-22 16:43:32 -05:00
xfrm6_output.c net: remove ipv6_addr_copy() 2011-11-22 16:43:32 -05:00
xfrm6_policy.c net: remove ipv6_addr_copy() 2011-11-22 16:43:32 -05:00
xfrm6_state.c net: remove ipv6_addr_copy() 2011-11-22 16:43:32 -05:00
xfrm6_tunnel.c ipv6: Fix return of xfrm6_tunnel_rcv() 2011-05-24 01:11:51 -04:00