iv/linux

Jesper Dangaard Brouer 93bb0ceb75 netfilter: conntrack: remove central spinlock nf_conntrack_lock	2014-03-07 11:41:13 +01:00

    nf_conntrack_lock is a monolithic lock and suffers from huge contention
    on current generation servers (8 or more core/threads).

    Perf locking congestion is clear on base kernel:

    -  72.56%  ksoftirqd/6  [kernel.kallsyms]  [k] _raw_spin_lock_bh
       - _raw_spin_lock_bh
          + 25.33% init_conntrack
          + 24.86% nf_ct_delete_from_lists
          + 24.62% __nf_conntrack_confirm
          + 24.38% destroy_conntrack
          + 0.70% tcp_packet
    +   2.21%  ksoftirqd/6  [kernel.kallsyms]  [k] fib_table_lookup
    +   1.15%  ksoftirqd/6  [kernel.kallsyms]  [k] __slab_free
    +   0.77%  ksoftirqd/6  [kernel.kallsyms]  [k] inet_getpeer
    +   0.70%  ksoftirqd/6  [nf_conntrack]     [k] nf_ct_delete
    +   0.55%  ksoftirqd/6  [ip_tables]        [k] ipt_do_table

    This patch change conntrack locking and provides a huge performance
    improvement. SYN-flood attack tested on a 24-core E5-2695v2(ES) with
    10Gbit/s ixgbe (with tool trafgen):

     Base kernel:   810.405 new conntrack/sec
     After patch: 2.233.876 new conntrack/sec

    Notice other floods attack (SYN+ACK or ACK) can easily be deflected using:
     # iptables -A INPUT -m state --state INVALID -j DROP
     # sysctl -w net/netfilter/nf_conntrack_tcp_loose=0

    Use an array of hashed spinlocks to protect insertions/deletions of
    conntracks into the hash table. 1024 spinlocks seem to give good
    results, at minimal cost (4KB memory). Due to lockdep max depth,
    1024 becomes 8 if CONFIG_LOCKDEP=y

    The hash resize is a bit tricky, because we need to take all locks in
    the array. A seqcount_t is used to synchronize the hash table users
    with the resizing process.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Reviewed-by: Florian Westphal <fw@strlen.de>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
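
The locking scheme described in the commit message (an array of hashed locks for inserts and deletes, with every lock taken for whole-table operations such as a resize) can be illustrated with a small userspace sketch. This is not the kernel code: it assumes pthread mutexes in place of kernel spinlocks, and `NUM_BUCKETS`, `struct entry`, `hash_key`, `lock_for` and the `table_*` functions are hypothetical names invented for the illustration. Only the lock count of 1024 comes from the commit message. The `seqcount_t` that lets readers detect a concurrent resize is left out.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_LOCKS   1024u  /* lock count quoted in the commit message */
#define NUM_BUCKETS 4096u  /* hypothetical table size for this sketch */

/* Hypothetical stand-in for a tracked connection. */
struct entry {
    uint32_t key;
    struct entry *next;
};

static struct entry *buckets[NUM_BUCKETS];
static pthread_mutex_t locks[NUM_LOCKS];

void table_init(void)
{
    for (size_t i = 0; i < NUM_LOCKS; i++)
        pthread_mutex_init(&locks[i], NULL);
}

/* Toy multiplicative hash; the real code hashes the connection tuple. */
uint32_t hash_key(uint32_t key)
{
    return (key * 2654435761u) % NUM_BUCKETS;
}

/* Several buckets share one lock: bucket index modulo the lock count. */
pthread_mutex_t *lock_for(uint32_t bucket)
{
    return &locks[bucket % NUM_LOCKS];
}

/* Insert holds only the lock covering the target bucket. */
void table_insert(struct entry *e)
{
    assert(e != NULL);
    uint32_t b = hash_key(e->key);
    pthread_mutex_t *l = lock_for(b);

    pthread_mutex_lock(l);
    e->next = buckets[b];
    buckets[b] = e;
    pthread_mutex_unlock(l);
}

/* Unlink the first entry with this key; returns 1 if one was found. */
int table_remove(uint32_t key)
{
    uint32_t b = hash_key(key);
    pthread_mutex_t *l = lock_for(b);
    int removed = 0;

    pthread_mutex_lock(l);
    for (struct entry **pp = &buckets[b]; *pp; pp = &(*pp)->next) {
        if ((*pp)->key == key) {
            *pp = (*pp)->next;
            removed = 1;
            break;
        }
    }
    pthread_mutex_unlock(l);
    return removed;
}

/* Whole-table operations take every lock, always in ascending order. */
void table_lock_all(void)
{
    for (size_t i = 0; i < NUM_LOCKS; i++)
        pthread_mutex_lock(&locks[i]);
}

void table_unlock_all(void)
{
    for (size_t i = NUM_LOCKS; i-- > 0; )
        pthread_mutex_unlock(&locks[i]);
}

/* Stand-in for a resize: walks all buckets while holding all locks. */
size_t table_count(void)
{
    size_t n = 0;

    table_lock_all();
    for (size_t b = 0; b < NUM_BUCKETS; b++)
        for (struct entry *e = buckets[b]; e; e = e->next)
            n++;
    table_unlock_all();
    return n;
}
```

A caller would run `table_init()` once, then call `table_insert()` and `table_remove()` from any thread. Operations on buckets that map to different locks do not contend with each other, which is the point of splitting the single central lock. Taking all locks in a fixed order keeps two concurrent whole-table operations from deadlocking against each other.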
..
conntrack.h	netfilter: conntrack: remove central spinlock nf_conntrack_lock	2014-03-07 11:41:13 +01:00
core.h	percpu: add __percpu sparse annotations to net	2010-02-16 23:05:38 -08:00
dccp.h
generic.h	BUG: headers with BUG/BUG_ON etc. need linux/bug.h	2012-03-04 17:54:34 -05:00
hash.h	net: cleanup unsigned to unsigned int	2012-04-15 12:44:40 -04:00
ipv4.h	ipv4: introduce ip_dst_mtu_maybe_forward and protect forwarding path against pmtu spoofing	2014-01-13 11:22:54 -08:00
ipv6.h	ipv6: add flowlabel_consistency sysctl	2014-01-19 17:12:31 -08:00
mib.h	net: use IS_ENABLED(CONFIG_IPV6)	2011-12-11 18:25:16 -05:00
netfilter.h	netfilter: nf_log: prepare net namespace support for loggers	2013-04-05 20:12:54 +02:00
nftables.h	netfilter: nf_tables: add "inet" table for IPv4/IPv6	2014-01-07 23:57:25 +01:00
packet.h	packet: fix broken build.	2012-08-23 09:29:45 -07:00
sctp.h	Revert "net: sctp: convert sctp_checksum_disable module param into sctp sysctl"	2013-08-09 13:09:41 -07:00
unix.h
x_tables.h	netfilter: {ipt,ebt}_ULOG: rise warning on deprecation	2013-05-23 14:23:16 +02:00
xfrm.h	xfrm: Remove ancient sleeping when the SA is in acquire state	2013-12-06 07:24:31 +01:00