linux/include/net/netns
Jesper Dangaard Brouer 93bb0ceb75 netfilter: conntrack: remove central spinlock nf_conntrack_lock
nf_conntrack_lock is a monolithic lock and suffers from huge contention
on current generation servers (8 or more core/threads).

Perf locking congestion is clear on base kernel:

-  72.56%  ksoftirqd/6  [kernel.kallsyms]    [k] _raw_spin_lock_bh
   - _raw_spin_lock_bh
      + 25.33% init_conntrack
      + 24.86% nf_ct_delete_from_lists
      + 24.62% __nf_conntrack_confirm
      + 24.38% destroy_conntrack
      + 0.70% tcp_packet
+   2.21%  ksoftirqd/6  [kernel.kallsyms]    [k] fib_table_lookup
+   1.15%  ksoftirqd/6  [kernel.kallsyms]    [k] __slab_free
+   0.77%  ksoftirqd/6  [kernel.kallsyms]    [k] inet_getpeer
+   0.70%  ksoftirqd/6  [nf_conntrack]       [k] nf_ct_delete
+   0.55%  ksoftirqd/6  [ip_tables]          [k] ipt_do_table

This patch change conntrack locking and provides a huge performance
improvement.  SYN-flood attack tested on a 24-core E5-2695v2(ES) with
10Gbit/s ixgbe (with tool trafgen):

 Base kernel:   810.405 new conntrack/sec
 After patch: 2.233.876 new conntrack/sec

Notice other floods attack (SYN+ACK or ACK) can easily be deflected using:
 # iptables -A INPUT -m state --state INVALID -j DROP
 # sysctl -w net/netfilter/nf_conntrack_tcp_loose=0

Use an array of hashed spinlocks to protect insertions/deletions of
conntracks into the hash table. 1024 spinlocks seem to give good
results, at minimal cost (4KB memory). Due to lockdep max depth,
1024 becomes 8 if CONFIG_LOCKDEP=y

The hash resize is a bit tricky, because we need to take all locks in
the array. A seqcount_t is used to synchronize the hash table users
with the resizing process.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-03-07 11:41:13 +01:00
..
conntrack.h netfilter: conntrack: remove central spinlock nf_conntrack_lock 2014-03-07 11:41:13 +01:00
core.h percpu: add __percpu sparse annotations to net 2010-02-16 23:05:38 -08:00
dccp.h
generic.h BUG: headers with BUG/BUG_ON etc. need linux/bug.h 2012-03-04 17:54:34 -05:00
hash.h net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
ipv4.h ipv4: introduce ip_dst_mtu_maybe_forward and protect forwarding path against pmtu spoofing 2014-01-13 11:22:54 -08:00
ipv6.h ipv6: add flowlabel_consistency sysctl 2014-01-19 17:12:31 -08:00
mib.h net: use IS_ENABLED(CONFIG_IPV6) 2011-12-11 18:25:16 -05:00
netfilter.h netfilter: nf_log: prepare net namespace support for loggers 2013-04-05 20:12:54 +02:00
nftables.h netfilter: nf_tables: add "inet" table for IPv4/IPv6 2014-01-07 23:57:25 +01:00
packet.h packet: fix broken build. 2012-08-23 09:29:45 -07:00
sctp.h Revert "net: sctp: convert sctp_checksum_disable module param into sctp sysctl" 2013-08-09 13:09:41 -07:00
unix.h
x_tables.h netfilter: {ipt,ebt}_ULOG: rise warning on deprecation 2013-05-23 14:23:16 +02:00
xfrm.h xfrm: Remove ancient sleeping when the SA is in acquire state 2013-12-06 07:24:31 +01:00