Commit Graph

11894 Commits

Author SHA1 Message Date
David S. Miller
074fb88016 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for your net-next tree:

1) Support for transparent proxying for nf_tables, from Mate Eckl.

2) Patchset to add OS passive fingerprint recognition for nf_tables,
   from Fernando Fernandez. This takes common code from xt_osf and
   place it into the new nfnetlink_osf module for codebase sharing.

3) Lightweight tunneling support for nf_tables.

4) meta and lookup are likely going to be used in rulesets, make them
   direct calls. From Florian Westphal.

A bunch of incremental updates:

5) use PTR_ERR_OR_ZERO() from nft_numgen, from YueHaibing.

6) Use kvmalloc_array() to allocate hashtables, from Li RongQing.

7) Explicit dependencies between nfnetlink_cttimeout and conntrack
   timeout extensions, from Harsha Sharma.

8) Simplify NLM_F_CREATE handling in nf_tables.

9) Removed unused variable in the get element command, from
   YueHaibing.

10) Expose bridge hook priorities through uapi, from Mate Eckl.

And a few fixes for previous Netfilter batch for net-next:

11) Use per-netns mutex from flowtable event, from Florian Westphal.

12) Remove explicit dependency on iptables CT target from conntrack
    zones, from Florian.

13) Fix use-after-free in rmmod nf_conntrack path, also from Florian.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-05 16:25:22 -07:00
zhong jiang
31ba191bf5 include/net/bond_3ad: Simplify the code by using the ARRAY_SIZE
We prefer to ARRAY_SIZE rather than the open code to calculate size.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-04 13:23:15 -07:00
David Howells
eb9950eb31 rxrpc: Push iov_iter up from rxrpc_kernel_recv_data() to caller
Push iov_iter up from rxrpc_kernel_recv_data() to its caller to allow
non-contiguous iovs to be passed down, thereby permitting file reading to
be simplified in the AFS filesystem in a future patch.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-03 12:46:20 -07:00
Li RongQing
285189c78e netfilter: use kvmalloc_array to allocate memory for hashtable
nf_ct_alloc_hashtable is used to allocate memory for conntrack,
NAT bysrc and expectation hashtable. Assuming 64k bucket size,
which means 7th order page allocation, __get_free_pages, called
by nf_ct_alloc_hashtable, will trigger the direct memory reclaim
and stall for a long time, when system has lots of memory stress

so replace combination of __get_free_pages and vzalloc with
kvmalloc_array, which provides a overflow check and a fallback
if no high order memory is available, and do not retry to reclaim
memory, reduce stall

and remove nf_ct_free_hashtable, since it is just a kvfree

Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Signed-off-by: Wang Li <wangli39@baidu.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-03 18:37:55 +02:00
Vincent Bernat
db57dc7c7a net: don't declare IPv6 non-local bind helper if CONFIG_IPV6 undefined
Fixes: 83ba464515 ("net: add helpers checking if socket can be bound to nonlocal address")
Signed-off-by: Vincent Bernat <vincent@bernat.im>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01 13:45:31 -07:00
Jiri Pirko
290b1c8b1a net: sched: make tcf_chain_{get,put}() static
These are no longer used outside of cls_api.c so make them static.
Move tcf_chain_flush() to avoid fwd declaration of tcf_chain_put().

Signed-off-by: Jiri Pirko <jiri@mellanox.com>

v1->v2:
- new patch

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01 10:06:19 -07:00
Petr Machata
d18c5d1995 net: ipv4: Notify about changes to ip_forward_update_priority
Drivers may make offloading decision based on whether
ip_forward_update_priority is enabled or not. Therefore distribute
netevent notifications to give them a chance to react to a change.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01 09:52:30 -07:00
Petr Machata
432e05d328 net: ipv4: Control SKB reprioritization after forwarding
After IPv4 packets are forwarded, the priority of the corresponding SKB
is updated according to the TOS field of IPv4 header. This overrides any
prioritization done earlier by e.g. an skbedit action or ingress-qos-map
defined at a vlan device.

Such overriding may not always be desirable. Even if the packet ends up
being routed, which implies this is an L3 network node, an administrator
may wish to preserve whatever prioritization was done earlier on in the
pipeline.

Therefore introduce a sysctl that controls this behavior. Keep the
default value at 1 to maintain backward-compatible behavior.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01 09:52:30 -07:00
Vincent Bernat
83ba464515 net: add helpers checking if socket can be bound to nonlocal address
The construction "net->ipv4.sysctl_ip_nonlocal_bind || inet->freebind
|| inet->transparent" is present three times and its IPv6 counterpart
is also present three times. We introduce two small helpers to
characterize these tests uniformly.

Signed-off-by: Vincent Bernat <vincent@bernat.im>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01 09:50:04 -07:00
Christoph Hellwig
e6476c2144 net: remove bogus RCU annotations on socket.wq
We never use RCU protection for it, just a lot of cargo-cult
rcu_deference_protects calls.

Note that we do keep the kfree_rcu call for it, as the references through
struct sock are RCU protected and thus might require a grace period before
freeing.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-31 12:40:22 -07:00
Paolo Abeni
cd11b16407 net/tc: introduce TC_ACT_REINSERT.
This is similar TC_ACT_REDIRECT, but with a slightly different
semantic:
- on ingress the mirred skbs are passed to the target device
network stack without any additional check not scrubbing.
- the rcu-protected stats provided via the tcf_result struct
  are updated on error conditions.

This new tcfa_action value is not exposed to the user-space
and can be used only internally by clsact.

v1 -> v2: do not touch TC_ACT_REDIRECT code path, introduce
 a new action type instead
v2 -> v3:
 - rename the new action value TC_ACT_REINJECT, update the
   helper accordingly
 - take care of uncloned reinjected packets in XDP generic
   hook
v3 -> v4:
 - renamed again the new action value (JiriP)
v4 -> v5:
 - fix build error with !NET_CLS_ACT (kbuild bot)

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-30 09:31:14 -07:00
Paolo Abeni
7fd4b288ea tc/act: remove unneeded RCU lock in action callback
Each lockless action currently does its own RCU locking in ->act().
This allows using plain RCU accessor, even if the context
is really RCU BH.

This change drops the per action RCU lock, replace the accessors
with the _bh variant, cleans up a bit the surrounding code and
documents the RCU status in the relevant header.
No functional nor performance change is intended.

The goal of this patch is clarifying that the RCU critical section
used by the tc actions extends up to the classifier's caller.

v1 -> v2:
 - preserve rcu lock in act_bpf: it's needed by eBPF helpers,
   as pointed out by Daniel

v3 -> v4:
 - fixed some typos in the commit message (JiriP)

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-30 09:31:13 -07:00
Christoph Hellwig
a331de3bf0 net: remove sock_poll_busy_flag
Fold it into the only caller to make the code simpler and easier to read.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-30 09:10:25 -07:00
Christoph Hellwig
f641f13b99 net: remove sock_poll_busy_loop
There is no point in hiding this logic in a helper.  Also remove the
useless events != 0 check and only busy loop once we know we actually
have a poll method.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-30 09:10:25 -07:00
Christoph Hellwig
d8bbd13bee net: don not detour through struct sock to find the poll waitqueue
For any open socket file descriptor sock->sk->sk_wq->wait will always
point to sock->wq->wait.  That means we can do the shorter dereference
and removal a NULL check and don't have to not worry about any RCU
protection.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-30 09:10:25 -07:00
Christoph Hellwig
dd979b4df8 net: simplify sock_poll_wait
The wait_address argument is always directly derived from the filp
argument, so remove it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-30 09:10:25 -07:00
Florian Westphal
222440b4e8 netfilter: nf_tables: handle meta/lookup with direct call
Currently nft uses inlined variants for common operations
such as 'ip saddr 1.2.3.4' instead of an indirect call.

Also handle meta get operations and lookups without indirect call,
both are builtin.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-30 11:52:02 +02:00
Petr Machata
b67c540b8a net: dcb: Add priority-to-DSCP map getters
On ingress, a network device such as a switch assigns to packets
priority based on various criteria. Common options include interpreting
PCP and DSCP fields according to user configuration. When a packet
egresses the switch, a reverse process may rewrite PCP and/or DSCP
values according to packet priority.

The following three functions support a) obtaining a DSCP-to-priority
map or vice versa, and b) finding default-priority entries in APP
database.

The DCB subsystem supports for APP entries a very generous M:N mapping
between priorities and protocol identifiers. Understandably,
several (say) DSCP values can map to the same priority. But this
asymmetry holds the other way around as well--one priority can map to
several DSCP values. For this reason, the following functions operate in
terms of bitmaps, with ones in positions that match some APP entry.

- dcb_ieee_getapp_dscp_prio_mask_map() to compute for a given netdevice
  a map of DSCP-to-priority-mask, which gives for each DSCP value a
  bitmap of priorities related to that DSCP value by APP, along the
  lines of dcb_ieee_getapp_mask().

- dcb_ieee_getapp_prio_dscp_mask_map() similarly to compute for a given
  netdevice a map from priorities to a bitmap of DSCPs.

- dcb_ieee_getapp_default_prio_mask() which finds all default-priority
  rules for a given port in APP database, and returns a mask of
  priorities allowed by these default-priority rules.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-27 13:17:50 -07:00
Jiri Pirko
1f3ed383fb net: sched: don't dump chains only held by actions
In case a chain is empty and not explicitly created by a user,
such chain should not exist. The only exception is if there is
an action "goto chain" pointing to it. In that case, don't show the
chain in the dump. Track the chain references held by actions and
use them to find out if a chain should or should not be shown
in chain dump.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-27 09:38:46 -07:00
David S. Miller
7a49d3d4ea Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2018-07-27

1) Extend the output_mark to also support the input direction
   and masking the mark values before applying to the skb.

2) Add a new lookup key for the upcomming xfrm interfaces.

3) Extend the xfrm lookups to match xfrm interface IDs.

4) Add virtual xfrm interfaces. The purpose of these interfaces
   is to overcome the design limitations that the existing
   VTI devices have.

  The main limitations that we see with the current VTI are the
  following:

  VTI interfaces are L3 tunnels with configurable endpoints.
  For xfrm, the tunnel endpoint are already determined by the SA.
  So the VTI tunnel endpoints must be either the same as on the
  SA or wildcards. In case VTI tunnel endpoints are same as on
  the SA, we get a one to one correlation between the SA and
  the tunnel. So each SA needs its own tunnel interface.

  On the other hand, we can have only one VTI tunnel with
  wildcard src/dst tunnel endpoints in the system because the
  lookup is based on the tunnel endpoints. The existing tunnel
  lookup won't work with multiple tunnels with wildcard
  tunnel endpoints. Some usecases require more than on
  VTI tunnel of this type, for example if somebody has multiple
  namespaces and every namespace requires such a VTI.

  VTI needs separate interfaces for IPv4 and IPv6 tunnels.
  So when routing to a VTI, we have to know to which address
  family this traffic class is going to be encapsulated.
  This is a lmitation because it makes routing more complex
  and it is not always possible to know what happens behind the
  VTI, e.g. when the VTI is move to some namespace.

  VTI works just with tunnel mode SAs. We need generic interfaces
  that ensures transfomation, regardless of the xfrm mode and
  the encapsulated address family.

  VTI is configured with a combination GRE keys and xfrm marks.
  With this we have to deal with some extra cases in the generic
  tunnel lookup because the GRE keys on the VTI are actually
  not GRE keys, the GRE keys were just reused for something else.
  All extensions to the VTI interfaces would require to add
  even more complexity to the generic tunnel lookup.

  So to overcome this, we developed xfrm interfaces with the
  following design goal:

  It should be possible to tunnel IPv4 and IPv6 through the same
  interface.

  No limitation on xfrm mode (tunnel, transport and beet).

  Should be a generic virtual interface that ensures IPsec
  transformation, no need to know what happens behind the
  interface.

  Interfaces should be configured with a new key that must match a
  new policy/SA lookup key.

  The lookup logic should stay in the xfrm codebase, no need to
  change or extend generic routing and tunnel lookups.

  Should be possible to use IPsec hardware offloads of the underlying
  interface.

5) Remove xfrm pcpu policy cache. This was added after the flowcache
   removal, but it turned out to make things even worse.
   From Florian Westphal.

6) Allow to update the set mark on SA updates.
   From Nathan Harold.

7) Convert some timestamps to time64_t.
   From Arnd Bergmann.

8) Don't check the offload_handle in xfrm code,
   it is an opaque data cookie for the driver.
   From Shannon Nelson.

9) Remove xfrmi interface ID from flowi. After this pach
   no generic code is touched anymore to do xfrm interface
   lookups. From Benedict Wong.

10) Allow to update the xfrm interface ID on SA updates.
    From Nathan Harold.

11) Don't pass zero to ERR_PTR() in xfrm_resolve_and_create_bundle.
    From YueHaibing.

12) Return more detailed errors on xfrm interface creation.
    From Benedict Wong.

13) Use PTR_ERR_OR_ZERO instead of IS_ERR + PTR_ERR.
    From the kbuild test robot.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-27 09:33:37 -07:00
David S. Miller
19725496da Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net 2018-07-24 19:21:58 -07:00
David S. Miller
049f56044d Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Make sure we don't go over the maximum jump stack boundary,
   from Taehee Yoo.

2) Missing rcu_barrier() in hash and rbtree sets, also from Taehee.

3) Missing check to nul-node in rbtree timeout routine, from Taehee.

4) Use dev->name from flowtable to fix a memleak, from Florian.

5) Oneliner to free flowtable object on removal, from Florian.

6) Memleak in chain rename transaction, again from Florian.

7) Don't allow two chains to use the same name in the same
   transaction, from Florian.

8) handle DCCP SYNC/SYNCACK as invalid, this triggers an
   uninitialized timer in conntrack reported by syzbot, from Florian.

9) Fix leak in case netlink_dump_start() fails, from Florian.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-24 09:56:50 -07:00
David S. Miller
e1adf31471 Only a few fixes:
* always keep regulatory user hint
  * add missing break statement in station flags parsing
  * fix non-linear SKBs in port-control-over-nl80211
  * reconfigure VLAN stations during HW restart
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAltW0eYACgkQB8qZga/f
 l8SXvQ/5AYXK9qZuwhxIdJ5zpgcidDiWMl+aoaWeAdQF+giDTyB+tiZ7GyIYeJca
 +4OJoC90DEH9Yq835rNIYXqGkcT12vSwJgaBp79hx0ch3ufwZqBepdX8DcCd0+18
 NAP8IlRoSiGZ/GlpNxYCkXFMUEoRSxvbk4BXowlDwiN/h4ziCRnUkD1OaL6mqHsI
 XTRjVRTQCekurc1o74s4kGX3Nj2ir3JJEGaD9Ha5Jsc18OQxx2fs5fgaGl3ffJaR
 vJTiqvPPAvdymeV/S+J9BoCe8xAgO7Z0EDC3tbB7rlZ629XEJp9L91s2EN1Od8m3
 k1pxq+nT4pci5UVnRtlAyHrr06VxhIYCRBRc+qvjXOTDIkEVF1UEQH9RExi4XNp5
 OmcdXQ7fc1UhQfzy4vkM5EQamaxgJzAWZ9SGRIf4RmpUcux4ZTRDjoclwEkATXqr
 C2lxDuIkHtClqY9BL+0f2TEio1drEawf99iWD4bSmRL+gng5LK8yxm0rAKFrnG81
 kNfxb9Qg4OyJhl0764u2pDyK9YErFNPnxHEsNBCD3DT4Q/9yKIZzmgi4GJKZyWYe
 x09+1zFsS/xtg2n0SOaYlLtPNPAKC7AkLqZ+FtHreIjVbKTgW+dEgniscQdUAfYT
 nY09uPzbabEyNaF/fuZCOofu3YignVLWlr6Z+PG1sIKqp6qaPQY=
 =qtIc
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-for-davem-2018-07-24' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
Only a few fixes:
 * always keep regulatory user hint
 * add missing break statement in station flags parsing
 * fix non-linear SKBs in port-control-over-nl80211
 * reconfigure VLAN stations during HW restart
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-24 09:38:50 -07:00
Jiri Pirko
3473845273 net: sched: cls_flower: propagate chain teplate creation and destruction to drivers
Introduce a couple of flower offload commands in order to propagate
template creation/destruction events down to device drivers.
Drivers may use this information to prepare HW in an optimal way
for future filter insertions.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23 20:44:12 -07:00
Jiri Pirko
9f407f1768 net: sched: introduce chain templates
Allow user to set a template for newly created chains. Template lock
down the chain for particular classifier type/options combinations.
The classifier needs to support templates, otherwise kernel would
reply with error.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23 20:44:12 -07:00
Jiri Pirko
32a4f5ecd7 net: sched: introduce chain object to uapi
Allow user to create, destroy, get and dump chain objects. Do that by
extending rtnl commands by the chain-specific ones. User will now be
able to explicitly create or destroy chains (so far this was done only
automatically according the filter/act needs and refcounting). Also, the
user will receive notification about any chain creation or destuction.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23 20:44:12 -07:00
Jiri Pirko
f71e0ca4db net: sched: Avoid implicit chain 0 creation
Currently, chain 0 is implicitly created during block creation. However
that does not align with chain object exposure, creation and destruction
api introduced later on. So make the chain 0 behave the same way as any
other chain and only create it when it is needed. Since chain 0 is
somehow special as the qdiscs need to hold pointer to the first chain
tp, this requires to move the chain head change callback infra to the
block structure.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23 20:44:12 -07:00
Wei Wang
e873e4b9cc ipv6: use fib6_info_hold_safe() when necessary
In the code path where only rcu read lock is held, e.g. in the route
lookup code path, it is not safe to directly call fib6_info_hold()
because the fib6_info may already have been deleted but still exists
in the rcu grace period. Holding reference to it could cause double
free and crash the kernel.

This patch adds a new function fib6_info_hold_safe() and replace
fib6_info_hold() in all necessary places.

Syzbot reported 3 crash traces because of this. One of them is:
8021q: adding VLAN 0 to HW filter on device team0
IPv6: ADDRCONF(NETDEV_CHANGE): team0: link becomes ready
dst_release: dst:(____ptrval____) refcnt:-1
dst_release: dst:(____ptrval____) refcnt:-2
WARNING: CPU: 1 PID: 4845 at include/net/dst.h:239 dst_hold include/net/dst.h:239 [inline]
WARNING: CPU: 1 PID: 4845 at include/net/dst.h:239 ip6_setup_cork+0xd66/0x1830 net/ipv6/ip6_output.c:1204
dst_release: dst:(____ptrval____) refcnt:-1
Kernel panic - not syncing: panic_on_warn set ...

CPU: 1 PID: 4845 Comm: syz-executor493 Not tainted 4.18.0-rc3+ #10
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
 panic+0x238/0x4e7 kernel/panic.c:184
dst_release: dst:(____ptrval____) refcnt:-2
dst_release: dst:(____ptrval____) refcnt:-3
 __warn.cold.8+0x163/0x1ba kernel/panic.c:536
dst_release: dst:(____ptrval____) refcnt:-4
 report_bug+0x252/0x2d0 lib/bug.c:186
 fixup_bug arch/x86/kernel/traps.c:178 [inline]
 do_error_trap+0x1fc/0x4d0 arch/x86/kernel/traps.c:296
dst_release: dst:(____ptrval____) refcnt:-5
 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:316
 invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:992
RIP: 0010:dst_hold include/net/dst.h:239 [inline]
RIP: 0010:ip6_setup_cork+0xd66/0x1830 net/ipv6/ip6_output.c:1204
Code: c1 ed 03 89 9d 18 ff ff ff 48 b8 00 00 00 00 00 fc ff df 41 c6 44 05 00 f8 e9 2d 01 00 00 4c 8b a5 c8 fe ff ff e8 1a f6 e6 fa <0f> 0b e9 6a fc ff ff e8 0e f6 e6 fa 48 8b 85 d0 fe ff ff 48 8d 78
RSP: 0018:ffff8801a8fcf178 EFLAGS: 00010293
RAX: ffff8801a8eba5c0 RBX: 0000000000000000 RCX: ffffffff869511e6
RDX: 0000000000000000 RSI: ffffffff869515b6 RDI: 0000000000000005
RBP: ffff8801a8fcf2c8 R08: ffff8801a8eba5c0 R09: ffffed0035ac8338
R10: ffffed0035ac8338 R11: ffff8801ad6419c3 R12: ffff8801a8fcf720
R13: ffff8801a8fcf6a0 R14: ffff8801ad6419c0 R15: ffff8801ad641980
 ip6_make_skb+0x2c8/0x600 net/ipv6/ip6_output.c:1768
 udpv6_sendmsg+0x2c90/0x35f0 net/ipv6/udp.c:1376
 inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798
 sock_sendmsg_nosec net/socket.c:641 [inline]
 sock_sendmsg+0xd5/0x120 net/socket.c:651
 ___sys_sendmsg+0x51d/0x930 net/socket.c:2125
 __sys_sendmmsg+0x240/0x6f0 net/socket.c:2220
 __do_sys_sendmmsg net/socket.c:2249 [inline]
 __se_sys_sendmmsg net/socket.c:2246 [inline]
 __x64_sys_sendmmsg+0x9d/0x100 net/socket.c:2246
 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x446ba9
Code: e8 cc bb 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 eb 08 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fb39a469da8 EFLAGS: 00000246 ORIG_RAX: 0000000000000133
RAX: ffffffffffffffda RBX: 00000000006dcc54 RCX: 0000000000446ba9
RDX: 00000000000000b8 RSI: 0000000020001b00 RDI: 0000000000000003
RBP: 00000000006dcc50 R08: 00007fb39a46a700 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 45c828efc7a64843
R13: e6eeb815b9d8a477 R14: 5068caf6f713c6fc R15: 0000000000000001
Dumping ftrace buffer:
   (ftrace buffer empty)
Kernel Offset: disabled
Rebooting in 86400 seconds..

Fixes: 93531c6743 ("net/ipv6: separate handling of FIB entries from dst based routes")
Reported-by: syzbot+902e2a1bcd4f7808cef5@syzkaller.appspotmail.com
Reported-by: syzbot+8ae62d67f647abeeceb9@syzkaller.appspotmail.com
Reported-by: syzbot+3f08feb14086930677d0@syzkaller.appspotmail.com
Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23 11:19:02 -07:00
Jakub Kicinski
042f882556 nfp: bring back support for offloading shared blocks
Now that we have offload replay infrastructure added by
commit 326367427c ("net: sched: call reoffload op on block callback reg")
and flows are guaranteed to be removed correctly, we can revert
commit 951a8ee6de ("nfp: reject binding to shared blocks").

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-22 10:58:52 -07:00
David Ahern
24b711edfc net/ipv6: Fix linklocal to global address with VRF
Example setup:
    host: ip -6 addr add dev eth1 2001:db8:104::4
           where eth1 is enslaved to a VRF

    switch: ip -6 ro add 2001:db8:104::4/128 dev br1
            where br1 only has an LLA

           ping6 2001:db8:104::4
           ssh   2001:db8:104::4

(NOTE: UDP works fine if the PKTINFO has the address set to the global
address and ifindex is set to the index of eth1 with a destination an
LLA).

For ICMP, icmp6_iif needs to be updated to check if skb->dev is an
L3 master. If it is then return the ifindex from rt6i_idev similar
to what is done for loopback.

For TCP, restore the original tcp_v6_iif definition which is needed in
most places and add a new tcp_v6_iif_l3_slave that considers the
l3_slave variability. This latter check is only needed for socket
lookups.

Fixes: 9ff7438460 ("net: vrf: Handle ipv6 multicast and link-local addresses")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-21 19:31:46 -07:00
Tyler Hicks
fbdeaed408 net: create reusable function for getting ownership info of sysfs inodes
Make net_ns_get_ownership() reusable by networking code outside of core.
This is useful, for example, to allow bridge related sysfs files to be
owned by container root.

Add a function comment since this is a potentially dangerous function to
use given the way that kobject_get_ownership() works by initializing uid
and gid before calling .get_ownership().

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-20 23:44:36 -07:00
David S. Miller
99d20a461c Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter/IPVS updates for your net-next
tree:

1) No need to set ttl from reject action for the bridge family, from
   Taehee Yoo.

2) Use a fixed timeout for flow that are passed up from the flowtable
   to conntrack, from Florian Westphal.

3) More preparation patches for tproxy support for nf_tables, from Mate
   Eckl.

4) Remove unnecessary indirection in core IPv6 checksum function, from
   Florian Westphal.

5) Use nf_ct_get_tuplepr() from openvswitch, instead of opencoding it.
   From Florian Westphal.

6) socket match now selects socket infrastructure, instead of depending
   on it. From Mate Eckl.

7) Patch series to simplify conntrack tuple building/parsing from packet
   path and ctnetlink, from Florian Westphal.

8) Fetch timeout policy from protocol helpers, instead of doing it from
   core, from Florian Westphal.

9) Merge IPv4 and IPv6 protocol trackers into conntrack core, from
   Florian Westphal.

10) Depend on CONFIG_NF_TABLES_IPV6 and CONFIG_IP6_NF_IPTABLES
    respectively, instead of IPV6. Patch from Mate Eckl.

11) Add specific function for garbage collection in conncount,
    from Yi-Hung Wei.

12) Catch number of elements in the connlimit list, from Yi-Hung Wei.

13) Move locking to nf_conncount, from Yi-Hung Wei.

14) Series of patches to add lockless tree traversal in nf_conncount,
    from Yi-Hung Wei.

15) Resolve clash in matching conntracks when race happens, from
    Martynas Pumputis.

16) If connection entry times out, remove template entry from the
    ip_vs_conn_tab table to improve behaviour under flood, from
    Julian Anastasov.

17) Remove useless parameter from nf_ct_helper_ext_add(), from Gao feng.

18) Call abort from 2-phase commit protocol before requesting modules,
    make sure this is done under the mutex, from Florian Westphal.

19) Grab module reference when starting transaction, also from Florian.

20) Dynamically allocate expression info array for pre-parsing, from
    Florian.

21) Add per netns mutex for nf_tables, from Florian Westphal.

22) A couple of patches to simplify and refactor nf_osf code to prepare
    for nft_osf support.

23) Break evaluation on missing socket, from Mate Eckl.

24) Allow to match socket mark from nft_socket, from Mate Eckl.

25) Remove dependency on nf_defrag_ipv6, now that IPv6 tracker is
    built-in into nf_conntrack. From Florian Westphal.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-20 22:28:28 -07:00
David S. Miller
c4c5551df1 Merge ra.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux
All conflicts were trivial overlapping changes, so reasonably
easy to resolve.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-20 21:17:12 -07:00
Yuchung Cheng
a0496ef2c2 tcp: do not delay ACK in DCTCP upon CE status change
Per DCTCP RFC8257 (Section 3.2) the ACK reflecting the CE status change
has to be sent immediately so the sender can respond quickly:

""" When receiving packets, the CE codepoint MUST be processed as follows:

   1.  If the CE codepoint is set and DCTCP.CE is false, set DCTCP.CE to
       true and send an immediate ACK.

   2.  If the CE codepoint is not set and DCTCP.CE is true, set DCTCP.CE
       to false and send an immediate ACK.
"""

Previously DCTCP implementation may continue to delay the ACK. This
patch fixes that to implement the RFC by forcing an immediate ACK.

Tested with this packetdrill script provided by Larry Brakmo

0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
0.000 setsockopt(3, SOL_TCP, TCP_CONGESTION, "dctcp", 5) = 0
0.000 bind(3, ..., ...) = 0
0.000 listen(3, 1) = 0

0.100 < [ect0] SEW 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
0.100 > SE. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8>
0.110 < [ect0] . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4
   +0 setsockopt(4, SOL_SOCKET, SO_DEBUG, [1], 4) = 0

0.200 < [ect0] . 1:1001(1000) ack 1 win 257
0.200 > [ect01] . 1:1(0) ack 1001

0.200 write(4, ..., 1) = 1
0.200 > [ect01] P. 1:2(1) ack 1001

0.200 < [ect0] . 1001:2001(1000) ack 2 win 257
+0.005 < [ce] . 2001:3001(1000) ack 2 win 257

+0.000 > [ect01] . 2:2(0) ack 2001
// Previously the ACK below would be delayed by 40ms
+0.000 > [ect01] E. 2:2(0) ack 3001

+0.500 < F. 9501:9501(0) ack 4 win 257

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-20 14:32:23 -07:00
Yuchung Cheng
27cde44a25 tcp: do not cancel delay-AcK on DCTCP special ACK
Currently when a DCTCP receiver delays an ACK and receive a
data packet with a different CE mark from the previous one's, it
sends two immediate ACKs acking previous and latest sequences
respectly (for ECN accounting).

Previously sending the first ACK may mark off the delayed ACK timer
(tcp_event_ack_sent). This may subsequently prevent sending the
second ACK to acknowledge the latest sequence (tcp_ack_snd_check).
The culprit is that tcp_send_ack() assumes it always acknowleges
the latest sequence, which is not true for the first special ACK.

The fix is to not make the assumption in tcp_send_ack and check the
actual ack sequence before cancelling the delayed ACK. Further it's
safer to pass the ack sequence number as a local variable into
tcp_send_ack routine, instead of intercepting tp->rcv_nxt to avoid
future bugs like this.

Reported-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-20 14:32:23 -07:00
Florian Westphal
b8088dda98 netfilter: nf_tables: use dev->name directly
no need to store the name in separate area.

Furthermore, it uses kmalloc but not kfree and most accesses seem to treat
it as char[IFNAMSIZ] not char *.

Remove this and use dev->name instead.

In case event zeroed dev, just omit the name in the dump.

Fixes: d92191aa84 ("netfilter: nf_tables: cache device name in flowtable object")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-20 15:31:43 +02:00
Benedict Wong
bc56b33404 xfrm: Remove xfrmi interface ID from flowi
In order to remove performance impact of having the extra u32 in every
single flowi, this change removes the flowi_xfrm struct, prefering to
take the if_id as a method parameter where needed.

In the inbound direction, if_id is only needed during the
__xfrm_check_policy() function, and the if_id can be determined at that
point based on the skb. As such, xfrmi_decode_session() is only called
with the skb in __xfrm_check_policy().

In the outbound direction, the only place where if_id is needed is the
xfrm_lookup() call in xfrmi_xmit2(). With this change, the if_id is
directly passed into the xfrm_lookup_with_ifid() call. All existing
callers can still call xfrm_lookup(), which uses a default if_id of 0.

This change does not change any behavior of XFRMIs except for improving
overall system performance via flowi size reduction.

This change has been tested against the Android Kernel Networking Tests:

https://android.googlesource.com/kernel/tests/+/master/net/test

Signed-off-by: Benedict Wong <benedictwong@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-07-20 10:14:41 +02:00
Or Gerlitz
5544adb970 flow_dissector: Dissect tos and ttl from the tunnel info
Add dissection of the tos and ttl from the ip tunnel headers
fields in case a match is needed on them.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-19 23:26:01 -07:00
Colin Ian King
169dc027fb ipv6: fix useless rol32 call on hash
The rol32 call is currently rotating hash but the rol'd value is
being discarded. I believe the current code is incorrect and hash
should be assigned the rotated value returned from rol32.

Thanks to David Lebrun for spotting this.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-18 15:11:09 -07:00
Salvatore Mesoraca
0015b80abc net: dsa: Remove VLA usage
We avoid 2 VLAs by using a pre-allocated field in dsa_switch. We also
try to avoid dynamic allocation whenever possible (when using fewer than
bits-per-long ports, which is the common case).

Link: http://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com
Link: http://lkml.kernel.org/r/20180505185145.GB32630@lunn.ch
Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>
[kees: tweak commit subject and message slightly]
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-18 15:08:31 -07:00
Florian Westphal
70b095c843 ipv6: remove dependency of nf_defrag_ipv6 on ipv6 module
IPV6=m
DEFRAG_IPV6=m
CONNTRACK=y yields:

net/netfilter/nf_conntrack_proto.o: In function `nf_ct_netns_do_get':
net/netfilter/nf_conntrack_proto.c:802: undefined reference to `nf_defrag_ipv6_enable'
net/netfilter/nf_conntrack_proto.o:(.rodata+0x640): undefined reference to `nf_conntrack_l4proto_icmpv6'

Setting DEFRAG_IPV6=y causes undefined references to ip6_rhash_params
ip6_frag_init and ip6_expire_frag_queue so it would be needed to force
IPV6=y too.

This patch gets rid of the 'followup linker error' by removing
the dependency of ipv6.ko symbols from netfilter ipv6 defrag.

Shared code is placed into a header, then used from both.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-18 11:26:53 +02:00
Florian Westphal
f102d66b33 netfilter: nf_tables: use dedicated mutex to guard transactions
Continue to use nftnl subsys mutex to protect (un)registration of hook types,
expressions and so on, but force batch operations to do their own
locking.

This allows distinct net namespaces to perform transactions in parallel.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-18 11:26:48 +02:00
Gao Feng
440534d3c5 netfilter: Remove useless param helper of nf_ct_helper_ext_add
The param helper of nf_ct_helper_ext_add is useless now, then remove
it now.

Signed-off-by: Gao Feng <gfree.wind@vip.163.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-18 11:26:42 +02:00
Julian Anastasov
275411430f ipvs: add assured state for conn templates
cp->state was not used for templates. Add support for state bits
and for the first "assured" bit which indicates that some
connection controlled by this template was established or assured
by the real server. In a followup patch we will use it to drop
templates under SYN attack.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-18 11:26:40 +02:00
Julian Anastasov
ec1b28ca96 ipvs: provide just conn to ip_vs_state_name
In preparation for followup patches, provide just the cp
ptr to ip_vs_state_name.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-18 11:26:39 +02:00
Yi-Hung Wei
5c789e131c netfilter: nf_conncount: Add list lock and gc worker, and RCU for init tree search
This patch is originally from Florian Westphal.

This patch does the following 3 main tasks.

1) Add list lock to 'struct nf_conncount_list' so that we can
alter the lists containing the individual connections without holding the
main tree lock.  It would be useful when we only need to add/remove to/from
a list without allocate/remove a node in the tree.  With this change, we
update nft_connlimit accordingly since we longer need to maintain
a list lock in nft_connlimit now.

2) Use RCU for the initial tree search to improve tree look up performance.

3) Add a garbage collection worker. This worker is schedule when there
are excessive tree node that needed to be recycled.

Moreover,the rbnode reclaim logic is moved from search tree to insert tree
to avoid race condition.

Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-18 11:26:37 +02:00
Yi-Hung Wei
976afca1ce netfilter: nf_conncount: Early exit in nf_conncount_lookup() and cleanup
This patch is originally from Florian Westphal.

This patch does the following three tasks.

It applies the same early exit technique for nf_conncount_lookup().

Since now we keep the number of connections in 'struct nf_conncount_list',
we no longer need to return the count in nf_conncount_lookup().

Moreover, we expose the garbage collection function nf_conncount_gc_list()
for nft_connlimit.

Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-18 11:26:34 +02:00
Yi-Hung Wei
cb2b36f5a9 netfilter: nf_conncount: Switch to plain list
Original patch is from Florian Westphal.

This patch switches from hlist to plain list to store the list of
connections with the same filtering key in nf_conncount. With the
plain list, we can insert new connections at the tail, so over time
the beginning of list holds long-running connections and those are
expired, while the newly creates ones are at the end.

Later on, we could probably move checked ones to the end of the list,
so the next run has higher chance to reclaim stale entries in the front.

Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-18 11:26:32 +02:00
Taehee Yoo
26b2f55252 netfilter: nf_tables: fix jumpstack depth validation
The level of struct nft_ctx is updated by nf_tables_check_loops().  That
is used to validate jumpstack depth. But jumpstack validation routine
doesn't update and validate recursively.  So, in some cases, chain depth
can be bigger than the NFT_JUMP_STACK_SIZE.

After this patch, The jumpstack validation routine is located in the
nft_chain_validate(). When new rules or new set elements are added, the
nft_table_validate() is called by the nf_tables_newrule and the
nf_tables_newsetelem. The nft_table_validate() calls the
nft_chain_validate() that visit all their children chains recursively.
So it can update depth of chain certainly.

Reproducer:
   %cat ./test.sh
   #!/bin/bash
   nft add table ip filter
   nft add chain ip filter input { type filter hook input priority 0\; }
   for ((i=0;i<20;i++)); do
	nft add chain ip filter a$i
   done

   nft add rule ip filter input jump a1

   for ((i=0;i<10;i++)); do
	nft add rule ip filter a$i jump a$((i+1))
   done

   for ((i=11;i<19;i++)); do
	nft add rule ip filter a$i jump a$((i+1))
   done

   nft add rule ip filter a10 jump a11

Result:
[  253.931782] WARNING: CPU: 1 PID: 0 at net/netfilter/nf_tables_core.c:186 nft_do_chain+0xacc/0xdf0 [nf_tables]
[  253.931915] Modules linked in: nf_tables nfnetlink ip_tables x_tables
[  253.932153] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.18.0-rc3+ #48
[  253.932153] RIP: 0010:nft_do_chain+0xacc/0xdf0 [nf_tables]
[  253.932153] Code: 83 f8 fb 0f 84 c7 00 00 00 e9 d0 00 00 00 83 f8 fd 74 0e 83 f8 ff 0f 84 b4 00 00 00 e9 bd 00 00 00 83 bd 64 fd ff ff 0f 76 09 <0f> 0b 31 c0 e9 bc 02 00 00 44 8b ad 64 fd
[  253.933807] RSP: 0018:ffff88011b807570 EFLAGS: 00010212
[  253.933807] RAX: 00000000fffffffd RBX: ffff88011b807660 RCX: 0000000000000000
[  253.933807] RDX: 0000000000000010 RSI: ffff880112b39d78 RDI: ffff88011b807670
[  253.933807] RBP: ffff88011b807850 R08: ffffed0023700ece R09: ffffed0023700ecd
[  253.933807] R10: ffff88011b80766f R11: ffffed0023700ece R12: ffff88011b807898
[  253.933807] R13: ffff880112b39d80 R14: ffff880112b39d60 R15: dffffc0000000000
[  253.933807] FS:  0000000000000000(0000) GS:ffff88011b800000(0000) knlGS:0000000000000000
[  253.933807] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  253.933807] CR2: 00000000014f1008 CR3: 000000006b216000 CR4: 00000000001006e0
[  253.933807] Call Trace:
[  253.933807]  <IRQ>
[  253.933807]  ? sched_clock_cpu+0x132/0x170
[  253.933807]  ? __nft_trace_packet+0x180/0x180 [nf_tables]
[  253.933807]  ? sched_clock_cpu+0x132/0x170
[  253.933807]  ? debug_show_all_locks+0x290/0x290
[  253.933807]  ? __lock_acquire+0x4835/0x4af0
[  253.933807]  ? inet_ehash_locks_alloc+0x1a0/0x1a0
[  253.933807]  ? unwind_next_frame+0x159e/0x1840
[  253.933807]  ? __read_once_size_nocheck.constprop.4+0x5/0x10
[  253.933807]  ? nft_do_chain_ipv4+0x197/0x1e0 [nf_tables]
[  253.933807]  ? nft_do_chain+0x5/0xdf0 [nf_tables]
[  253.933807]  nft_do_chain_ipv4+0x197/0x1e0 [nf_tables]
[  253.933807]  ? nft_do_chain_arp+0xb0/0xb0 [nf_tables]
[  253.933807]  ? __lock_is_held+0x9d/0x130
[  253.933807]  nf_hook_slow+0xc4/0x150
[  253.933807]  ip_local_deliver+0x28b/0x380
[  253.933807]  ? ip_call_ra_chain+0x3e0/0x3e0
[  253.933807]  ? ip_rcv_finish+0x1610/0x1610
[  253.933807]  ip_rcv+0xbcc/0xcc0
[  253.933807]  ? debug_show_all_locks+0x290/0x290
[  253.933807]  ? ip_local_deliver+0x380/0x380
[  253.933807]  ? __lock_is_held+0x9d/0x130
[  253.933807]  ? ip_local_deliver+0x380/0x380
[  253.933807]  __netif_receive_skb_core+0x1c9c/0x2240

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-17 20:48:24 +02:00
Florian Westphal
a0ae2562c6 netfilter: conntrack: remove l3proto abstraction
This unifies ipv4 and ipv6 protocol trackers and removes the l3proto
abstraction.

This gets rid of all l3proto indirect calls and the need to do
a lookup on the function to call for l3 demux.

It increases module size by only a small amount (12kbyte), so this reduces
size because nf_conntrack.ko is useless without either nf_conntrack_ipv4
or nf_conntrack_ipv6 module.

before:
   text    data     bss     dec     hex filename
   7357    1088       0    8445    20fd nf_conntrack_ipv4.ko
   7405    1084       4    8493    212d nf_conntrack_ipv6.ko
  72614   13689     236   86539   1520b nf_conntrack.ko
 19K nf_conntrack_ipv4.ko
 19K nf_conntrack_ipv6.ko
179K nf_conntrack.ko

after:
   text    data     bss     dec     hex filename
  79277   13937     236   93450   16d0a nf_conntrack.ko
  191K nf_conntrack.ko

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-17 15:27:49 +02:00