47231 Commits

Author SHA1 Message Date
Julien Gomes
5a645dd86c ipmr: add netlink notifications on igmpmsg cache reports
Add Netlink notifications on cache reports in ipmr, in addition to the
existing igmpmsg sent to mroute_sk.
Send RTM_NEWCACHEREPORT notifications to RTNLGRP_IPV4_MROUTE_R.

MSGTYPE, VIF_ID, SRC_ADDR and DST_ADDR Netlink attributes contain the
same data as their equivalent fields in the igmpmsg header.
PKT attribute is the packet sent to mroute_sk, without the added igmpmsg
header.

Suggested-by: Ryan Halbrook <halbrook@arista.com>
Signed-off-by: Julien Gomes <julien@arista.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-21 11:22:52 -04:00
Julien Gomes
5f729eaabe rtnetlink: add restricted rtnl groups for ipv4 and ipv6 mroute
Add RTNLGRP_{IPV4,IPV6}_MROUTE_R as two new restricted groups for the
NETLINK_ROUTE family.
Binding to these groups specifically requires CAP_NET_ADMIN to allow
multicast of sensitive messages (e.g. mroute cache reports).

Suggested-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Julien Gomes <julien@arista.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-21 11:22:52 -04:00
Arnd Bergmann
083a0326ae tcp: md5: hide unused variable
Changing from a memcpy to per-member comparison left the
size variable unused:

net/ipv4/tcp_ipv4.c: In function 'tcp_md5_do_lookup':
net/ipv4/tcp_ipv4.c:910:15: error: unused variable 'size' [-Werror=unused-variable]

This does not show up when CONFIG_IPV6 is enabled, but the
variable can be removed either way, along with the now unused
assignment.

Fixes: 6797318e623d ("tcp: md5: add an address prefix for key lookup")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-21 11:19:35 -04:00
WANG Cong
b4846fc3c8 igmp: add a missing spin_lock_init()
Andrey reported a lockdep warning on non-initialized
spinlock:

 INFO: trying to register non-static key.
 the code is fine but needs lockdep annotation.
 turning off the locking correctness validator.
 CPU: 1 PID: 4099 Comm: a.out Not tainted 4.12.0-rc6+ #9
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
 Call Trace:
  __dump_stack lib/dump_stack.c:16
  dump_stack+0x292/0x395 lib/dump_stack.c:52
  register_lock_class+0x717/0x1aa0 kernel/locking/lockdep.c:755
  ? 0xffffffffa0000000
  __lock_acquire+0x269/0x3690 kernel/locking/lockdep.c:3255
  lock_acquire+0x22d/0x560 kernel/locking/lockdep.c:3855
  __raw_spin_lock_bh ./include/linux/spinlock_api_smp.h:135
  _raw_spin_lock_bh+0x36/0x50 kernel/locking/spinlock.c:175
  spin_lock_bh ./include/linux/spinlock.h:304
  ip_mc_clear_src+0x27/0x1e0 net/ipv4/igmp.c:2076
  igmpv3_clear_delrec+0xee/0x4f0 net/ipv4/igmp.c:1194
  ip_mc_destroy_dev+0x4e/0x190 net/ipv4/igmp.c:1736

We miss a spin_lock_init() in igmpv3_add_delrec(), probably
because previously we never use it on this code path. Since
we already unlink it from the global mc_tomb list, it is
probably safe not to acquire this spinlock here. It does not
harm to have it although, to avoid conditional locking.

Fixes: c38b7d327aaf ("igmp: acquire pmc lock for ip_mc_clear_src()")
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-20 15:51:57 -04:00
Serhey Popovych
db833d40ad rtnetlink: add IFLA_GROUP to ifla_policy
Network interface groups support added while ago, however
there is no IFLA_GROUP attribute description in policy
and netlink message size calculations until now.

Add IFLA_GROUP attribute to the policy.

Fixes: cbda10fa97d7 ("net_device: add support for network device groups")
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-20 15:36:16 -04:00
Serhey Popovych
07f615574f ipv6: Do not leak throw route references
While commit 73ba57bfae4a ("ipv6: fix backtracking for throw routes")
does good job on error propagation to the fib_rules_lookup()
in fib rules core framework that also corrects throw routes
handling, it does not solve route reference leakage problem
happened when we return -EAGAIN to the fib_rules_lookup()
and leave routing table entry referenced in arg->result.

If rule with matched throw route isn't last matched in the
list we overwrite arg->result losing reference on throw
route stored previously forever.

We also partially revert commit ab997ad40839 ("ipv6: fix the
incorrect return value of throw route") since we never return
routing table entry with dst.error == -EAGAIN when
CONFIG_IPV6_MULTIPLE_TABLES is on. Also there is no point
to check for RTF_REJECT flag since it is always set throw
route.

Fixes: 73ba57bfae4a ("ipv6: fix backtracking for throw routes")
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-20 15:34:02 -04:00
Xin Long
5ee8aa6897 sctp: handle errors when updating asoc
It's a bad thing not to handle errors when updating asoc. The memory
allocation failure in any of the functions called in sctp_assoc_update()
would cause sctp to work unexpectedly.

This patch is to fix it by aborting the asoc and reporting the error when
any of these functions fails.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-20 15:32:55 -04:00
Xin Long
8cd5c25f2d sctp: uncork the old asoc before changing to the new one
local_cork is used to decide if it should uncork asoc outq after processing
some cmds, and it is set when replying or sending msgs. local_cork should
always have the same value with current asoc q->cork in some way.

The thing is when changing to a new asoc by cmd SET_ASOC, local_cork may
not be consistent with the current asoc any more. The cmd seqs can be:

  SCTP_CMD_UPDATE_ASSOC (asoc)
  SCTP_CMD_REPLY (asoc)
  SCTP_CMD_SET_ASOC (new_asoc)
  SCTP_CMD_DELETE_TCB (new_asoc)
  SCTP_CMD_SET_ASOC (asoc)
  SCTP_CMD_REPLY (asoc)

The 1st REPLY makes OLD asoc q->cork and local_cork both are 1, and the cmd
DELETE_TCB clears NEW asoc q->cork and local_cork. After asoc goes back to
OLD asoc, q->cork is still 1 while local_cork is 0. The 2nd REPLY will not
set local_cork because q->cork is already set and it can't be uncorked and
sent out because of this.

To keep local_cork consistent with the current asoc q->cork, this patch is
to uncork the old asoc if local_cork is set before changing to the new one.

Note that the above cmd seqs will be used in the next patch when updating
asoc and handling errors in it.

Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-20 15:32:55 -04:00
Xin Long
a0f9a4c2ff dccp: call inet_add_protocol after register_pernet_subsys in dccp_v6_init
Patch "call inet_add_protocol after register_pernet_subsys in dccp_v4_init"
fixed a null pointer dereference issue for dccp_ipv4 module.

The same fix is needed for dccp_ipv6 module.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-20 15:31:43 -04:00
Xin Long
d5494acb88 dccp: call inet_add_protocol after register_pernet_subsys in dccp_v4_init
Now dccp_ipv4 works as a kernel module. During loading this module, if
one dccp packet is being recieved after inet_add_protocol but before
register_pernet_subsys in which v4_ctl_sk is initialized, a null pointer
dereference may be triggered because of init_net.dccp.v4_ctl_sk is 0x0.

Jianlin found this issue when the following call trace occurred:

[  171.950177] BUG: unable to handle kernel NULL pointer dereference at 0000000000000110
[  171.951007] IP: [<ffffffffc0558364>] dccp_v4_ctl_send_reset+0xc4/0x220 [dccp_ipv4]
[...]
[  171.984629] Call Trace:
[  171.984859]  <IRQ>
[  171.985061]
[  171.985213]  [<ffffffffc0559a53>] dccp_v4_rcv+0x383/0x3f9 [dccp_ipv4]
[  171.985711]  [<ffffffff815ca054>] ip_local_deliver_finish+0xb4/0x1f0
[  171.986309]  [<ffffffff815ca339>] ip_local_deliver+0x59/0xd0
[  171.986852]  [<ffffffff810cd7a4>] ? update_curr+0x104/0x190
[  171.986956]  [<ffffffff815c9cda>] ip_rcv_finish+0x8a/0x350
[  171.986956]  [<ffffffff815ca666>] ip_rcv+0x2b6/0x410
[  171.986956]  [<ffffffff810c83b4>] ? task_cputime+0x44/0x80
[  171.986956]  [<ffffffff81586f22>] __netif_receive_skb_core+0x572/0x7c0
[  171.986956]  [<ffffffff810d2c51>] ? trigger_load_balance+0x61/0x1e0
[  171.986956]  [<ffffffff81587188>] __netif_receive_skb+0x18/0x60
[  171.986956]  [<ffffffff8158841e>] process_backlog+0xae/0x180
[  171.986956]  [<ffffffff8158799d>] net_rx_action+0x16d/0x380
[  171.986956]  [<ffffffff81090b7f>] __do_softirq+0xef/0x280
[  171.986956]  [<ffffffff816b6a1c>] call_softirq+0x1c/0x30

This patch is to move inet_add_protocol after register_pernet_subsys in
dccp_v4_init, so that v4_ctl_sk is initialized before any incoming dccp
packets are processed.

Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-20 15:31:43 -04:00
Matthias Schiffer
dc5321d796 vxlan: get rid of redundant vxlan_dev.flags
There is no good reason to keep the flags twice in vxlan_dev and
vxlan_config.

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-20 13:37:02 -04:00
yuan linyu
b952f4dff2 net: manual clean code which call skb_put_[data:zero]
Signed-off-by: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-20 13:30:15 -04:00
yuan linyu
de77b966ce net: introduce __skb_put_[zero, data, u8]
follow Johannes Berg, semantic patch file as below,
@@
identifier p, p2;
expression len;
expression skb;
type t, t2;
@@
(
-p = __skb_put(skb, len);
+p = __skb_put_zero(skb, len);
|
-p = (t)__skb_put(skb, len);
+p = __skb_put_zero(skb, len);
)
... when != p
(
p2 = (t2)p;
-memset(p2, 0, len);
|
-memset(p, 0, len);
)

@@
identifier p;
expression len;
expression skb;
type t;
@@
(
-t p = __skb_put(skb, len);
+t p = __skb_put_zero(skb, len);
)
... when != p
(
-memset(p, 0, len);
)

@@
type t, t2;
identifier p, p2;
expression skb;
@@
t *p;
...
(
-p = __skb_put(skb, sizeof(t));
+p = __skb_put_zero(skb, sizeof(t));
|
-p = (t *)__skb_put(skb, sizeof(t));
+p = __skb_put_zero(skb, sizeof(t));
)
... when != p
(
p2 = (t2)p;
-memset(p2, 0, sizeof(*p));
|
-memset(p, 0, sizeof(*p));
)

@@
expression skb, len;
@@
-memset(__skb_put(skb, len), 0, len);
+__skb_put_zero(skb, len);

@@
expression skb, len, data;
@@
-memcpy(__skb_put(skb, len), data, len);
+__skb_put_data(skb, data, len);

@@
expression SKB, C, S;
typedef u8;
identifier fn = {__skb_put};
fresh identifier fn2 = fn ## "_u8";
@@
- *(u8 *)fn(SKB, S) = C;
+ fn2(SKB, C);

Signed-off-by: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-20 13:30:14 -04:00
Sebastian Siewior
fe420d87bb net/core: remove explicit do_softirq() from busy_poll_stop()
Since commit 217f69743681 ("net: busy-poll: allow preemption in
sk_busy_loop()") there is an explicit do_softirq() invocation after
local_bh_enable() has been invoked.
I don't understand why we need this because local_bh_enable() will
invoke do_softirq() once the softirq counter reached zero and we have
softirq-related work pending.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-20 13:09:33 -04:00
Serhey Popovych
bdaf32c3ce fib_rules: Resolve goto rules target on delete
We should avoid marking goto rules unresolved when their
target is actually reachable after rule deletion.

Consolder following sample scenario:

  # ip -4 ru sh
  0:      from all lookup local
  32000:  from all goto 32100
  32100:  from all lookup main
  32100:  from all lookup default
  32766:  from all lookup main
  32767:  from all lookup default

  # ip -4 ru del pref 32100 table main
  # ip -4 ru sh
  0:      from all lookup local
  32000:  from all goto 32100 [unresolved]
  32100:  from all lookup default
  32766:  from all lookup main
  32767:  from all lookup default

After removal of first rule with preference 32100 we
mark all goto rules as unreachable, even when rule with
same preference as removed one still present.

Check if next rule with same preference is available
and make all rules with goto action pointing to it.

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-20 12:39:18 -04:00
Ingo Molnar
ac6424b981 sched/wait: Rename wait_queue_t => wait_queue_entry_t
Rename:

	wait_queue_t		=>	wait_queue_entry_t

'wait_queue_t' was always a slight misnomer: its name implies that it's a "queue",
but in reality it's a queue *entry*. The 'real' queue is the wait queue head,
which had to carry the name.

Start sorting this out by renaming it to 'wait_queue_entry_t'.

This also allows the real structure name 'struct __wait_queue' to
lose its double underscore and become 'struct wait_queue_entry',
which is the more canonical nomenclature for such data types.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-20 12:18:27 +02:00
Jason A. Donenfeld
7aed9f72c3 net/route: use get_random_int for random counter
Using get_random_int here is faster, more fitting of the use case, and
just as cryptographically secure. It also has the benefit of providing
better randomness at early boot, which is when many of these structures
are assigned.

Also, semantically, it's not really proper to have been assigning an
atomic_t in this way before, even if in practice it works fine.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-06-19 22:06:28 -04:00
Jason A. Donenfeld
b3d0f7895d net/neighbor: use get_random_u32 for 32-bit hash random
Using get_random_u32 here is faster, more fitting of the use case, and
just as cryptographically secure. It also has the benefit of providing
better randomness at early boot, which is when many of these structures
are assigned.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-06-19 22:06:28 -04:00
Jason A. Donenfeld
ae5b806a3d ceph: ensure RNG is seeded before using
Ceph uses the RNG for various nonce generations, and it shouldn't accept
using bad randomness. So, we wait for the RNG to be properly seeded. We
do this by calling wait_for_random_bytes() in a function that is
certainly called in process context, early on, so that all subsequent
calls to get_random_bytes are necessarily acceptable.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: "Yan, Zheng" <zyan@redhat.com>
Cc: Sage Weil <sage@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-06-19 22:06:28 -04:00
Xin Long
86fdb3448c sctp: ensure ep is not destroyed before doing the dump
Now before dumping a sock in sctp_diag, it only holds the sock while
the ep may be already destroyed. It can cause a use-after-free panic
when accessing ep->asocs.

This patch is to set sctp_sk(sk)->ep NULL in sctp_endpoint_destroy,
and check if this ep is already destroyed before dumping this ep.

Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdrver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-19 15:13:43 -04:00
Gao Feng
9745e362ad net: 8021q: Fix one possible panic caused by BUG_ON in free_netdev
The register_vlan_device would invoke free_netdev directly, when
register_vlan_dev failed. It would trigger the BUG_ON in free_netdev
if the dev was already registered. In this case, the netdev would be
freed in netdev_run_todo later.

So add one condition check now. Only when dev is not registered, then
free it directly.

The following is the part coredump when netdev_upper_dev_link failed
in register_vlan_dev. I removed the lines which are too long.

[  411.237457] ------------[ cut here ]------------
[  411.237458] kernel BUG at net/core/dev.c:7998!
[  411.237484] invalid opcode: 0000 [#1] SMP
[  411.237705]  [last unloaded: 8021q]
[  411.237718] CPU: 1 PID: 12845 Comm: vconfig Tainted: G            E   4.12.0-rc5+ #6
[  411.237737] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[  411.237764] task: ffff9cbeb6685580 task.stack: ffffa7d2807d8000
[  411.237782] RIP: 0010:free_netdev+0x116/0x120
[  411.237794] RSP: 0018:ffffa7d2807dbdb0 EFLAGS: 00010297
[  411.237808] RAX: 0000000000000002 RBX: ffff9cbeb6ba8fd8 RCX: 0000000000001878
[  411.237826] RDX: 0000000000000001 RSI: 0000000000000282 RDI: 0000000000000000
[  411.237844] RBP: ffffa7d2807dbdc8 R08: 0002986100029841 R09: 0002982100029801
[  411.237861] R10: 0004000100029980 R11: 0004000100029980 R12: ffff9cbeb6ba9000
[  411.238761] R13: ffff9cbeb6ba9060 R14: ffff9cbe60f1a000 R15: ffff9cbeb6ba9000
[  411.239518] FS:  00007fb690d81700(0000) GS:ffff9cbebb640000(0000) knlGS:0000000000000000
[  411.239949] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  411.240454] CR2: 00007f7115624000 CR3: 0000000077cdf000 CR4: 00000000003406e0
[  411.240936] Call Trace:
[  411.241462]  vlan_ioctl_handler+0x3f1/0x400 [8021q]
[  411.241910]  sock_ioctl+0x18b/0x2c0
[  411.242394]  do_vfs_ioctl+0xa1/0x5d0
[  411.242853]  ? sock_alloc_file+0xa6/0x130
[  411.243465]  SyS_ioctl+0x79/0x90
[  411.243900]  entry_SYSCALL_64_fastpath+0x1e/0xa9
[  411.244425] RIP: 0033:0x7fb69089a357
[  411.244863] RSP: 002b:00007ffcd04e0fc8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[  411.245445] RAX: ffffffffffffffda RBX: 00007ffcd04e2884 RCX: 00007fb69089a357
[  411.245903] RDX: 00007ffcd04e0fd0 RSI: 0000000000008983 RDI: 0000000000000003
[  411.246527] RBP: 00007ffcd04e0fd0 R08: 0000000000000000 R09: 1999999999999999
[  411.246976] R10: 000000000000053f R11: 0000000000000202 R12: 0000000000000004
[  411.247414] R13: 00007ffcd04e1128 R14: 00007ffcd04e2888 R15: 0000000000000001
[  411.249129] RIP: free_netdev+0x116/0x120 RSP: ffffa7d2807dbdb0

Signed-off-by: Gao Feng <gfree.wind@vip.163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-19 14:10:20 -04:00
Ivan Delalande
8917a777be tcp: md5: add TCP_MD5SIG_EXT socket option to set a key address prefix
Replace first padding in the tcp_md5sig structure with a new flag field
and address prefix length so it can be specified when configuring a new
key for TCP MD5 signature. The tcpm_flags field will only be used if the
socket option is TCP_MD5SIG_EXT to avoid breaking existing programs, and
tcpm_prefixlen only when the TCP_MD5SIG_FLAG_PREFIX flag is set.

Signed-off-by: Bob Gilligan <gilligan@arista.com>
Signed-off-by: Eric Mowat <mowat@arista.com>
Signed-off-by: Ivan Delalande <colona@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-19 13:51:34 -04:00
Ivan Delalande
6797318e62 tcp: md5: add an address prefix for key lookup
This allows the keys used for TCP MD5 signature to be used for whole
range of addresses, specified with a prefix length, instead of only one
address as it currently is.

Signed-off-by: Bob Gilligan <gilligan@arista.com>
Signed-off-by: Eric Mowat <mowat@arista.com>
Signed-off-by: Ivan Delalande <colona@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-19 13:50:55 -04:00
Pablo Neira Ayuso
04ba724b65 netfilter: nfnetlink: extended ACK reporting
Pass down struct netlink_ext_ack as parameter to all of our nfnetlink
subsystem callbacks, so we can work on follow up patches to provide
finer grain error reporting using the new infrastructure that
2d4bc93368f5 ("netlink: extended ACK reporting") provides.

No functional change, just pass down this new object to callbacks.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-06-19 19:38:24 +02:00
Florian Westphal
d8297d4f3e netfilter: nf_tables: reduce chain type table size
text  data  bss     dec    hex filename
old: 151590  2240 1152  154982  25d66 net/netfilter/nf_tables_api.o
new: 151666  2240  416  154322  25ad2 net/netfilter/nf_tables_api.o

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-06-19 19:20:59 +02:00
Florian Westphal
b7b5fda468 netfilter: conntrack: use NFPROTO_MAX to size array
We don't support anything larger than NFPROTO_MAX, so we can shrink this a bit:

     text data  dec  hex filename
old: 8259 1096 9355 248b net/netfilter/nf_conntrack_proto.o
new: 8259  624 8883 22b3 net/netfilter/nf_conntrack_proto.o

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-06-19 19:20:49 +02:00
Liping Zhang
d53e3fc390 netfilter: use nf_conntrack_helpers_register when possible
amanda_helper, nf_conntrack_helper_ras and nf_conntrack_helper_q931 are
all arrays, so we can use nf_conntrack_helpers_register to register
the ct helper, this will help us to eliminate some "goto errX"
statements.

Also introduce h323_helper_init/exit helper function to register the ct
helpers, this is prepared for the followup patch, which will add net
namespace support for ct helper.

Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-06-19 19:13:21 +02:00
Jike Song
2becbbc547 netfilter, kbuild: use canonical method to specify objs.
Should use ":=" instead of "+=".

Signed-off-by: Jike Song <jike.song@intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-06-19 19:09:20 +02:00
Gao Feng
e15b9c50c4 netfilter: ebt: Use new helper ebt_invalid_target to check target
Use the new helper function ebt_invalid_target instead of the old
macro INVALID_TARGET and other duplicated codes to enhance the readability.

Signed-off-by: Gao Feng <gfree.wind@vip.163.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-06-19 19:09:19 +02:00
Florian Westphal
7866cc57b5 netns: add and use net_ns_barrier
Quoting Joe Stringer:
  If a user loads nf_conntrack_ftp, sends FTP traffic through a network
  namespace, destroys that namespace then unloads the FTP helper module,
  then the kernel will crash.

Events that lead to the crash:
1. conntrack is created with ftp helper in netns x
2. This netns is destroyed
3. netns destruction is scheduled
4. netns destruction wq starts, removes netns from global list
5. ftp helper is unloaded, which resets all helpers of the conntracks
via for_each_net()

but because netns is already gone from list the for_each_net() loop
doesn't include it, therefore all of these conntracks are unaffected.

6. helper module unload finishes
7. netns wq invokes destructor for rmmod'ed helper

CC: "Eric W. Biederman" <ebiederm@xmission.com>
Reported-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-06-19 19:09:19 +02:00
Florian Westphal
2c41f33c1b netfilter: move table iteration out of netns exit paths
We only need to iterate & remove in case of module removal;
for netns destruction all conntracks will be removed anyway.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-06-19 19:09:19 +02:00
Xin Long
202f59afd4 netfilter: ipt_CLUSTERIP: do not hold dev
It's a terrible thing to hold dev in iptables target. When the dev is
being removed, unregister_netdevice has to wait for the dev to become
free. dmesg will keep logging the err:

  kernel:unregister_netdevice: waiting for veth0_in to become free. \
  Usage count = 1

until iptables rules with this target are removed manually.

The worse thing is when deleting a netns, a virtual nic will be deleted
instead of reset to init_net in default_device_ops exit/exit_batch. As
it is earlier than to flush the iptables rules in iptable_filter_net_ops
exit, unregister_netdevice will block to wait for the nic to become free.

As unregister_netdevice is actually waiting for iptables rules flushing
while iptables rules have to be flushed after unregister_netdevice. This
'dead lock' will cause unregister_netdevice to block there forever. As
the netns is not available to operate at that moment, iptables rules can
not even be flushed manually either.

The reproducer can be:

  # ip netns add test
  # ip link add veth0_in type veth peer name veth0_out
  # ip link set veth0_in netns test
  # ip netns exec test ip link set lo up
  # ip netns exec test ip link set veth0_in up
  # ip netns exec test iptables -I INPUT -d 1.2.3.4 -i veth0_in -j \
    CLUSTERIP --new --clustermac 89:d4:47:eb:9a:fa --total-nodes 3 \
    --local-node 1 --hashmode sourceip-sourceport
  # ip netns del test

This issue can be triggered by all virtual nics with ipt_CLUSTERIP.

This patch is to fix it by not holding dev in ipt_CLUSTERIP, but saving
the dev->ifindex instead of the dev.

As Pablo Neira Ayuso's suggestion, it will refresh c->ifindex and dev's
mc by registering a netdevice notifier, just as what xt_TEE does. So it
removes the old codes updating dev's mc, and also no need to initialize
c->ifindex with dev->ifindex.

But as one config can be shared by more than one targets, and the netdev
notifier is per config, not per target. It couldn't get e->ip.iniface
in the notifier handler. So e->ip.iniface has to be saved into config.

Note that for backwards compatibility, this patch doesn't remove the
codes checking if the dev exists before creating a config.

v1->v2:
  - As Pablo Neira Ayuso's suggestion, register a netdevice notifier to
    manage c->ifindex and dev's mc.

Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-06-19 19:06:20 +02:00
David S. Miller
4b153ca989 Here's just the fix for that ancient bug:
* remove wext calling ndo_do_ioctl, since nobody needs
    that now and it makes the type change easier
  * use struct iwreq instead of struct ifreq almost everywhere
    in wireless extensions code
  * copy only struct iwreq from userspace in dev_ioctl for the
    wireless extensions, since it's smaller than struct ifreq
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEExu3sM/nZ1eRSfR9Ha3t4Rpy0AB0FAllDhUsACgkQa3t4Rpy0
 AB2ttw//SepU66meFuzYy+bFbR38Q2sguKADmSN9jjng3oPyKhHKfEJwRusZZ3zg
 eEIk/NNB2iPTMLSaa4kR1Wclcae0jq5KgO8HwJBLvS7peCXWKx03vnP4Dy7yJ/6U
 VvOk+3JoudnQFhdDnIg+RVsGbwLx0hlq2l727U1Sp6kFyChK2etLikPzVKkEgVnG
 2R/l1BhDqXdQ6Lh7nXWa6O9pwaqkpnPOuJvipJzmUQRB/4GBNjBxSK6J+ac98sm6
 +KCCONBvBBMBago0xySTVURzMTrhW2UH1cE6ITQYjlShB/zsyilYkECvFzOSAYZL
 u9ob1yCAmZwDqhtvEUSi7CEfLtcO43I0XDF4oL00xfmYD9alm9dJPAlvZ1ihsrw7
 ojBDjyykUstWRSeP8zETTdYDIMSPVsed1Y6NzQiy+el/6U3//+o2FcOShqUh89lx
 OIlQwX5i9LBRC/POQ6L8R4VPelNZ/czKMNlq1Z+ubNM9i3PT/8gGf6WapbMPpNUk
 AqAsB13tR17QmLjNpdVxHtoNvD9aceYaFkN+GXRNSb3pJNoJouedx6d5maFYJAju
 GRdZXBV14Z7bamKB3x9EAjpD3DHplJw4m8BvwnBr9zWkGyAvoNsHIC5h8ynzjWSp
 J7KpXPB9IKX6ne+1gCNrrPod2AmK4sWIaAT/SaWMCoHjV4m74k4=
 =O240
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-for-davem-2017-06-16' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
Here's just the fix for that ancient bug:
 * remove wext calling ndo_do_ioctl, since nobody needs
   that now and it makes the type change easier
 * use struct iwreq instead of struct ifreq almost everywhere
   in wireless extensions code
 * copy only struct iwreq from userspace in dev_ioctl for the
   wireless extensions, since it's smaller than struct ifreq
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-19 00:03:51 -04:00
Haishuang Yan
46f8cd9d2f ip6_tunnel: Correct tos value in collect_md mode
Same as ip_gre, geneve and vxlan, use key->tos as traffic class value.

CC: Peter Dawson <petedaws@gmail.com>
Fixes: 0e9a709560db ("ip6_tunnel, ip6_gre: fix setting of DSCP on
encapsulated packets”)
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Acked-by: Peter Dawson <peter.a.dawson@boeing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-18 23:56:57 -04:00
Johan Hovold
20777bc57c NFC: fix broken device allocation
Commit 7eda8b8e9677 ("NFC: Use IDR library to assing NFC devices IDs")
moved device-id allocation and struct-device initialisation from
nfc_allocate_device() to nfc_register_device().

This broke just about every nfc-device-registration error path, which
continue to call nfc_free_device() that tries to put the device
reference of the now uninitialised (but zeroed) struct device:

kobject: '(null)' (ce316420): is not initialized, yet kobject_put() is being called.

The late struct-device initialisation also meant that various work
queues whose names are derived from the nfc device name were also
misnamed:

  421 root         0 SW<  [(null)_nci_cmd_]
  422 root         0 SW<  [(null)_nci_rx_w]
  423 root         0 SW<  [(null)_nci_tx_w]

Move the id-allocation and struct-device initialisation back to
nfc_allocate_device() and fix up the single call site which did not use
nfc_free_device() in its error path.

Fixes: 7eda8b8e9677 ("NFC: Use IDR library to assing NFC devices IDs")
Cc: stable <stable@vger.kernel.org>     # 3.8
Cc: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2017-06-18 23:57:58 +02:00
Florian Fainelli
06d4d450db net: dsa: Fix legacy probing
After commit 6d3c8c0dd88a ("net: dsa: Remove master_netdev and
use dst->cpu_dp->netdev") and a29342e73911 ("net: dsa: Associate
slave network device with CPU port") we would be seeing NULL pointer
dereferences when accessing dst->cpu_dp->netdev too early. In the legacy
code, we actually know early in advance the master network device, so
pass it down to the relevant functions.

Fixes: 6d3c8c0dd88a ("net: dsa: Remove master_netdev and use dst->cpu_dp->netdev")
Fixes: a29342e73911 ("net: dsa: Associate slave network device with CPU port")
Reported-by: Jason Cobham <jcobham@questertangent.com>
Tested-by: Jason Cobham <jcobham@questertangent.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-17 22:59:45 -04:00
Dave Watson
d807ec656f tls: update Kconfig
Missing crypto deps for some platforms.
Default to n for new module.

config: m68k-amcore_defconfig (attached as .config)
compiler: m68k-linux-gcc (GCC) 4.9.0

make.cross ARCH=m68k
All errors (new ones prefixed by >>):

   net/built-in.o: In function `tls_set_sw_offload':
>> (.text+0x732f8): undefined reference to `crypto_alloc_aead'
   net/built-in.o: In function `tls_set_sw_offload':
>> (.text+0x7333c): undefined reference to `crypto_aead_setkey'
   net/built-in.o: In function `tls_set_sw_offload':
>> (.text+0x73354): undefined reference to `crypto_aead_setauthsize'

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-17 22:56:46 -04:00
Wei Wang
a4c2fd7f78 net: remove DST_NOCACHE flag
DST_NOCACHE flag check has been removed from dst_release() and
dst_hold_safe() in a previous patch because all the dst are now ref
counted properly and can be released based on refcnt only.
Looking at the rest of the DST_NOCACHE use, all of them can now be
removed or replaced with other checks.
So this patch gets rid of all the DST_NOCACHE usage and remove this flag
completely.

Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-17 22:54:01 -04:00
Wei Wang
b2a9c0ed75 net: remove DST_NOGC flag
Now that all the components have been changed to release dst based on
refcnt only and not depend on dst gc anymore, we can remove the
temporary flag DST_NOGC.

Note that we also need to remove the DST_NOCACHE check in dst_release()
and dst_hold_safe() because now all the dst are released based on refcnt
and behaves as DST_NOCACHE.

Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-17 22:54:01 -04:00
Wei Wang
5b7c9a8ff8 net: remove dst gc related code
This patch removes all dst gc related code and all the dst free
functions

Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-17 22:54:01 -04:00
Wei Wang
560fd93bca decnet: take dst->__refcnt when struct dn_route is created
struct dn_route is inserted into dn_rt_hash_table but no dst->__refcnt
is taken.
This patch makes sure the dn_rt_hash_table's reference to the dst is ref
counted.

As the dst is always ref counted properly, we can safely mark
DST_NOGC flag so dst_release() will release dst based on refcnt only.
And dst gc is no longer needed and all dst_free() or its related
function calls should be replaced with dst_release() or
dst_release_immediate(). And dst_dev_put() is called when removing dst
from the hash table to release the reference on dst->dev before we lose
pointer to it.

Also, correct the logic in dn_dst_check_expire() and dn_dst_gc() to
check dst->__refcnt to be > 1 to indicate it is referenced by other
users.

Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-17 22:54:01 -04:00
Wei Wang
52df157f17 xfrm: take refcnt of dst when creating struct xfrm_dst bundle
During the creation of xfrm_dst bundle, always take ref count when
allocating the dst. This way, xfrm_bundle_create() will form a linked
list of dst with dst->child pointing to a ref counted dst child. And
the returned dst pointer is also ref counted. This makes the link from
the flow cache to this dst now ref counted properly.
As the dst is always ref counted properly, we can safely mark
DST_NOGC flag so dst_release() will release dst based on refcnt only.
And dst gc is no longer needed and all dst_free() and its related
function calls should be replaced with dst_release() or
dst_release_immediate().

The special handling logic for dst->child in dst_destroy() can be
replaced with a simple dst_release_immediate() call on the child to
release the whole list linked by dst->child pointer.
Previously used DST_NOHASH flag is not needed anymore as well. The
reason that DST_NOHASH is used in the existing code is mainly to prevent
the dst inserted in the fib tree to be wrongly destroyed during the
deletion of the xfrm_dst bundle. So in the existing code, DST_NOHASH
flag is marked in all the dst children except the one which is in the
fib tree.
However, with this patch series to remove dst gc logic and release dst
only based on ref count, it is safe to release all the children from a
xfrm_dst bundle as long as the dst children are all ref counted
properly which is already the case in the existing code.
So, this patch removes the use of DST_NOHASH flag.

Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-17 22:54:00 -04:00
Wei Wang
db916649b5 ipv6: get rid of icmp6 dst garbage collector
icmp6 dst route is currently ref counted during creation and will be
freed by user during its call of dst_release(). So no need of a garbage
collector for it.
Remove all icmp6 dst garbage collector related code.

Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-17 22:54:00 -04:00
Wei Wang
587fea7411 ipv6: mark DST_NOGC and remove the operation of dst_free()
With the previous preparation patches, we are ready to get rid of the
dst gc operation in ipv6 code and release dst based on refcnt only.
So this patch adds DST_NOGC flag for all IPv6 dst and remove the calls
to dst_free() and its related functions.
At this point, all dst created in ipv6 code do not use the dst gc
anymore and will be destroyed at the point when refcnt drops to 0.

Also, as icmp6 dst route is refcounted during creation and will be freed
by user during its call of dst_release(), there is no need to add this
dst to the icmp6 gc list as well.
Instead, we need to add it into uncached list so that when a
NETDEV_DOWN/NETDEV_UNREGISRER event comes, we can properly go through
these icmp6 dst as well and release the net device properly.

Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-17 22:54:00 -04:00
Wei Wang
ad65a2f056 ipv6: call dst_hold_safe() properly
Similar as ipv4, ipv6 path also needs to call dst_hold_safe() when
necessary to avoid double free issue on the dst.

Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-17 22:54:00 -04:00
Wei Wang
9514528d92 ipv6: call dst_dev_put() properly
As the intend of this patch series is to completely remove dst gc,
we need to call dst_dev_put() to release the reference to dst->dev
when removing routes from fib because we won't keep the gc list anymore
and will lose the dst pointer right after removing the routes.
Without the gc list, there is no way to find all the dst's that have
dst->dev pointing to the going-down dev.
Hence, we are doing dst_dev_put() immediately before we lose the last
reference of the dst from the routing code. The next dst_check() will
trigger a route re-lookup to find another route (if there is any).

Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-17 22:54:00 -04:00
Wei Wang
1cfb71eeb1 ipv6: take dst->__refcnt for insertion into fib6 tree
In IPv6 routing code, struct rt6_info is created for each static route
and RTF_CACHE route and inserted into fib6 tree. In both cases, dst
ref count is not taken.
As explained in the previous patch, this leads to the need of the dst
garbage collector.

This patch holds ref count of dst before inserting the route into fib6
tree and properly releases the dst when deleting it from the fib6 tree
as a preparation in order to fully get rid of dst gc later.

Also, correct fib6_age() logic to check dst->__refcnt to be 1 to indicate
no user is referencing the dst.

And remove dst_hold() in vrf_rt6_create() as ip6_dst_alloc() already puts
dst->__refcnt to 1.

Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-17 22:54:00 -04:00
Wei Wang
b838d5e1c5 ipv4: mark DST_NOGC and remove the operation of dst_free()
With the previous preparation patches, we are ready to get rid of the
dst gc operation in ipv4 code and release dst based on refcnt only.
So this patch adds DST_NOGC flag for all IPv4 dst and remove the calls
to dst_free().
At this point, all dst created in ipv4 code do not use the dst gc
anymore and will be destroyed at the point when refcnt drops to 0.

Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-17 22:54:00 -04:00
Wei Wang
9df16efadd ipv4: call dst_hold_safe() properly
This patch checks all the calls to
dst_hold()/skb_dst_force()/dst_clone()/dst_use() to see if
dst_hold_safe() is needed to avoid double free issue if dst
gc is removed and dst_release() directly destroys dst when
dst->__refcnt drops to 0.

In tx path, TCP hold sk->sk_rx_dst ref count and also hold sock_lock().
UDP and other similar protocols always hold refcount for
skb->_skb_refdst. So both paths seem to be safe.

In rx path, as it is lockless and skb_dst_set_noref() is likely to be
used, dst_hold_safe() should always be used when trying to hold dst.

In the routing code, if dst is held during an rcu protected session, it
is necessary to call dst_hold_safe() as the current dst might be in its
rcu grace period.

Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-17 22:54:00 -04:00
Wei Wang
95c47f9cf5 ipv4: call dst_dev_put() properly
As the intend of this patch series is to completely remove dst gc,
we need to call dst_dev_put() to release the reference to dst->dev
when removing routes from fib because we won't keep the gc list anymore
and will lose the dst pointer right after removing the routes.
Without the gc list, there is no way to find all the dst's that have
dst->dev pointing to the going-down dev.
Hence, we are doing dst_dev_put() immediately before we lose the last
reference of the dst from the routing code. The next dst_check() will
trigger a route re-lookup to find another route (if there is any).

Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-17 22:53:59 -04:00