50742 Commits

Author SHA1 Message Date
kbuild test robot
da18ab32d7 tipc: tipc_disc_addr_trial_msg() can be static
Fixes: 25b0b9c4e835 ("tipc: handle collisions of 32-bit node address hash values")
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Acked-by: Jon Maloy jon.maloy@ericsson.com
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-25 21:21:43 -04:00
Paolo Abeni
10b8a3de60 ipv6: the entire IPv6 header chain must fit the first fragment
While building ipv6 datagram we currently allow arbitrary large
extheaders, even beyond pmtu size. The syzbot has found a way
to exploit the above to trigger the following splat:

kernel BUG at ./include/linux/skbuff.h:2073!
invalid opcode: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
    (ftrace buffer empty)
Modules linked in:
CPU: 1 PID: 4230 Comm: syzkaller672661 Not tainted 4.16.0-rc2+ #326
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
RIP: 0010:__skb_pull include/linux/skbuff.h:2073 [inline]
RIP: 0010:__ip6_make_skb+0x1ac8/0x2190 net/ipv6/ip6_output.c:1636
RSP: 0018:ffff8801bc18f0f0 EFLAGS: 00010293
RAX: ffff8801b17400c0 RBX: 0000000000000738 RCX: ffffffff84f01828
RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8801b415ac18
RBP: ffff8801bc18f360 R08: ffff8801b4576844 R09: 0000000000000000
R10: ffff8801bc18f380 R11: ffffed00367aee4e R12: 00000000000000d6
R13: ffff8801b415a740 R14: dffffc0000000000 R15: ffff8801b45767c0
FS:  0000000001535880(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000002000b000 CR3: 00000001b4123001 CR4: 00000000001606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
  ip6_finish_skb include/net/ipv6.h:969 [inline]
  udp_v6_push_pending_frames+0x269/0x3b0 net/ipv6/udp.c:1073
  udpv6_sendmsg+0x2a96/0x3400 net/ipv6/udp.c:1343
  inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:764
  sock_sendmsg_nosec net/socket.c:630 [inline]
  sock_sendmsg+0xca/0x110 net/socket.c:640
  ___sys_sendmsg+0x320/0x8b0 net/socket.c:2046
  __sys_sendmmsg+0x1ee/0x620 net/socket.c:2136
  SYSC_sendmmsg net/socket.c:2167 [inline]
  SyS_sendmmsg+0x35/0x60 net/socket.c:2162
  do_syscall_64+0x280/0x940 arch/x86/entry/common.c:287
  entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x4404c9
RSP: 002b:00007ffdce35f948 EFLAGS: 00000217 ORIG_RAX: 0000000000000133
RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 00000000004404c9
RDX: 0000000000000003 RSI: 0000000020001f00 RDI: 0000000000000003
RBP: 00000000006cb018 R08: 00000000004002c8 R09: 00000000004002c8
R10: 0000000020000080 R11: 0000000000000217 R12: 0000000000401df0
R13: 0000000000401e80 R14: 0000000000000000 R15: 0000000000000000
Code: ff e8 1d 5e b9 fc e9 15 e9 ff ff e8 13 5e b9 fc e9 44 e8 ff ff e8 29
5e b9 fc e9 c0 e6 ff ff e8 3f f3 80 fc 0f 0b e8 38 f3 80 fc <0f> 0b 49 8d
87 80 00 00 00 4d 8d 87 84 00 00 00 48 89 85 20 fe
RIP: __skb_pull include/linux/skbuff.h:2073 [inline] RSP: ffff8801bc18f0f0
RIP: __ip6_make_skb+0x1ac8/0x2190 net/ipv6/ip6_output.c:1636 RSP:
ffff8801bc18f0f0

As stated by RFC 7112 section 5:

   When a host fragments an IPv6 datagram, it MUST include the entire
   IPv6 Header Chain in the First Fragment.

So this patch addresses the issue dropping datagrams with excessive
extheader length. It also updates the error path to report to the
calling socket nonnegative pmtu values.

The issue apparently predates git history.

v1 -> v2: cleanup error path, as per Eric's suggestion

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot+91e6f9932ff122fa4410@syzkaller.appspotmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-25 21:17:20 -04:00
Alexander Potapenko
7880287981 netlink: make sure nladdr has correct size in netlink_connect()
KMSAN reports use of uninitialized memory in the case when |alen| is
smaller than sizeof(struct sockaddr_nl), and therefore |nladdr| isn't
fully copied from the userspace.

Signed-off-by: Alexander Potapenko <glider@google.com>
Fixes: 1da177e4c3f41524 ("Linux-2.6.12-rc2")
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-25 21:14:51 -04:00
Hans Wippel
bc58a1baf2 net/ipv4: disable SMC TCP option with SYN Cookies
Currently, the SMC experimental TCP option in a SYN packet is lost on
the server side when SYN Cookies are active. However, the corresponding
SYNACK sent back to the client contains the SMC option. This causes an
inconsistent view of the SMC capabilities on the client and server.

This patch disables the SMC option in the SYNACK when SYN Cookies are
active to avoid this issue.

Fixes: 60e2a7780793b ("tcp: TCP experimental option for SMC")
Signed-off-by: Hans Wippel <hwippel@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-25 20:53:54 -04:00
Yonghong Song
13acc94eff net: permit skb_segment on head_frag frag_list skb
One of our in-house projects, bpf-based NAT, hits a kernel BUG_ON at
function skb_segment(), line 3667. The bpf program attaches to
clsact ingress, calls bpf_skb_change_proto to change protocol
from ipv4 to ipv6 or from ipv6 to ipv4, and then calls bpf_redirect
to send the changed packet out.

3472 struct sk_buff *skb_segment(struct sk_buff *head_skb,
3473                             netdev_features_t features)
3474 {
3475         struct sk_buff *segs = NULL;
3476         struct sk_buff *tail = NULL;
...
3665                 while (pos < offset + len) {
3666                         if (i >= nfrags) {
3667                                 BUG_ON(skb_headlen(list_skb));
3668
3669                                 i = 0;
3670                                 nfrags = skb_shinfo(list_skb)->nr_frags;
3671                                 frag = skb_shinfo(list_skb)->frags;
3672                                 frag_skb = list_skb;
...

call stack:
...
 #1 [ffff883ffef03558] __crash_kexec at ffffffff8110c525
 #2 [ffff883ffef03620] crash_kexec at ffffffff8110d5cc
 #3 [ffff883ffef03640] oops_end at ffffffff8101d7e7
 #4 [ffff883ffef03668] die at ffffffff8101deb2
 #5 [ffff883ffef03698] do_trap at ffffffff8101a700
 #6 [ffff883ffef036e8] do_error_trap at ffffffff8101abfe
 #7 [ffff883ffef037a0] do_invalid_op at ffffffff8101acd0
 #8 [ffff883ffef037b0] invalid_op at ffffffff81a00bab
    [exception RIP: skb_segment+3044]
    RIP: ffffffff817e4dd4  RSP: ffff883ffef03860  RFLAGS: 00010216
    RAX: 0000000000002bf6  RBX: ffff883feb7aaa00  RCX: 0000000000000011
    RDX: ffff883fb87910c0  RSI: 0000000000000011  RDI: ffff883feb7ab500
    RBP: ffff883ffef03928   R8: 0000000000002ce2   R9: 00000000000027da
    R10: 000001ea00000000  R11: 0000000000002d82  R12: ffff883f90a1ee80
    R13: ffff883fb8791120  R14: ffff883feb7abc00  R15: 0000000000002ce2
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffff883ffef03930] tcp_gso_segment at ffffffff818713e7

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-25 16:46:04 -04:00
David S. Miller
b9ee96b45f Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for your net tree,
they are:

1) Don't pick fixed hash implementation for NFT_SET_EVAL sets, otherwise
   userspace hits EOPNOTSUPP with valid rules using the meter statement,
   from Florian Westphal.

2) If you send a batch that flushes the existing ruleset (that contains
   a NAT chain) and the new ruleset definition comes with a new NAT
   chain, don't bogusly hit EBUSY. Also from Florian.

3) Missing netlink policy attribute validation, from Florian.

4) Detach conntrack template from skbuff if IP_NODEFRAG is set on,
   from Paolo Abeni.

5) Cache device names in flowtable object, otherwise we may end up
   walking over devices going aways given no rtnl_lock is held.

6) Fix incorrect net_device ingress with ingress hooks.

7) Fix crash when trying to read more data than available in UDP
   packets from the nf_socket infrastructure, from Subash.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-24 17:10:01 -04:00
Subash Abhinov Kasiviswanathan
32c1733f0d netfilter: nf_socket: Fix out of bounds access in nf_sk_lookup_slow_v{4,6}
skb_header_pointer will copy data into a buffer if data is non linear,
otherwise it will return a pointer in the linear section of the data.
nf_sk_lookup_slow_v{4,6} always copies data of size udphdr but later
accesses memory within the size of tcphdr (th->doff) in case of TCP
packets. This causes a crash when running with KASAN with the following
call stack -

BUG: KASAN: stack-out-of-bounds in xt_socket_lookup_slow_v4+0x524/0x718
net/netfilter/xt_socket.c:178
Read of size 2 at addr ffffffe3d417a87c by task syz-executor/28971
CPU: 2 PID: 28971 Comm: syz-executor Tainted: G    B   W  O    4.9.65+ #1
Call trace:
[<ffffff9467e8d390>] dump_backtrace+0x0/0x428 arch/arm64/kernel/traps.c:76
[<ffffff9467e8d7e0>] show_stack+0x28/0x38 arch/arm64/kernel/traps.c:226
[<ffffff946842d9b8>] __dump_stack lib/dump_stack.c:15 [inline]
[<ffffff946842d9b8>] dump_stack+0xd4/0x124 lib/dump_stack.c:51
[<ffffff946811d4b0>] print_address_description+0x68/0x258 mm/kasan/report.c:248
[<ffffff946811d8c8>] kasan_report_error mm/kasan/report.c:347 [inline]
[<ffffff946811d8c8>] kasan_report.part.2+0x228/0x2f0 mm/kasan/report.c:371
[<ffffff946811df44>] kasan_report+0x5c/0x70 mm/kasan/report.c:372
[<ffffff946811bebc>] check_memory_region_inline mm/kasan/kasan.c:308 [inline]
[<ffffff946811bebc>] __asan_load2+0x84/0x98 mm/kasan/kasan.c:739
[<ffffff94694d6f04>] __tcp_hdrlen include/linux/tcp.h:35 [inline]
[<ffffff94694d6f04>] xt_socket_lookup_slow_v4+0x524/0x718 net/netfilter/xt_socket.c:178

Fix this by copying data into appropriate size headers based on protocol.

Fixes: a583636a83ea ("inet: refactor inet[6]_lookup functions to take skb")
Signed-off-by: Tejaswi Tanikella <tejaswit@codeaurora.org>
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-03-24 21:17:14 +01:00
Linus Lüssing
a752c0a452 batman-adv: fix packet loss for broadcasted DHCP packets to a server
DHCP connectivity issues can currently occur if the following conditions
are met:

1) A DHCP packet from a client to a server
2) This packet has a multicast destination
3) This destination has a matching entry in the translation table
   (FF:FF:FF:FF:FF:FF for IPv4, 33:33:00:01:00:02/33:33:00:01:00:03
    for IPv6)
4) The orig-node determined by TT for the multicast destination
   does not match the orig-node determined by best-gateway-selection

In this case the DHCP packet will be dropped.

The "gateway-out-of-range" check is supposed to only be applied to
unicasted DHCP packets to a specific DHCP server.

In that case dropping the the unicasted frame forces the client to
retry via a broadcasted one, but now directed to the new best
gateway.

A DHCP packet with broadcast/multicast destination is already ensured to
always be delivered to the best gateway. Dropping a multicasted
DHCP packet here will only prevent completing DHCP as there is no
other fallback.

So far, it seems the unicast check was implicitly performed by
expecting the batadv_transtable_search() to return NULL for multicast
destinations. However, a multicast address could have always ended up in
the translation table and in fact is now common.

To fix this potential loss of a DHCP client-to-server packet to a
multicast address this patch adds an explicit multicast destination
check to reliably bail out of the gateway-out-of-range check for such
destinations.

The issue and fix were tested in the following three node setup:

- Line topology, A-B-C
- A: gateway client, DHCP client
- B: gateway server, hop-penalty increased: 30->60, DHCP server
- C: gateway server, code modifications to announce FF:FF:FF:FF:FF:FF

Without this patch, A would never transmit its DHCP Discover packet
due to an always "out-of-range" condition. With this patch,
a full DHCP handshake between A and B was possible again.

Fixes: be7af5cf9cae ("batman-adv: refactoring gateway handling code")
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2018-03-24 10:25:49 +01:00
Linus Lüssing
f8fb3419ea batman-adv: fix multicast-via-unicast transmission with AP isolation
For multicast frames AP isolation is only supposed to be checked on
the receiving nodes and never on the originating one.

Furthermore, the isolation or wifi flag bits should only be intepreted
as such for unicast and never multicast TT entries.

By injecting flags to the multicast TT entry claimed by a single
target node it was verified in tests that this multicast address
becomes unreachable, leading to packet loss.

Omitting the "src" parameter to the batadv_transtable_search() call
successfully skipped the AP isolation check and made the target
reachable again.

Fixes: 1d8ab8d3c176 ("batman-adv: Modified forwarding behaviour for multicast packets")
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2018-03-24 10:25:07 +01:00
Davide Caratti
94cb549240 net/sched: act_vlan: declare push_vid with host byte order
use u16 in place of __be16 to suppress the following sparse warnings:

 net/sched/act_vlan.c:150:26: warning: incorrect type in assignment (different base types)
 net/sched/act_vlan.c:150:26: expected restricted __be16 [usertype] push_vid
 net/sched/act_vlan.c:150:26: got unsigned short
 net/sched/act_vlan.c:151:21: warning: restricted __be16 degrades to integer
 net/sched/act_vlan.c:208:26: warning: incorrect type in assignment (different base types)
 net/sched/act_vlan.c:208:26: expected unsigned short [unsigned] [usertype] tcfv_push_vid
 net/sched/act_vlan.c:208:26: got restricted __be16 [usertype] push_vid

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23 21:54:07 -04:00
Davide Caratti
affaa0c724 net/sched: remove tcf_idr_cleanup()
tcf_idr_cleanup() is no more used, so remove it.

Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23 21:52:19 -04:00
Eric Dumazet
1bfa26ff8c ipv6: fix possible deadlock in rt6_age_examine_exception()
syzbot reported a LOCKDEP splat [1] in rt6_age_examine_exception()

rt6_age_examine_exception() is called while rt6_exception_lock is held.
This lock is the lower one in the lock hierarchy, thus we can not
call dst_neigh_lookup() function, as it can fallback to neigh_create()

We should instead do a pure RCU lookup. As a bonus we avoid
a pair of atomic operations on neigh refcount.

[1]

WARNING: possible circular locking dependency detected
4.16.0-rc4+ #277 Not tainted

syz-executor7/4015 is trying to acquire lock:
 (&ndev->lock){++--}, at: [<00000000416dce19>] __ipv6_dev_mc_dec+0x45/0x350 net/ipv6/mcast.c:928

but task is already holding lock:
 (&tbl->lock){++-.}, at: [<00000000b5cb1d65>] neigh_ifdown+0x3d/0x250 net/core/neighbour.c:292

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&tbl->lock){++-.}:
       __raw_write_lock_bh include/linux/rwlock_api_smp.h:203 [inline]
       _raw_write_lock_bh+0x31/0x40 kernel/locking/spinlock.c:312
       __neigh_create+0x87e/0x1d90 net/core/neighbour.c:528
       neigh_create include/net/neighbour.h:315 [inline]
       ip6_neigh_lookup+0x9a7/0xba0 net/ipv6/route.c:228
       dst_neigh_lookup include/net/dst.h:405 [inline]
       rt6_age_examine_exception net/ipv6/route.c:1609 [inline]
       rt6_age_exceptions+0x381/0x660 net/ipv6/route.c:1645
       fib6_age+0xfb/0x140 net/ipv6/ip6_fib.c:2033
       fib6_clean_node+0x389/0x580 net/ipv6/ip6_fib.c:1919
       fib6_walk_continue+0x46c/0x8a0 net/ipv6/ip6_fib.c:1845
       fib6_walk+0x91/0xf0 net/ipv6/ip6_fib.c:1893
       fib6_clean_tree+0x1e6/0x340 net/ipv6/ip6_fib.c:1970
       __fib6_clean_all+0x1f4/0x3a0 net/ipv6/ip6_fib.c:1986
       fib6_clean_all net/ipv6/ip6_fib.c:1997 [inline]
       fib6_run_gc+0x16b/0x3c0 net/ipv6/ip6_fib.c:2053
       ndisc_netdev_event+0x3c2/0x4a0 net/ipv6/ndisc.c:1781
       notifier_call_chain+0x136/0x2c0 kernel/notifier.c:93
       __raw_notifier_call_chain kernel/notifier.c:394 [inline]
       raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
       call_netdevice_notifiers_info+0x32/0x70 net/core/dev.c:1707
       call_netdevice_notifiers net/core/dev.c:1725 [inline]
       __dev_notify_flags+0x262/0x430 net/core/dev.c:6960
       dev_change_flags+0xf5/0x140 net/core/dev.c:6994
       devinet_ioctl+0x126a/0x1ac0 net/ipv4/devinet.c:1080
       inet_ioctl+0x184/0x310 net/ipv4/af_inet.c:919
       sock_do_ioctl+0xef/0x390 net/socket.c:957
       sock_ioctl+0x36b/0x610 net/socket.c:1081
       vfs_ioctl fs/ioctl.c:46 [inline]
       do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:686
       SYSC_ioctl fs/ioctl.c:701 [inline]
       SyS_ioctl+0x8f/0xc0 fs/ioctl.c:692
       do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x42/0xb7

-> #2 (rt6_exception_lock){+.-.}:
       __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline]
       _raw_spin_lock_bh+0x31/0x40 kernel/locking/spinlock.c:168
       spin_lock_bh include/linux/spinlock.h:315 [inline]
       rt6_flush_exceptions+0x21/0x210 net/ipv6/route.c:1367
       fib6_del_route net/ipv6/ip6_fib.c:1677 [inline]
       fib6_del+0x624/0x12c0 net/ipv6/ip6_fib.c:1761
       __ip6_del_rt+0xc7/0x120 net/ipv6/route.c:2980
       ip6_del_rt+0x132/0x1a0 net/ipv6/route.c:2993
       __ipv6_dev_ac_dec+0x3b1/0x600 net/ipv6/anycast.c:332
       ipv6_dev_ac_dec net/ipv6/anycast.c:345 [inline]
       ipv6_sock_ac_close+0x2b4/0x3e0 net/ipv6/anycast.c:200
       inet6_release+0x48/0x70 net/ipv6/af_inet6.c:433
       sock_release+0x8d/0x1e0 net/socket.c:594
       sock_close+0x16/0x20 net/socket.c:1149
       __fput+0x327/0x7e0 fs/file_table.c:209
       ____fput+0x15/0x20 fs/file_table.c:243
       task_work_run+0x199/0x270 kernel/task_work.c:113
       exit_task_work include/linux/task_work.h:22 [inline]
       do_exit+0x9bb/0x1ad0 kernel/exit.c:865
       do_group_exit+0x149/0x400 kernel/exit.c:968
       get_signal+0x73a/0x16d0 kernel/signal.c:2469
       do_signal+0x90/0x1e90 arch/x86/kernel/signal.c:809
       exit_to_usermode_loop+0x258/0x2f0 arch/x86/entry/common.c:162
       prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
       syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
       do_syscall_64+0x6ec/0x940 arch/x86/entry/common.c:292
       entry_SYSCALL_64_after_hwframe+0x42/0xb7

-> #1 (&(&tb->tb6_lock)->rlock){+.-.}:
       __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline]
       _raw_spin_lock_bh+0x31/0x40 kernel/locking/spinlock.c:168
       spin_lock_bh include/linux/spinlock.h:315 [inline]
       __ip6_ins_rt+0x56/0x90 net/ipv6/route.c:1007
       ip6_route_add+0x141/0x190 net/ipv6/route.c:2955
       addrconf_prefix_route+0x44f/0x620 net/ipv6/addrconf.c:2359
       fixup_permanent_addr net/ipv6/addrconf.c:3368 [inline]
       addrconf_permanent_addr net/ipv6/addrconf.c:3391 [inline]
       addrconf_notify+0x1ad2/0x2310 net/ipv6/addrconf.c:3460
       notifier_call_chain+0x136/0x2c0 kernel/notifier.c:93
       __raw_notifier_call_chain kernel/notifier.c:394 [inline]
       raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
       call_netdevice_notifiers_info+0x32/0x70 net/core/dev.c:1707
       call_netdevice_notifiers net/core/dev.c:1725 [inline]
       __dev_notify_flags+0x15d/0x430 net/core/dev.c:6958
       dev_change_flags+0xf5/0x140 net/core/dev.c:6994
       do_setlink+0xa22/0x3bb0 net/core/rtnetlink.c:2357
       rtnl_newlink+0xf37/0x1a50 net/core/rtnetlink.c:2965
       rtnetlink_rcv_msg+0x57f/0xb10 net/core/rtnetlink.c:4641
       netlink_rcv_skb+0x14b/0x380 net/netlink/af_netlink.c:2444
       rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4659
       netlink_unicast_kernel net/netlink/af_netlink.c:1308 [inline]
       netlink_unicast+0x4c4/0x6b0 net/netlink/af_netlink.c:1334
       netlink_sendmsg+0xa4a/0xe60 net/netlink/af_netlink.c:1897
       sock_sendmsg_nosec net/socket.c:629 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:639
       ___sys_sendmsg+0x767/0x8b0 net/socket.c:2047
       __sys_sendmsg+0xe5/0x210 net/socket.c:2081
       SYSC_sendmsg net/socket.c:2092 [inline]
       SyS_sendmsg+0x2d/0x50 net/socket.c:2088
       do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x42/0xb7

-> #0 (&ndev->lock){++--}:
       lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3920
       __raw_write_lock_bh include/linux/rwlock_api_smp.h:203 [inline]
       _raw_write_lock_bh+0x31/0x40 kernel/locking/spinlock.c:312
       __ipv6_dev_mc_dec+0x45/0x350 net/ipv6/mcast.c:928
       ipv6_dev_mc_dec+0x110/0x1f0 net/ipv6/mcast.c:961
       pndisc_destructor+0x21a/0x340 net/ipv6/ndisc.c:392
       pneigh_ifdown net/core/neighbour.c:695 [inline]
       neigh_ifdown+0x149/0x250 net/core/neighbour.c:294
       rt6_disable_ip+0x537/0x700 net/ipv6/route.c:3874
       addrconf_ifdown+0x14b/0x14f0 net/ipv6/addrconf.c:3633
       addrconf_notify+0x5f8/0x2310 net/ipv6/addrconf.c:3557
       notifier_call_chain+0x136/0x2c0 kernel/notifier.c:93
       __raw_notifier_call_chain kernel/notifier.c:394 [inline]
       raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
       call_netdevice_notifiers_info+0x32/0x70 net/core/dev.c:1707
       call_netdevice_notifiers net/core/dev.c:1725 [inline]
       __dev_notify_flags+0x262/0x430 net/core/dev.c:6960
       dev_change_flags+0xf5/0x140 net/core/dev.c:6994
       devinet_ioctl+0x126a/0x1ac0 net/ipv4/devinet.c:1080
       inet_ioctl+0x184/0x310 net/ipv4/af_inet.c:919
       packet_ioctl+0x1ff/0x310 net/packet/af_packet.c:4066
       sock_do_ioctl+0xef/0x390 net/socket.c:957
       sock_ioctl+0x36b/0x610 net/socket.c:1081
       vfs_ioctl fs/ioctl.c:46 [inline]
       do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:686
       SYSC_ioctl fs/ioctl.c:701 [inline]
       SyS_ioctl+0x8f/0xc0 fs/ioctl.c:692
       do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x42/0xb7

other info that might help us debug this:

Chain exists of:
  &ndev->lock --> rt6_exception_lock --> &tbl->lock

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&tbl->lock);
                               lock(rt6_exception_lock);
                               lock(&tbl->lock);
  lock(&ndev->lock);

 *** DEADLOCK ***

2 locks held by syz-executor7/4015:
 #0:  (rtnl_mutex){+.+.}, at: [<00000000a2f16daa>] rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74
 #1:  (&tbl->lock){++-.}, at: [<00000000b5cb1d65>] neigh_ifdown+0x3d/0x250 net/core/neighbour.c:292

stack backtrace:
CPU: 0 PID: 4015 Comm: syz-executor7 Not tainted 4.16.0-rc4+ #277
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:17 [inline]
 dump_stack+0x194/0x24d lib/dump_stack.c:53
 print_circular_bug.isra.38+0x2cd/0x2dc kernel/locking/lockdep.c:1223
 check_prev_add kernel/locking/lockdep.c:1863 [inline]
 check_prevs_add kernel/locking/lockdep.c:1976 [inline]
 validate_chain kernel/locking/lockdep.c:2417 [inline]
 __lock_acquire+0x30a8/0x3e00 kernel/locking/lockdep.c:3431
 lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3920
 __raw_write_lock_bh include/linux/rwlock_api_smp.h:203 [inline]
 _raw_write_lock_bh+0x31/0x40 kernel/locking/spinlock.c:312
 __ipv6_dev_mc_dec+0x45/0x350 net/ipv6/mcast.c:928
 ipv6_dev_mc_dec+0x110/0x1f0 net/ipv6/mcast.c:961
 pndisc_destructor+0x21a/0x340 net/ipv6/ndisc.c:392
 pneigh_ifdown net/core/neighbour.c:695 [inline]
 neigh_ifdown+0x149/0x250 net/core/neighbour.c:294
 rt6_disable_ip+0x537/0x700 net/ipv6/route.c:3874
 addrconf_ifdown+0x14b/0x14f0 net/ipv6/addrconf.c:3633
 addrconf_notify+0x5f8/0x2310 net/ipv6/addrconf.c:3557
 notifier_call_chain+0x136/0x2c0 kernel/notifier.c:93
 __raw_notifier_call_chain kernel/notifier.c:394 [inline]
 raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
 call_netdevice_notifiers_info+0x32/0x70 net/core/dev.c:1707
 call_netdevice_notifiers net/core/dev.c:1725 [inline]
 __dev_notify_flags+0x262/0x430 net/core/dev.c:6960
 dev_change_flags+0xf5/0x140 net/core/dev.c:6994
 devinet_ioctl+0x126a/0x1ac0 net/ipv4/devinet.c:1080
 inet_ioctl+0x184/0x310 net/ipv4/af_inet.c:919
 packet_ioctl+0x1ff/0x310 net/packet/af_packet.c:4066
 sock_do_ioctl+0xef/0x390 net/socket.c:957
 sock_ioctl+0x36b/0x610 net/socket.c:1081
 vfs_ioctl fs/ioctl.c:46 [inline]
 do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:686
 SYSC_ioctl fs/ioctl.c:701 [inline]
 SyS_ioctl+0x8f/0xc0 fs/ioctl.c:692
 do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x42/0xb7

Fixes: c757faa8bfa2 ("ipv6: prepare fib6_age() for exception table")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Wei Wang <weiwan@google.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Acked-by: Wei Wang <weiwan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23 13:40:34 -04:00
Jon Maloy
52dfae5c85 tipc: obtain node identity from interface by default
Selecting and explicitly configuring a TIPC node identity may be
unwanted in some cases.

In this commit we introduce a default setting if the identity has not
been set at the moment the first bearer is enabled. We do this by
using a raw copy of a unique identifier from the used interface: MAC
address in the case of an L2 bearer, IPv4/IPv6 address in the case
of a UDP bearer.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23 13:12:18 -04:00
Jon Maloy
25b0b9c4e8 tipc: handle collisions of 32-bit node address hash values
When a 32-bit node address is generated from a 128-bit identifier,
there is a risk of collisions which must be discovered and handled.

We do this as follows:
- We don't apply the generated address immediately to the node, but do
  instead initiate a 1 sec trial period to allow other cluster members
  to discover and handle such collisions.

- During the trial period the node periodically sends out a new type
  of message, DSC_TRIAL_MSG, using broadcast or emulated broadcast,
  to all the other nodes in the cluster.

- When a node is receiving such a message, it must check that the
  presented 32-bit identifier either is unused, or was used by the very
  same peer in a previous session. In both cases it accepts the request
  by not responding to it.

- If it finds that the same node has been up before using a different
  address, it responds with a DSC_TRIAL_FAIL_MSG containing that
  address.

- If it finds that the address has already been taken by some other
  node, it generates a new, unused address and returns it to the
  requester.

- During the trial period the requesting node must always be prepared
  to accept a failure message, i.e., a message where a peer suggests a
  different (or equal)  address to the one tried. In those cases it
  must apply the suggested value as trial address and restart the trial
  period.

This algorithm ensures that in the vast majority of cases a node will
have the same address before and after a reboot. If a legacy user
configures the address explicitly, there will be no trial period and
messages, so this protocol addition is completely backwards compatible.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23 13:12:18 -04:00
Jon Maloy
d50ccc2d39 tipc: add 128-bit node identifier
We add a 128-bit node identity, as an alternative to the currently used
32-bit node address.

For the sake of compatibility and to minimize message header changes
we retain the existing 32-bit address field. When not set explicitly by
the user, this field will be filled with a hash value generated from the
much longer node identity, and be used as a shorthand value for the
latter.

We permit either the address or the identity to be set by configuration,
but not both, so when the address value is set by a legacy user the
corresponding 128-bit node identity is generated based on the that value.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23 13:12:18 -04:00
Jon Maloy
23fd3eace0 tipc: remove direct accesses to own_addr field in struct tipc_net
As a preparation to changing the addressing structure of TIPC we replace
all direct accesses to the tipc_net::own_addr field with the function
dedicated for this, tipc_own_addr().

There are no changes to program logics in this commit.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23 13:12:18 -04:00
Jon Maloy
b89afb116c tipc: allow closest-first lookup algorithm when legacy address is configured
The removal of an internal structure of the node address has an unwanted
side effect.
- Currently, if a user is sending an anycast message with destination
  domain 0, the tipc_namebl_translate() function will use the 'closest-
  first' algorithm to first look for a node local destination, and only
  when no such is found, will it resort to the cluster global 'round-
  robin' lookup algorithm.
- Current users can get around this, and enforce unconditional use of
  global round-robin by indicating a destination as Z.0.0 or Z.C.0.
- This option disappears when we make the node address flat, since the
  lookup algorithm has no way of recognizing this case. So, as long as
  there are node local destinations, the algorithm will always select
  one of those, and there is nothing the sender can do to change this.

We solve this by eliminating the 'closest-first' option, which was never
a good idea anyway, for non-legacy users, but only for those. To
distinguish between legacy users and non-legacy users we introduce a new
flag 'legacy_addr_format' in struct tipc_core, to be set when the user
configures a legacy-style Z.C.N node address. Hence, when a legacy user
indicates a zero lookup domain 'closest-first' is selected, and in all
other cases we use 'round-robin'.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23 13:12:18 -04:00
Jon Maloy
2026364149 tipc: remove restrictions on node address values
Nominally, TIPC organizes network nodes into a three-level network
hierarchy consisting of the levels 'zone', 'cluster' and 'node'. This
hierarchy is reflected in the node address format, - it is sub-divided
into an 8-bit zone id, and 12 bit cluster id, and a 12-bit node id.

However, the 'zone' and 'cluster' levels have in reality never been
fully implemented,and never will be. The result of this has been
that the first 20 bits the node identity structure have been wasted,
and the usable node identity range within a cluster has been limited
to 12 bits. This is starting to become a problem.

In the following commits, we will need to be able to connect between
nodes which are using the whole 32-bit value space of the node address.
We therefore remove the restrictions on which values can be assigned
to node identity, -it is from now on only a 32-bit integer with no
assumed internal structure.

Isolation between clusters is now achieved only by setting different
values for the 'network id' field used during neighbor discovery, in
practice leading to the latter becoming the new cluster identity.

The rules for accepting discovery requests/responses from neighboring
nodes now become:

- If the user is using legacy address format on both peers, reception
  of discovery messages is subject to the legacy lookup domain check
  in addition to the cluster id check.

- Otherwise, the discovery request/response is always accepted, provided
  both peers have the same network id.

This secures backwards compatibility for users who have been using zone
or cluster identities as cluster separators, instead of the intended
'network id'.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23 13:12:18 -04:00
Jon Maloy
b39e465e56 tipc: some cleanups in the file discover.c
To facilitate the coming changes in the neighbor discovery functionality
we make some renaming and refactoring of that code. The functional changes
in this commit are trivial, e.g., that we move the message sending call in
tipc_disc_timeout() outside the spinlock protected region.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23 13:12:17 -04:00
Jon Maloy
cb30a63384 tipc: refactor function tipc_enable_bearer()
As a preparation for the next commits we try to reduce the footprint of
the function tipc_enable_bearer(), while hopefully making is simpler to
follow.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23 13:12:17 -04:00
Kirill Tkhai
b2864fbdc5 net: Convert rxrpc_net_ops
These pernet_operations modifies rxrpc_net_id-pointed
per-net entities. There is external link to AF_RXRPC
in fs/afs/Kconfig, but it seems there is no other
pernet_operations interested in that per-net entities.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23 13:00:47 -04:00
Kirill Tkhai
fc18999ed2 net: Convert udp_sysctl_ops
These pernet_operations just initialize udp4 defaults.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23 13:00:46 -04:00
Petr Machata
f6cc9c054e ip_tunnel: Emit events for post-register MTU changes
For tunnels created with IFLA_MTU, MTU of the netdevice is set by
rtnl_create_link() (called from rtnl_newlink()) before the device is
registered. However without IFLA_MTU that's not done.

rtnl_newlink() proceeds by calling struct rtnl_link_ops.newlink, which
via ip_tunnel_newlink() calls register_netdevice(), and that emits
NETDEV_REGISTER. Thus any listeners that inspect the netdevice get the
MTU of 0.

After ip_tunnel_newlink() corrects the MTU after registering the
netdevice, but since there's no event, the listeners don't get to know
about the MTU until something else happens--such as a NETDEV_UP event.
That's not ideal.

So instead of setting the MTU directly, go through dev_set_mtu(), which
takes care of distributing the necessary NETDEV_PRECHANGEMTU and
NETDEV_CHANGEMTU events.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23 12:54:34 -04:00
Nikolay Aleksandrov
82792a070b net: bridge: fix direct access to bridge vlan_enabled and use helper
We need to use br_vlan_enabled() helper otherwise we'll break builds
without bridge vlans:
net/bridge//br_if.c: In function ‘br_mtu’:
net/bridge//br_if.c:458:8: error: ‘const struct net_bridge’ has no
member named ‘vlan_enabled’
  if (br->vlan_enabled)
        ^
net/bridge//br_if.c:462:1: warning: control reaches end of non-void
function [-Wreturn-type]
 }
 ^
scripts/Makefile.build:324: recipe for target 'net/bridge//br_if.o'
failed

Fixes: 419d14af9e07 ("bridge: Allow max MTU when multiple VLANs present")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23 12:41:14 -04:00
Dave Watson
c46234ebb4 tls: RX path for ktls
Add rx path for tls software implementation.

recvmsg, splice_read, and poll implemented.

An additional sockopt TLS_RX is added, with the same interface as
TLS_TX.  Either TLX_RX or TLX_TX may be provided separately, or
together (with two different setsockopt calls with appropriate keys).

Control messages are passed via CMSG in a similar way to transmit.
If no cmsg buffer is passed, then only application data records
will be passed to userspace, and EIO is returned for other types of
alerts.

EBADMSG is passed for decryption errors, and EMSGSIZE is passed for
framing too big, and EBADMSG for framing too small (matching openssl
semantics). EINVAL is returned for TLS versions that do not match the
original setsockopt call.  All are unrecoverable.

strparser is used to parse TLS framing.   Decryption is done directly
in to userspace buffers if they are large enough to support it, otherwise
sk_cow_data is called (similar to ipsec), and buffers are decrypted in
place and copied.  splice_read always decrypts in place, since no
buffers are provided to decrypt in to.

sk_poll is overridden, and only returns POLLIN if a full TLS message is
received.  Otherwise we wait for strparser to finish reading a full frame.
Actual decryption is only done during recvmsg or splice_read calls.

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23 12:25:54 -04:00
Dave Watson
583715853a tls: Refactor variable names
Several config variables are prefixed with tx, drop the prefix
since these will be used for both tx and rx.

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23 12:25:54 -04:00
Dave Watson
f4a8e43f1f tls: Pass error code explicitly to tls_err_abort
Pass EBADMSG explicitly to tls_err_abort.  Receive path will
pass additional codes - EMSGSIZE if framing is larger than max
TLS record size, EINVAL if TLS version mismatch.

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23 12:25:54 -04:00
Dave Watson
dbe425599b tls: Move cipher info to a separate struct
Separate tx crypto parameters to a separate cipher_context struct.
The same parameters will be used for rx using the same struct.

tls_advance_record_sn is modified to only take the cipher info.

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23 12:25:53 -04:00
Dave Watson
69ca9293e8 tls: Generalize zerocopy_from_iter
Refactor zerocopy_from_iter to take arguments for pages and size,
such that it can be used for both tx and rx. RX will also support
zerocopy direct to output iter, as long as the full message can
be copied at once (a large enough userspace buffer was provided).

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23 12:25:53 -04:00
Chas Williams
419d14af9e bridge: Allow max MTU when multiple VLANs present
If the bridge is allowing multiple VLANs, some VLANs may have
different MTUs.  Instead of choosing the minimum MTU for the
bridge interface, choose the maximum MTU of the bridge members.
With this the user only needs to set a larger MTU on the member
ports that are participating in the large MTU VLANS.

Signed-off-by: Chas Williams <3chas3@gmail.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23 12:17:30 -04:00
David S. Miller
03fe2debbb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Fun set of conflict resolutions here...

For the mac80211 stuff, these were fortunately just parallel
adds.  Trivially resolved.

In drivers/net/phy/phy.c we had a bug fix in 'net' that moved the
function phy_disable_interrupts() earlier in the file, whilst in
'net-next' the phy_error() call from this function was removed.

In net/ipv4/xfrm4_policy.c, David Ahern's changes to remove the
'rt_table_id' member of rtable collided with a bug fix in 'net' that
added a new struct member "rt_mtu_locked" which needs to be copied
over here.

The mlxsw driver conflict consisted of net-next separating
the span code and definitions into separate files, whilst
a 'net' bug fix made some changes to that moved code.

The mlx5 infiniband conflict resolution was quite non-trivial,
the RDMA tree's merge commit was used as a guide here, and
here are their notes:

====================

    Due to bug fixes found by the syzkaller bot and taken into the for-rc
    branch after development for the 4.17 merge window had already started
    being taken into the for-next branch, there were fairly non-trivial
    merge issues that would need to be resolved between the for-rc branch
    and the for-next branch.  This merge resolves those conflicts and
    provides a unified base upon which ongoing development for 4.17 can
    be based.

    Conflicts:
            drivers/infiniband/hw/mlx5/main.c - Commit 42cea83f9524
            (IB/mlx5: Fix cleanup order on unload) added to for-rc and
            commit b5ca15ad7e61 (IB/mlx5: Add proper representors support)
            add as part of the devel cycle both needed to modify the
            init/de-init functions used by mlx5.  To support the new
            representors, the new functions added by the cleanup patch
            needed to be made non-static, and the init/de-init list
            added by the representors patch needed to be modified to
            match the init/de-init list changes made by the cleanup
            patch.
    Updates:
            drivers/infiniband/hw/mlx5/mlx5_ib.h - Update function
            prototypes added by representors patch to reflect new function
            names as changed by cleanup patch
            drivers/infiniband/hw/mlx5/ib_rep.c - Update init/de-init
            stage list to match new order from cleanup patch
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23 11:31:58 -04:00
Pradeep Kumar Chitrapu
dcbe73ca55 mac80211: notify driver for change in multicast rates
With drivers implementing rate control in driver or firmware
rate_control_send_low() may not get called, and thus the
driver needs to know about changes in the multicast rate.

Add and use a new BSS change flag for this.

Signed-off-by: Pradeep Kumar Chitrapu <pradeepc@codeaurora.org>
[rewrite commit message]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-03-23 13:23:17 +01:00
Steffen Klassert
9a3fb9fb84 xfrm: Fix transport mode skb control buffer usage.
A recent commit introduced a new struct xfrm_trans_cb
that is used with the sk_buff control buffer. Unfortunately
it placed the structure in front of the control buffer and
overlooked that the IPv4/IPv6 control buffer is still needed
for some layer 4 protocols. As a result the IPv4/IPv6 control
buffer is overwritten with this structure. Fix this by setting
a apropriate header in front of the structure.

Fixes acf568ee859f ("xfrm: Reinject transport-mode packets ...")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-03-23 07:56:04 +01:00
Kirill Tkhai
d9ff304973 net: Replace ip_ra_lock with per-net mutex
Since ra_chain is per-net, we may use per-net mutexes
to protect them in ip_ra_control(). This improves
scalability.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22 15:12:56 -04:00
Kirill Tkhai
5796ef75ec net: Make ip_ra_chain per struct net
This is optimization, which makes ip_call_ra_chain()
iterate less sockets to find the sockets it's looking for.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22 15:12:56 -04:00
Kirill Tkhai
128aaa98ad net: Revert "ipv4: fix a deadlock in ip_ra_control"
This reverts commit 1215e51edad1.
Since raw_close() is used on every RAW socket destruction,
the changes made by 1215e51edad1 scale sadly. This clearly
seen on endless unshare(CLONE_NEWNET) test, and cleanup_net()
kwork spends a lot of time waiting for rtnl_lock() introduced
by this commit.

Previous patch moved IP_ROUTER_ALERT out of rtnl_lock(),
so we revert this patch.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22 15:12:56 -04:00
Kirill Tkhai
0526947f9d net: Move IP_ROUTER_ALERT out of lock_sock(sk)
ip_ra_control() does not need sk_lock. Who are the another
users of ip_ra_chain? ip_mroute_setsockopt() doesn't take
sk_lock, while parallel IP_ROUTER_ALERT syscalls are
synchronized by ip_ra_lock. So, we may move this command
out of sk_lock.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22 15:12:55 -04:00
Kirill Tkhai
76d3e153d0 net: Revert "ipv4: get rid of ip_ra_lock"
This reverts commit ba3f571d5dde. The commit was made
after 1215e51edad1 "ipv4: fix a deadlock in ip_ra_control",
and killed ip_ra_lock, which became useless after rtnl_lock()
made used to destroy every raw ipv4 socket. This scales
very bad, and next patch in series reverts 1215e51edad1.
ip_ra_lock will be used again.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22 15:12:55 -04:00
Colin Ian King
1574639411 gre: fix TUNNEL_SEQ bit check on sequence numbering
The current logic of flags | TUNNEL_SEQ is always non-zero and hence
sequence numbers are always incremented no matter the setting of the
TUNNEL_SEQ bit.  Fix this by using & instead of |.

Detected by CoverityScan, CID#1466039 ("Operands don't affect result")

Fixes: 77a5196a804e ("gre: add sequence number for collect md mode.")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22 14:52:43 -04:00
GhantaKrishnamurthy MohanKrishna
872619d8cf tipc: step sk->sk_drops when rcv buffer is full
Currently when tipc is unable to queue a received message on a
socket, the message is rejected back to the sender with error
TIPC_ERR_OVERLOAD. However, the application on this socket
has no knowledge about these discards.

In this commit, we try to step the sk_drops counter when tipc
is unable to queue a received message. Export sk_drops
using tipc socket diagnostics.

Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com>
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22 14:43:37 -04:00
GhantaKrishnamurthy MohanKrishna
c30b70deb5 tipc: implement socket diagnostics for AF_TIPC
This commit adds socket diagnostics capability for AF_TIPC in netlink
family NETLINK_SOCK_DIAG in a new kernel module (diag.ko).

The following are key design considerations:
- config TIPC_DIAG has default y, like INET_DIAG.
- only requests with flag NLM_F_DUMP is supported (dump all).
- tipc_sock_diag_req message is introduced to send filter parameters.
- the response attributes are of TLV, some nested.

To avoid exposing data structures between diag and tipc modules and
avoid code duplication, the following additions are required:
- export tipc_nl_sk_walk function to reuse socket iterator.
- export tipc_sk_fill_sock_diag to fill the tipc diag attributes.
- create a sock_diag response message in __tipc_add_sock_diag defined
  in diag.c and use the above exported tipc_sk_fill_sock_diag
  to fill response.

Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com>
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22 14:43:35 -04:00
GhantaKrishnamurthy MohanKrishna
dfde331e75 tipc: modify socket iterator for sock_diag
The current socket iterator function tipc_nl_sk_dump, handles socket
locks and calls __tipc_nl_add_sk for each socket.
To reuse this logic in sock_diag implementation, we do minor
modifications to make these functions generic as described below.

In this commit, we add a two new functions __tipc_nl_sk_walk,
__tipc_nl_add_sk_info and modify tipc_nl_sk_dump, __tipc_nl_add_sk
accordingly.

In __tipc_nl_sk_walk we:
1. acquire and release socket locks
2. for each socket, execute the specified callback function

In __tipc_nl_add_sk we:
- Move the netlink attribute insertion to __tipc_nl_add_sk_info.

tipc_nl_sk_dump calls tipc_nl_sk_walk with __tipc_nl_add_sk as argument.

sock_diag will use these generic functions in a later commit.

There is no functional change in this commit.
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com>
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22 14:43:29 -04:00
David S. Miller
e0645d9b96 Two more fixes (in three patches):
* ath9k_htc doesn't like QoS NDP frames, use regular ones
  * hwsim: set up wmediumd for radios created later
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAlqySjUACgkQB8qZga/f
 l8RDig//bV/Fnwn7deyR7LJOi/g6HVyueCUi+cturTo9RQEQHnBtRVRu3c2Lnd+o
 74f+2ofEWEFOYqTvZq5jUWjANOZ/ZGgA+fUw6tOfSmjkEPw9EkST8mQl7lKH0dSR
 DYrRLKCHwPof9MQQgXHLq44e/26yFCxYP+ADSn9Q6yqo84e/cxP76nqSwYGLforx
 By1zzkMXPKtfvXTZ41UZTfXRMml2LIxBbCoWTfScIZWvusQjCl653f3lBfR3fynj
 qwzJJfcjt1TnOQSgCLzk03xi+ci5o157//GmvjT2hrRjRL3i22e+cmFtnXGNCjtg
 avmc7HftV0sAY8gecqhCLOiWnCuxxjDz1KxZW3dccuJxh1/jfsGtby3H/6stkU4s
 EU7AU/QhXL7HPHop+fyI+DapFmry0/h1tdKqoSGffONF/qMJmbkXFUvG0kAMqoWh
 nb/LTlGaVqk8Bz7hNbzP6zkPgEep3i6dBC9iRdfK3gclcEALsh2Z0mx6+ae3FEX3
 HHbfB9RTDYUT0Tk33cVFrjMsOpwdhcGsu1ncMJc/iuTOGI/AO6TVhgPEjEj/4xth
 FO7SzpIpRUe47yKqM0Ykvoz5EwE2IWUt0/eXj+1X2hPNGpcYEZ4bvHQGk8D2h7tA
 kwd0YqEXjiUw53cSnS4+A45ecOhErH/ZND/ULb65KMl/JDf5gOQ=
 =fs06
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-for-davem-2018-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
Two more fixes (in three patches):
 * ath9k_htc doesn't like QoS NDP frames, use regular ones
 * hwsim: set up wmediumd for radios created later
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22 13:19:10 -04:00
David Ahern
145307460b devlink: Remove top_hierarchy arg to devlink_resource_register
top_hierarchy arg can be determined by comparing parent_resource_id to
DEVLINK_RESOURCE_ID_PARENT_TOP so it does not need to be a separate
argument.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22 13:08:41 -04:00
David Ahern
68e2ffdeb5 net/ipv6: Handle onlink flag with multipath routes
For multipath routes the ONLINK flag can be specified per nexthop in
rtnh_flags or globally in rtm_flags. Update ip6_route_multipath_add
to consider the ONLINK setting coming from rtnh_flags. Each loop over
nexthops the config for the sibling route is initialized to the global
config and then per nexthop settings overlayed. The flag is 'or'ed into
fib6_config to handle the ONLINK flag coming from either rtm_flags or
rtnh_flags.

Fixes: fc1e64e1092f ("net/ipv6: Add support for onlink flag")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22 12:40:04 -04:00
David Lebrun
8936ef7604 ipv6: sr: fix NULL pointer dereference when setting encap source address
When using seg6 in encap mode, we call ipv6_dev_get_saddr() to set the
source address of the outer IPv6 header, in case none was specified.
Using skb->dev can lead to BUG() when it is in an inconsistent state.
This patch uses the net_device attached to the skb's dst instead.

[940807.667429] BUG: unable to handle kernel NULL pointer dereference at 000000000000047c
[940807.762427] IP: ipv6_dev_get_saddr+0x8b/0x1d0
[940807.815725] PGD 0 P4D 0
[940807.847173] Oops: 0000 [#1] SMP PTI
[940807.890073] Modules linked in:
[940807.927765] CPU: 6 PID: 0 Comm: swapper/6 Tainted: G        W        4.16.0-rc1-seg6bpf+ #2
[940808.028988] Hardware name: HP ProLiant DL120 G6/ProLiant DL120 G6, BIOS O26    09/06/2010
[940808.128128] RIP: 0010:ipv6_dev_get_saddr+0x8b/0x1d0
[940808.187667] RSP: 0018:ffff88043fd836b0 EFLAGS: 00010206
[940808.251366] RAX: 0000000000000005 RBX: ffff88042cb1c860 RCX: 00000000000000fe
[940808.338025] RDX: 00000000000002c0 RSI: ffff88042cb1c860 RDI: 0000000000004500
[940808.424683] RBP: ffff88043fd83740 R08: 0000000000000000 R09: ffffffffffffffff
[940808.511342] R10: 0000000000000040 R11: 0000000000000000 R12: ffff88042cb1c850
[940808.598012] R13: ffffffff8208e380 R14: ffff88042ac8da00 R15: 0000000000000002
[940808.684675] FS:  0000000000000000(0000) GS:ffff88043fd80000(0000) knlGS:0000000000000000
[940808.783036] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[940808.852975] CR2: 000000000000047c CR3: 00000004255fe000 CR4: 00000000000006e0
[940808.939634] Call Trace:
[940808.970041]  <IRQ>
[940808.995250]  ? ip6t_do_table+0x265/0x640
[940809.043341]  seg6_do_srh_encap+0x28f/0x300
[940809.093516]  ? seg6_do_srh+0x1a0/0x210
[940809.139528]  seg6_do_srh+0x1a0/0x210
[940809.183462]  seg6_output+0x28/0x1e0
[940809.226358]  lwtunnel_output+0x3f/0x70
[940809.272370]  ip6_xmit+0x2b8/0x530
[940809.313185]  ? ac6_proc_exit+0x20/0x20
[940809.359197]  inet6_csk_xmit+0x7d/0xc0
[940809.404173]  tcp_transmit_skb+0x548/0x9a0
[940809.453304]  __tcp_retransmit_skb+0x1a8/0x7a0
[940809.506603]  ? ip6_default_advmss+0x40/0x40
[940809.557824]  ? tcp_current_mss+0x24/0x90
[940809.605925]  tcp_retransmit_skb+0xd/0x80
[940809.654016]  tcp_xmit_retransmit_queue.part.17+0xf9/0x210
[940809.719797]  tcp_ack+0xa47/0x1110
[940809.760612]  tcp_rcv_established+0x13c/0x570
[940809.812865]  tcp_v6_do_rcv+0x151/0x3d0
[940809.858879]  tcp_v6_rcv+0xa5c/0xb10
[940809.901770]  ? seg6_output+0xdd/0x1e0
[940809.946745]  ip6_input_finish+0xbb/0x460
[940809.994837]  ip6_input+0x74/0x80
[940810.034612]  ? ip6_rcv_finish+0xb0/0xb0
[940810.081663]  ipv6_rcv+0x31c/0x4c0
...

Fixes: 6c8702c60b886 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")
Reported-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David Lebrun <dlebrun@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22 12:22:45 -04:00
David Lebrun
191f86ca8e ipv6: sr: fix scheduling in RCU when creating seg6 lwtunnel state
The seg6_build_state() function is called with RCU read lock held,
so we cannot use GFP_KERNEL. This patch uses GFP_ATOMIC instead.

[   92.770271] =============================
[   92.770628] WARNING: suspicious RCU usage
[   92.770921] 4.16.0-rc4+ #12 Not tainted
[   92.771277] -----------------------------
[   92.771585] ./include/linux/rcupdate.h:302 Illegal context switch in RCU read-side critical section!
[   92.772279]
[   92.772279] other info that might help us debug this:
[   92.772279]
[   92.773067]
[   92.773067] rcu_scheduler_active = 2, debug_locks = 1
[   92.773514] 2 locks held by ip/2413:
[   92.773765]  #0:  (rtnl_mutex){+.+.}, at: [<00000000e5461720>] rtnetlink_rcv_msg+0x441/0x4d0
[   92.774377]  #1:  (rcu_read_lock){....}, at: [<00000000df4f161e>] lwtunnel_build_state+0x59/0x210
[   92.775065]
[   92.775065] stack backtrace:
[   92.775371] CPU: 0 PID: 2413 Comm: ip Not tainted 4.16.0-rc4+ #12
[   92.775791] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc27 04/01/2014
[   92.776608] Call Trace:
[   92.776852]  dump_stack+0x7d/0xbc
[   92.777130]  __schedule+0x133/0xf00
[   92.777393]  ? unwind_get_return_address_ptr+0x50/0x50
[   92.777783]  ? __sched_text_start+0x8/0x8
[   92.778073]  ? rcu_is_watching+0x19/0x30
[   92.778383]  ? kernel_text_address+0x49/0x60
[   92.778800]  ? __kernel_text_address+0x9/0x30
[   92.779241]  ? unwind_get_return_address+0x29/0x40
[   92.779727]  ? pcpu_alloc+0x102/0x8f0
[   92.780101]  _cond_resched+0x23/0x50
[   92.780459]  __mutex_lock+0xbd/0xad0
[   92.780818]  ? pcpu_alloc+0x102/0x8f0
[   92.781194]  ? seg6_build_state+0x11d/0x240
[   92.781611]  ? save_stack+0x9b/0xb0
[   92.781965]  ? __ww_mutex_wakeup_for_backoff+0xf0/0xf0
[   92.782480]  ? seg6_build_state+0x11d/0x240
[   92.782925]  ? lwtunnel_build_state+0x1bd/0x210
[   92.783393]  ? ip6_route_info_create+0x687/0x1640
[   92.783846]  ? ip6_route_add+0x74/0x110
[   92.784236]  ? inet6_rtm_newroute+0x8a/0xd0

Fixes: 6c8702c60b886 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")
Signed-off-by: David Lebrun <dlebrun@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22 12:21:11 -04:00
David S. Miller
755f6633d6 This feature/cleanup patchset includes the following patches:
- avoid redundant multicast TT entries, by Linus Luessing
 
  - add netlink support for distributed arp table cache and multicast flags,
    by Linus Luessing (2 patches)
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAlqv59kWHHN3QHNpbW9u
 d3VuZGVybGljaC5kZQAKCRChK+OYQpKeoSBaD/9r/oJC+Q/3Eu6DTAAiS7Jx2IpQ
 kOwU7l4hkGK8mBZ098CkmHTBY+zurqYwcokCHhKJO5mqJEpvlM27PuQxzqWSBMMO
 FWFax2YlKPpJp+/f/rSD9HS73RTY7npv6l5/eFg6+0WSQET04PjLB1KxPrO5u1+Z
 JujrAxp0GEyMoVQgMy9uloedkpizhADyYSZzDDXnHhq1NiAPU87cjrTLv/xdtdp7
 TvbNfobhZUmKZ951yaRlDmE+mH8IoTQoY7HD/JANnduYeFJAlIPnHQEQa8+5tLwO
 qWUeLa4Acv5MhO2KjbKQpu5r2dFbs0x+jmsja8xBmgNWO5meKh/aE8TKGJeDVQEW
 TTynEivf82suiquCIZ573fBnliJkipffg32ZHgtNGrh54hh+YU7Sts0t9Lsou4ar
 aOU6lup3MHFysf3s9hK6y9TzSqwFJ8Mak0UFsa03r0Ub8am6bKHTqMFaCgN0aK9P
 vBL4atSvJVguwPlzxLMi44K4NxqEVfa41dZ7nQ99P3BFzWwSvph3i4lu+cxGxwI7
 4kgWU5Cz8T51dH7g8j3vUish36DzwQtUsKLWZVpV+DV4BaHJ/rLyqeug3ffUrWRk
 p0bFU7wBAv8rKeFPd30m2tdJ/nMo+rDbN6Tm9n43tK4NWKOGBndhCoNhjfrzhM8U
 un6Iy7taISgeElZ5fQ==
 =HVxO
 -----END PGP SIGNATURE-----

Merge tag 'batadv-next-for-davem-20180319' of git://git.open-mesh.org/linux-merge

Simon Wunderlich says:

====================
This feature/cleanup patchset includes the following patches:

 - avoid redundant multicast TT entries, by Linus Luessing

 - add netlink support for distributed arp table cache and multicast flags,
   by Linus Luessing (2 patches)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22 11:28:54 -04:00
David S. Miller
ee54a9f9ae Here are some batman-adv bugfixes:
- fix possible IPv6 packet loss when multicast extension is used, by Linus Luessing
 
  - fix SKB handling issues for TTVN and DAT, by Matthias Schiffer (two patches)
 
  - fix include for eventpoll, by Sven Eckelmann
 
  - fix skb checksum for ttvn reroutes, by Sven Eckelmann
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAlqv5tcWHHN3QHNpbW9u
 d3VuZGVybGljaC5kZQAKCRChK+OYQpKeob7PD/wPnjVmFl6uQlRHfOfUBzGvW9hU
 ASakGylzfZTXzP2VviMJ+7JehXQgediM4caTr/8jWhJFeVxu5jmeYzv8nzOlwZJ/
 PjNlX+ChuIAWv1wyxHI3YWwj7Y1Ox9Lp8Gs7m5UbmfyiZEtb+ybHjvPzZ/FMfhWo
 hOgbndSpFKfwFuXFkXyitektBcKTiFUyjk9U22v91XCQZZzMb8KqougVBE+bdg0w
 N8LfgKXVC448sUNKTEczhji7bFuJwR+Sogx1TwIIsyfT2Tp4JXY3A7M1LDwAL/Rc
 nTu9L58vk4qUouRjIQJ7rhhoxdPSrD5p3oinzVnLvJBxgOS3pxegY3asX8sRMXsj
 bgolIGCZ9vDfA21WXqa3RyjUiBEx9UId6W++h+22kVqVHt5tqIFK2nVQ6IInOXwB
 kzd3UDrRBLgRdBpwlu/ii9rn68MOEdLNpL3Seo9DkViJESfLn1ODnFA1Y2rFoK6S
 cVeYl1DVj4SQSXjNilqYhFZoTEo/kxN5KhgHRXzujTv6MCHVa2SnCZaXS5EXp8n6
 u4gpPc363Isj17A+UVlmHmmzA6d8Jqe8veVvET0xSvQPHmE7eepMkJU9hIUNXfcv
 fENmQnU10pS1keJqS6xuR6k2RxMoofYRUPpKEigCcH7z/0GytWQc6QVGmo19LD9b
 GNcWAhKI453pc1IuUA==
 =hsM8
 -----END PGP SIGNATURE-----

Merge tag 'batadv-net-for-davem-20180319' of git://git.open-mesh.org/linux-merge

Simon Wunderlich says:

====================
Here are some batman-adv bugfixes:

 - fix possible IPv6 packet loss when multicast extension is used, by Linus Luessing

 - fix SKB handling issues for TTVN and DAT, by Matthias Schiffer (two patches)

 - fix include for eventpoll, by Sven Eckelmann

 - fix skb checksum for ttvn reroutes, by Sven Eckelmann
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22 11:26:13 -04:00
Sowmini Varadhan
bdf5bd7f21 rds: tcp: remove register_netdevice_notifier infrastructure.
The netns deletion path does not need to wait for all net_devices
to be unregistered before dismantling rds_tcp state for the netns
(we are able to dismantle this state on module unload even when
all net_devices are active so there is no dependency here).

This patch removes code related to netdevice notifiers and
refactors all the code needed to dismantle rds_tcp state
into a ->exit callback for the pernet_operations used with
register_pernet_device().

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22 11:21:45 -04:00