66601 Commits

Author SHA1 Message Date
Eric Dumazet
8d65cd8d25 fou: remove sparse errors
We need to add __rcu qualifier to avoid these errors:

net/ipv4/fou.c:250:18: warning: incorrect type in assignment (different address spaces)
net/ipv4/fou.c:250:18:    expected struct net_offload const **offloads
net/ipv4/fou.c:250:18:    got struct net_offload const [noderef] __rcu **
net/ipv4/fou.c:251:15: error: incompatible types in comparison expression (different address spaces):
net/ipv4/fou.c:251:15:    struct net_offload const [noderef] __rcu *
net/ipv4/fou.c:251:15:    struct net_offload const *
net/ipv4/fou.c:272:18: warning: incorrect type in assignment (different address spaces)
net/ipv4/fou.c:272:18:    expected struct net_offload const **offloads
net/ipv4/fou.c:272:18:    got struct net_offload const [noderef] __rcu **
net/ipv4/fou.c:273:15: error: incompatible types in comparison expression (different address spaces):
net/ipv4/fou.c:273:15:    struct net_offload const [noderef] __rcu *
net/ipv4/fou.c:273:15:    struct net_offload const *
net/ipv4/fou.c:442:18: warning: incorrect type in assignment (different address spaces)
net/ipv4/fou.c:442:18:    expected struct net_offload const **offloads
net/ipv4/fou.c:442:18:    got struct net_offload const [noderef] __rcu **
net/ipv4/fou.c:443:15: error: incompatible types in comparison expression (different address spaces):
net/ipv4/fou.c:443:15:    struct net_offload const [noderef] __rcu *
net/ipv4/fou.c:443:15:    struct net_offload const *
net/ipv4/fou.c:489:18: warning: incorrect type in assignment (different address spaces)
net/ipv4/fou.c:489:18:    expected struct net_offload const **offloads
net/ipv4/fou.c:489:18:    got struct net_offload const [noderef] __rcu **
net/ipv4/fou.c:490:15: error: incompatible types in comparison expression (different address spaces):
net/ipv4/fou.c:490:15:    struct net_offload const [noderef] __rcu *
net/ipv4/fou.c:490:15:    struct net_offload const *
net/ipv4/udp_offload.c:170:26: warning: incorrect type in assignment (different address spaces)
net/ipv4/udp_offload.c:170:26:    expected struct net_offload const **offloads
net/ipv4/udp_offload.c:170:26:    got struct net_offload const [noderef] __rcu **
net/ipv4/udp_offload.c:171:23: error: incompatible types in comparison expression (different address spaces):
net/ipv4/udp_offload.c:171:23:    struct net_offload const [noderef] __rcu *
net/ipv4/udp_offload.c:171:23:    struct net_offload const *

Fixes: efc98d08e1ec ("fou: eliminate IPv4,v6 specific GRO functions")
Fixes: 8bce6d7d0d1e ("udp: Generalize skb_udp_segment")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-31 12:03:33 +01:00
Eric Dumazet
92548b0ee2 ipv4: fix endianness issue in inet_rtm_getroute_build_skb()
The UDP length field should be in network order.
This removes the following sparse error:

net/ipv4/route.c:3173:27: warning: incorrect type in assignment (different base types)
net/ipv4/route.c:3173:27:    expected restricted __be16 [usertype] len
net/ipv4/route.c:3173:27:    got unsigned long

Fixes: 404eb77ea766 ("ipv4: support sport, dport and ip_proto in RTM_GETROUTE")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Roopa Prabhu <roopa@nvidia.com>
Cc: David Ahern <dsahern@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-31 12:03:03 +01:00
Eric Dumazet
dc56ad7028 af_unix: fix potential NULL deref in unix_dgram_connect()
syzbot was able to trigger NULL deref in unix_dgram_connect() [1]

This happens in

	if (unix_peer(sk))
		sk->sk_state = other->sk_state = TCP_ESTABLISHED; // crash because @other is NULL

Because locks have been dropped, unix_peer() might be non NULL,
while @other is NULL (AF_UNSPEC case)

We need to move code around, so that we no longer access
unix_peer() and sk_state while locks have been released.

[1]
general protection fault, probably for non-canonical address 0xdffffc0000000002: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017]
CPU: 0 PID: 10341 Comm: syz-executor239 Not tainted 5.14.0-rc7-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:unix_dgram_connect+0x32a/0xc60 net/unix/af_unix.c:1226
Code: 00 00 45 31 ed 49 83 bc 24 f8 05 00 00 00 74 69 e8 eb 5b a6 f9 48 8d 7d 12 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 48 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 e0 07 00 00
RSP: 0018:ffffc9000a89fcd8 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: 0000000000000004 RCX: 0000000000000000
RDX: 0000000000000002 RSI: ffffffff87cf4ef5 RDI: 0000000000000012
RBP: 0000000000000000 R08: 0000000000000000 R09: ffff88802e1917c3
R10: ffffffff87cf4eba R11: 0000000000000001 R12: ffff88802e191740
R13: 0000000000000000 R14: ffff88802e191d38 R15: ffff88802e1917c0
FS:  00007f3eb0052700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000004787d0 CR3: 0000000029c0a000 CR4: 00000000001506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 __sys_connect_file+0x155/0x1a0 net/socket.c:1890
 __sys_connect+0x161/0x190 net/socket.c:1907
 __do_sys_connect net/socket.c:1917 [inline]
 __se_sys_connect net/socket.c:1914 [inline]
 __x64_sys_connect+0x6f/0xb0 net/socket.c:1914
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x446a89
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 a1 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f3eb0052208 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
RAX: ffffffffffffffda RBX: 00000000004cc4d8 RCX: 0000000000446a89
RDX: 000000000000006e RSI: 0000000020000180 RDI: 0000000000000003
RBP: 00000000004cc4d0 R08: 00007f3eb0052700 R09: 0000000000000000
R10: 00007f3eb0052700 R11: 0000000000000246 R12: 00000000004cc4dc
R13: 00007ffd791e79cf R14: 00007f3eb0052300 R15: 0000000000022000
Modules linked in:
---[ end trace 4eb809357514968c ]---
RIP: 0010:unix_dgram_connect+0x32a/0xc60 net/unix/af_unix.c:1226
Code: 00 00 45 31 ed 49 83 bc 24 f8 05 00 00 00 74 69 e8 eb 5b a6 f9 48 8d 7d 12 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 48 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 e0 07 00 00
RSP: 0018:ffffc9000a89fcd8 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: 0000000000000004 RCX: 0000000000000000
RDX: 0000000000000002 RSI: ffffffff87cf4ef5 RDI: 0000000000000012
RBP: 0000000000000000 R08: 0000000000000000 R09: ffff88802e1917c3
R10: ffffffff87cf4eba R11: 0000000000000001 R12: ffff88802e191740
R13: 0000000000000000 R14: ffff88802e191d38 R15: ffff88802e1917c0
FS:  00007f3eb0052700(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffd791fe960 CR3: 0000000029c0a000 CR4: 00000000001506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Fixes: 83301b5367a9 ("af_unix: Set TCP_ESTABLISHED for datagram sockets too")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Cong Wang <cong.wang@bytedance.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-31 11:31:52 +01:00
MichelleJin
6baeb3951c net: bridge: use mld2r_ngrec instead of icmpv6_dataun
br_ip6_multicast_mld2_report function uses icmp6h
to parse mld2_report packet.

mld2r_ngrec defines mld2r_hdr.icmp6_dataun.un_data16[1]
in include/net/mld.h.

So, it is more compact to use mld2r rather than icmp6h.

By doing printk test, it is confirmed that
icmp6h->icmp6_dataun.un_data16[1] and mld2r->mld2r_ngrec are
indeed equivalent.

Also, sizeof(*mld2r) and sizeof(*icmp6h) are equivalent, too.

Signed-off-by: MichelleJin <shjy180909@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-31 11:29:23 +01:00
Xiyu Yang
c660701258 net: sched: Fix qdisc_rate_table refcount leak when get tcf_block failed
The reference counting issue happens in one exception handling path of
cbq_change_class(). When failing to get tcf_block, the function forgets
to decrease the refcount of "rtab" increased by qdisc_put_rtab(),
causing a refcount leak.

Fix this issue by jumping to "failure" label when get tcf_block failed.

Fixes: 6529eaba33f0 ("net: sched: introduce tcf block infractructure")
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Link: https://lore.kernel.org/r/1630252681-71588-1-git-send-email-xiyuyang19@fudan.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-30 20:29:03 -07:00
Linus Torvalds
c547d89a9a for-5.15/io_uring-2021-08-30
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmEs7tIQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpoR2EACdOPj0tivXWufgFnQyQYPpX4/Qe3lNw608
 MLJ9/zshPFe5kx+SnHzT3UucHd2LO4C68pNVEHdHoi1gqMnMe9+Du82LJlo+Cx9i
 53yiWxNiY7er3t3lC2nvaPF0BPKiaUKaRzqugOTfdWmKLP3MyYyygEBrRDkaK1S1
 BjVq2ewmKTYf63gHkluHeRTav9KxcLWvSqgFUC8Y0mNhazTSdzGB6MAPFodpsuj1
 Vv8ytCiagp9Gi0AibvS2mZvV/WxQtFP7qBofbnG3KcgKgOU+XKTJCH+cU6E/3J9Q
 nPy1loh2TISOtYAz2scypAbwsK4FWeAHg1SaIj/RtUGKG7zpVU5u97CPZ8+UfHzu
 CuR3a1o36Cck8+ZdZIjtRZfvQGog0Dh5u4ZQ4dRwFQd6FiVxdO8LHkegqkV6PStc
 dVrHSo5kUE5hGT8ed1YFfuOSJDZ6w0/LleZGMdU4pRGGs8wqkerZlfLL4ustSfLk
 AcS+azmG2f3iI5iadnxOUNeOT2lmE84fvvLyH2krfsA3AtX0CtHXQcLYAguRAIwg
 gnvYf70JOya6Lb/hbXUi8d8h3uXxeFsoitz1QyasqAEoJoY05EW+l94NbiEzXwol
 mKrVfZCk+wuhw3npbVCqK7nkepvBWso1qTip//T0HDAaKaZnZcmhobUn5SwI+QPx
 fcgAo6iw8g==
 =WBzF
 -----END PGP SIGNATURE-----

Merge tag 'for-5.15/io_uring-2021-08-30' of git://git.kernel.dk/linux-block

Pull io_uring updates from Jens Axboe:

 - cancellation cleanups (Hao, Pavel)

 - io-wq accounting cleanup (Hao)

 - io_uring submit locking fix (Hao)

 - io_uring link handling fixes (Hao)

 - fixed file improvements (wangyangbo, Pavel)

 - allow updates of linked timeouts like regular timeouts (Pavel)

 - IOPOLL fix (Pavel)

 - remove batched file get optimization (Pavel)

 - improve reference handling (Pavel)

 - IRQ task_work batching (Pavel)

 - allow pure fixed file, and add support for open/accept (Pavel)

 - GFP_ATOMIC RT kernel fix

 - multiple CQ ring waiter improvement

 - funnel IRQ completions through task_work

 - add support for limiting async workers explicitly

 - add different clocksource support for timeouts

 - io-wq wakeup race fix

 - lots of cleanups and improvement (Pavel et al)

* tag 'for-5.15/io_uring-2021-08-30' of git://git.kernel.dk/linux-block: (87 commits)
  io-wq: fix wakeup race when adding new work
  io-wq: wqe and worker locks no longer need to be IRQ safe
  io-wq: check max_worker limits if a worker transitions bound state
  io_uring: allow updating linked timeouts
  io_uring: keep ltimeouts in a list
  io_uring: support CLOCK_BOOTTIME/REALTIME for timeouts
  io-wq: provide a way to limit max number of workers
  io_uring: add build check for buf_index overflows
  io_uring: clarify io_req_task_cancel() locking
  io_uring: add task-refs-get helper
  io_uring: fix failed linkchain code logic
  io_uring: remove redundant req_set_fail()
  io_uring: don't free request to slab
  io_uring: accept directly into fixed file table
  io_uring: hand code io_accept() fd installing
  io_uring: openat directly into fixed fd table
  net: add accept helper not installing fd
  io_uring: fix io_try_cancel_userdata race for iowq
  io_uring: IRQ rw completion batching
  io_uring: batch task work locking
  ...
2021-08-30 19:22:52 -07:00
Linus Torvalds
9a1d6c9e3f for-5.15/drivers-2021-08-30
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmEs6LsQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpqnLD/9c8v7WTLjrDR6FLD8fHUmkwk9ss6OeyYJC
 Z62QOyk6BqNOu6FAwBax9wFaboXdUqOdpJU0PVQ7WJ5wBiCQ9DAZY6T+iwW0jE79
 +iOSqdXHVLAIyIM9GplzLH5AH3tx4445bX7fRWwWX1OgmSidkAhb25FusCvpcpHx
 1k+9dSLClLeHPR6jVT3k6tHv2RzPSw+/vYOggeWYA0YYPfoCx/Ft0uwO+PjKpvLQ
 Je5jASlLGYCXazswJBZgfjbroA97EuaLOmHHIHrwhkkFsbV6ewv6mlmanbMEs4fX
 Wh+axTt8so27g6gbw31EOcGsxTi0B37Jx9MOrSla6NdJoZkFE2sn6K+D5k4oeSrg
 QgYXL00U62eSgWmgSB0f0X081cQfI+FUMe5u5S368WdrgCPfaXl11zHw8nXw8gEW
 UvqR4zr3hQd4piXsIWl2bwZrmpPBCeB8iStLq3C92RLPFT6hJO3GM/ZmwTn+0HT0
 lMXzoEdkPywkKWi8aBbSgzXiGknNl8HAYnwMhcQjiHbYQOycGkI9pigJDNY9Ox1l
 fYHFSompmJ/XK8cIiU7QIglXEXJky5jQ89Ni0ryCstOaP20tPxWtkpOCgidXfNGz
 4lmQV8D5aBTUFs6ifPjXfiXUmDiU3SaxiFhAqaEkGII9BbkrNhlibB4LBAU+toi1
 Q0yGhGR/mg==
 =4uWF
 -----END PGP SIGNATURE-----

Merge tag 'for-5.15/drivers-2021-08-30' of git://git.kernel.dk/linux-block

Pull block driver updates from Jens Axboe:
 "Sitting on top of the core block changes, here are the driver changes
  for the 5.15 merge window:

   - NVMe updates via Christoph:
       - suspend improvements for devices with an HMB (Keith Busch)
       - handle double completions more gacefull (Sagi Grimberg)
       - cleanup the selects for the nvme core code a bit (Sagi Grimberg)
       - don't update queue count when failing to set io queues (Ruozhu Li)
       - various nvmet connect fixes (Amit Engel)
       - cleanup lightnvm leftovers (Keith Busch, me)
       - small cleanups (Colin Ian King, Hou Pu)
       - add tracing for the Set Features command (Hou Pu)
       - CMB sysfs cleanups (Keith Busch)
       - add a mutex_destroy call (Keith Busch)

   - remove lightnvm subsystem. It's served its purpose and ultimately
     led to zoned nvme support, we no longer need it (Christoph)

   - revert floppy O_NDELAY fix (Denis)

   - nbd fixes (Hou, Pavel, Baokun)

   - nbd locking fixes (Tetsuo)

   - nbd device removal fixes (Christoph)

   - raid10 rcu warning fix (Xiao)

   - raid1 write behind fix (Guoqing)

   - rnbd fixes (Gioh, Md Haris)

   - misc fixes (Colin)"

* tag 'for-5.15/drivers-2021-08-30' of git://git.kernel.dk/linux-block: (42 commits)
  Revert "floppy: reintroduce O_NDELAY fix"
  raid1: ensure write behind bio has less than BIO_MAX_VECS sectors
  md/raid10: Remove unnecessary rcu_dereference in raid10_handle_discard
  nbd: remove nbd->destroy_complete
  nbd: only return usable devices from nbd_find_unused
  nbd: set nbd->index before releasing nbd_index_mutex
  nbd: prevent IDR lookups from finding partially initialized devices
  nbd: reset NBD to NULL when restarting in nbd_genl_connect
  nbd: add missing locking to the nbd_dev_add error path
  nvme: remove the unused NVME_NS_* enum
  nvme: remove nvm_ndev from ns
  nvme: Have NVME_FABRICS select NVME_CORE instead of transport drivers
  block: nbd: add sanity check for first_minor
  nvmet: check that host sqsize does not exceed ctrl MQES
  nvmet: avoid duplicate qid in connect cmd
  nvmet: pass back cntlid on successful completion
  nvme-rdma: don't update queue count when failing to set io queues
  nvme-tcp: don't update queue count when failing to set io queues
  nvme-tcp: pair send_mutex init with destroy
  nvme: allow user toggling hmb usage
  ...
2021-08-30 19:01:46 -07:00
Jakub Kicinski
19a31d7921 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
bpf-next 2021-08-31

We've added 116 non-merge commits during the last 17 day(s) which contain
a total of 126 files changed, 6813 insertions(+), 4027 deletions(-).

The main changes are:

1) Add opaque bpf_cookie to perf link which the program can read out again,
   to be used in libbpf-based USDT library, from Andrii Nakryiko.

2) Add bpf_task_pt_regs() helper to access userspace pt_regs, from Daniel Xu.

3) Add support for UNIX stream type sockets for BPF sockmap, from Jiang Wang.

4) Allow BPF TCP congestion control progs to call bpf_setsockopt() e.g. to switch
   to another congestion control algorithm during init, from Martin KaFai Lau.

5) Extend BPF iterator support for UNIX domain sockets, from Kuniyuki Iwashima.

6) Allow bpf_{set,get}sockopt() calls from setsockopt progs, from Prankur Gupta.

7) Add bpf_get_netns_cookie() helper for BPF_PROG_TYPE_{SOCK_OPS,CGROUP_SOCKOPT}
   progs, from Xu Liu and Stanislav Fomichev.

8) Support for __weak typed ksyms in libbpf, from Hao Luo.

9) Shrink struct cgroup_bpf by 504 bytes through refactoring, from Dave Marchevsky.

10) Fix a smatch complaint in verifier's narrow load handling, from Andrey Ignatov.

11) Fix BPF interpreter's tail call count limit, from Daniel Borkmann.

12) Big batch of improvements to BPF selftests, from Magnus Karlsson, Li Zhijian,
    Yucong Sun, Yonghong Song, Ilya Leoshkevich, Jussi Maki, Ilya Leoshkevich, others.

13) Another big batch to revamp XDP samples in order to give them consistent look
    and feel, from Kumar Kartikeya Dwivedi.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (116 commits)
  MAINTAINERS: Remove self from powerpc BPF JIT
  selftests/bpf: Fix potential unreleased lock
  samples: bpf: Fix uninitialized variable in xdp_redirect_cpu
  selftests/bpf: Reduce more flakyness in sockmap_listen
  bpf: Fix bpf-next builds without CONFIG_BPF_EVENTS
  bpf: selftests: Add dctcp fallback test
  bpf: selftests: Add connect_to_fd_opts to network_helpers
  bpf: selftests: Add sk_state to bpf_tcp_helpers.h
  bpf: tcp: Allow bpf-tcp-cc to call bpf_(get|set)sockopt
  selftests: xsk: Preface options with opt
  selftests: xsk: Make enums lower case
  selftests: xsk: Generate packets from specification
  selftests: xsk: Generate packet directly in umem
  selftests: xsk: Simplify cleanup of ifobjects
  selftests: xsk: Decrease sending speed
  selftests: xsk: Validate tx stats on tx thread
  selftests: xsk: Simplify packet validation in xsk tests
  selftests: xsk: Rename worker_* functions that are not thread entry points
  selftests: xsk: Disassociate umem size with packets sent
  selftests: xsk: Remove end-of-test packet
  ...
====================

Link: https://lore.kernel.org/r/20210830225618.11634-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-30 16:42:47 -07:00
Maxim Mikityanskiy
ca49bfd90a sch_htb: Fix inconsistency when leaf qdisc creation fails
In HTB offload mode, qdiscs of leaf classes are grafted to netdev
queues. sch_htb expects the dev_queue field of these qdiscs to point to
the corresponding queues. However, qdisc creation may fail, and in that
case noop_qdisc is used instead. Its dev_queue doesn't point to the
right queue, so sch_htb can lose track of used netdev queues, which will
cause internal inconsistencies.

This commit fixes this bug by keeping track of the netdev queue inside
struct htb_class. All reads of cl->leaf.q->dev_queue are replaced by the
new field, the two values are synced on writes, and WARNs are added to
assert equality of the two values.

The driver API has changed: when TC_HTB_LEAF_DEL needs to move a queue,
the driver used to pass the old and new queue IDs to sch_htb. Now that
there is a new field (offload_queue) in struct htb_class that needs to
be updated on this operation, the driver will pass the old class ID to
sch_htb instead (it already knows the new class ID).

Fixes: d03b195b5aa0 ("sch_htb: Hierarchical QoS hardware offload")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20210826115425.1744053-1-maximmi@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-30 16:33:59 -07:00
Yajun Deng
1b9fbe8130 net: ipv4: Fix the warning for dereference
Add a if statements to avoid the warning.

Dan Carpenter report:
The patch faf482ca196a: "net: ipv4: Move ip_options_fragment() out of
loop" from Aug 23, 2021, leads to the following Smatch complaint:

    net/ipv4/ip_output.c:833 ip_do_fragment()
    warn: variable dereferenced before check 'iter.frag' (see line 828)

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: faf482ca196a ("net: ipv4: Move ip_options_fragment() out of loop")
Link: https://lore.kernel.org/netdev/20210830073802.GR7722@kadam/T/#t
Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-30 12:47:09 +01:00
Dan Carpenter
aaa8e4922c net: qrtr: make checks in qrtr_endpoint_post() stricter
These checks are still not strict enough.  The main problem is that if
"cb->type == QRTR_TYPE_NEW_SERVER" is true then "len - hdrlen" is
guaranteed to be 4 but we need to be at least 16 bytes.  In fact, we
can reject everything smaller than sizeof(*pkt) which is 20 bytes.

Also I don't like the ALIGN(size, 4).  It's better to just insist that
data is needs to be aligned at the start.

Fixes: 0baa99ee353c ("net: qrtr: Allow non-immediate node routing")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-30 12:26:57 +01:00
Haimin Zhang
efe487fce3 fix array-index-out-of-bounds in taprio_change
syzbot report an array-index-out-of-bounds in taprio_change
index 16 is out of range for type '__u16 [16]'
that's because mqprio->num_tc is lager than TC_MAX_QUEUE,so we check
the return value of netdev_set_num_tc.

Reported-by: syzbot+2b3e5fb6c7ef285a94f6@syzkaller.appspotmail.com
Signed-off-by: Haimin Zhang <tcs_kernel@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-30 12:24:40 +01:00
王贇
e842cb60e8 net: fix NULL pointer reference in cipso_v4_doi_free
In netlbl_cipsov4_add_std() when 'doi_def->map.std' alloc
failed, we sometime observe panic:

  BUG: kernel NULL pointer dereference, address:
  ...
  RIP: 0010:cipso_v4_doi_free+0x3a/0x80
  ...
  Call Trace:
   netlbl_cipsov4_add_std+0xf4/0x8c0
   netlbl_cipsov4_add+0x13f/0x1b0
   genl_family_rcv_msg_doit.isra.15+0x132/0x170
   genl_rcv_msg+0x125/0x240

This is because in cipso_v4_doi_free() there is no check
on 'doi_def->map.std' when doi_def->type got value 1, which
is possibe, since netlbl_cipsov4_add_std() haven't initialize
it before alloc 'doi_def->map.std'.

This patch just add the check to prevent panic happen in similar
cases.

Reported-by: Abaci <abaci@linux.alibaba.com>
Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-30 12:23:18 +01:00
Eric Dumazet
67d6d681e1 ipv4: make exception cache less predictible
Even after commit 6457378fe796 ("ipv4: use siphash instead of Jenkins in
fnhe_hashfun()"), an attacker can still use brute force to learn
some secrets from a victim linux host.

One way to defeat these attacks is to make the max depth of the hash
table bucket a random value.

Before this patch, each bucket of the hash table used to store exceptions
could contain 6 items under attack.

After the patch, each bucket would contains a random number of items,
between 6 and 10. The attacker can no longer infer secrets.

This is slightly increasing memory size used by the hash table,
by 50% in average, we do not expect this to be a problem.

This patch is more complex than the prior one (IPv6 equivalent),
because IPv4 was reusing the oldest entry.
Since we need to be able to evict more than one entry per
update_or_create_fnhe() call, I had to replace
fnhe_oldest() with fnhe_remove_oldest().

Also note that we will queue extra kfree_rcu() calls under stress,
which hopefully wont be a too big issue.

Fixes: 4895c771c7f0 ("ipv4: Add FIB nexthop exceptions.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Keyu Man <kman001@ucr.edu>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reviewed-by: David Ahern <dsahern@kernel.org>
Tested-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-30 12:21:38 +01:00
Eric Dumazet
a00df2caff ipv6: make exception cache less predictible
Even after commit 4785305c05b2 ("ipv6: use siphash in rt6_exception_hash()"),
an attacker can still use brute force to learn some secrets from a victim
linux host.

One way to defeat these attacks is to make the max depth of the hash
table bucket a random value.

Before this patch, each bucket of the hash table used to store exceptions
could contain 6 items under attack.

After the patch, each bucket would contains a random number of items,
between 6 and 10. The attacker can no longer infer secrets.

This is slightly increasing memory size used by the hash table,
we do not expect this to be a problem.

Following patch is dealing with the same issue in IPv4.

Fixes: 35732d01fe31 ("ipv6: introduce a hash table to store dst cache")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Keyu Man <kman001@ucr.edu>
Cc: Wei Wang <weiwan@google.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-30 12:21:38 +01:00
David S. Miller
9dfa859da0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for net-next:

1) Clean up and consolidate ct ecache infrastructure by merging ct and
   expect notifiers, from Florian Westphal.

2) Missing counters and timestamp in nfnetlink_queue and _log conntrack
   information.

3) Missing error check for xt_register_template() in iptables mangle,
   as a incremental fix for the previous pull request, also from
   Florian Westphal.

4) Add netfilter hooks for the SRv6 lightweigh tunnel driver, from
   Ryoga Sato. The hooks are enabled via nf_hooks_lwtunnel sysctl
   to make sure existing netfilter rulesets do not break. There is
   a static key to disable the hooks by default.

   The pktgen_bench_xmit_mode_netif_receive.sh shows no noticeable
   impact in the seg6_input path for non-netfilter users: similar
   numbers with and without this patch.

   This is a sample of the perf report output:

    11.67%  kpktgend_0       [ipv6]                    [k] ipv6_get_saddr_eval
     7.89%  kpktgend_0       [ipv6]                    [k] __ipv6_addr_label
     7.52%  kpktgend_0       [ipv6]                    [k] __ipv6_dev_get_saddr
     6.63%  kpktgend_0       [kernel.vmlinux]          [k] asm_exc_nmi
     4.74%  kpktgend_0       [ipv6]                    [k] fib6_node_lookup_1
     3.48%  kpktgend_0       [kernel.vmlinux]          [k] pskb_expand_head
     3.33%  kpktgend_0       [ipv6]                    [k] ip6_rcv_core.isra.29
     3.33%  kpktgend_0       [ipv6]                    [k] seg6_do_srh_encap
     2.53%  kpktgend_0       [ipv6]                    [k] ipv6_dev_get_saddr
     2.45%  kpktgend_0       [ipv6]                    [k] fib6_table_lookup
     2.24%  kpktgend_0       [kernel.vmlinux]          [k] ___cache_free
     2.16%  kpktgend_0       [ipv6]                    [k] ip6_pol_route
     2.11%  kpktgend_0       [kernel.vmlinux]          [k] __ipv6_addr_type
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-30 10:57:54 +01:00
Florian Westphal
d7e7747ac5 netfilter: refuse insertion if chain has grown too large
Also add a stat counter for this that gets exported both via old /proc
interface and ctnetlink.

Assuming the old default size of 16536 buckets and max hash occupancy of
64k, this results in 128k insertions (origin+reply), so ~8 entries per
chain on average.

The revised settings in this series will result in about two entries per
bucket on average.

This allows a hard-limit ceiling of 64.

This is not tunable at the moment, but its possible to either increase
nf_conntrack_buckets or decrease nf_conntrack_max to reduce average
lengths.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-08-30 11:52:21 +02:00
Florian Westphal
dd6d2910c5 netfilter: conntrack: switch to siphash
Replace jhash in conntrack and nat core with siphash.

While at it, use the netns mix value as part of the input key
rather than abuse the seed value.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-08-30 11:49:55 +02:00
Florian Westphal
d532bcd0b2 netfilter: conntrack: sanitize table size default settings
conntrack has two distinct table size settings:
nf_conntrack_max and nf_conntrack_buckets.

The former limits how many conntrack objects are allowed to exist
in each namespace.

The second sets the size of the hashtable.

As all entries are inserted twice (once for original direction, once for
reply), there should be at least twice as many buckets in the table than
the maximum number of conntrack objects that can exist at the same time.

Change the default multiplier to 1 and increase the chosen bucket sizes.
This results in the same nf_conntrack_max settings as before but reduces
the average bucket list length.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-08-30 11:49:54 +02:00
Ryoga Saito
7a3f5b0de3 netfilter: add netfilter hooks to SRv6 data plane
This patch introduces netfilter hooks for solving the problem that
conntrack couldn't record both inner flows and outer flows.

This patch also introduces a new sysctl toggle for enabling lightweight
tunnel netfilter hooks.

Signed-off-by: Ryoga Saito <contact@proelbtn.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-08-30 01:51:36 +02:00
Rocco Yue
49b99da2c9 ipv6: add IFLA_INET6_RA_MTU to expose mtu value
The kernel provides a "/proc/sys/net/ipv6/conf/<iface>/mtu"
file, which can temporarily record the mtu value of the last
received RA message when the RA mtu value is lower than the
interface mtu, but this proc has following limitations:

(1) when the interface mtu (/sys/class/net/<iface>/mtu) is
updeated, mtu6 (/proc/sys/net/ipv6/conf/<iface>/mtu) will
be updated to the value of interface mtu;
(2) mtu6 (/proc/sys/net/ipv6/conf/<iface>/mtu) only affect
ipv6 connection, and not affect ipv4.

Therefore, when the mtu option is carried in the RA message,
there will be a problem that the user sometimes cannot obtain
RA mtu value correctly by reading mtu6.

After this patch set, if a RA message carries the mtu option,
you can send a netlink msg which nlmsg_type is RTM_GETLINK,
and then by parsing the attribute of IFLA_INET6_RA_MTU to
get the mtu value carried in the RA message received on the
inet6 device. In addition, you can also get a link notification
when ra_mtu is updated so it doesn't have to poll.

In this way, if the MTU values that the device receives from
the network in the PCO IPv4 and the RA IPv6 procedures are
different, the user can obtain the correct ipv6 ra_mtu value
and compare the value of ra_mtu and ipv4 mtu, then the device
can use the lower MTU value for both IPv4 and IPv6.

Signed-off-by: Rocco Yue <rocco.yue@mediatek.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20210827150412.9267-1-rocco.yue@mediatek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-27 17:29:18 -07:00
Olga Kornievskaia
dc48e0abee SUNRPC enforce creation of no more than max_connect xprts
If we are adding new transports via rpc_clnt_test_and_add_xprt()
then check if we've reached the limit. Currently only pnfs path
adds transports via that function but this is done in
preparation when the client would add new transports when
session trunking is detected. A warning is logged if the
limit is reached.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-08-27 16:37:29 -04:00
Olga Kornievskaia
df205d0a8e SUNRPC add xps_nunique_destaddr_xprts to xprt_switch_info in sysfs
In sysfs's xprt_switch_info attribute also display the value of
number of transports with unique destination addresses for this
xprt_switch.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-08-27 16:37:03 -04:00
Olga Kornievskaia
3a3f976639 SUNRPC keep track of number of transports to unique addresses
Currently, xprt_switch keeps a number of all xprts (xps_nxprts)
that were added to the switch regardless of whethere it's an
nconnect transport or a transport to a trunkable address.
Introduce a new counter to keep track of transports to unique
destination addresses per xprt_switch.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-08-27 16:36:53 -04:00
Trond Myklebust
7c81e6a9d7 SUNRPC: Tweak TCP socket shutdown in the RPC client
We only really need to call shutdown() if we're in the ESTABLISHED TCP
state, since that is the only case where the client is initiating a
close of an established connection.

If the socket is in FIN_WAIT1 or FIN_WAIT2, then we've already initiated
socket shutdown and are waiting for the server's reply, so do nothing.

In all other cases where we've already received a FIN from the server,
we should be able to just close the socket.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-08-27 16:36:21 -04:00
Trond Myklebust
0a6ff58edb SUNRPC: Simplify socket shutdown when not reusing TCP ports
If we're not required to reuse the TCP port, then we can just
immediately close the socket, and leave the cleanup details to the TCP
layer.

Fixes: e6237b6feb37 ("NFSv4.1: Don't rebind to the same source port when reconnecting to the server")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-08-27 16:36:21 -04:00
David S. Miller
fe50893aa8 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/
ipsec-next

Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2021-08-27

1) Remove an unneeded extra variable in esp4 esp_ssg_unref.
   From Corey Minyard.

2) Add a configuration option to change the default behaviour
   to block traffic if there is no matching policy.
   Joint work with Christian Langrock and Antony Antony.

3) Fix a shift-out-of-bounce bug reported from syzbot.
   From Pavel Skripkin.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-27 11:16:29 +01:00
Paolo Abeni
9758f40e90 mptcp: make the locking tx schema more readable
Florian noted the locking schema used by __mptcp_push_pending()
is hard to follow, let's add some more descriptive comments
and drop an unneeded and confusing check.

Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-27 09:45:07 +01:00
Paolo Abeni
f6c2ef59bc mptcp: optimize the input options processing
Most MPTCP packets carries a single MPTCP subption: the
DSS containing the mapping for the current packet.

Check explicitly for the above, so that is such scenario we
replace most conditional statements with a single likely() one.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-27 09:45:07 +01:00
Paolo Abeni
74c7dfbee3 mptcp: consolidate in_opt sub-options fields in a bitmask
This makes input options processing more consistent with
output ones and will simplify the next patch.

Also avoid clearing the suboption field after processing
it, since it's not needed.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-27 09:45:07 +01:00
Paolo Abeni
a086aebae0 mptcp: better binary layout for mptcp_options_received
This change reorder the mptcp_options_received fields
to shrink the structure a bit and to ensure the most
frequently used fields are all in the first cacheline.

Sub-opt specific flags are moved out of the suboptions area,
and we must now explicitly set them when the relevant
suboption is parsed.

There is a notable exception: 'csum_reqd' is used by both DSS
and MPC suboptions, and keeping such field in the suboptions
flag area will simplfy the next patch.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-27 09:45:07 +01:00
Paolo Abeni
8d548ea1dd mptcp: do not set unconditionally csum_reqd on incoming opt
Should be set only if the ingress packets present it, otherwise
we can confuse csum validation.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-27 09:45:07 +01:00
Peter Collingbourne
d0efb16294 net: don't unconditionally copy_from_user a struct ifreq for socket ioctls
A common implementation of isatty(3) involves calling a ioctl passing
a dummy struct argument and checking whether the syscall failed --
bionic and glibc use TCGETS (passing a struct termios), and musl uses
TIOCGWINSZ (passing a struct winsize). If the FD is a socket, we will
copy sizeof(struct ifreq) bytes of data from the argument and return
-EFAULT if that fails. The result is that the isatty implementations
may return a non-POSIX-compliant value in errno in the case where part
of the dummy struct argument is inaccessible, as both struct termios
and struct winsize are smaller than struct ifreq (at least on arm64).

Although there is usually enough stack space following the argument
on the stack that this did not present a practical problem up to now,
with MTE stack instrumentation it's more likely for the copy to fail,
as the memory following the struct may have a different tag.

Fix the problem by adding an early check for whether the ioctl is a
valid socket ioctl, and return -ENOTTY if it isn't.

Fixes: 44c02a2c3dc5 ("dev_ioctl(): move copyin/copyout to callers")
Link: https://linux-review.googlesource.com/id/I869da6cf6daabc3e4b7b82ac979683ba05e27d4d
Signed-off-by: Peter Collingbourne <pcc@google.com>
Cc: <stable@vger.kernel.org> # 4.19
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-27 09:40:25 +01:00
Neil Spring
3aa7857fe1 tcp: enable mid stream window clamp
The TCP_WINDOW_CLAMP socket option is defined in tcp(7) to "Bound the size
of the advertised window to this value."  Window clamping is distributed
across two variables, window_clamp ("Maximal window to advertise" in
tcp.h) and rcv_ssthresh ("Current window clamp").

This patch updates the function where the window clamp is set to also
reduce the current window clamp, rcv_sshthresh, if needed.  With this,
setting the TCP_WINDOW_CLAMP option has the documented effect of limiting
the window.

Signed-off-by: Neil Spring <ntspring@fb.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20210825210117.1668371-1-ntspring@fb.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-26 18:00:40 -07:00
Jakub Kicinski
97c78d0af5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/net/wwan/mhi_wwan_mbim.c - drop the extra arg.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-26 17:57:57 -07:00
Linus Torvalds
73367f05b2 This is a one-liner fix for a serious bug that can cause the server to
become unresponsive to a client, so I think it's worth the last-minute
 inclusion for 5.14.
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCAAzFiEEYtFWavXG9hZotryuJ5vNeUKO4b4FAmEn5MwVHGJmaWVsZHNA
 ZmllbGRzZXMub3JnAAoJECebzXlCjuG+DoIP+QHYnLK5fVN8TcBV/I3RbhEGafMv
 XSym9RtpLVAlfhrM6eBxiQqq2eHzjKADwatE2orDVD7w7rPKa19xvF8+LoYvtGnm
 cs3j49DlncWjoO1zO36QteO9M9FHxYM85PFX1kM74ZwBuLNTvZecIdHhuIg9WrnN
 GDhQHwUMXxXFZJyWARBar/XLGMUDkCl1CEj1QgKePvbwY/ucmtTLgbwaSooRrNog
 9g7ac5s5ZHEgM2oniS+WZ1C+18azOOoo9blP8bzEAM4mE91uq/k1jYGkLcnZIyK/
 CUO1CF7G26yW+4Q+k/OSPrN2cNnA9Uvg3z7hld/yVoAQKHMQ0ndt3SUWny3pH/xi
 y7f62jRHeLnhpTwX5psOTnL24XyuPHyoSriop1xfym9Tbrhsskc+BtVpY2TltggG
 rbdyYH7BjQoMMqyXla/tUFA7Iso5W06qdSqapLbquu8/XPaMFs7R147GMsFcmA8D
 NdCbIwMeE/1YnE2qx0XqwXfxEkK/prLnabPtDhSiQ1flAYxHwBhTw3teHSD0Ohy/
 BbwY4eHRBnY8q22b1dJp1PNHWVSCuPtXX7QHIeQeYTBvSu/dwahq1xo6LtYoQExl
 Fhmer7Jvgh1+X8OMYOHYmbKcHXXSi1esL4+XB55d/m4vZTCxoeMonUP6bdcnzqkp
 aOmgygKJSyAzsXA8
 =NtJD
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-5.14-1' of git://linux-nfs.org/~bfields/linux

Pull nfsd fix from Bruce Fields:
 "This is a one-liner fix for a serious bug that can cause the server to
  become unresponsive to a client, so I think it's worth the last-minute
  inclusion for 5.14"

* tag 'nfsd-5.14-1' of git://linux-nfs.org/~bfields/linux:
  SUNRPC: Fix XPT_BUSY flag leakage in svc_handle_xprt()...
2021-08-26 13:26:40 -07:00
Kalle Valo
9ebc2758d0 Revert "net: really fix the build..."
This reverts commit ce78ffa3ef1681065ba451cfd545da6126f5ca88.

Wren and Nicolas reported that ath11k was failing to initialise QCA6390
Wi-Fi 6 device with error:

qcom_mhi_qrtr: probe of mhi0_IPCR failed with error -22

Commit ce78ffa3ef16 ("net: really fix the build..."), introduced in
v5.14-rc5, caused this regression in qrtr. Most likely all ath11k
devices are broken, but I only tested QCA6390. Let's revert the broken
commit so that ath11k works again.

Reported-by: Wren Turkal <wt@penguintechs.org>
Reported-by: Nicolas Schichan <nschichan@freebox.fr>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20210826172816.24478-1-kvalo@codeaurora.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-26 11:08:32 -07:00
王贇
733c99ee8b net: fix NULL pointer reference in cipso_v4_doi_free
In netlbl_cipsov4_add_std() when 'doi_def->map.std' alloc
failed, we sometime observe panic:

  BUG: kernel NULL pointer dereference, address:
  ...
  RIP: 0010:cipso_v4_doi_free+0x3a/0x80
  ...
  Call Trace:
   netlbl_cipsov4_add_std+0xf4/0x8c0
   netlbl_cipsov4_add+0x13f/0x1b0
   genl_family_rcv_msg_doit.isra.15+0x132/0x170
   genl_rcv_msg+0x125/0x240

This is because in cipso_v4_doi_free() there is no check
on 'doi_def->map.std' when 'doi_def->type' equal 1, which
is possibe, since netlbl_cipsov4_add_std() haven't initialize
it before alloc 'doi_def->map.std'.

This patch just add the check to prevent panic happen for similar
cases.

Reported-by: Abaci <abaci@linux.alibaba.com>
Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-26 12:20:47 +01:00
Andrey Ignatov
96a6b93b69 rtnetlink: Return correct error on changing device netns
Currently when device is moved between network namespaces using
RTM_NEWLINK message type and one of netns attributes (FLA_NET_NS_PID,
IFLA_NET_NS_FD, IFLA_TARGET_NETNSID) but w/o specifying IFLA_IFNAME, and
target namespace already has device with same name, userspace will get
EINVAL what is confusing and makes debugging harder.

Fix it so that userspace gets more appropriate EEXIST instead what makes
debugging much easier.

Before:

  # ./ifname.sh
  + ip netns add ns0
  + ip netns exec ns0 ip link add l0 type dummy
  + ip netns exec ns0 ip link show l0
  8: l0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
      link/ether 66:90:b5:d5:78:69 brd ff:ff:ff:ff:ff:ff
  + ip link add l0 type dummy
  + ip link show l0
  10: l0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
      link/ether 6e:c6:1f:15:20:8d brd ff:ff:ff:ff:ff:ff
  + ip link set l0 netns ns0
  RTNETLINK answers: Invalid argument

After:

  # ./ifname.sh
  + ip netns add ns0
  + ip netns exec ns0 ip link add l0 type dummy
  + ip netns exec ns0 ip link show l0
  8: l0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
      link/ether 1e:4a:72:e3:e3:8f brd ff:ff:ff:ff:ff:ff
  + ip link add l0 type dummy
  + ip link show l0
  10: l0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
      link/ether f2:fc:fe:2b:7d:a6 brd ff:ff:ff:ff:ff:ff
  + ip link set l0 netns ns0
  RTNETLINK answers: File exists

The problem is that do_setlink() passes its `char *ifname` argument,
that it gets from a caller, to __dev_change_net_namespace() as is (as
`const char *pat`), but semantics of ifname and pat can be different.

For example, __rtnl_newlink() does this:

net/core/rtnetlink.c
    3270	char ifname[IFNAMSIZ];
     ...
    3286	if (tb[IFLA_IFNAME])
    3287		nla_strscpy(ifname, tb[IFLA_IFNAME], IFNAMSIZ);
    3288	else
    3289		ifname[0] = '\0';
     ...
    3364	if (dev) {
     ...
    3394		return do_setlink(skb, dev, ifm, extack, tb, ifname, status);
    3395	}

, i.e. do_setlink() gets ifname pointer that is always valid no matter
if user specified IFLA_IFNAME or not and then do_setlink() passes this
ifname pointer as is to __dev_change_net_namespace() as pat argument.

But the pat (pattern) in __dev_change_net_namespace() is used as:

net/core/dev.c
   11198	err = -EEXIST;
   11199	if (__dev_get_by_name(net, dev->name)) {
   11200		/* We get here if we can't use the current device name */
   11201		if (!pat)
   11202			goto out;
   11203		err = dev_get_valid_name(net, dev, pat);
   11204		if (err < 0)
   11205			goto out;
   11206	}

As the result the `goto out` path on line 11202 is neven taken and
instead of returning EEXIST defined on line 11198,
__dev_change_net_namespace() returns an error from dev_get_valid_name()
and this, in turn, will be EINVAL for ifname[0] = '\0' set earlier.

Fixes: d8a5ec672768 ("[NET]: netlink support for moving devices between network namespaces.")
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-26 12:08:08 +01:00
David S. Miller
8b325d2a09 A few more things:
* Use correct DFS domain for self-managed devices
  * some preparations for transmit power element handling
    and other 6 GHz regulatory handling
  * TWT support in AP mode in mac80211
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAmEnWbwACgkQB8qZga/f
 l8QfOQ/+LwZsbYDxvbluoQPBviIPey1s5PqLyVpCDA1XZI0G8G9FmZ33J1Ao3b4A
 /MCFB05rL7Pv8h9Rpx5Nd6ZdMrq4+rF0qYHJrNQnYlfxeb/z0CJCZGDaTQOcSDLc
 HPTQRU2hd5+ZInxgnefbD84nJ8Bpdd6cRTKS06xPxY2+1k6dPSHQ/OjLdmur5IlN
 gLSfyiPY6ryVWbNanzbIlKcisw6RxuagFjqzbay/JVycKNx0x2hWnoF7Ad/AHdkK
 P7l60CiMyTyibU7VfQ1NnGm3WK55Df32fDVVKtflFRi3fY08p8zRNKZdD4XD+4KD
 ptqwd173qiV+WjZaEr6YoHFirCcheFibc8/Z3iYS6LwD+w1isMoZPa8rRFoXnPkS
 xMlJAvAgcMBHUBOfeS8Ymo0qZvIvo36933S81g64Bl5IPbhfin2Vybs8jT0xLEPP
 LMDqWv+jcQ5ONc+RAjQ23c8I3mTdDPTZ+F8lzXkzhTBdcfWkXQWxMbLN/CU5RX8t
 hGTtTTlHC1cqgSQLT42WrPPDgbxzAqAcLTTvRwnu0TLvJcTc6NDcfoBXkXjiLOij
 g+GiPOXG36qy+cKTwzvSASkKqP71nQOdO74VHyftzMGHqvaEz2NzSVPNWk/3Nwpo
 7MuQvfE0cQAvTdmt9fTlelm61CyKLl16aKV1IjA6/Ob9Nr/haUw=
 =2Ucp
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-next-for-net-next-2021-08-26' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
A few more things:
 * Use correct DFS domain for self-managed devices
 * some preparations for transmit power element handling
   and other 6 GHz regulatory handling
 * TWT support in AP mode in mac80211
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-26 10:47:43 +01:00
Yunsheng Lin
723783d077 sock: remove one redundant SKB_FRAG_PAGE_ORDER macro
Both SKB_FRAG_PAGE_ORDER are defined to the same value in
net/core/sock.c and drivers/vhost/net.c.

Move the SKB_FRAG_PAGE_ORDER definition to net/core/sock.h,
as both net/core/sock.c and drivers/vhost/net.c include it,
and it seems a reasonable file to put the macro.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-26 10:46:20 +01:00
Eric Dumazet
6457378fe7 ipv4: use siphash instead of Jenkins in fnhe_hashfun()
A group of security researchers brought to our attention
the weakness of hash function used in fnhe_hashfun().

Lets use siphash instead of Jenkins Hash, to considerably
reduce security risks.

Also remove the inline keyword, this really is distracting.

Fixes: d546c621542d ("ipv4: harden fnhe_hashfun()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Keyu Man <kman001@ucr.edu>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-26 10:20:34 +01:00
Eric Dumazet
4785305c05 ipv6: use siphash in rt6_exception_hash()
A group of security researchers brought to our attention
the weakness of hash function used in rt6_exception_hash()

Lets use siphash instead of Jenkins Hash, to considerably
reduce security risks.

Following patch deals with IPv4.

Fixes: 35732d01fe31 ("ipv6: introduce a hash table to store dst cache")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Keyu Man <kman001@ucr.edu>
Cc: Wei Wang <weiwan@google.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Acked-by: Wei Wang <weiwan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-26 10:20:34 +01:00
Sriram R
90bd5bee50 cfg80211: use wiphy DFS domain if it is self-managed
Currently during CAC start or other radar events, the DFS
domain is fetched from cfg based on global DFS domain,
even if the wiphy regdomain disagrees.

But this could be different in case of self managed wiphy's
in case the self managed driver updates its database or supports
regions which has DFS domain set to UNSET in cfg80211 local
regdomain.

So for explicitly self-managed wiphys, just use their DFS
domain.

Signed-off-by: Sriram R <srirrama@codeaurora.org>
Link: https://lore.kernel.org/r/1629934730-16388-1-git-send-email-srirrama@codeaurora.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-08-26 11:04:55 +02:00
Wen Gong
b0345850ad mac80211: parse transmit power envelope element
Parse and store the transmit power envelope element.

Signed-off-by: Wen Gong <wgong@codeaurora.org>
Link: https://lore.kernel.org/r/20210820122041.12157-8-wgong@codeaurora.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-08-26 10:18:56 +02:00
Martin KaFai Lau
eb18b49ea7 bpf: tcp: Allow bpf-tcp-cc to call bpf_(get|set)sockopt
This patch allows the bpf-tcp-cc to call bpf_setsockopt.  One use
case is to allow a bpf-tcp-cc switching to another cc during init().
For example, when the tcp flow is not ecn ready, the bpf_dctcp
can switch to another cc by calling setsockopt(TCP_CONGESTION).

During setsockopt(TCP_CONGESTION), the new tcp-cc's init() will be
called and this could cause a recursion but it is stopped by the
current trampoline's logic (in the prog->active counter).

While retiring a bpf-tcp-cc (e.g. in tcp_v[46]_destroy_sock()),
the tcp stack calls bpf-tcp-cc's release().  To avoid the retiring
bpf-tcp-cc making further changes to the sk, bpf_setsockopt is not
available to the bpf-tcp-cc's release().  This will avoid release()
making setsockopt() call that will potentially allocate new resources.

Although the bpf-tcp-cc already has a more powerful way to read tcp_sock
from the PTR_TO_BTF_ID, it is usually expected that bpf_getsockopt and
bpf_setsockopt are available together.  Thus, bpf_getsockopt() is also
added to all tcp_congestion_ops except release().

When the old bpf-tcp-cc is calling setsockopt(TCP_CONGESTION)
to switch to a new cc, the old bpf-tcp-cc will be released by
bpf_struct_ops_put().  Thus, this patch also puts the bpf_struct_ops_map
after a rcu grace period because the trampoline's image cannot be freed
while the old bpf-tcp-cc is still running.

bpf-tcp-cc can only access icsk_ca_priv as SCALAR.  All kernel's
tcp-cc is also accessing the icsk_ca_priv as SCALAR.   The size
of icsk_ca_priv has already been raised a few times to avoid
extra kmalloc and memory referencing.  The only exception is the
kernel's tcp_cdg.c that stores a kmalloc()-ed pointer in icsk_ca_priv.
To avoid the old bpf-tcp-cc accidentally overriding this tcp_cdg's pointer
value stored in icsk_ca_priv after switching and without over-complicating
the bpf's verifier for this one exception in tcp_cdg, this patch does not
allow switching to tcp_cdg.  If there is a need, bpf_tcp_cdg can be
implemented and then use the bpf_sk_storage as the extended storage.

bpf_sk_setsockopt proto has only been recently added and used
in bpf-sockopt and bpf-iter-tcp, so impose the tcp_cdg limitation in the
same proto instead of adding a new proto specifically for bpf-tcp-cc.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210824173007.3976921-1-kafai@fb.com
2021-08-25 17:40:35 -07:00
Trond Myklebust
062b829c52 SUNRPC: Fix XPT_BUSY flag leakage in svc_handle_xprt()...
If the attempt to reserve a slot fails, we currently leak the XPT_BUSY
flag on the socket. Among other things, this make it impossible to close
the socket.

Fixes: 82011c80b3ec ("SUNRPC: Move svc_xprt_received() call sites")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2021-08-25 16:58:09 -04:00
Pavel Begunkov
d32f89da7f net: add accept helper not installing fd
Introduce and reuse a helper that acts similarly to __sys_accept4_file()
but returns struct file instead of installing file descriptor. Will be
used by io_uring.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: David S. Miller <davem@davemloft.net>
Link: https://lore.kernel.org/r/c57b9e8e818d93683a3d24f8ca50ca038d1da8c4.1629888991.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-25 06:36:56 -06:00
Lukas Bulwahn
7bc416f147 netfilter: x_tables: handle xt_register_template() returning an error value
Commit fdacd57c79b7 ("netfilter: x_tables: never register tables by
default") introduces the function xt_register_template(), and in one case,
a call to that function was missing the error-case handling.

Handle when xt_register_template() returns an error value.

This was identified with the clang-analyzer's Dead-Store analysis.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-08-25 13:06:48 +02:00
Pablo Neira Ayuso
6c89dac5b9 netfilter: ctnetlink: missing counters and timestamp in nfnetlink_{log,queue}
Add counters and timestamps (if available) to the conntrack object
that is represented in nfnetlink_log and _queue messages.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-08-25 13:06:48 +02:00