70362 Commits

Author SHA1 Message Date
Florian Westphal
2024439bd5 netfilter: nf_tables: fix scheduling-while-atomic splat
nf_tables_check_loops() can be called from rhashtable list
walk so cond_resched() cannot be used here.

Fixes: 81ea01066741 ("netfilter: nf_tables: add rescheduling points during loop detection walks")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-11 17:57:28 +02:00
Florian Westphal
976bf59c69 netfilter: nf_ct_irc: cap packet search space to 4k
This uses a pseudo-linearization scheme with a 64k global buffer,
but BIG TCP arrival means IPv6 TCP stack can generate skbs
that exceed this size.

In practice, IRC commands are not expected to exceed 512 bytes, plus
this is interactive protocol, so we should not see large packets
in practice.

Given most IRC connections nowadays use TLS so this helper could also be
removed in the near future.

Fixes: 7c4e983c4f3c ("net: allow gso_max_size to exceed 65536")
Fixes: 0fe79f28bfaf ("net: allow gro_max_size to exceed 65536")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-11 16:51:13 +02:00
Florian Westphal
c783a29c7e netfilter: nf_ct_ftp: prefer skb_linearize
This uses a pseudo-linearization scheme with a 64k global buffer,
but BIG TCP arrival means IPv6 TCP stack can generate skbs
that exceed this size.

Use skb_linearize.  It should be possible to rewrite this to properly
deal with segmented skbs (i.e., only do small chunk-wise accesses),
but this is going to be a lot more intrusive than this because every
helper function needs to get the sk_buff instead of a pointer to a raw
data buffer.

In practice, provided we're really looking at FTP control channel packets,
there should never be a case where we deal with huge packets.

Fixes: 7c4e983c4f3c ("net: allow gso_max_size to exceed 65536")
Fixes: 0fe79f28bfaf ("net: allow gro_max_size to exceed 65536")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-11 16:51:01 +02:00
Florian Westphal
f3e124c36f netfilter: nf_ct_h323: cap packet size at 64k
With BIG TCP, packets generated by tcp stack may exceed 64kb.
Cap datalen at 64kb.  The internal message format uses 16bit fields,
so no embedded message can exceed 64k size.

Multiple h323 messages in a single superpacket may now result
in a message to get treated as incomplete/truncated, but thats
better than scribbling past h323_buffer.

Another alternative suitable for net tree would be a switch to
skb_linearize().

Fixes: 7c4e983c4f3c ("net: allow gso_max_size to exceed 65536")
Fixes: 0fe79f28bfaf ("net: allow gro_max_size to exceed 65536")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-11 16:50:49 +02:00
Florian Westphal
a664375da7 netfilter: nf_ct_sane: remove pseudo skb linearization
For historical reason this code performs pseudo linearization of skbs
via skb_header_pointer and a global 64k buffer.

With arrival of BIG TCP, packets generated by TCP stack can exceed 64kb.

Rewrite this to only extract the needed header data.  This also allows
to get rid of the locking.

Fixes: 7c4e983c4f3c ("net: allow gso_max_size to exceed 65536")
Fixes: 0fe79f28bfaf ("net: allow gro_max_size to exceed 65536")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-11 16:50:25 +02:00
Maxim Mikityanskiy
94ce3b64c6 net/tls: Use RCU API to access tls_ctx->netdev
Currently, tls_device_down synchronizes with tls_device_resync_rx using
RCU, however, the pointer to netdev is stored using WRITE_ONCE and
loaded using READ_ONCE.

Although such approach is technically correct (rcu_dereference is
essentially a READ_ONCE, and rcu_assign_pointer uses WRITE_ONCE to store
NULL), using special RCU helpers for pointers is more valid, as it
includes additional checks and might change the implementation
transparently to the callers.

Mark the netdev pointer as __rcu and use the correct RCU helpers to
access it. For non-concurrent access pass the right conditions that
guarantee safe access (locks taken, refcount value). Also use the
correct helper in mlx5e, where even READ_ONCE was missing.

The transition to RCU exposes existing issues, fixed by this commit:

1. bond_tls_device_xmit could read netdev twice, and it could become
NULL the second time, after the NULL check passed.

2. Drivers shouldn't stop processing the last packet if tls_device_down
just set netdev to NULL, before tls_dev_del was called. This prevents a
possible packet drop when transitioning to the fallback software mode.

Fixes: 89df6a810470 ("net/bonding: Implement TLS TX device offload")
Fixes: c55dcdd435aa ("net/tls: Fix use-after-free after the TLS device goes down and up")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Link: https://lore.kernel.org/r/20220810081602.1435800-1-maximmi@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-10 22:58:43 -07:00
Jakub Kicinski
d800a7b357 tls: rx: device: don't try to copy too much on detach
Another device offload bug, we use the length of the output
skb as an indication of how much data to copy. But that skb
is sized to offset + record length, and we start from offset.
So we end up double-counting the offset which leads to
skb_copy_bits() returning -EFAULT.

Reported-by: Tariq Toukan <tariqt@nvidia.com>
Fixes: 84c61fe1a75b ("tls: rx: do not use the standard strparser")
Tested-by: Ran Rozenstein <ranro@nvidia.com>
Link: https://lore.kernel.org/r/20220809175544.354343-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-10 22:53:25 -07:00
Jakub Kicinski
86b259f6f8 tls: rx: device: bound the frag walk
We can't do skb_walk_frags() on the input skbs, because
the input skbs is really just a pointer to the tcp read
queue. We need to bound the "is decrypted" check by the
amount of data in the message.

Note that the walk in tls_device_reencrypt() is after a
CoW so the skb there is safe to walk. Actually in the
current implementation it can't have frags at all, but
whatever, maybe one day it will.

Reported-by: Tariq Toukan <tariqt@nvidia.com>
Fixes: 84c61fe1a75b ("tls: rx: do not use the standard strparser")
Tested-by: Ran Rozenstein <ranro@nvidia.com>
Link: https://lore.kernel.org/r/20220809175544.354343-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-10 22:53:25 -07:00
Thadeu Lima de Souza Cascardo
9ad36309e2 net_sched: cls_route: remove from list when handle is 0
When a route filter is replaced and the old filter has a 0 handle, the old
one won't be removed from the hashtable, while it will still be freed.

The test was there since before commit 1109c00547fc ("net: sched: RCU
cls_route"), when a new filter was not allocated when there was an old one.
The old filter was reused and the reinserting would only be necessary if an
old filter was replaced. That was still wrong for the same case where the
old handle was 0.

Remove the old filter from the list independently from its handle value.

This fixes CVE-2022-2588, also reported as ZDI-CAN-17440.

Reported-by: Zhenpeng Lin <zplin@u.northwestern.edu>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Kamal Mostafa <kamal@canonical.com>
Cc: <stable@vger.kernel.org>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20220809170518.164662-1-cascardo@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-10 22:53:11 -07:00
Jakub Kicinski
fbe8870f72 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
bpf 2022-08-10

We've added 23 non-merge commits during the last 7 day(s) which contain
a total of 19 files changed, 424 insertions(+), 35 deletions(-).

The main changes are:

1) Several fixes for BPF map iterator such as UAFs along with selftests, from Hou Tao.

2) Fix BPF syscall program's {copy,strncpy}_from_bpfptr() to not fault, from Jinghao Jia.

3) Reject BPF syscall programs calling BPF_PROG_RUN, from Alexei Starovoitov and YiFei Zhu.

4) Fix attach_btf_obj_id info to pick proper target BTF, from Stanislav Fomichev.

5) BPF design Q/A doc update to clarify what is not stable ABI, from Paul E. McKenney.

6) Fix BPF map's prealloc_lru_pop to not reinitialize, from Kumar Kartikeya Dwivedi.

7) Fix bpf_trampoline_put to avoid leaking ftrace hash, from Jiri Olsa.

8) Fix arm64 JIT to address sparse errors around BPF trampoline, from Xu Kuohai.

9) Fix arm64 JIT to use kvcalloc instead of kcalloc for internal program address
   offset buffer, from Aijun Sun.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: (23 commits)
  selftests/bpf: Ensure sleepable program is rejected by hash map iter
  selftests/bpf: Add write tests for sk local storage map iterator
  selftests/bpf: Add tests for reading a dangling map iter fd
  bpf: Only allow sleepable program for resched-able iterator
  bpf: Check the validity of max_rdwr_access for sock local storage map iterator
  bpf: Acquire map uref in .init_seq_private for sock{map,hash} iterator
  bpf: Acquire map uref in .init_seq_private for sock local storage map iterator
  bpf: Acquire map uref in .init_seq_private for hash map iterator
  bpf: Acquire map uref in .init_seq_private for array map iterator
  bpf: Disallow bpf programs call prog_run command.
  bpf, arm64: Fix bpf trampoline instruction endianness
  selftests/bpf: Add test for prealloc_lru_pop bug
  bpf: Don't reinit map value in prealloc_lru_pop
  bpf: Allow calling bpf_prog_test kfuncs in tracing programs
  bpf, arm64: Allocate program buffer using kvcalloc instead of kcalloc
  selftests/bpf: Excercise bpf_obj_get_info_by_fd for bpf2bpf
  bpf: Use proper target btf when exporting attach_btf_obj_id
  mptcp, btf: Add struct mptcp_sock definition when CONFIG_MPTCP is disabled
  bpf: Cleanup ftrace hash in bpf_trampoline_put
  BPF: Fix potential bad pointer dereference in bpf_sys_bpf()
  ...
====================

Link: https://lore.kernel.org/r/20220810190624.10748-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-10 21:48:15 -07:00
Hawkins Jiawei
2a0133723f net: fix refcount bug in sk_psock_get (2)
Syzkaller reports refcount bug as follows:
------------[ cut here ]------------
refcount_t: saturated; leaking memory.
WARNING: CPU: 1 PID: 3605 at lib/refcount.c:19 refcount_warn_saturate+0xf4/0x1e0 lib/refcount.c:19
Modules linked in:
CPU: 1 PID: 3605 Comm: syz-executor208 Not tainted 5.18.0-syzkaller-03023-g7e062cda7d90 #0
 <TASK>
 __refcount_add_not_zero include/linux/refcount.h:163 [inline]
 __refcount_inc_not_zero include/linux/refcount.h:227 [inline]
 refcount_inc_not_zero include/linux/refcount.h:245 [inline]
 sk_psock_get+0x3bc/0x410 include/linux/skmsg.h:439
 tls_data_ready+0x6d/0x1b0 net/tls/tls_sw.c:2091
 tcp_data_ready+0x106/0x520 net/ipv4/tcp_input.c:4983
 tcp_data_queue+0x25f2/0x4c90 net/ipv4/tcp_input.c:5057
 tcp_rcv_state_process+0x1774/0x4e80 net/ipv4/tcp_input.c:6659
 tcp_v4_do_rcv+0x339/0x980 net/ipv4/tcp_ipv4.c:1682
 sk_backlog_rcv include/net/sock.h:1061 [inline]
 __release_sock+0x134/0x3b0 net/core/sock.c:2849
 release_sock+0x54/0x1b0 net/core/sock.c:3404
 inet_shutdown+0x1e0/0x430 net/ipv4/af_inet.c:909
 __sys_shutdown_sock net/socket.c:2331 [inline]
 __sys_shutdown_sock net/socket.c:2325 [inline]
 __sys_shutdown+0xf1/0x1b0 net/socket.c:2343
 __do_sys_shutdown net/socket.c:2351 [inline]
 __se_sys_shutdown net/socket.c:2349 [inline]
 __x64_sys_shutdown+0x50/0x70 net/socket.c:2349
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x46/0xb0
 </TASK>

During SMC fallback process in connect syscall, kernel will
replaces TCP with SMC. In order to forward wakeup
smc socket waitqueue after fallback, kernel will sets
clcsk->sk_user_data to origin smc socket in
smc_fback_replace_callbacks().

Later, in shutdown syscall, kernel will calls
sk_psock_get(), which treats the clcsk->sk_user_data
as psock type, triggering the refcnt warning.

So, the root cause is that smc and psock, both will use
sk_user_data field. So they will mismatch this field
easily.

This patch solves it by using another bit(defined as
SK_USER_DATA_PSOCK) in PTRMASK, to mark whether
sk_user_data points to a psock object or not.
This patch depends on a PTRMASK introduced in commit f1ff5ce2cd5e
("net, sk_msg: Clear sk_user_data pointer on clone if tagged").

For there will possibly be more flags in the sk_user_data field,
this patch also refactor sk_user_data flags code to be more generic
to improve its maintainability.

Reported-and-tested-by: syzbot+5f26f85569bd179c18ce@syzkaller.appspotmail.com
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-10 21:47:58 -07:00
Linus Torvalds
aeb6e6ac18 NFS client updates for Linux 5.20
Highlights include:
 
 Stable fixes:
 - pNFS/flexfiles: Fix infinite looping when the RDMA connection errors out
 
 Bugfixes:
 - NFS: fix port value parsing
 - SUNRPC: Reinitialise the backchannel request buffers before reuse
 - SUNRPC: fix expiry of auth creds
 - NFSv4: Fix races in the legacy idmapper upcall
 - NFS: O_DIRECT fixes from Jeff Layton
 - NFSv4.1: Fix OP_SEQUENCE error handling
 - SUNRPC: Fix an RPC/RDMA performance regression
 - NFS: Fix case insensitive renames
 - NFSv4/pnfs: Fix a use-after-free bug in open
 - NFSv4.1: RECLAIM_COMPLETE must handle EACCES
 
 Features:
 - NFSv4.1: session trunking enhancements
 - NFSv4.2: READ_PLUS performance optimisations
 - NFS: relax the rules for rsize/wsize mount options
 - NFS: don't unhash dentry during unlink/rename
 - SUNRPC: Fail faster on bad verifier
 - NFS/SUNRPC: Various tracing improvements
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAmLzy24ACgkQZwvnipYK
 APJJvhAAnVmv3jeOLjwDm+3X9xBevTq8PXAikWVeJgbSKCdjfqXO1J+XF49MpxXl
 N8PQZMqnwWkF3WvhvYobblwGSl6gJ3RnhVuTgdv4jiPl0ZWyS3XngJY0dCTnNgdL
 5O9AjoOMtnGVZwN5j8agymA8f2TUcel5mED6sAk10t2zDZY7VxuQqjp0m696jYjF
 0PKBmxoC4+6tXtYJS7d2PGRCTjfEUx2BLnnGuLOKsB7X8f63XmxJiu8/AIiY7spr
 M/M+BjAF3ok86dT1LjGlkvSNp23H70Wmsv98udfvxIWBm1l4972oCR/CS+kh3/mU
 dYM+oQ3JxxgKEN7Fdak+zU/+qma9q5z2rPFpSIT1fMEuaKN/7H2cbiHPi5RnEBLa
 AHWilX/lWBIMnZJZd9g3yYcGe6E/pkT6TqW5JY+2510koyfNER4IismAWMx2iYKU
 D0WSZOkmEBS/OYZxpTnqGwvS4L1szo9DN3c+yG2KXLifnmVPpjXZO25wahqSuUo3
 V6eYUCXRJmVg+IuXGsMNdrjxGYxD12xChoYzx5RlXls2lwHGeZr+iG3aL3+XayHa
 I1Kji3500UmfEOUUUr4UiQ428dOdL3QqNzVzdymN8Vh4d7v64LUL0GSseY+10Xrs
 xcbR6l/hwjBIo+I1Bi2mmv3W10tKErFy9eBIKzql3D6VHg7ESOo=
 =U00h
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-5.20-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Highlights include:

  Stable fixes:
   - pNFS/flexfiles: Fix infinite looping when the RDMA connection
     errors out

  Bugfixes:
   - NFS: fix port value parsing
   - SUNRPC: Reinitialise the backchannel request buffers before reuse
   - SUNRPC: fix expiry of auth creds
   - NFSv4: Fix races in the legacy idmapper upcall
   - NFS: O_DIRECT fixes from Jeff Layton
   - NFSv4.1: Fix OP_SEQUENCE error handling
   - SUNRPC: Fix an RPC/RDMA performance regression
   - NFS: Fix case insensitive renames
   - NFSv4/pnfs: Fix a use-after-free bug in open
   - NFSv4.1: RECLAIM_COMPLETE must handle EACCES

  Features:
   - NFSv4.1: session trunking enhancements
   - NFSv4.2: READ_PLUS performance optimisations
   - NFS: relax the rules for rsize/wsize mount options
   - NFS: don't unhash dentry during unlink/rename
   - SUNRPC: Fail faster on bad verifier
   - NFS/SUNRPC: Various tracing improvements"

* tag 'nfs-for-5.20-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (46 commits)
  NFS: Improve readpage/writepage tracing
  NFS: Improve O_DIRECT tracing
  NFS: Improve write error tracing
  NFS: don't unhash dentry during unlink/rename
  NFSv4/pnfs: Fix a use-after-free bug in open
  NFS: nfs_async_write_reschedule_io must not recurse into the writeback code
  SUNRPC: Don't reuse bvec on retransmission of the request
  SUNRPC: Reinitialise the backchannel request buffers before reuse
  NFSv4.1: RECLAIM_COMPLETE must handle EACCES
  NFSv4.1 probe offline transports for trunking on session creation
  SUNRPC create a function that probes only offline transports
  SUNRPC export xprt_iter_rewind function
  SUNRPC restructure rpc_clnt_setup_test_and_add_xprt
  NFSv4.1 remove xprt from xprt_switch if session trunking test fails
  SUNRPC create an rpc function that allows xprt removal from rpc_clnt
  SUNRPC enable back offline transports in trunking discovery
  SUNRPC create an iterator to list only OFFLINE xprts
  NFSv4.1 offline trunkable transports on DESTROY_SESSION
  SUNRPC add function to offline remove trunkable transports
  SUNRPC expose functions for offline remote xprt functionality
  ...
2022-08-10 14:04:32 -07:00
Hou Tao
52bd05eb7c bpf: Check the validity of max_rdwr_access for sock local storage map iterator
The value of sock local storage map is writable in map iterator, so check
max_rdwr_access instead of max_rdonly_access.

Fixes: 5ce6e77c7edf ("bpf: Implement bpf iterator for sock local storage map")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/r/20220810080538.1845898-6-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-10 10:12:48 -07:00
Hou Tao
f0d2b2716d bpf: Acquire map uref in .init_seq_private for sock{map,hash} iterator
sock_map_iter_attach_target() acquires a map uref, and the uref may be
released before or in the middle of iterating map elements. For example,
the uref could be released in sock_map_iter_detach_target() as part of
bpf_link_release(), or could be released in bpf_map_put_with_uref() as
part of bpf_map_release().

Fixing it by acquiring an extra map uref in .init_seq_private and
releasing it in .fini_seq_private.

Fixes: 0365351524d7 ("net: Allow iterating sockmap and sockhash")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220810080538.1845898-5-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-10 10:12:48 -07:00
Hou Tao
3c5f6e698b bpf: Acquire map uref in .init_seq_private for sock local storage map iterator
bpf_iter_attach_map() acquires a map uref, and the uref may be released
before or in the middle of iterating map elements. For example, the uref
could be released in bpf_iter_detach_map() as part of
bpf_link_release(), or could be released in bpf_map_put_with_uref() as
part of bpf_map_release().

So acquiring an extra map uref in bpf_iter_init_sk_storage_map() and
releasing it in bpf_iter_fini_sk_storage_map().

Fixes: 5ce6e77c7edf ("bpf: Implement bpf iterator for sock local storage map")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/r/20220810080538.1845898-4-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-10 10:12:48 -07:00
Pablo Neira Ayuso
c485c35ff6 netfilter: nf_tables: possible module reference underflow in error path
dst->ops is set on when nft_expr_clone() fails, but module refcount has
not been bumped yet, therefore nft_expr_destroy() leads to module
reference underflow.

Fixes: 8cfd9b0f8515 ("netfilter: nftables: generalize set expressions support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-10 17:06:05 +02:00
Pablo Neira Ayuso
4963674c2e netfilter: nf_tables: disallow NFTA_SET_ELEM_KEY_END with NFT_SET_ELEM_INTERVAL_END flag
These are mutually exclusive, actually NFTA_SET_ELEM_KEY_END replaces
the flag notation.

Fixes: 7b225d0b5c6d ("netfilter: nf_tables: add NFTA_SET_ELEM_KEY_END attribute")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-10 17:06:05 +02:00
Pablo Neira Ayuso
3400278328 netfilter: nf_tables: use READ_ONCE and WRITE_ONCE for shared generation id access
The generation ID is bumped from the commit path while holding the
mutex, however, netlink dump operations rely on RCU.

This patch also adds missing cb->base_eq initialization in
nf_tables_dump_set().

Fixes: 38e029f14a97 ("netfilter: nf_tables: set NLM_F_DUMP_INTR if netlink dumping is stale")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-10 17:06:05 +02:00
Ido Schimmel
6b4db2e528 devlink: Fix use-after-free after a failed reload
After a failed devlink reload, devlink parameters are still registered,
which means user space can set and get their values. In the case of the
mlxsw "acl_region_rehash_interval" parameter, these operations will
trigger a use-after-free [1].

Fix this by rejecting set and get operations while in the failed state.
Return the "-EOPNOTSUPP" error code which does not abort the parameters
dump, but instead causes it to skip over the problematic parameter.

Another possible fix is to perform these checks in the mlxsw parameter
callbacks, but other drivers might be affected by the same problem and I
am not aware of scenarios where these stricter checks will cause a
regression.

[1]
mlxsw_spectrum3 0000:00:10.0: Port 125: Failed to register netdev
mlxsw_spectrum3 0000:00:10.0: Failed to create ports

==================================================================
BUG: KASAN: use-after-free in mlxsw_sp_acl_tcam_vregion_rehash_intrvl_get+0xbd/0xd0 drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c:904
Read of size 4 at addr ffff8880099dcfd8 by task kworker/u4:4/777

CPU: 1 PID: 777 Comm: kworker/u4:4 Not tainted 5.19.0-rc7-custom-126601-gfe26f28c586d #1
Hardware name: QEMU MSN4700, BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Workqueue: netns cleanup_net
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0x92/0xbd lib/dump_stack.c:106
 print_address_description mm/kasan/report.c:313 [inline]
 print_report.cold+0x5e/0x5cf mm/kasan/report.c:429
 kasan_report+0xb9/0xf0 mm/kasan/report.c:491
 __asan_report_load4_noabort+0x14/0x20 mm/kasan/report_generic.c:306
 mlxsw_sp_acl_tcam_vregion_rehash_intrvl_get+0xbd/0xd0 drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c:904
 mlxsw_sp_acl_region_rehash_intrvl_get+0x49/0x60 drivers/net/ethernet/mellanox/mlxsw/spectrum_acl.c:1106
 mlxsw_sp_params_acl_region_rehash_intrvl_get+0x33/0x80 drivers/net/ethernet/mellanox/mlxsw/spectrum.c:3854
 devlink_param_get net/core/devlink.c:4981 [inline]
 devlink_nl_param_fill+0x238/0x12d0 net/core/devlink.c:5089
 devlink_param_notify+0xe5/0x230 net/core/devlink.c:5168
 devlink_ns_change_notify net/core/devlink.c:4417 [inline]
 devlink_ns_change_notify net/core/devlink.c:4396 [inline]
 devlink_reload+0x15f/0x700 net/core/devlink.c:4507
 devlink_pernet_pre_exit+0x112/0x1d0 net/core/devlink.c:12272
 ops_pre_exit_list net/core/net_namespace.c:152 [inline]
 cleanup_net+0x494/0xc00 net/core/net_namespace.c:582
 process_one_work+0x9fc/0x1710 kernel/workqueue.c:2289
 worker_thread+0x675/0x10b0 kernel/workqueue.c:2436
 kthread+0x30c/0x3d0 kernel/kthread.c:376
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
 </TASK>

The buggy address belongs to the physical page:
page:ffffea0000267700 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x99dc
flags: 0x100000000000000(node=0|zone=1)
raw: 0100000000000000 0000000000000000 dead000000000122 0000000000000000
raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff8880099dce80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ffff8880099dcf00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>ffff8880099dcf80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                                                    ^
 ffff8880099dd000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ffff8880099dd080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
==================================================================

Fixes: 98bbf70c1c41 ("mlxsw: spectrum: add "acl_region_rehash_interval" devlink param")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-10 13:48:04 +01:00
Peilin Ye
a3e7b29e30 vsock: Set socket state back to SS_UNCONNECTED in vsock_connect_timeout()
Imagine two non-blocking vsock_connect() requests on the same socket.
The first request schedules @connect_work, and after it times out,
vsock_connect_timeout() sets *sock* state back to TCP_CLOSE, but keeps
*socket* state as SS_CONNECTING.

Later, the second request returns -EALREADY, meaning the socket "already
has a pending connection in progress", even though the first request has
already timed out.

As suggested by Stefano, fix it by setting *socket* state back to
SS_UNCONNECTED, so that the second request will return -ETIMEDOUT.

Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
Fixes: d021c344051a ("VSOCK: Introduce VM Sockets")
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-10 09:50:18 +01:00
Peilin Ye
7e97cfed99 vsock: Fix memory leak in vsock_connect()
An O_NONBLOCK vsock_connect() request may try to reschedule
@connect_work.  Imagine the following sequence of vsock_connect()
requests:

  1. The 1st, non-blocking request schedules @connect_work, which will
     expire after 200 jiffies.  Socket state is now SS_CONNECTING;

  2. Later, the 2nd, blocking request gets interrupted by a signal after
     a few jiffies while waiting for the connection to be established.
     Socket state is back to SS_UNCONNECTED, but @connect_work is still
     pending, and will expire after 100 jiffies.

  3. Now, the 3rd, non-blocking request tries to schedule @connect_work
     again.  Since @connect_work is already scheduled,
     schedule_delayed_work() silently returns.  sock_hold() is called
     twice, but sock_put() will only be called once in
     vsock_connect_timeout(), causing a memory leak reported by syzbot:

  BUG: memory leak
  unreferenced object 0xffff88810ea56a40 (size 1232):
    comm "syz-executor756", pid 3604, jiffies 4294947681 (age 12.350s)
    hex dump (first 32 bytes):
      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      28 00 07 40 00 00 00 00 00 00 00 00 00 00 00 00  (..@............
    backtrace:
      [<ffffffff837c830e>] sk_prot_alloc+0x3e/0x1b0 net/core/sock.c:1930
      [<ffffffff837cbe22>] sk_alloc+0x32/0x2e0 net/core/sock.c:1989
      [<ffffffff842ccf68>] __vsock_create.constprop.0+0x38/0x320 net/vmw_vsock/af_vsock.c:734
      [<ffffffff842ce8f1>] vsock_create+0xc1/0x2d0 net/vmw_vsock/af_vsock.c:2203
      [<ffffffff837c0cbb>] __sock_create+0x1ab/0x2b0 net/socket.c:1468
      [<ffffffff837c3acf>] sock_create net/socket.c:1519 [inline]
      [<ffffffff837c3acf>] __sys_socket+0x6f/0x140 net/socket.c:1561
      [<ffffffff837c3bba>] __do_sys_socket net/socket.c:1570 [inline]
      [<ffffffff837c3bba>] __se_sys_socket net/socket.c:1568 [inline]
      [<ffffffff837c3bba>] __x64_sys_socket+0x1a/0x20 net/socket.c:1568
      [<ffffffff84512815>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      [<ffffffff84512815>] do_syscall_64+0x35/0x80 arch/x86/entry/common.c:80
      [<ffffffff84600068>] entry_SYSCALL_64_after_hwframe+0x44/0xae
  <...>

Use mod_delayed_work() instead: if @connect_work is already scheduled,
reschedule it, and undo sock_hold() to keep the reference count
balanced.

Reported-and-tested-by: syzbot+b03f55bf128f9a38f064@syzkaller.appspotmail.com
Fixes: d021c344051a ("VSOCK: Introduce VM Sockets")
Co-developed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-10 09:50:18 +01:00
Topi Miettinen
2cd0e8dba7 netlabel: fix typo in comment
'IPv4 and IPv4' should be 'IPv4 and IPv6'.

Signed-off-by: Topi Miettinen <toiwoton@gmail.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-10 09:24:41 +01:00
David S. Miller
e7f164955f linux-can-fixes-for-6.0-20220810
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEEBsvAIBsPu6mG7thcrX5LkNig010FAmLzWagTHG1rbEBwZW5n
 dXRyb25peC5kZQAKCRCtfkuQ2KDTXU/wB/4lE4h9FGmCkAT14+y/83zPZ+eK3Sym
 EjtOLJA6RX7WwsIICU5xTGr8imQQIju0Q2fMzmaweT3+bley8uvdN6zUVhf5UyjX
 7wmud4x5TfZxj22EUQM+MmWCuAiUet3zf9ad+2zdUsCNWI6VH6kwGYHZAC5JhZz/
 zZqG8Z+oB/nt0ykMsmNHea6w60P3DDD5icAqY6J8nkIOozpxhm1anRGshYj88YwH
 CRwXZv1DgjwMgJyPMMyM8xWb1zEOsPDOu6HgzChnphLZTe+XL/prU5TZazG52aNU
 yEq5ooV7kT0ld5enClyY5v4voTlR+TAgJshpdiVV19peZXKtslQA+dfk
 =sGmZ
 -----END PGP SIGNATURE-----

Merge tag 'linux-can-fixes-for-6.0-20220810' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
this is a pull request of 4 patches for net/master, with the
whitespace issue fixed.

Fedor Pchelkin contributes 2 fixes for the j1939 CAN protocol.

A patch by me for the ems_usb driver fixes an unaligned access
warning.

Sebastian Würl's patch for the mcp251x driver fixes a race condition
in the receive interrupt.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-10 09:19:50 +01:00
Matthias May
ab7e2e0dfa ipv6: do not use RT_TOS for IPv6 flowlabel
According to Guillaume Nault RT_TOS should never be used for IPv6.

Quote:
RT_TOS() is an old macro used to interprete IPv4 TOS as described in
the obsolete RFC 1349. It's conceptually wrong to use it even in IPv4
code, although, given the current state of the code, most of the
existing calls have no consequence.

But using RT_TOS() in IPv6 code is always a bug: IPv6 never had a "TOS"
field to be interpreted the RFC 1349 way. There's no historical
compatibility to worry about.

Fixes: 571912c69f0e ("net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS, IP, NSH etc.")
Acked-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Matthias May <matthias.may@westermo.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-09 22:19:21 -07:00
Jakub Kicinski
690bf64395 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Harden set element field checks to avoid out-of-bound memory access,
   this patch also fixes the type of issue described in 7e6bc1f6cabc
   ("netfilter: nf_tables: stricter validation of element data") in a
   broader way.

2) Patches to restrict the chain, set, and rule id lookup in the
   transaction to the corresponding top-level table, patches from
   Thadeu Lima de Souza Cascardo.

3) Fix incorrect comment in ip6t_LOG.h

4) nft_data_init() performs upfront validation of the expected data.
   struct nft_data_desc is used to describe the expected data to be
   received from userspace. The .size field represents the maximum size
   that can be stored, for bound checks. Then, .len is an input/output field
   which stores the expected length as input (this is optional, to restrict
   the checks), as output it stores the real length received from userspace
   (if it was not specified as input). This patch comes in response to
   7e6bc1f6cabc ("netfilter: nf_tables: stricter validation of element data")
   to address this type of issue in a more generic way by avoid opencoded
   data validation. Next patch requires this as a dependency.

5) Disallow jump to implicit chain from set element, this configuration
   is invalid. Only allow jump to chain via immediate expression is
   supported at this stage.

6) Fix possible null-pointer derefence in the error path of table updates,
   if memory allocation of the transaction fails. From Florian Westphal.

* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nf_tables: fix null deref due to zeroed list head
  netfilter: nf_tables: disallow jump to implicit chain from set element
  netfilter: nf_tables: upfront validation of data via nft_data_init()
  netfilter: ip6t_LOG: Fix a typo in a comment
  netfilter: nf_tables: do not allow RULE_ID to refer to another chain
  netfilter: nf_tables: do not allow CHAIN_ID to refer to another table
  netfilter: nf_tables: do not allow SET_ID to refer to another table
  netfilter: nf_tables: validate variable length element extension
====================

Link: https://lore.kernel.org/r/20220809220532.130240-1-pablo@netfilter.org/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-09 21:28:21 -07:00
Kumar Kartikeya Dwivedi
1f0752628e bpf: Allow calling bpf_prog_test kfuncs in tracing programs
In addition to TC hook, enable these in tracing programs so that they
can be used in selftests.

Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220809213033.24147-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-09 18:46:11 -07:00
Linus Torvalds
e394ff83bb NFSD 6.0 Release Notes
Work on "courteous server", which was introduced in 5.19, continues
 apace. This release introduces a more flexible limit on the number
 of NFSv4 clients that NFSD allows, now that NFSv4 clients can remain
 in courtesy state long after the lease expiration timeout. The
 client limit is adjusted based on the physical memory size of the
 server.
 
 The NFSD filecache is a cache of files held open by NFSv4 clients or
 recently touched by NFSv2 or NFSv3 clients. This cache had some
 significant scalability constraints that have been relieved in this
 release. Thanks to all who contributed to this work.
 
 A data corruption bug found during the most recent NFS bake-a-thon
 that involves NFSv3 and NFSv4 clients writing the same file has been
 addressed in this release.
 
 This release includes several improvements in CPU scalability for
 NFSv4 operations. In addition, Neil Brown provided patches that
 simplify locking during file lookup, creation, rename, and removal
 that enables subsequent work on making these operations more
 scalable. We expect to see that work materialize in the next
 release.
 
 There are also numerous single-patch fixes, clean-ups, and the
 usual improvements in observability.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmLujF0ACgkQM2qzM29m
 f5c14w/+Lsoryo5vdTXMAZBXBNvVdXQmHqLIEotJEVA3sECr+Kad2bF8rFaCWVzS
 Gf9KhTetmcDO9O73I5I/UtJ2qFT9B4I6baSGpOzIkjsM/aKeEEQpbpdzPYhKrCEv
 bQu3P54js7snH4YV8s0I39nBFOWdnahYXaw7peqE/2GOHxaR2mz88bkrQ+OybCxz
 KETqUxA6bKzkOT61S0nHcnQKd8HQzhocMDtrxtANHGsMM167ngI1dw4tUQAtfAUI
 s9R+GS6qwiKgwGz1oqhTR6LA/h4DROxPnc7AieuD9FvuAnR3kXw61bGMN5Biwv2T
 JZUTBbQvWhNasSV+7qOY9nBu+sHVC6Q7OZ5C9F/KjMyqCioDX0DnbxX9uKP20CDd
 EAAMS8n4Tdgd4KRBWdkLXPzizWYAjZQmFIJtcZne1JzGZ4IWRnikgM5qD6n1VviZ
 kcPRm5EN3DRHA+Hte4jG0EHIrE/7g5gnf+zr9dWl3uNhZtfTmumCfU16YYmKG8pP
 QN4kXBR2w7dAvp8nRaOsY6bBFLDAk/jHbpY8Q4xoUO4tsojfWayCTGVFOrecOjxv
 uSn0LhiidC5pLlkcPgwemhysVywDzr+gGXBRJXeUOHfdd05Q2gbFK8OpqDSvJ3dZ
 aC/RxFvHc8jaktUcuIjkE6Rsz6AVaAH3EZj84oMZ4hZhyGbEreg=
 =PEJ3
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd updates from Chuck Lever:
 "Work on 'courteous server', which was introduced in 5.19, continues
  apace. This release introduces a more flexible limit on the number of
  NFSv4 clients that NFSD allows, now that NFSv4 clients can remain in
  courtesy state long after the lease expiration timeout. The client
  limit is adjusted based on the physical memory size of the server.

  The NFSD filecache is a cache of files held open by NFSv4 clients or
  recently touched by NFSv2 or NFSv3 clients. This cache had some
  significant scalability constraints that have been relieved in this
  release. Thanks to all who contributed to this work.

  A data corruption bug found during the most recent NFS bake-a-thon
  that involves NFSv3 and NFSv4 clients writing the same file has been
  addressed in this release.

  This release includes several improvements in CPU scalability for
  NFSv4 operations. In addition, Neil Brown provided patches that
  simplify locking during file lookup, creation, rename, and removal
  that enables subsequent work on making these operations more scalable.
  We expect to see that work materialize in the next release.

  There are also numerous single-patch fixes, clean-ups, and the usual
  improvements in observability"

* tag 'nfsd-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (78 commits)
  lockd: detect and reject lock arguments that overflow
  NFSD: discard fh_locked flag and fh_lock/fh_unlock
  NFSD: use (un)lock_inode instead of fh_(un)lock for file operations
  NFSD: use explicit lock/unlock for directory ops
  NFSD: reduce locking in nfsd_lookup()
  NFSD: only call fh_unlock() once in nfsd_link()
  NFSD: always drop directory lock in nfsd_unlink()
  NFSD: change nfsd_create()/nfsd_symlink() to unlock directory before returning.
  NFSD: add posix ACLs to struct nfsd_attrs
  NFSD: add security label to struct nfsd_attrs
  NFSD: set attributes when creating symlinks
  NFSD: introduce struct nfsd_attrs
  NFSD: verify the opened dentry after setting a delegation
  NFSD: drop fh argument from alloc_init_deleg
  NFSD: Move copy offload callback arguments into a separate structure
  NFSD: Add nfsd4_send_cb_offload()
  NFSD: Remove kmalloc from nfsd4_do_async_copy()
  NFSD: Refactor nfsd4_do_copy()
  NFSD: Refactor nfsd4_cleanup_inter_ssc() (2/2)
  NFSD: Refactor nfsd4_cleanup_inter_ssc() (1/2)
  ...
2022-08-09 14:56:49 -07:00
Jakub Kicinski
7ba0fa7f32 wireless fixes for v6.0
First set of fixes for v6.0. Small one this time, fix a cfg80211
 warning seen with brcmfmac and remove an unncessary inline keyword
 from wilc1000.
 -----BEGIN PGP SIGNATURE-----
 
 iQFFBAABCgAvFiEEiBjanGPFTz4PRfLobhckVSbrbZsFAmLyj2ERHGt2YWxvQGtl
 cm5lbC5vcmcACgkQbhckVSbrbZtxaggAptzh9NVi2qCWpCdwIjp+d6CusPoEA4NN
 eI7PSLecWPA5MVCR5YXSOboVDEtV/wGDOk/N1fKpKVXW02+7nvuLohx5tOclFpms
 CZtS2thpyEvUW6Zu+bE1Opwyx1v4e3nyznrNXMHW8tcnaVI3BNwYpdp7LRCylv07
 JQPNKZvxR5fs8NuIhf0O1TSjPaUSvRrMWfRn3ZioHWVa7+j8qMfnxWk+o6n38zP5
 fqbYlhLEBS3Nu9jp3e26KRMRrkAs/OTb/oRc/bPbU68V0VFPquP97Fz0vOobyjzO
 +B5+qAcaNpP6lSlAmrVyPxFEO1Y0utXblXblrWQsAqox7rt/PXQecg==
 =2QwM
 -----END PGP SIGNATURE-----

Merge tag 'wireless-2022-08-09' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless

Kalle Valo says:

====================
wireless fixes for v6.0

First set of fixes for v6.0. Small one this time, fix a cfg80211
warning seen with brcmfmac and remove an unncessary inline keyword
from wilc1000.

* tag 'wireless-2022-08-09' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
  wifi: wilc1000: fix spurious inline in wilc_handle_disconnect()
  wifi: cfg80211: Fix validating BSS pointers in __cfg80211_connect_result
====================

Link: https://lore.kernel.org/r/20220809164756.B1DAEC433D6@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-09 11:51:00 -07:00
Florian Westphal
580077855a netfilter: nf_tables: fix null deref due to zeroed list head
In nf_tables_updtable, if nf_tables_table_enable returns an error,
nft_trans_destroy is called to free the transaction object.

nft_trans_destroy() calls list_del(), but the transaction was never
placed on a list -- the list head is all zeroes, this results in
a null dereference:

BUG: KASAN: null-ptr-deref in nft_trans_destroy+0x26/0x59
Call Trace:
 nft_trans_destroy+0x26/0x59
 nf_tables_newtable+0x4bc/0x9bc
 [..]

Its sane to assume that nft_trans_destroy() can be called
on the transaction object returned by nft_trans_alloc(), so
make sure the list head is initialised.

Fixes: 55dd6f93076b ("netfilter: nf_tables: use new transaction infrastructure to handle table")
Reported-by: mingi cho <mgcho.minic@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-09 20:13:30 +02:00
Pablo Neira Ayuso
f323ef3a0d netfilter: nf_tables: disallow jump to implicit chain from set element
Extend struct nft_data_desc to add a flag field that specifies
nft_data_init() is being called for set element data.

Use it to disallow jump to implicit chain from set element, only jump
to chain via immediate expression is allowed.

Fixes: d0e2c7de92c7 ("netfilter: nf_tables: add NFT_CHAIN_BINDING")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-09 20:13:29 +02:00
Pablo Neira Ayuso
341b694160 netfilter: nf_tables: upfront validation of data via nft_data_init()
Instead of parsing the data and then validate that type and length are
correct, pass a description of the expected data so it can be validated
upfront before parsing it to bail out earlier.

This patch adds a new .size field to specify the maximum size of the
data area. The .len field is optional and it is used as an input/output
field, it provides the specific length of the expected data in the input
path. If then .len field is not specified, then obtained length from the
netlink attribute is stored. This is required by cmp, bitwise, range and
immediate, which provide no netlink attribute that describes the data
length. The immediate expression uses the destination register type to
infer the expected data type.

Relying on opencoded validation of the expected data might lead to
subtle bugs as described in 7e6bc1f6cabc ("netfilter: nf_tables:
stricter validation of element data").

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-09 20:13:29 +02:00
Thadeu Lima de Souza Cascardo
36d5b29132 netfilter: nf_tables: do not allow RULE_ID to refer to another chain
When doing lookups for rules on the same batch by using its ID, a rule from
a different chain can be used. If a rule is added to a chain but tries to
be positioned next to a rule from a different chain, it will be linked to
chain2, but the use counter on chain1 would be the one to be incremented.

When looking for rules by ID, use the chain that was used for the lookup by
name. The chain used in the context copied to the transaction needs to
match that same chain. That way, struct nft_rule does not need to get
enlarged with another member.

Fixes: 1a94e38d254b ("netfilter: nf_tables: add NFTA_RULE_ID attribute")
Fixes: 75dd48e2e420 ("netfilter: nf_tables: Support RULE_ID reference in new rule")
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-09 19:38:18 +02:00
Thadeu Lima de Souza Cascardo
95f466d223 netfilter: nf_tables: do not allow CHAIN_ID to refer to another table
When doing lookups for chains on the same batch by using its ID, a chain
from a different table can be used. If a rule is added to a table but
refers to a chain in a different table, it will be linked to the chain in
table2, but would have expressions referring to objects in table1.

Then, when table1 is removed, the rule will not be removed as its linked to
a chain in table2. When expressions in the rule are processed or removed,
that will lead to a use-after-free.

When looking for chains by ID, use the table that was used for the lookup
by name, and only return chains belonging to that same table.

Fixes: 837830a4b439 ("netfilter: nf_tables: add NFTA_RULE_CHAIN_ID attribute")
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-09 19:38:17 +02:00
Thadeu Lima de Souza Cascardo
470ee20e06 netfilter: nf_tables: do not allow SET_ID to refer to another table
When doing lookups for sets on the same batch by using its ID, a set from a
different table can be used.

Then, when the table is removed, a reference to the set may be kept after
the set is freed, leading to a potential use-after-free.

When looking for sets by ID, use the table that was used for the lookup by
name, and only return sets belonging to that same table.

This fixes CVE-2022-2586, also reported as ZDI-CAN-17470.

Reported-by: Team Orca of Sea Security (@seasecresponse)
Fixes: 958bee14d071 ("netfilter: nf_tables: use new transaction infrastructure to handle sets")
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-09 19:38:17 +02:00
Pablo Neira Ayuso
34aae2c2fb netfilter: nf_tables: validate variable length element extension
Update template to validate variable length extensions. This patch adds
a new .ext_len[id] field to the template to store the expected extension
length. This is used to sanity check the initialization of the variable
length extension.

Use PTR_ERR() in nft_set_elem_init() to report errors since, after this
update, there are two reason why this might fail, either because of
ENOMEM or insufficient room in the extension field (EINVAL).

Kernels up until 7e6bc1f6cabc ("netfilter: nf_tables: stricter
validation of element data") allowed to copy more data to the extension
than was allocated. This ext_len field allows to validate if the
destination has the correct size as additional check.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-09 19:38:16 +02:00
Fedor Pchelkin
8c21c54a53 can: j1939: j1939_session_destroy(): fix memory leak of skbs
We need to drop skb references taken in j1939_session_skb_queue() when
destroying a session in j1939_session_destroy(). Otherwise those skbs
would be lost.

Link to Syzkaller info and repro: https://forge.ispras.ru/issues/11743.

Found by Linux Verification Center (linuxtesting.org) with Syzkaller.

V1: https://lore.kernel.org/all/20220708175949.539064-1-pchelkin@ispras.ru

Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol")
Suggested-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/all/20220805150216.66313-1-pchelkin@ispras.ru
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-08-09 09:05:06 +02:00
Fedor Pchelkin
8ef49f7f82 can: j1939: j1939_sk_queue_activate_next_locked(): replace WARN_ON_ONCE with netdev_warn_once()
We should warn user-space that it is doing something wrong when trying
to activate sessions with identical parameters but WARN_ON_ONCE macro
can not be used here as it serves a different purpose.

So it would be good to replace it with netdev_warn_once() message.

Found by Linux Verification Center (linuxtesting.org) with Syzkaller.

Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol")
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/all/20220729143655.1108297-1-pchelkin@ispras.ru
[mkl: fix indention]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-08-09 09:05:06 +02:00
Jakub Kicinski
b8c3bf0ed2 bluetooth pull request for net:
- Fixes various issues related to ISO channel/socket support
  - Fixes issues when building with C=1
  - Fix cancel uninitilized work which blocks syzbot to run
 -----BEGIN PGP SIGNATURE-----
 
 iQJNBAABCAA3FiEE7E6oRXp8w05ovYr/9JCA4xAyCykFAmLxpdoZHGx1aXoudm9u
 LmRlbnR6QGludGVsLmNvbQAKCRD0kIDjEDILKbjqEACZiIUKIACTYWa8Os0fTuzu
 LM/h4aOnh3L+W2KyA9Kh4Hmm7Caf9JtUZrTIMhMigGiTTN91eLPoScu6ATm7q0vY
 7JgfRKbsLCjhUV8uQfypDBM0uQq7exbEiwd1KTHo8XfOgiheZL6ergN4r2g+V/gt
 up0a58j4ukc6PhWAxujc3UzvMj2c1Sb5jY6TIuyiQM7RONtWLH9VDzc0InRNGqpa
 eEpPDqCuXsgDTKDAvcJoWARwnj6nsODN3QaSWVlwgN1JgE0/OjXI9hoUNQ83ueCH
 pl6qigJIuCnGq4ZDbdDE+QcK5I2ouoGoJ9rQMLuUFdupmaBtTEdMK7pw7opzYt3c
 HqW/TvIR8t2LG0oFmrvFSKH+OMHkIH7D7zaCHGYx5T7B778x5fnUK4OfhvnJ4NPu
 HkKYD5BJv92X7cHacgclJwQdwwbParrr7wPbqGiSRgiw2ec2puC1VQSYj/+4nwV5
 De3AJ2OORv+2kcIw+zi3T0wGzddQF07gXXpz7ckOnFQ1A5jiYX5yGrfGlJezvblX
 LnXikwvPkkl640ZRrSZvGQBPNySKv8N2yuE/FtbkKfNjoumAkC67PA+4NYOLc9g9
 gkgPnR6y4Cm+h3yLILV3njcYbif+5Ue0KHx8L3rr523bZ9C7vUKdcE1i+Tkr9B0y
 I6rMyxtfkUFwVRRFKwvguw==
 =nFPi
 -----END PGP SIGNATURE-----

Merge tag 'for-net-2022-08-08' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth

Luiz Augusto von Dentz says:

====================
bluetooth pull request for net:

 - Fixes various issues related to ISO channel/socket support
 - Fixes issues when building with C=1
 - Fix cancel uninitilized work which blocks syzbot to run

* tag 'for-net-2022-08-08' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
  Bluetooth: ISO: Fix not using the correct QoS
  Bluetooth: don't try to cancel uninitialized works at mgmt_index_removed()
  Bluetooth: ISO: Fix iso_sock_getsockopt for BT_DEFER_SETUP
  Bluetooth: MGMT: Fixes build warnings with C=1
  Bluetooth: hci_event: Fix build warning with C=1
  Bluetooth: ISO: Fix memory corruption
  Bluetooth: Fix null pointer deref on unexpected status event
  Bluetooth: ISO: Fix info leak in iso_sock_getsockopt()
  Bluetooth: hci_conn: Fix updating ISO QoS PHY
  Bluetooth: ISO: unlock on error path in iso_sock_setsockopt()
  Bluetooth: L2CAP: Fix l2cap_global_chan_by_psm regression
====================

Link: https://lore.kernel.org/r/20220809001224.412807-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-08 20:59:07 -07:00
Martin Schiller
944e594cfa net/x25: fix call timeouts in blocking connects
When a userspace application starts a blocking connect(), a CALL REQUEST
is sent, the t21 timer is started and the connect is waiting in
x25_wait_for_connection_establishment(). If then for some reason the t21
timer expires before any reaction on the assigned logical channel (e.g.
CALL ACCEPT, CLEAR REQUEST), there is sent a CLEAR REQUEST and timer
t23 is started waiting for a CLEAR confirmation. If we now receive a
CLEAR CONFIRMATION from the peer, x25_disconnect() is called in
x25_state2_machine() with reason "0", which means "normal" call
clearing. This is ok, but the parameter "reason" is used as sk->sk_err
in x25_disconnect() and sock_error(sk) is evaluated in
x25_wait_for_connection_establishment() to check if the call is still
pending. As "0" is not rated as an error, the connect will stuck here
forever.

To fix this situation, also check if the sk->sk_state changed form
TCP_SYN_SENT to TCP_CLOSE in the meantime, which is also done by
x25_disconnect().

Signed-off-by: Martin Schiller <ms@dev.tdt.de>
Link: https://lore.kernel.org/r/20220805061810.10824-1-ms@dev.tdt.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-08 20:48:51 -07:00
Linus Torvalds
f30adc0d33 iov_iter stuff, part 2, rebased
* more new_sync_{read,write}() speedups - ITER_UBUF introduction
 * ITER_PIPE cleanups
 * unification of iov_iter_get_pages/iov_iter_get_pages_alloc and
   switching them to advancing semantics
 * making ITER_PIPE take high-order pages without splitting them
 * handling copy_page_from_iter() for high-order pages properly
 
 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCYvHI8QAKCRBZ7Krx/gZQ
 62CQAPsGlbebqBeAT2pMulaGDxfLAsgz5Yf4BEaMLhPtRqFOQgD+KrZQId7Sd8O0
 3IWucpTb2c4jvLlXhGMS+XWnusQH+AQ=
 =pBux
 -----END PGP SIGNATURE-----

Merge tag 'pull-work.iov_iter-rebased' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull more iov_iter updates from Al Viro:

 - more new_sync_{read,write}() speedups - ITER_UBUF introduction

 - ITER_PIPE cleanups

 - unification of iov_iter_get_pages/iov_iter_get_pages_alloc and
   switching them to advancing semantics

 - making ITER_PIPE take high-order pages without splitting them

 - handling copy_page_from_iter() for high-order pages properly

* tag 'pull-work.iov_iter-rebased' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (32 commits)
  fix copy_page_from_iter() for compound destinations
  hugetlbfs: copy_page_to_iter() can deal with compound pages
  copy_page_to_iter(): don't split high-order page in case of ITER_PIPE
  expand those iov_iter_advance()...
  pipe_get_pages(): switch to append_pipe()
  get rid of non-advancing variants
  ceph: switch the last caller of iov_iter_get_pages_alloc()
  9p: convert to advancing variant of iov_iter_get_pages_alloc()
  af_alg_make_sg(): switch to advancing variant of iov_iter_get_pages()
  iter_to_pipe(): switch to advancing variant of iov_iter_get_pages()
  block: convert to advancing variants of iov_iter_get_pages{,_alloc}()
  iov_iter: advancing variants of iov_iter_get_pages{,_alloc}()
  iov_iter: saner helper for page array allocation
  fold __pipe_get_pages() into pipe_get_pages()
  ITER_XARRAY: don't open-code DIV_ROUND_UP()
  unify the rest of iov_iter_get_pages()/iov_iter_get_pages_alloc() guts
  unify xarray_get_pages() and xarray_get_pages_alloc()
  unify pipe_get_pages() and pipe_get_pages_alloc()
  iov_iter_get_pages(): sanity-check arguments
  iov_iter_get_pages_alloc(): lift freeing pages array on failure exits into wrapper
  ...
2022-08-08 20:04:35 -07:00
Al Viro
7f02464739 9p: convert to advancing variant of iov_iter_get_pages_alloc()
that one is somewhat clumsier than usual and needs serious testing.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08 22:37:23 -04:00
Al Viro
1ef255e257 iov_iter: advancing variants of iov_iter_get_pages{,_alloc}()
Most of the users immediately follow successful iov_iter_get_pages()
with advancing by the amount it had returned.

Provide inline wrappers doing that, convert trivial open-coded
uses of those.

BTW, iov_iter_get_pages() never returns more than it had been asked
to; such checks in cifs ought to be removed someday...

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08 22:37:22 -04:00
Luiz Augusto von Dentz
1d1ab5d39b Bluetooth: ISO: Fix not using the correct QoS
This fixes using wrong QoS settings when attempting to send frames while
acting as peripheral since the QoS settings in use are stored in
hconn->iso_qos not in sk->qos, this is actually properly handled on
getsockopt(BT_ISO_QOS) but not on iso_send_frame.

Fixes: ccf74f2390d60 ("Bluetooth: Add BTPROTO_ISO socket type")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-08-08 17:06:36 -07:00
Tetsuo Handa
3f2893d3c1 Bluetooth: don't try to cancel uninitialized works at mgmt_index_removed()
syzbot is reporting attempt to cancel uninitialized work at
mgmt_index_removed() [1], for calling cancel_delayed_work_sync() without
INIT_DELAYED_WORK() is not permitted.

INIT_DELAYED_WORK() is called from mgmt_init_hdev() via chan->hdev_init()
 from hci_mgmt_cmd(), but cancel_delayed_work_sync() is unconditionally
called from mgmt_index_removed().

Call cancel_delayed_work_sync() only if HCI_MGMT flag was set, for
mgmt_init_hdev() sets HCI_MGMT flag when calling INIT_DELAYED_WORK().

Link: https://syzkaller.appspot.com/bug?extid=b8ddd338a8838e581b1c [1]
Reported-by: syzbot <syzbot+b8ddd338a8838e581b1c@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Fixes: 0ef08313cefdd60d ("Bluetooth: Convert delayed discov_off to hci_sync")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-08-08 17:06:23 -07:00
Luiz Augusto von Dentz
9dfe1727b2 Bluetooth: ISO: Fix iso_sock_getsockopt for BT_DEFER_SETUP
BT_DEFER_SETUP shall be considered valid for all states except for
BT_CONNECTED as it is also used when initiated a connection rather then
only for BT_BOUND and BT_LISTEN.

Fixes: ccf74f2390d60 ("Bluetooth: Add BTPROTO_ISO socket type")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-08-08 17:06:10 -07:00
Luiz Augusto von Dentz
0c7937587d Bluetooth: MGMT: Fixes build warnings with C=1
This fixes the following warning when building with make C=1:

net/bluetooth/mgmt.c:3821:29: warning: restricted __le16 degrades to integer
net/bluetooth/mgmt.c:4625:9: warning: cast to restricted __le32

Fixes: 600a87490ff98 ("Bluetooth: Implementation of MGMT_OP_SET_BLOCKED_KEYS.")
Fixes: 4c54bf2b093bb ("Bluetooth: Add get/set device flags mgmt op")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-08-08 17:05:45 -07:00
Luiz Augusto von Dentz
889f0346d4 Bluetooth: hci_event: Fix build warning with C=1
This fixes the following warning when build with make C=1:

net/bluetooth/hci_event.c:337:15: warning: restricted __le16 degrades to integer

Fixes: a93661203641e ("Bluetooth: Process result of HCI Delete Stored Link Key command")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-08-08 17:05:12 -07:00
Luiz Augusto von Dentz
b444342327 Bluetooth: ISO: Fix memory corruption
The following memory corruption can happen since iso_pinfo.base size
did not account for its headers (4 bytes):

net/bluetooth/eir.c
    76          memcpy(&eir[eir_len], data, data_len);
                            ^^^^^^^         ^^^^^^^^
    77          eir_len += data_len;
    78
    79          return eir_len;
    80  }

The "eir" buffer has 252 bytes and data_len is 252 but we do a memcpy()
to &eir[4] so this can corrupt 4 bytes beyond the end of the buffer.

Fixes: f764a6c2c1e4 ("Bluetooth: ISO: Add broadcast support")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
2022-08-08 17:04:51 -07:00
Soenke Huster
ce78e557ff Bluetooth: Fix null pointer deref on unexpected status event
__hci_cmd_sync returns NULL if the controller responds with a status
event. This is unexpected for the commands sent here, but on
occurrence leads to null pointer dereferences and thus must be
handled.

Signed-off-by: Soenke Huster <soenke.huster@eknoes.de>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-08-08 17:04:37 -07:00
Luiz Augusto von Dentz
0eee4995f4 Bluetooth: ISO: Fix info leak in iso_sock_getsockopt()
The C standard rules for when struct holes are zeroed out are slightly
weird.  The existing assignments might initialize everything, but GCC
is allowed to (and does sometimes) leave the struct holes uninitialized,
so instead of using yet another variable and copy the QoS settings just
use a pointer to the stored QoS settings.

Fixes: ccf74f2390d60 ("Bluetooth: Add BTPROTO_ISO socket type")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-08-08 17:04:24 -07:00