72722 Commits

Author SHA1 Message Date
Eric Dumazet
4b397c06cb net: tunnels: annotate lockless accesses to dev->needed_headroom
IP tunnels can apparently update dev->needed_headroom
in their xmit path.

This patch takes care of three tunnels xmit, and also the
core LL_RESERVED_SPACE() and LL_RESERVED_SPACE_EXTRA()
helpers.

More changes might be needed for completeness.

BUG: KCSAN: data-race in ip_tunnel_xmit / ip_tunnel_xmit

read to 0xffff88815b9da0ec of 2 bytes by task 888 on cpu 1:
ip_tunnel_xmit+0x1270/0x1730 net/ipv4/ip_tunnel.c:803
__gre_xmit net/ipv4/ip_gre.c:469 [inline]
ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661
__netdev_start_xmit include/linux/netdevice.h:4881 [inline]
netdev_start_xmit include/linux/netdevice.h:4895 [inline]
xmit_one net/core/dev.c:3580 [inline]
dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596
__dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246
dev_queue_xmit include/linux/netdevice.h:3051 [inline]
neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623
neigh_output include/net/neighbour.h:546 [inline]
ip_finish_output2+0x740/0x840 net/ipv4/ip_output.c:228
ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:316
NF_HOOK_COND include/linux/netfilter.h:291 [inline]
ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:430
dst_output include/net/dst.h:444 [inline]
ip_local_out+0x64/0x80 net/ipv4/ip_output.c:126
iptunnel_xmit+0x34a/0x4b0 net/ipv4/ip_tunnel_core.c:82
ip_tunnel_xmit+0x1451/0x1730 net/ipv4/ip_tunnel.c:813
__gre_xmit net/ipv4/ip_gre.c:469 [inline]
ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661
__netdev_start_xmit include/linux/netdevice.h:4881 [inline]
netdev_start_xmit include/linux/netdevice.h:4895 [inline]
xmit_one net/core/dev.c:3580 [inline]
dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596
__dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246
dev_queue_xmit include/linux/netdevice.h:3051 [inline]
neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623
neigh_output include/net/neighbour.h:546 [inline]
ip_finish_output2+0x740/0x840 net/ipv4/ip_output.c:228
ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:316
NF_HOOK_COND include/linux/netfilter.h:291 [inline]
ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:430
dst_output include/net/dst.h:444 [inline]
ip_local_out+0x64/0x80 net/ipv4/ip_output.c:126
iptunnel_xmit+0x34a/0x4b0 net/ipv4/ip_tunnel_core.c:82
ip_tunnel_xmit+0x1451/0x1730 net/ipv4/ip_tunnel.c:813
__gre_xmit net/ipv4/ip_gre.c:469 [inline]
ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661
__netdev_start_xmit include/linux/netdevice.h:4881 [inline]
netdev_start_xmit include/linux/netdevice.h:4895 [inline]
xmit_one net/core/dev.c:3580 [inline]
dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596
__dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246
dev_queue_xmit include/linux/netdevice.h:3051 [inline]
neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623
neigh_output include/net/neighbour.h:546 [inline]
ip_finish_output2+0x740/0x840 net/ipv4/ip_output.c:228
ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:316
NF_HOOK_COND include/linux/netfilter.h:291 [inline]
ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:430
dst_output include/net/dst.h:444 [inline]
ip_local_out+0x64/0x80 net/ipv4/ip_output.c:126
iptunnel_xmit+0x34a/0x4b0 net/ipv4/ip_tunnel_core.c:82
ip_tunnel_xmit+0x1451/0x1730 net/ipv4/ip_tunnel.c:813
__gre_xmit net/ipv4/ip_gre.c:469 [inline]
ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661
__netdev_start_xmit include/linux/netdevice.h:4881 [inline]
netdev_start_xmit include/linux/netdevice.h:4895 [inline]
xmit_one net/core/dev.c:3580 [inline]
dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596
__dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246
dev_queue_xmit include/linux/netdevice.h:3051 [inline]
neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623
neigh_output include/net/neighbour.h:546 [inline]
ip_finish_output2+0x740/0x840 net/ipv4/ip_output.c:228
ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:316
NF_HOOK_COND include/linux/netfilter.h:291 [inline]
ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:430
dst_output include/net/dst.h:444 [inline]
ip_local_out+0x64/0x80 net/ipv4/ip_output.c:126
iptunnel_xmit+0x34a/0x4b0 net/ipv4/ip_tunnel_core.c:82
ip_tunnel_xmit+0x1451/0x1730 net/ipv4/ip_tunnel.c:813
__gre_xmit net/ipv4/ip_gre.c:469 [inline]
ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661
__netdev_start_xmit include/linux/netdevice.h:4881 [inline]
netdev_start_xmit include/linux/netdevice.h:4895 [inline]
xmit_one net/core/dev.c:3580 [inline]
dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596
__dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246
dev_queue_xmit include/linux/netdevice.h:3051 [inline]
neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623
neigh_output include/net/neighbour.h:546 [inline]
ip_finish_output2+0x740/0x840 net/ipv4/ip_output.c:228
ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:316
NF_HOOK_COND include/linux/netfilter.h:291 [inline]
ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:430
dst_output include/net/dst.h:444 [inline]
ip_local_out+0x64/0x80 net/ipv4/ip_output.c:126
iptunnel_xmit+0x34a/0x4b0 net/ipv4/ip_tunnel_core.c:82
ip_tunnel_xmit+0x1451/0x1730 net/ipv4/ip_tunnel.c:813
__gre_xmit net/ipv4/ip_gre.c:469 [inline]
ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661
__netdev_start_xmit include/linux/netdevice.h:4881 [inline]
netdev_start_xmit include/linux/netdevice.h:4895 [inline]
xmit_one net/core/dev.c:3580 [inline]
dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596
__dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246
dev_queue_xmit include/linux/netdevice.h:3051 [inline]
neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623
neigh_output include/net/neighbour.h:546 [inline]
ip_finish_output2+0x740/0x840 net/ipv4/ip_output.c:228
ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:316
NF_HOOK_COND include/linux/netfilter.h:291 [inline]
ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:430
dst_output include/net/dst.h:444 [inline]
ip_local_out+0x64/0x80 net/ipv4/ip_output.c:126
iptunnel_xmit+0x34a/0x4b0 net/ipv4/ip_tunnel_core.c:82
ip_tunnel_xmit+0x1451/0x1730 net/ipv4/ip_tunnel.c:813
__gre_xmit net/ipv4/ip_gre.c:469 [inline]
ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661
__netdev_start_xmit include/linux/netdevice.h:4881 [inline]
netdev_start_xmit include/linux/netdevice.h:4895 [inline]
xmit_one net/core/dev.c:3580 [inline]
dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596
__dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246

write to 0xffff88815b9da0ec of 2 bytes by task 2379 on cpu 0:
ip_tunnel_xmit+0x1294/0x1730 net/ipv4/ip_tunnel.c:804
__gre_xmit net/ipv4/ip_gre.c:469 [inline]
ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661
__netdev_start_xmit include/linux/netdevice.h:4881 [inline]
netdev_start_xmit include/linux/netdevice.h:4895 [inline]
xmit_one net/core/dev.c:3580 [inline]
dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596
__dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246
dev_queue_xmit include/linux/netdevice.h:3051 [inline]
neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623
neigh_output include/net/neighbour.h:546 [inline]
ip6_finish_output2+0x9bc/0xc50 net/ipv6/ip6_output.c:134
__ip6_finish_output net/ipv6/ip6_output.c:195 [inline]
ip6_finish_output+0x39a/0x4e0 net/ipv6/ip6_output.c:206
NF_HOOK_COND include/linux/netfilter.h:291 [inline]
ip6_output+0xeb/0x220 net/ipv6/ip6_output.c:227
dst_output include/net/dst.h:444 [inline]
NF_HOOK include/linux/netfilter.h:302 [inline]
mld_sendpack+0x438/0x6a0 net/ipv6/mcast.c:1820
mld_send_cr net/ipv6/mcast.c:2121 [inline]
mld_ifc_work+0x519/0x7b0 net/ipv6/mcast.c:2653
process_one_work+0x3e6/0x750 kernel/workqueue.c:2390
worker_thread+0x5f2/0xa10 kernel/workqueue.c:2537
kthread+0x1ac/0x1e0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308

value changed: 0x0dd4 -> 0x0e14

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 2379 Comm: kworker/0:0 Not tainted 6.3.0-rc1-syzkaller-00002-g8ca09d5fa354-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/02/2023
Workqueue: mld mld_ifc_work

Fixes: 8eb30be0352d ("ipv6: Create ip6_tnl_xmit")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230310191109.2384387-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-15 00:04:04 -07:00
Kristian Overskeid
28e8cabe80 net: hsr: Don't log netdev_err message on unknown prp dst node
If no frames has been exchanged with a node for HSR_NODE_FORGET_TIME, the
node will be deleted from the node_db list. If a frame is sent to the node
after it is deleted, a netdev_err message for each slave interface is
produced. This should not happen with dan nodes because of supervision
frames, but can happen often with san nodes, which clutters the kernel
log. Since the hsr protocol does not support sans, this is only relevant
for the prp protocol.

Signed-off-by: Kristian Overskeid <koverskeid@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-14 15:26:10 +00:00
Kristian Overskeid
4821c186b9 net: hsr: Don't log netdev_err message on unknown prp dst node
If no frames has been exchanged with a node for HSR_NODE_FORGET_TIME, the
node will be deleted from the node_db list. If a frame is sent to the node
after it is deleted, a netdev_err message for each slave interface is
produced. This should not happen with dan nodes because of supervision
frames, but can happen often with san nodes, which clutters the kernel
log. Since the hsr protocol does not support sans, this is only relevant
for the prp protocol.

Signed-off-by: Kristian Overskeid <koverskeid@gmail.com>
Link: https://lore.kernel.org/r/20230309092302.179586-1-koverskeid@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-13 16:15:58 -07:00
D. Wythe
22a825c541 net/smc: fix NULL sndbuf_desc in smc_cdc_tx_handler()
When performing a stress test on SMC-R by rmmod mlx5_ib driver
during the wrk/nginx test, we found that there is a probability
of triggering a panic while terminating all link groups.

This issue dues to the race between smc_smcr_terminate_all()
and smc_buf_create().

			smc_smcr_terminate_all

smc_buf_create
/* init */
conn->sndbuf_desc = NULL;
...

			__smc_lgr_terminate
				smc_conn_kill
					smc_close_abort
						smc_cdc_get_slot_and_msg_send

			__softirqentry_text_start
				smc_wr_tx_process_cqe
					smc_cdc_tx_handler
						READ(conn->sndbuf_desc->len);
						/* panic dues to NULL sndbuf_desc */

conn->sndbuf_desc = xxx;

This patch tries to fix the issue by always to check the sndbuf_desc
before send any cdc msg, to make sure that no null pointer is
seen during cqe processing.

Fixes: 0b29ec643613 ("net/smc: immediate termination for SMCR link groups")
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Link: https://lore.kernel.org/r/1678263432-17329-1-git-send-email-alibuda@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-13 16:03:58 -07:00
Vincenzo Palazzo
ad4bf5f240 net: socket: suppress unused warning
suppress unused warnings and fix the error that there is
with the W=1 enabled.

Warning generated

net/socket.c: In function ‘__sys_getsockopt’:
net/socket.c:2300:13: error: variable ‘max_optlen’ set but not used [-Werror=unused-but-set-variable]
 2300 |         int max_optlen;

Signed-off-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230310221851.304657-1-vincenzopalazzodev@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-13 15:59:20 -07:00
Hector Martin
79d1ed5ca7 wifi: cfg80211: Partial revert "wifi: cfg80211: Fix use after free for wext"
This reverts part of commit 015b8cc5e7c4 ("wifi: cfg80211: Fix use after
free for wext")

This commit broke WPA offload by unconditionally clearing the crypto
modes for non-WEP connections. Drop that part of the patch.

Signed-off-by: Hector Martin <marcan@marcan.st>
Reported-by: Ilya <me@0upti.me>
Reported-and-tested-by: Janne Grunau <j@jannau.net>
Reviewed-by: Eric Curtin <ecurtin@redhat.com>
Fixes: 015b8cc5e7c4 ("wifi: cfg80211: Fix use after free for wext")
Cc: stable@kernel.org
Link: https://lore.kernel.org/linux-wireless/ZAx0TWRBlGfv7pNl@kroah.com/T/#m11e6e0915ab8fa19ce8bc9695ab288c0fe018edf
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-03-12 16:21:59 -07:00
Jakub Kicinski
064d70527a Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

1) nft_parse_register_load() gets an incorrect datatype size
   as input, from Jeremy Sowden.

2) incorrect maximum netlink attribute in nft_redir, also
   from Jeremy.

* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nft_redir: correct value of inet type `.maxattrs`
  netfilter: nft_redir: correct length for loading protocol registers
  netfilter: nft_masq: correct length for loading protocol registers
  netfilter: nft_nat: correct length for loading protocol registers
====================

Link: https://lore.kernel.org/r/20230309174655.69816-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-10 21:45:03 -08:00
Paolo Abeni
cee4034a3d mptcp: fix lockdep false positive in mptcp_pm_nl_create_listen_socket()
Christoph reports a lockdep splat in the mptcp_subflow_create_socket()
error path, when such function is invoked by
mptcp_pm_nl_create_listen_socket().

Such code path acquires two separates, nested socket lock, with the
internal lock operation lacking the "nested" annotation. Adding that
in sock_release() for mptcp's sake only could be confusing.

Instead just add a new lockclass to the in-kernel msk socket,
re-initializing the lockdep infra after the socket creation.

Fixes: ad2171009d96 ("mptcp: fix locking for in-kernel listener creation")
Cc: stable@vger.kernel.org
Reported-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/354
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Tested-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-10 21:42:56 -08:00
Matthieu Baerts
3ba1452868 mptcp: avoid setting TCP_CLOSE state twice
tcp_set_state() is called from tcp_done() already.

There is then no need to first set the state to TCP_CLOSE, then call
tcp_done().

Fixes: d582484726c4 ("mptcp: fix fallback for MP_JOIN subflows")
Cc: stable@vger.kernel.org
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/362
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-10 21:42:56 -08:00
Geliang Tang
822467a48e mptcp: add ro_after_init for tcp{,v6}_prot_override
Add __ro_after_init labels for the variables tcp_prot_override and
tcpv6_prot_override, just like other variables adjacent to them, to
indicate that they are initialised from the init hooks and no writes
occur afterwards.

Fixes: b19bc2945b40 ("mptcp: implement delegated actions")
Cc: stable@vger.kernel.org
Fixes: 51fa7f8ebf0e ("mptcp: mark ops structures as ro_after_init")
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-10 21:42:56 -08:00
Paolo Abeni
0a3f4f1f9c mptcp: fix UaF in listener shutdown
As reported by Christoph after having refactored the passive
socket initialization, the mptcp listener shutdown path is prone
to an UaF issue.

  BUG: KASAN: use-after-free in _raw_spin_lock_bh+0x73/0xe0
  Write of size 4 at addr ffff88810cb23098 by task syz-executor731/1266

  CPU: 1 PID: 1266 Comm: syz-executor731 Not tainted 6.2.0-rc59af4eaa31c1f6c00c8f1e448ed99a45c66340dd5 #6
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
  Call Trace:
   <TASK>
   dump_stack_lvl+0x6e/0x91
   print_report+0x16a/0x46f
   kasan_report+0xad/0x130
   kasan_check_range+0x14a/0x1a0
   _raw_spin_lock_bh+0x73/0xe0
   subflow_error_report+0x6d/0x110
   sk_error_report+0x3b/0x190
   tcp_disconnect+0x138c/0x1aa0
   inet_child_forget+0x6f/0x2e0
   inet_csk_listen_stop+0x209/0x1060
   __mptcp_close_ssk+0x52d/0x610
   mptcp_destroy_common+0x165/0x640
   mptcp_destroy+0x13/0x80
   __mptcp_destroy_sock+0xe7/0x270
   __mptcp_close+0x70e/0x9b0
   mptcp_close+0x2b/0x150
   inet_release+0xe9/0x1f0
   __sock_release+0xd2/0x280
   sock_close+0x15/0x20
   __fput+0x252/0xa20
   task_work_run+0x169/0x250
   exit_to_user_mode_prepare+0x113/0x120
   syscall_exit_to_user_mode+0x1d/0x40
   do_syscall_64+0x48/0x90
   entry_SYSCALL_64_after_hwframe+0x72/0xdc

The msk grace period can legitly expire in between the last
reference count dropped in mptcp_subflow_queue_clean() and
the later eventual access in inet_csk_listen_stop()

After the previous patch we don't need anymore special-casing
msk listener socket cleanup: the mptcp worker will process each
of the unaccepted msk sockets.

Just drop the now unnecessary code.

Please note this commit depends on the two parent ones:

  mptcp: refactor passive socket initialization
  mptcp: use the workqueue to destroy unaccepted sockets

Fixes: 6aeed9045071 ("mptcp: fix race on unaccepted mptcp sockets")
Cc: stable@vger.kernel.org
Reported-and-tested-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/346
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-10 21:42:56 -08:00
Paolo Abeni
b6985b9b82 mptcp: use the workqueue to destroy unaccepted sockets
Christoph reported a UaF at token lookup time after having
refactored the passive socket initialization part:

  BUG: KASAN: use-after-free in __token_bucket_busy+0x253/0x260
  Read of size 4 at addr ffff88810698d5b0 by task syz-executor653/3198

  CPU: 1 PID: 3198 Comm: syz-executor653 Not tainted 6.2.0-rc59af4eaa31c1f6c00c8f1e448ed99a45c66340dd5 #6
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
  Call Trace:
   <TASK>
   dump_stack_lvl+0x6e/0x91
   print_report+0x16a/0x46f
   kasan_report+0xad/0x130
   __token_bucket_busy+0x253/0x260
   mptcp_token_new_connect+0x13d/0x490
   mptcp_connect+0x4ed/0x860
   __inet_stream_connect+0x80e/0xd90
   tcp_sendmsg_fastopen+0x3ce/0x710
   mptcp_sendmsg+0xff1/0x1a20
   inet_sendmsg+0x11d/0x140
   __sys_sendto+0x405/0x490
   __x64_sys_sendto+0xdc/0x1b0
   do_syscall_64+0x3b/0x90
   entry_SYSCALL_64_after_hwframe+0x72/0xdc

We need to properly clean-up all the paired MPTCP-level
resources and be sure to release the msk last, even when
the unaccepted subflow is destroyed by the TCP internals
via inet_child_forget().

We can re-use the existing MPTCP_WORK_CLOSE_SUBFLOW infra,
explicitly checking that for the critical scenario: the
closed subflow is the MPC one, the msk is not accepted and
eventually going through full cleanup.

With such change, __mptcp_destroy_sock() is always called
on msk sockets, even on accepted ones. We don't need anymore
to transiently drop one sk reference at msk clone time.

Please note this commit depends on the parent one:

  mptcp: refactor passive socket initialization

Fixes: 58b09919626b ("mptcp: create msk early")
Cc: stable@vger.kernel.org
Reported-and-tested-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/347
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-10 21:42:56 -08:00
Paolo Abeni
3a236aef28 mptcp: refactor passive socket initialization
After commit 30e51b923e43 ("mptcp: fix unreleased socket in accept queue")
unaccepted msk sockets go throu complete shutdown, we don't need anymore
to delay inserting the first subflow into the subflow lists.

The reference counting deserve some extra care, as __mptcp_close() is
unaware of the request socket linkage to the first subflow.

Please note that this is more a refactoring than a fix but because this
modification is needed to include other corrections, see the following
commits. Then a Fixes tag has been added here to help the stable team.

Fixes: 30e51b923e43 ("mptcp: fix unreleased socket in accept queue")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Tested-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-10 21:42:56 -08:00
Paolo Abeni
b7a679ba7c mptcp: fix possible deadlock in subflow_error_report
Christoph reported a possible deadlock while the TCP stack
destroys an unaccepted subflow due to an incoming reset: the
MPTCP socket error path tries to acquire the msk-level socket
lock while TCP still owns the listener socket accept queue
spinlock, and the reverse dependency already exists in the
TCP stack.

Note that the above is actually a lockdep false positive, as
the chain involves two separate sockets. A different per-socket
lockdep key will address the issue, but such a change will be
quite invasive.

Instead, we can simply stop earlier the socket error handling
for orphaned or unaccepted subflows, breaking the critical
lockdep chain. Error handling in such a scenario is a no-op.

Reported-and-tested-by: Christoph Paasch <cpaasch@apple.com>
Fixes: 15cc10453398 ("mptcp: deliver ssk errors to msk")
Cc: stable@vger.kernel.org
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/355
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-10 21:42:56 -08:00
Lorenzo Bianconi
f85949f982 xdp: add xdp_set_features_flag utility routine
Introduce xdp_set_features_flag utility routine in order to update
dynamically xdp_features according to the dynamic hw configuration via
ethtool (e.g. changing number of hw rx/tx queues).
Add xdp_clear_features_flag() in order to clear all xdp_feature flag.

Reviewed-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-10 21:33:47 -08:00
Jakub Kicinski
2af560e5a5 wireless-next patches for 6.4
Major changes:
 
 cfg80211
  * 6 GHz improvements
  * HW timestamping support
  * support for randomized auth/deauth TA for PASN privacy
    (also for mac80211)
 
 mac80211
  * radiotap TLV and EHT support for the iwlwifi sniffer
  * HW timestamping support
  * per-link debugfs for multi-link
 
 brcmfmac
  * support for Apple (M1 Pro/Max) devices
 
 iwlwifi
  * support for a few new devices
  * EHT sniffer support
 
 rtw88
  * better support for some SDIO devices
    (e.g. MAC address from efuse)
 
 rtw89
  * HW scan support for 8852b
  * better support for 6 GHz scanning
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAmQLG7oACgkQ10qiO8sP
 aAAqsBAAmiGDd9CDToWmXiyf3Qwv7eazNtqqz5LUKbSnmAXyyDDpHR28A8DlsEU0
 OTlrQK0yQr3R4sUhLVY+tL49gZvgqp0B9pj7NqnK3HIOniFdfpZdTXdFNdx3KuEs
 k1c/D+wGAIhVHP8csIKhh49KpGFa4U2YMXCUx4mNrUQKGhO+b/XbiMKS8jmoQAF9
 1GQk/KcGblBmuKFWE1euSScyBh2CvvcJq+F9f25eiX/+OgVXNYWWfAHfr3jjX7RG
 lSdlMvKvF4LImCW0vnhmFUeSSl+GpszNXwaqIh6OGLrw2CBZDcujGTT88cOw3A1t
 JFTB3EgeuFL3fDHQogWFxQ/RhfZB/tHeBYEQSx+1WizHKtJEMY2pvjuj9rZ7TEWy
 HHFfP1OaU1tbqMIkkhchAhOyciVJpk77Cl6yO0TcpRLt9VR1LvACMAd3/3eqfQGf
 gJxS8x7qWHp3pSWTNa3X8Q2NPatstKz/LUO0JMQP690/sRwdta9RwR11CYm0/Mdb
 22TwOSa6lfAlOF/l/YkEy39U8J1XNxPschkdAz4h001N88h8dAf6UZieDqqPYZFK
 4sYsGLqtACEU5W/Y5PG7v3/ZZ+QtEtfWU057tEtvHkVws2BKvRRQ3c0DBG5QTVG0
 4k9QU6mxKxL/R2Nkf6+rB8lvQkHEyXp6voF3LmqRlB/gnh1FKY0=
 =sPir
 -----END PGP SIGNATURE-----

Merge tag 'wireless-next-2023-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next

Johannes Berg says:

====================
wireless-next patches for 6.4

Major changes:

cfg80211
 * 6 GHz improvements
 * HW timestamping support
 * support for randomized auth/deauth TA for PASN privacy
   (also for mac80211)

mac80211
 * radiotap TLV and EHT support for the iwlwifi sniffer
 * HW timestamping support
 * per-link debugfs for multi-link

brcmfmac
 * support for Apple (M1 Pro/Max) devices

iwlwifi
 * support for a few new devices
 * EHT sniffer support

rtw88
 * better support for some SDIO devices
   (e.g. MAC address from efuse)

rtw89
 * HW scan support for 8852b
 * better support for 6 GHz scanning

* tag 'wireless-next-2023-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (84 commits)
  wifi: iwlwifi: mvm: fix EOF bit reporting
  wifi: iwlwifi: Do not include radiotap EHT user info if not needed
  wifi: iwlwifi: mvm: add EHT RU allocation to radiotap
  wifi: iwlwifi: Update logs for yoyo reset sw changes
  wifi: iwlwifi: mvm: clean up duplicated defines
  wifi: iwlwifi: rs-fw: break out for unsupported bandwidth
  wifi: iwlwifi: Add support for B step of BnJ-Fm4
  wifi: iwlwifi: mvm: make flush code a bit clearer
  wifi: iwlwifi: mvm: avoid UB shift of snif_queue
  wifi: iwlwifi: mvm: add primary 80 known for EHT radiotap
  wifi: iwlwifi: mvm: parse FW frame metadata for EHT sniffer mode
  wifi: iwlwifi: mvm: decode USIG_B1_B7 RU to nl80211 RU width
  wifi: iwlwifi: mvm: rename define to generic name
  wifi: iwlwifi: mvm: allow Microsoft to use TAS
  wifi: iwlwifi: mvm: add all EHT based on data0 info from HW
  wifi: iwlwifi: mvm: add EHT radiotap info based on rate_n_flags
  wifi: iwlwifi: mvm: add an helper function radiotap TLVs
  wifi: radiotap: separate vendor TLV into header/content
  wifi: iwlwifi: reduce verbosity of some logging events
  wifi: iwlwifi: Adding the code to get RF name for MsP device
  ...
====================

Link: https://lore.kernel.org/r/20230310120159.36518-1-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-10 18:22:29 -08:00
Jakub Kicinski
27c30b9b44 Just a few fixes:
* MLO connection socket ownership didn't work
  * basic rates validation was missing (reported by
    by a private syzbot instances)
  * puncturing bitmap netlink policy was completely broken
  * properly check chandef for NULL channel, it can be
    pointing to a chandef that's still uninitialized
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAmQLGEgACgkQ10qiO8sP
 aADrxBAAn7viIvzUegAZFsqgAfRsKGmGmwOOqg5Vph5oDRnxjfXTUw6hUf+CfFJp
 c4baHN0GaiJUNDCUM7KFWOicDDaqFZN8WX+t23mUaweXWHoqcH1S9IjiJCl/XtWu
 saI16+b062QCIQq3jZ3LGvpgZJXjBZbNDd8VW3eWK3nLdBzfHrDAWqx+TfY6dHHj
 XG9v3G7Q/IdT04HTNCLznrmptSJHy0FDekjtypK8uOrcUElWnmUf5SXzDXyva4Dl
 evU7xx5RY0tavqL2xQueOgtEgEBJQZWeQDrkZ4o/HDprsT9n6EObLDnVqJG2E+uO
 yqSxOR6hpkZfEjzyhmRJu1B4KzNWoxU2rzhNljsjxXNZEXDRTQJ803gSFlPaZ8Iq
 pXFKBIvuzY+7MIGrDZOQAqAnLYtrfVo7XJbhXUDYm5vmBn0ZwHgnQD0Z6X7E5rpC
 ukbBkNZ0NztZs4gUdYPzd/Uu/YxECMiLTlNzBcc29demR2peRBWH9preZH+NMGAq
 Dsq7WJZWVE7apKoyLJ9Fgi+F3h1clRqTns1Fy2dE4Fty6xyUEPGzZ9146ob39iLx
 ByDZp1MTCkZPfJokzDguupYeOUQavMfLgJqf2upvyBTD3wfWdDReRVDgEExwEvv7
 J+Lqp/7qGzynH3UpNtUoc7nQEevVIOB6oqbtQigXE+cvIbiy4sY=
 =HigV
 -----END PGP SIGNATURE-----

Merge tag 'wireless-2023-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless

Johannes Berg says:

====================
Just a few fixes:

 * MLO connection socket ownership didn't work
 * basic rates validation was missing (reported by
   by a private syzbot instances)
 * puncturing bitmap netlink policy was completely broken
 * properly check chandef for NULL channel, it can be
   pointing to a chandef that's still uninitialized

* tag 'wireless-2023-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
  wifi: cfg80211: fix MLO connection ownership
  wifi: mac80211: check basic rates validity
  wifi: nl80211: fix puncturing bitmap policy
  wifi: nl80211: fix NULL-ptr deref in offchan check
====================

Link: https://lore.kernel.org/r/20230310114647.35422-1-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-10 18:19:55 -08:00
Gal Pressman
3c6401266f skbuff: Add likely to skb pointer in build_skb()
Similarly to napi_build_skb(), it is likely the skb allocation in
build_skb() succeeded. frag_size != 0 is also likely, as stated in
__build_skb_around().

Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-10 16:49:21 -08:00
Gal Pressman
566b6701d5 skbuff: Replace open-coded skb_propagate_pfmemalloc()s
Use skb_propagate_pfmemalloc() in build_skb()/build_skb_around() instead
of open-coding it.

Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-10 16:49:21 -08:00
Linus Torvalds
92cadfcffa nfsd-6.3 fixes:
- Protect NFSD writes against filesystem freezing
 - Fix a potential memory leak during server shutdown
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmQLPvQACgkQM2qzM29m
 f5crgRAAiuZvYynBppTxeEZ9ITLRJCgeYhTCJDBFngh3NZtW0UxrTUfYIry/MJTm
 j520mvdzx8+OVyasUJCMoy+dY/1toeCaCspfEdyul47JRVrxuzJokbYzyV5fFZav
 q3vDNHdBXOkrJkXvJd+VtbdZa2iYBMIQ1pTc6vG2FbUaxfNqZyg7uKug/R2Q/Ipj
 tF587jXQuMvpPwlXOeFol3fhLQCO9KjTrZF/d4+m5+KTSQ4sgzrgoWuVD/39Is4Q
 bVx5hBX74PFNHaFMqviIjtIEFjrOfe6wVjdNgbmpzbEiUm9VGZ6zJ2k+LsnawMIw
 8W3178Yjp+SEb21X+b+pe7/efIqD01SOX36zUOETL9gOvX3aMHbTuDGERQlCBUdZ
 9pcUwWfzeGzQSsnEvEi05HlnrDQpeavAc4tVYWgiEvO2S8cngXgsuoGhUO2ov8qY
 scq4PQGclpmR6IMYDJLL6JbOvfBDIMoFHGbDbdkjpkkhoRHgtQgnRebOgB20bxom
 D3XPk39U2CbVX9UJqgPN0yvA8L4biG4RPNZZNOjFtOup6oggpzwgN2M9h1ZhuulG
 bTOjHWl9NJmSNIYUDzOyH7m98UCI1pEXcH1kkV7vjjAIAPbn3ovXgzMjSBqDVvNj
 tyqGujS4snRb5/NI8g6O+VyTbMSnimu+tZpUvKH+4oSBH9ZAUVI=
 =z6NN
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:

 - Protect NFSD writes against filesystem freezing

 - Fix a potential memory leak during server shutdown

* tag 'nfsd-6.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  SUNRPC: Fix a server shutdown leak
  NFSD: Protect against filesystem freezing
2023-03-10 08:45:30 -08:00
Johannes Berg
96c0695083 wifi: cfg80211: fix MLO connection ownership
When disconnecting from an MLO connection we need the AP
MLD address, not an arbitrary BSSID. Fix the code to do
that.

Fixes: 9ecff10e82a5 ("wifi: nl80211: refactor BSS lookup in nl80211_associate()")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230301115906.4c1b3b18980e.I008f070c7f3b8e8bde9278101ef9e40706a82902@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-03-10 11:47:25 +01:00
Johannes Berg
ce04abc3fc wifi: mac80211: check basic rates validity
When userspace sets basic rates, it might send us some rates
list that's empty or consists of invalid values only. We're
currently ignoring invalid values and then may end up with a
rates bitmap that's empty, which later results in a warning.

Reject the call if there were no valid rates.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-03-10 11:47:00 +01:00
Johannes Berg
b27f07c50a wifi: nl80211: fix puncturing bitmap policy
This was meant to be a u32, and while applying the patch
I tried to use policy validation for it. However, not only
did I copy/paste it to u8 instead of u32, but also used
the policy range erroneously. Fix both of these issues.

Fixes: d7c1a9a0ed18 ("wifi: nl80211: validate and configure puncturing bitmap")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-03-10 11:46:45 +01:00
Johannes Berg
f624bb6fad wifi: nl80211: fix NULL-ptr deref in offchan check
If, e.g. in AP mode, the link was already created by userspace
but not activated yet, it has a chandef but the chandef isn't
valid and has no channel. Check for this and ignore this link.

Fixes: 7b0a0e3c3a88 ("wifi: cfg80211: do some rework towards MLO link APIs")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230301115906.71bd4803fbb9.Iee39c0f6c2d3a59a8227674dc55d52e38b1090cf@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-03-10 11:46:22 +01:00
Jason Xing
fd9c31f834 udp: introduce __sk_mem_schedule() usage
Keep the accounting schema consistent across different protocols
with __sk_mem_schedule(). Besides, it adjusts a little bit on how
to calculate forward allocated memory compared to before. After
applied this patch, we could avoid receive path scheduling extra
amount of memory.

Link: https://lore.kernel.org/lkml/20230221110344.82818-1-kerneljasonxing@gmail.com/
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230308021153.99777-1-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-09 23:32:12 -08:00
Jakub Kicinski
d0928c1c5b Merge branch 'main' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Florian Westphal says:

====================
Netfilter updates for net-next

1. nf_tables 'brouting' support, from Sriram Yagnaraman.

2. Update bridge netfilter and ovs conntrack helpers to handle
   IPv6 Jumbo packets properly, i.e. fetch the packet length
   from hop-by-hop extension header, from Xin Long.

   This comes with a test BIG TCP test case, added to
   tools/testing/selftests/net/.

3. Fix spelling and indentation in conntrack, from Jeremy Sowden.

* 'main' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: nat: fix indentation of function arguments
  netfilter: conntrack: fix typo
  selftests: add a selftest for big tcp
  netfilter: use nf_ip6_check_hbh_len in nf_ct_skb_network_trim
  netfilter: move br_nf_check_hbh_len to utils
  netfilter: bridge: move pskb_trim_rcsum out of br_nf_check_hbh_len
  netfilter: bridge: check len before accessing more nh data
  netfilter: bridge: call pskb_may_pull in br_nf_check_hbh_len
  netfilter: bridge: introduce broute meta statement
====================

Link: https://lore.kernel.org/r/20230308193033.13965-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-09 23:28:53 -08:00
Leon Romanovsky
76b9bf965c neighbour: delete neigh_lookup_nodev as not used
neigh_lookup_nodev isn't used in the kernel after removal
of DECnet. So let's remove it.

Fixes: 1202cdd66531 ("Remove DECnet support from kernel")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/eb5656200d7964b2d177a36b77efa3c597d6d72d.1678267343.git.leonro@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-09 23:25:26 -08:00
Eric Dumazet
62423bd2d2 net: sched: remove qdisc_watchdog->last_expires
This field mirrors hrtimer softexpires, we can instead
use the existing helpers.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230308182648.1150762-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-09 23:24:14 -08:00
Breno Leitao
bced3f7db9 tcp: tcp_make_synack() can be called from process context
tcp_rtx_synack() now could be called in process context as explained in
0a375c822497 ("tcp: tcp_rtx_synack() can be called from process
context").

tcp_rtx_synack() might call tcp_make_synack(), which will touch per-CPU
variables with preemption enabled. This causes the following BUG:

    BUG: using __this_cpu_add() in preemptible [00000000] code: ThriftIO1/5464
    caller is tcp_make_synack+0x841/0xac0
    Call Trace:
     <TASK>
     dump_stack_lvl+0x10d/0x1a0
     check_preemption_disabled+0x104/0x110
     tcp_make_synack+0x841/0xac0
     tcp_v6_send_synack+0x5c/0x450
     tcp_rtx_synack+0xeb/0x1f0
     inet_rtx_syn_ack+0x34/0x60
     tcp_check_req+0x3af/0x9e0
     tcp_rcv_state_process+0x59b/0x2030
     tcp_v6_do_rcv+0x5f5/0x700
     release_sock+0x3a/0xf0
     tcp_sendmsg+0x33/0x40
     ____sys_sendmsg+0x2f2/0x490
     __sys_sendmsg+0x184/0x230
     do_syscall_64+0x3d/0x90

Avoid calling __TCP_INC_STATS() with will touch per-cpu variables. Use
TCP_INC_STATS() which is safe to be called from context switch.

Fixes: 8336886f786f ("tcp: TCP Fast Open Server - support TFO listeners")
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230308190745.780221-1-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-09 23:12:00 -08:00
Florian Westphal
6978052448 netlink: remove unused 'compare' function
No users in the tree.  Tested with allmodconfig build.

Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20230308142006.20879-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-09 23:11:09 -08:00
Nick Alcock
14296c7d72 mctp: remove MODULE_LICENSE in non-modules
Since commit 8b41fc4454e ("kbuild: create modules.builtin without
Makefile.modbuiltin or tristate.conf"), MODULE_LICENSE declarations
are used to identify modules. As a consequence, uses of the macro
in non-modules will cause modprobe to misidentify their containing
object file as a module when it is not (false positives), and modprobe
might succeed rather than failing with a suitable error message.

So remove it in the files in this commit, none of which can be built as
modules.

Signed-off-by: Nick Alcock <nick.alcock@oracle.com>
Suggested-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Hitomi Hasegawa <hasegawa-hitomi@fujitsu.com>
Cc: Jeremy Kerr <jk@codeconstruct.com.au>
Cc: Matt Johnston <matt@codeconstruct.com.au>
Link: https://lore.kernel.org/r/20230308121230.5354-2-nick.alcock@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-09 23:06:21 -08:00
Jakub Kicinski
d0ddf5065f Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Documentation/bpf/bpf_devel_QA.rst
  b7abcd9c656b ("bpf, doc: Link to submitting-patches.rst for general patch submission info")
  d56b0c461d19 ("bpf, docs: Fix link to netdev-FAQ target")
https://lore.kernel.org/all/20230307095812.236eb1be@canb.auug.org.au/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-09 22:22:11 -08:00
Linus Torvalds
44889ba56c Networking fixes for 6.3-rc2, including fixes from netfilter, bpf
Current release - regressions:
 
   - core: avoid skb end_offset change in __skb_unclone_keeptruesize()
 
   - sched:
     - act_connmark: handle errno on tcf_idr_check_alloc
     - flower: fix fl_change() error recovery path
 
   - ieee802154: prevent user from crashing the host
 
 Current release - new code bugs:
 
   - eth: bnxt_en: fix the double free during device removal
 
   - tools: ynl:
     - fix enum-as-flags in the generic CLI
     - fully inherit attrs in subsets
     - re-license uniformly under GPL-2.0 or BSD-3-clause
 
 Previous releases - regressions:
 
   - core: use indirect calls helpers for sk_exit_memory_pressure()
 
   - tls:
     - fix return value for async crypto
     - avoid hanging tasks on the tx_lock
 
   - eth: ice: copy last block omitted in ice_get_module_eeprom()
 
 Previous releases - always broken:
 
   - core: avoid double iput when sock_alloc_file fails
 
   - af_unix: fix struct pid leaks in OOB support
 
   - tls:
     - fix possible race condition
     - fix device-offloaded sendpage straddling records
 
   - bpf:
     - sockmap: fix an infinite loop error
     - test_run: fix &xdp_frame misplacement for LIVE_FRAMES
     - fix resolving BTF_KIND_VAR after ARRAY, STRUCT, UNION, PTR
 
   - netfilter: tproxy: fix deadlock due to missing BH disable
 
   - phylib: get rid of unnecessary locking
 
   - eth: bgmac: fix *initial* chip reset to support BCM5358
 
   - eth: nfp: fix csum for ipsec offload
 
   - eth: mtk_eth_soc: fix RX data corruption issue
 
 Misc:
 
   - usb: qmi_wwan: add telit 0x1080 composition
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmQJzQISHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOky5YP/04Dbsbeqpk0Q94axmjoaS0J/4rW49js
 RaA7v8ci7sL1omW8k5tILPXniAouN4YHNOCW1KbLBMR5O7lyn9qM1RteHOIpOmte
 TLzAw+6Wl7CyGiiqirv2GU96Wd/jZoZpPXFZz/gXP59GnkChSHzQcpexmz0nrmxI
 eCRSs+qm+re3wmDKTYm5C+g+420PNXu9JItPnTNf+nTkTBxpmOEMyry03I0taXKS
 wceQHB2q5E0sSWXDfkxG/pmUuYTj3AdRSQ+vo+FLevSs/LWeThs2I6pT5sn8XS76
 1S8Lh6FytfBhyalFmRtrpqIJYyGae5MwEXQ29ddfmF4OFFLedx3IH0+JFQxTE9So
 i4gaXmM5SUI7c5vhib097xUISoLxKqqXQVQQSQ1MPZRfXtVubbA2gCv+vh6fXVoj
 zQYatZOLM7KT9q4Pw8A+9bJPof/FV+ObC67pbGQbJJgBoy+oOixDuP+x5DYT384L
 /5XS+23OZiFe7bvQoE/0SQMeRk3lF2XkS5l9gSbdSnGQPiaOqKhDgkoCmdkn1jvg
 qtkBS6+tRRoOBNsjC4r4eFXBVOQ1+myyjZetBnEOaSp22FaTJFQh9qX3AMFIHbUy
 m0jDi9OJZSWHICd6KNWPm3JK43cMjiyZbGftYqOHhuY5HN30vQN6sl7DXIJ0rIcE
 myHMfizwqmGT
 =hSXM
 -----END PGP SIGNATURE-----

Merge tag 'net-6.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from netfilter and bpf.

  Current release - regressions:

   - core: avoid skb end_offset change in __skb_unclone_keeptruesize()

   - sched:
      - act_connmark: handle errno on tcf_idr_check_alloc
      - flower: fix fl_change() error recovery path

   - ieee802154: prevent user from crashing the host

  Current release - new code bugs:

   - eth: bnxt_en: fix the double free during device removal

   - tools: ynl:
      - fix enum-as-flags in the generic CLI
      - fully inherit attrs in subsets
      - re-license uniformly under GPL-2.0 or BSD-3-clause

  Previous releases - regressions:

   - core: use indirect calls helpers for sk_exit_memory_pressure()

   - tls:
      - fix return value for async crypto
      - avoid hanging tasks on the tx_lock

   - eth: ice: copy last block omitted in ice_get_module_eeprom()

  Previous releases - always broken:

   - core: avoid double iput when sock_alloc_file fails

   - af_unix: fix struct pid leaks in OOB support

   - tls:
      - fix possible race condition
      - fix device-offloaded sendpage straddling records

   - bpf:
      - sockmap: fix an infinite loop error
      - test_run: fix &xdp_frame misplacement for LIVE_FRAMES
      - fix resolving BTF_KIND_VAR after ARRAY, STRUCT, UNION, PTR

   - netfilter: tproxy: fix deadlock due to missing BH disable

   - phylib: get rid of unnecessary locking

   - eth: bgmac: fix *initial* chip reset to support BCM5358

   - eth: nfp: fix csum for ipsec offload

   - eth: mtk_eth_soc: fix RX data corruption issue

  Misc:

   - usb: qmi_wwan: add telit 0x1080 composition"

* tag 'net-6.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (64 commits)
  tools: ynl: fix enum-as-flags in the generic CLI
  tools: ynl: move the enum classes to shared code
  net: avoid double iput when sock_alloc_file fails
  af_unix: fix struct pid leaks in OOB support
  eth: fealnx: bring back this old driver
  net: dsa: mt7530: permit port 5 to work without port 6 on MT7621 SoC
  net: microchip: sparx5: fix deletion of existing DSCP mappings
  octeontx2-af: Unlock contexts in the queue context cache in case of fault detection
  net/smc: fix fallback failed while sendmsg with fastopen
  ynl: re-license uniformly under GPL-2.0 OR BSD-3-Clause
  mailmap: update entries for Stephen Hemminger
  mailmap: add entry for Maxim Mikityanskiy
  nfc: change order inside nfc_se_io error path
  ethernet: ice: avoid gcc-9 integer overflow warning
  ice: don't ignore return codes in VSI related code
  ice: Fix DSCP PFC TLV creation
  net: usb: qmi_wwan: add Telit 0x1080 composition
  net: usb: cdc_mbim: avoid altsetting toggling for Telit FE990
  netfilter: conntrack: adopt safer max chain length
  net: tls: fix device-offloaded sendpage straddling records
  ...
2023-03-09 10:56:58 -08:00
Xin Long
42d452e770 sctp: add weighted fair queueing stream scheduler
As it says in rfc8260#section-3.6 about the weighted fair queueing
scheduler:

   A Weighted Fair Queueing scheduler between the streams is used.  The
   weight is configurable per outgoing SCTP stream.  This scheduler
   considers the lengths of the messages of each stream and schedules
   them in a specific way to use the capacity according to the given
   weights.  If the weight of stream S1 is n times the weight of stream
   S2, the scheduler should assign to stream S1 n times the capacity it
   assigns to stream S2.  The details are implementation dependent.
   Interleaving user messages allows for a better realization of the
   capacity usage according to the given weights.

This patch adds Weighted Fair Queueing Scheduler actually based on
the code of Fair Capacity Scheduler by adding fc_weight into struct
sctp_stream_out_ext and taking it into account when sorting stream->
fc_list in sctp_sched_fc_sched() and sctp_sched_fc_dequeue_done().

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-03-09 11:31:44 +01:00
Xin Long
4821a076eb sctp: add fair capacity stream scheduler
As it says in rfc8260#section-3.5 about the fair capacity scheduler:

   A fair capacity distribution between the streams is used.  This
   scheduler considers the lengths of the messages of each stream and
   schedules them in a specific way to maintain an equal capacity for
   all streams.  The details are implementation dependent.  interleaving
   user messages allows for a better realization of the fair capacity
   usage.

This patch adds Fair Capacity Scheduler based on the foundations added
by commit 5bbbbe32a431 ("sctp: introduce stream scheduler foundations"):

A fc_list and a fc_length are added into struct sctp_stream_out_ext and
a fc_list is added into struct sctp_stream. In .enqueue, when there are
chunks enqueued into a stream, this stream will be linked into stream->
fc_list by its fc_list ordered by its fc_length. In .dequeue, it always
picks up the 1st skb from stream->fc_list. In .dequeue_done, fc_length
is increased by chunk's len and update its location in stream->fc_list
according to the its new fc_length.

Note that when the new fc_length overflows in .dequeue_done, instead of
resetting all fc_lengths to 0, we only reduced them by U32_MAX / 4 to
avoid a moment of imbalance in the scheduling, as Marcelo suggested.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-03-09 11:31:44 +01:00
Thadeu Lima de Souza Cascardo
649c15c769 net: avoid double iput when sock_alloc_file fails
When sock_alloc_file fails to allocate a file, it will call sock_release.
__sys_socket_file should then not call sock_release again, otherwise there
will be a double free.

[   89.319884] ------------[ cut here ]------------
[   89.320286] kernel BUG at fs/inode.c:1764!
[   89.320656] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[   89.321051] CPU: 7 PID: 125 Comm: iou-sqp-124 Not tainted 6.2.0+ #361
[   89.321535] RIP: 0010:iput+0x1ff/0x240
[   89.321808] Code: d1 83 e1 03 48 83 f9 02 75 09 48 81 fa 00 10 00 00 77 05 83 e2 01 75 1f 4c 89 ef e8 fb d2 ba 00 e9 80 fe ff ff c3 cc cc cc cc <0f> 0b 0f 0b e9 d0 fe ff ff 0f 0b eb 8d 49 8d b4 24 08 01 00 00 48
[   89.322760] RSP: 0018:ffffbdd60068bd50 EFLAGS: 00010202
[   89.323036] RAX: 0000000000000000 RBX: ffff9d7ad3cacac0 RCX: 0000000000001107
[   89.323412] RDX: 000000000003af00 RSI: 0000000000000000 RDI: ffff9d7ad3cacb40
[   89.323785] RBP: ffffbdd60068bd68 R08: ffffffffffffffff R09: ffffffffab606438
[   89.324157] R10: ffffffffacb3dfa0 R11: 6465686361657256 R12: ffff9d7ad3cacb40
[   89.324529] R13: 0000000080000001 R14: 0000000080000001 R15: 0000000000000002
[   89.324904] FS:  00007f7b28516740(0000) GS:ffff9d7aeb1c0000(0000) knlGS:0000000000000000
[   89.325328] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   89.325629] CR2: 00007f0af52e96c0 CR3: 0000000002a02006 CR4: 0000000000770ee0
[   89.326004] PKRU: 55555554
[   89.326161] Call Trace:
[   89.326298]  <TASK>
[   89.326419]  __sock_release+0xb5/0xc0
[   89.326632]  __sys_socket_file+0xb2/0xd0
[   89.326844]  io_socket+0x88/0x100
[   89.327039]  ? io_issue_sqe+0x6a/0x430
[   89.327258]  io_issue_sqe+0x67/0x430
[   89.327450]  io_submit_sqes+0x1fe/0x670
[   89.327661]  io_sq_thread+0x2e6/0x530
[   89.327859]  ? __pfx_autoremove_wake_function+0x10/0x10
[   89.328145]  ? __pfx_io_sq_thread+0x10/0x10
[   89.328367]  ret_from_fork+0x29/0x50
[   89.328576] RIP: 0033:0x0
[   89.328732] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
[   89.329073] RSP: 002b:0000000000000000 EFLAGS: 00000202 ORIG_RAX: 00000000000001a9
[   89.329477] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00007f7b28637a3d
[   89.329845] RDX: 00007fff4e4318a8 RSI: 00007fff4e4318b0 RDI: 0000000000000400
[   89.330216] RBP: 00007fff4e431830 R08: 00007fff4e431711 R09: 00007fff4e4318b0
[   89.330584] R10: 0000000000000000 R11: 0000000000000202 R12: 00007fff4e441b38
[   89.330950] R13: 0000563835e3e725 R14: 0000563835e40d10 R15: 00007f7b28784040
[   89.331318]  </TASK>
[   89.331441] Modules linked in:
[   89.331617] ---[ end trace 0000000000000000 ]---

Fixes: da214a475f8b ("net: add __sys_socket_file()")
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230307173707.468744-1-cascardo@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-08 23:26:51 -08:00
Eric Dumazet
2aab4b9690 af_unix: fix struct pid leaks in OOB support
syzbot reported struct pid leak [1].

Issue is that queue_oob() calls maybe_add_creds() which potentially
holds a reference on a pid.

But skb->destructor is not set (either directly or by calling
unix_scm_to_skb())

This means that subsequent kfree_skb() or consume_skb() would leak
this reference.

In this fix, I chose to fully support scm even for the OOB message.

[1]
BUG: memory leak
unreferenced object 0xffff8881053e7f80 (size 128):
comm "syz-executor242", pid 5066, jiffies 4294946079 (age 13.220s)
hex dump (first 32 bytes):
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff812ae26a>] alloc_pid+0x6a/0x560 kernel/pid.c:180
[<ffffffff812718df>] copy_process+0x169f/0x26c0 kernel/fork.c:2285
[<ffffffff81272b37>] kernel_clone+0xf7/0x610 kernel/fork.c:2684
[<ffffffff812730cc>] __do_sys_clone+0x7c/0xb0 kernel/fork.c:2825
[<ffffffff849ad699>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
[<ffffffff849ad699>] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
[<ffffffff84a0008b>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

Fixes: 314001f0bf92 ("af_unix: Add OOB support")
Reported-by: syzbot+7699d9e5635c10253a27@syzkaller.appspotmail.com
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Rao Shoaib <rao.shoaib@oracle.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230307164530.771896-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-08 23:26:03 -08:00
Jakub Kicinski
ed69e0667d Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Andrii Nakryiko says:

====================
pull-request: bpf-next 2023-03-08

We've added 23 non-merge commits during the last 2 day(s) which contain
a total of 28 files changed, 414 insertions(+), 104 deletions(-).

The main changes are:

1) Add more precise memory usage reporting for all BPF map types,
   from Yafang Shao.

2) Add ARM32 USDT support to libbpf, from Puranjay Mohan.

3) Fix BTF_ID_LIST size causing problems in !CONFIG_DEBUG_INFO_BTF,
   from Nathan Chancellor.

4) IMA selftests fix, from Roberto Sassu.

5) libbpf fix in APK support code, from Daniel Müller.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (23 commits)
  selftests/bpf: Fix IMA test
  libbpf: USDT arm arg parsing support
  libbpf: Refactor parse_usdt_arg() to re-use code
  libbpf: Fix theoretical u32 underflow in find_cd() function
  bpf: enforce all maps having memory usage callback
  bpf: offload map memory usage
  bpf, net: xskmap memory usage
  bpf, net: sock_map memory usage
  bpf, net: bpf_local_storage memory usage
  bpf: local_storage memory usage
  bpf: bpf_struct_ops memory usage
  bpf: queue_stack_maps memory usage
  bpf: devmap memory usage
  bpf: cpumap memory usage
  bpf: bloom_filter memory usage
  bpf: ringbuf memory usage
  bpf: reuseport_array memory usage
  bpf: stackmap memory usage
  bpf: arraymap memory usage
  bpf: hashtab memory usage
  ...
====================

Link: https://lore.kernel.org/r/20230308193533.1671597-1-andrii@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-08 14:34:22 -08:00
Benjamin Coddington
9ca6705d9d SUNRPC: Fix a server shutdown leak
Fix a race where kthread_stop() may prevent the threadfn from ever getting
called.  If that happens the svc_rqst will not be cleaned up.

Fixes: ed6473ddc704 ("NFSv4: Fix callback server shutdown")
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-03-08 08:46:41 -05:00
Jeremy Sowden
b0ca200077 netfilter: nat: fix indentation of function arguments
A couple of arguments to a function call are incorrectly indented.
Fix them.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-03-08 14:25:44 +01:00
Jeremy Sowden
e5d015a114 netfilter: conntrack: fix typo
There's a spelling mistake in a comment.  Fix it.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-03-08 14:25:43 +01:00
Xin Long
eaafdaa3e9 netfilter: use nf_ip6_check_hbh_len in nf_ct_skb_network_trim
For IPv6 Jumbo packets, the ipv6_hdr(skb)->payload_len is always 0,
and its real payload_len ( > 65535) is saved in hbh exthdr. With 0
length for the jumbo packets, all data and exthdr will be trimmed
in nf_ct_skb_network_trim().

This patch is to call nf_ip6_check_hbh_len() to get real pkt_len
of the IPv6 packet, similar to br_validate_ipv6().

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-03-08 14:25:41 +01:00
Xin Long
28e144cf5f netfilter: move br_nf_check_hbh_len to utils
Rename br_nf_check_hbh_len() to nf_ip6_check_hbh_len() and move it
to netfilter utils, so that it can be used by other modules, like
ovs and tc.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-03-08 14:25:40 +01:00
Xin Long
0b24bd71a6 netfilter: bridge: move pskb_trim_rcsum out of br_nf_check_hbh_len
br_nf_check_hbh_len() is a function to check the Hop-by-hop option
header, and shouldn't do pskb_trim_rcsum() there. This patch is to
pass pkt_len out to br_validate_ipv6() and do pskb_trim_rcsum()
after calling br_validate_ipv6() instead.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-03-08 14:25:40 +01:00
Xin Long
a7f1a2f43e netfilter: bridge: check len before accessing more nh data
In the while loop of br_nf_check_hbh_len(), similar to ip6_parse_tlv(),
before accessing 'nh[off + 1]', it should add a check 'len < 2'; and
before parsing IPV6_TLV_JUMBO, it should add a check 'optlen > len',
in case of overflows.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-03-08 14:25:39 +01:00
Xin Long
9ccff83b13 netfilter: bridge: call pskb_may_pull in br_nf_check_hbh_len
When checking Hop-by-hop option header, if the option data is in
nonlinear area, it should do pskb_may_pull instead of discarding
the skb as a bad IPv6 packet.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-03-08 14:25:38 +01:00
Eric Dumazet
1036908045 net: reclaim skb->scm_io_uring bit
Commit 0091bfc81741 ("io_uring/af_unix: defer registered
files gc to io_uring release") added one bit to struct sk_buff.

This structure is critical for networking, and we try very hard
to not add bloat on it, unless absolutely required.

For instance, we can use a specific destructor as a wrapper
around unix_destruct_scm(), to identify skbs that unix_gc()
has to special case.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Pavel Begunkov <asml.silence@gmail.com>
Cc: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Cc: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-08 13:21:47 +00:00
Sriram Yagnaraman
4386b92185 netfilter: bridge: introduce broute meta statement
nftables equivalent for ebtables -t broute.

Implement broute meta statement to set br_netfilter_broute flag
in skb to force a packet to be routed instead of being bridged.

Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-03-08 14:21:18 +01:00
D. Wythe
ce7ca79471 net/smc: fix fallback failed while sendmsg with fastopen
Before determining whether the msg has unsupported options, it has been
prematurely terminated by the wrong status check.

For the application, the general usages of MSG_FASTOPEN likes

fd = socket(...)
/* rather than connect */
sendto(fd, data, len, MSG_FASTOPEN)

Hence, We need to check the flag before state check, because the sock
state here is always SMC_INIT when applications tries MSG_FASTOPEN.
Once we found unsupported options, fallback it to TCP.

Fixes: ee9dfbef02d1 ("net/smc: handle sockopts forcing fallback")
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>

v2 -> v1: Optimize code style
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-08 13:00:55 +00:00
Jeremy Sowden
493924519b netfilter: nft_redir: correct value of inet type .maxattrs
`nft_redir_inet_type.maxattrs` was being set, presumably because of a
cut-and-paste error, to `NFTA_MASQ_MAX`, instead of `NFTA_REDIR_MAX`.

Fixes: 63ce3940f3ab ("netfilter: nft_redir: add inet support")
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-03-08 12:26:42 +01:00