linux

iv/linux

Author	SHA1	Message	Date
Martin Willi	2c50fc0475	netfilter: Use l3mdev flow key when re-routing mangled packets Commit 40867d74c374 ("net: Add l3mdev index to flow struct and avoid oif reset for port devices") introduces a flow key specific for layer 3 domains, such as a VRF master device. This allows for explicit VRF domain selection instead of abusing the oif flow key. Update ip[6]_route_me_harder() to make use of that new key when re-routing mangled packets within VRFs instead of setting the flow oif, making it consistent with other users. Signed-off-by: Martin Willi <martin@strongswan.org> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-05-16 13:03:29 +02:00
Felix Fietkau	2456074935	netfilter: nft_flow_offload: fix offload with pppoe + vlan When running a combination of PPPoE on top of a VLAN, we need to set info->outdev to the PPPoE device, otherwise PPPoE encap is skipped during software offload. Fixes: 72efd585f714 ("netfilter: flowtable: add pppoe support") Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-05-16 12:58:55 +02:00
Felix Fietkau	cf2df74e20	net: fix dev_fill_forward_path with pppoe + bridge When calling dev_fill_forward_path on a pppoe device, the provided destination address is invalid. In order for the bridge fdb lookup to succeed, the pppoe code needs to update ctx->daddr to the correct value. Fix this by storing the address inside struct net_device_path_ctx Fixes: f6efc675c9dd ("net: ppp: resolve forwarding path for bridge pppoe devices") Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-05-16 12:58:55 +02:00
Felix Fietkau	45ca3e6199	netfilter: nft_flow_offload: skip dst neigh lookup for ppp devices The dst entry does not contain a valid hardware address, so skip the lookup in order to avoid running into errors here. The proper hardware address is filled in from nft_dev_path_info Fixes: 72efd585f714 ("netfilter: flowtable: add pppoe support") Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-05-16 12:58:55 +02:00
Felix Fietkau	396ef64113	netfilter: flowtable: fix excessive hw offload attempts after failure If a flow cannot be offloaded, the code currently repeatedly tries again as quickly as possible, which can significantly increase system load. Fix this by limiting flow timeout update and hardware offload retry to once per second. Fixes: c07531c01d82 ("netfilter: flowtable: Remove redundant hw refresh bit") Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-05-16 12:58:55 +02:00
Paolo Abeni	4d42d54a7d	net/sched: act_pedit: sanitize shift argument before usage syzbot was able to trigger an Out-of-Bound on the pedit action: UBSAN: shift-out-of-bounds in net/sched/act_pedit.c:238:43 shift exponent 1400735974 is too large for 32-bit type 'unsigned int' CPU: 0 PID: 3606 Comm: syz-executor151 Not tainted 5.18.0-rc5-syzkaller-00165-g810c2f0a3f86 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 ubsan_epilogue+0xb/0x50 lib/ubsan.c:151 __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x187 lib/ubsan.c:322 tcf_pedit_init.cold+0x1a/0x1f net/sched/act_pedit.c:238 tcf_action_init_1+0x414/0x690 net/sched/act_api.c:1367 tcf_action_init+0x530/0x8d0 net/sched/act_api.c:1432 tcf_action_add+0xf9/0x480 net/sched/act_api.c:1956 tc_ctl_action+0x346/0x470 net/sched/act_api.c:2015 rtnetlink_rcv_msg+0x413/0xb80 net/core/rtnetlink.c:5993 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2502 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline] netlink_unicast+0x543/0x7f0 net/netlink/af_netlink.c:1345 netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1921 sock_sendmsg_nosec net/socket.c:705 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:725 ____sys_sendmsg+0x6e2/0x800 net/socket.c:2413 ___sys_sendmsg+0xf3/0x170 net/socket.c:2467 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2496 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7fe36e9e1b59 Code: 28 c3 e8 2a 14 00 00 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffef796fe88 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fe36e9e1b59 RDX: 0000000000000000 RSI: 0000000020000300 RDI: 0000000000000003 RBP: 00007fe36e9a5d00 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007fe36e9a5d90 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 </TASK> The 'shift' field is not validated, and any value above 31 will trigger out-of-bounds. The issue predates the git history, but syzbot was able to trigger it only after the commit mentioned in the fixes tag, and this change only applies on top of such commit. Address the issue bounding the 'shift' value to the maximum allowed by the relevant operator. Reported-and-tested-by: syzbot+8ed8fc4c57e9dcf23ca6@syzkaller.appspotmail.com Fixes: 8b796475fd78 ("net/sched: act_pedit: really ensure the skb is writable") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-05-16 11:48:30 +01:00
Eric Dumazet	9098765002	net: call skb_defer_free_flush() before each napi_poll() skb_defer_free_flush() can consume cpu cycles, it seems better to call it in the inner loop: - Potentially frees page/skb that will be reallocated while hot. - Account for the cpu cycles in the @time_limit determination. - Keep softnet_data.defer_count small to reduce chances for skb_attempt_defer_free() to send an IPI. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-05-16 11:33:59 +01:00
Eric Dumazet	39564c3fdc	net: add skb_defer_max sysctl commit 68822bdf76f1 ("net: generalize skb freeing deferral to per-cpu lists") added another per-cpu cache of skbs. It was expected to be small, and an IPI was forced whenever the list reached 128 skbs. We might need to be able to control more precisely queue capacity and added latency. An IPI is generated whenever queue reaches half capacity. Default value of the new limit is 64. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-05-16 11:33:59 +01:00
Eric Dumazet	2db60eed1a	net: use napi_consume_skb() in skb_defer_free_flush() skb_defer_free_flush() runs from softirq context, we have the opportunity to refill the napi_alloc_cache, and/or use kmem_cache_free_bulk() when this cache is full. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-05-16 11:33:59 +01:00
Eric Dumazet	97e719a82b	net: fix possible race in skb_attempt_defer_free() A cpu can observe sd->defer_count reaching 128, and call smp_call_function_single_async() Problem is that the remote CPU can clear sd->defer_count before the IPI is run/acknowledged. Other cpus can queue more packets and also decide to call smp_call_function_single_async() while the pending IPI was not yet delivered. This is a common issue with smp_call_function_single_async(). Callers must ensure correct synchronization and serialization. I triggered this issue while experimenting smaller threshold. Performing the call to smp_call_function_single_async() under sd->defer_lock protection did not solve the problem. Commit 5a18ceca6350 ("smp: Allow smp_call_function_single_async() to insert locked csd") replaced an informative WARN_ON_ONCE() with a return of -EBUSY, which is often ignored. Test of CSD_FLAG_LOCK presence is racy anyway. Fixes: 68822bdf76f1 ("net: generalize skb freeing deferral to per-cpu lists") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-05-16 11:33:59 +01:00
Menglong Dong	f8319dfd1b	net: tcp: reset 'drop_reason' to NOT_SPCIFIED in tcp_v{4,6}_rcv() The 'drop_reason' that passed to kfree_skb_reason() in tcp_v4_rcv() and tcp_v6_rcv() can be SKB_NOT_DROPPED_YET(0), as it is used as the return value of tcp_inbound_md5_hash(). And it can panic the kernel with NULL pointer in net_dm_packet_report_size() if the reason is 0, as drop_reasons[0] is NULL. Fixes: 1330b6ef3313 ("skb: make drop reason booleanable") Reviewed-by: Jiang Biao <benbjiang@tencent.com> Reviewed-by: Hao Peng <flyingpeng@tencent.com> Signed-off-by: Menglong Dong <imagedong@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-05-16 10:47:44 +01:00
Menglong Dong	20bbcd0a94	net: skb: check the boundrary of drop reason in kfree_skb_reason() Sometimes, we may forget to reset skb drop reason to NOT_SPECIFIED after we make it the return value of the functions with return type of enum skb_drop_reason, such as tcp_inbound_md5_hash. Therefore, its value can be SKB_NOT_DROPPED_YET(0), which is invalid for kfree_skb_reason(). So we check the range of drop reason in kfree_skb_reason() with DEBUG_NET_WARN_ON_ONCE(). Reviewed-by: Jiang Biao <benbjiang@tencent.com> Reviewed-by: Hao Peng <flyingpeng@tencent.com> Signed-off-by: Menglong Dong <imagedong@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-05-16 10:47:43 +01:00
Menglong Dong	a3af33abd9	net: dm: check the boundary of skb drop reasons The 'reason' will be set to 'SKB_DROP_REASON_NOT_SPECIFIED' if it not small that SKB_DROP_REASON_MAX in net_dm_packet_trace_kfree_skb_hit(), but it can't avoid it to be 0, which is invalid and can cause NULL pointer in drop_reasons. Therefore, reset it to SKB_DROP_REASON_NOT_SPECIFIED when 'reason <= 0'. Reviewed-by: Jiang Biao <benbjiang@tencent.com> Reviewed-by: Hao Peng <flyingpeng@tencent.com> Signed-off-by: Menglong Dong <imagedong@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-05-16 10:47:43 +01:00
Guangguan Wang	3aba103006	net/smc: align the connect behaviour with TCP Connect with O_NONBLOCK will not be completed immediately and returns -EINPROGRESS. It is possible to use selector/poll for completion by selecting the socket for writing. After select indicates writability, a second connect function call will return 0 to indicate connected successfully as TCP does, but smc returns -EISCONN. Use socket state for smc to indicate connect state, which can help smc aligning the connect behaviour with TCP. Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> Acked-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-05-16 10:40:07 +01:00
Eric Dumazet	eda090c31f	inet: rename INET_MATCH() This is no longer a macro, but an inlined function. INET_MATCH() -> inet_match() Signed-off-by: Eric Dumazet <edumazet@google.com> Suggested-by: Olivier Hartkopp <socketcan@hartkopp.net> Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-05-16 10:31:06 +01:00
Eric Dumazet	5d368f0328	ipv6: add READ_ONCE(sk->sk_bound_dev_if) in INET6_MATCH() INET6_MATCH() runs without holding a lock on the socket. We probably need to annotate most reads. This patch makes INET6_MATCH() an inline function to ease our changes. v2: inline function only defined if IS_ENABLED(CONFIG_IPV6) Change the name to inet6_match(), this is no longer a macro. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-05-16 10:31:06 +01:00
Eric Dumazet	ff0094030f	l2tp: use add READ_ONCE() to fetch sk->sk_bound_dev_if Use READ_ONCE() in paths not holding the socket lock. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-05-16 10:31:06 +01:00
Eric Dumazet	70f87de9fa	net_sched: em_meta: add READ_ONCE() in var_sk_bound_if() sk->sk_bound_dev_if can change under us, use READ_ONCE() annotation. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-05-16 10:31:06 +01:00
Eric Dumazet	d2c135619c	inet: add READ_ONCE(sk->sk_bound_dev_if) in inet_csk_bind_conflict() inet_csk_bind_conflict() can access sk->sk_bound_dev_if for unlocked sockets. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-05-16 10:31:06 +01:00
Eric Dumazet	36f7cec4f3	dccp: use READ_ONCE() to read sk->sk_bound_dev_if When reading listener sk->sk_bound_dev_if locklessly, we must use READ_ONCE(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-05-16 10:31:06 +01:00
Eric Dumazet	e5fccaa1eb	net: core: add READ_ONCE/WRITE_ONCE annotations for sk->sk_bound_dev_if sock_bindtoindex_locked() needs to use WRITE_ONCE(sk->sk_bound_dev_if, val), because other cpus/threads might locklessly read this field. sock_getbindtodevice(), sock_getsockopt() need READ_ONCE() because they run without socket lock held. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-05-16 10:31:06 +01:00
Eric Dumazet	a20ea29807	sctp: read sk->sk_bound_dev_if once in sctp_rcv() sctp_rcv() reads sk->sk_bound_dev_if twice while the socket is not locked. Another cpu could change this field under us. Fixes: 0fd9a65a76e8 ("[SCTP] Support SO_BINDTODEVICE socket option on incoming packets.") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-05-16 10:31:06 +01:00
Eric Dumazet	4c971d2f35	net: annotate races around sk->sk_bound_dev_if UDP sendmsg() is lockless, and reads sk->sk_bound_dev_if while this field can be changed by another thread. Adds minimal annotations to avoid KCSAN splats for UDP. Following patches will add more annotations to potential lockless readers. BUG: KCSAN: data-race in __ip6_datagram_connect / udpv6_sendmsg write to 0xffff888136d47a94 of 4 bytes by task 7681 on cpu 0: __ip6_datagram_connect+0x6e2/0x930 net/ipv6/datagram.c:221 ip6_datagram_connect+0x2a/0x40 net/ipv6/datagram.c:272 inet_dgram_connect+0x107/0x190 net/ipv4/af_inet.c:576 __sys_connect_file net/socket.c:1900 [inline] __sys_connect+0x197/0x1b0 net/socket.c:1917 __do_sys_connect net/socket.c:1927 [inline] __se_sys_connect net/socket.c:1924 [inline] __x64_sys_connect+0x3d/0x50 net/socket.c:1924 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x2b/0x50 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae read to 0xffff888136d47a94 of 4 bytes by task 7670 on cpu 1: udpv6_sendmsg+0xc60/0x16e0 net/ipv6/udp.c:1436 inet6_sendmsg+0x5f/0x80 net/ipv6/af_inet6.c:652 sock_sendmsg_nosec net/socket.c:705 [inline] sock_sendmsg net/socket.c:725 [inline] ____sys_sendmsg+0x39a/0x510 net/socket.c:2413 ___sys_sendmsg net/socket.c:2467 [inline] __sys_sendmmsg+0x267/0x4c0 net/socket.c:2553 __do_sys_sendmmsg net/socket.c:2582 [inline] __se_sys_sendmmsg net/socket.c:2579 [inline] __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2579 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x2b/0x50 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae value changed: 0x00000000 -> 0xffffff9b Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 7670 Comm: syz-executor.3 Tainted: G W 5.18.0-rc1-syzkaller-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 I chose to not add Fixes: tag because race has minor consequences and stable teams busy enough. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-05-16 10:31:05 +01:00
Coco Li	80e425b613	ipv6: Add hop-by-hop header to jumbograms in ip6_output Instead of simply forcing a 0 payload_len in IPv6 header, implement RFC 2675 and insert a custom extension header. Note that only TCP stack is currently potentially generating jumbograms, and that this extension header is purely local, it wont be sent on a physical link. This is needed so that packet capture (tcpdump and friends) can properly dissect these large packets. Signed-off-by: Coco Li <lixiaoyan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-05-16 10:18:56 +01:00
Alexander Duyck	0fe79f28bf	net: allow gro_max_size to exceed 65536 Allow the gro_max_size to exceed a value larger than 65536. There weren't really any external limitations that prevented this other than the fact that IPv4 only supports a 16 bit length field. Since we have the option of adding a hop-by-hop header for IPv6 we can allow IPv6 to exceed this value and for IPv4 and non-TCP flows we can cap things at 65536 via a constant rather than relying on gro_max_size. [edumazet] limit GRO_MAX_SIZE to (8 * 65535) to avoid overflows. Signed-off-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-05-16 10:18:56 +01:00
Eric Dumazet	81fbc81213	ipv6/gro: insert temporary HBH/jumbo header Following patch will add GRO_IPV6_MAX_SIZE, allowing gro to build BIG TCP ipv6 packets (bigger than 64K). This patch changes ipv6_gro_complete() to insert a HBH/jumbo header so that resulting packet can go through IPv6/TCP stacks. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-05-16 10:18:56 +01:00
Eric Dumazet	09f3d1a3a5	ipv6/gso: remove temporary HBH/jumbo header ipv6 tcp and gro stacks will soon be able to build big TCP packets, with an added temporary Hop By Hop header. If GSO is involved for these large packets, we need to remove the temporary HBH header before segmentation happens. v2: perform HBH removal from ipv6_gso_segment() instead of skb_segment() (Alexander feedback) Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-05-16 10:18:56 +01:00
Eric Dumazet	9957b38b5e	tcp_cubic: make hystart_ack_delay() aware of BIG TCP hystart_ack_delay() had the assumption that a TSO packet would not be bigger than GSO_MAX_SIZE. This will no longer be true. We should use sk->sk_gso_max_size instead. This reduces chances of spurious Hystart ACK train detections. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-05-16 10:18:56 +01:00
Alexander Duyck	7c4e983c4f	net: allow gso_max_size to exceed 65536 The code for gso_max_size was added originally to allow for debugging and workaround of buggy devices that couldn't support TSO with blocks 64K in size. The original reason for limiting it to 64K was because that was the existing limits of IPv4 and non-jumbogram IPv6 length fields. With the addition of Big TCP we can remove this limit and allow the value to potentially go up to UINT_MAX and instead be limited by the tso_max_size value. So in order to support this we need to go through and clean up the remaining users of the gso_max_size value so that the values will cap at 64K for non-TCPv6 flows. In addition we can clean up the GSO_MAX_SIZE value so that 64K becomes GSO_LEGACY_MAX_SIZE and UINT_MAX will now be the upper limit for GSO_MAX_SIZE. v6: (edumazet) fixed a compile error if CONFIG_IPV6=n, in a new sk_trim_gso_size() helper. netif_set_tso_max_size() caps the requested TSO size with GSO_MAX_SIZE. Signed-off-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-05-16 10:18:55 +01:00
Eric Dumazet	89527be8d8	net: add IFLA_TSO_{MAX_SIZE|SEGS} attributes New netlink attributes IFLA_TSO_MAX_SIZE and IFLA_TSO_MAX_SEGS are used to report to user-space the device TSO limits. ip -d link sh dev eth1 ... tso_max_size 65536 tso_max_segs 65535 Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-05-16 10:18:55 +01:00
David S. Miller	1a01a07517	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next This is v2 including deadlock fix in conntrack ecache rework reported by Jakub Kicinski. The following patchset contains Netfilter updates for net-next, mostly updates to conntrack from Florian Westphal. 1) Add a dedicated list for conntrack event redelivery. 2) Include event redelivery list in conntrack dumps of dying type. 3) Remove per-cpu dying list for event redelivery, not used anymore. 4) Add netns .pre_exit to cttimeout to zap timeout objects before synchronize_rcu() call. 5) Remove nf_ct_unconfirmed_destroy. 6) Add generation id for conntrack extensions for conntrack timeout and helpers. 7) Detach timeout policy from conntrack on cttimeout module removal. 8) Remove __nf_ct_unconfirmed_destroy. 9) Remove unconfirmed list. 10) Remove unconditional local_bh_disable in init_conntrack(). 11) Consolidate conntrack iterator nf_ct_iterate_cleanup(). 12) Detect if ctnetlink listeners exist to short-circuit event path early. 13) Un-inline nf_ct_ecache_ext_add(). 14) Add nf_conntrack_events autodetect ctnetlink listener mode and make it default. 15) Add nf_ct_ecache_exist() to check for event cache extension. 16) Extend flowtable reverse route lookup to include source, iif, tos and mark, from Sven Auhagen. 17) Do not verify zero checksum UDP packets in nf_reject, from Kevin Mitchell. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-05-16 10:10:37 +01:00
Jonas Jelonek	569cf386ec	mac80211: minstrel_ht: support ieee80211_rate_status This patch adds support for the new struct ieee80211_rate_status and its annotation in struct ieee80211_tx_status in minstrel_ht. In minstrel_ht_tx_status, a check for the presence of instances of the new struct in ieee80211_tx_status is added. Based on this, minstrel_ht then gets and updates internal rate stats with either struct ieee80211_rate_status or ieee80211_tx_info->status.rates. Adjusted variants of minstrel_ht_txstat_valid, minstrel_ht_get_stats, minstrel_{ht/vht}_get_group_idx are added which use struct ieee80211_rate_status and struct rate_info instead of the legacy structs. struct rate_info from cfg80211.h does not provide whether short preamble was used for the transmission. So we retrieve this information from VIF and STA configuration and cache it in a new flag in struct minstrel_ht_sta per rate control instance. Compile-Tested: current wireless-next tree with all flags on Tested-on: Xiaomi 4A Gigabit (MediaTek MT7603E, MT7612E) with OpenWrt Linux 5.10.113 Signed-off-by: Jonas Jelonek <jelonek.jonas@gmail.com> Link: https://lore.kernel.org/r/20220509173958.1398201-3-jelonek.jonas@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-05-16 10:07:58 +02:00
Jonas Jelonek	44fa75f207	mac80211: extend current rate control tx status API This patch adds the new struct ieee80211_rate_status and replaces 'struct rate_info rate' in ieee80211_tx_status with pointer and length annotation. The struct ieee80211_rate_status allows to: (1) receive tx power status feedback for transmit power control (TPC) per packet or packet retry (2) dynamic mapping of wifi chip specific multi-rate retry (mrr) chains with different lengths (3) increase the limit of annotatable rate indices to support IEEE802.11ac rate sets and beyond ieee80211_tx_info, control and status buffer, and ieee80211_tx_rate cannot be used to achieve these goals due to fixed size limitations. Our new struct contains a struct rate_info to annotate the rate that was used, retry count of the rate and tx power. It is intended for all information related to RC and TPC that needs to be passed from driver to mac80211 and its RC/TPC algorithms like Minstrel_HT. It corresponds to one stage in an mrr. Multiple subsequent instances of this struct can be included in struct ieee80211_tx_status via a pointer and a length variable. Those instances can be allocated on-stack. The former reference to a single instance of struct rate_info is replaced with our new annotation. An extension is introduced to struct ieee80211_hw. There are two new members called 'tx_power_levels' and 'max_txpwr_levels_idx' acting as a tx power level table. When a wifi device is registered, the driver shall supply all supported power levels in this list. This allows to support several quirks like differing power steps in power level ranges or alike. TPC can use this for algorithm and thus be designed more abstract instead of handling all possible step widths individually. Further mandatory changes in status.c, mt76 and ath11k drivers due to the removal of 'struct rate_info rate' are also included. status.c already uses the information in ieee80211_tx_status->rate in radiotap, this is now changed to use ieee80211_rate_status->rate_idx. mt76 driver already uses struct rate_info to pass the tx rate to status path. The new members of the ieee80211_tx_status are set to NULL and 0 because the previously passed rate is not relevant to rate control and accurate information is passed via tx_info->status.rates. For ath11k, the txrate can be passed via this struct because ath11k uses firmware RC and thus the information does not interfere with software RC. Compile-Tested: current wireless-next tree with all flags on Tested-on: Xiaomi 4A Gigabit (MediaTek MT7603E, MT7612E) with OpenWrt Linux 5.10.113 Signed-off-by: Jonas Jelonek <jelonek.jonas@gmail.com> Link: https://lore.kernel.org/r/20220509173958.1398201-2-jelonek.jonas@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-05-16 10:05:02 +02:00
Peter Seiderer	ee0e16ab75	mac80211: minstrel_ht: fill all requested rates Fill all requested rates (in case of ath9k 4 rate slots are available, so fill all 4 instead of only 3), improves throughput in noisy environment. Signed-off-by: Peter Seiderer <ps.report@gmx.net> Link: https://lore.kernel.org/r/20220402153014.31332-2-ps.report@gmx.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-05-16 10:03:39 +02:00
Lavanya Suresh	195b9a0fd5	mac80211: disable BSS color collision detection in case of no free colors AP may run out of BSS color after color collision detection event from driver. Disable BSS color collision detection if no free colors are available based on bss color disabled bit sent as a part of NL80211_ATTR_HE_BSS_COLOR attribute sent in NL80211_CMD_SET_BEACON. It can be reenabled once new color is available. Signed-off-by: Lavanya Suresh <lavaks@codeaurora.org> Signed-off-by: Rameshkumar Sundaram <quic_ramess@quicinc.com> Link: https://lore.kernel.org/r/1649867295-7204-3-git-send-email-quic_ramess@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-05-16 09:46:30 +02:00
Rameshkumar Sundaram	3d48cb7481	nl80211: Parse NL80211_ATTR_HE_BSS_COLOR as a part of nl80211_parse_beacon NL80211_ATTR_HE_BSS_COLOR attribute can be included in both NL80211_CMD_START_AP and NL80211_CMD_SET_BEACON commands. Move he_bss_color from cfg80211_ap_settings to cfg80211_beacon_data and parse NL80211_ATTR_HE_BSS_COLOR as a part of nl80211_parse_beacon() to have bss color settings parsed for both start ap and set beacon commands. Add a new flag he_bss_color_valid to indicate whether NL80211_ATTR_HE_BSS_COLOR attribute is included. Signed-off-by: Rameshkumar Sundaram <quic_ramess@quicinc.com> Link: https://lore.kernel.org/r/1649867295-7204-2-git-send-email-quic_ramess@quicinc.com [fix build ...] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-05-16 09:45:21 +02:00
Eyal Birger	e6175a2ed1	xfrm: fix "disable_policy" flag use when arriving from different devices In IPv4 setting the "disable_policy" flag on a device means no policy should be enforced for traffic originating from the device. This was implemented by seting the DST_NOPOLICY flag in the dst based on the originating device. However, dsts are cached in nexthops regardless of the originating devices, in which case, the DST_NOPOLICY flag value may be incorrect. Consider the following setup: +------------------------------+ | ROUTER | +-------------+ | +-----------------+ | | ipsec src |----|-|ipsec0 | | +-------------+ | |disable_policy=0 | +----+ | | +-----------------+ |eth1|-|----- +-------------+ | +-----------------+ +----+ | | noipsec src |----|-|eth0 | | +-------------+ | |disable_policy=1 | | | +-----------------+ | +------------------------------+ Where ROUTER has a default route towards eth1. dst entries for traffic arriving from eth0 would have DST_NOPOLICY and would be cached and therefore can be reused by traffic originating from ipsec0, skipping policy check. Fix by setting a IPSKB_NOPOLICY flag in IPCB and observing it instead of the DST in IN/FWD IPv4 policy checks. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2022-05-16 09:31:26 +02:00
Johannes Berg	5dfad10812	mac80211: mlme: track assoc_bss/associated separately We currently track whether we're associated and which the BSS is in the same variable (ifmgd->associated), but for MLD we'll need to move the BSS pointer to be per link, while the question whether we're associated or not is for the whole interface. Add ifmgd->assoc_bss that stores the pointer and change ifmgd->associated to be just a bool, so the question of whether we're associated can continue working after MLD rework, without requiring changes, while the BSS pointer will have to be changed/used checked per link. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-05-16 09:16:20 +02:00
Johannes Berg	16d0364c72	mac80211: remove useless bssid copy We don't need to copy this locally, we now only use the variable to print before doing other things. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-05-16 09:15:19 +02:00
Johannes Berg	53da4c45ca	mac80211: remove unused argument to ieee80211_sta_connection_lost() We never use the bssid argument to ieee80211_sta_connection_lost() so we might as well just remove it. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-05-16 09:15:04 +02:00
Johannes Berg	926101d2b7	mac80211: mlme: use local SSID copy There's no need to look it up from the ifmgd->associated BSS configuration, we already maintain a local copy since commit b0140fda626e ("mac80211: mlme: save ssid info to ieee80211_bss_conf while assoc"). Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-05-16 09:14:51 +02:00
Johannes Berg	c8fe4b0b37	mac80211: use ifmgd->bssid instead of ifmgd->associated->bssid Since we always track the BSSID there when we get associated, these are equivalent, but ifmgd->bssid saves a dereference and thus makes the code a bit smaller, and more readable. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-05-16 09:13:22 +02:00
Johannes Berg	f344c58c25	mac80211: mlme: move in RSSI reporting code This code is tightly coupled to the sdata->u.mgd data structure, so there's no reason for it to be in utils. Move it to mlme.c. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-05-16 09:12:34 +02:00
Johannes Berg	97f7a47024	mac80211: unify CCMP/GCMP AAD construction Ping-Ke's previous patch adjusted the CCMP AAD construction to properly take the order bit into account, but failed to update the (identical) GCMP AAD construction as well. Unify the AAD construction between the two cases. Reported-by: Jouni Malinen <j@w1.fi> Link: https://lore.kernel.org/r/20220506105150.51d66e2a6f3c.I65f12be82c112365169e8a9f48c7a71300e814b9@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-05-16 09:10:38 +02:00
Paolo Abeni	95d6865178	mptcp: fix subflow accounting on close If the PM closes a fully established MPJ subflow or the subflow creation errors out in its early stage, the subflows counter is not bumped accordingly. This change adds the missing accounting, additionally taking care of updating the 'accept_subflow' flag accordingly. Fixes: a88c9e496937 ("mptcp: do not block subflows creation on errors") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-05-13 17:04:30 -07:00
Linus Torvalds	6dd5884d1d	NFS client bugfixes for Linux 5.18 Highlights include: Stable fixes: - SUNRPC: Ensure that the gssproxy client can start in a connected state Bugfixes: - Revert "SUNRPC: Ensure gss-proxy connects on setup" - nfs: fix broken handling of the softreval mount option -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAmJ+j5kACgkQZwvnipYK APIQag/+JXtDt9xo0SsGTMJ2PJArnRoyd3QcZjJtoabaZTylZyILZ20sxvt/jgby miOKrI+bbsykbrwQijRLF/Yys1G+iMBzSWy4lpJ9eH4AVkfjr5qqauRbPobo/ZiI i5fDLR84FlzKYWBY1Nv6YE5VREukIlXbrq7KWe/HoS7/hSAamhkUv+a0M8iNLGO4 QtH7+M0iBZTI9yM4gFAcMAANV21SxvjqP1z62kbCp00qO2mL0PgF/2pxC9WgX24/ EZX037ykzKFkjgWfzT8+/eIfCQGIPi/9e6Ir4Bc99psVFOYd1YxkTLBycNwm1VOz 5RLORbURDVMPQH2/qZ57u7B0gJF76UZM4pH3gv+i9nFhoUqf3kFZAOy48MZEFz3X sPiQZLck63mvO9Bd3QX6pFZc0datiYmhuXRknjxV6Bz/Y41y3NzeJeX4r88s4q7l tisZDmaIm0Q+H07QOTL0aCk456amP6XLnO1+nu/PSR/3ImwJSaOpypHx/BxDJUTG TvpY1ouBZ8irfT6JrbfVnSdbedIZx4c+6btVw6mlT40edZF6M3r+3s8s2TU7x+co uiBMB4Qj/C19zqcf/DziiL1PZEJPm3lk0fBHc6JIWCV4I3eYxQ5J4nFJsy8/jPm1 lTgreoHWfnYC3WhlxUk+N3+9X/9+iFEJUxqGYfANf8GFaGqcYVQ= =/Yg+ -----END PGP SIGNATURE----- Merge tag 'nfs-for-5.18-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client bugfixes from Trond Myklebust: "One more pull request. There was a bug in the fix to ensure that gss- proxy continues to work correctly after we fixed the AF_LOCAL socket leak in the RPC code. This therefore reverts that broken patch, and replaces it with one that works correctly. Stable fixes: - SUNRPC: Ensure that the gssproxy client can start in a connected state Bugfixes: - Revert "SUNRPC: Ensure gss-proxy connects on setup" - nfs: fix broken handling of the softreval mount option" * tag 'nfs-for-5.18-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: nfs: fix broken handling of the softreval mount option SUNRPC: Ensure that the gssproxy client can start in a connected state Revert "SUNRPC: Ensure gss-proxy connects on setup"	2022-05-13 11:04:37 -07:00
Jakub Kicinski	2c5f153647	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2022-05-13 1) Cleanups for the code behind the XFRM offload API. This is a preparation for the extension of the API for policy offload. From Leon Romanovsky. * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next: xfrm: drop not needed flags variable in XFRM offload struct net/mlx5e: Use XFRM state direction instead of flags netdevsim: rely on XFRM state direction instead of flags ixgbe: propagate XFRM offload state direction instead of flags xfrm: store and rely on direction to construct offload flags xfrm: rename xfrm_state_offload struct to allow reuse xfrm: delete not used number of external headers xfrm: free not used XFRM_ESP_NO_TRAILER flag ==================== Link: https://lore.kernel.org/r/20220513151218.4010119-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-05-13 10:25:08 -07:00
Kevin Mitchell	4f9bd53084	netfilter: conntrack: skip verification of zero UDP checksum The checksum is optional for UDP packets. However nf_reject would previously require a valid checksum to elicit a response such as ICMP_DEST_UNREACH. Add some logic to nf_reject_verify_csum to determine if a UDP packet has a zero checksum and should therefore not be verified. Signed-off-by: Kevin Mitchell <kevmitch@arista.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-05-13 18:56:28 +02:00
Sven Auhagen	3412e16418	netfilter: flowtable: nft_flow_route use more data for reverse route When creating a flow table entry, the reverse route is looked up based on the current packet. There can be scenarios where the user creates a custom ip rule to route the traffic differently. In order to support those scenarios, the lookup needs to add more information based on the current packet. The patch adds several new pieces of information to the route lookup. Signed-off-by: Sven Auhagen <sven.auhagen@voleatech.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-05-13 18:56:28 +02:00
Florian Westphal	90d1daa458	netfilter: conntrack: add nf_conntrack_events autodetect mode This adds the new nf_conntrack_events=2 mode and makes it the default. This leverages the earlier flag in struct net to allow avoiding the event extension as long as no event listener is active in the namespace. This avoids, for most cases, allocation of ct->ext area. A followup patch will take further advantage of this by avoiding calls down into the event framework if the extension isn't present. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-05-13 18:56:28 +02:00

69469 Commits