linux

iv/linux

Author	SHA1	Message	Date
Kuniyuki Iwashima	8702cf12e6	tcp: Fix listen() warning with v4-mapped-v6 address. syzbot reported a warning [0] introduced by commit c48ef9c4aed3 ("tcp: Fix bind() regression for v4-mapped-v6 non-wildcard address."). After the cited commit, a v4 socket's address matches the corresponding v4-mapped-v6 tb2 in inet_bind2_bucket_match_addr(), not vice versa. During X.X.X.X -> ::ffff:X.X.X.X order bind()s, the second bind() uses bhash and conflicts properly without checking bhash2 so that we need not check if a v4-mapped-v6 sk matches the corresponding v4 address tb2 in inet_bind2_bucket_match_addr(). However, the repro shows that we need to check that in a no-conflict case. The repro bind()s two sockets to the 2-tuples using SO_REUSEPORT and calls listen() for the first socket: from socket import * s1 = socket() s1.setsockopt(SOL_SOCKET, SO_REUSEPORT, 1) s1.bind(('127.0.0.1', 0)) s2 = socket(AF_INET6) s2.setsockopt(SOL_SOCKET, SO_REUSEPORT, 1) s2.bind(('::ffff:127.0.0.1', s1.getsockname()[1])) s1.listen() The second socket should belong to the first socket's tb2, but the second bind() creates another tb2 bucket because inet_bind2_bucket_find() returns NULL in inet_csk_get_port() as the v4-mapped-v6 sk does not match the corresponding v4 address tb2. bhash2[] -> tb2(::ffff:X.X.X.X) -> tb2(X.X.X.X) Then, listen() for the first socket calls inet_csk_get_port(), where the v4 address matches the v4-mapped-v6 tb2 and WARN_ON() is triggered. To avoid that, we need to check if v4-mapped-v6 sk address matches with the corresponding v4 address tb2 in inet_bind2_bucket_match(). The same checks are needed in inet_bind2_bucket_addr_match() too, so we can move all checks there and call it from inet_bind2_bucket_match(). Note that now tb->family is just an address family of tb->(v6_)?rcv_saddr and not of sockets in the bucket. This could be refactored later by defining tb->rcv_saddr as tb->v6_rcv_saddr.s6_addr32[3] and prepending ::ffff: when creating v4 tb2. [0]: WARNING: CPU: 0 PID: 5049 at net/ipv4/inet_connection_sock.c:587 inet_csk_get_port+0xf96/0x2350 net/ipv4/inet_connection_sock.c:587 Modules linked in: CPU: 0 PID: 5049 Comm: syz-executor288 Not tainted 6.6.0-rc2-syzkaller-00018-g2cf0f7156238 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/04/2023 RIP: 0010:inet_csk_get_port+0xf96/0x2350 net/ipv4/inet_connection_sock.c:587 Code: 7c 24 08 e8 4c b6 8a 01 31 d2 be 88 01 00 00 48 c7 c7 e0 94 ae 8b e8 59 2e a3 f8 2e 2e 2e 31 c0 e9 04 fe ff ff e8 ca 88 d0 f8 <0f> 0b e9 0f f9 ff ff e8 be 88 d0 f8 49 8d 7e 48 e8 65 ca 5a 00 31 RSP: 0018:ffffc90003abfbf0 EFLAGS: 00010293 RAX: 0000000000000000 RBX: ffff888026429100 RCX: 0000000000000000 RDX: ffff88807edcbb80 RSI: ffffffff88b73d66 RDI: ffff888026c49f38 RBP: ffff888026c49f30 R08: 0000000000000005 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff9260f200 R13: ffff888026c49880 R14: 0000000000000000 R15: ffff888026429100 FS: 00005555557d5380(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000045ad50 CR3: 0000000025754000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> inet_csk_listen_start+0x155/0x360 net/ipv4/inet_connection_sock.c:1256 __inet_listen_sk+0x1b8/0x5c0 net/ipv4/af_inet.c:217 inet_listen+0x93/0xd0 net/ipv4/af_inet.c:239 __sys_listen+0x194/0x270 net/socket.c:1866 __do_sys_listen net/socket.c:1875 [inline] __se_sys_listen net/socket.c:1873 [inline] __x64_sys_listen+0x53/0x80 net/socket.c:1873 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f3a5bce3af9 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 c1 17 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffc1a1c79e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000032 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3a5bce3af9 RDX: 00007f3a5bce3af9 RSI: 0000000000000000 RDI: 0000000000000003 RBP: 00007f3a5bd565f0 R08: 0000000000000006 R09: 0000000000000006 R10: 0000000000000006 R11: 0000000000000246 R12: 0000000000000001 R13: 431bde82d7b634db R14: 0000000000000001 R15: 0000000000000001 </TASK> Fixes: c48ef9c4aed3 ("tcp: Fix bind() regression for v4-mapped-v6 non-wildcard address.") Reported-by: syzbot+71e724675ba3958edb31@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=71e724675ba3958edb31 Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20231010013814.70571-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-10-13 10:01:49 -07:00
Heng Guo	cf8b49fbd0	net: fix IPSTATS_MIB_OUTFORWDATAGRAMS increment after fragment check Reproduce environment: network with 3 VM linuxs is connected as below: VM1<---->VM2(latest kernel 6.5.0-rc7)<---->VM3 VM1: eth0 ip: 192.168.122.207 MTU 1800 VM2: eth0 ip: 192.168.122.208, eth1 ip: 192.168.123.224 MTU 1500 VM3: eth0 ip: 192.168.123.240 MTU 1800 Reproduce: VM1 send 1600 bytes UDP data to VM3 using tools scapy with flags='DF'. scapy command: send(IP(dst="192.168.123.240",flags='DF')/UDP()/str('0'*1600),count=1, inter=1.000000) Result: Before IP data is sent. ---------------------------------------------------------------------- root@qemux86-64:~# cat /proc/net/snmp Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests OutDiscards OutNoRoutes ReasmTimeout ReasmReqdss Ip: 1 64 6 0 2 2 0 0 2 4 0 0 0 0 0 0 0 0 0 ...... root@qemux86-64:~# ---------------------------------------------------------------------- After IP data is sent. ---------------------------------------------------------------------- root@qemux86-64:~# cat /proc/net/snmp Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests OutDiscards OutNoRoutes ReasmTimeout ReasmReqdss Ip: 1 64 7 0 2 2 0 0 2 5 0 0 0 0 0 0 0 1 0 ...... root@qemux86-64:~# ---------------------------------------------------------------------- ForwDatagrams is always keeping 2 without increment. Issue description and patch: ip_exceeds_mtu() in ip_forward() drops this IP datagram because skb len (1600 sending by scapy) is over MTU(1500 in VM2) if "DF" is set. According to RFC 4293 "3.2.3. IP Statistics Tables", +-------+------>------+----->-----+----->-----+ \| InForwDatagrams (6) \| OutForwDatagrams (6) \| \| V +->-+ OutFragReqds \| InNoRoutes \| \| (packets) / (local packet (3) \| \| \| IF is that of the address \| +--> OutFragFails \| and may not be the receiving IF) \| \| (packets) the IPSTATS_MIB_OUTFORWDATAGRAMS should be counted before fragment check. The existing implementation, instead, would incease the counter after fragment check: ip_exceeds_mtu() in ipv4 and ip6_pkt_too_big() in ipv6. So do patch to move IPSTATS_MIB_OUTFORWDATAGRAMS counter to ip_forward() for ipv4 and ip6_forward() for ipv6. Test result with patch: Before IP data is sent. ---------------------------------------------------------------------- root@qemux86-64:~# cat /proc/net/snmp Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests OutDiscards OutNoRoutes ReasmTimeout ReasmReqdss Ip: 1 64 6 0 2 2 0 0 2 4 0 0 0 0 0 0 0 0 0 ...... root@qemux86-64:~# ---------------------------------------------------------------------- After IP data is sent. ---------------------------------------------------------------------- root@qemux86-64:~# cat /proc/net/snmp Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests OutDiscards OutNoRoutes ReasmTimeout ReasmReqdss Ip: 1 64 7 0 2 3 0 0 2 5 0 0 0 0 0 0 0 1 0 ...... root@qemux86-64:~# ---------------------------------------------------------------------- ForwDatagrams is updated from 2 to 3. Reviewed-by: Filip Pudak <filip.pudak@windriver.com> Signed-off-by: Heng Guo <heng.guo@windriver.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20231011015137.27262-1-heng.guo@windriver.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-10-13 09:58:45 -07:00
Sabrina Dubroca	9f0c824551	tls: use fixed size for tls_offload_context_{tx,rx}.driver_state driver_state is a flex array, but is always allocated by the tls core to a fixed size (TLS_DRIVER_STATE_SIZE_{TX,RX}). Simplify the code by making that size explicit so that sizeof(struct tls_offload_context_{tx,rx}) works. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-13 11:26:10 +01:00
Sabrina Dubroca	1cf7fbcee6	tls: validate crypto_info in a separate helper Simplify do_tls_setsockopt_conf a bit. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-13 11:26:10 +01:00
Sabrina Dubroca	4f48669918	tls: remove tls_context argument from tls_set_device_offload It's not really needed since we end up refetching it as tls_ctx. We can also remove the NULL check, since we have already dereferenced ctx in do_tls_setsockopt_conf. While at it, fix up the reverse xmas tree ordering. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-13 11:26:10 +01:00
Sabrina Dubroca	b6a30ec923	tls: remove tls_context argument from tls_set_sw_offload It's not really needed since we end up refetching it as tls_ctx. We can also remove the NULL check, since we have already dereferenced ctx in do_tls_setsockopt_conf. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-13 11:26:10 +01:00
Sabrina Dubroca	0137407999	tls: add a helper to allocate/initialize offload_ctx_tx Simplify tls_set_device_offload a bit. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-13 11:26:10 +01:00
Sabrina Dubroca	1a074f7618	tls: also use init_prot_info in tls_set_device_offload Most values are shared. Nonce size turns out to be equal to IV size for all offloadable ciphers. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-13 11:26:10 +01:00
Sabrina Dubroca	a9937816ed	tls: move tls_prot_info initialization out of tls_set_sw_offload Simplify tls_set_sw_offload, and allow reuse for the tls_device code. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-13 11:26:10 +01:00
Sabrina Dubroca	615580cbc9	tls: extract context alloc/initialization out of tls_set_sw_offload Simplify tls_set_sw_offload a bit. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-13 11:26:10 +01:00
Sabrina Dubroca	1c1cb3110d	tls: store iv directly within cipher_context TLS_MAX_IV_SIZE + TLS_MAX_SALT_SIZE is 20B, we don't get much benefit in cipher_context's size and can simplify the init code a bit. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-13 11:26:10 +01:00
Sabrina Dubroca	bee6b7b307	tls: rename MAX_IV_SIZE to TLS_MAX_IV_SIZE It's defined in include/net/tls.h, avoid using an overly generic name. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-13 11:26:09 +01:00
Sabrina Dubroca	6d5029e547	tls: store rec_seq directly within cipher_context TLS_MAX_REC_SEQ_SIZE is 8B, we don't get anything by using kmalloc. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-13 11:26:09 +01:00
Sabrina Dubroca	8f1d532b4a	tls: drop unnecessary cipher_type checks in tls offload We should never reach tls_device_reencrypt, tls_enc_record, or tls_enc_skb with a cipher_type that can't be offloaded. Replace those checks with a DEBUG_NET_WARN_ON_ONCE, and use cipher_desc instead of hard-coding offloadable cipher types. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-13 11:26:09 +01:00
Sabrina Dubroca	3bab3ee0f9	tls: get salt using crypto_info_salt in tls_enc_skb I skipped this conversion in my previous series. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-13 11:26:09 +01:00
Amit Cohen	38985e8c27	net: Handle bulk delete policy in bridge driver The merge commit 92716869375b ("Merge branch 'br-flush-filtering'") added support for FDB flushing in bridge driver. The following patches will extend VXLAN driver to support FDB flushing as well. The netlink message for bulk delete is shared between the drivers. With the existing implementation, there is no way to prevent user from flushing with attributes that are not supported per driver. For example, when VNI will be added, user will not get an error for flush FDB entries in bridge with VNI, although this attribute is not relevant for bridge. As preparation for support of FDB flush in VXLAN driver, move the policy to be handled in bridge driver, later a new policy for VXLAN will be added in VXLAN driver. Do not pass 'vid' as part of ndo_fdb_del_bulk(), as this field is relevant only for bridge. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-13 10:00:30 +01:00
Eric Dumazet	de5724ca38	xfrm: fix a data-race in xfrm_lookup_with_ifid() syzbot complains about a race in xfrm_lookup_with_ifid() [1] When preparing commit 0a9e5794b21e ("xfrm: annotate data-race around use_time") I thought xfrm_lookup_with_ifid() was modifying a still private structure. [1] BUG: KCSAN: data-race in xfrm_lookup_with_ifid / xfrm_lookup_with_ifid write to 0xffff88813ea41108 of 8 bytes by task 8150 on cpu 1: xfrm_lookup_with_ifid+0xce7/0x12d0 net/xfrm/xfrm_policy.c:3218 xfrm_lookup net/xfrm/xfrm_policy.c:3270 [inline] xfrm_lookup_route+0x3b/0x100 net/xfrm/xfrm_policy.c:3281 ip6_dst_lookup_flow+0x98/0xc0 net/ipv6/ip6_output.c:1246 send6+0x241/0x3c0 drivers/net/wireguard/socket.c:139 wg_socket_send_skb_to_peer+0xbd/0x130 drivers/net/wireguard/socket.c:178 wg_socket_send_buffer_to_peer+0xd6/0x100 drivers/net/wireguard/socket.c:200 wg_packet_send_handshake_initiation drivers/net/wireguard/send.c:40 [inline] wg_packet_handshake_send_worker+0x10c/0x150 drivers/net/wireguard/send.c:51 process_one_work kernel/workqueue.c:2630 [inline] process_scheduled_works+0x5b8/0xa30 kernel/workqueue.c:2703 worker_thread+0x525/0x730 kernel/workqueue.c:2784 kthread+0x1d7/0x210 kernel/kthread.c:388 ret_from_fork+0x48/0x60 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304 write to 0xffff88813ea41108 of 8 bytes by task 15867 on cpu 0: xfrm_lookup_with_ifid+0xce7/0x12d0 net/xfrm/xfrm_policy.c:3218 xfrm_lookup net/xfrm/xfrm_policy.c:3270 [inline] xfrm_lookup_route+0x3b/0x100 net/xfrm/xfrm_policy.c:3281 ip6_dst_lookup_flow+0x98/0xc0 net/ipv6/ip6_output.c:1246 send6+0x241/0x3c0 drivers/net/wireguard/socket.c:139 wg_socket_send_skb_to_peer+0xbd/0x130 drivers/net/wireguard/socket.c:178 wg_socket_send_buffer_to_peer+0xd6/0x100 drivers/net/wireguard/socket.c:200 wg_packet_send_handshake_initiation drivers/net/wireguard/send.c:40 [inline] wg_packet_handshake_send_worker+0x10c/0x150 drivers/net/wireguard/send.c:51 process_one_work kernel/workqueue.c:2630 [inline] process_scheduled_works+0x5b8/0xa30 kernel/workqueue.c:2703 worker_thread+0x525/0x730 kernel/workqueue.c:2784 kthread+0x1d7/0x210 kernel/kthread.c:388 ret_from_fork+0x48/0x60 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304 value changed: 0x00000000651cd9d1 -> 0x00000000651cd9d2 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 15867 Comm: kworker/u4:58 Not tainted 6.6.0-rc4-syzkaller-00016-g5e62ed3b1c8a #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/06/2023 Workqueue: wg-kex-wg2 wg_packet_handshake_send_worker Fixes: 0a9e5794b21e ("xfrm: annotate data-race around use_time") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2023-10-13 07:57:27 +02:00
Jakub Kicinski	0e6bb5b7f4	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: kernel/bpf/verifier.c 829955981c55 ("bpf: Fix verifier log for async callback return values") a923819fb2c5 ("bpf: Treat first argument as return value for bpf_throw") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-10-12 17:07:34 -07:00
Florian Westphal	2f0968a030	net: gso_test: fix build with gcc-12 and earlier gcc 12 errors out with: net/core/gso_test.c:58:48: error: initializer element is not constant 58 \| .segs = (const unsigned int[]) { gso_size }, This version isn't old (2022), so switch to preprocessor-bsaed constant instead of 'static const int'. Cc: Willem de Bruijn <willemb@google.com> Reported-by: Tasmiya Nalatwad <tasmiya@linux.vnet.ibm.com> Closes: https://lore.kernel.org/netdev/79fbe35c-4dd1-4f27-acb2-7a60794bc348@linux.vnet.ibm.com/ Fixes: 1b4fa28a8b07 ("net: parametrize skb_segment unit test to expand coverage") Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20231012120901.10765-1-fw@strlen.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-10-12 15:34:08 +02:00
Florian Westphal	d351c1ea2d	netfilter: nft_payload: fix wrong mac header matching mcast packets get looped back to the local machine. Such packets have a 0-length mac header, we should treat this like "mac header not set" and abort rule evaluation. As-is, we just copy data from the network header instead. Fixes: 96518518cc41 ("netfilter: add nftables") Reported-by: Blažej Krajňák <krajnak@levonet.sk> Signed-off-by: Florian Westphal <fw@strlen.de>	2023-10-12 10:28:45 +02:00
Xingyuan Mo	505ce0630a	nf_tables: fix NULL pointer dereference in nft_expr_inner_parse() We should check whether the NFTA_EXPR_NAME netlink attribute is present before accessing it, otherwise a null pointer deference error will occur. Call Trace: <TASK> dump_stack_lvl+0x4f/0x90 print_report+0x3f0/0x620 kasan_report+0xcd/0x110 __asan_load2+0x7d/0xa0 nla_strcmp+0x2f/0x90 __nft_expr_type_get+0x41/0xb0 nft_expr_inner_parse+0xe3/0x200 nft_inner_init+0x1be/0x2e0 nf_tables_newrule+0x813/0x1230 nfnetlink_rcv_batch+0xec3/0x1170 nfnetlink_rcv+0x1e4/0x220 netlink_unicast+0x34e/0x4b0 netlink_sendmsg+0x45c/0x7e0 __sys_sendto+0x355/0x370 __x64_sys_sendto+0x84/0xa0 do_syscall_64+0x3f/0x90 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 Fixes: 3a07327d10a0 ("netfilter: nft_inner: support for inner tunnel header matching") Signed-off-by: Xingyuan Mo <hdthky0@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de>	2023-10-12 10:28:45 +02:00
Xingyuan Mo	52177bbf19	nf_tables: fix NULL pointer dereference in nft_inner_init() We should check whether the NFTA_INNER_NUM netlink attribute is present before accessing it, otherwise a null pointer deference error will occur. Call Trace: dump_stack_lvl+0x4f/0x90 print_report+0x3f0/0x620 kasan_report+0xcd/0x110 __asan_load4+0x84/0xa0 nft_inner_init+0x128/0x2e0 nf_tables_newrule+0x813/0x1230 nfnetlink_rcv_batch+0xec3/0x1170 nfnetlink_rcv+0x1e4/0x220 netlink_unicast+0x34e/0x4b0 netlink_sendmsg+0x45c/0x7e0 __sys_sendto+0x355/0x370 __x64_sys_sendto+0x84/0xa0 do_syscall_64+0x3f/0x90 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 Fixes: 3a07327d10a0 ("netfilter: nft_inner: support for inner tunnel header matching") Signed-off-by: Xingyuan Mo <hdthky0@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de>	2023-10-12 10:28:45 +02:00
Pablo Neira Ayuso	4c90bba60c	netfilter: nf_tables: do not refresh timeout when resetting element The dump and reset command should not refresh the timeout, this command is intended to allow users to list existing stateful objects and reset them, element expiration should be refresh via transaction instead with a specific command to achieve this, otherwise this is entering combo semantics that will be hard to be undone later (eg. a user asking to retrieve counters but _not_ requiring to refresh expiration). Fixes: 079cd633219d ("netfilter: nf_tables: Introduce NFT_MSG_GETSETELEM_RESET") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>	2023-10-12 10:28:45 +02:00
Kees Cook	d51c42cdef	netfilter: nf_tables: Annotate struct nft_pipapo_match with __counted_by Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct nft_pipapo_match. Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Jozsef Kadlecsik <kadlec@netfilter.org> Cc: Florian Westphal <fw@strlen.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: netfilter-devel@vger.kernel.org Cc: coreteam@netfilter.org Cc: netdev@vger.kernel.org Link: https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci [1] Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Florian Westphal <fw@strlen.de>	2023-10-12 10:28:45 +02:00
Florian Westphal	2e1d175410	netfilter: nfnetlink_log: silence bogus compiler warning net/netfilter/nfnetlink_log.c:800:18: warning: variable 'ctinfo' is uninitialized The warning is bogus, the variable is only used if ct is non-NULL and always initialised in that case. Init to 0 too to silence this. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202309100514.ndBFebXN-lkp@intel.com/ Signed-off-by: Florian Westphal <fw@strlen.de>	2023-10-12 10:28:45 +02:00
Pablo Neira Ayuso	ebd032fa88	netfilter: nf_tables: do not remove elements if set backend implements .abort pipapo set backend maintains two copies of the datastructure, removing the elements from the copy that is going to be discarded slows down the abort path significantly, from several minutes to few seconds after this patch. Fixes: 212ed75dc5fb ("netfilter: nf_tables: integrate pipapo into commit protocol") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>	2023-10-12 10:28:45 +02:00
Jeremy Cline	354a6e707e	nfc: nci: assert requested protocol is valid The protocol is used in a bit mask to determine if the protocol is supported. Assert the provided protocol is less than the maximum defined so it doesn't potentially perform a shift-out-of-bounds and provide a clearer error for undefined protocols vs unsupported ones. Fixes: 6a2968aaf50c ("NFC: basic NCI protocol implementation") Reported-and-tested-by: syzbot+0839b78e119aae1fec78@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=0839b78e119aae1fec78 Signed-off-by: Jeremy Cline <jeremy@jcline.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20231009200054.82557-1-jeremy@jcline.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-10-12 09:32:10 +02:00
Kuniyuki Iwashima	e2bca4870f	af_packet: Fix fortified memcpy() without flex array. Sergei Trofimovich reported a regression [0] caused by commit a0ade8404c3b ("af_packet: Fix warning of fortified memcpy() in packet_getname()."). It introduced a flex array sll_addr_flex in struct sockaddr_ll as a union-ed member with sll_addr to work around the fortified memcpy() check. However, a userspace program uses a struct that has struct sockaddr_ll in the middle, where a flex array is illegal to exist. include/linux/if_packet.h:24:17: error: flexible array member 'sockaddr_ll::<unnamed union>::<unnamed struct>::sll_addr_flex' not at end of 'struct packet_info_t' 24 \| __DECLARE_FLEX_ARRAY(unsigned char, sll_addr_flex); \| ^~~~~~~~~~~~~~~~~~~~ To fix the regression, let's go back to the first attempt [1] telling memcpy() the actual size of the array. Reported-by: Sergei Trofimovich <slyich@gmail.com> Closes: https://github.com/NixOS/nixpkgs/pull/252587#issuecomment-1741733002 [0] Link: https://lore.kernel.org/netdev/20230720004410.87588-3-kuniyu@amazon.com/ [1] Fixes: a0ade8404c3b ("af_packet: Fix warning of fortified memcpy() in packet_getname().") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20231009153151.75688-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-10-12 09:15:15 +02:00
Jakub Kicinski	21b2e2624d	netfilter net-next pull request 2023-10-10 -----BEGIN PGP SIGNATURE----- iQJBBAABCAArFiEEgKkgxbID4Gn1hq6fcJGo2a1f9gAFAmUlYWENHGZ3QHN0cmxl bi5kZQAKCRBwkajZrV/2AC68D/0WnxpQHiCMl7rUWrDJjmhYenUY1uM9VZb1fpd7 hH543aVbJmWNQj9uzWmKceuLKtIV5jCgsgzqHcxq5e0oHLc3IjIOX1j6L6Dtv7Wf 8obzfCZOYvp0rfaB/2W8d+ehet69i8UQlWM6uKVD/vEedAynfu8AaCgaNy6WwgrK ElXa3lZzflfoeqvwZkndxIkeuoXQB4mYCcuIXuseB+pYv5pz2th2OsGNxeeae0Q0 XlOPY7LNrAWMbUFfJUiRMb7bEXINOcmZMGC/9OTpYTJe3j32ybi0pZY91AV8HEQY xuQY2k4dVs4bSfxPFkrXlansPh7fbtl/EF6LNoDbCobV6RAZrc0SEM7eSybRoigm FrsWdms4bp5RRaBWxi3MRj1ir8nBHjLJ3AHlec1h6EMcXOWsF91u8mk0+0i9eF8E htw7Rxc7ZR3xRUKFxA5+53oYwMYnst9PdZ54DKrAMJOfykU6pGx2hJbsmTfodeGl zAT1GXPjEDejbXdmD84CzIs+bCmuGNrQQZ6o9gAaYVm9DJJMEXUpp1yK+/ApEnsz pcZNi7NeZRlwJhayLCYdvTT3sUlcDYqGuJoDE/krKNH6Aq0iAb/zvHxbOAn7R550 UFTVpB78W+0G0eufpgCIsxGHIgAZOGYWTe/TcrQIghWPNxfrt6p5XZVh2N4g8n9C nPiOdg== =pmbB -----END PGP SIGNATURE----- Merge tag 'nf-next-23-10-10' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Florian Westphal says: ==================== netfilter updates for next First 5 patches, from Phil Sutter, clean up nftables dumpers to use the context buffer in the netlink_callback structure rather than a kmalloc'd buffer. Patch 6, from myself, zaps dead code and replaces the helper function with a small inlined helper. Patch 7, also from myself, removes another pr_debug and replaces it with the existing nf_log-based debug helpers. Last patch, from George Guo, gets nft_table comments back in sync with the structure members. * tag 'nf-next-23-10-10' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: cleanup struct nft_table netfilter: conntrack: prefer tcp_error_log to pr_debug netfilter: conntrack: simplify nf_conntrack_alter_reply netfilter: nf_tables: Don't allocate nft_rule_dump_ctx netfilter: nf_tables: Carry s_idx in nft_rule_dump_ctx netfilter: nf_tables: Carry reset flag in nft_rule_dump_ctx netfilter: nf_tables: Drop pointless memset when dumping rules netfilter: nf_tables: Always allocate nft_rule_dump_ctx ==================== Link: https://lore.kernel.org/r/20231010145343.12551-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-10-11 17:37:58 -07:00
Daan De Meyer	859051dd16	bpf: Implement cgroup sockaddr hooks for unix sockets These hooks allows intercepting connect(), getsockname(), getpeername(), sendmsg() and recvmsg() for unix sockets. The unix socket hooks get write access to the address length because the address length is not fixed when dealing with unix sockets and needs to be modified when a unix socket address is modified by the hook. Because abstract socket unix addresses start with a NUL byte, we cannot recalculate the socket address in kernelspace after running the hook by calculating the length of the unix socket path using strlen(). These hooks can be used when users want to multiplex syscall to a single unix socket to multiple different processes behind the scenes by redirecting the connect() and other syscalls to process specific sockets. We do not implement support for intercepting bind() because when using bind() with unix sockets with a pathname address, this creates an inode in the filesystem which must be cleaned up. If we rewrite the address, the user might try to clean up the wrong file, leaking the socket in the filesystem where it is never cleaned up. Until we figure out a solution for this (and a use case for intercepting bind()), we opt to not allow rewriting the sockaddr in bind() calls. We also implement recvmsg() support for connected streams so that after a connect() that is modified by a sockaddr hook, any corresponding recmvsg() on the connected socket can also be modified to make the connected program think it is connected to the "intended" remote. Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com> Link: https://lore.kernel.org/r/20231011185113.140426-5-daan.j.demeyer@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2023-10-11 17:27:47 -07:00
Jakub Kicinski	71c299c711	net: tcp: fix crashes trying to free half-baked MTU probes tcp_stream_alloc_skb() initializes the skb to use tcp_tsorted_anchor which is a union with the destructor. We need to clean that TCP-iness up before freeing. Fixes: 736013292e3c ("tcp: let tcp_mtu_probe() build headless packets") Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20231010173651.3990234-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-10-11 17:24:46 -07:00
Daan De Meyer	53e380d214	bpf: Add bpf_sock_addr_set_sun_path() to allow writing unix sockaddr from bpf As prep for adding unix socket support to the cgroup sockaddr hooks, let's add a kfunc bpf_sock_addr_set_sun_path() that allows modifying a unix sockaddr from bpf. While this is already possible for AF_INET and AF_INET6, we'll need this kfunc when we add unix socket support since modifying the address for those requires modifying both the address and the sockaddr length. Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com> Link: https://lore.kernel.org/r/20231011185113.140426-4-daan.j.demeyer@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2023-10-11 16:29:25 -07:00
Daan De Meyer	fefba7d1ae	bpf: Propagate modified uaddrlen from cgroup sockaddr programs As prep for adding unix socket support to the cgroup sockaddr hooks, let's propagate the sockaddr length back to the caller after running a bpf cgroup sockaddr hook program. While not important for AF_INET or AF_INET6, the sockaddr length is important when working with AF_UNIX sockaddrs as the size of the sockaddr cannot be determined just from the address family or the sockaddr's contents. __cgroup_bpf_run_filter_sock_addr() is modified to take the uaddrlen as an input/output argument. After running the program, the modified sockaddr length is stored in the uaddrlen pointer. Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com> Link: https://lore.kernel.org/r/20231011185113.140426-3-daan.j.demeyer@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2023-10-11 15:03:40 -07:00
Ziyang Xuan	c7f59461f5	Bluetooth: Fix a refcnt underflow problem for hci_conn Syzbot reports a warning as follows: WARNING: CPU: 1 PID: 26946 at net/bluetooth/hci_conn.c:619 hci_conn_timeout+0x122/0x210 net/bluetooth/hci_conn.c:619 ... Call Trace: <TASK> process_one_work+0x884/0x15c0 kernel/workqueue.c:2630 process_scheduled_works kernel/workqueue.c:2703 [inline] worker_thread+0x8b9/0x1290 kernel/workqueue.c:2784 kthread+0x33c/0x440 kernel/kthread.c:388 ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304 </TASK> It is because the HCI_EV_SIMPLE_PAIR_COMPLETE event handler drops hci_conn directly without check Simple Pairing whether be enabled. But the Simple Pairing process can only be used if both sides have the support enabled in the host stack. Add hci_conn_ssp_enabled() for hci_conn in HCI_EV_IO_CAPA_REQUEST and HCI_EV_SIMPLE_PAIR_COMPLETE event handlers to fix the problem. Fixes: 0493684ed239 ("[Bluetooth] Disable disconnect timer during Simple Pairing") Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2023-10-11 11:17:08 -07:00
Pauli Virtanen	a239110ee8	Bluetooth: hci_sync: always check if connection is alive before deleting In hci_abort_conn_sync it is possible that conn is deleted concurrently by something else, also e.g. when waiting for hdev->lock. This causes double deletion of the conn, so UAF or conn_hash.list corruption. Fix by having all code paths check that the connection is still in conn_hash before deleting it, while holding hdev->lock which prevents any races. Log (when powering off while BAP streaming, occurs rarely): ======================================================================= kernel BUG at lib/list_debug.c:56! ... ? __list_del_entry_valid (lib/list_debug.c:56) hci_conn_del (net/bluetooth/hci_conn.c:154) bluetooth hci_abort_conn_sync (net/bluetooth/hci_sync.c:5415) bluetooth ? __pfx_hci_abort_conn_sync+0x10/0x10 [bluetooth] ? lock_release+0x1d5/0x3c0 ? hci_disconnect_all_sync.constprop.0+0xb2/0x230 [bluetooth] ? __pfx_lock_release+0x10/0x10 ? __kmem_cache_free+0x14d/0x2e0 hci_disconnect_all_sync.constprop.0+0xda/0x230 [bluetooth] ? __pfx_hci_disconnect_all_sync.constprop.0+0x10/0x10 [bluetooth] ? hci_clear_adv_sync+0x14f/0x170 [bluetooth] ? __pfx_set_powered_sync+0x10/0x10 [bluetooth] hci_set_powered_sync+0x293/0x450 [bluetooth] ======================================================================= Fixes: 94d9ba9f9888 ("Bluetooth: hci_sync: Fix UAF in hci_disconnect_all_sync") Signed-off-by: Pauli Virtanen <pav@iki.fi> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2023-10-11 11:16:46 -07:00
Lee, Chun-Yi	1ffc6f8cc3	Bluetooth: Reject connection with the device which has same BD_ADDR This change is used to relieve CVE-2020-26555. The description of the CVE: Bluetooth legacy BR/EDR PIN code pairing in Bluetooth Core Specification 1.0B through 5.2 may permit an unauthenticated nearby device to spoof the BD_ADDR of the peer device to complete pairing without knowledge of the PIN. [1] The detail of this attack is in IEEE paper: BlueMirror: Reflections on Bluetooth Pairing and Provisioning Protocols [2] It's a reflection attack. The paper mentioned that attacker can induce the attacked target to generate null link key (zero key) without PIN code. In BR/EDR, the key generation is actually handled in the controller which is below HCI. A condition of this attack is that attacker should change the BR_ADDR of his hacking device (Host B) to equal to the BR_ADDR with the target device being attacked (Host A). Thus, we reject the connection with device which has same BD_ADDR both on HCI_Create_Connection and HCI_Connection_Request to prevent the attack. A similar implementation also shows in btstack project. [3][4] Cc: stable@vger.kernel.org Link: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-26555 [1] Link: https://ieeexplore.ieee.org/abstract/document/9474325/authors#authors [2] Link: https://github.com/bluekitchen/btstack/blob/master/src/hci.c#L3523 [3] Link: https://github.com/bluekitchen/btstack/blob/master/src/hci.c#L7297 [4] Signed-off-by: Lee, Chun-Yi <jlee@suse.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2023-10-11 11:16:24 -07:00
Lee, Chun-Yi	33155c4aae	Bluetooth: hci_event: Ignore NULL link key This change is used to relieve CVE-2020-26555. The description of the CVE: Bluetooth legacy BR/EDR PIN code pairing in Bluetooth Core Specification 1.0B through 5.2 may permit an unauthenticated nearby device to spoof the BD_ADDR of the peer device to complete pairing without knowledge of the PIN. [1] The detail of this attack is in IEEE paper: BlueMirror: Reflections on Bluetooth Pairing and Provisioning Protocols [2] It's a reflection attack. The paper mentioned that attacker can induce the attacked target to generate null link key (zero key) without PIN code. In BR/EDR, the key generation is actually handled in the controller which is below HCI. Thus, we can ignore null link key in the handler of "Link Key Notification event" to relieve the attack. A similar implementation also shows in btstack project. [3] v3: Drop the connection when null link key be detected. v2: - Used Link: tag instead of Closes: - Used bt_dev_dbg instead of BT_DBG - Added Fixes: tag Cc: stable@vger.kernel.org Fixes: 55ed8ca10f35 ("Bluetooth: Implement link key handling for the management interface") Link: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-26555 [1] Link: https://ieeexplore.ieee.org/abstract/document/9474325/authors#authors [2] Link: https://github.com/bluekitchen/btstack/blob/master/src/hci.c#L3722 [3] Signed-off-by: Lee, Chun-Yi <jlee@suse.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2023-10-11 11:16:02 -07:00
Iulia Tanasescu	acab8ff29a	Bluetooth: ISO: Fix invalid context error This moves the hci_le_terminate_big_sync call from rx_work to cmd_sync_work, to avoid calling sleeping function from an invalid context. Reported-by: syzbot+c715e1bd8dfbcb1ab176@syzkaller.appspotmail.com Fixes: a0bfde167b50 ("Bluetooth: ISO: Add support for connecting multiple BISes") Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2023-10-11 11:15:40 -07:00
Johannes Berg	f2ac54ebf8	net: rfkill: reduce data->mtx scope in rfkill_fop_open In syzbot runs, lockdep reports that there's a (potential) deadlock here of data->mtx being locked recursively. This isn't really a deadlock since they are different instances, but lockdep cannot know, and teaching it would be far more difficult than other fixes. At the same time we don't even really _need_ the mutex to be locked in rfkill_fop_open(), since we're modifying only a completely fresh instance of 'data' (struct rfkill_data) that's not yet added to the global list. However, to avoid any reordering etc. within the globally locked section, and to make the code look more symmetric, we should still lock the data->events list manipulation, but also need to lock _only_ that. So do that. Reported-by: syzbot+509238e523e032442b80@syzkaller.appspotmail.com Fixes: 2c3dfba4cf84 ("rfkill: sync before userspace visibility/changes") Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2023-10-11 16:55:10 +02:00
Josua Mayer	b2f750c3a8	net: rfkill: gpio: prevent value glitch during probe When either reset- or shutdown-gpio have are initially deasserted, e.g. after a reboot - or when the hardware does not include pull-down, there will be a short toggle of both IOs to logical 0 and back to 1. It seems that the rfkill default is unblocked, so the driver should not glitch to output low during probe. It can lead e.g. to unexpected lte modem reconnect: [1] root@localhost:~# dmesg \| grep "usb 2-1" [ 2.136124] usb 2-1: new SuperSpeed USB device number 2 using xhci-hcd [ 21.215278] usb 2-1: USB disconnect, device number 2 [ 28.833977] usb 2-1: new SuperSpeed USB device number 3 using xhci-hcd The glitch has been discovered on an arm64 board, now that device-tree support for the rfkill-gpio driver has finally appeared :). Change the flags for devm_gpiod_get_optional from GPIOD_OUT_LOW to GPIOD_ASIS to avoid any glitches. The rfkill driver will set the intended value during rfkill_sync_work. Fixes: 7176ba23f8b5 ("net: rfkill: add generic gpio rfkill driver") Signed-off-by: Josua Mayer <josua@solid-run.com> Link: https://lore.kernel.org/r/20231004163928.14609-1-josua@solid-run.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2023-10-11 16:36:52 +02:00
Johannes Berg	02e0e426a2	wifi: mac80211: fix error path key leak In the previous key leak fix for the other error paths, I meant to unify all of them to the same place, but used the wrong label, which I noticed when doing the merge into wireless-next. Fix it. Fixes: d097ae01ebd4 ("wifi: mac80211: fix potential key leak") Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2023-10-11 16:36:14 +02:00
Johannes Berg	91d20ab9d9	wifi: cfg80211: use system_unbound_wq for wiphy work Since wiphy work items can run pretty much arbitrary code in the stack/driver, it can take longer to run all of this, so we shouldn't be using system_wq via schedule_work(). Also, we lock the wiphy (which is the reason this exists), so use system_unbound_wq. Reported-and-tested-by: Kalle Valo <kvalo@kernel.org> Fixes: a3ee4dc84c4e ("wifi: cfg80211: add a work abstraction with special semantics") Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2023-10-11 16:36:12 +02:00
Willem de Bruijn	4688ecb138	net: expand skb_segment unit test with frag_list coverage Expand the test with these variants that use skb frag_list: - GSO_TEST_FRAG_LIST: frag_skb length is gso_size - GSO_TEST_FRAG_LIST_PURE: same, data exclusively in frag skbs - GSO_TEST_FRAG_LIST_NON_UNIFORM: frag_skb length may vary - GSO_TEST_GSO_BY_FRAGS: frag_skb length defines gso_size, i.e., segs may have varying sizes. Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-11 10:39:01 +01:00
Willem de Bruijn	1b4fa28a8b	net: parametrize skb_segment unit test to expand coverage Expand the test with variants - GSO_TEST_NO_GSO: payload size less than or equal to gso_size - GSO_TEST_FRAGS: payload in both linear and page frags - GSO_TEST_FRAGS_PURE: payload exclusively in page frags - GSO_TEST_GSO_PARTIAL: produce one gso segment of multiple of gso_size, plus optionally one non-gso trailer segment Define a test struct that encodes the input gso skb and output segs. Input in terms of linear and fragment lengths. Output as length of each segment. Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-11 10:39:01 +01:00
Willem de Bruijn	b3098d32ed	net: add skb_segment kunit test Add unit testing for skb segment. This function is exercised by many different code paths, such as GSO_PARTIAL or GSO_BY_FRAGS, linear (with or without head_frag), frags or frag_list skbs, etc. It is infeasible to manually run tests that cover all code paths when making changes. The long and complex function also makes it hard to establish through analysis alone that a patch has no unintended side-effects. Add code coverage through kunit regression testing. Introduce kunit infrastructure for tests under net/core, and add this first test. This first skb_segment test exercises a simple case: a linear skb. Follow-on patches will parametrize the test and add more variants. Tested: Built and ran the test with make ARCH=um mrproper ./tools/testing/kunit/kunit.py run \ --kconfig_add CONFIG_NET=y \ --kconfig_add CONFIG_DEBUG_KERNEL=y \ --kconfig_add CONFIG_DEBUG_INFO=y \ --kconfig_add=CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y \ net_core_gso Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-11 10:39:01 +01:00
Nils Hoppmann	a950a5921d	net/smc: Fix pos miscalculation in statistics SMC_STAT_PAYLOAD_SUB(_smc_stats, _tech, key, _len, _rc) will calculate wrong bucket positions for payloads of exactly 4096 bytes and (1 << (m + 12)) bytes, with m == SMC_BUF_MAX - 1. Intended bucket distribution: Assume l == size of payload, m == SMC_BUF_MAX - 1. Bucket 0 : 0 < l <= 2^13 Bucket n, 1 <= n <= m-1 : 2^(n+12) < l <= 2^(n+13) Bucket m : l > 2^(m+12) Current solution: _pos = fls64((l) >> 13) [...] _pos = (_pos < m) ? ((l == 1 << (_pos + 12)) ? _pos - 1 : _pos) : m For l == 4096, _pos == -1, but should be _pos == 0. For l == (1 << (m + 12)), _pos == m, but should be _pos == m - 1. In order to avoid special treatment of these corner cases, the calculation is adjusted. The new solution first subtracts the length by one, and then calculates the correct bucket by shifting accordingly, i.e. _pos = fls64((l - 1) >> 13), l > 0. This not only fixes the issues named above, but also makes the whole bucket assignment easier to follow. Same is done for SMC_STAT_RMB_SIZE_SUB(_smc_stats, _tech, k, _len), where the calculation of the bucket position is similar to the one named above. Fixes: e0e4b8fa5338 ("net/smc: Add SMC statistics support") Suggested-by: Halil Pasic <pasic@linux.ibm.com> Signed-off-by: Nils Hoppmann <niho@linux.ibm.com> Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Reviewed-by: Dust Li <dust.li@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-11 10:36:35 +01:00
Yajun Deng	5247dbf16c	net/core: Introduce netdev_core_stats_inc() Although there is a kfree_skb_reason() helper function that can be used to find the reason why this skb is dropped, but most callers didn't increase one of rx_dropped, tx_dropped, rx_nohandler and rx_otherhost_dropped. For the users, people are more concerned about why the dropped in ip is increasing. Introduce netdev_core_stats_inc() for trace the caller of dev_core_stats_*_inc(). Also, add __code to netdev_core_stats_alloc(), as it's called with small probability. And add noinline make sure netdev_core_stats_inc was never inlined. Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Suggested-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-11 10:07:40 +01:00
Russell King (Oracle)	63b9f7a19f	net: dsa: remove dsa_port_phylink_validate() As all drivers now provide phylink capabilities (including MAC), the if() condition in dsa_port_phylink_validate() will always be true. We will always use the generic validator, which phylink will call itself if the .validate method isn't populated. Thus, there is now no need to implement the .validate method, so this implementation can be removed. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-10-11 10:06:05 +01:00
Jakub Kicinski	ad98426a88	bpf-for-netdev -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZSXLzwAKCRDbK58LschI g1wuAQDTT1mrUmRqrpPob/U3HCcTg64hgdRwyF+6IU39/+neGwEAoP0FKZoy3DDf C8FOdVChBjapPsp9zTeYPv0nlZMITAE= =1Shl -----END PGP SIGNATURE----- Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2023-10-11 We've added 14 non-merge commits during the last 5 day(s) which contain a total of 12 files changed, 398 insertions(+), 104 deletions(-). The main changes are: 1) Fix s390 JIT backchain issues in the trampoline code generation which previously clobbered the caller's backchain, from Ilya Leoshkevich. 2) Fix zero-size allocation warning in xsk sockets when the configured ring size was close to SIZE_MAX, from Andrew Kanner. 3) Fixes for bpf_mprog API that were found when implementing support in the ebpf-go library along with selftests, from Daniel Borkmann and Lorenz Bauer. 4) Fix riscv JIT to properly sign-extend the return register in programs. This fixes various test_progs selftests on riscv, from Björn Töpel. 5) Fix verifier log for async callback return values where the allowed range was displayed incorrectly, from David Vernet. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: s390/bpf: Fix unwinding past the trampoline s390/bpf: Fix clobbering the caller's backchain in the trampoline selftests/bpf: Add testcase for async callback return value failure bpf: Fix verifier log for async callback return values xdp: Fix zero-size allocation warning in xskq_create() riscv, bpf: Track both a0 (RISC-V ABI) and a5 (BPF) return values riscv, bpf: Sign-extend return values selftests/bpf: Make seen_tc* variable tests more robust selftests/bpf: Test query on empty mprog and pass revision into attach selftests/bpf: Adapt assert_mprog_count to always expect 0 count selftests/bpf: Test bpf_mprog query API via libbpf and raw syscall bpf: Refuse unused attributes in bpf_prog_{attach,detach} bpf: Handle bpf_mprog_query with NULL entry bpf: Fix BPF_PROG_QUERY last field check ==================== Link: https://lore.kernel.org/r/20231010223610.3984-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-10-10 19:59:49 -07:00
Kory Maincent	108a36d07c	ethtool: Fix mod state of verbose no_mask bitset A bitset without mask in a _SET request means we want exactly the bits in the bitset to be set. This works correctly for compact format but when verbose format is parsed, ethnl_update_bitset32_verbose() only sets the bits present in the request bitset but does not clear the rest. The commit 6699170376ab fixes this issue by clearing the whole target bitmap before we start iterating. The solution proposed brought an issue with the behavior of the mod variable. As the bitset is always cleared the old val will always differ to the new val. Fix it by adding a new temporary variable which save the state of the old bitmap. Fixes: 6699170376ab ("ethtool: fix application of verbose no_mask bitset") Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20231009133645.44503-1-kory.maincent@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-10-10 19:48:15 -07:00

1 2 3 4 5 ...

74962 Commits