75832 Commits

Chuck Lever
28ee0ec894 svcrdma: De-duplicate completion ID initialization helpers
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:31 -05:00
Chuck Lever
018f34051b svcrdma: Move the svc_rdma_cc_init() call
Now that the chunk_ctxt for Reads is no longer dynamically allocated,
it can be initialized once for the life of the object that contains
it (struct svc_rdma_recv_ctxt).
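
As a rough sketch of the resulting shape (the ri_cc and rc_cc field
names here are illustrative assumptions):

  /* before: initialized ahead of every RDMA Read */
  svc_rdma_cc_init(rdma, &info->ri_cc);

  /* after: initialized once, when the recv_ctxt is created */
  svc_rdma_cc_init(rdma, &ctxt->rc_cc);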

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:31 -05:00
Chuck Lever
57666bbb4e svcrdma: Remove struct svc_rdma_read_info
The remaining fields of struct svc_rdma_read_info are no longer
referenced.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:31 -05:00
Chuck Lever
efd02cb0dd svcrdma: Update the synopsis of svc_rdma_read_special()
Since the RDMA Read I/O state is now contained in the recv_ctxt,
svc_rdma_read_special() can use that recv_ctxt to derive the
read_info rather than the other way around. This removes another
usage of the ri_readctxt field, enabling its removal in a
subsequent patch.
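
The same synopsis change recurs through the next several patches in
this series; schematically (exact parameter lists are assumptions for
illustration):

  /* before: Read state reached via the dynamically allocated read_info */
  static int svc_rdma_read_special(struct svcxprt_rdma *rdma,
                                   struct svc_rdma_read_info *info);

  /* after: Read state lives in, and is reached via, the recv_ctxt */
  static int svc_rdma_read_special(struct svcxprt_rdma *rdma,
                                   struct svc_rqst *rqstp,
                                   struct svc_rdma_recv_ctxt *head);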

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:31 -05:00
Chuck Lever
23bab3b22d svcrdma: Update the synopsis of svc_rdma_read_call_chunk()
Since the RDMA Read I/O state is now contained in the recv_ctxt,
svc_rdma_read_call_chunk() can use that recv_ctxt to derive the
read_info rather than the other way around. This removes another
usage of the ri_readctxt field, enabling its removal in a
subsequent patch.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:31 -05:00
Chuck Lever
740a3c895d svcrdma: Update synopsis of svc_rdma_read_multiple_chunks()
Since the RDMA Read I/O state is now contained in the recv_ctxt,
svc_rdma_read_multiple_chunks() can use that recv_ctxt to derive the
read_info rather than the other way around. This removes another
usage of the ri_readctxt field, enabling its removal in a
subsequent patch.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:31 -05:00
Chuck Lever
6518204d23 svcrdma: Update synopsis of svc_rdma_copy_inline_range()
Since the RDMA Read I/O state is now contained in the recv_ctxt,
svc_rdma_copy_inline_range() can use that recv_ctxt to derive the
read_info rather than the other way around. This removes another
usage of the ri_readctxt field, enabling its removal in a
subsequent patch.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:30 -05:00
Chuck Lever
6e4b9b8643 svcrdma: Update the synopsis of svc_rdma_read_data_item()
Since the RDMA Read I/O state is now contained in the recv_ctxt,
svc_rdma_build_read_data_item() can use that recv_ctxt to derive
that information rather than the other way around. This removes
another usage of the ri_readctxt field, enabling its removal in a
subsequent patch.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:30 -05:00
Chuck Lever
c7eb4feb1b svcrdma: Update synopsis of svc_rdma_read_chunk_range()
Since the RDMA Read I/O state is now contained in the recv_ctxt,
svc_rdma_build_read_chunk_range() can use that recv_ctxt to derive
that information rather than the other way around. This removes
another usage of the ri_readctxt field, enabling its removal in a
subsequent patch.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:30 -05:00
Chuck Lever
02e8fe1eca svcrdma: Update synopsis of svc_rdma_build_read_chunk()
Since the RDMA Read I/O state is now contained in the recv_ctxt,
svc_rdma_build_read_chunk() can use that recv_ctxt to derive that
information rather than the other way around. This removes another
usage of the ri_readctxt field, enabling its removal in a
subsequent patch.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:30 -05:00
Chuck Lever
fc20f19b4d svcrdma: Update synopsis of svc_rdma_build_read_segment()
Since the RDMA Read I/O state is now contained in the recv_ctxt,
svc_rdma_build_read_segment() can use the recv_ctxt to derive that
information rather than the other way around. This removes one usage
of the ri_readctxt field, enabling its removal in a subsequent
patch.

At the same time, the use of ri_rqst can similarly be replaced with
a passed-in function parameter.

Start with build_read_segment() because it is a common utility
function at the bottom of the Read chunk path.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:30 -05:00
Chuck Lever
919f6e790a svcrdma: Move read_info::ri_pageoff into struct svc_rdma_recv_ctxt
Further clean up: move the starting byte offset field into
svc_rdma_recv_ctxt.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:30 -05:00
Chuck Lever
8e12258268 svcrdma: Move svc_rdma_read_info::ri_pageno to struct svc_rdma_recv_ctxt
Further clean up: move the page index field into svc_rdma_recv_ctxt.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:29 -05:00
Chuck Lever
b1818412d0 svcrdma: Start moving fields out of struct svc_rdma_read_info
Since the request's svc_rdma_recv_ctxt will stay around for the
duration of the RDMA Read operation, the contents of struct
svc_rdma_read_info can reside in the request's svc_rdma_recv_ctxt
rather than being allocated separately. This will eventually save a
call to kmalloc() in a hot path.

Start this clean-up by moving the Read chunk's svc_rdma_chunk_ctxt.
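
Schematically (the rc_cc field name is an assumption):

  struct svc_rdma_recv_ctxt {
          /* ... existing fields ... */

          /* Read chunk state, formerly kmalloc'd as part of
           * struct svc_rdma_read_info */
          struct svc_rdma_chunk_ctxt      rc_cc;
  };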

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:29 -05:00
Chuck Lever
6a04a43493 svcrdma: Move struct svc_rdma_chunk_ctxt to svc_rdma.h
Prepare for nestling these into the send and recv ctxts so they
no longer have to be allocated dynamically.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:29 -05:00
Chuck Lever
2cc0f23b53 svcrdma: Remove the svc_rdma_chunk_ctxt::cc_rdma field
In every instance, the pointer address in that field is now
available by other means.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:29 -05:00
Chuck Lever
bc8fd4e915 svcrdma: Pass a pointer to the transport to svc_rdma_cc_release()
Enable the eventual removal of the svc_rdma_chunk_ctxt::cc_rdma
field.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:29 -05:00
Chuck Lever
83fe6dd6a8 svcrdma: Explicitly pass the transport to svc_rdma_post_chunk_ctxt()
Enable the eventual removal of the svc_rdma_chunk_ctxt::cc_rdma
field.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:29 -05:00
Chuck Lever
4a68edd93f svcrdma: Explicitly pass the transport into Read chunk I/O paths
Enable the eventual removal of the svc_rdma_chunk_ctxt::cc_rdma
field.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:28 -05:00
Chuck Lever
c3899b7107 svcrdma: Explicitly pass the transport into Write chunk I/O paths
Enable the eventual removal of the svc_rdma_chunk_ctxt::cc_rdma
field.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:28 -05:00
Chuck Lever
c4fd9f4525 svcrdma: Acquire the svcxprt_rdma pointer from the CQ context
Enable the removal of the svc_rdma_chunk_ctxt::cc_rdma field in a
subsequent patch.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:28 -05:00
Chuck Lever
5ef6c66676 svcrdma: Reduce size of struct svc_rdma_rw_ctxt
SG_CHUNK_SIZE is 128, making struct svc_rdma_rw_ctxt plus the first
SGL array more than 4200 bytes in length, pushing the memory
allocation well into order 1.

Even so, the RDMA rw core doesn't seem to use more than max_send_sge
entries in that array (typically 32 or fewer), so the rest is wasted
space.
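
For scale: sizeof(struct scatterlist) is 32 bytes on a typical 64-bit
build, so the inline SGL alone is 128 * 32 = 4096 bytes; the remaining
fields of struct svc_rdma_rw_ctxt push the total past one page,
forcing an order-1 (8KB) allocation of which only about 1KB is ever
used.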

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:28 -05:00
Chuck Lever
2dd6e29a3e svcrdma: Update some svcrdma DMA-related tracepoints
A send/recv_ctxt already records transport-related information
in the cq.id, thus there is no need to record the IP addresses of
the transport endpoints.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:28 -05:00
Chuck Lever
848760a9e7 svcrdma: DMA error tracepoints should report completion IDs
Update the DMA error flow tracepoints to report the completion ID of
the failing context. This ties the wait/failure to a particular
operation or request, which is more useful than knowing only the
failing transport.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:28 -05:00
Chuck Lever
ad3656bd84 svcrdma: SQ error tracepoints should report completion IDs
Update the Send Queue's error flow tracepoints to report the
completion ID of the waiting or failing context. This ties the
wait/failure to a particular operation or request, which is a little
more useful than knowing only the transport that is about to close.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:27 -05:00
Chuck Lever
be2acb1048 rpcrdma: Introduce a simple cid tracepoint class
De-duplicate some code, making it easier to add new tracepoints that
report only a completion ID.
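
A sketch of the class (the struct rpc_rdma_cid field names follow
existing rpcrdma tracepoints):

  DECLARE_EVENT_CLASS(rpcrdma_simple_cid_class,
          TP_PROTO(const struct rpc_rdma_cid *cid),
          TP_ARGS(cid),
          TP_STRUCT__entry(
                  __field(u32, cq_id)
                  __field(int, completion_id)
          ),
          TP_fast_assign(
                  __entry->cq_id = cid->ci_queue_id;
                  __entry->completion_id = cid->ci_completion_id;
          ),
          TP_printk("cq.id=%d cid=%d", __entry->cq_id, __entry->completion_id)
  );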

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:27 -05:00
Chuck Lever
907e34a7d0 svcrdma: Add lockdep class keys for transport locks
Two svcrdma-related transport locks can become quite contended.
Collate their use and make them easy to find in /proc/lock_stat for
better observability.
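
A sketch of the mechanism (key and lock names are illustrative):

  static struct lock_class_key svcrdma_rwctx_lock_key;

  /* give this lock its own class so /proc/lock_stat reports it separately */
  spin_lock_init(&rdma->sc_rw_ctxt_lock);
  lockdep_set_class(&rdma->sc_rw_ctxt_lock, &svcrdma_rwctx_lock_key);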

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:27 -05:00
Chuck Lever
bfb81535c2 svcrdma: Clean up locking
There's no need to protect llist_entry() with a spin lock.
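
llist_entry() is only a container_of() pointer computation, so the
lock needs to cover just the llist_del_first() step; a sketch with
assumed field names:

  struct llist_node *node;

  spin_lock(&rdma->sc_rw_ctxt_lock);
  node = llist_del_first(&rdma->sc_rw_ctxts);
  spin_unlock(&rdma->sc_rw_ctxt_lock);
  if (node)
          ctxt = llist_entry(node, struct svc_rdma_rw_ctxt, rw_node);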

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:27 -05:00
Chuck Lever
f09c36c8df svcrdma: Add an async version of svc_rdma_write_info_free()
DMA unmapping can take quite some time, so it should not be handled
in a single-threaded completion handler. Defer releasing write_info
structs to the recently-added workqueue.

With this patch, DMA unmapping can be handled in parallel, and it
does not cause head-of-queue blocking of Write completions.
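
A sketch of the deferral (work item field and helper names are
assumptions); the send_ctxt patch below follows the same pattern:

  static void svc_rdma_write_info_free_async(struct work_struct *work)
  {
          struct svc_rdma_write_info *info =
                  container_of(work, struct svc_rdma_write_info, wi_work);

          /* DMA unmapping now happens in workqueue context */
          svc_rdma_cc_release(info->wi_rdma, &info->wi_cc, DMA_TO_DEVICE);
          kfree(info);
  }

  /* in the Write completion handler: */
  INIT_WORK(&info->wi_work, svc_rdma_write_info_free_async);
  queue_work(svcrdma_wq, &info->wi_work);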

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:27 -05:00
Chuck Lever
ae225fe27b svcrdma: Add an async version of svc_rdma_send_ctxt_put()
DMA unmapping can take quite some time, so it should not be handled
in a single-threaded completion handler. Defer releasing send_ctxts
to the recently-added workqueue.

With this patch, DMA unmapping can be handled in parallel, and it
does not cause head-of-queue blocking of Send completions.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:27 -05:00
Chuck Lever
9c7e1a0658 svcrdma: Add a utility workqueue to svcrdma
To handle work in the background, set up an UNBOUND workqueue for
svcrdma. Subsequent patches will make use of it.
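
A minimal sketch of the setup (assuming the queue variable is named
svcrdma_wq):

  static struct workqueue_struct *svcrdma_wq;

  svcrdma_wq = alloc_workqueue("svcrdma", WQ_UNBOUND, 0);
  if (!svcrdma_wq)
          return -ENOMEM;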

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:26 -05:00
Chuck Lever
877118c667 svcrdma: Pre-allocate svc_rdma_recv_ctxt objects
The original reason for allocating svc_rdma_recv_ctxt objects during
Receive completion was to ensure the objects were allocated on the
NUMA node closest to the underlying IB device.

Since commit c5d68d25bd6b ("svcrdma: Clean up allocation of
svc_rdma_recv_ctxt"), however, the device's favored node is
explicitly passed to the memory allocator.

To enable switching Receive completion to soft IRQ context, move
memory allocation out of completion handling, since it can be costly
and it can sleep.

A limited number of objects is now allocated at "accept" time.
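
A sketch of the accept-time pre-allocation (loop bound and field
names are assumptions):

  /* Pre-fill the free list while it is still safe to sleep */
  for (i = 0; i < rdma->sc_max_requests; i++) {
          struct svc_rdma_recv_ctxt *ctxt;

          ctxt = svc_rdma_recv_ctxt_alloc(rdma);
          if (!ctxt)
                  return false;
          llist_add(&ctxt->rc_node, &rdma->sc_recv_ctxts);
  }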

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:26 -05:00
Chuck Lever
b541dd554b svcrdma: Eliminate allocation of recv_ctxt objects in backchannel
The svc_rdma_recv_ctxt free list uses a lockless list to avoid the
need for a spin lock in the fast path. llist_del_first(), which is
used by svc_rdma_recv_ctxt_get(), requires serialization, however,
when there are multiple list consumers that are unserialized.

I mistakenly thought there was only one caller of
svc_rdma_recv_ctxt_get() (svc_rdma_refresh_recvs()), thus explicit
serialization would not be necessary. But there is another caller:
svc_rdma_bc_sendto(), and these two are not serialized against each
other. I haven't seen ill effects that I could directly ascribe to
a lack of serialization. It's just an observation based on code
audit.

When DMA-mapping before sending a Reply, the passed-in struct
svc_rdma_recv_ctxt is used only for its write and reply PCLs. These
are currently always empty in the backchannel case. So, instead of
passing a full svc_rdma_recv_ctxt object to
svc_rdma_map_reply_msg(), let's pass in just the Write and Reply
PCLs.

This change makes it unnecessary for the backchannel to acquire a
dummy svc_rdma_recv_ctxt object when sending an RPC Call. The need
for svc_rdma_recv_ctxt free list serialization is now completely
avoided.
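
Schematically (parameter names are assumptions):

  /* before: needs a (possibly dummy) recv_ctxt */
  ret = svc_rdma_map_reply_msg(rdma, sctxt, rctxt, &rqstp->rq_res);

  /* after: the backchannel passes two empty PCLs instead */
  ret = svc_rdma_map_reply_msg(rdma, sctxt, write_pcl, reply_pcl,
                               &rqstp->rq_res);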

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:26 -05:00
Chuck Lever
3587b5c753 SUNRPC: Remove RQ_SPLICE_OK
This flag is no longer used.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:26 -05:00
Chuck Lever
deb704281f SUNRPC: Add a server-side API for retrieving an RPC's pseudoflavor
NFSD will use this new API to determine whether nfsd_splice_read is
safe to use. This avoids the need to make NFSD depend on
CONFIG_SUNRPC_GSS.
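
A sketch of the intended NFSD usage (the check shown is illustrative):

  rpc_authflavor_t flavor = svc_auth_flavor(rqstp);

  /* splice is unsafe when GSS integrity or privacy wraps the payload */
  if (flavor == RPC_AUTH_GSS_KRB5I || flavor == RPC_AUTH_GSS_KRB5P)
          return false;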

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:25 -05:00
Eric Dumazet
d375b98e02 ip6_tunnel: fix NEXTHDR_FRAGMENT handling in ip6_tnl_parse_tlv_enc_lim()
syzbot pointed out [1] that NEXTHDR_FRAGMENT handling is broken.

Reading frag_off is safe only after enough bytes have been pulled
into skb->head; currently we might access garbage.
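
Conceptually, the fix is to guarantee the fragment header is in the
linear area before dereferencing it; a sketch (the real parsing loop
differs):

  /* ensure frag_off is actually in skb->head before reading it */
  if (!pskb_may_pull(skb, off + sizeof(struct frag_hdr)))
          return 0;
  /* pskb_may_pull() can reallocate skb->head, so reload the pointer */
  fh = (struct frag_hdr *)(skb->data + off);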

[1]
BUG: KMSAN: uninit-value in ip6_tnl_parse_tlv_enc_lim+0x94f/0xbb0
ip6_tnl_parse_tlv_enc_lim+0x94f/0xbb0
ipxip6_tnl_xmit net/ipv6/ip6_tunnel.c:1326 [inline]
ip6_tnl_start_xmit+0xab2/0x1a70 net/ipv6/ip6_tunnel.c:1432
__netdev_start_xmit include/linux/netdevice.h:4940 [inline]
netdev_start_xmit include/linux/netdevice.h:4954 [inline]
xmit_one net/core/dev.c:3548 [inline]
dev_hard_start_xmit+0x247/0xa10 net/core/dev.c:3564
__dev_queue_xmit+0x33b8/0x5130 net/core/dev.c:4349
dev_queue_xmit include/linux/netdevice.h:3134 [inline]
neigh_connected_output+0x569/0x660 net/core/neighbour.c:1592
neigh_output include/net/neighbour.h:542 [inline]
ip6_finish_output2+0x23a9/0x2b30 net/ipv6/ip6_output.c:137
ip6_finish_output+0x855/0x12b0 net/ipv6/ip6_output.c:222
NF_HOOK_COND include/linux/netfilter.h:303 [inline]
ip6_output+0x323/0x610 net/ipv6/ip6_output.c:243
dst_output include/net/dst.h:451 [inline]
ip6_local_out+0xe9/0x140 net/ipv6/output_core.c:155
ip6_send_skb net/ipv6/ip6_output.c:1952 [inline]
ip6_push_pending_frames+0x1f9/0x560 net/ipv6/ip6_output.c:1972
rawv6_push_pending_frames+0xbe8/0xdf0 net/ipv6/raw.c:582
rawv6_sendmsg+0x2b66/0x2e70 net/ipv6/raw.c:920
inet_sendmsg+0x105/0x190 net/ipv4/af_inet.c:847
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg net/socket.c:745 [inline]
____sys_sendmsg+0x9c2/0xd60 net/socket.c:2584
___sys_sendmsg+0x28d/0x3c0 net/socket.c:2638
__sys_sendmsg net/socket.c:2667 [inline]
__do_sys_sendmsg net/socket.c:2676 [inline]
__se_sys_sendmsg net/socket.c:2674 [inline]
__x64_sys_sendmsg+0x307/0x490 net/socket.c:2674
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0x44/0x110 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x63/0x6b

Uninit was created at:
slab_post_alloc_hook+0x129/0xa70 mm/slab.h:768
slab_alloc_node mm/slub.c:3478 [inline]
__kmem_cache_alloc_node+0x5c9/0x970 mm/slub.c:3517
__do_kmalloc_node mm/slab_common.c:1006 [inline]
__kmalloc_node_track_caller+0x118/0x3c0 mm/slab_common.c:1027
kmalloc_reserve+0x249/0x4a0 net/core/skbuff.c:582
pskb_expand_head+0x226/0x1a00 net/core/skbuff.c:2098
__pskb_pull_tail+0x13b/0x2310 net/core/skbuff.c:2655
pskb_may_pull_reason include/linux/skbuff.h:2673 [inline]
pskb_may_pull include/linux/skbuff.h:2681 [inline]
ip6_tnl_parse_tlv_enc_lim+0x901/0xbb0 net/ipv6/ip6_tunnel.c:408
ipxip6_tnl_xmit net/ipv6/ip6_tunnel.c:1326 [inline]
ip6_tnl_start_xmit+0xab2/0x1a70 net/ipv6/ip6_tunnel.c:1432
__netdev_start_xmit include/linux/netdevice.h:4940 [inline]
netdev_start_xmit include/linux/netdevice.h:4954 [inline]
xmit_one net/core/dev.c:3548 [inline]
dev_hard_start_xmit+0x247/0xa10 net/core/dev.c:3564
__dev_queue_xmit+0x33b8/0x5130 net/core/dev.c:4349
dev_queue_xmit include/linux/netdevice.h:3134 [inline]
neigh_connected_output+0x569/0x660 net/core/neighbour.c:1592
neigh_output include/net/neighbour.h:542 [inline]
ip6_finish_output2+0x23a9/0x2b30 net/ipv6/ip6_output.c:137
ip6_finish_output+0x855/0x12b0 net/ipv6/ip6_output.c:222
NF_HOOK_COND include/linux/netfilter.h:303 [inline]
ip6_output+0x323/0x610 net/ipv6/ip6_output.c:243
dst_output include/net/dst.h:451 [inline]
ip6_local_out+0xe9/0x140 net/ipv6/output_core.c:155
ip6_send_skb net/ipv6/ip6_output.c:1952 [inline]
ip6_push_pending_frames+0x1f9/0x560 net/ipv6/ip6_output.c:1972
rawv6_push_pending_frames+0xbe8/0xdf0 net/ipv6/raw.c:582
rawv6_sendmsg+0x2b66/0x2e70 net/ipv6/raw.c:920
inet_sendmsg+0x105/0x190 net/ipv4/af_inet.c:847
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg net/socket.c:745 [inline]
____sys_sendmsg+0x9c2/0xd60 net/socket.c:2584
___sys_sendmsg+0x28d/0x3c0 net/socket.c:2638
__sys_sendmsg net/socket.c:2667 [inline]
__do_sys_sendmsg net/socket.c:2676 [inline]
__se_sys_sendmsg net/socket.c:2674 [inline]
__x64_sys_sendmsg+0x307/0x490 net/socket.c:2674
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0x44/0x110 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x63/0x6b

CPU: 0 PID: 7345 Comm: syz-executor.3 Not tainted 6.7.0-rc8-syzkaller-00024-gac865f00af29 
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023

Fixes: fbfa743a9d2a ("ipv6: fix ip6_tnl_parse_tlv_enc_lim()")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-07 15:27:29 +00:00
David Howells
4fc68c4c1a rxrpc: Fix skbuff cleanup of call's recvmsg_queue and rx_oos_queue
Fix rxrpc_cleanup_ring() to use rxrpc_purge_queue() rather than
skb_queue_purge() so that the count of outstanding skbuffs is correctly
updated when a failed call is cleaned up.

Without this, rmmod may hang waiting for rxrpc_n_rx_skbs to become zero.
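
Schematically:

  /* before: loses the rxrpc_n_rx_skbs accounting */
  skb_queue_purge(&call->recvmsg_queue);

  /* after: decrements the outstanding-skb count as it frees */
  rxrpc_purge_queue(&call->recvmsg_queue);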

Fixes: 5d7edbc9231e ("rxrpc: Get rid of the Rx ring")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-07 15:25:30 +00:00
Zhengchao Shao
c4a5ee9c09 fib: rules: remove repeated assignment in fib_nl2rule
In fib_nl2rule(), the 'err' variable is already initialized to
-EINVAL at declaration, so there is no need to set it to -EINVAL
again. Remove the redundant assignment.
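
Illustratively:

  int err = -EINVAL;        /* initialized at declaration */
  ...
  err = -EINVAL;            /* redundant assignment, now removed */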

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-07 15:16:19 +00:00
Pedro Tammela
405cd9fc6f net/sched: simplify tc_action_load_ops parameters
Instead of passing two bools derived from the flags in
tc_action_load_ops's callers, pass the flags value itself to
tc_action_load_ops to simplify its parameters.
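
Schematically (flag names as in include/net/act_api.h):

  /* before: callers derived two bools from flags */
  a_o = tc_action_load_ops(tb[i], flags & TCA_ACT_FLAGS_POLICE,
                           !(flags & TCA_ACT_FLAGS_NO_RTNL), extack);

  /* after: pass flags through; derive the bools inside */
  a_o = tc_action_load_ops(tb[i], flags, extack);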

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-07 14:58:26 +00:00
Ahmed Zaki
948f97f9d8 net: ethtool: reject unsupported RSS input xfrm values
RXFH input_xfrm currently has three supported values: 0 (clear all),
symmetric_xor and NO_CHANGE.

Reject any other value sent from user-space.
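
A sketch of the added check (the exact error code is an assumption):

  if (rxfh.input_xfrm && rxfh.input_xfrm != RXH_XFRM_SYM_XOR &&
      rxfh.input_xfrm != RXH_XFRM_NO_CHANGE)
          return -EINVAL;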

Fixes: 13e59344fb9d ("net: ethtool: add support for symmetric-xor RSS hash")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Link: https://lore.kernel.org/r/20240104212653.394424-1-ahmed.zaki@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-05 19:23:15 -08:00
Jakub Kicinski
8158a50f90 bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZZgrfgAKCRDbK58LschI
 g87JAQDu+oUG3aWnRJi+lJTK8vGnKRuBwUxgnI5Ze99N0tuPmAEAz1gpXLYP+fKE
 eqRhZGGhujdHC9if3Le+nG6nvf8Gvw0=
 =KPkZ
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2024-01-05

We've added 40 non-merge commits during the last 2 day(s) which contain
a total of 73 files changed, 1526 insertions(+), 951 deletions(-).

The main changes are:

1) Fix a memory leak when streaming AF_UNIX sockets were inserted
   into multiple sockmap slots/maps, from John Fastabend.

2) Fix gotol in s390 BPF JIT with large offsets, from Ilya Leoshkevich.

3) Fix reattachment branch in bpf_tracing_prog_attach() and reject
   the request if there is no valid attach_btf, from Jiri Olsa.

4) Remove deprecated bpfilter kernel leftovers given the project
   is developed in user space (https://github.com/facebook/bpfilter),
   from Quentin Deslandes.

5) Relax tracing BPF program recursive attach rules given right now
   it is not possible to create tracing program call cycles,
   from Dmitrii Dolgov.

6) Fix excessive memory consumption for the bpf_global_percpu_ma
   for systems with a large number of CPUs, from Yonghong Song.

7) Small x86 BPF JIT cleanup to reuse emit_nops instead of open-coding
   memcpy of x86_nops, from Leon Hwang.

8) Follow-up for libbpf to support __arg_ctx global function argument tag
   semantics to complement the merged kernel side, from Andrii Nakryiko.

9) Introduce "volatile compare" macros for BPF selftests in order
   to make the latter more robust against compiler optimization,
   from Alexei Starovoitov.

10) Small simplification in verifier's size checking of helper accesses
    along with additional selftests, from Andrei Matei.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (40 commits)
  selftests/bpf: Test re-attachment fix for bpf_tracing_prog_attach
  bpf: Fix re-attachment branch in bpf_tracing_prog_attach
  selftests/bpf: Add test for recursive attachment of tracing progs
  bpf: Relax tracing prog recursive attach rules
  bpf, x86: Use emit_nops to replace memcpy x86_nops
  selftests/bpf: Test gotol with large offsets
  selftests/bpf: Double the size of test_loader log
  s390/bpf: Fix gotol with large offsets
  bpfilter: remove bpfilter
  bpf: Remove unnecessary cpu == 0 check in memalloc
  selftests/bpf: add __arg_ctx BTF rewrite test
  selftests/bpf: add arg:ctx cases to test_global_funcs tests
  libbpf: implement __arg_ctx fallback logic
  libbpf: move BTF loading step after relocation step
  libbpf: move exception callbacks assignment logic into relocation step
  libbpf: use stable map placeholder FDs
  libbpf: don't rely on map->fd as an indicator of map being created
  libbpf: use explicit map reuse flag to skip map creation steps
  libbpf: make uniform use of btf__fd() accessor inside libbpf
  selftests/bpf: Add a selftest with > 512-byte percpu allocation size
  ...
====================

Link: https://lore.kernel.org/r/20240105170105.21070-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-05 19:15:32 -08:00
Richard Gobert
dff0b0161a net: gro: parse ipv6 ext headers without frag0 invalidation
The existing code always pulls the IPv6 header and sets the transport
offset initially. Then it optionally pulls any extension headers in
ipv6_gso_pull_exthdrs and sets the transport offset again on return
from that call. Before calling ipv6_gso_pull_exthdrs, skb->data is set
to the start of the first extension header; this disables the frag0
optimization, because that function uses pskb_may_pull/pskb_pull
instead of the skb_gro_ helpers. The code then advances the GRO offset
to the TCP header with skb_gro_pull, sets the transport header, and
finally restores skb->data to its position before this block.

This commit introduces a new helper function - ipv6_gro_pull_exthdrs -
which is used in ipv6_gro_receive to pull ipv6 ext headers instead of
ipv6_gso_pull_exthdrs. Thus, there is no modification of skb->data, all
operations use skb_gro_* helpers, and the frag0 fast path can be taken for
IPv6 packets with ext headers.
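
A sketch of the new helper (simplified; the real function also pulls
each header's full length):

  static int ipv6_gro_pull_exthdrs(struct sk_buff *skb, int off, int proto)
  {
          const struct net_offload *ops;
          struct ipv6_opt_hdr *opth;

          for (;;) {
                  ops = rcu_dereference(inet6_offloads[proto]);
                  if (!ops || !(ops->flags & INET6_PROTO_GSO_EXTHDR))
                          break;
                  /* skb_gro_header() preserves the frag0 fast path */
                  opth = skb_gro_header(skb, off + sizeof(*opth), off);
                  if (unlikely(!opth))
                          return -1;
                  proto = opth->nexthdr;
                  off += ipv6_optlen(opth);
          }
          return off;
  }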

Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/504130f6-b56c-4dcc-882c-97942c59f5b7@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-05 08:11:49 -08:00
Richard Gobert
f2e3fc2158 net: gso: add HBH extension header offload support
This commit adds a net_offload for the IPv6 Hop-by-Hop extension
header (as is already done for routing and dstopts), since HBH is
supported in GSO and GRO. This allows the HBH-specific conditionals
to be removed from GSO and GRO when pulling and parsing an incoming
packet.
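
A sketch of the registration, following the existing rthdr/dstopt
pattern in net/ipv6/exthdrs_offload.c:

  static const struct net_offload hbh_offload = {
          .flags = INET6_PROTO_GSO_EXTHDR,
  };

  /* in exthdrs_offload init: */
  inet6_add_offload(&hbh_offload, IPPROTO_HOPOPTS);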

Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/d4f8825a-1d55-4b12-9d67-a254dbbfa6ae@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-05 08:11:49 -08:00
Tao Liu
3f14b377d0 net/sched: act_ct: fix skb leak and crash on ooo frags
act_ct adds skb->users before defragmentation. If frags arrive in order,
the last frag's reference is reset in:

  inet_frag_reasm_prepare
    skb_morph

which is not straightforward.

However, when frags arrive out of order, nothing drops the last
frag's reference, and all frags are leaked. The situation is even
worse, as initiating packet capture can lead to a crash[0] when the
skb has been cloned and shared at the same time.

Fix the issue by removing skb_get() before defragmentation. act_ct
returns TC_ACT_CONSUMED when defrag fails or is in progress.
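
Schematically (simplified; helper and parameter names are assumptions):

  /* before: extra reference taken ahead of defrag */
  skb_get(skb);
  err = tcf_ct_handle_fragments(net, skb, family, zone);

  /* after: no extra ref; defrag in progress or failure consumes the skb */
  err = tcf_ct_handle_fragments(net, skb, family, zone);
  if (err)
          return TC_ACT_CONSUMED;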

[0]:
[  843.804823] ------------[ cut here ]------------
[  843.809659] kernel BUG at net/core/skbuff.c:2091!
[  843.814516] invalid opcode: 0000 [] PREEMPT SMP
[  843.819296] CPU: 7 PID: 0 Comm: swapper/7 Kdump: loaded Tainted: G S 6.7.0-rc3 
[  843.824107] Hardware name: XFUSION 1288H V6/BC13MBSBD, BIOS 1.29 11/25/2022
[  843.828953] RIP: 0010:pskb_expand_head+0x2ac/0x300
[  843.833805] Code: 8b 70 28 48 85 f6 74 82 48 83 c6 08 bf 01 00 00 00 e8 38 bd ff ff 8b 83 c0 00 00 00 48 03 83 c8 00 00 00 e9 62 ff ff ff 0f 0b <0f> 0b e8 8d d0 ff ff e9 b3 fd ff ff 81 7c 24 14 40 01 00 00 4c 89
[  843.843698] RSP: 0018:ffffc9000cce07c0 EFLAGS: 00010202
[  843.848524] RAX: 0000000000000002 RBX: ffff88811a211d00 RCX: 0000000000000820
[  843.853299] RDX: 0000000000000640 RSI: 0000000000000000 RDI: ffff88811a211d00
[  843.857974] RBP: ffff888127d39518 R08: 00000000bee97314 R09: 0000000000000000
[  843.862584] R10: 0000000000000000 R11: ffff8881109f0000 R12: 0000000000000880
[  843.867147] R13: ffff888127d39580 R14: 0000000000000640 R15: ffff888170f7b900
[  843.871680] FS:  0000000000000000(0000) GS:ffff889ffffc0000(0000) knlGS:0000000000000000
[  843.876242] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  843.880778] CR2: 00007fa42affcfb8 CR3: 000000011433a002 CR4: 0000000000770ef0
[  843.885336] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  843.889809] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  843.894229] PKRU: 55555554
[  843.898539] Call Trace:
[  843.902772]  <IRQ>
[  843.906922]  ? __die_body+0x1e/0x60
[  843.911032]  ? die+0x3c/0x60
[  843.915037]  ? do_trap+0xe2/0x110
[  843.918911]  ? pskb_expand_head+0x2ac/0x300
[  843.922687]  ? do_error_trap+0x65/0x80
[  843.926342]  ? pskb_expand_head+0x2ac/0x300
[  843.929905]  ? exc_invalid_op+0x50/0x60
[  843.933398]  ? pskb_expand_head+0x2ac/0x300
[  843.936835]  ? asm_exc_invalid_op+0x1a/0x20
[  843.940226]  ? pskb_expand_head+0x2ac/0x300
[  843.943580]  inet_frag_reasm_prepare+0xd1/0x240
[  843.946904]  ip_defrag+0x5d4/0x870
[  843.950132]  nf_ct_handle_fragments+0xec/0x130 [nf_conntrack]
[  843.953334]  tcf_ct_act+0x252/0xd90 [act_ct]
[  843.956473]  ? tcf_mirred_act+0x516/0x5a0 [act_mirred]
[  843.959657]  tcf_action_exec+0xa1/0x160
[  843.962823]  fl_classify+0x1db/0x1f0 [cls_flower]
[  843.966010]  ? skb_clone+0x53/0xc0
[  843.969173]  tcf_classify+0x24d/0x420
[  843.972333]  tc_run+0x8f/0xf0
[  843.975465]  __netif_receive_skb_core+0x67a/0x1080
[  843.978634]  ? dev_gro_receive+0x249/0x730
[  843.981759]  __netif_receive_skb_list_core+0x12d/0x260
[  843.984869]  netif_receive_skb_list_internal+0x1cb/0x2f0
[  843.987957]  ? mlx5e_handle_rx_cqe_mpwrq_rep+0xfa/0x1a0 [mlx5_core]
[  843.991170]  napi_complete_done+0x72/0x1a0
[  843.994305]  mlx5e_napi_poll+0x28c/0x6d0 [mlx5_core]
[  843.997501]  __napi_poll+0x25/0x1b0
[  844.000627]  net_rx_action+0x256/0x330
[  844.003705]  __do_softirq+0xb3/0x29b
[  844.006718]  irq_exit_rcu+0x9e/0xc0
[  844.009672]  common_interrupt+0x86/0xa0
[  844.012537]  </IRQ>
[  844.015285]  <TASK>
[  844.017937]  asm_common_interrupt+0x26/0x40
[  844.020591] RIP: 0010:acpi_safe_halt+0x1b/0x20
[  844.023247] Code: ff 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 65 48 8b 04 25 00 18 03 00 48 8b 00 a8 08 75 0c 66 90 0f 00 2d 81 d0 44 00 fb f4 <fa> c3 0f 1f 00 89 fa ec 48 8b 05 ee 88 ed 00 a9 00 00 00 80 75 11
[  844.028900] RSP: 0018:ffffc90000533e70 EFLAGS: 00000246
[  844.031725] RAX: 0000000000004000 RBX: 0000000000000001 RCX: 0000000000000000
[  844.034553] RDX: ffff889ffffc0000 RSI: ffffffff828b7f20 RDI: ffff88a090f45c64
[  844.037368] RBP: ffff88a0901a2800 R08: ffff88a090f45c00 R09: 00000000000317c0
[  844.040155] R10: 00ec812281150475 R11: ffff889fffff0e04 R12: ffffffff828b7fa0
[  844.042962] R13: ffffffff828b7f20 R14: 0000000000000001 R15: 0000000000000000
[  844.045819]  acpi_idle_enter+0x7b/0xc0
[  844.048621]  cpuidle_enter_state+0x7f/0x430
[  844.051451]  cpuidle_enter+0x2d/0x40
[  844.054279]  do_idle+0x1d4/0x240
[  844.057096]  cpu_startup_entry+0x2a/0x30
[  844.059934]  start_secondary+0x104/0x130
[  844.062787]  secondary_startup_64_no_verify+0x16b/0x16b
[  844.065674]  </TASK>

Fixes: b57dc7c13ea9 ("net/sched: Introduce action ct")
Signed-off-by: Tao Liu <taoliu828@163.com>
Link: https://lore.kernel.org/r/20231228081457.936732-1-taoliu828@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-05 08:08:24 -08:00
Jakub Kicinski
cb42010690 net: fill in MODULE_DESCRIPTION()s for CAIF
W=1 builds now warn if a module is built without a MODULE_DESCRIPTION().
Add descriptions to all the CAIF sub-modules.
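
Each fix is a one-liner per module source file; illustratively (the
description text is a placeholder), and the AF_PACKET, DSA tag, and
ATM patches below do the same:

  MODULE_DESCRIPTION("CAIF serial protocol support");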

Link: https://lore.kernel.org/r/20240104144855.1320993-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-05 08:06:35 -08:00
Jakub Kicinski
b8549d8598 net: fill in MODULE_DESCRIPTION() for AF_PACKET
W=1 builds now warn if a module is built without a MODULE_DESCRIPTION().
Add a description to net/packet/af_packet.c.

Acked-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20240104144119.1319055-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-05 08:06:35 -08:00
Jakub Kicinski
0ed6e95255 net: fill in MODULE_DESCRIPTION()s for DSA tags
W=1 builds now warn if a module is built without a MODULE_DESCRIPTION().
Add descriptions to all the DSA tag modules.

The descriptions are copy/pasted Kconfig names, with s/^Tag/DSA tag/.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Acked-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Acked-by: Kurt Kanzenbach <kurt@linutronix.de>
Link: https://lore.kernel.org/r/20240104143759.1318137-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-05 08:06:19 -08:00
Jakub Kicinski
fc0caed81b net: fill in MODULE_DESCRIPTION()s for ATM
W=1 builds now warn if a module is built without a MODULE_DESCRIPTION().
Add descriptions to all the ATM modules and drivers.

Link: https://lore.kernel.org/r/20240104143737.1317945-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-05 08:04:23 -08:00
Jiri Pirko
94e2557d08 net: sched: move block device tracking into tcf_block_get/put_ext()
Inserting the device into the block xarray in qdisc_create() is not
a suitable place to do this. Because it requires use of the
tcf_block() callback, it causes multiple issues. It is called for all
qdisc types, which is incorrect.

So, instead, move it to a more suitable place, tcf_block_get_ext(),
and make sure it is done only for qdiscs that use the block
infrastructure, and only for blocks that are shared.

Symmetrically, alter the cleanup path: move the xarray entry removal
into tcf_block_put_ext().
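
Schematically (assuming the ports xarray introduced by the blamed
commit):

  /* tcf_block_get_ext(): track the device only for shared blocks */
  if (tcf_block_shared(block))
          err = xa_insert(&block->ports, dev->ifindex, dev, GFP_KERNEL);

  /* tcf_block_put_ext(): symmetric removal */
  xa_erase(&block->ports, dev->ifindex);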

Fixes: 913b47d3424e ("net/sched: Introduce tc block netdev tracking infra")
Reported-by: Ido Schimmel <idosch@nvidia.com>
Closes: https://lore.kernel.org/all/ZY1hBb8GFwycfgvd@shredder/
Reported-by: Kui-Feng Lee <sinquersw@gmail.com>
Closes: https://lore.kernel.org/all/ce8d3e55-b8bc-409c-ace9-5cf1c4f7c88e@gmail.com/
Reported-and-tested-by: syzbot+84339b9e7330daae4d66@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/0000000000007c85f5060dcc3a28@google.com/
Reported-and-tested-by: syzbot+806b0572c8d06b66b234@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/00000000000082f2f2060dcc3a92@google.com/
Reported-and-tested-by: syzbot+0039110f932d438130f9@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/0000000000007fbc8c060dcc3a5c@google.com/
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-05 11:19:08 +00:00
Jakub Kicinski
e63c1822ac Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

Conflicts:

drivers/net/ethernet/broadcom/bnxt/bnxt.c
  e009b2efb7a8 ("bnxt_en: Remove mis-applied code from bnxt_cfg_ntp_filters()")
  0f2b21477988 ("bnxt_en: Fix compile error without CONFIG_RFS_ACCEL")
https://lore.kernel.org/all/20240105115509.225aa8a2@canb.auug.org.au/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04 18:06:46 -08:00