linux

iv/linux

Author	SHA1	Message	Date
Ricardo B. Marliere	2ad2018aa3	net: ppp: make ppp_class constant Since commit 43a7206b0963 ("driver core: class: make class_register() take a const *"), the driver core allows for struct class to be in read-only memory, so move the ppp_class structure to be declared at build time placing it into read-only memory, instead of having to be dynamically allocated at boot time. Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240302-class_cleanup-net-next-v1-3-8fa378595b93@marliere.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-05 11:21:18 -08:00
Ricardo B. Marliere	63767a7631	net: wan: framer: make framer_class constant Since commit 43a7206b0963 ("driver core: class: make class_register() take a const *"), the driver core allows for struct class to be in read-only memory, so move the framer_class structure to be declared at build time placing it into read-only memory, instead of having to be dynamically allocated at boot time. Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Herve Codina <herve.codina@bootlin.com> Link: https://lore.kernel.org/r/20240302-class_cleanup-net-next-v1-2-8fa378595b93@marliere.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-05 11:21:17 -08:00
Ricardo B. Marliere	b6e3c115ef	net: hns: make hnae_class constant Since commit 43a7206b0963 ("driver core: class: make class_register() take a const *"), the driver core allows for struct class to be in read-only memory, so move the hnae_class structure to be declared at build time placing it into read-only memory, instead of having to be dynamically allocated at boot time. Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240302-class_cleanup-net-next-v1-1-8fa378595b93@marliere.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-05 11:21:17 -08:00
Jakub Kicinski	e8efd372b2	Merge branch 'net-phy-micrel-lan8814-erratas' Horatiu Vultur says: ==================== net: phy: micrel: lan8814 erratas Add two erratas for lan8814. First one fix the led which might stay on even that there is no link. The second one improves increases length of the cable that can be used when used in 1000Base-T. ==================== Link: https://lore.kernel.org/r/20240304091548.1386022-1-horatiu.vultur@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-05 11:17:56 -08:00
Horatiu Vultur	ad080db448	net: phy: micrel: lan8814 cable improvement errata When the length of the cable is more than 100m and the lan8814 is configured to run in 1000Base-T Slave then the register of the device needs to be optimized. Workaround this by setting the measure time to a value of 0xb. This value can be set regardless of the configuration. This issue is described in 'LAN8814 Silicon Errata and Data Sheet Clarification' and according to that, this will not be corrected in a future silicon revision. Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com> Link: https://lore.kernel.org/r/20240304091548.1386022-3-horatiu.vultur@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-05 11:17:46 -08:00
Horatiu Vultur	e9097f8e1e	net: phy: micrel: lan8814 led errata Lan8814 phy led behavior is not correct. It was noticed that the led still remains ON when the cable is unplugged while there was traffic passing at that time. The fix consists in clearing bit 10 of register 0x38, in this way the led behaviour is correct and gets OFF when there is no link. Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20240304091548.1386022-2-horatiu.vultur@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-05 11:17:46 -08:00
Eric Dumazet	685f7d5312	net/ipv6: avoid possible UAF in ip6_route_mpath_notify() syzbot found another use-after-free in ip6_route_mpath_notify() [1] Commit f7225172f25a ("net/ipv6: prevent use after free in ip6_route_mpath_notify") was not able to fix the root cause. We need to defer the fib6_info_release() calls after ip6_route_mpath_notify(), in the cleanup phase. [1] BUG: KASAN: slab-use-after-free in rt6_fill_node+0x1460/0x1ac0 Read of size 4 at addr ffff88809a07fc64 by task syz-executor.2/23037 CPU: 0 PID: 23037 Comm: syz-executor.2 Not tainted 6.8.0-rc4-syzkaller-01035-gea7f3cfaa588 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x1e7/0x2e0 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:377 [inline] print_report+0x167/0x540 mm/kasan/report.c:488 kasan_report+0x142/0x180 mm/kasan/report.c:601 rt6_fill_node+0x1460/0x1ac0 inet6_rt_notify+0x13b/0x290 net/ipv6/route.c:6184 ip6_route_mpath_notify net/ipv6/route.c:5198 [inline] ip6_route_multipath_add net/ipv6/route.c:5404 [inline] inet6_rtm_newroute+0x1d0f/0x2300 net/ipv6/route.c:5517 rtnetlink_rcv_msg+0x885/0x1040 net/core/rtnetlink.c:6597 netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2543 netlink_unicast_kernel net/netlink/af_netlink.c:1341 [inline] netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1367 netlink_sendmsg+0xa3b/0xd70 net/netlink/af_netlink.c:1908 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x221/0x270 net/socket.c:745 ____sys_sendmsg+0x525/0x7d0 net/socket.c:2584 ___sys_sendmsg net/socket.c:2638 [inline] __sys_sendmsg+0x2b0/0x3a0 net/socket.c:2667 do_syscall_64+0xf9/0x240 entry_SYSCALL_64_after_hwframe+0x6f/0x77 RIP: 0033:0x7f73dd87dda9 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f73de6550c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007f73dd9ac050 RCX: 00007f73dd87dda9 RDX: 0000000000000000 RSI: 0000000020000140 RDI: 0000000000000005 RBP: 00007f73dd8ca47a R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 000000000000006e R14: 00007f73dd9ac050 R15: 00007ffdbdeb7858 </TASK> Allocated by task 23037: kasan_save_stack mm/kasan/common.c:47 [inline] kasan_save_track+0x3f/0x80 mm/kasan/common.c:68 poison_kmalloc_redzone mm/kasan/common.c:372 [inline] __kasan_kmalloc+0x98/0xb0 mm/kasan/common.c:389 kasan_kmalloc include/linux/kasan.h:211 [inline] __do_kmalloc_node mm/slub.c:3981 [inline] __kmalloc+0x22e/0x490 mm/slub.c:3994 kmalloc include/linux/slab.h:594 [inline] kzalloc include/linux/slab.h:711 [inline] fib6_info_alloc+0x2e/0xf0 net/ipv6/ip6_fib.c:155 ip6_route_info_create+0x445/0x12b0 net/ipv6/route.c:3758 ip6_route_multipath_add net/ipv6/route.c:5298 [inline] inet6_rtm_newroute+0x744/0x2300 net/ipv6/route.c:5517 rtnetlink_rcv_msg+0x885/0x1040 net/core/rtnetlink.c:6597 netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2543 netlink_unicast_kernel net/netlink/af_netlink.c:1341 [inline] netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1367 netlink_sendmsg+0xa3b/0xd70 net/netlink/af_netlink.c:1908 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x221/0x270 net/socket.c:745 ____sys_sendmsg+0x525/0x7d0 net/socket.c:2584 ___sys_sendmsg net/socket.c:2638 [inline] __sys_sendmsg+0x2b0/0x3a0 net/socket.c:2667 do_syscall_64+0xf9/0x240 entry_SYSCALL_64_after_hwframe+0x6f/0x77 Freed by task 16: kasan_save_stack mm/kasan/common.c:47 [inline] kasan_save_track+0x3f/0x80 mm/kasan/common.c:68 kasan_save_free_info+0x4e/0x60 mm/kasan/generic.c:640 poison_slab_object+0xa6/0xe0 mm/kasan/common.c:241 __kasan_slab_free+0x34/0x70 mm/kasan/common.c:257 kasan_slab_free include/linux/kasan.h:184 [inline] slab_free_hook mm/slub.c:2121 [inline] slab_free mm/slub.c:4299 [inline] kfree+0x14a/0x380 mm/slub.c:4409 rcu_do_batch kernel/rcu/tree.c:2190 [inline] rcu_core+0xd76/0x1810 kernel/rcu/tree.c:2465 __do_softirq+0x2bb/0x942 kernel/softirq.c:553 Last potentially related work creation: kasan_save_stack+0x3f/0x60 mm/kasan/common.c:47 __kasan_record_aux_stack+0xae/0x100 mm/kasan/generic.c:586 __call_rcu_common kernel/rcu/tree.c:2715 [inline] call_rcu+0x167/0xa80 kernel/rcu/tree.c:2829 fib6_info_release include/net/ip6_fib.h:341 [inline] ip6_route_multipath_add net/ipv6/route.c:5344 [inline] inet6_rtm_newroute+0x114d/0x2300 net/ipv6/route.c:5517 rtnetlink_rcv_msg+0x885/0x1040 net/core/rtnetlink.c:6597 netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2543 netlink_unicast_kernel net/netlink/af_netlink.c:1341 [inline] netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1367 netlink_sendmsg+0xa3b/0xd70 net/netlink/af_netlink.c:1908 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x221/0x270 net/socket.c:745 ____sys_sendmsg+0x525/0x7d0 net/socket.c:2584 ___sys_sendmsg net/socket.c:2638 [inline] __sys_sendmsg+0x2b0/0x3a0 net/socket.c:2667 do_syscall_64+0xf9/0x240 entry_SYSCALL_64_after_hwframe+0x6f/0x77 The buggy address belongs to the object at ffff88809a07fc00 which belongs to the cache kmalloc-512 of size 512 The buggy address is located 100 bytes inside of freed 512-byte region [ffff88809a07fc00, ffff88809a07fe00) The buggy address belongs to the physical page: page:ffffea0002681f00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x9a07c head:ffffea0002681f00 order:2 entire_mapcount:0 nr_pages_mapped:0 pincount:0 flags: 0xfff00000000840(slab\|head\|node=0\|zone=1\|lastcpupid=0x7ff) page_type: 0xffffffff() raw: 00fff00000000840 ffff888014c41c80 dead000000000122 0000000000000000 raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected page_owner tracks the page as allocated page last allocated via order 2, migratetype Unmovable, gfp_mask 0x1d20c0(__GFP_IO\|__GFP_FS\|__GFP_NOWARN\|__GFP_NORETRY\|__GFP_COMP\|__GFP_NOMEMALLOC\|__GFP_HARDWALL), pid 23028, tgid 23027 (syz-executor.4), ts 2340253595219, free_ts 2339107097036 set_page_owner include/linux/page_owner.h:31 [inline] post_alloc_hook+0x1ea/0x210 mm/page_alloc.c:1533 prep_new_page mm/page_alloc.c:1540 [inline] get_page_from_freelist+0x33ea/0x3580 mm/page_alloc.c:3311 __alloc_pages+0x255/0x680 mm/page_alloc.c:4567 __alloc_pages_node include/linux/gfp.h:238 [inline] alloc_pages_node include/linux/gfp.h:261 [inline] alloc_slab_page+0x5f/0x160 mm/slub.c:2190 allocate_slab mm/slub.c:2354 [inline] new_slab+0x84/0x2f0 mm/slub.c:2407 ___slab_alloc+0xd17/0x13e0 mm/slub.c:3540 __slab_alloc mm/slub.c:3625 [inline] __slab_alloc_node mm/slub.c:3678 [inline] slab_alloc_node mm/slub.c:3850 [inline] __do_kmalloc_node mm/slub.c:3980 [inline] __kmalloc+0x2e0/0x490 mm/slub.c:3994 kmalloc include/linux/slab.h:594 [inline] kzalloc include/linux/slab.h:711 [inline] new_dir fs/proc/proc_sysctl.c:956 [inline] get_subdir fs/proc/proc_sysctl.c:1000 [inline] sysctl_mkdir_p fs/proc/proc_sysctl.c:1295 [inline] __register_sysctl_table+0xb30/0x1440 fs/proc/proc_sysctl.c:1376 neigh_sysctl_register+0x416/0x500 net/core/neighbour.c:3859 devinet_sysctl_register+0xaf/0x1f0 net/ipv4/devinet.c:2644 inetdev_init+0x296/0x4d0 net/ipv4/devinet.c:286 inetdev_event+0x338/0x15c0 net/ipv4/devinet.c:1555 notifier_call_chain+0x18f/0x3b0 kernel/notifier.c:93 call_netdevice_notifiers_extack net/core/dev.c:1987 [inline] call_netdevice_notifiers net/core/dev.c:2001 [inline] register_netdevice+0x15b2/0x1a20 net/core/dev.c:10340 br_dev_newlink+0x27/0x100 net/bridge/br_netlink.c:1563 rtnl_newlink_create net/core/rtnetlink.c:3497 [inline] __rtnl_newlink net/core/rtnetlink.c:3717 [inline] rtnl_newlink+0x158f/0x20a0 net/core/rtnetlink.c:3730 page last free pid 11583 tgid 11583 stack trace: reset_page_owner include/linux/page_owner.h:24 [inline] free_pages_prepare mm/page_alloc.c:1140 [inline] free_unref_page_prepare+0x968/0xa90 mm/page_alloc.c:2346 free_unref_page+0x37/0x3f0 mm/page_alloc.c:2486 kasan_depopulate_vmalloc_pte+0x74/0x90 mm/kasan/shadow.c:415 apply_to_pte_range mm/memory.c:2619 [inline] apply_to_pmd_range mm/memory.c:2663 [inline] apply_to_pud_range mm/memory.c:2699 [inline] apply_to_p4d_range mm/memory.c:2735 [inline] __apply_to_page_range+0x8ec/0xe40 mm/memory.c:2769 kasan_release_vmalloc+0x9a/0xb0 mm/kasan/shadow.c:532 __purge_vmap_area_lazy+0x163f/0x1a10 mm/vmalloc.c:1770 drain_vmap_area_work+0x40/0xd0 mm/vmalloc.c:1804 process_one_work kernel/workqueue.c:2633 [inline] process_scheduled_works+0x913/0x1420 kernel/workqueue.c:2706 worker_thread+0xa5f/0x1000 kernel/workqueue.c:2787 kthread+0x2ef/0x390 kernel/kthread.c:388 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:242 Memory state around the buggy address: ffff88809a07fb00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88809a07fb80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff88809a07fc00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff88809a07fc80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88809a07fd00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb Fixes: 3b1137fe7482 ("net: ipv6: Change notifications for multipath add to RTA_MULTIPATH") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240303144801.702646-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-05 11:16:11 -08:00
Sasha Neftin	ba54b1a276	intel: legacy: Partial revert of field get conversion Refactoring of the field get conversion introduced a regression in the legacy Wake On Lan from a magic packet with i219 devices. Rx address copied not correctly from MAC to PHY with FIELD_GET macro. Fixes: b9a452545075 ("intel: legacy: field get conversion") Suggested-by: Vitaly Lifshits <vitaly.lifshits@intel.com> Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2024-03-05 10:07:04 -08:00
Florian Kauer	ef27f655b4	igc: avoid returning frame twice in XDP_REDIRECT When a frame can not be transmitted in XDP_REDIRECT (e.g. due to a full queue), it is necessary to free it by calling xdp_return_frame_rx_napi. However, this is the responsibility of the caller of the ndo_xdp_xmit (see for example bq_xmit_all in kernel/bpf/devmap.c) and thus calling it inside igc_xdp_xmit (which is the ndo_xdp_xmit of the igc driver) as well will lead to memory corruption. In fact, bq_xmit_all expects that it can return all frames after the last successfully transmitted one. Therefore, break for the first not transmitted frame, but do not call xdp_return_frame_rx_napi in igc_xdp_xmit. This is equally implemented in other Intel drivers such as the igb. There are two alternatives to this that were rejected: 1. Return num_frames as all the frames would have been transmitted and release them inside igc_xdp_xmit. While it might work technically, it is not what the return value is meant to represent (i.e. the number of SUCCESSFULLY transmitted packets). 2. Rework kernel/bpf/devmap.c and all drivers to support non-consecutively dropped packets. Besides being complex, it likely has a negative performance impact without a significant gain since it is anyway unlikely that the next frame can be transmitted if the previous one was dropped. The memory corruption can be reproduced with the following script which leads to a kernel panic after a few seconds. It basically generates more traffic than a i225 NIC can transmit and pushes it via XDP_REDIRECT from a virtual interface to the physical interface where frames get dropped. #!/bin/bash INTERFACE=enp4s0 INTERFACE_IDX=`cat /sys/class/net/$INTERFACE/ifindex` sudo ip link add dev veth1 type veth peer name veth2 sudo ip link set up $INTERFACE sudo ip link set up veth1 sudo ip link set up veth2 cat << EOF > redirect.bpf.c SEC("prog") int redirect(struct xdp_md ctx) { return bpf_redirect($INTERFACE_IDX, 0); } char _license[] SEC("license") = "GPL"; EOF clang -O2 -g -Wall -target bpf -c redirect.bpf.c -o redirect.bpf.o sudo ip link set veth2 xdp obj redirect.bpf.o cat << EOF > pass.bpf.c SEC("prog") int pass(struct xdp_md ctx) { return XDP_PASS; } char _license[] SEC("license") = "GPL"; EOF clang -O2 -g -Wall -target bpf -c pass.bpf.c -o pass.bpf.o sudo ip link set $INTERFACE xdp obj pass.bpf.o cat << EOF > trafgen.cfg { /* Ethernet Header / 0xe8, 0x6a, 0x64, 0x41, 0xbf, 0x46, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, const16(ETH_P_IP), / IPv4 Header / 0b01000101, 0, # IPv4 version, IHL, TOS const16(1028), # IPv4 total length (UDP length + 20 bytes (IP header)) const16(2), # IPv4 ident 0b01000000, 0, # IPv4 flags, fragmentation off 64, # IPv4 TTL 17, # Protocol UDP csumip(14, 33), # IPv4 checksum / UDP Header / 10, 0, 1, 1, # IP Src - adapt as needed 10, 0, 1, 2, # IP Dest - adapt as needed const16(6666), # UDP Src Port const16(6666), # UDP Dest Port const16(1008), # UDP length (UDP header 8 bytes + payload length) csumudp(14, 34), # UDP checksum / Payload */ fill('W', 1000), } EOF sudo trafgen -i trafgen.cfg -b3000MB -o veth1 --cpp Fixes: 4ff320361092 ("igc: Add support for XDP_REDIRECT action") Signed-off-by: Florian Kauer <florian.kauer@linutronix.de> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2024-03-05 09:50:33 -08:00
Ivan Vecera	36c824ca3e	i40e: Fix firmware version comparison function Helper i40e_is_fw_ver_eq() compares incorrectly given firmware version as it returns true when the major version of running firmware is greater than the given major version that is wrong and results in failure during getting of DCB configuration where this helper is used. Fix the check and return true only if the running FW version is exactly equals to the given version. Reproducer: 1. Load i40e driver 2. Check dmesg output [root@host ~]# modprobe i40e [root@host ~]# dmesg \| grep 'i40e.*DCB' [ 74.750642] i40e 0000:02:00.0: Query for DCB configuration failed, err -EIO aq_err I40E_AQ_RC_EINVAL [ 74.759770] i40e 0000:02:00.0: DCB init failed -5, disabled [ 74.966550] i40e 0000:02:00.1: Query for DCB configuration failed, err -EIO aq_err I40E_AQ_RC_EINVAL [ 74.975683] i40e 0000:02:00.1: DCB init failed -5, disabled Fixes: cf488e13221f ("i40e: Add other helpers to check version of running firmware and AQ API") Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2024-03-05 09:49:59 -08:00
Jesse Brandeburg	6c5b6ca764	ice: fix typo in assignment Fix an obviously incorrect assignment, created with a typo or cut-n-paste error. Fixes: 5995ef88e3a8 ("ice: realloc VSI stats arrays") Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2024-03-05 09:49:54 -08:00
Michal Schmidt	9224fc86f1	ice: fix uninitialized dplls mutex usage The pf->dplls.lock mutex is initialized too late, after its first use. Move it to the top of ice_dpll_init. Note that the "err_exit" error path destroys the mutex. And the mutex is the last thing destroyed in ice_dpll_deinit. This fixes the following warning with CONFIG_DEBUG_MUTEXES: ice 0000:10:00.0: The DDP package was successfully loaded: ICE OS Default Package version 1.3.36.0 ice 0000:10:00.0: 252.048 Gb/s available PCIe bandwidth (16.0 GT/s PCIe x16 link) ice 0000:10:00.0: PTP init successful ------------[ cut here ]------------ DEBUG_LOCKS_WARN_ON(lock->magic != lock) WARNING: CPU: 0 PID: 410 at kernel/locking/mutex.c:587 __mutex_lock+0x773/0xd40 Modules linked in: crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ice(+) nvme nvme_c> CPU: 0 PID: 410 Comm: kworker/0:4 Not tainted 6.8.0-rc5+ #3 Hardware name: HPE ProLiant DL110 Gen10 Plus/ProLiant DL110 Gen10 Plus, BIOS U56 10/19/2023 Workqueue: events work_for_cpu_fn RIP: 0010:__mutex_lock+0x773/0xd40 Code: c0 0f 84 1d f9 ff ff 44 8b 35 0d 9c 69 01 45 85 f6 0f 85 0d f9 ff ff 48 c7 c6 12 a2 a9 85 48 c7 c7 12 f1 a> RSP: 0018:ff7eb1a3417a7ae0 EFLAGS: 00010286 RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000000 RDX: 0000000000000002 RSI: ffffffff85ac2bff RDI: 00000000ffffffff RBP: ff7eb1a3417a7b80 R08: 0000000000000000 R09: 00000000ffffbfff R10: ff7eb1a3417a7978 R11: ff32b80f7fd2e568 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: ff32b7f02c50e0d8 FS: 0000000000000000(0000) GS:ff32b80efe800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055b5852cc000 CR3: 000000003c43a004 CR4: 0000000000771ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> ? __warn+0x84/0x170 ? __mutex_lock+0x773/0xd40 ? report_bug+0x1c7/0x1d0 ? prb_read_valid+0x1b/0x30 ? handle_bug+0x42/0x70 ? exc_invalid_op+0x18/0x70 ? asm_exc_invalid_op+0x1a/0x20 ? __mutex_lock+0x773/0xd40 ? rcu_is_watching+0x11/0x50 ? __kmalloc_node_track_caller+0x346/0x490 ? ice_dpll_lock_status_get+0x28/0x50 [ice] ? __pfx_ice_dpll_lock_status_get+0x10/0x10 [ice] ? ice_dpll_lock_status_get+0x28/0x50 [ice] ice_dpll_lock_status_get+0x28/0x50 [ice] dpll_device_get_one+0x14f/0x2e0 dpll_device_event_send+0x7d/0x150 dpll_device_register+0x124/0x180 ice_dpll_init_dpll+0x7b/0xd0 [ice] ice_dpll_init+0x224/0xa40 [ice] ? _dev_info+0x70/0x90 ice_load+0x468/0x690 [ice] ice_probe+0x75b/0xa10 [ice] ? _raw_spin_unlock_irqrestore+0x4f/0x80 ? process_one_work+0x1a3/0x500 local_pci_probe+0x47/0xa0 work_for_cpu_fn+0x17/0x30 process_one_work+0x20d/0x500 worker_thread+0x1df/0x3e0 ? __pfx_worker_thread+0x10/0x10 kthread+0x103/0x140 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x31/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30 </TASK> irq event stamp: 125197 hardirqs last enabled at (125197): [<ffffffff8416409d>] finish_task_switch.isra.0+0x12d/0x3d0 hardirqs last disabled at (125196): [<ffffffff85134044>] __schedule+0xea4/0x19f0 softirqs last enabled at (105334): [<ffffffff84e1e65a>] napi_get_frags_check+0x1a/0x60 softirqs last disabled at (105332): [<ffffffff84e1e65a>] napi_get_frags_check+0x1a/0x60 ---[ end trace 0000000000000000 ]--- Fixes: d7999f5ea64b ("ice: implement dpll interface to control cgu") Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2024-03-05 09:49:54 -08:00
Rand Deeb	06e456a05d	net: ice: Fix potential NULL pointer dereference in ice_bridge_setlink() The function ice_bridge_setlink() may encounter a NULL pointer dereference if nlmsg_find_attr() returns NULL and br_spec is dereferenced subsequently in nla_for_each_nested(). To address this issue, add a check to ensure that br_spec is not NULL before proceeding with the nested attribute iteration. Fixes: b1edc14a3fbf ("ice: Implement ice_bridge_getlink and ice_bridge_setlink") Signed-off-by: Rand Deeb <rand.sec96@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2024-03-05 09:49:27 -08:00
Jacob Keller	2652b99e43	ice: virtchnl: stop pretending to support RSS over AQ or registers The E800 series hardware uses the same iAVF driver as older devices, including the virtchnl negotiation scheme. This negotiation scheme includes a mechanism to determine what type of RSS should be supported, including RSS over PF virtchnl messages, RSS over firmware AdminQ messages, and RSS via direct register access. The PF driver will always prefer VIRTCHNL_VF_OFFLOAD_RSS_PF if its supported by the VF driver. However, if an older VF driver is loaded, it may request only VIRTCHNL_VF_OFFLOAD_RSS_REG or VIRTCHNL_VF_OFFLOAD_RSS_AQ. The ice driver happily agrees to support these methods. Unfortunately, the underlying hardware does not support these mechanisms. The E800 series VFs don't have the appropriate registers for RSS_REG. The mailbox queue used by VFs for VF to PF communication blocks messages which do not have the VF-to-PF opcode. Stop lying to the VF that it could support RSS over AdminQ or registers, as these interfaces do not work when the hardware is operating on an E800 series device. In practice this is unlikely to be hit by any normal user. The iAVF driver has supported RSS over PF virtchnl commands since 2016, and always defaults to using RSS_PF if possible. In principle, nothing actually stops the existing VF from attempting to access the registers or send an AQ command. However a properly coded VF will check the capability flags and will report a more useful error if it detects a case where the driver does not support the RSS offloads that it does. Fixes: 1071a8358a28 ("ice: Implement virtchnl commands for AVF support") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Alan Brady <alan.brady@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2024-03-05 09:49:21 -08:00
Emil Tantilov	3300685893	idpf: disable local BH when scheduling napi for marker packets Fix softirq's not being handled during napi_schedule() call when receiving marker packets for queue disable by disabling local bottom half. The issue can be seen on ifdown: NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #08!!! Using ftrace to catch the failing scenario: ifconfig [003] d.... 22739.830624: softirq_raise: vec=3 [action=NET_RX] <idle>-0 [003] ..s.. 22739.831357: softirq_entry: vec=3 [action=NET_RX] No interrupt and CPU is idle. After the patch when disabling local BH before calling napi_schedule: ifconfig [003] d.... 22993.928336: softirq_raise: vec=3 [action=NET_RX] ifconfig [003] ..s1. 22993.928337: softirq_entry: vec=3 [action=NET_RX] Fixes: c2d548cad150 ("idpf: add TX splitq napi poll support") Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Signed-off-by: Alan Brady <alan.brady@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2024-03-05 09:47:38 -08:00
Jakub Kicinski	5ad051235c	Merge branch 'selftests-forwarding-various-improvements' Ido Schimmel says: ==================== selftests: forwarding: Various improvements This patchset speeds up the multipath tests (patches #1-#2) and makes other tests more stable (patches #3-#6) so that they will not randomly fail in the netdev CI. On my system, after applying the first two patches, the run time of gre_multipath_nh_res.sh is reduced by over 90%. ==================== Link: https://lore.kernel.org/r/20240304095612.462900-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-05 09:18:21 -08:00
Ido Schimmel	35df2ce896	selftests: forwarding: Make {, ip6}gre-inner-v6-multipath tests more robust These tests generate various IPv6 flows, encapsulate them in GRE packets and check that the encapsulated packets are distributed between the available nexthops according to the configured weights. Unlike the corresponding IPv4 tests, these tests sometimes fail in the netdev CI because of large discrepancies between the expected and measured ratios [1]. This can be explained by the fact that the IPv4 tests generate about 3,600 different flows whereas the IPv6 tests only generate about 784 different flows (potentially by mistake). Fix by aligning the IPv6 tests to the IPv4 ones and increase the number of generated flows. [1] [...] # TEST: ping [ OK ] # INFO: Running IPv6 over GRE over IPv4 multipath tests # TEST: ECMP [FAIL] # Too large discrepancy between expected and measured ratios # INFO: Expected ratio 1.00 Measured ratio 1.18 [...] Signed-off-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20240304095612.462900-7-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-05 09:18:17 -08:00
Ido Schimmel	f0008b0497	selftests: forwarding: Make VXLAN ECN encap tests more robust These tests sometimes fail on the netdev CI because the expected number of packets is larger than expected [1]. Make the tests more robust by specifically matching on VXLAN encapsulated packets and allowing up to five stray packets instead of just two. [1] [...] # TEST: VXLAN: ECN encap: 0x00->0x00 [FAIL] # v1: Expected to capture 10 packets, got 13. # TEST: VXLAN: ECN encap: 0x01->0x01 [ OK ] # TEST: VXLAN: ECN encap: 0x02->0x02 [ OK ] # TEST: VXLAN: ECN encap: 0x03->0x02 [ OK ] [...] Signed-off-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20240304095612.462900-6-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-05 09:18:13 -08:00
Ido Schimmel	dfbab74044	selftests: forwarding: Make vxlan-bridge-1q pass on debug kernels The ageing time used by the test is too short for debug kernels and results in entries being aged out prematurely [1]. Fix by increasing the ageing time. [1] # ./vxlan_bridge_1q.sh [...] INFO: learning vlan 10 TEST: VXLAN: flood before learning [ OK ] TEST: VXLAN: show learned FDB entry [ OK ] TEST: VXLAN: learned FDB entry [FAIL] swp4: Expected to capture 0 packets, got 10. RTNETLINK answers: No such file or directory TEST: VXLAN: deletion of learned FDB entry [ OK ] TEST: VXLAN: Ageing of learned FDB entry [FAIL] swp4: Expected to capture 0 packets, got 10. TEST: VXLAN: learning toggling on bridge port [ OK ] [...] Signed-off-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20240304095612.462900-5-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-05 09:18:10 -08:00
Ido Schimmel	4aca9eae6f	selftests: forwarding: Make tc-police pass on debug kernels The test configures a policer with a rate of 80Mbps and expects to measure a rate close to it. This is a too high rate for debug kernels, causing the test to fail [1]. Fix by reducing the rate to 10Mbps. [1] # ./tc_police.sh TEST: police on rx [FAIL] Expected rate 76.2Mbps, got 29.6Mbps, which is -61% off. Required accuracy is +-10%. TEST: police on tx [FAIL] Expected rate 76.2Mbps, got 30.4Mbps, which is -60% off. Required accuracy is +-10%. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20240304095612.462900-4-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-05 09:18:07 -08:00
Ido Schimmel	748d27447d	selftests: forwarding: Parametrize mausezahn delay The various multipath tests use mausezahn to generate different flows and check how they are distributed between the available nexthops. The tool is currently invoked with an hard coded transmission delay of 1 ms. This is unnecessary when the tests are run with veth pairs and needlessly prolongs the tests. Parametrize this delay and default it to 0 us. It can be overridden using the forwarding.config file. On my system, this reduces the run time of router_multipath.sh by 93%. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20240304095612.462900-3-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-05 09:18:04 -08:00
Ido Schimmel	7b2d64f933	selftests: forwarding: Remove IPv6 L3 multipath hash tests The multipath tests currently test both the L3 and L4 multipath hash policies for IPv6, but only the L4 policy for IPv4. The reason is mostly historic: When the initial multipath test was added (router_multipath.sh) the IPv6 L4 policy did not exist and was later added to the test. The other multipath tests copied this pattern although there is little value in testing both policies. Align the IPv4 and IPv6 tests and only test the L4 policy. On my system, this reduces the run time of router_multipath.sh by 89% because of the repeated ping6 invocations to randomize the flow label. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20240304095612.462900-2-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-05 09:18:00 -08:00
Eric Dumazet	00af2aa93b	net/smc: reduce rtnl pressure in smc_pnet_create_pnetids_list() Many syzbot reports show extreme rtnl pressure, and many of them hint that smc acquires rtnl in netns creation for no good reason [1] This patch returns early from smc_pnet_net_init() if there is no netdevice yet. I am not even sure why smc_pnet_create_pnetids_list() even exists, because smc_pnet_netdev_event() is also calling smc_pnet_add_base_pnetid() when handling NETDEV_UP event. [1] extract of typical syzbot reports 2 locks held by syz-executor.3/12252: #0: ffffffff8f369610 (pernet_ops_rwsem){++++}-{3:3}, at: copy_net_ns+0x4c7/0x7b0 net/core/net_namespace.c:491 #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_create_pnetids_list net/smc/smc_pnet.c:809 [inline] #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_net_init+0x10a/0x1e0 net/smc/smc_pnet.c:878 2 locks held by syz-executor.4/12253: #0: ffffffff8f369610 (pernet_ops_rwsem){++++}-{3:3}, at: copy_net_ns+0x4c7/0x7b0 net/core/net_namespace.c:491 #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_create_pnetids_list net/smc/smc_pnet.c:809 [inline] #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_net_init+0x10a/0x1e0 net/smc/smc_pnet.c:878 2 locks held by syz-executor.1/12257: #0: ffffffff8f369610 (pernet_ops_rwsem){++++}-{3:3}, at: copy_net_ns+0x4c7/0x7b0 net/core/net_namespace.c:491 #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_create_pnetids_list net/smc/smc_pnet.c:809 [inline] #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_net_init+0x10a/0x1e0 net/smc/smc_pnet.c:878 2 locks held by syz-executor.2/12261: #0: ffffffff8f369610 (pernet_ops_rwsem){++++}-{3:3}, at: copy_net_ns+0x4c7/0x7b0 net/core/net_namespace.c:491 #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_create_pnetids_list net/smc/smc_pnet.c:809 [inline] #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_net_init+0x10a/0x1e0 net/smc/smc_pnet.c:878 2 locks held by syz-executor.0/12265: #0: ffffffff8f369610 (pernet_ops_rwsem){++++}-{3:3}, at: copy_net_ns+0x4c7/0x7b0 net/core/net_namespace.c:491 #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_create_pnetids_list net/smc/smc_pnet.c:809 [inline] #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_net_init+0x10a/0x1e0 net/smc/smc_pnet.c:878 2 locks held by syz-executor.3/12268: #0: ffffffff8f369610 (pernet_ops_rwsem){++++}-{3:3}, at: copy_net_ns+0x4c7/0x7b0 net/core/net_namespace.c:491 #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_create_pnetids_list net/smc/smc_pnet.c:809 [inline] #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_net_init+0x10a/0x1e0 net/smc/smc_pnet.c:878 2 locks held by syz-executor.4/12271: #0: ffffffff8f369610 (pernet_ops_rwsem){++++}-{3:3}, at: copy_net_ns+0x4c7/0x7b0 net/core/net_namespace.c:491 #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_create_pnetids_list net/smc/smc_pnet.c:809 [inline] #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_net_init+0x10a/0x1e0 net/smc/smc_pnet.c:878 2 locks held by syz-executor.1/12274: #0: ffffffff8f369610 (pernet_ops_rwsem){++++}-{3:3}, at: copy_net_ns+0x4c7/0x7b0 net/core/net_namespace.c:491 #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_create_pnetids_list net/smc/smc_pnet.c:809 [inline] #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_net_init+0x10a/0x1e0 net/smc/smc_pnet.c:878 2 locks held by syz-executor.2/12280: #0: ffffffff8f369610 (pernet_ops_rwsem){++++}-{3:3}, at: copy_net_ns+0x4c7/0x7b0 net/core/net_namespace.c:491 #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_create_pnetids_list net/smc/smc_pnet.c:809 [inline] #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_net_init+0x10a/0x1e0 net/smc/smc_pnet.c:878 Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Wenjia Zhang <wenjia@linux.ibm.com> Cc: Jan Karcher <jaka@linux.ibm.com> Cc: "D. Wythe" <alibuda@linux.alibaba.com> Cc: Tony Lu <tonylu@linux.alibaba.com> Cc: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Link: https://lore.kernel.org/r/20240302100744.3868021-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-03-05 15:49:35 +01:00
Paolo Abeni	eead059950	linux-can-next-for-6.9-20240304 -----BEGIN PGP SIGNATURE----- iQFHBAABCgAxFiEEUEC6huC2BN0pvD5fKDiiPnotvG8FAmXlkC8THG1rbEBwZW5n dXRyb25peC5kZQAKCRAoOKI+ei28b1tVB/9S9z5OTWJipe5PqY3z2zodTi7KzUXA Y0svHGDJN86P7COGKdAGDhhXmzTZ05uxhHw2TJyim3C67HDHZwVRc/AmLa+SQB8S TZP9uJEqE3O18rYGjhkE0Fni3viJjB+1WinoElRrmfQO+3u480xzyv09nIImY6kf WIk9PcQvWyf/nWr/D60BVe9Ifw4VSID+jsoppQTgE6XK3NI4IFjXt93W1Ln0E8u0 ld5vqEjanwevljBcIGPaI3WrHOlwfTAQhVDMXvvEr5NJ5AnvC2BkXkEj6AXnT1mj 8xaIi/e3rAfYBTypZ2EY1ZGKDslTMVAuztDhPqmoxommsw3ldoQi6Yz3 =JobK -----END PGP SIGNATURE----- Merge tag 'linux-can-next-for-6.9-20240304' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2024-03-04 this is a pull request of 4 patches for net-next/master. The 1st patch is by Jimmy Assarsson and adds support for the Leaf v3 to the kvaser_usb driver. Martin Jocić's patch targets the kvaser_pciefd driver and adds support for the Kvaser PCIe 8xCAN device. Followed by a patch by me that adds a missing a cpu_to_le32() to the gs_usb driver, the change is not critical as the assigned value is 0. The last patch is also by me and replaces a literal 256 with a proper define. linux-can-next-for-6.9-20240304 * tag 'linux-can-next-for-6.9-20240304' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next: can: mcp251xfd: __mcp251xfd_get_berr_counter(): use CAN_BUS_OFF_THRESHOLD instead of open coding it can: gs_usb: gs_cmd_reset(): use cpu_to_le32() to assign mode can: kvaser_pciefd: Add support for Kvaser PCIe 8xCAN can: kvaser_usb: Add support for Leaf v3 ==================== Link: https://lore.kernel.org/r/20240304092051.3631481-1-mkl@pengutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-03-05 13:57:31 +01:00
Abhishek Chauhan	885c36e59f	net: Re-use and set mono_delivery_time bit for userspace tstamp packets Bridge driver today has no support to forward the userspace timestamp packets and ends up resetting the timestamp. ETF qdisc checks the packet coming from userspace and encounters to be 0 thereby dropping time sensitive packets. These changes will allow userspace timestamps packets to be forwarded from the bridge to NIC drivers. Setting the same bit (mono_delivery_time) to avoid dropping of userspace tstamp packets in the forwarding path. Existing functionality of mono_delivery_time remains unaltered here, instead just extended with userspace tstamp support for bridge forwarding path. Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20240301201348.2815102-1-quic_abchauha@quicinc.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-03-05 13:41:16 +01:00
Paolo Abeni	d35c9659e5	Merge branch 'net-gro-cleanups-and-fast-path-refinement' Eric Dumazet says: ==================== net: gro: cleanups and fast path refinement Current GRO stack has a 'fast path' for a subset of drivers, users of napi_frags_skb(). With TCP zerocopy/direct uses, header split at receive is becoming more important, and GRO fast path is disabled. This series makes GRO (a bit) more efficient for almost all use cases. ==================== Link: https://lore.kernel.org/r/20240301193740.3436871-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-03-05 13:30:16 +01:00
Eric Dumazet	8f78010b70	tcp: gro: micro optimizations in tcp[4]_gro_complete() In tcp_gro_complete() : Moving the skb->inner_transport_header setting allows the compiler to reuse the previously loaded value of skb->transport_header. Caching skb_shinfo() avoids duplications as well. In tcp4_gro_complete(), doing a single change on skb_shinfo(skb)->gso_type also generates better code. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-03-05 13:30:11 +01:00
Eric Dumazet	c7583e9f76	net: gro: enable fast path for more cases Currently the so-called GRO fast path is only enabled for napi_frags_skb() callers. After the prior patch, we no longer have to clear frag0 whenever we pulled bytes to skb->head. We therefore can initialize frag0 to skb->data so that GRO fast path can be used in the following additional cases: - Drivers using header split (populating skb->data with headers, and having payload in one or more page fragments). - Drivers not using any page frag (entire packet is in skb->data) Add a likely() in skb_gro_may_pull() to help the compiler to generate better code if possible. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-03-05 13:30:11 +01:00
Eric Dumazet	bd56a29c7a	net: gro: change skb_gro_network_header() Change skb_gro_network_header() to accept a const sk_buff and to no longer check if frag0 is NULL or not. This allows to remove skb_gro_frag0_invalidate() which is seen in profiles when header-split is enabled. sk_buff parameter is constified for skb_gro_header_fast(), inet_gro_compute_pseudo() and ip6_gro_compute_pseudo(). Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-03-05 13:30:11 +01:00
Eric Dumazet	93e16ea025	net: gro: rename skb_gro_header_hard() skb_gro_header_hard() is renamed to skb_gro_may_pull() to match the convention used by common helpers like pskb_may_pull(). This means the condition is inverted: if (skb_gro_header_hard(skb, hlen)) slow_path(); becomes: if (!skb_gro_may_pull(skb, hlen)) slow_path(); Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-03-05 13:30:11 +01:00
Paolo Abeni	9452c8b459	Merge branch 'mt7530-dsa-subdriver-improvements-act-iii' says: ==================== MT7530 DSA Subdriver Improvements Act III This is the third patch series with the goal of simplifying the MT7530 DSA subdriver and improving support for MT7530, MT7531, and the switch on the MT7988 SoC. I have done a simple ping test to confirm basic communication on all switch ports on MCM and standalone MT7530, and MT7531 switch with this patch series applied. MT7621 Unielec, MCM MT7530: rgmii-only-gmac0-mt7621-unielec-u7621-06-16m.dtb gmac0-and-gmac1-mt7621-unielec-u7621-06-16m.dtb tftpboot 0x80008000 mips-uzImage.bin; tftpboot 0x83000000 mips-rootfs.cpio.uboot; tftpboot 0x83f00000 $dtb; bootm 0x80008000 0x83000000 0x83f00000 MT7622 Bananapi, MT7531: gmac0-and-gmac1-mt7622-bananapi-bpi-r64.dtb tftpboot 0x40000000 arm64-Image; tftpboot 0x45000000 arm64-rootfs.cpio.uboot; tftpboot 0x4a000000 $dtb; booti 0x40000000 0x45000000 0x4a000000 MT7623 Bananapi, standalone MT7530: rgmii-only-gmac0-mt7623n-bananapi-bpi-r2.dtb gmac0-and-gmac1-mt7623n-bananapi-bpi-r2.dtb tftpboot 0x80008000 arm-zImage; tftpboot 0x83000000 arm-rootfs.cpio.uboot; tftpboot 0x83f00000 $dtb; bootz 0x80008000 0x83000000 0x83f00000 This patch series is the continuation of the patch series linked below. https://lore.kernel.org/r/20230522121532.86610-1-arinc.unal@arinc9.com Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> --- Changes in v3: - Patch 8 - Explain properly the behaviour of setting link down on all ports at setup. - Split the changes for simplifying the link settings operations out to another patch. - Link to v2: https://lore.kernel.org/r/20240216-for-netnext-mt7530-improvements-3-v2-0-094cae3ff23b@arinc9.com Changes in v2: - Patch 8 - Use a single mt7530_rmw() instead of two mt7530_clear() and mt7530_set() commands. - Link to v1: https://lore.kernel.org/r/20240208-for-netnext-mt7530-improvements-3-v1-0-d7c1cfd502ca@arinc9.com ==================== Link: https://lore.kernel.org/r/20240301-for-netnext-mt7530-improvements-3-v3-0-449f4f166454@arinc9.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-03-05 12:23:46 +01:00
Arınç ÜNAL	b04097c7a7	net: dsa: mt7530: simplify link operations The "MT7621 Giga Switch Programming Guide v0.3", "MT7531 Reference Manual for Development Board v1.0", and "MT7988A Wi-Fi 7 Generation Router Platform: Datasheet (Open Version) v0.1" documents show that these bits are enabled at reset: PMCR_IFG_XMIT(1) (not part of PMCR_LINK_SETTINGS_MASK) PMCR_MAC_MODE (not part of PMCR_LINK_SETTINGS_MASK) PMCR_TX_EN PMCR_RX_EN PMCR_BACKOFF_EN (not part of PMCR_LINK_SETTINGS_MASK) PMCR_BACKPR_EN (not part of PMCR_LINK_SETTINGS_MASK) PMCR_TX_FC_EN PMCR_RX_FC_EN These bits also don't exist on the MT7530_PMCR_P(6) register of the switch on the MT7988 SoC: PMCR_IFG_XMIT() PMCR_MAC_MODE PMCR_BACKOFF_EN PMCR_BACKPR_EN Remove the setting of the bits not part of PMCR_LINK_SETTINGS_MASK on phylink_mac_config as they're already set. The bit for setting the port on force mode is already done on mt7530_setup() and mt7531_setup_common(). So get rid of PMCR_FORCE_MODE_ID() which helped determine which bit to use for the switch model. Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-03-05 12:23:43 +01:00
Arınç ÜNAL	6324230b3b	net: dsa: mt7530: sort link settings ops and force link down on all ports port_enable and port_disable clears the link settings. Move that to mt7530_setup() and mt7531_setup_common() which set up the switches. This way, the link settings are cleared on all ports at setup, and then only once with phylink_mac_link_down() when a link goes down. Enable force mode at setup to apply the force part of the link settings. This ensures that disabled ports will have their link down. Suggested-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-03-05 12:23:43 +01:00
Arınç ÜNAL	3a87131e3d	net: dsa: mt7530: put initialising PCS devices code back to original order The commit fae463084032 ("net: dsa: mt753x: fix pcs conversion regression") fixes regression caused by cpu_port_config manually calling phylink operations. cpu_port_config was deemed useless and was removed. Therefore, put initialising PCS devices code back to its original order. Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-03-05 12:23:43 +01:00
Arınç ÜNAL	1192ed898c	net: dsa: mt7530: get rid of mt753x_mac_config() There is no need for a separate function to call priv->info->mac_port_config(). Call it from mt753x_phylink_mac_config() instead and remove mt753x_mac_config(). Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-03-05 12:23:43 +01:00
Arınç ÜNAL	22fa10170a	net: dsa: mt7530: get rid of priv->info->cpu_port_config() priv->info->cpu_port_config() is used for MT7531 and the switch on the MT7988 SoC. It sets up the ports described as a CPU port earlier than the phylink code path would do. This function is useless as: - Configuring the MACs can be done from the phylink_mac_config code path instead. - All the link configuration it does on the CPU ports are later undone with the port_enable, phylink_mac_config, and then phylink_mac_link_up code path [1]. priv->p5_interface and priv->p6_interface were being used to prevent configuring the MACs from the phylink_mac_config code path. Remove them now that they hold no purpose. Remove priv->info->cpu_port_config(). On mt753x_phylink_mac_config, switch to if statements to simplify the code. Remove the overwriting of the speed and duplex interfaces for certain interface modes. Phylink already provides the speed and duplex variables with proper values. Phylink already sets the max speed of TRGMII to SPEED_1000. Add SPEED_2500 for PHY_INTERFACE_MODE_2500BASEX to where the speed and EEE bits are set instead. On the switch on the MT7988 SoC, PHY_INTERFACE_MODE_INTERNAL is being used to describe the interface mode of the 10G MAC, which is of port 6. On mt7988_cpu_port_config() PMCR_FORCE_SPEED_1000 was set via the PMCR_CPU_PORT_SETTING() mask. Add SPEED_10000 case to where the speed bits are set to cover this. No need to add it to where the EEE bits are set as the "MT7988A Wi-Fi 7 Generation Router Platform: Datasheet (Open Version) v0.1" document shows that these bits don't exist on the MT7530_PMCR_P(6) register. Remove the definition of PMCR_CPU_PORT_SETTING() now that it holds no purpose. Change mt753x_cpu_port_enable() to void now that there're no error cases left. Link: https://lore.kernel.org/netdev/ZHy2jQLesdYFMQtO@shell.armlinux.org.uk/ [1] Suggested-by: Russell King (Oracle) <linux@armlinux.org.uk> Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-03-05 12:23:43 +01:00
Arınç ÜNAL	adf4ae24ba	net: dsa: mt7530: get rid of useless error returns on phylink code path Remove error returns on the cases where they are already handled with the function the mac_port_get_caps member in mt753x_table points to. mt7531_mac_config() is also called from mt7531_cpu_port_config() outside of phylink but the port and interface modes are already handled there. Change the functions and the mac_port_config function pointer to void now that there're no error returns anymore. Remove mt753x_is_mac_port() that used to help the said error returns. On mt7531_mac_config(), switch to if statements to simplify the code. Remove internal phy cases from mt753x_phylink_mac_config(), there is no need to check the interface mode as that's already handled with the function the mac_port_get_caps member in mt753x_table points to. Acked-by: Daniel Golle <daniel@makrotopia.org> Tested-by: Daniel Golle <daniel@makrotopia.org> Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-03-05 12:23:42 +01:00
Arınç ÜNAL	a565f98d7d	net: dsa: mt7530: do not use SW_PHY_RST to reset MT7531 switch According to the document MT7531 Reference Manual for Development Board v1.0, the SW_PHY_RST bit on the SYS_CTRL register doesn't exist for MT7531. This is likely why forcing link down on all ports is necessary for MT7531. Therefore, do not set SW_PHY_RST on mt7531_setup(). Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-03-05 12:23:42 +01:00
Arınç ÜNAL	804cd5f705	net: dsa: mt7530: set interrupt register only for MT7530 Setting this register related to interrupts is only needed for the MT7530 switch. Make an exclusive check to ensure this. Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Acked-by: Daniel Golle <daniel@makrotopia.org> Tested-by: Daniel Golle <daniel@makrotopia.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-03-05 12:23:42 +01:00
Arınç ÜNAL	6ebe414b48	net: dsa: mt7530: remove .mac_port_config for MT7988 and make it optional For the switch on the MT7988 SoC, the mac_port_config member for ID_MT7988 in mt753x_table is not needed as the interfaces of all MACs are already handled on mt7988_mac_port_get_caps(). Therefore, remove the mac_port_config member from ID_MT7988 in mt753x_table. Before calling priv->info->mac_port_config(), if there's no mac_port_config member in mt753x_table, exit mt753x_mac_config() successfully. Remove calling priv->info->mac_port_config() from the sanity check as the sanity check requires a pointer to a mac_port_config function to be non-NULL. This will fail for MT7988 as mac_port_config won't be a member of its info table. Co-developed-by: Daniel Golle <daniel@makrotopia.org> Signed-off-by: Daniel Golle <daniel@makrotopia.org> Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-03-05 12:23:42 +01:00
Paolo Abeni	6702d60d3c	Merge branch 'remove-page-frag-implementation-in-vhost_net' Yunsheng Lin says: ==================== remove page frag implementation in vhost_net Currently there are three implementations for page frag: 1. mm/page_alloc.c: net stack seems to be using it in the rx part with 'struct page_frag_cache' and the main API being page_frag_alloc_align(). 2. net/core/sock.c: net stack seems to be using it in the tx part with 'struct page_frag' and the main API being skb_page_frag_refill(). 3. drivers/vhost/net.c: vhost seems to be using it to build xdp frame, and it's implementation seems to be a mix of the above two. This patchset tries to unfiy the page frag implementation a little bit by unifying gfp bit for order 3 page allocation and replacing page frag implementation in vhost.c with the one in page_alloc.c. After this patchset, we are not only able to unify the page frag implementation a little, but also able to have about 0.5% performance boost testing by using the vhost_net_test introduced in the last patch. Before this patchset: Performance counter stats for './vhost_net_test' (10 runs): 305325.78 msec task-clock # 1.738 CPUs utilized ( +- 0.12% ) 1048668 context-switches # 3.435 K/sec ( +- 0.00% ) 11 cpu-migrations # 0.036 /sec ( +- 17.64% ) 33 page-faults # 0.108 /sec ( +- 0.49% ) 244651819491 cycles # 0.801 GHz ( +- 0.43% ) (64) 64714638024 stalled-cycles-frontend # 26.45% frontend cycles idle ( +- 2.19% ) (67) 30774313491 stalled-cycles-backend # 12.58% backend cycles idle ( +- 7.68% ) (70) 201749748680 instructions # 0.82 insn per cycle # 0.32 stalled cycles per insn ( +- 0.41% ) (66.76%) 65494787909 branches # 214.508 M/sec ( +- 0.35% ) (64) 4284111313 branch-misses # 6.54% of all branches ( +- 0.45% ) (66) 175.699 +- 0.189 seconds time elapsed ( +- 0.11% ) After this patchset: Performance counter stats for './vhost_net_test' (10 runs): 303974.38 msec task-clock # 1.739 CPUs utilized ( +- 0.14% ) 1048807 context-switches # 3.450 K/sec ( +- 0.00% ) 14 cpu-migrations # 0.046 /sec ( +- 12.86% ) 33 page-faults # 0.109 /sec ( +- 0.46% ) 251289376347 cycles # 0.827 GHz ( +- 0.32% ) (60) 67885175415 stalled-cycles-frontend # 27.01% frontend cycles idle ( +- 0.48% ) (63) 27809282600 stalled-cycles-backend # 11.07% backend cycles idle ( +- 0.36% ) (71) 195543234672 instructions # 0.78 insn per cycle # 0.35 stalled cycles per insn ( +- 0.29% ) (69.04%) 62423183552 branches # 205.357 M/sec ( +- 0.48% ) (67) 4135666632 branch-misses # 6.63% of all branches ( +- 0.63% ) (67) 174.764 +- 0.214 seconds time elapsed ( +- 0.12% ) Changelog: V6: Add timeout for poll() and simplify some logic as suggested by Jason. V5: Address the comment from jason in vhost_net_test.c and the comment about leaving out the gfp change for page frag in sock.c as suggested by Paolo. V4: Resend based on latest net-next branch. V3: 1. Add __page_frag_alloc_align() which is passed with the align mask the original function expected as suggested by Alexander. 2. Drop patch 3 in v2 suggested by Alexander. 3. Reorder patch 4 & 5 in v2 suggested by Alexander. Note that placing this gfp flags handing for order 3 page in an inline function is not considered, as we may be able to unify the page_frag and page_frag_cache handling. V2: Change 'xor'd' to 'masked off', add vhost tx testing for vhost_net_test. V1: Fix some typo, drop RFC tag and rebase on latest net-next. ==================== Link: https://lore.kernel.org/r/20240228093013.8263-1-linyunsheng@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-03-05 11:38:17 +01:00
Yunsheng Lin	c5d3705cfd	tools: virtio: introduce vhost_net_test introduce vhost_net_test for both vhost_net tx and rx basing on virtio_test to test vhost_net changing in the kernel. Steps for vhost_net tx testing: 1. Prepare a out buf. 2. Kick the vhost_net to do tx processing. 3. Do the receiving in the tun side. 4. verify the data received by tun is correct. Steps for vhost_net rx testing: 1. Prepare a in buf. 2. Do the sending in the tun side. 3. Kick the vhost_net to do rx processing. 4. verify the data received by vhost_net is correct. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-03-05 11:38:14 +01:00
Yunsheng Lin	4051bd8129	vhost/net: remove vhost_net_page_frag_refill() The page frag in vhost_net_page_frag_refill() uses the 'struct page_frag' from skb_page_frag_refill(), but it's implementation is similar to page_frag_alloc_align() now. This patch removes vhost_net_page_frag_refill() by using 'struct page_frag_cache' instead of 'struct page_frag', and allocating frag using page_frag_alloc_align(). The added benefit is that not only unifying the page frag implementation a little, but also having about 0.5% performance boost testing by using the vhost_net_test introduced in the last patch. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-03-05 11:38:14 +01:00
Yunsheng Lin	a0727489ac	net: introduce page_frag_cache_drain() When draining a page_frag_cache, most user are doing the similar steps, so introduce an API to avoid code duplication. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-03-05 11:38:14 +01:00
Yunsheng Lin	4bc0d63a23	page_frag: unify gfp bits for order 3 page allocation Currently there seems to be three page frag implementations which all try to allocate order 3 page, if that fails, it then fail back to allocate order 0 page, and each of them all allow order 3 page allocation to fail under certain condition by using specific gfp bits. The gfp bits for order 3 page allocation are different between different implementation, __GFP_NOMEMALLOC is or'd to forbid access to emergency reserves memory for __page_frag_cache_refill(), but it is not or'd in other implementions, __GFP_DIRECT_RECLAIM is masked off to avoid direct reclaim in vhost_net_page_frag_refill(), but it is not masked off in __page_frag_cache_refill(). This patch unifies the gfp bits used between different implementions by or'ing __GFP_NOMEMALLOC and masking off __GFP_DIRECT_RECLAIM for order 3 page allocation to avoid possible pressure for mm. Leave the gfp unifying for page frag implementation in sock.c for now as suggested by Paolo Abeni. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> CC: Alexander Duyck <alexander.duyck@gmail.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-03-05 11:38:14 +01:00
Yunsheng Lin	411c5f3680	mm/page_alloc: modify page_frag_alloc_align() to accept align as an argument napi_alloc_frag_align() and netdev_alloc_frag_align() accept align as an argument, and they are thin wrappers around the __napi_alloc_frag_align() and __netdev_alloc_frag_align() APIs doing the alignment checking and align mask conversion, in order to call page_frag_alloc_align() directly. The intention here is to keep the alignment checking and the alignmask conversion in in-line wrapper to avoid those kind of operations during execution time since it can usually be handled during compile time. We are going to use page_frag_alloc_align() in vhost_net.c, it need the same kind of alignment checking and alignmask conversion, so split up page_frag_alloc_align into an inline wrapper doing the above operation, and add __page_frag_alloc_align() which is passed with the align mask the original function expected as suggested by Alexander. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> CC: Alexander Duyck <alexander.duyck@gmail.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-03-05 11:38:14 +01:00
Jiawen Wu	0e71862a20	net: txgbe: fix to clear interrupt status after handling IRQ GPIO EOI is not set to clear interrupt status after handling the interrupt. It should be done in irq_chip->irq_ack, but this function is not called in handle_nested_irq(). So executing function txgbe_gpio_irq_ack() manually in txgbe_gpio_irq_handler(). Fixes: aefd013624a1 ("net: txgbe: use irq_domain for interrupt controller") Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Link: https://lore.kernel.org/r/20240301092956.18544-2-jiawenwu@trustnetic.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-03-05 11:13:04 +01:00
Jiawen Wu	b4a2496c17	net: txgbe: fix GPIO interrupt blocking The register of GPIO interrupt status is masked before MAC IRQ is enabled. This is because of hardware deficiency. So manually clear the interrupt status before using them. Otherwise, GPIO interrupts will never be reported again. There is a workaround for clearing interrupts to set GPIO EOI in txgbe_up_complete(). Fixes: aefd013624a1 ("net: txgbe: use irq_domain for interrupt controller") Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Link: https://lore.kernel.org/r/20240301092956.18544-1-jiawenwu@trustnetic.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-03-05 11:13:04 +01:00
Mike Yu	8688ab2170	xfrm: set skb control buffer based on packet offload as well In packet offload, packets are not encrypted in XFRM stack, so the next network layer which the packets will be forwarded to should depend on where the packet came from (either xfrm4_output or xfrm6_output) rather than the matched SA's family type. Test: verified IPv6-in-IPv4 packets on Android device with IPsec packet offload enabled Signed-off-by: Mike Yu <yumike@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2024-03-05 10:49:44 +01:00
Mike Yu	d4872d70fc	xfrm: fix xfrm child route lookup for packet offload In current code, xfrm_bundle_create() always uses the matched SA's family type to look up a xfrm child route for the skb. The route returned by xfrm_dst_lookup() will eventually be used in xfrm_output_resume() (skb_dst(skb)->ops->local_out()). If packet offload is used, the above behavior can lead to calling ip_local_out() for an IPv6 packet or calling ip6_local_out() for an IPv4 packet, which is likely to fail. This change fixes the behavior by checking if the matched SA has packet offload enabled. If not, keep the same behavior; if yes, use the matched SP's family type for the lookup. Test: verified IPv6-in-IPv4 packets on Android device with IPsec packet offload enabled Signed-off-by: Mike Yu <yumike@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2024-03-05 10:48:18 +01:00

... 2 3 4 5 6 ...

1253168 Commits