linux

iv/linux

Author	SHA1	Message	Date
Jakub Kicinski	ee8d72a157	bpf-next-for-netdev -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCY+/uBgAKCRDbK58LschI g0ngAPwJHd1RicBuy2C4fLv0nGKZtmYZBAnTGlI2RisPxU6BRwEAwUDLHuc5K6nR j261okOxOy/MRxdN1NhmR6Qe7nMyQAk= =tYU+ -----END PGP SIGNATURE----- Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2023-02-17 We've added 64 non-merge commits during the last 7 day(s) which contain a total of 158 files changed, 4190 insertions(+), 988 deletions(-). The main changes are: 1) Add a rbtree data structure following the "next-gen data structure" precedent set by recently-added linked-list, that is, by using kfunc + kptr instead of adding a new BPF map type, from Dave Marchevsky. 2) Add a new benchmark for hashmap lookups to BPF selftests, from Anton Protopopov. 3) Fix bpf_fib_lookup to only return valid neighbors and add an option to skip the neigh table lookup, from Martin KaFai Lau. 4) Add cgroup.memory=nobpf kernel parameter option to disable BPF memory accouting for container environments, from Yafang Shao. 5) Batch of ice multi-buffer and driver performance fixes, from Alexander Lobakin. 6) Fix a bug in determining whether global subprog's argument is PTR_TO_CTX, which is based on type names which breaks kprobe progs, from Andrii Nakryiko. 7) Prep work for future -mcpu=v4 LLVM option which includes usage of BPF_ST insn. Thus improve BPF_ST-related value tracking in verifier, from Eduard Zingerman. 8) More prep work for later building selftests with Memory Sanitizer in order to detect usages of undefined memory, from Ilya Leoshkevich. 9) Fix xsk sockets to check IFF_UP earlier to avoid a NULL pointer dereference via sendmsg(), from Maciej Fijalkowski. 10) Implement BPF trampoline for RV64 JIT compiler, from Pu Lehui. 11) Fix BPF memory allocator in combination with BPF hashtab where it could corrupt special fields e.g. used in bpf_spin_lock, from Hou Tao. 12) Fix LoongArch BPF JIT to always use 4 instructions for function address so that instruction sequences don't change between passes, from Hengqi Chen. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (64 commits) selftests/bpf: Add bpf_fib_lookup test bpf: Add BPF_FIB_LOOKUP_SKIP_NEIGH for bpf_fib_lookup riscv, bpf: Add bpf trampoline support for RV64 riscv, bpf: Add bpf_arch_text_poke support for RV64 riscv, bpf: Factor out emit_call for kernel and bpf context riscv: Extend patch_text for multiple instructions Revert "bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES" selftests/bpf: Add global subprog context passing tests selftests/bpf: Convert test_global_funcs test to test_loader framework bpf: Fix global subprog context argument resolution logic LoongArch, bpf: Use 4 instructions for function address in JIT bpf: bpf_fib_lookup should not return neigh in NUD_FAILED state bpf: Disable bh in bpf_test_run for xdp and tc prog xsk: check IFF_UP earlier in Tx path Fix typos in selftest/bpf files selftests/bpf: Use bpf_{btf,link,map,prog}_get_info_by_fd() samples/bpf: Use bpf_{btf,link,map,prog}_get_info_by_fd() bpftool: Use bpf_{btf,link,map,prog}_get_info_by_fd() libbpf: Use bpf_{btf,link,map,prog}_get_info_by_fd() libbpf: Introduce bpf_{btf,link,map,prog}_get_info_by_fd() ... ==================== Link: https://lore.kernel.org/r/20230217221737.31122-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-20 16:31:14 -08:00
Yury Norov	01bb11ad82	sched/topology: fix KASAN warning in hop_cmp() Despite that prev_hop is used conditionally on cur_hop is not the first hop, it's initialized unconditionally. Because initialization implies dereferencing, it might happen that the code dereferences uninitialized memory, which has been spotted by KASAN. Fix it by reorganizing hop_cmp() logic. Reported-by: Bruno Goncalves <bgoncalv@redhat.com> Fixes: cd7f55359c90 ("sched: add sched_numa_find_nth_cpu()") Signed-off-by: Yury Norov <yury.norov@gmail.com> Link: https://lore.kernel.org/r/Y+7avK6V9SyAWsXi@yury-laptop/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-20 11:45:33 -08:00
Florian Fainelli	3fcdf2dfef	net: bcmgenet: Support wake-up from s2idle When we suspend into s2idle we also need to enable the interrupt line that generates the MPD and HFB interrupts towards the host CPU interrupt controller (typically the ARM GIC or MIPS L1) to make it exit s2idle. When we suspend into other modes such as "standby" or "mem" we engage a power management state machine which will gate off the CPU L1 controller (priv->irq0) and ungate the side band wake-up interrupt (priv->wol_irq). It is safe to have both enabled as wake-up sources because they are mutually exclusive given any suspend mode. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-20 11:40:53 +00:00
Eric Dumazet	5f1eb1ff58	scm: add user copy checks to put_cmsg() This is a followup of commit 2558b8039d05 ("net: use a bounce buffer for copying skb->mark") x86 and powerpc define user_access_begin, meaning that they are not able to perform user copy checks when using user_write_access_begin() / unsafe_copy_to_user() and friends [1] Instead of waiting bugs to trigger on other arches, add a check_object_size() in put_cmsg() to make sure that new code tested on x86 with CONFIG_HARDENED_USERCOPY=y will perform more security checks. [1] We can not generically call check_object_size() from unsafe_copy_to_user() because UACCESS is enabled at this point. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Kees Cook <keescook@chromium.org> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-20 11:39:59 +00:00
Paolo Abeni	fce10282a0	devlink: drop leftover duplicate/unused code The recent merge from net left-over some unused code in leftover.c - nomen omen. Just drop the unused bits. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-20 11:38:35 +00:00
David S. Miller	f6aa90a7a9	linux-can-next-for-6.3-20230217 -----BEGIN PGP SIGNATURE----- iQFHBAABCgAxFiEEDs2BvajyNKlf9TJQvlAcSiqKBOgFAmPvPaoTHG1rbEBwZW5n dXRyb25peC5kZQAKCRC+UBxKKooE6AVYB/9E7c1PRrjsG2MO035QqzYxag5RCn6x m6gPPkp4DURYO7/LbCDYyTC7ZwXV5Le4fZgpgOPk3SYWo5WU72CR4SfA8so1pM9/ M4eFuJ1FB81HHYZ8sndBU3jhTYj1jo3Px/sV4UCNhFcc0GWpzWzLeIQHA9uP8LDW ZRelEeptLEPiEfMEQqQzAt1EDEzN+pWm0DmgC0DSxJTH82BW3BzZqjRzh2GL0p3t sIFH+8rwl1R6CLEvIf9tnk4wZhzR5/8fuuVArOkDh8no5lFI4LeBmmA8fuUgHA3v IgVdibS115qxh/B8vozghgtsRMLcJFUY71VeKdW+QXi7XS0xzEmgwh59 =T7os -----END PGP SIGNATURE----- Merge tag 'linux-can-next-for-6.3-20230217' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2023-02-17 - fixed this is a pull request of 4 patches for net-next/master. The first patch is by Yang Li and converts the ctucanfd driver to devm_platform_ioremap_resource(). The last 3 patches are by Frank Jungclaus, target the esd_usb driver and contains preparations for the upcoming support of the esd CAN-USB/3 hardware. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-20 11:31:22 +00:00
Horatiu Vultur	4d3e050b54	net: lan966x: Use automatic selection of VCAP rule actionset Since commit 81e164c4aec5 ("net: microchip: sparx5: Add automatic selection of VCAP rule actionset") the VCAP API has the capability to select automatically the actionset based on the actions that are attached to the rule. So it is not needed anymore to hardcode the actionset in the driver, therefore it is OK to remove this. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-20 11:30:08 +00:00
David S. Miller	38d711aacc	Merge branch 'default_rps_mask-follow-up' Paolo Abeni says: ==================== net: default_rps_mask follow-up The first patch namespacify the setting. In the common case, once proper isolation is in place in the main namespace, forwarding to/from each child netns will allways happen on the desidered CPUs. Any additional RPS stage inside the child namespace will not provide additional isolation and could hurt performance badly if picking a CPU on a remote node. The 2nd patch adds more self-tests coverage. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-20 11:22:54 +00:00
Paolo Abeni	3a7d84eae0	self-tests: more rps self tests Explicitly check for child netns and main ns independency Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-20 11:22:54 +00:00
Paolo Abeni	50bcfe8df7	net: make default_rps_mask a per netns attribute That really was meant to be a per netns attribute from the beginning. The idea is that once proper isolation is in place in the main namespace, additional demux in the child namespaces will be redundant. Let's make child netns default rps mask empty by default. To avoid bloating the netns with a possibly large cpumask, allocate it on-demand during the first write operation. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-20 11:22:54 +00:00
David S. Miller	e469b6268d	wireless-next patches for v6.3 Third set of patches for v6.3. This time only a set of small fixes submitted during the last day or two. -----BEGIN PGP SIGNATURE----- iQFFBAABCgAvFiEEiBjanGPFTz4PRfLobhckVSbrbZsFAmPvtW8RHGt2YWxvQGtl cm5lbC5vcmcACgkQbhckVSbrbZtXKQgAijngq4w2/w/jb9VtVGpHnS6SGYIH4L2U bI2zUyNaQcAQTKcA+W4fLP2b37i4xvhfSqXkZ6d62eKNn0q25VI1A6OVFjSDMpDC xsf5MV5esOt+pgnhLvhfhf9+XCyIy15CKnaEVYDXSrZsl9hseUIWILjBT7jETqO4 j0uQZodD+C3KPcjty0nypqj/otUlmtvzbeDGd8G1maXdIJhQI2gSciINaE0TnAev 2DKexHQ/kKYgIIlmKO/E68EMK0CVV18+fXt9xcHspVI1S0AOrT3fki0U1/Zr7heL Y2KxDs32jsGkss6V1c1gkQ/XrPUwnl1o8M2Ftr6KYFykh8xb/Xxufg== =DkcS -----END PGP SIGNATURE----- Merge tag 'wireless-next-2023-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Kalle Valo says: ==================== wireless-next patches for v6.3 Third set of patches for v6.3. This time only a set of small fixes submitted during the last day or two. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-20 11:18:13 +00:00
David S. Miller	1155a2281d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter updates for net-next: 1) Add safeguard to check for NULL tupe in objects updates via NFT_MSG_NEWOBJ, this should not ever happen. From Alok Tiwari. 2) Incorrect pointer check in the new destroy rule command, from Yang Yingliang. 3) Incorrect status bitcheck in nf_conntrack_udp_packet(), from Florian Westphal. 4) Simplify seq_print_acct(), from Ilia Gavrilov. 5) Use 2-arg optimal variant of kfree_rcu() in IPVS, from Julian Anastasov. 6) TCP connection enters CLOSE state in conntrack for locally originated TCP reset packet from the reject target, from Florian Westphal. The fixes #2 and #3 in this series address issues from the previous pull nf-next request in this net-next cycle. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-20 10:53:56 +00:00
Arnd Bergmann	129ff4de58	net: microchip: sparx5: reduce stack usage The vcap_admin structures in vcap_api_next_lookup_advanced_test() take several hundred bytes of stack frame, but when CONFIG_KASAN_STACK is enabled, each one of them also has extra padding before and after it, which ends up blowing the warning limit: In file included from drivers/net/ethernet/microchip/vcap/vcap_api.c:3521: drivers/net/ethernet/microchip/vcap/vcap_api_kunit.c: In function 'vcap_api_next_lookup_advanced_test': drivers/net/ethernet/microchip/vcap/vcap_api_kunit.c:1954:1: error: the frame size of 1448 bytes is larger than 1400 bytes [-Werror=frame-larger-than=] 1954 \| } Reduce the total stack usage by replacing the five structures with an array that only needs one pair of padding areas. Fixes: 1f741f001160 ("net: microchip: sparx5: Add KUNIT tests for enabling/disabling chains") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-20 10:50:47 +00:00
Arnd Bergmann	a59f832a71	sfc: use IS_ENABLED() checks for CONFIG_SFC_SRIOV One local variable has become unused after a recent change: drivers/net/ethernet/sfc/ef100_nic.c: In function 'ef100_probe_netdev_pf': drivers/net/ethernet/sfc/ef100_nic.c:1155:21: error: unused variable 'net_dev' [-Werror=unused-variable] struct net_device *net_dev = efx->net_dev; ^~~~~~~ The variable is still used in an #ifdef. Replace the #ifdef with an if(IS_ENABLED()) check that lets the compiler see where it is used, rather than adding another #ifdef. This also fixes an uninitialized return value in ef100_probe_netdev_pf() that gcc did not spot. Fixes: 7e056e2360d9 ("sfc: obtain device mac address based on firmware handle for ef100") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-20 10:49:30 +00:00
Michal Swiatkowski	8173c2f9a1	ice: properly alloc ICE_VSI_LB Devlink reload patchset introduced regression. ICE_VSI_LB wasn't taken into account when doing default allocation. Fix it by adding a case for ICE_VSI_LB in ice_vsi_alloc_def(). Fixes: 6624e780a577 ("ice: split ice_vsi_setup into smaller functions") Reported-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-20 10:47:34 +00:00
Colin Ian King	0d39ad3e1b	sfc: Fix spelling mistake "creationg" -> "creating" There is a spelling mistake in a pci_warn message. Fix it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: Alejandro Lucero <alejandro.lucero-palau@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-20 10:44:33 +00:00
Geetha sowjanya	933a01ad59	octeontx2-af: Add NIX Errata workaround on CN10K silicon This patch adds workaround for below 2 HW erratas 1. Due to improper clock gating, NIXRX may free the same NPA buffer multiple times.. to avoid this, always enable NIX RX conditional clock. 2. NIX FIFO does not get initialized on reset, if the SMQ flush is triggered before the first packet is processed, it will lead to undefined state. The workaround to perform SMQ flush only if packet count is non-zero in MDQ. Signed-off-by: Geetha sowjanya <gakula@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: Sai Krishna <saikrishnag@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-20 10:42:37 +00:00
Andrew Lunn	c2a978c171	net: phy: Read EEE abilities when using .features A PHY driver can use a static integer value to indicate what link mode features it supports, i.e, its abilities.. This is the old way, but useful when dynamically determining the devices features does not work, e.g. support of fibre. EEE support has been moved into phydev->supported_eee. This needs to be set otherwise the code assumes EEE is not supported. It is normally set as part of reading the devices abilities. However if a static integer value was used, the dynamic reading of the abilities is not performed. Add a call to genphy_c45_read_eee_abilities() to read the EEE abilities. Fixes: 8b68710a3121 ("net: phy: start using genphy_c45_ethtool_get/set_eee()") Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-20 10:06:49 +00:00
David S. Miller	0b1dbf889d	Merge branch 'phydev-locks' Andrew Lunn says: ==================== Add additional phydev locks The phydev lock should be held when accessing members of phydev, or calling into the driver. Some of the phy_ethtool_ functions are missing locks. Add them. To avoid deadlock the marvell driver is modified since it calls one of the functions which gain locks, which would result in a deadlock. The missing locks have not caused noticeable issues, so these patches are for net-next. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-20 10:04:22 +00:00
Andrew Lunn	2f987d4866	net: phy: Add locks to ethtool functions The phydev lock should be held while accessing members of phydev, or calling into the driver. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-20 10:04:22 +00:00
Andrew Lunn	3365777a6a	net: phy: marvell: Use the unlocked genphy_c45_ethtool_get_eee() phy_ethtool_get_eee() is about to gain locking of the phydev lock. This means it cannot be used within a PHY driver without causing a deadlock. Swap to using genphy_c45_ethtool_get_eee() which assumes the lock has already been taken. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-20 10:04:22 +00:00
David S. Miller	cf06eef0c8	Merge branch 'icmp6-drop-reason' Eric Dumazet says: ==================== ipv6: icmp6: better drop reason support This series aims to have more precise drop reason reports for icmp6. This should reduce false positives on most usual cases. This can be extended as needed later. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-20 08:54:24 +00:00
Eric Dumazet	ac03694bc0	ipv6: icmp6: add drop reason support to icmpv6_echo_reply() Change icmpv6_echo_reply() to return a drop reason. For the moment, return NOT_SPECIFIED or SKB_CONSUMED. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-20 08:54:23 +00:00
Eric Dumazet	c34b8bb11e	ipv6: icmp6: add SKB_DROP_REASON_IPV6_NDISC_NS_OTHERHOST Hosts can often receive neighbour discovery messages that are not for them. Use a dedicated drop reason to make clear the packet is dropped for this normal case. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-20 08:54:23 +00:00
Eric Dumazet	784d4477f0	ipv6: icmp6: add SKB_DROP_REASON_IPV6_NDISC_BAD_OPTIONS This is a generic drop reason for any error detected in ndisc_parse_options(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-20 08:54:23 +00:00
Eric Dumazet	ec993edf05	ipv6: icmp6: add drop reason support to ndisc_redirect_rcv() Change ndisc_redirect_rcv() to return a drop reason. For the moment, return PKT_TOO_SMALL, NOT_SPECIFIED and values from icmpv6_notify(). More reasons are added later. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-20 08:54:23 +00:00
Eric Dumazet	2f326d9d9f	ipv6: icmp6: add drop reason support to ndisc_router_discovery() Change ndisc_router_discovery() to return a drop reason. For the moment, return PKT_TOO_SMALL, NOT_SPECIFIED and SKB_CONSUMED. More reasons are added later. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-20 08:54:23 +00:00
Eric Dumazet	243e37c642	ipv6: icmp6: add drop reason support to ndisc_recv_rs() Change ndisc_recv_rs() to return a drop reason. For the moment, return PKT_TOO_SMALL, NOT_SPECIFIED or SKB_CONSUMED. More reasons are added later. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-20 08:54:23 +00:00
Eric Dumazet	3009f9ae21	ipv6: icmp6: add drop reason support to ndisc_recv_na() Change ndisc_recv_na() to return a drop reason. For the moment, return PKT_TOO_SMALL, NOT_SPECIFIED or SKB_CONSUMED. More reasons are added later. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-20 08:54:23 +00:00
Eric Dumazet	7c9c8913f4	ipv6: icmp6: add drop reason support to ndisc_recv_ns() Change ndisc_recv_ns() to return a drop reason. For the moment, return PKT_TOO_SMALL, NOT_SPECIFIED or SKB_CONSUMED. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-20 08:54:23 +00:00
Eric Dumazet	dd1b527831	net: add location to trace_consume_skb() kfree_skb() includes the location, it makes sense to add it to consume_skb() as well. After patch: taskd_EventMana 8602 [004] 420.406239: skb:consume_skb: skbaddr=0xffff893a4a6d0500 location=unix_stream_read_generic swapper 0 [011] 422.732607: skb:consume_skb: skbaddr=0xffff89597f68cee0 location=mlx4_en_free_tx_desc discipline 9141 [043] 423.065653: skb:consume_skb: skbaddr=0xffff893a487e9c00 location=skb_consume_udp swapper 0 [010] 423.073166: skb:consume_skb: skbaddr=0xffff8949ce9cdb00 location=icmpv6_rcv borglet 8672 [014] 425.628256: skb:consume_skb: skbaddr=0xffff8949c42e9400 location=netlink_dump swapper 0 [028] 426.263317: skb:consume_skb: skbaddr=0xffff893b1589dce0 location=net_rx_action wget 14339 [009] 426.686380: skb:consume_skb: skbaddr=0xffff893a51b552e0 location=tcp_rcv_state_process Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-20 08:28:49 +00:00
Xuan Zhuo	9f78bf330a	xsk: support use vaddr as ring When we try to start AF_XDP on some machines with long running time, due to the machine's memory fragmentation problem, there is no sufficient contiguous physical memory that will cause the start failure. If the size of the queue is 8 * 1024, then the size of the desc[] is 8 * 1024 * 8 = 16 * PAGE, but we also add struct xdp_ring size, so it is 16page+. This is necessary to apply for a 4-order memory. If there are a lot of queues, it is difficult to these machine with long running time. Here, that we actually waste 15 pages. 4-Order memory is 32 pages, but we only use 17 pages. This patch replaces __get_free_pages() by vmalloc() to allocate memory to solve these problems. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-20 08:22:12 +00:00
Paolo Abeni	b148d400f8	Merge branch 'taprio-queuemaxsdu-fixes' Vladimir Oltean says: ==================== taprio queueMaxSDU fixes This fixes 3 issues noticed while attempting to reoffload the dynamically calculated queueMaxSDU values. These are: - Dynamic queueMaxSDU is not calculated correctly due to a lost patch - Dynamically calculated queueMaxSDU needs to be clamped on the low end - Dynamically calculated queueMaxSDU needs to be clamped on the high end ==================== Link: https://lore.kernel.org/r/20230215224632.2532685-1-vladimir.oltean@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-02-20 08:46:59 +01:00
Vladimir Oltean	64cb6aad12	net/sched: taprio: dynamic max_sdu larger than the max_mtu is unlimited It makes no sense to keep randomly large max_sdu values, especially if larger than the device's max_mtu. These are visible in "tc qdisc show". Such a max_sdu is practically unlimited and will cause no packets for that traffic class to be dropped on enqueue. Just set max_sdu_dynamic to U32_MAX, which in the logic below causes taprio to save a max_frm_len of U32_MAX and a max_sdu presented to user space of 0 (unlimited). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-02-20 08:46:57 +01:00
Vladimir Oltean	bdf366bd86	net/sched: taprio: don't allow dynamic max_sdu to go negative after stab adjustment The overhead specified in the size table comes from the user. With small time intervals (or gates always closed), the overhead can be larger than the max interval for that traffic class, and their difference is negative. What we want to happen is for max_sdu_dynamic to have the smallest non-zero value possible (1) which means that all packets on that traffic class are dropped on enqueue. However, since max_sdu_dynamic is u32, a negative is represented as a large value and oversized dropping never happens. Use max_t with int to force a truncation of max_frm_len to no smaller than dev->hard_header_len + 1, which in turn makes max_sdu_dynamic no smaller than 1. Fixes: fed87cc6718a ("net/sched: taprio: automatically calculate queueMaxSDU based on TC gate durations") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-02-20 08:46:57 +01:00
Vladimir Oltean	09dbdf28f9	net/sched: taprio: fix calculation of maximum gate durations taprio_calculate_gate_durations() depends on netdev_get_num_tc() and this returns 0. So it calculates the maximum gate durations for no traffic class. I had tested the blamed commit only with another patch in my tree, one which in the end I decided isn't valuable enough to submit ("net/sched: taprio: mask off bits in gate mask that exceed number of TCs"). The problem is that having this patch threw off my testing. By moving the netdev_set_num_tc() call earlier, we implicitly gave to taprio_calculate_gate_durations() the information it needed. Extract only the portion from the unsubmitted change which applies the mqprio configuration to the netdev earlier. Link: https://patchwork.kernel.org/project/netdevbpf/patch/20230130173145.475943-15-vladimir.oltean@nxp.com/ Fixes: a306a90c8ffe ("net/sched: taprio: calculate tc gate durations") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-02-20 08:46:57 +01:00
David Howells	c078381856	rxrpc: Fix overproduction of wakeups to recvmsg() Fix three cases of overproduction of wakeups: (1) rxrpc_input_split_jumbo() conditionally notifies the app that there's data for recvmsg() to collect if it queues some data - and then its only caller, rxrpc_input_data(), goes and wakes up recvmsg() anyway. Fix the rxrpc_input_data() to only do the wakeup in failure cases. (2) If a DATA packet is received for a call by the I/O thread whilst recvmsg() is busy draining the call's rx queue in the app thread, the call will left on the recvmsg() queue for recvmsg() to pick up, even though there isn't any data on it. This can cause an unexpected recvmsg() with a 0 return and no MSG_EOR set after the reply has been posted to a service call. Fix this by discarding pending calls from the recvmsg() queue that don't need servicing yet. (3) Not-yet-completed calls get requeued after having data read from them, even if they have no data to read. Fix this by only requeuing them if they have data waiting on them; if they don't, the I/O thread will requeue them when data arrives or they fail. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/3386149.1676497685@warthog.procyon.org.uk Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-02-20 08:33:25 +01:00
Paolo Abeni	d269ac136e	Merge branch 'net-final-gsi-register-updates' Alex Elder says: ==================== net: final GSI register updates I believe this is the last set of changes required to allow IPA v5.0 to be supported. There is a little cleanup work remaining, but that can happen in the next Linux release cycle. Otherwise we just need config data and register definitions for IPA v5.0 (and DTS updates). These are ready but won't be posted without further testing. The first patch in this series fixes a minor bug in a patch just posted, which I found too late. The second eliminates the GSI memory "adjustment"; this was done previously to avoid/delay the need to implement a more general way to define GSI register offsets. Note that this patch causes "checkpatch" warnings due to indentation that aligns with an open parenthesis. The third patch makes use of the newly-defined register offsets, to eliminate the need for a function that hid a few details. The next modifies a different helper function to work properly for IPA v5.0+. The fifth patch changes the way the event ring size is specified based on how it's now done for IPA v5.0+. And the last defines a new register required for IPA v5.0+. ==================== Link: https://lore.kernel.org/r/20230215195352.755744-1-elder@linaro.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-02-20 08:14:22 +01:00
Alex Elder	f651334e1e	net: ipa: add HW_PARAM_4 GSI register Starting at IPA v5.0, the number of event rings per EE is defined in a field in a new HW_PARAM_4 GSI register rather than HW_PARAM_2. Define this new register and its fields, and update the code that checks the number of rings supported by hardware to use the proper field based on IPA version. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-02-20 08:14:20 +01:00
Alex Elder	37cd29ec84	net: ipa: support different event ring encoding Starting with IPA v5.0, a channel's event ring index is encoded in a field in the CH_C_CNTXT_1 GSI register rather than CH_C_CNTXT_0. Define a new field ID for the former register and encode the event ring in the appropriate register. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-02-20 08:14:20 +01:00
Alex Elder	62747512eb	net: ipa: avoid setting an undefined field The GSI channel protocol field in the CH_C_CNTXT_0 GSI register is widened starting IPA v5.0, making the CHTYPE_PROTOCOL_MSB field added in IPA v4.5 unnecessary. Update the code to reflect this. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-02-20 08:14:20 +01:00
Alex Elder	f75f44ddd4	net: ipa: kill ev_ch_e_cntxt_1_length_encode() Now that we explicitly define each register field width there is no need to have a special encoding function for the event ring length. Add a field for this to the EV_CH_E_CNTXT_1 GSI register, and use it in place of ev_ch_e_cntxt_1_length_encode() (which can be removed). Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-02-20 08:14:20 +01:00
Alex Elder	59b12b1d27	net: ipa: kill gsi->virt_raw Starting at IPA v4.5, almost all GSI registers had their offsets changed by a fixed amount (shifted downward by 0xd000). Rather than defining offsets for all those registers dependent on version, an adjustment was applied for most register accesses. This was implemented in commit cdeee49f3ef7f ("net: ipa: adjust GSI register addresses"). It was later modified to be a bit more obvious about the adjusment, in commit 571b1e7e58ad3 ("net: ipa: use a separate pointer for adjusted GSI memory"). We now are able to define every GSI register with its own offset, so there's no need to implement this special adjustment. So get rid of the "virt_raw" pointer, and just maintain "virt" as the (non-adjusted) base address of I/O mapped GSI register memory. Redefine the offsets of all GSI registers (other than the INTER_EE ones, which were not subject to the adjustment) for IPA v4.5+, subtracting 0xd000 from their defined offsets instead. Move the ERROR_LOG and ERROR_LOG_CLR definitions further down in the register definition files so all registers are defined in order of their offset. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-02-20 08:14:20 +01:00
Alex Elder	ecfa80ce3b	net: ipa: fix an incorrect assignment I spotted an error in a patch posted this week, unfortunately just after it got accepted. The effect of the bug is that time-based interrupt moderation is disabled. This is not technically a bug, but it is not what is intended. The problem is that a \|= assignment got implemented as a simple assignment, so the previously assigned value was ignored. Fixes: edc6158b18af ("net: ipa: define fields for event-ring related registers") Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-02-20 08:14:20 +01:00
Lorenzo Bianconi	1c93e48cc3	net: dpaa2-eth: do not always set xsk support in xdp_features flag Do not always add NETDEV_XDP_ACT_XSK_ZEROCOPY bit in xdp_features flag but check if the NIC really supports it. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Link: https://lore.kernel.org/r/3dba6ea42dc343a9f2d7d1a6a6a6c173235e1ebf.1676471386.git.lorenzo@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-02-20 07:49:44 +01:00
Martin KaFai Lau	168de02335	selftests/bpf: Add bpf_fib_lookup test This patch tests the bpf_fib_lookup helper when looking up a neigh in NUD_FAILED and NUD_STALE state. It also adds test for the new BPF_FIB_LOOKUP_SKIP_NEIGH flag. Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20230217205515.3583372-2-martin.lau@linux.dev	2023-02-17 22:12:04 +01:00
Martin KaFai Lau	31de4105f0	bpf: Add BPF_FIB_LOOKUP_SKIP_NEIGH for bpf_fib_lookup The bpf_fib_lookup() also looks up the neigh table. This was done before bpf_redirect_neigh() was added. In the use case that does not manage the neigh table and requires bpf_fib_lookup() to lookup a fib to decide if it needs to redirect or not, the bpf prog can depend only on using bpf_redirect_neigh() to lookup the neigh. It also keeps the neigh entries fresh and connected. This patch adds a bpf_fib_lookup flag, SKIP_NEIGH, to avoid the double neigh lookup when the bpf prog always call bpf_redirect_neigh() to do the neigh lookup. The params->smac output is skipped together when SKIP_NEIGH is set because bpf_redirect_neigh() will figure out the smac also. Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20230217205515.3583372-1-martin.lau@linux.dev	2023-02-17 22:12:04 +01:00
Pu Lehui	49b5e77ae3	riscv, bpf: Add bpf trampoline support for RV64 BPF trampoline is the critical infrastructure of the BPF subsystem, acting as a mediator between kernel functions and BPF programs. Numerous important features, such as using BPF program for zero overhead kernel introspection, rely on this key component. We can't wait to support bpf trampoline on RV64. The related tests have passed, as well as the test_verifier with no new failure ceses. Signed-off-by: Pu Lehui <pulehui@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Björn Töpel <bjorn@rivosinc.com> Acked-by: Björn Töpel <bjorn@rivosinc.com> Link: https://lore.kernel.org/bpf/20230215135205.1411105-5-pulehui@huaweicloud.com	2023-02-17 21:45:30 +01:00
Pu Lehui	596f2e6f9c	riscv, bpf: Add bpf_arch_text_poke support for RV64 Implement bpf_arch_text_poke for RV64. For call scenario, to make BPF trampoline compatible with the kernel and BPF context, we follow the framework of RV64 ftrace to reserve 4 nops for BPF programs as function entry, and use auipc+jalr instructions for function call. However, since auipc+jalr call instruction is non-atomic operation, we need to use stop-machine to make sure instructions patching in atomic context. Also, we use auipc+jalr pair and need to patch in stop-machine context for jump scenario. Signed-off-by: Pu Lehui <pulehui@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Björn Töpel <bjorn@rivosinc.com> Acked-by: Björn Töpel <bjorn@rivosinc.com> Link: https://lore.kernel.org/bpf/20230215135205.1411105-4-pulehui@huaweicloud.com	2023-02-17 21:45:30 +01:00
Pu Lehui	0fd1fd0104	riscv, bpf: Factor out emit_call for kernel and bpf context The current emit_call function is not suitable for kernel function call as it store return value to bpf R0 register. We can separate it out for common use. Meanwhile, simplify judgment logic, that is, fixed function address can use jal or auipc+jalr, while the unfixed can use only auipc+jalr. Signed-off-by: Pu Lehui <pulehui@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Björn Töpel <bjorn@rivosinc.com> Acked-by: Björn Töpel <bjorn@rivosinc.com> Link: https://lore.kernel.org/bpf/20230215135205.1411105-3-pulehui@huaweicloud.com	2023-02-17 21:45:30 +01:00

1 2 3 4 5 ...

1157584 Commits