iv/linux

Author	SHA1	Message	Date
David S. Miller	2ba6d15786	Merge branch 'fix-changing-dsa-conduit' Marek Behún says: ==================== Fix changing DSA conduit This series fixes an issue in the DSA code related to the host interface UC address installed into the port FDB and the port conduit address database when live-changing the port conduit. The first patch refactors/deduplicates the installation/uninstallation of the interface's MAC address and the second patch fixes the issue. Cover letter for v1 and v2: https://patchwork.kernel.org/project/netdevbpf/cover/20240429163627.16031-1-kabel@kernel.org/ https://patchwork.kernel.org/project/netdevbpf/cover/20240502122922.28139-1-kabel@kernel.org/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2024-06-10 13:48:06 +01:00
Marek Behún	eef8e906ae	net: dsa: update the unicast MAC address when changing conduit When changing DSA user interface conduit while the user interface is up, DSA exhibits different behavior in comparison to when the interface is down. This different behavior concerns the primary unicast MAC address stored in the port standalone FDB and in the conduit device UC database. If we put a switch port down while changing the conduit with the sequence "ip link set sw0p0 down", "ip link set sw0p0 type dsa conduit conduit1", "ip link set sw0p0 up", we delete the address in dsa_user_close() and install the (possibly different) address in dsa_user_open(). But when changing the conduit on the fly, the old address is not deleted and the new one is not installed. Since we explicitly want to support live-changing the conduit, uninstall the old address before calling dsa_port_assign_conduit() and install the (possibly different) new address after the call. Because conduit change might also trigger address change (the user interface is supposed to inherit the conduit interface MAC address if no address is defined in hardware (dp->mac is a zero address)), move the eth_hw_addr_inherit() call from dsa_user_change_conduit() to dsa_port_change_conduit(), just before installing the new address. Although this is in theory a flaw in DSA core, it need not be backported, since there is currently no DSA driver that can be affected by this. The only DSA driver that supports changing conduit is felix, and, as explained by Vladimir Oltean [1]: There are 2 reasons why with felix the bug does not manifest itself. First is because both the 'ocelot' and the alternate 'ocelot-8021q' tagging protocols have the 'promisc_on_conduit = true' flag. So the unicast address doesn't have to be in the conduit's RX filter - neither the old nor the new conduit. Second, dsa_user_host_uc_install() theoretically leaves behind host FDB entries installed towards the wrong (old) CPU port. 
But in felix_fdb_add(), we treat any FDB entry requested towards any CPU port as if it was a multicast FDB entry programmed towards _all_ CPU ports. For that reason, it is installed towards the port mask of the PGID_CPU port group ID: if (dsa_port_is_cpu(dp)) port = PGID_CPU; Therefore no Fixes tag for this change. [1] https://lore.kernel.org/netdev/20240507201827.47suw4fwcjrbungy@skbuf/ Signed-off-by: Marek Behún <kabel@kernel.org> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Tested-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-06-10 13:48:06 +01:00
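The ordering described in the conduit-change commit above (uninstall the old address, assign the new conduit, inherit the MAC if the hardware has none, install the new address) can be modelled in plain userspace C. This is a simplified sketch, not the kernel code: the structs and the functions `host_uc_install()`, `host_uc_uninstall()` and `change_conduit()` are hypothetical stand-ins for the DSA helpers named in the commit message, and the conduit's UC database is reduced to a small array.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define ETH_ALEN 6
#define MAX_UC   8

/* Hypothetical model of a conduit: its own MAC plus a unicast filter list. */
struct conduit {
	unsigned char dev_addr[ETH_ALEN];
	unsigned char uc[MAX_UC][ETH_ALEN];
	int uc_count;
};

/* Hypothetical model of a DSA user port; hw_mac plays the role of dp->mac. */
struct user_port {
	unsigned char hw_mac[ETH_ALEN];   /* all zeroes: no address in hardware */
	unsigned char dev_addr[ETH_ALEN]; /* address currently in use */
	struct conduit *conduit;
};

static bool addr_equal(const unsigned char *a, const unsigned char *b)
{
	return memcmp(a, b, ETH_ALEN) == 0;
}

static bool addr_is_zero(const unsigned char *a)
{
	static const unsigned char zero[ETH_ALEN];

	return addr_equal(a, zero);
}

/* Add the port address to the conduit filter unless the conduit owns it. */
static void host_uc_install(struct user_port *p)
{
	struct conduit *c = p->conduit;

	if (addr_equal(p->dev_addr, c->dev_addr))
		return;
	assert(c->uc_count < MAX_UC);
	memcpy(c->uc[c->uc_count++], p->dev_addr, ETH_ALEN);
}

/* Reverse of host_uc_install(): drop the port address from the filter. */
static void host_uc_uninstall(struct user_port *p)
{
	struct conduit *c = p->conduit;

	for (int i = 0; i < c->uc_count; i++) {
		if (addr_equal(c->uc[i], p->dev_addr)) {
			/* Fill the hole with the last entry. */
			memmove(c->uc[i], c->uc[c->uc_count - 1], ETH_ALEN);
			c->uc_count--;
			return;
		}
	}
}

/* Live conduit change: uninstall, reassign, inherit if needed, reinstall. */
static void change_conduit(struct user_port *p, struct conduit *new_conduit)
{
	host_uc_uninstall(p);
	p->conduit = new_conduit;
	if (addr_is_zero(p->hw_mac))
		memcpy(p->dev_addr, new_conduit->dev_addr, ETH_ALEN);
	host_uc_install(p);
}
```

In this model, skipping the uninstall/install pair around the reassignment would leave the old conduit's filter holding a stale address, which is the behavior the commit message describes for the live-change path.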
Marek Behún	77f7541248	net: dsa: deduplicate code adding / deleting the port address to fdb The sequence if (dsa_switch_supports_uc_filtering(ds)) dsa_port_standalone_host_fdb_add(dp, addr, 0); if (!ether_addr_equal(addr, conduit->dev_addr)) dev_uc_add(conduit, addr); is executed both in dsa_user_open() and dsa_user_set_mac_addr(). Its reverse is executed both in dsa_user_close() and dsa_user_set_mac_addr(). Refactor these sequences into new functions dsa_user_host_uc_install() and dsa_user_host_uc_uninstall(). Signed-off-by: Marek Behún <kabel@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-06-10 13:48:06 +01:00
David S. Miller	395059c52e	Merge branch 'rtnetlink-rtnl_lock' Jakub Kicinski says: ==================== rtnetlink: move rtnl_lock handling out of af_netlink With the changes done in commit 5b4b62a169e1 ("rtnetlink: make the "split" NLM_DONE handling generic") we can also move the rtnl locking out of af_netlink. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2024-06-10 13:15:40 +01:00
Jakub Kicinski	5fbf57a937	net: netlink: remove the cb_mutex "injection" from netlink core Back in 2007, in commit af65bdfce98d ("[NETLINK]: Switch cb_lock spinlock to mutex and allow to override it") netlink core was extended to allow subsystems to replace the dump mutex lock with their own lock. The mechanism was used by rtnetlink to take rtnl_lock but it isn't sufficiently flexible for other users. Over the 17 years since it was added no other user appeared. Since rtnetlink needs conditional locking now, and doesn't use it either, axe this feature completely. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-06-10 13:15:40 +01:00
Jakub Kicinski	5380d64f8d	rtnetlink: move rtnl_lock handling out of af_netlink Now that we have an intermediate layer of code for handling rtnl-level netlink dump quirks, we can move the rtnl_lock taking there. For dump handlers with RTNL_FLAG_DUMP_SPLIT_NLM_DONE we can avoid taking rtnl_lock just to generate NLM_DONE, once again. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-06-10 13:15:40 +01:00
Andy Shevchenko	c917b26e16	net: dsa: hellcreek: Replace kernel.h with what is used kernel.h is included solely for some other existing headers. Include them directly and get rid of kernel.h. While at it, sort headers alphabetically for easier maintenance. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-06-10 13:13:02 +01:00
David S. Miller	a9522664c6	Merge branch 'tcp-up-pin-tw-timer' Florian Westphal says: ==================== net: tcp: un-pin tw timer Changes since previous iteration: - Patch 1: update a comment, I copied Eric's v7 RvB tag. - Patch 2: move bh off/on into hashdance_schedule and get rid of comment mentioning pinned tw timer. I did not copy Eric's RvB tag over from v7 because of the change. - Patch 3 is unchanged, so I kept Eric's RvB tag. This is v8 of the series where the tw_timer is un-pinned to get rid of interference in isolated-CPU setups. First patch makes necessary preparations, existing code relies on TIMER_PINNED to avoid races. Second patch un-pins the TW timer. Could be folded into the first one, but it might help wrt. bisection. Third patch is a minor cleanup to move a helper from .h to the only remaining compilation unit. Tested with iperf3 and stress-ng socket mode. ==================== Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-06-10 11:54:19 +01:00
Florian Westphal	f81d0dd2fd	tcp: move inet_twsk_schedule helper out of header It's no longer used outside inet_timewait_sock.c, so move it there. Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-06-10 11:54:18 +01:00
Florian Westphal	c75ad7c759	net: tcp: un-pin the tw_timer After previous patch, even if timer fires immediately on another CPU, context that schedules the timer now holds the ehash spinlock, so timer cannot reap tw socket until ehash lock is released. BH disable is moved into hashdance_schedule. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-06-10 11:54:18 +01:00
Valentin Schneider	b334b924c9	net: tcp/dccp: prepare for tw_timer un-pinning The TCP timewait timer is proving to be problematic for setups where scheduler CPU isolation is achieved at runtime via cpusets (as opposed to statically via isolcpus=domains). What happens there is a CPU goes through tcp_time_wait(), arming the time_wait timer, then gets isolated. TCP_TIMEWAIT_LEN later, the timer fires, causing interference for the now-isolated CPU. This is conceptually similar to the issue described in commit e02b93124855 ("workqueue: Unbind kworkers before sending them to exit()") Move inet_twsk_schedule() to within inet_twsk_hashdance(), with the ehash lock held. Expand the lock's critical section from inet_twsk_kill() to inet_twsk_deschedule_put(), serializing the scheduling vs descheduling of the timer. IOW, this prevents the following race: tcp_time_wait() inet_twsk_hashdance() inet_twsk_deschedule_put() del_timer_sync() inet_twsk_schedule() Thanks to Paolo Abeni for suggesting to leverage the ehash lock. This also restores a comment from commit ec94c2696f0b ("tcp/dccp: avoid one atomic operation for timewait hashdance") as inet_twsk_hashdance() had a "Step 1" and "Step 3" comment, but the "Step 2" had gone missing. inet_twsk_deschedule_put() now acquires the ehash spinlock to synchronize with inet_twsk_hashdance_schedule(). To ease possible regression search, actual un-pin is done in next patch. Link: https://lore.kernel.org/all/ZPhpfMjSiHVjQkTk@localhost.localdomain/ Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Valentin Schneider <vschneid@redhat.com> Co-developed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-06-10 11:54:18 +01:00
David S. Miller	8d466c8f45	Merge branch 'mlxsw-acl-fixes' Petr Machata says: ==================== mlxsw: ACL fixes Ido Schimmel writes: Patches #1-#3 fix various spelling mistakes I noticed while working on the code base. Patch #4 fixes a general protection fault by bailing out when the error occurs and warning. Patch #5 fixes the warning. Patch #6 fixes ACL scale regression and firmware errors. See the commit messages for more info. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2024-06-10 11:14:53 +01:00
Ido Schimmel	75d8d7a630	mlxsw: spectrum_acl: Fix ACL scale regression and firmware errors ACLs that reside in the algorithmic TCAM (A-TCAM) in Spectrum-2 and newer ASICs can share the same mask if their masks only differ in up to 8 consecutive bits. For example, consider the following filters: # tc filter add dev swp1 ingress pref 1 proto ip flower dst_ip 192.0.2.0/24 action drop # tc filter add dev swp1 ingress pref 1 proto ip flower dst_ip 198.51.100.128/25 action drop The second filter can use the same mask as the first (dst_ip/24) with a delta of 1 bit. However, the above only works because the two filters have different values in the common unmasked part (dst_ip/24). When entries have the same value in the common unmasked part they create undesired collisions in the device since many entries now have the same key. This leads to firmware errors such as [1] and to a reduced scale. Fix by adjusting the hash table key to only include the value in the common unmasked part. That is, without including the delta bits. That way the driver will detect the collision during filter insertion and spill the filter into the circuit TCAM (C-TCAM). Add a test case that fails without the fix and adjust existing cases that check C-TCAM spillage according to the above limitation. [1] mlxsw_spectrum2 0000:06:00.0: EMAD reg access failed (tid=3379b18a00003394,reg_id=3027(ptce3),type=write,status=8(resource not available)) Fixes: c22291f7cf45 ("mlxsw: spectrum: acl: Implement delta for ERP") Reported-by: Alexander Zubkov <green@qrator.net> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Tested-by: Alexander Zubkov <green@qrator.net> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-06-10 11:14:52 +01:00
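The mask-sharing rule and the collision described in the commit above can be illustrated with a small userspace sketch. It assumes IPv4 prefix masks only (the hardware delta is described as "up to 8 consecutive bits", which this reduces to extending a prefix), and the names `prefix_mask()`, `can_share_mask()`, `atcam_key()` and `keys_collide()` are hypothetical, not driver functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_DELTA_BITS 8 /* "up to 8 consecutive bits" per the commit message */

/* Build an IPv4 address in host byte order from its four octets. */
#define IPV4(a, b, c, d) \
	(((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
	 ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Netmask for an IPv4 prefix length, in host byte order. */
static uint32_t prefix_mask(unsigned int len)
{
	assert(len <= 32);
	return len ? (uint32_t)(UINT32_MAX << (32 - len)) : 0;
}

/* A child mask can reuse a parent mask if it extends it by at most 8 bits. */
static bool can_share_mask(unsigned int parent_len, unsigned int child_len)
{
	return child_len >= parent_len &&
	       child_len - parent_len <= MAX_DELTA_BITS;
}

/* Key as described for the fix: only the bits under the shared parent mask,
 * without the delta bits. */
static uint32_t atcam_key(uint32_t dst_ip, unsigned int parent_len)
{
	return dst_ip & prefix_mask(parent_len);
}

/* Two filters sharing a parent mask collide when their keys are equal. */
static bool keys_collide(uint32_t a, uint32_t b, unsigned int parent_len)
{
	return atcam_key(a, parent_len) == atcam_key(b, parent_len);
}
```

With the two filters from the commit message, 192.0.2.0/24 and 198.51.100.128/25 have different values under the shared /24 mask, so they do not collide. Under the same model, 198.51.100.0/25 and 198.51.100.128/25 would produce the same key, which is the case the commit says is detected at insertion and spilled to the C-TCAM.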
Ido Schimmel	97d833ceb2	mlxsw: spectrum_acl_erp: Fix object nesting warning ACLs in Spectrum-2 and newer ASICs can reside in the algorithmic TCAM (A-TCAM) or in the ordinary circuit TCAM (C-TCAM). The former can contain more ACLs (i.e., tc filters), but the number of masks in each region (i.e., tc chain) is limited. In order to mitigate the effects of the above limitation, the device allows filters to share a single mask if their masks only differ in up to 8 consecutive bits. For example, dst_ip/25 can be represented using dst_ip/24 with a delta of 1 bit. The C-TCAM does not have a limit on the number of masks being used (and therefore does not support mask aggregation), but can contain a limited number of filters. The driver uses the "objagg" library to perform the mask aggregation by passing it objects that consist of the filter's mask and whether the filter is to be inserted into the A-TCAM or the C-TCAM since filters in different TCAMs cannot share a mask. The set of created objects is dependent on the insertion order of the filters and is not necessarily optimal. Therefore, the driver will periodically ask the library to compute a more optimal set ("hints") by looking at all the existing objects. When the library asks the driver whether two objects can be aggregated the driver only compares the provided masks and ignores the A-TCAM / C-TCAM indication. This is the right thing to do since the goal is to move as many filters as possible to the A-TCAM. The driver also forbids two identical masks from being aggregated since this can only happen if one was intentionally put in the C-TCAM to avoid a conflict in the A-TCAM. 
The above can result in the following set of hints: H1: {mask X, A-TCAM} -> H2: {mask Y, A-TCAM} // X is Y + delta H3: {mask Y, C-TCAM} -> H4: {mask Z, A-TCAM} // Y is Z + delta After getting the hints from the library the driver will start migrating filters from one region to another while consulting the computed hints and instructing the device to perform a lookup in both regions during the transition. Assuming a filter with mask X is being migrated into the A-TCAM in the new region, the hints lookup will return H1. Since H2 is the parent of H1, the library will try to find the object associated with it and create it if necessary in which case another hints lookup (recursive) will be performed. This hints lookup for {mask Y, A-TCAM} will either return H2 or H3 since the driver passes the library an object comparison function that ignores the A-TCAM / C-TCAM indication. This can eventually lead to nested objects which are not supported by the library [1]. Fix by removing the object comparison function from both the driver and the library as the driver was the only user. That way the lookup will only return exact matches. I do not have a reliable reproducer that can reproduce the issue in a timely manner, but before the fix the issue would reproduce in several minutes and with the fix it does not reproduce in over an hour. Note that the current usefulness of the hints is limited because they include the C-TCAM indication and represent aggregation that cannot actually happen. This will be addressed in net-next. [1] WARNING: CPU: 0 PID: 153 at lib/objagg.c:170 objagg_obj_parent_assign+0xb5/0xd0 Modules linked in: CPU: 0 PID: 153 Comm: kworker/0:18 Not tainted 6.9.0-rc6-custom-g70fbc2c1c38b #42 Hardware name: Mellanox Technologies Ltd. MSN3700C/VMOD0008, BIOS 5.11 10/10/2018 Workqueue: mlxsw_core mlxsw_sp_acl_tcam_vregion_rehash_work RIP: 0010:objagg_obj_parent_assign+0xb5/0xd0 [...] 
Call Trace: <TASK> __objagg_obj_get+0x2bb/0x580 objagg_obj_get+0xe/0x80 mlxsw_sp_acl_erp_mask_get+0xb5/0xf0 mlxsw_sp_acl_atcam_entry_add+0xe8/0x3c0 mlxsw_sp_acl_tcam_entry_create+0x5e/0xa0 mlxsw_sp_acl_tcam_vchunk_migrate_one+0x16b/0x270 mlxsw_sp_acl_tcam_vregion_rehash_work+0xbe/0x510 process_one_work+0x151/0x370 Fixes: 9069a3817d82 ("lib: objagg: implement optimization hints assembly and use hints for object creation") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Tested-by: Alexander Zubkov <green@qrator.net> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-06-10 11:14:52 +01:00
Ido Schimmel	b4a3a89fff	lib: objagg: Fix general protection fault The library supports aggregation of objects into other objects only if the parent object does not have a parent itself. That is, nesting is not supported. Aggregation happens in two cases: Without and with hints, where hints are a pre-computed recommendation on how to aggregate the provided objects. Nesting is not possible in the first case due to a check that prevents it, but in the second case there is no check because the assumption is that nesting cannot happen when creating objects based on hints. The violation of this assumption leads to various warnings and eventually to a general protection fault [1]. Before fixing the root cause, error out when nesting happens and warn. [1] general protection fault, probably for non-canonical address 0xdead000000000d90: 0000 [#1] PREEMPT SMP PTI CPU: 1 PID: 1083 Comm: kworker/1:9 Tainted: G W 6.9.0-rc6-custom-gd9b4f1cca7fb #7 Hardware name: Mellanox Technologies Ltd. MSN3700/VMOD0005, BIOS 5.11 01/06/2019 Workqueue: mlxsw_core mlxsw_sp_acl_tcam_vregion_rehash_work RIP: 0010:mlxsw_sp_acl_erp_bf_insert+0x25/0x80 [...] Call Trace: <TASK> mlxsw_sp_acl_atcam_entry_add+0x256/0x3c0 mlxsw_sp_acl_tcam_entry_create+0x5e/0xa0 mlxsw_sp_acl_tcam_vchunk_migrate_one+0x16b/0x270 mlxsw_sp_acl_tcam_vregion_rehash_work+0xbe/0x510 process_one_work+0x151/0x370 worker_thread+0x2cb/0x3e0 kthread+0xd0/0x100 ret_from_fork+0x34/0x50 ret_from_fork_asm+0x1a/0x30 </TASK> Fixes: 9069a3817d82 ("lib: objagg: implement optimization hints assembly and use hints for object creation") Reported-by: Alexander Zubkov <green@qrator.net> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Tested-by: Alexander Zubkov <green@qrator.net> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-06-10 11:14:52 +01:00
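The "no nesting" rule from the commit above, where an object may only be aggregated into a parent that has no parent of its own, can be sketched as follows. This is a minimal userspace model under stated assumptions: `struct agg_obj` and `agg_obj_parent_assign()` are hypothetical names, and returning `-EINVAL` is an illustrative choice for "error out", not a claim about the exact code in lib/objagg.c.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical aggregation object: a root has parent == NULL. */
struct agg_obj {
	struct agg_obj *parent;
	unsigned int children; /* objects aggregated into this one */
};

/* Attach obj to parent, refusing to create a second level of nesting. */
static int agg_obj_parent_assign(struct agg_obj *obj, struct agg_obj *parent)
{
	assert(obj && parent && obj != parent);

	/* The parent is itself aggregated into something else: bail out
	 * instead of building a chain the library cannot handle. */
	if (parent->parent)
		return -EINVAL;

	obj->parent = parent;
	parent->children++;
	return 0;
}
```

A caller that receives the error leaves the object unattached, so a bad hint produces a failed insertion rather than a corrupted parent chain.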
Ido Schimmel	06fcdf2494	mlxsw: spectrum_acl_atcam: Fix wrong comment The key is encoded, not encrypted. Fixes: c22291f7cf45 ("mlxsw: spectrum: acl: Implement delta for ERP") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Tested-by: Alexander Zubkov <green@qrator.net> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-06-10 11:14:52 +01:00
Ido Schimmel	2aad28ec45	lib: test_objagg: Fix spelling Fixes: 0a020d416d0a ("lib: introduce initial implementation of object aggregation manager") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Tested-by: Alexander Zubkov <green@qrator.net> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-06-10 11:14:52 +01:00
Ido Schimmel	c1e156ae50	lib: objagg: Fix spelling Fixes: 0a020d416d0a ("lib: introduce initial implementation of object aggregation manager") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Tested-by: Alexander Zubkov <green@qrator.net> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-06-10 11:14:52 +01:00
Dan Carpenter	28f961f9d5	dmaengine: ti: k3-udma-glue: clean up return in k3_udma_glue_rx_get_irq() Currently the k3_udma_glue_rx_get_irq() function returns either negative error codes or zero on error. Generally, in the kernel, zero means success so this can be confusing and has caused bugs in the past. Also the "tx" version of this function only returns negative error codes. Let's clean up this "rx" function so both functions match. This patch has no effect on runtime. Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Acked-by: Peter Ujfalusi <peter.ujfalusi@gmail.com> Acked-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-06-09 17:35:21 +01:00
Jakub Kicinski	924ee53175	tools: ynl: make user space policies const Dan, who's working on C++ YNL, pointed out that the C code does not make policies const. Sprinkle some 'const's around. Reported-by: Dan Melnic <dmm@meta.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-06-09 15:51:40 +01:00
David Wei	3e61103b2f	page_pool: remove WARN_ON() with OR Having an OR in WARN_ON() makes me sad because it's impossible to tell which condition is true when triggered. Split a WARN_ON() with an OR in page_pool_disable_direct_recycling(). Signed-off-by: David Wei <dw@davidwei.uk> Reviewed-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-06-09 15:50:43 +01:00
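The point of the commit above, that a combined check cannot say which condition fired, can be shown with a userspace stand-in for `WARN_ON()`. This is only a sketch: the macro here records the stringified condition instead of dumping a stack trace, and `struct pool` with its `scheduled` and `owner` fields is hypothetical, not the real page_pool layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

static const char *last_warning; /* text of the most recent failed check */

static bool warn_on(bool cond, const char *expr, int line)
{
	assert(expr != NULL);
	if (cond) {
		last_warning = expr;
		fprintf(stderr, "WARNING at line %d: %s\n", line, expr);
	}
	return cond;
}

/* Userspace stand-in for the kernel macro; reports the condition text. */
#define WARN_ON(cond) warn_on((cond), #cond, __LINE__)

/* Hypothetical pool state used only for this illustration. */
struct pool {
	bool scheduled;
	int owner; /* -1 means "no owner" */
};

/* One warning for both conditions: the report cannot tell them apart. */
static bool check_combined(const struct pool *p)
{
	return WARN_ON(!p->scheduled || p->owner != -1);
}

/* One warning per condition: the report names the one that was true. */
static bool check_split(const struct pool *p)
{
	bool bad = WARN_ON(!p->scheduled);

	bad |= WARN_ON(p->owner != -1);
	return bad;
}
```

With `scheduled` true and `owner` set to 3, `check_combined()` reports the whole OR expression, while `check_split()` reports only `p->owner != -1`.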
MD Danish Anwar	a999973236	net: ti: icssg-prueth: Add multicast filtering support Add multicast filtering support for ICSSG Driver. Multicast addresses will be updated by __dev_mc_sync() API. icssg_prueth_add_mcast() and icssg_prueth_del_mcast() will be sync and unsync APIs for the driver respectively. To add a mac_address for a port, the driver needs to call icssg_fdb_add_del() and pass the mac_address and BIT(port_id) to the API. The ICSSG firmware will then configure the rules and allow filtering. If a mac_address is added to port0 and the same mac_address needs to be added for port1, the driver needs to pass BIT(port0) | BIT(port1) to the icssg_fdb_add_del() API. If the driver just passes BIT(port1) then the entry for port0 will be overwritten / lost. This is a design constraint on the firmware side. To overcome this in the driver, to add a mac_address for, say, portX, the driver first checks if the same mac_address is already added for any other port. If yes, the driver calls icssg_fdb_add_del() with BIT(portX) | BIT(other_existing_port). If not, the driver calls icssg_fdb_add_del() with BIT(portX). The same thing is applicable for deleting mac_addresses as well. This logic is in icssg_prueth_add_mcast / icssg_prueth_del_mcast APIs. Signed-off-by: MD Danish Anwar <danishanwar@ti.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-06-07 14:21:19 +01:00
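The port-mask merging described in the commit above can be modelled in userspace as follows. This is a sketch under stated assumptions: the table, `fw_fdb_write()`, `fdb_ports()`, `mcast_add()` and `mcast_del()` are hypothetical stand-ins, with `fw_fdb_write()` imitating the firmware constraint that each write replaces the whole port mask for a MAC address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BIT(n)      (1u << (n))
#define ETH_ALEN    6
#define FDB_ENTRIES 16

/* Hypothetical firmware table: one port mask per MAC address. */
struct fdb_entry {
	uint8_t mac[ETH_ALEN];
	uint8_t port_mask;
	bool used;
};

static struct fdb_entry fdb[FDB_ENTRIES];

static struct fdb_entry *fdb_find(const uint8_t *mac)
{
	for (int i = 0; i < FDB_ENTRIES; i++)
		if (fdb[i].used && memcmp(fdb[i].mac, mac, ETH_ALEN) == 0)
			return &fdb[i];
	return NULL;
}

/* Current port mask for a MAC, or 0 if it is not in the table. */
static uint8_t fdb_ports(const uint8_t *mac)
{
	const struct fdb_entry *e = fdb_find(mac);

	return e ? e->port_mask : 0;
}

/* Stand-in for the firmware call: overwrites the whole mask for this MAC. */
static void fw_fdb_write(const uint8_t *mac, uint8_t port_mask)
{
	struct fdb_entry *e = fdb_find(mac);

	if (!e) {
		if (!port_mask)
			return; /* nothing to delete */
		for (int i = 0; i < FDB_ENTRIES && !e; i++)
			if (!fdb[i].used)
				e = &fdb[i];
		assert(e != NULL); /* table full */
		memcpy(e->mac, mac, ETH_ALEN);
	}
	e->port_mask = port_mask;
	e->used = port_mask != 0;
}

/* Driver-side add: merge the new port into whatever is already programmed. */
static void mcast_add(const uint8_t *mac, unsigned int port)
{
	assert(port < 8);
	fw_fdb_write(mac, (uint8_t)(fdb_ports(mac) | BIT(port)));
}

/* Driver-side delete: clear only this port and keep the others. */
static void mcast_del(const uint8_t *mac, unsigned int port)
{
	assert(port < 8);
	fw_fdb_write(mac, (uint8_t)(fdb_ports(mac) & ~BIT(port)));
}
```

Calling `fw_fdb_write()` directly with a single port bit shows the overwrite problem; going through `mcast_add()` and `mcast_del()` keeps the other port's membership intact.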
Milan Broz	cb6cf0820f	r8152: Set NET_ADDR_STOLEN if using passthru MAC Some docks support MAC pass-through - MAC address is taken from another device. Driver should indicate that with NET_ADDR_STOLEN flag. This should help to avoid collisions if network interface names are generated with MAC policy. Reported and discussed here https://github.com/systemd/systemd/issues/33104 Signed-off-by: Milan Broz <gmazyland@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20240605153340.25694-1-gmazyland@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-06 17:33:12 -07:00
Jakub Kicinski	62b5bf58b9	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: drivers/net/ethernet/pensando/ionic/ionic_txrx.c d9c04209990b ("ionic: Mark error paths in the data path as unlikely") 491aee894a08 ("ionic: fix kernel panic in XDP_TX action") net/ipv6/ip6_fib.c b4cb4a1391dc ("net: use unrcu_pointer() helper") b01e1c030770 ("ipv6: fix possible race in __fib6_drop_pcpu_from()") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-06 12:06:56 -07:00
Linus Torvalds	d30d0e49da	Merge tag 'net-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from BPF and big collection of fixes for WiFi core and drivers. Current release - regressions: - vxlan: fix regression when dropping packets due to invalid src addresses - bpf: fix a potential use-after-free in bpf_link_free() - xdp: revert support for redirect to any xsk socket bound to the same UMEM as it can result in a corruption - virtio_net: - add missing lock protection when reading return code from control_buf - fix false-positive lockdep splat in DIM - Revert "wifi: wilc1000: convert list management to RCU" - wifi: ath11k: fix error path in ath11k_pcic_ext_irq_config Previous releases - regressions: - rtnetlink: make the "split" NLM_DONE handling generic, restore the old behavior for two cases where we started coalescing those messages with normal messages, breaking sloppily-coded userspace - wifi: - cfg80211: validate HE operation element parsing - cfg80211: fix 6 GHz scan request building - mt76: mt7615: add missing chanctx ops - ath11k: move power type check to ASSOC stage, fix connecting to 6 GHz AP - ath11k: fix WCN6750 firmware crash caused by 17 num_vdevs - rtlwifi: ignore IEEE80211_CONF_CHANGE_RETRY_LIMITS - iwlwifi: mvm: fix a crash on 7265 Previous releases - always broken: - ncsi: prevent multi-threaded channel probing, a spec violation - vmxnet3: disable rx data ring on dma allocation failure - ethtool: init tsinfo stats if requested, prevent unintentionally reporting all-zero stats on devices which don't implement any - dst_cache: fix possible races in less common IPv6 features - tcp: auth: don't consider TCP_CLOSE to be in TCP_AO_ESTABLISHED - ax25: fix two refcounting bugs - eth: ionic: fix kernel panic in XDP_TX action Misc: - tcp: count CLOSE-WAIT sockets for TCP_MIB_CURRESTAB" * tag 'net-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (107 commits) selftests: net: lib: set 'i' as local selftests: net: lib: avoid error removing empty netns name selftests: net: lib: support errexit with busywait net: ethtool: fix the error condition in ethtool_get_phy_stats_ethtool() ipv6: fix possible race in __fib6_drop_pcpu_from() af_unix: Annotate data-race of sk->sk_shutdown in sk_diag_fill(). af_unix: Use skb_queue_len_lockless() in sk_diag_show_rqlen(). af_unix: Use skb_queue_empty_lockless() in unix_release_sock(). af_unix: Use unix_recvq_full_lockless() in unix_stream_connect(). af_unix: Annotate data-race of net->unx.sysctl_max_dgram_qlen. af_unix: Annotate data-races around sk->sk_sndbuf. af_unix: Annotate data-races around sk->sk_state in UNIX_DIAG. af_unix: Annotate data-race of sk->sk_state in unix_stream_read_skb(). af_unix: Annotate data-races around sk->sk_state in sendmsg() and recvmsg(). af_unix: Annotate data-race of sk->sk_state in unix_accept(). af_unix: Annotate data-race of sk->sk_state in unix_stream_connect(). af_unix: Annotate data-races around sk->sk_state in unix_write_space() and poll(). af_unix: Annotate data-race of sk->sk_state in unix_inq_len(). af_unix: Annodate data-races around sk->sk_state for writers. af_unix: Set sk->sk_state under unix_state_lock() for truly disconencted peer. ...	2024-06-06 09:55:27 -07:00
Linus Torvalds	2faf6332c5	Single patch, no behavior changes. Tetsuo Handa (1): tomoyo: update project links Documentation/admin-guide/LSM/tomoyo.rst | 35 +++++++++---------------------- MAINTAINERS | 2 - security/tomoyo/Kconfig | 2 - security/tomoyo/common.c | 2 - 4 files changed, 14 insertions(+), 27 deletions(-) Merge tag 'tomoyo-pr-20240606' of git://git.code.sf.net/p/tomoyo/tomoyo Pull tomoyo fixlet from Tetsuo Handa: "Single patch to update project links, no behavior changes" * tag 'tomoyo-pr-20240606' of git://git.code.sf.net/p/tomoyo/tomoyo: tomoyo: update project links	2024-06-06 09:48:57 -07:00
Linus Torvalds	a34adf6010	Merge tag 'efi-fixes-for-v6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI fixes from Ard Biesheuvel: - Ensure that .discard sections are really discarded in the EFI zboot image build - Return proper error numbers from efi-pstore - Add __nocfi annotations to EFI runtime wrappers * tag 'efi-fixes-for-v6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: efi: Add missing __nocfi annotations to runtime wrappers efi: pstore: Return proper errors on UEFI failures efi/libstub: zboot.lds: Discard .discard sections	2024-06-06 09:39:36 -07:00
Jakub Kicinski	27bc865408	Merge branch 'selftests-net-lib-small-fixes' Matthieu Baerts says: ==================== selftests: net: lib: small fixes While looking at using 'lib.sh' for the MPTCP selftests [1], we found some small issues with 'lib.sh'. Here they are: - Patch 1: fix 'errexit' (set -e) support with busywait. 'errexit' is supported in some functions, not all. A fix for v6.8+. - Patch 2: avoid confusing error messages linked to the cleaning part when the netns setup fails. A fix for v6.8+. - Patch 3: set a variable as local to avoid accidentally changing the value of another one with the same name on the caller side. A fix for v6.10-rc1+. Link: https://lore.kernel.org/mptcp/5f4615c3-0621-43c5-ad25-55747a4350ce@kernel.org/T/ [1] ==================== Link: https://lore.kernel.org/r/20240605-upstream-net-20240605-selftests-net-lib-fixes-v1-0-b3afadd368c9@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-06 08:29:07 -07:00
Matthieu Baerts (NGI0)	84a8bc3ec2	selftests: net: lib: set 'i' as local Without this, the 'i' variable declared before could be overridden by accident, e.g. for i in "${@}"; do __ksft_status_merge "${i}" ## 'i' has been modified foo "${i}" ## using 'i' with an unexpected value done After a quick look, it looks like 'i' is currently not used after having been modified in __ksft_status_merge(), but still, better be safe than sorry. I saw this while modifying the same file, not because I suspected an issue somewhere. Fixes: 596c8819cb78 ("selftests: forwarding: Have RET track kselftest framework constants") Acked-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://lore.kernel.org/r/20240605-upstream-net-20240605-selftests-net-lib-fixes-v1-3-b3afadd368c9@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-06 08:29:07 -07:00
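Illustration for 84a8bc3ec2: a minimal sketch of why a helper that reuses 'i' without declaring it local can change the caller's loop variable. The function names merge_leaky, merge_safe and caller are hypothetical and are not taken from lib.sh; the sketch only shows the shell scoping behavior the commit message describes.

```shell
#!/bin/bash
# Hypothetical helpers for illustration only; not the lib.sh code.

# Uses 'i' without declaring it local, so it overwrites the caller's 'i'.
merge_leaky() {
	for i in a b; do :; done
}

# Declares 'i' local, so the caller's 'i' is left untouched.
merge_safe() {
	local i
	for i in a b; do :; done
}

# Calls the helper named in "$1" from inside a loop over 'i',
# then prints the value 'i' has after the call.
caller() {
	local i
	for i in first; do
		"$1"
		echo "${i}"
	done
}

caller merge_leaky   # 'i' was modified by the helper
caller merge_safe    # 'i' still holds the caller's value
```

Shell variables are dynamically scoped, so a function sees and can modify the locals of whoever called it unless it shadows them with its own 'local' declaration.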
Matthieu Baerts (NGI0)	79322174bc	selftests: net: lib: avoid error removing empty netns name If creating the first netns with 'setup_ns()' fails, 'cleanup_ns()' will be called with an empty string as first parameter. The consequence is that 'cleanup_ns()' will try to delete an invalid netns, and wait 20 seconds if the netns list is empty. Instead of just checking if the name is not empty, convert the string separated by spaces to an array. Manipulating the array is cleaner, and calling 'cleanup_ns()' with an empty array will be a no-op. Fixes: 25ae948b4478 ("selftests/net: add lib.sh") Cc: stable@vger.kernel.org Acked-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://lore.kernel.org/r/20240605-upstream-net-20240605-selftests-net-lib-fixes-v1-2-b3afadd368c9@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-06 08:29:07 -07:00
Matthieu Baerts (NGI0)	41b02ea4c0	selftests: net: lib: support errexit with busywait If errexit is enabled ('set -e'), loopy_wait -- or busywait and others using it -- will stop after the first failure. Note that if the returned status of loopy_wait is checked, and even if errexit is enabled, Bash will not stop at the first error. Fixes: 25ae948b4478 ("selftests/net: add lib.sh") Cc: stable@vger.kernel.org Acked-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://lore.kernel.org/r/20240605-upstream-net-20240605-selftests-net-lib-fixes-v1-1-b3afadd368c9@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-06 08:29:07 -07:00
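Illustration for 41b02ea4c0: a minimal sketch of a retry loop that keeps working when errexit is enabled. retry_until and flaky are hypothetical names, and this is not the actual patch; it only shows the general technique of running the polled command as an 'if' condition so that a failing attempt does not abort the script.

```shell
#!/bin/bash
# Hypothetical retry helper, loosely inspired by the idea of loopy_wait;
# not the lib.sh implementation.
# Usage: retry_until <max_tries> <command> [args...]
retry_until() {
	local tries="$1"; shift
	local n=0
	while [ "${n}" -lt "${tries}" ]; do
		# A command used as an 'if' condition does not trigger errexit,
		# so a failed attempt falls through to the next iteration.
		if "$@"; then
			return 0
		fi
		n=$((n + 1))
	done
	return 1
}

# Hypothetical command that fails twice and succeeds on the third call.
attempts=0
flaky() {
	attempts=$((attempts + 1))
	[ "${attempts}" -ge 3 ]
}

# Run in a subshell so 'set -e' does not leak into the calling shell.
(
	set -e
	retry_until 5 flaky
	echo "succeeded after ${attempts} attempts"
)
```

If the helper instead ran the command as a plain statement and inspected $? afterwards, the first failing attempt would end the script under 'set -e' before any retry could happen.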
Paolo Abeni	7493328144	Merge branch 'tcp-small-code-reorg' Eric Dumazet says: ==================== tcp: small code reorg Replace a WARN_ON_ONCE() that never triggered with DEBUG_NET_WARN_ON_ONCE() in reqsk_free(). Move inet_reqsk_alloc() and reqsk_alloc() to inet_connection_sock.c, to unclutter net/ipv4/tcp_input.c and include/net/request_sock.h ==================== Link: https://lore.kernel.org/r/20240605071553.1365557-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-06-06 15:18:06 +02:00
Eric Dumazet	6971d21672	tcp: move reqsk_alloc() to inet_connection_sock.c reqsk_alloc() has a single caller, no need to expose it in include/net/request_sock.h. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-06-06 15:18:04 +02:00
Eric Dumazet	adbe695a97	tcp: move inet_reqsk_alloc() close to inet_reqsk_clone() inet_reqsk_alloc() does not belong to tcp_input.c, move it to inet_connection_sock.c instead. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-06-06 15:18:04 +02:00
Eric Dumazet	c34506406d	tcp: small changes in reqsk_put() and reqsk_free() In reqsk_free(), use DEBUG_NET_WARN_ON_ONCE() instead of WARN_ON_ONCE() for a condition which never fired. In reqsk_put() directly call __reqsk_free(), there is no point checking rsk_refcnt again right after a transition to zero. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-06-06 15:18:04 +02:00
Paolo Abeni	fe300258a5	Merge branch 'mptcp-misc-cleanups' Matthieu Baerts says: ==================== mptcp: misc. cleanups Here is a small collection of miscellaneous cleanups: - Patch 1 uses an MPTCP helper, instead of a TCP one, to do the same thing. - Patch 2 adds a similar MPTCP helper, instead of using a TCP one directly. - Patch 3 uses more appropriate terms in comments. Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> ==================== Link: https://lore.kernel.org/r/20240605-upstream-net-next-20240604-misc-cleanup-v1-0-ae2e35c3ecc5@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-06-06 15:13:49 +02:00
Davide Caratti	92f74c1e05	mptcp: refer to 'MPTCP' socket in comments We used to call it 'master' socket at the early stages of MPTCP development, but the correct wording is 'MPTCP' socket as opposed to 'TCP subflows': convert the last 3 comments to use a more appropriate term. Signed-off-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-06-06 15:13:47 +02:00
Geliang Tang	5cdedad62e	mptcp: add mptcp_space_from_win helper As a wrapper of __tcp_space_from_win(), this patch adds an MPTCP dedicated space_from_win helper mptcp_space_from_win() in protocol.h to pair with mptcp_win_from_space(). Use it instead of __tcp_space_from_win() in both mptcp_rcv_space_adjust() and mptcp_set_rcvlowat(). Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-06-06 15:13:47 +02:00
Geliang Tang	5f0d0649c8	mptcp: use mptcp_win_from_space helper The MPTCP dedicated win_from_space helper mptcp_win_from_space() is defined in protocol.h, use it in mptcp_rcv_space_adjust() instead of using the TCP one. Here scaling_ratio is the same as msk->scaling_ratio. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-06-06 15:13:47 +02:00
Su Hui	0dcc53abf5	net: ethtool: fix the error condition in ethtool_get_phy_stats_ethtool() Clang static checker (scan-build) warning: net/ethtool/ioctl.c:line 2233, column 2 Called function pointer is null (null dereference). Return '-EOPNOTSUPP' when 'ops->get_ethtool_phy_stats' is NULL to fix this typo error. Fixes: 201ed315f967 ("net/ethtool/ioctl: split ethtool_get_phy_stats into multiple helpers") Signed-off-by: Su Hui <suhui@nfschina.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Hariprasad Kelam <hkelam@marvell.com> Link: https://lore.kernel.org/r/20240605034742.921751-1-suhui@nfschina.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-06-06 13:34:33 +02:00
Jason Xing	9b6a30febd	net: allow rps/rfs related configs to be switched After John Sperbeck reported a compile error if the CONFIG_RFS_ACCEL is off, I found that I cannot easily enable/disable the config because there is no prompt when using 'make menuconfig'. Therefore, I decided to change rps/rfs related configs altogether. Signed-off-by: Jason Xing <kernelxing@tencent.com> Link: https://lore.kernel.org/r/20240605022932.33703-1-kerneljasonxing@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-06-06 13:18:48 +02:00
Eric Dumazet	b01e1c0307	ipv6: fix possible race in __fib6_drop_pcpu_from() syzbot found a race in __fib6_drop_pcpu_from() [1] If compiler reads more than once (*ppcpu_rt), second read could read NULL, if another cpu clears the value in rt6_get_pcpu_route(). Add a READ_ONCE() to prevent this race. Also add rcu_read_lock()/rcu_read_unlock() because we rely on RCU protection while dereferencing pcpu_rt. [1] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000012: 0000 [#1] PREEMPT SMP KASAN PTI KASAN: null-ptr-deref in range [0x0000000000000090-0x0000000000000097] CPU: 0 PID: 7543 Comm: kworker/u8:17 Not tainted 6.10.0-rc1-syzkaller-00013-g2bfcfd584ff5 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024 Workqueue: netns cleanup_net RIP: 0010:__fib6_drop_pcpu_from.part.0+0x10a/0x370 net/ipv6/ip6_fib.c:984 Code: f8 48 c1 e8 03 80 3c 28 00 0f 85 16 02 00 00 4d 8b 3f 4d 85 ff 74 31 e8 74 a7 fa f7 49 8d bf 90 00 00 00 48 89 f8 48 c1 e8 03 <80> 3c 28 00 0f 85 1e 02 00 00 49 8b 87 90 00 00 00 48 8b 0c 24 48 RSP: 0018:ffffc900040df070 EFLAGS: 00010206 RAX: 0000000000000012 RBX: 0000000000000001 RCX: ffffffff89932e16 RDX: ffff888049dd1e00 RSI: ffffffff89932d7c RDI: 0000000000000091 RBP: dffffc0000000000 R08: 0000000000000005 R09: 0000000000000007 R10: 0000000000000001 R11: 0000000000000006 R12: ffff88807fa080b8 R13: fffffbfff1a9a07d R14: ffffed100ff41022 R15: 0000000000000001 FS: 0000000000000000(0000) GS:ffff8880b9200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b32c26000 CR3: 000000005d56e000 CR4: 00000000003526f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> __fib6_drop_pcpu_from net/ipv6/ip6_fib.c:966 [inline] fib6_drop_pcpu_from net/ipv6/ip6_fib.c:1027 [inline] fib6_purge_rt+0x7f2/0x9f0 net/ipv6/ip6_fib.c:1038 fib6_del_route net/ipv6/ip6_fib.c:1998 
[inline] fib6_del+0xa70/0x17b0 net/ipv6/ip6_fib.c:2043 fib6_clean_node+0x426/0x5b0 net/ipv6/ip6_fib.c:2205 fib6_walk_continue+0x44f/0x8d0 net/ipv6/ip6_fib.c:2127 fib6_walk+0x182/0x370 net/ipv6/ip6_fib.c:2175 fib6_clean_tree+0xd7/0x120 net/ipv6/ip6_fib.c:2255 __fib6_clean_all+0x100/0x2d0 net/ipv6/ip6_fib.c:2271 rt6_sync_down_dev net/ipv6/route.c:4906 [inline] rt6_disable_ip+0x7ed/0xa00 net/ipv6/route.c:4911 addrconf_ifdown.isra.0+0x117/0x1b40 net/ipv6/addrconf.c:3855 addrconf_notify+0x223/0x19e0 net/ipv6/addrconf.c:3778 notifier_call_chain+0xb9/0x410 kernel/notifier.c:93 call_netdevice_notifiers_info+0xbe/0x140 net/core/dev.c:1992 call_netdevice_notifiers_extack net/core/dev.c:2030 [inline] call_netdevice_notifiers net/core/dev.c:2044 [inline] dev_close_many+0x333/0x6a0 net/core/dev.c:1585 unregister_netdevice_many_notify+0x46d/0x19f0 net/core/dev.c:11193 unregister_netdevice_many net/core/dev.c:11276 [inline] default_device_exit_batch+0x85b/0xae0 net/core/dev.c:11759 ops_exit_list+0x128/0x180 net/core/net_namespace.c:178 cleanup_net+0x5b7/0xbf0 net/core/net_namespace.c:640 process_one_work+0x9fb/0x1b60 kernel/workqueue.c:3231 process_scheduled_works kernel/workqueue.c:3312 [inline] worker_thread+0x6c8/0xf70 kernel/workqueue.c:3393 kthread+0x2c1/0x3a0 kernel/kthread.c:389 ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 Fixes: d52d3997f843 ("ipv6: Create percpu rt6_info") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/r/20240604193549.981839-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-06-06 13:05:54 +02:00
Paolo Abeni	411c0ea696	Merge branch 'af_unix-fix-lockless-access-of-sk-sk_state-and-others-fields' Kuniyuki Iwashima says: ==================== af_unix: Fix lockless access of sk->sk_state and other fields. Patch 1 fixes a bug where SOCK_DGRAM's sk->sk_state is changed to TCP_CLOSE even if the socket is connect()ed to another socket. The rest of this series annotates lockless accesses to the following fields. * sk->sk_state * sk->sk_sndbuf * net->unx.sysctl_max_dgram_qlen * sk->sk_receive_queue.qlen * sk->sk_shutdown Note that with this series there is skb_queue_empty() left in unix_dgram_disconnected() that needs to be changed to the lockless version, and unix_peer(other) access there should be protected by unix_state_lock(). This will require some refactoring, so another series will follow. Changes: v2: * Patch 1: Fix wrong double lock v1: https://lore.kernel.org/netdev/20240603143231.62085-1-kuniyu@amazon.com/ ==================== Link: https://lore.kernel.org/r/20240604165241.44758-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-06-06 12:57:18 +02:00
Kuniyuki Iwashima	efaf24e30e	af_unix: Annotate data-race of sk->sk_shutdown in sk_diag_fill(). While dumping sockets via UNIX_DIAG, we do not hold unix_state_lock(). Let's use READ_ONCE() to read sk->sk_shutdown. Fixes: e4e541a84863 ("sock-diag: Report shutdown for inet and unix sockets (v2)") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-06-06 12:57:15 +02:00
Kuniyuki Iwashima	5d915e584d	af_unix: Use skb_queue_len_lockless() in sk_diag_show_rqlen(). We can dump the socket queue length via UNIX_DIAG by specifying UDIAG_SHOW_RQLEN. If sk->sk_state is TCP_LISTEN, we return the recv queue length, but here we do not hold recvq lock. Let's use skb_queue_len_lockless() in sk_diag_show_rqlen(). Fixes: c9da99e6475f ("unix_diag: Fixup RQLEN extension report") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-06-06 12:57:15 +02:00
Kuniyuki Iwashima	83690b82d2	af_unix: Use skb_queue_empty_lockless() in unix_release_sock(). If the socket type is SOCK_STREAM or SOCK_SEQPACKET, unix_release_sock() checks the length of the peer socket's recvq under unix_state_lock(). However, unix_stream_read_generic() calls skb_unlink() after releasing the lock. Also, for SOCK_SEQPACKET, __skb_try_recv_datagram() unlinks skb without unix_state_lock(). Thus, unix_state_lock() does not protect qlen. Let's use skb_queue_empty_lockless() in unix_release_sock(). Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-06-06 12:57:15 +02:00
Kuniyuki Iwashima	45d872f0e6	af_unix: Use unix_recvq_full_lockless() in unix_stream_connect(). Once sk->sk_state is changed to TCP_LISTEN, it never changes. unix_accept() takes advantage of this characteristic; it does not hold the listener's unix_state_lock() and only acquires recvq lock to pop one skb. It means unix_state_lock() does not prevent the queue length from changing in unix_stream_connect(). Thus, we need to use unix_recvq_full_lockless() to avoid a data-race. Now we remove unix_recvq_full() as no one uses it. Note that we can remove READ_ONCE() for sk->sk_max_ack_backlog in unix_recvq_full_lockless() because of the following reasons: (1) For SOCK_DGRAM, it is a written-once field in unix_create1() (2) For SOCK_STREAM and SOCK_SEQPACKET, it is changed under the listener's unix_state_lock() in unix_listen(), and we hold the lock in unix_stream_connect() Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-06-06 12:57:15 +02:00
Kuniyuki Iwashima	bd9f2d0573	af_unix: Annotate data-race of net->unx.sysctl_max_dgram_qlen. net->unx.sysctl_max_dgram_qlen is exposed as a sysctl knob and can be changed concurrently. Let's use READ_ONCE() in unix_create1(). Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-06-06 12:57:15 +02:00
Kuniyuki Iwashima	b0632e53e0	af_unix: Annotate data-races around sk->sk_sndbuf. sk_setsockopt() changes sk->sk_sndbuf under lock_sock(), but it's not used in af_unix.c. Let's use READ_ONCE() to read sk->sk_sndbuf in unix_writable(), unix_dgram_sendmsg(), and unix_stream_sendmsg(). Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-06-06 12:57:15 +02:00
Kuniyuki Iwashima	0aa3be7b3e	af_unix: Annotate data-races around sk->sk_state in UNIX_DIAG. While dumping AF_UNIX sockets via UNIX_DIAG, sk->sk_state is read locklessly. Let's use READ_ONCE() there. Note that the result could be inconsistent if the socket is dumped during the state change. This is common for other SOCK_DIAG and similar interfaces. Fixes: c9da99e6475f ("unix_diag: Fixup RQLEN extension report") Fixes: 2aac7a2cb0d9 ("unix_diag: Pending connections IDs NLA") Fixes: 45a96b9be6ec ("unix_diag: Dumping all sockets core") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-06-06 12:57:15 +02:00

1280152 Commits