linux

iv/linux

Author	SHA1	Message	Date
Alex Elder	dae5d6e8f0	net: ipa: kill the IPA_POWER_FLAG_RESUMED flag The IPA_POWER_FLAG_RESUMED was originally used to avoid calling pm_wakeup_dev_event() more than once when handling a SUSPEND interrupt. This call is no longer made, so there' no need for the flag, so get rid of it. That leaves no more IPA power flags usefully defined, so just get rid of the bitmap in the IPA power structure and the definition of the ipa_power_flag enumerated type. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-02-27 11:24:04 +01:00
Alex Elder	54f19ff767	net: ipa: kill IPA_POWER_FLAG_SYSTEM The SYSTEM IPA power flag is set, cleared, and tested. But nothing happens based on its value when tested, so it serves no purpose. Get rid of this flag. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-02-27 11:24:04 +01:00
Alex Elder	4b2274d381	net: ipa: don't bother aborting system resume The IPA interrupt can fire if there is data to be delivered to a GSI channel that is suspended. This condition occurs in three scenarios. First, runtime power management automatically suspends the IPA hardware after half a second of inactivity. This has nothing to do with system suspend, so a SYSTEM IPA power flag is used to avoid calling pm_wakeup_dev_event() when runtime suspended. Second, if the system is suspended, the receipt of an IPA interrupt should trigger a system resume. Configuring the IPA interrupt for wakeup accomplishes this. Finally, if system suspend is underway and the IPA interrupt fires, we currently call pm_wakeup_dev_event() to abort the system suspend. The IPA driver correctly handles quiescing the hardware before suspending it, so there's really no need to abort a suspend in progress in the third case. We can simply quiesce and suspend things, and be done. Incoming data can still wake the system after it's suspended. The IPA interrupt has wakeup mode enabled, so if it fires after we've suspended, it will trigger a wakeup (if not disabled via sysfs). Stop calling pm_wakeup_dev_event() to abort a system suspend in progress in ipa_power_suspend_handler(). Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-02-27 11:24:03 +01:00
Heiner Kallweit	b38061fe9c	net: phy: simplify genphy_c45_ethtool_set_eee Simplify the function, no functional change intended. - Remove not needed variable unsupp, I think code is even better readable now. - Move setting phydev->eee_enabled out of the if clause - Simplify return value handling Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/442277c7-7431-4542-80b5-1d3d691714d7@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-02-27 09:07:34 +01:00
Jakub Kicinski	55a7246025	Merge branch 'mptcp-various-small-improvements' Matthieu Baerts says: ==================== mptcp: various small improvements This series brings various small improvements to MPTCP and its selftests: Patch 1 prints an error if there are duplicated subtests names. It is important to have unique (sub)tests names in TAP, because some CI environments drop (sub)tests with duplicated names. Patch 2 is a preparation for patches 3 and 4, which check the protocol in tcp_sk() and mptcp_sk() with DEBUG_NET, only in code from net/mptcp/. We recently had the case where an MPTCP socket was wrongly treated as a TCP one, and fuzzers and static checkers never spot the issue. This would prevent such issues in the future. Patches 5 to 7 are some cleanup for the MPTCP selftests. These patches are not supposed to change the behaviour. Patch 8 sets the poll timeout in diag selftest to the same value as the one used in the other selftests. ==================== Link: https://lore.kernel.org/r/20240223-upstream-net-next-20240223-misc-improvements-v1-0-b6c8a10396bd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-02-26 18:42:15 -08:00
Geliang Tang	e8ddc5f255	selftests: mptcp: diag: change timeout_poll to 30 Even if it is set to 100ms from the beginning with commit df62f2ec3df6 ("selftests/mptcp: add diag interface tests"), there is no reason not to have it to 30ms like all the other tests. "diag.sh" is not supposed to be slower than the other ones. To maintain consistency with other scripts, this patch changes it to 30. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240223-upstream-net-next-20240223-misc-improvements-v1-8-b6c8a10396bd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-02-26 18:42:12 -08:00
Geliang Tang	8c6f6b4bb5	selftests: mptcp: join: change capture/checksum as bool To maintain consistency with other scripts, this patch changes vars 'capture' and 'checksum' as bool vars in mptcp_join. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240223-upstream-net-next-20240223-misc-improvements-v1-7-b6c8a10396bd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-02-26 18:42:12 -08:00
Geliang Tang	fccf7c9224	selftests: mptcp: simult flows: define missing vars The variables 'large', 'small', 'sout', 'cout', 'capout' and 'size' are used in multiple functions, so they should be clearly defined as global variables at the top of the file. This patch redefines them at the beginning of simult_flows.sh. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240223-upstream-net-next-20240223-misc-improvements-v1-6-b6c8a10396bd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-02-26 18:42:12 -08:00
Geliang Tang	488ccbe76c	selftests: mptcp: netlink: drop duplicate var ret The variable 'ret' are defined twice in pm_netlink.sh. This patch drops this duplicate one that has been defined from the beginning, with commit eedbc685321b ("selftests: add PM netlink functional tests") Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240223-upstream-net-next-20240223-misc-improvements-v1-5-b6c8a10396bd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-02-26 18:42:12 -08:00
Matthieu Baerts (NGI0)	14d29ec530	mptcp: check the protocol in mptcp_sk() with DEBUG_NET Fuzzers and static checkers might not detect when mptcp_sk() is used with a non mptcp_sock structure. This is similar to the parent commit, where it is easy to use mptcp_sk() with a TCP sock, e.g. with a subflow sk. So a new simple check is done when CONFIG_DEBUG_NET is enabled to tell kernel devs when a non-MPTCP socket is being used as an MPTCP one. 'mptcp_sk()' macro is then defined differently: with an extra WARN to complain when an unexpected socket is being used. Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240223-upstream-net-next-20240223-misc-improvements-v1-4-b6c8a10396bd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-02-26 18:42:12 -08:00
Matthieu Baerts (NGI0)	dcc03f270d	mptcp: check the protocol in tcp_sk() with DEBUG_NET Fuzzers and static checkers might not detect when tcp_sk() is used with a non tcp_sock structure. This kind of mistake already happened a few times with MPTCP: when wrongly using TCP-specific helpers with mptcp_sock pointers. On the other hand, there are many 'tcp_xxx()' helpers that are taking a 'struct sock' pointer as arguments, and some of them are only looking at fields from 'struct sock', and nothing from 'struct tcp_sock'. It is then tempting to use them with a 'struct mptcp_sock'. So a new simple check is done when CONFIG_DEBUG_NET is enabled to tell kernel devs when a non-TCP socket is being used as a TCP one. 'tcp_sk()' macro is then re-defined to add a WARN when an unexpected socket is being used. Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240223-upstream-net-next-20240223-misc-improvements-v1-3-b6c8a10396bd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-02-26 18:42:12 -08:00
Matthieu Baerts (NGI0)	28de50eeb7	mptcp: token kunit: set protocol As it would be done when initiating an MPTCP sock. This is not strictly needed for this test, but it will be when a later patch will check if the right protocol is being used when calling mptcp_sk(). Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240223-upstream-net-next-20240223-misc-improvements-v1-2-b6c8a10396bd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-02-26 18:42:12 -08:00
Matthieu Baerts (NGI0)	9da7483674	selftests: mptcp: lib: catch duplicated subtest entries It is important to have a unique (sub)test name in TAP, because some CI environments drop tests with duplicated name. When adding a new subtest entry, an error message is printed in case of duplicated entries. If there were duplicated entries and if all features were expected to work, the script exits with an error at the end, after having printed all subtests in the TAP format. Thanks to that, the MPTCP CI will catch such issues early. Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240223-upstream-net-next-20240223-misc-improvements-v1-1-b6c8a10396bd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-02-26 18:42:11 -08:00
Jakub Kicinski	3980cf1655	Merge branch 'mptcp-more-misc-fixes-for-v6-8' Matthieu Baerts says: ==================== mptcp: more misc. fixes for v6.8 This series includes 6 types of fixes: - Patch 1 fixes v4 mapped in v6 addresses support for the userspace PM, when asking to delete a subflow. It was done everywhere else, but not there. Patch 2 validates the modification, thanks to a subtest in mptcp_join.sh. These patches can be backported up to v5.19. - Patch 3 is a small fix for a recent bug-fix patch, just to avoid printing an irrelevant warning (pr_warn()) once. It can be backported up to v5.6, alongside the bug-fix that has been introduced in the v6.8-rc5. - Patches 4 to 6 are fixes for bugs found by Paolo while working on TCP_NOTSENT_LOWAT support for MPTCP. These fixes can improve the performances in some cases. Patches can be backported up to v5.6, v5.11 and v6.7 respectively. - Patch 7 makes sure 'ss -M' is available when starting MPTCP Join selftest as it is required for some subtests since v5.18. - Patch 8 fixes a possible double-free on socket dismantle. The issue always existed, but was unnoticed because it was not causing any problem so far. This fix can be backported up to v5.6. - Patch 9 is a fix for a very recent patch causing lockdep warnings in subflow diag. The patch causing the regression -- which fixes another issue present since v5.7 -- should be part of the future v6.8-rc6. Patch 10 validates the modification, thanks to a new subtest in diag.sh. ==================== Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-0-162e87e48497@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-02-26 18:42:05 -08:00
Paolo Abeni	b4b51d36bb	selftests: mptcp: explicitly trigger the listener diag code-path The mptcp diag interface already experienced a few locking bugs that lockdep and appropriate coverage have detected in advance. Let's add a test-case triggering the relevant code path, to prevent similar issues in the future. Be careful to cope with very slow environments. Note that we don't need an explicit timeout on the mptcp_connect subprocess to cope with eventual bug/hang-up as the final cleanup terminating the child processes will take care of that. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-10-162e87e48497@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-02-26 18:41:56 -08:00
Paolo Abeni	d6a9608af9	mptcp: fix possible deadlock in subflow diag Syzbot and Eric reported a lockdep splat in the subflow diag: WARNING: possible circular locking dependency detected 6.8.0-rc4-syzkaller-00212-g40b9385dd8e6 #0 Not tainted syz-executor.2/24141 is trying to acquire lock: ffff888045870130 (k-sk_lock-AF_INET6){+.+.}-{0:0}, at: tcp_diag_put_ulp net/ipv4/tcp_diag.c:100 [inline] ffff888045870130 (k-sk_lock-AF_INET6){+.+.}-{0:0}, at: tcp_diag_get_aux+0x738/0x830 net/ipv4/tcp_diag.c:137 but task is already holding lock: ffffc9000135e488 (&h->lhash2[i].lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] ffffc9000135e488 (&h->lhash2[i].lock){+.+.}-{2:2}, at: inet_diag_dump_icsk+0x39f/0x1f80 net/ipv4/inet_diag.c:1038 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&h->lhash2[i].lock){+.+.}-{2:2}: lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754 __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline] _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154 spin_lock include/linux/spinlock.h:351 [inline] __inet_hash+0x335/0xbe0 net/ipv4/inet_hashtables.c:743 inet_csk_listen_start+0x23a/0x320 net/ipv4/inet_connection_sock.c:1261 __inet_listen_sk+0x2a2/0x770 net/ipv4/af_inet.c:217 inet_listen+0xa3/0x110 net/ipv4/af_inet.c:239 rds_tcp_listen_init+0x3fd/0x5a0 net/rds/tcp_listen.c:316 rds_tcp_init_net+0x141/0x320 net/rds/tcp.c:577 ops_init+0x352/0x610 net/core/net_namespace.c:136 __register_pernet_operations net/core/net_namespace.c:1214 [inline] register_pernet_operations+0x2cb/0x660 net/core/net_namespace.c:1283 register_pernet_device+0x33/0x80 net/core/net_namespace.c:1370 rds_tcp_init+0x62/0xd0 net/rds/tcp.c:735 do_one_initcall+0x238/0x830 init/main.c:1236 do_initcall_level+0x157/0x210 init/main.c:1298 do_initcalls+0x3f/0x80 init/main.c:1314 kernel_init_freeable+0x42f/0x5d0 init/main.c:1551 kernel_init+0x1d/0x2a0 init/main.c:1441 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:242 -> #0 (k-sk_lock-AF_INET6){+.+.}-{0:0}: check_prev_add kernel/locking/lockdep.c:3134 [inline] check_prevs_add kernel/locking/lockdep.c:3253 [inline] validate_chain+0x18ca/0x58e0 kernel/locking/lockdep.c:3869 __lock_acquire+0x1345/0x1fd0 kernel/locking/lockdep.c:5137 lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754 lock_sock_fast include/net/sock.h:1723 [inline] subflow_get_info+0x166/0xd20 net/mptcp/diag.c:28 tcp_diag_put_ulp net/ipv4/tcp_diag.c:100 [inline] tcp_diag_get_aux+0x738/0x830 net/ipv4/tcp_diag.c:137 inet_sk_diag_fill+0x10ed/0x1e00 net/ipv4/inet_diag.c:345 inet_diag_dump_icsk+0x55b/0x1f80 net/ipv4/inet_diag.c:1061 __inet_diag_dump+0x211/0x3a0 net/ipv4/inet_diag.c:1263 inet_diag_dump_compat+0x1c1/0x2d0 net/ipv4/inet_diag.c:1371 netlink_dump+0x59b/0xc80 net/netlink/af_netlink.c:2264 __netlink_dump_start+0x5df/0x790 net/netlink/af_netlink.c:2370 netlink_dump_start include/linux/netlink.h:338 [inline] inet_diag_rcv_msg_compat+0x209/0x4c0 net/ipv4/inet_diag.c:1405 sock_diag_rcv_msg+0xe7/0x410 netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2543 sock_diag_rcv+0x2a/0x40 net/core/sock_diag.c:280 netlink_unicast_kernel net/netlink/af_netlink.c:1341 [inline] netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1367 netlink_sendmsg+0xa3b/0xd70 net/netlink/af_netlink.c:1908 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x221/0x270 net/socket.c:745 ____sys_sendmsg+0x525/0x7d0 net/socket.c:2584 ___sys_sendmsg net/socket.c:2638 [inline] __sys_sendmsg+0x2b0/0x3a0 net/socket.c:2667 do_syscall_64+0xf9/0x240 entry_SYSCALL_64_after_hwframe+0x6f/0x77 As noted by Eric we can break the lock dependency chain avoid dumping any extended info for the mptcp subflow listener: nothing actually useful is presented there. Fixes: b8adb69a7d29 ("mptcp: fix lockless access in subflow ULP diag") Cc: stable@vger.kernel.org Reported-by: Eric Dumazet <edumazet@google.com> Closes: https://lore.kernel.org/netdev/CANn89iJ=Oecw6OZDwmSYc9HJKQ_G32uN11L+oUcMu+TOD5Xiaw@mail.gmail.com/ Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-9-162e87e48497@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-02-26 18:41:56 -08:00
Davide Caratti	10048689de	mptcp: fix double-free on socket dismantle when MPTCP server accepts an incoming connection, it clones its listener socket. However, the pointer to 'inet_opt' for the new socket has the same value as the original one: as a consequence, on program exit it's possible to observe the following splat: BUG: KASAN: double-free in inet_sock_destruct+0x54f/0x8b0 Free of addr ffff888485950880 by task swapper/25/0 CPU: 25 PID: 0 Comm: swapper/25 Kdump: loaded Not tainted 6.8.0-rc1+ #609 Hardware name: Supermicro SYS-6027R-72RF/X9DRH-7TF/7F/iTF/iF, BIOS 3.0 07/26/2013 Call Trace: <IRQ> dump_stack_lvl+0x32/0x50 print_report+0xca/0x620 kasan_report_invalid_free+0x64/0x90 __kasan_slab_free+0x1aa/0x1f0 kfree+0xed/0x2e0 inet_sock_destruct+0x54f/0x8b0 __sk_destruct+0x48/0x5b0 rcu_do_batch+0x34e/0xd90 rcu_core+0x559/0xac0 __do_softirq+0x183/0x5a4 irq_exit_rcu+0x12d/0x170 sysvec_apic_timer_interrupt+0x6b/0x80 </IRQ> <TASK> asm_sysvec_apic_timer_interrupt+0x16/0x20 RIP: 0010:cpuidle_enter_state+0x175/0x300 Code: 30 00 0f 84 1f 01 00 00 83 e8 01 83 f8 ff 75 e5 48 83 c4 18 44 89 e8 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc fb 45 85 ed <0f> 89 60 ff ff ff 48 c1 e5 06 48 c7 43 18 00 00 00 00 48 83 44 2b RSP: 0018:ffff888481cf7d90 EFLAGS: 00000202 RAX: 0000000000000000 RBX: ffff88887facddc8 RCX: 0000000000000000 RDX: 1ffff1110ff588b1 RSI: 0000000000000019 RDI: ffff88887fac4588 RBP: 0000000000000004 R08: 0000000000000002 R09: 0000000000043080 R10: 0009b02ea273363f R11: ffff88887fabf42b R12: ffffffff932592e0 R13: 0000000000000004 R14: 0000000000000000 R15: 00000022c880ec80 cpuidle_enter+0x4a/0xa0 do_idle+0x310/0x410 cpu_startup_entry+0x51/0x60 start_secondary+0x211/0x270 secondary_startup_64_no_verify+0x184/0x18b </TASK> Allocated by task 6853: kasan_save_stack+0x1c/0x40 kasan_save_track+0x10/0x30 __kasan_kmalloc+0xa6/0xb0 __kmalloc+0x1eb/0x450 cipso_v4_sock_setattr+0x96/0x360 netlbl_sock_setattr+0x132/0x1f0 selinux_netlbl_socket_post_create+0x6c/0x110 selinux_socket_post_create+0x37b/0x7f0 security_socket_post_create+0x63/0xb0 __sock_create+0x305/0x450 __sys_socket_create.part.23+0xbd/0x130 __sys_socket+0x37/0xb0 __x64_sys_socket+0x6f/0xb0 do_syscall_64+0x83/0x160 entry_SYSCALL_64_after_hwframe+0x6e/0x76 Freed by task 6858: kasan_save_stack+0x1c/0x40 kasan_save_track+0x10/0x30 kasan_save_free_info+0x3b/0x60 __kasan_slab_free+0x12c/0x1f0 kfree+0xed/0x2e0 inet_sock_destruct+0x54f/0x8b0 __sk_destruct+0x48/0x5b0 subflow_ulp_release+0x1f0/0x250 tcp_cleanup_ulp+0x6e/0x110 tcp_v4_destroy_sock+0x5a/0x3a0 inet_csk_destroy_sock+0x135/0x390 tcp_fin+0x416/0x5c0 tcp_data_queue+0x1bc8/0x4310 tcp_rcv_state_process+0x15a3/0x47b0 tcp_v4_do_rcv+0x2c1/0x990 tcp_v4_rcv+0x41fb/0x5ed0 ip_protocol_deliver_rcu+0x6d/0x9f0 ip_local_deliver_finish+0x278/0x360 ip_local_deliver+0x182/0x2c0 ip_rcv+0xb5/0x1c0 __netif_receive_skb_one_core+0x16e/0x1b0 process_backlog+0x1e3/0x650 __napi_poll+0xa6/0x500 net_rx_action+0x740/0xbb0 __do_softirq+0x183/0x5a4 The buggy address belongs to the object at ffff888485950880 which belongs to the cache kmalloc-64 of size 64 The buggy address is located 0 bytes inside of 64-byte region [ffff888485950880, ffff8884859508c0) The buggy address belongs to the physical page: page:0000000056d1e95e refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888485950700 pfn:0x485950 flags: 0x57ffffc0000800(slab\|node=1\|zone=2\|lastcpupid=0x1fffff) page_type: 0xffffffff() raw: 0057ffffc0000800 ffff88810004c640 ffffea00121b8ac0 dead000000000006 raw: ffff888485950700 0000000000200019 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888485950780: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ffff888485950800: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc >ffff888485950880: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ^ ffff888485950900: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ffff888485950980: 00 00 00 00 00 01 fc fc fc fc fc fc fc fc fc fc Something similar (a refcount underflow) happens with CALIPSO/IPv6. Fix this by duplicating IP / IPv6 options after clone, so that ip{,6}_sock_destruct() doesn't end up freeing the same memory area twice. Fixes: cf7da0d66cc1 ("mptcp: Create SUBFLOW socket for incoming connections") Cc: stable@vger.kernel.org Signed-off-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-8-162e87e48497@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-02-26 18:41:56 -08:00
Geliang Tang	9480f388a2	selftests: mptcp: join: add ss mptcp support check Commands 'ss -M' are used in script mptcp_join.sh to display only MPTCP sockets. So it must be checked if ss tool supports MPTCP in this script. Fixes: e274f7154008 ("selftests: mptcp: add subflow limits test-cases") Cc: stable@vger.kernel.org Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-7-162e87e48497@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-02-26 18:41:56 -08:00
Paolo Abeni	b111d8fbd2	mptcp: fix potential wake-up event loss After the blamed commit below, the send buffer auto-tuning can happen after that the mptcp_propagate_sndbuf() completes - via the delegated action infrastructure. We must check for write space even after such change or we risk missing the wake-up event. Fixes: 8005184fd1ca ("mptcp: refactor sndbuf auto-tuning") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-6-162e87e48497@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-02-26 18:41:56 -08:00
Paolo Abeni	adf1bb78da	mptcp: fix snd_wnd initialization for passive socket Such value should be inherited from the first subflow, but passive sockets always used 'rsk_rcv_wnd'. Fixes: 6f8a612a33e4 ("mptcp: keep track of advertised windows right edge") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-5-162e87e48497@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-02-26 18:41:56 -08:00
Paolo Abeni	b9cd26f640	mptcp: push at DSS boundaries when inserting not contiguous data in the subflow write queue, the protocol creates a new skb and prevent the TCP stack from merging it later with already queued skbs by setting the EOR marker. Still no push flag is explicitly set at the end of previous GSO packet, making the aggregation on the receiver side sub-optimal - and packetdrill self-tests less predictable. Explicitly mark the end of not contiguous DSS with the push flag. Fixes: 6d0060f600ad ("mptcp: Write MPTCP DSS headers to outgoing data packets") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-4-162e87e48497@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-02-26 18:41:55 -08:00
Matthieu Baerts (NGI0)	5b49c41ac8	mptcp: avoid printing warning once on client side After the 'Fixes' commit mentioned below, the client side might print the following warning once when a subflow is fully established at the reception of any valid additional ack: MPTCP: bogus mpc option on established client sk That's a normal situation, and no warning should be printed for that. We can then skip the check when the label is used. Fixes: e4a0fa47e816 ("mptcp: corner case locking for rx path fields initialization") Cc: stable@vger.kernel.org Suggested-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-3-162e87e48497@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-02-26 18:41:55 -08:00
Geliang Tang	7092dbee23	selftests: mptcp: rm subflow with v4/v4mapped addr Now both a v4 address and a v4-mapped address are supported when destroying a userspace pm subflow, this patch adds a second subflow to "userspace pm add & remove address" test, and two subflows could be removed two different ways, one with the v4mapped and one with v4. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/387 Fixes: 48d73f609dcc ("selftests: mptcp: update userspace pm addr tests") Cc: stable@vger.kernel.org Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Mat Martineau <martineau@kernel.org> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-2-162e87e48497@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-02-26 18:41:55 -08:00
Geliang Tang	535d620ea5	mptcp: map v4 address to v6 when destroying subflow Address family of server side mismatches with that of client side, like in "userspace pm add & remove address" test: userspace_pm_add_addr $ns1 10.0.2.1 10 userspace_pm_rm_sf $ns1 "::ffff:10.0.2.1" $SUB_ESTABLISHED That's because on the server side, the family is set to AF_INET6 and the v4 address is mapped in a v6 one. This patch fixes this issue. In mptcp_pm_nl_subflow_destroy_doit(), before checking local address family with remote address family, map an IPv4 address to an IPv6 address if the pair is a v4-mapped address. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/387 Fixes: 702c2f646d42 ("mptcp: netlink: allow userspace-driven subflow establishment") Cc: stable@vger.kernel.org Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Mat Martineau <martineau@kernel.org> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-1-162e87e48497@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-02-26 18:41:55 -08:00
Eric Dumazet	c3718936ec	ipv6: anycast: complete RCU handling of struct ifacaddr6 struct ifacaddr6 are already freed after RCU grace period. Add __rcu qualifier to aca_next pointer, and idev->ac_list Add relevant rcu_assign_pointer() and dereference accessors. ipv6_chk_acast_dev() no longer needs to acquire idev->lock. /proc/net/anycast6 is now purely RCU protected, it no longer acquires idev->lock. Similarly in6_dump_addrs() can use RCU protection to iterate through anycast addresses. It was relying on a mixture of RCU and RTNL but next patches will get rid of RTNL there. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240223201054.220534-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-02-26 18:40:34 -08:00
Varshini Rajendran	5c237967e6	dt-bindings: net: cdns,macb: add sam9x7 ethernet interface Add documentation for sam9x7 ethernet interface. Signed-off-by: Varshini Rajendran <varshini.rajendran@microchip.com> Acked-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20240223172228.671553-1-varshini.rajendran@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-02-26 18:39:56 -08:00
Eric Dumazet	0d60d8df6f	dpll: rely on rcu for netdev_dpll_pin() This fixes a possible UAF in if_nlmsg_size(), which can run without RTNL. Add rcu protection to "struct dpll_pin" Move netdev_dpll_pin() from netdevice.h to dpll.h to decrease name pollution. Note: This looks possible to no longer acquire RTNL in netdev_dpll_pin_assign() later in net-next. v2: do not force rcu_read_lock() in rtnl_dpll_pin_size() (Jiri Pirko) Fixes: 5f1842692880 ("netdev: expose DPLL pin handle for netdevice") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Cc: Vadim Fedorenko <vadim.fedorenko@linux.dev> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240223123208.3543319-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-02-26 18:39:34 -08:00
Breno Leitao	3a25e21230	net/vsockmon: Do not set zeroed statistics Do not set rtnl_link_stats64 fields to zero, since they are zeroed before ops->ndo_get_stats64 is called in core dev_get_stats() function. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Link: https://lore.kernel.org/r/20240223115839.3572852-2-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-02-26 18:39:10 -08:00
Breno Leitao	bcd53aff4d	net/vsockmon: Leverage core stats allocator With commit 34d21de99cea9 ("net: Move {l,t,d}stats allocation to core and convert veth & vrf"), stats allocation could be done on net core instead of this driver. With this new approach, the driver doesn't have to bother with error handling (allocation failure checking, making sure free happens in the right spot, etc). This is core responsibility now. Remove the allocation in the vsockmon driver and leverage the network core allocation instead. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20240223115839.3572852-1-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-02-26 18:39:09 -08:00
Oleksij Rempel	0e67899abf	lan78xx: enable auto speed configuration for LAN7850 if no EEPROM is detected Same as LAN7800, LAN7850 can be used without EEPROM. If EEPROM is not present or not flashed, LAN7850 will fail to sync the speed detected by the PHY with the MAC. In case link speed is 100Mbit, it will accidentally work, otherwise no data can be transferred. Better way would be to implement link_up callback, or set auto speed configuration unconditionally. But this changes would be more intrusive. So, for now, set it only if no EEPROM is found. Fixes: e69647a19c87 ("lan78xx: Set ASD in MAC_CR when EEE is enabled.") Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://lore.kernel.org/r/20240222123839.2816561-1-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-02-26 18:38:01 -08:00
Linus Torvalds	45ec2f5f6e	Many NAND page layouts have been added to the Marvell NAND controller but could not be used in practice so they are being removed. Regarding the SPI-NAND area, Gigadevice chips were not using the right buffer for an ECC status check operation. Aside from these driver fixes, there is also a refcount fix in the MTD core nodes parsing logic. -----BEGIN PGP SIGNATURE----- iQEzBAABCgAdFiEE9HuaYnbmDhq/XIDIJWrqGEe9VoQFAmXccMwACgkQJWrqGEe9 VoRSjAf/SdgiZdXz6Fv2uUQYXsHNXO9PnFPSzJ8XNg16g4Jdm/uG0dTIN9W12ZTK QtI2v8+HLtmIzxcGpHEVSzxMTfMbNbB+CibZm+lQFABeqxrhasFHAEDE0Nmo0dNs /Dh5Em6fxxOW/pYUoPQk9zm5RNNg+yr9eQk9FKrv723Wprk6tjHjQQJEpUchtXUz B/qfNSszIQUHPIUtYdNc0A0E1Q4Dr525gBnKFP2lvd6OmjjI35fJ4QvOQuVCQ5b9 Q01y50SojypoFMb+SjiYxFBSnjO3ZECnbf4wvAE0S08tsXUdOfg4ksoiciZbC4mz 6BCw+MooPwi2vUPldeITlAm0Ij0G1A== =0Nf5 -----END PGP SIGNATURE----- Merge tag 'mtd/fixes-for-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux Pull mtd fixes from Miquel Raynal: "Many NAND page layouts have been added to the Marvell NAND controller but could not be used in practice so they are being removed. Regarding the SPI-NAND area, Gigadevice chips were not using the right buffer for an ECC status check operation. Aside from these driver fixes, there is also a refcount fix in the MTD core nodes parsing logic" * tag 'mtd/fixes-for-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux: mtd: rawnand: marvell: fix layouts mtd: Fix possible refcounting issue when going through partition nodes mtd: spinand: gigadevice: Fix the get ecc status issue	2024-02-26 11:06:30 -08:00
Linus Torvalds	b6c1f1ecb3	for-6.8-rc6-tag -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmXcsfAACgkQxWXV+ddt WDt3XA/6AkPT8QNT+mOyp4NjPzquR4UMIPVGGvjWTeKNtjNnco9gPkOBWsHeeDQe aiihh3X2NpNtsduEmqaz717EJW4za9lplGiyPR51H/pTfGfOthWL6Nj+auTPva3t GnlYh+GUQ+44JJ5+biOK5HUpEEeUR87EN2z5lTWsHAxg7PolBiKYKvV4Wp33xJqR ILGlYw04reOAljTn0Zf738IL5WpY9etj1GnNxQeEKFRrdF1GH1i6r/JRONU1hGHu EiZT6XwoN07V+JURB+fPqtY1IXODDC8904OwLg5fKhBggWvR2IaiW1XH+ToFXQgU idae1+Dy85Hi4s40SL5GcSO8mVHPEGEspwM/5G87YqIu3uH4L9+Wd4zTwVYLcwNm SSUCDGj2d+/JIug5dPBV8GL7jrhPNnPOu8HR+bIxY9XUhyf+IZVlUNYlorup3lbm rAouZiCevRhQRBAx33Id5ZOMhlIpPONKObcCEKmdm6WLlnkkqgKQbnapd/I/1mfT nP5N7oWUtfXO4oq4k5XpJBcTVhXU+DzpQ7EMDGv3mSmIem0wsDmXPbF2MfoSIim8 UuToZ1YF5MuxNLGwYnpkUaxWhKKOFWMvAe65eXP+ureIjOJwQ4f85Nkro0JvKbr8 nVdzl3rDy49tnqW7Qu3vaNPOQneuWaOqCoQcYDcVAiqk11UhH9E= =mBP6 -----END PGP SIGNATURE----- Merge tag 'for-6.8-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "A more fixes for recently reported or discovered problems: - fix corner case of send that would generate potentially large stream of zeros if there's a hole at the end of the file - fix chunk validation in zoned mode on conventional zones, it was possible to create chunks that would not be allowed on sequential zones - fix validation of dev-replace ioctl filenames - fix KCSAN warnings about access to block reserve struct members" * tag 'for-6.8-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix data race at btrfs_use_block_rsv() when accessing block reserve btrfs: fix data races when accessing the reserved amount of block reserves btrfs: send: don't issue unnecessary zero writes for trailing hole btrfs: dev-replace: properly validate device names btrfs: zoned: don't skip block group profile checks on conventional zones	2024-02-26 11:00:54 -08:00
Mark O'Donovan	c8e314624a	fs/ntfs3: fix build without CONFIG_NTFS3_LZX_XPRESS When CONFIG_NTFS3_LZX_XPRESS is not set then we get the following build error: fs/ntfs3/frecord.c:2460:16: error: unused variable ‘i_size’ Signed-off-by: Mark O'Donovan <shiftee@posteo.net> Fixes: 4fd6c08a16d7 ("fs/ntfs3: Use i_size_read and i_size_write") Tested-by: Chris Clayton <chris2553@googlemail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2024-02-26 09:32:23 -08:00
Mickaël Salaün	d9818b3e90	landlock: Fix asymmetric private inodes referring When linking or renaming a file, if only one of the source or destination directory is backed by an S_PRIVATE inode, then the related set of layer masks would be used as uninitialized by is_access_to_paths_allowed(). This would result to indeterministic access for one side instead of always being allowed. This bug could only be triggered with a mounted filesystem containing both S_PRIVATE and !S_PRIVATE inodes, which doesn't seem possible. The collect_domain_accesses() calls return early if is_nouser_or_private() returns false, which means that the directory's superblock has SB_NOUSER or its inode has S_PRIVATE. Because rename or link actions are only allowed on the same mounted filesystem, the superblock is always the same for both source and destination directories. However, it might be possible in theory to have an S_PRIVATE parent source inode with an !S_PRIVATE parent destination inode, or vice versa. To make sure this case is not an issue, explicitly initialized both set of layer masks to 0, which means to allow all actions on the related side. If at least on side has !S_PRIVATE, then collect_domain_accesses() and is_access_to_paths_allowed() check for the required access rights. Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Brauner <brauner@kernel.org> Cc: Günther Noack <gnoack@google.com> Cc: Jann Horn <jannh@google.com> Cc: Shervin Oloumi <enlightened@chromium.org> Cc: stable@vger.kernel.org Fixes: b91c3e4ea756 ("landlock: Add support for file reparenting with LANDLOCK_ACCESS_FS_REFER") Link: https://lore.kernel.org/r/20240219190345.2928627-1-mic@digikod.net Signed-off-by: Mickaël Salaün <mic@digikod.net>	2024-02-26 18:23:53 +01:00
David S. Miller	25d4342574	Merge branch 'pcs-xpcs-cleanups' Serge Semin says: ==================== net: pcs: xpcs: Cleanups before adding MMIO dev support As stated in the subject this series is a short prequel before submitting the main patches adding the memory-mapped DW XPCS support to the DW XPCS and DW *MAC (STMMAC) drivers. Originally it was a part of the bigger patchset (see the changelog v2 link below) but was detached to a preparation set to shrink down the main series thus simplifying it' review. The patchset' content is straightforward: drop the redundant sentinel entry and the header files; return EINVAL errno from the soft-reset method and make sure that the interface validation method return EINVAL straight away if the requested interface isn't supported by the XPCS device instance. All of these changes are required to simplify the changes being introduced a bit later in the framework of the memory-mapped DW XPCS support patches. Link: https://lore.kernel.org/netdev/20231205103559.9605-1-fancer.lancer@gmail.com Changelog v2: - Move the preparation patches to a separate series. - Simplify the commit messages (@Russell, @Vladimir). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2024-02-26 13:09:09 +00:00
Serge Semin	361dd531a1	net: pcs: xpcs: Explicitly return error on caps validation If an unsupported interface is passed to the PCS validation callback there is no need in further link-modes calculations since the resultant array will be initialized with zeros which will be perceived by the phylink subsystem as error anyway (see phylink_validate_mac_and_pcs()). Instead let's explicitly return the -EINVAL error to inform the caller about the unsupported interface as it's done in the rest of the pcs_validate callbacks. Signed-off-by: Serge Semin <fancer.lancer@gmail.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-02-26 13:09:09 +00:00
Serge Semin	f5151005d3	net: pcs: xpcs: Return EINVAL in the internal methods In particular the xpcs_soft_reset() and xpcs_do_config() functions currently return -1 if invalid auto-negotiation mode is specified. That value might be then passed to the generic kernel subsystems which require a standard kernel errno value. Even though the erroneous conditions are very specific (memory corruption or buggy driver implementation) using a hard-coded -1 literal doesn't seem correct anyway especially when it comes to passing it higher to the network subsystem or printing to the system log. Convert the hard-coded error values to -EINVAL then. Signed-off-by: Serge Semin <fancer.lancer@gmail.com> Tested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-02-26 13:09:08 +00:00
Serge Semin	e26802ebd2	net: pcs: xpcs: Drop redundant workqueue.h include directive There is nothing CM workqueue-related in the driver. So the respective include directive can be dropped. While at it add an empty line delimiter between the generic and local path include directives to visually separate them. Signed-off-by: Serge Semin <fancer.lancer@gmail.com> Tested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-02-26 13:09:08 +00:00
Serge Semin	0ffc3292c0	net: pcs: xpcs: Drop sentinel entry from 2500basex ifaces list There are currently only two methods (xpcs_find_compat() and xpcs_get_interfaces()) defined in the driver which loop over the available interfaces. All of them rely on the xpcs_compat::num_interfaces field value to get the total number of supported interfaces. Thus the interface arrays are supposed to be filled with actual interface IDs and there is no need in the dummy terminating ID placed at the end of the arrays. Based on the above drop the PHY_INTERFACE_MODE_MAX entry from the xpcs_2500basex_interfaces array and the PHY_INTERFACE_MODE_MAX-based conditional statement from the xpcs_get_interfaces() method as redundant. Signed-off-by: Serge Semin <fancer.lancer@gmail.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-02-26 13:09:08 +00:00
Eric Dumazet	10bfd453da	ipv6: fix potential "struct net" leak in inet6_rtm_getaddr() It seems that if userspace provides a correct IFA_TARGET_NETNSID value but no IFA_ADDRESS and IFA_LOCAL attributes, inet6_rtm_getaddr() returns -EINVAL with an elevated "struct net" refcount. Fixes: 6ecf4c37eb3e ("ipv6: enable IFA_TARGET_NETNSID for RTM_GETADDR") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Christian Brauner <brauner@kernel.org> Cc: David Ahern <dsahern@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-02-26 11:56:23 +00:00
David S. Miller	5fc3903c46	Merge branch 'rtnetlink-reduce-rtnl-pressure' Eric Dumazet says: ==================== rtnetlink: reduce RTNL pressure for dumps This series restarts the conversion of rtnl dump operations to RCU protection, instead of requiring RTNL. In this new attempt (prior one failed in 2011), I chose to allow a gradual conversion of selected operations. After this series, "ip -6 addr" and "ip -4 ro" no longer need to acquire RTNL. I refrained from changing inet_dump_ifaddr() and inet6_dump_addr() to avoid merge conflicts because of two fixes in net tree. I also started the work for "ip link" future conversion. v2: rtnl_fill_link_ifmap() always emit IFLA_MAP (Jiri Pirko) Added "nexthop: allow nexthop_mpath_fill_node() to be called without RTNL" to avoid a lockdep splat (Ido Schimmel) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2024-02-26 11:46:13 +00:00
Eric Dumazet	0ec4e48c3a	rtnetlink: provide RCU protection to rtnl_fill_prop_list() We want to be able to run rtnl_fill_ifinfo() under RCU protection instead of RTNL in the future. dev->name_node items are already rcu protected. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-02-26 11:46:13 +00:00
Eric Dumazet	74808e72e0	rtnetlink: make rtnl_fill_link_ifmap() RCU ready Use READ_ONCE() to read the following device fields: dev->mem_start dev->mem_end dev->base_addr dev->irq dev->dma dev->if_port Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-02-26 11:46:13 +00:00
Eric Dumazet	4ce5dc9316	inet: switch inet_dump_fib() to RCU protection No longer hold RTNL while calling inet_dump_fib(). Also change return value for a completed dump: Returning 0 instead of skb->len allows NLMSG_DONE to be appended to the skb. User space does not have to call us again to get a standalone NLMSG_DONE marker. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-02-26 11:46:13 +00:00
Eric Dumazet	0ac3fa0c3b	nexthop: allow nexthop_mpath_fill_node() to be called without RTNL nexthop_mpath_fill_node() will be potentially called from contexts holding rcu_lock instead of RTNL. Suggested-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/all/ZdZDWVdjMaQkXBgW@shredder/ Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-02-26 11:46:12 +00:00
Eric Dumazet	22e36ea9f5	inet: allow ip_valid_fib_dump_req() to be called with RTNL or RCU Add a new field into struct fib_dump_filter, to let callers tell if they use RTNL locking or RCU. This is used in the following patch, when inet_dump_fib() no longer holds RTNL. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-02-26 11:46:12 +00:00
Eric Dumazet	69fdb7e411	ipv6: switch inet6_dump_ifinfo() to RCU protection No longer hold RTNL while calling inet6_dump_ifinfo() Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-02-26 11:46:12 +00:00
Eric Dumazet	386520e0ec	rtnetlink: add RTNL_FLAG_DUMP_UNLOCKED flag Similarly to RTNL_FLAG_DOIT_UNLOCKED, this new flag allows dump operations registered via rtnl_register() or rtnl_register_module() to opt-out from RTNL protection. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-02-26 11:46:12 +00:00
Eric Dumazet	e39951d965	rtnetlink: change nlk->cb_mutex role In commit af65bdfce98d ("[NETLINK]: Switch cb_lock spinlock to mutex and allow to override it"), Patrick McHardy used a common mutex to protect both nlk->cb and the dump() operations. The override is used for rtnl dumps, registered with rntl_register() and rntl_register_module(). We want to be able to opt-out some dump() operations to not acquire RTNL, so we need to protect nlk->cb with a per socket mutex. This patch renames nlk->cb_def_mutex to nlk->nl_cb_mutex The optional pointer to the mutex used to protect dump() call is stored in nlk->dump_cb_mutex Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-02-26 11:46:12 +00:00
Eric Dumazet	b559027006	netlink: hold nlk->cb_mutex longer in __netlink_dump_start() __netlink_dump_start() releases nlk->cb_mutex right before calling netlink_dump() which grabs it again. This seems dangerous, even if KASAN did not bother yet. Add a @lock_taken parameter to netlink_dump() to let it grab the mutex if called from netlink_recvmsg() only. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-02-26 11:46:12 +00:00

... 4 5 6 7 8 ...

1252683 Commits