1214693 Commits

Author SHA1 Message Date
Gavrilov Ilia
a613ed1afd ipv4: igmp: Remove redundant comparison in igmp_mcf_get_next()
The 'state->im' value will always be non-zero after
the 'while' statement, so the check can be removed.

Found by InfoTeCS on behalf of Linux Verification Center
(linuxtesting.org) with SVACE.

Signed-off-by: Gavrilov Ilia <Ilia.Gavrilov@infotecs.ru>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230912084039.1501984-1-Ilia.Gavrilov@infotecs.ru
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14 17:20:17 +02:00
Paolo Abeni
4e519fb4ee Merge branch 'udp-round-of-data-races-fixes'
Eric Dumazet says:

====================
udp: round of data-races fixes

This series is inspired by multiple syzbot reports.

Many udp fields reads or writes are racy.

Add a proper udp->udp_flags and move there all
flags needing atomic safety.

Also add missing READ_ONCE()/WRITE_ONCE() when
lockless readers need access to specific fields.
====================

Link: https://lore.kernel.org/r/20230912091730.1591459-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14 16:16:42 +02:00
Eric Dumazet
882af43a0f udplite: fix various data-races
udp->pcflag, udp->pcslen and udp->pcrlen reads/writes are racy.

Move udp->pcflag to udp->udp_flags for atomicity,
and add READ_ONCE()/WRITE_ONCE() annotations for pcslen and pcrlen.

Fixes: ba4e58eca8aa ("[NET]: Supporting UDP-Lite (RFC 3828) in Linux")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14 16:16:36 +02:00
Eric Dumazet
729549aa35 udplite: remove UDPLITE_BIT
This flag is set but never read, we can remove it.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14 16:16:36 +02:00
Eric Dumazet
70a36f5713 udp: annotate data-races around udp->encap_type
syzbot/KCSAN complained about UDP_ENCAP_L2TPINUDP setsockopt() racing.

Add READ_ONCE()/WRITE_ONCE() to document races on this lockless field.

syzbot report was:
BUG: KCSAN: data-race in udp_lib_setsockopt / udp_lib_setsockopt

read-write to 0xffff8881083603fa of 1 bytes by task 16557 on cpu 0:
udp_lib_setsockopt+0x682/0x6c0
udp_setsockopt+0x73/0xa0 net/ipv4/udp.c:2779
sock_common_setsockopt+0x61/0x70 net/core/sock.c:3697
__sys_setsockopt+0x1c9/0x230 net/socket.c:2263
__do_sys_setsockopt net/socket.c:2274 [inline]
__se_sys_setsockopt net/socket.c:2271 [inline]
__x64_sys_setsockopt+0x66/0x80 net/socket.c:2271
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

read-write to 0xffff8881083603fa of 1 bytes by task 16554 on cpu 1:
udp_lib_setsockopt+0x682/0x6c0
udp_setsockopt+0x73/0xa0 net/ipv4/udp.c:2779
sock_common_setsockopt+0x61/0x70 net/core/sock.c:3697
__sys_setsockopt+0x1c9/0x230 net/socket.c:2263
__do_sys_setsockopt net/socket.c:2274 [inline]
__se_sys_setsockopt net/socket.c:2271 [inline]
__x64_sys_setsockopt+0x66/0x80 net/socket.c:2271
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

value changed: 0x01 -> 0x05

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 16554 Comm: syz-executor.5 Not tainted 6.5.0-rc7-syzkaller-00004-gf7757129e3de #0

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14 16:16:36 +02:00
Eric Dumazet
ac9a7f4ce5 udp: lockless UDP_ENCAP_L2TPINUDP / UDP_GRO
Move udp->encap_enabled to udp->udp_flags.

Add udp_test_and_set_bit() helper to allow lockless
udp_tunnel_encap_enable() implementation.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14 16:16:36 +02:00
Eric Dumazet
f5f52f0884 udp: move udp->accept_udp_{l4|fraglist} to udp->udp_flags
These are read locklessly, move them to udp_flags to fix data-races.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14 16:16:36 +02:00
Eric Dumazet
6d5a12eb91 udp: add missing WRITE_ONCE() around up->encap_rcv
UDP_ENCAP_ESPINUDP_NON_IKE setsockopt() writes over up->encap_rcv
while other cpus read it.

Fixes: 067b207b281d ("[UDP]: Cleanup UDP encapsulation code")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14 16:16:36 +02:00
Eric Dumazet
e1dc0615c6 udp: move udp->gro_enabled to udp->udp_flags
syzbot reported that udp->gro_enabled can be read locklessly.
Use one atomic bit from udp->udp_flags.

Fixes: e20cf8d3f1f7 ("udp: implement GRO for plain UDP sockets.")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14 16:16:36 +02:00
Eric Dumazet
bcbc1b1de8 udp: move udp->no_check6_rx to udp->udp_flags
syzbot reported that udp->no_check6_rx can be read locklessly.
Use one atomic bit from udp->udp_flags.

Fixes: 1c19448c9ba6 ("net: Make enabling of zero UDP6 csums more restrictive")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14 16:16:36 +02:00
Eric Dumazet
a0002127cd udp: move udp->no_check6_tx to udp->udp_flags
syzbot reported that udp->no_check6_tx can be read locklessly.
Use one atomic bit from udp->udp_flags

Fixes: 1c19448c9ba6 ("net: Make enabling of zero UDP6 csums more restrictive")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14 16:16:36 +02:00
Eric Dumazet
81b36803ac udp: introduce udp->udp_flags
According to syzbot, it is time to use proper atomic flags
for various UDP flags.

Add udp_flags field, and convert udp->corkflag to first
bit in it.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14 16:16:36 +02:00
Lorenzo Bianconi
486e6ca6b4 net: ethernet: mtk_wed: check update_wo_rx_stats in mtk_wed_update_rx_stats()
Check if update_wo_rx_stats function pointer is properly set in
mtk_wed_update_rx_stats routine before accessing it.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/b0d233386e059bccb59f18f69afb79a7806e5ded.1694507226.git.lorenzo@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14 15:33:00 +02:00
Lorenzo Bianconi
5c33c09c89 net: ethernet: mtk_eth_soc: rely on mtk_pse_port definitions in mtk_flow_set_output_device
Similar to ethernet ports, rely on mtk_pse_port definitions for
pse wdma ports as well.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/b86bdb717e963e3246c1dec5f736c810703cf056.1694506814.git.lorenzo@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14 15:28:25 +02:00
Jiawen Wu
f557524029 net: wangxun: move MDIO bus implementation to the library
Move similar code of accessing MDIO bus from txgbe/ngbe to libwx.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230912031424.721386-1-jiawenwu@trustnetic.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14 15:18:40 +02:00
Sieng-Piaw Liew
86565682e9 atl1c: Work around the DMA RX overflow issue
This is based on alx driver commit 881d0327db37 ("net: alx: Work around
the DMA RX overflow issue").

The alx and atl1c drivers had RX overflow error which was why a custom
allocator was created to avoid certain addresses. The simpler workaround
then created for alx driver, but not for atl1c due to lack of tester.

Instead of using a custom allocator, check the allocated skb address and
use skb_reserve() to move away from problematic 0x...fc0 address.

Tested on AR8131 on Acer 4540.

Signed-off-by: Sieng-Piaw Liew <liew.s.piaw@gmail.com>
Link: https://lore.kernel.org/r/20230912010711.12036-1-liew.s.piaw@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14 10:14:52 +02:00
Paolo Abeni
ea8f505ec5 Merge branch 'vsock-handle-writes-to-shutdowned-socket'
Arseniy Krasnov says:

====================
vsock: handle writes to shutdowned socket

this small patchset adds POSIX compliant behaviour on writes to the
socket which was shutdowned with 'shutdown()' (both sides - local with
SHUT_WR flag, peer - with SHUT_RD flag). According POSIX we must send
SIGPIPE in such cases (but SIGPIPE is not send when MSG_NOSIGNAL is set).

First patch is implemented in the same way as net/ipv4/tcp.c:tcp_sendmsg_locked().
It uses 'sk_stream_error()' function which handles EPIPE error. Another
way is to use code from net/unix/af_unix.c:unix_stream_sendmsg() where
same logic from 'sk_stream_error()' is implemented "from scratch", but
it doesn't check 'sk_err' field. I think error from this field has more
priority to be returned from syscall. So I guess it is better to reuse
currently implemented 'sk_stream_error()' function.

Test is also added.
====================

Link: https://lore.kernel.org/r/20230911202027.1928574-1-avkrasnov@salutedevices.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14 08:19:59 +02:00
Arseniy Krasnov
b698bd97c5 test/vsock: shutdowned socket test
This adds two tests for 'shutdown()' call. It checks that SIGPIPE is
sent when MSG_NOSIGNAL is not set and vice versa. Both flags SHUT_WR
and SHUT_RD are tested.

Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14 08:19:55 +02:00
Arseniy Krasnov
8ecf0cedc0 vsock: send SIGPIPE on write to shutdowned socket
POSIX requires to send SIGPIPE on write to SOCK_STREAM socket which was
shutdowned with SHUT_WR flag or its peer was shutdowned with SHUT_RD
flag. Also we must not send SIGPIPE if MSG_NOSIGNAL flag is set.

Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14 08:19:55 +02:00
David S. Miller
ca5ab9638e Merge branch 'selftests-classid'
Pedro Tammela says:

====================
selftests/tc-testing: add tests covering classid

Patches 1-3 add missing tests covering classid behaviour on tdc for cls_fw,
cls_route and cls_fw. This behaviour was recently fixed by valis[0].

Patch 4 comes from the development done in the previous patches as it turns out
cls_route never returns meaningful errors.

[0] https://lore.kernel.org/all/20230729123202.72406-1-jhs@mojatatu.com/

v2->v3: https://lore.kernel.org/all/20230825155148.659895-1-pctammela@mojatatu.com/
   - Added changes that were left in the working tree (Jakub)
   - Fixed two typos in commit message titles
   - Added Victor tags

v1->v2: https://lore.kernel.org/all/20230818163544.351104-1-pctammela@mojatatu.com/
   - Drop u32 updates
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-13 12:38:53 +01:00
Pedro Tammela
ef765c2587 net/sched: cls_route: make netlink errors meaningful
Use netlink extended ack and parsing policies to return more meaningful
errors instead of the relying solely on errnos.

Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-13 12:38:52 +01:00
Pedro Tammela
e2f2fb3c35 selftests/tc-testing: cls_u32: add tests for classid
As discussed in '3044b16e7c6f', cls_u32 was handling the use of classid
incorrectly. Add a test to check if it's conforming to the correct
behaviour.

Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-13 12:38:52 +01:00
Pedro Tammela
7c33908361 selftests/tc-testing: cls_route: add tests for classid
As discussed in 'b80b829e9e2c', cls_route was handling the use of classid
incorrectly. Add a test to check if it's conforming to the correct
behaviour.

Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-13 12:38:52 +01:00
Pedro Tammela
70ad43333c selftests/tc-testing: cls_fw: add tests for classid
As discussed in '76e42ae83199', cls_fw was handling the use of classid
incorrectly. Add a few tests to check if it's conforming to the correct
behaviour.

Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-13 12:38:52 +01:00
Andy Gospodarek
a4a09ac64e MAINTAINERS: update tg3 maintainer list
Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Pavan Chebbi pavan.chebbi@broadcom.com
Signed-off-by: Siva Reddy Kallam <siva.kallam@broadcom.com>
Signed-off-by: Prashant Sreedharan <prashant.sreedharan@broadcom.com>
Reviewed-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-13 12:21:57 +01:00
Christophe JAILLET
9cc91173cf net: hinic: Use devm_kasprintf()
Use devm_kasprintf() instead of hand writing it.
This is less verbose and less error prone.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-13 12:21:09 +01:00
David S. Miller
7e6cadf51a Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2023-09-11 (i40e, iavf)

This series contains updates to i40e and iavf drivers.

Andrii ensures all VSIs are cleaned up for remove in i40e.

Brett reworks logic for setting promiscuous mode that can, currently, cause
incorrect states on iavf.
---
v2:
 - Remove redundant i40e_vsi_free_q_vectors() and kfree() calls (patch 1)

v1: https://lore.kernel.org/netdev/20230905180521.887861-1-anthony.l.nguyen@intel.com/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-13 06:50:58 +01:00
Paolo Abeni
8fc8911b66 Merge branch 'tcp-backlog-processing-optims'
Eric Dumazet says:

====================
tcp: backlog processing optims

First patches are mostly preparing the ground for the last one.

Last patch of the series implements sort of ACK reduction
only for the cases a TCP receiver is under high stress,
which happens for high throughput flows.

This gives us a ~20% increase of single TCP flow (100Gbit -> 120Gbit)
====================

Link: https://lore.kernel.org/r/20230911170531.828100-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-12 19:10:04 +02:00
Eric Dumazet
133c4c0d37 tcp: defer regular ACK while processing socket backlog
This idea came after a particular workload requested
the quickack attribute set on routes, and a performance
drop was noticed for large bulk transfers.

For high throughput flows, it is best to use one cpu
running the user thread issuing socket system calls,
and a separate cpu to process incoming packets from BH context.
(With TSO/GRO, bottleneck is usually the 'user' cpu)

Problem is the user thread can spend a lot of time while holding
the socket lock, forcing BH handler to queue most of incoming
packets in the socket backlog.

Whenever the user thread releases the socket lock, it must first
process all accumulated packets in the backlog, potentially
adding latency spikes. Due to flood mitigation, having too many
packets in the backlog increases chance of unexpected drops.

Backlog processing unfortunately shifts a fair amount of cpu cycles
from the BH cpu to the 'user' cpu, thus reducing max throughput.

This patch takes advantage of the backlog processing,
and the fact that ACK are mostly cumulative.

The idea is to detect we are in the backlog processing
and defer all eligible ACK into a single one,
sent from tcp_release_cb().

This saves cpu cycles on both sides, and network resources.

Performance of a single TCP flow on a 200Gbit NIC:

- Throughput is increased by 20% (100Gbit -> 120Gbit).
- Number of generated ACK per second shrinks from 240,000 to 40,000.
- Number of backlog drops per second shrinks from 230 to 0.

Benchmark context:
 - Regular netperf TCP_STREAM (no zerocopy)
 - Intel(R) Xeon(R) Platinum 8481C (Saphire Rapids)
 - MAX_SKB_FRAGS = 17 (~60KB per GRO packet)

This feature is guarded by a new sysctl, and enabled by default:
 /proc/sys/net/ipv4/tcp_backlog_ack_defer

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-12 19:10:01 +02:00
Eric Dumazet
4505dc2a52 net: call prot->release_cb() when processing backlog
__sk_flush_backlog() / sk_flush_backlog() are used
when TCP recvmsg()/sendmsg() process large chunks,
to not let packets in the backlog too long.

It makes sense to call tcp_release_cb() to also
process actions held in sk->sk_tsq_flags for smoother
scheduling.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-12 19:10:01 +02:00
Eric Dumazet
11445469de net: sock_release_ownership() cleanup
sock_release_ownership() should only be called by user
owning the socket lock.

After prior commit, we can remove one condition.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-12 19:10:01 +02:00
Eric Dumazet
b49d252216 tcp: no longer release socket ownership in tcp_release_cb()
This partially reverts c3f9b01849ef ("tcp: tcp_release_cb()
should release socket ownership").

prequeue has been removed by Florian in commit e7942d0633c4
("tcp: remove prequeue support")

__tcp_checksum_complete_user() being gone, we no longer
have to release socket ownership in tcp_release_cb().

This is a prereq for third patch in the series
("net: call prot->release_cb() when processing backlog").

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-12 19:10:01 +02:00
Andy Shevchenko
cd8bae8581 wwan: core: Use the bitmap API to allocate bitmaps
Use bitmap_zalloc() and bitmap_free() instead of hand-writing them.
It is less verbose and it improves the type checking and semantic.

While at it, add missing header inclusion (should be bitops.h,
but with the above change it becomes bitmap.h).

Reviewed-by: Loic Poulain <loic.poulain@linaro.org>
Reviewed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20230911131618.4159437-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-12 11:49:22 +02:00
Zhengchao Shao
762c8dc7f2 net: dst: remove unnecessary input parameter in dst_alloc and dst_init
Since commit 1202cdd66531("Remove DECnet support from kernel") has been
merged, all callers pass in the initial_ref value of 1 when they call
dst_alloc(). Therefore, remove initial_ref when the dst_alloc() is
declared and replace initial_ref with 1 in dst_alloc().
Also when all callers call dst_init(), the value of initial_ref is 1.
Therefore, remove the input parameter initial_ref of the dst_init() and
replace initial_ref with the value 1 in dst_init.

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Link: https://lore.kernel.org/r/20230911125045.346390-1-shaozhengchao@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-12 11:42:25 +02:00
Paolo Abeni
be3af13fc1 Merge branch 'add-support-for-icssg-on-am64x-evm'
MD Danish Anwar says:

====================
Add support for ICSSG on AM64x EVM

This series adds support for ICSSG driver on AM64x EVM.

First patch of the series adds compatible for AM64x EVM in icssg-prueth
dt binding. Second patch adds support for AM64x compatible in the ICSSG
driver.

This series addresses comments on [v1] (which was posted as RFC).
This series is based on the latest net-next/main. This series has no
dependency.

Changes from v1 to v2:
*) Made the compatible list in patch 1 alphanumerically ordered as asked
   by Krzysztof.
*) Dropped the RFC tag.
*) Added RB tags of Andrew and Roger.

[v1] https://lore.kernel.org/all/20230830113724.1228624-1-danishanwar@ti.com/
====================

Link: https://lore.kernel.org/r/20230911054308.2163076-1-danishanwar@ti.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-12 10:23:54 +02:00
MD Danish Anwar
b256e13378 net: ti: icssg-prueth: Add AM64x icssg support
Add AM64x ICSSG support which is similar to am65x SR2.0, but required:
- all ring configured in exposed ring mode
- always fill both original and buffer fields in cppi5 desc

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-12 10:23:50 +02:00
MD Danish Anwar
0caab0a46d dt-bindings: net: Add compatible for AM64x in ICSSG
Add compatible for AM64x in icssg-prueth dt bindings. AM64x supports
ICSSG similar to AM65x SR2.0.

Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-12 10:23:50 +02:00
Brett Creeley
221465de6b iavf: Fix promiscuous mode configuration flow messages
Currently when configuring promiscuous mode on the AVF we detect a
change in the netdev->flags. We use IFF_PROMISC and IFF_ALLMULTI to
determine whether or not we need to request/release promiscuous mode
and/or multicast promiscuous mode. The problem is that the AQ calls for
setting/clearing promiscuous/multicast mode are treated separately. This
leads to a case where we can trigger two promiscuous mode AQ calls in
a row with the incorrect state. To fix this make a few changes.

Use IAVF_FLAG_AQ_CONFIGURE_PROMISC_MODE instead of the previous
IAVF_FLAG_AQ_[REQUEST|RELEASE]_[PROMISC|ALLMULTI] flags.

In iavf_set_rx_mode() detect if there is a change in the
netdev->flags in comparison with adapter->flags and set the
IAVF_FLAG_AQ_CONFIGURE_PROMISC_MODE aq_required bit. Then in
iavf_process_aq_command() only check for IAVF_FLAG_CONFIGURE_PROMISC_MODE
and call iavf_set_promiscuous() if it's set.

In iavf_set_promiscuous() check again to see which (if any) promiscuous
mode bits have changed when comparing the netdev->flags with the
adapter->flags. Use this to set the flags which get sent to the PF
driver.

Add a spinlock that is used for updating current_netdev_promisc_flags
and only allows one promiscuous mode AQ at a time.

[1] Fixes the fact that we will only have one AQ call in the aq_required
queue at any one time.

[2] Streamlines the change in promiscuous mode to only set one AQ
required bit.

[3] This allows us to keep track of the current state of the flags and
also makes it so we can take the most recent netdev->flags promiscuous
mode state.

[4] This fixes the problem where a change in the netdev->flags can cause
IAVF_FLAG_AQ_CONFIGURE_PROMISC_MODE to be set in iavf_set_rx_mode(),
but cleared in iavf_set_promiscuous() before the change is ever made via
AQ call.

Fixes: 47d3483988f6 ("i40evf: Add driver support for promiscuous mode")
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-11 08:53:20 -07:00
Andrii Staikov
5ca636d927 i40e: fix potential memory leaks in i40e_remove()
Instead of freeing memory of a single VSI, make sure
the memory for all VSIs is cleared before releasing VSIs.
Add releasing of their resources in a loop with the iteration
number equal to the number of allocated VSIs.

Fixes: 41c445ff0f48 ("i40e: main driver core")
Signed-off-by: Andrii Staikov <andrii.staikov@intel.com>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-11 08:53:08 -07:00
Lorenzo Bianconi
5a124b1fd3 net: ethernet: mtk_eth_soc: fix pse_port configuration for MT7988
MT7988 SoC support 3 NICs. Fix pse_port configuration in
mtk_flow_set_output_device routine if the traffic is offloaded to eth2.
Rely on mtk_pse_port definitions.

Fixes: 88efedf517e6 ("net: ethernet: mtk_eth_soc: enable nft hw flowtable_offload for MT7988 SoC")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-11 10:40:49 +01:00
Daniel Golle
e10a35abb3 net: ethernet: mtk_eth_soc: fix uninitialized variable
Variable dma_addr in function mtk_poll_rx can be uninitialized on
some of the error paths. In practise this doesn't matter, even random
data present in uninitialized stack memory can safely be used in the
way it happens in the error path.

However, in order to make Smatch happy make sure the variable is
always initialized.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-11 10:37:50 +01:00
Shigeru Yoshida
c821a88bd7 kcm: Fix memory leak in error path of kcm_sendmsg()
syzbot reported a memory leak like below:

BUG: memory leak
unreferenced object 0xffff88810b088c00 (size 240):
  comm "syz-executor186", pid 5012, jiffies 4294943306 (age 13.680s)
  hex dump (first 32 bytes):
    00 89 08 0b 81 88 ff ff 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff83e5d5ff>] __alloc_skb+0x1ef/0x230 net/core/skbuff.c:634
    [<ffffffff84606e59>] alloc_skb include/linux/skbuff.h:1289 [inline]
    [<ffffffff84606e59>] kcm_sendmsg+0x269/0x1050 net/kcm/kcmsock.c:815
    [<ffffffff83e479c6>] sock_sendmsg_nosec net/socket.c:725 [inline]
    [<ffffffff83e479c6>] sock_sendmsg+0x56/0xb0 net/socket.c:748
    [<ffffffff83e47f55>] ____sys_sendmsg+0x365/0x470 net/socket.c:2494
    [<ffffffff83e4c389>] ___sys_sendmsg+0xc9/0x130 net/socket.c:2548
    [<ffffffff83e4c536>] __sys_sendmsg+0xa6/0x120 net/socket.c:2577
    [<ffffffff84ad7bb8>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
    [<ffffffff84ad7bb8>] do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
    [<ffffffff84c0008b>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

In kcm_sendmsg(), kcm_tx_msg(head)->last_skb is used as a cursor to append
newly allocated skbs to 'head'. If some bytes are copied, an error occurred,
and jumped to out_error label, 'last_skb' is left unmodified. A later
kcm_sendmsg() will use an obsoleted 'last_skb' reference, corrupting the
'head' frag_list and causing the leak.

This patch fixes this issue by properly updating the last allocated skb in
'last_skb'.

Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module")
Reported-and-tested-by: syzbot+6f98de741f7dbbfc4ccb@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=6f98de741f7dbbfc4ccb
Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-11 10:03:08 +01:00
Hayes Wang
a7b8d60b37 r8152: check budget for r8152_poll()
According to the document of napi, there is no rx process when the
budget is 0. Therefore, r8152_poll() has to return 0 directly when the
budget is equal to 0.

Fixes: d2187f8e4454 ("r8152: divide the tx and rx bottom functions")
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-11 09:58:34 +01:00
David S. Miller
904de9858e Merge branch 'sha1105-regressions'
Vladimir Oltean says:

====================
Fixes for SJA1105 DSA FDB regressions

A report by Yanan Yang has prompted an investigation into the sja1105
driver's behavior w.r.t. multicast. The report states that when adding
multicast L2 addresses with "bridge mdb add", only the most recently
added address works - the others seem to be overwritten. This is solved
by patch 3/5 (with patch 2/5 as a dependency for it).

Patches 4/5 and 5/5 fix a series of race conditions introduced during
the same patch set as the bug above, namely this one:
https://patchwork.kernel.org/project/netdevbpf/cover/20211024171757.3753288-1-vladimir.oltean@nxp.com/

Finally, patch 1/5 fixes an issue found ever since the introduction of
multicast forwarding offload in sja1105, which is that the multicast
addresses are visible (with the "self" flag) in "bridge fdb show".
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-11 08:32:30 +01:00
Vladimir Oltean
86899e9e1e net: dsa: sja1105: block FDB accesses that are concurrent with a switch reset
Currently, when we add the first sja1105 port to a bridge with
vlan_filtering 1, then we sometimes see this output:

sja1105 spi2.2: port 4 failed to read back entry for be:79:b4:9e:9e:96 vid 3088: -ENOENT
sja1105 spi2.2: Reset switch and programmed static config. Reason: VLAN filtering
sja1105 spi2.2: port 0 failed to add be:79:b4:9e:9e:96 vid 0 to fdb: -2

It is because sja1105_fdb_add() runs from the dsa_owq which is no longer
serialized with switch resets since it dropped the rtnl_lock() in the
blamed commit.

Either performing the FDB accesses before the reset, or after the reset,
is equally fine, because sja1105_static_fdb_change() backs up those
changes in the static config, but FDB access during reset isn't ok.

Make sja1105_static_config_reload() take the fdb_lock to fix that.

Fixes: 0faf890fc519 ("net: dsa: drop rtnl_lock from dsa_slave_switchdev_event_work")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-11 08:32:30 +01:00
Vladimir Oltean
ea32690daf net: dsa: sja1105: serialize sja1105_port_mcast_flood() with other FDB accesses
sja1105_fdb_add() runs from the dsa_owq, and sja1105_port_mcast_flood()
runs from switchdev_deferred_process_work(). Prior to the blamed commit,
they used to be indirectly serialized through the rtnl_lock(), which
no longer holds true because dsa_owq dropped that.

So, it is now possible that we traverse the static config BLK_IDX_L2_LOOKUP
elements concurrently compared to when we change them, in
sja1105_static_fdb_change(). That is not ideal, since it might result in
data corruption.

Introduce a mutex which serializes accesses to the hardware FDB and to
the static config elements for the L2 Address Lookup table.

I can't find a good reason to add locking around sja1105_fdb_dump().
I'll add it later if needed.

Fixes: 0faf890fc519 ("net: dsa: drop rtnl_lock from dsa_slave_switchdev_event_work")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-11 08:32:30 +01:00
Vladimir Oltean
7cef293b9a net: dsa: sja1105: fix multicast forwarding working only for last added mdb entry
The commit cited in Fixes: did 2 things: it refactored the read-back
polling from sja1105_dynamic_config_read() into a new function,
sja1105_dynamic_config_wait_complete(), and it called that from
sja1105_dynamic_config_write() too.

What is problematic is the refactoring.

The refactored code from sja1105_dynamic_config_poll_valid() works like
the previous one, but the problem is that it uses another packed_buf[]
SPI buffer, and there was code at the end of sja1105_dynamic_config_read()
which was relying on the read-back packed_buf[]:

	/* Don't dereference possibly NULL pointer - maybe caller
	 * only wanted to see whether the entry existed or not.
	 */
	if (entry)
		ops->entry_packing(packed_buf, entry, UNPACK);

After the change, the packed_buf[] that this code sees is no longer the
entry read back from hardware, but the original entry that the caller
passed to the sja1105_dynamic_config_read(), packed into this buffer.

This difference is the most notable with the SJA1105_SEARCH uses from
sja1105pqrs_fdb_add() - used for both fdb and mdb. There, we have logic
added by commit 728db843df88 ("net: dsa: sja1105: ignore the FDB entry
for unknown multicast when adding a new address") to figure out whether
the address we're trying to add matches on any existing hardware entry,
with the exception of the catch-all multicast address.

That logic was broken, because with sja1105_dynamic_config_read() not
working properly, it doesn't return us the entry read back from
hardware, but the entry that we passed to it. And, since for multicast,
a match will always exist, it will tell us that any mdb entry already
exists at index=0 L2 Address Lookup table. It is index=0 because the
caller doesn't know the index - it wants to find it out, and
sja1105_dynamic_config_read() does:

	if (index < 0) { // SJA1105_SEARCH
		/* Avoid copying a signed negative number to an u64 */
		cmd.index = 0; // <- this
		cmd.search = true;
	} else {
		cmd.index = index;
		cmd.search = false;
	}

So, to the caller of sja1105_dynamic_config_read(), the returned info
looks entirely legit, and it will add all mdb entries to FDB index 0.
There, they will always overwrite each other (not to mention,
potentially they can also overwrite a pre-existing bridge fdb entry),
and the user-visible impact will be that only the last mdb entry will be
forwarded as it should. The others won't (will be flooded or dropped,
depending on the egress flood settings).

Fixing is a bit more complicated, and involves either passing the same
packed_buf[] to sja1105_dynamic_config_wait_complete(), or moving all
the extra processing on the packed_buf[] to
sja1105_dynamic_config_wait_complete(). I've opted for the latter,
because it makes sja1105_dynamic_config_wait_complete() a bit more
self-contained.

Fixes: df405910ab9f ("net: dsa: sja1105: wait for dynamic config command completion on writes too")
Reported-by: Yanan Yang <yanan.yang@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-11 08:32:30 +01:00
Vladimir Oltean
c956798062 net: dsa: sja1105: propagate exact error code from sja1105_dynamic_config_poll_valid()
Currently, sja1105_dynamic_config_wait_complete() returns either 0 or
-ETIMEDOUT, because it just looks at the read_poll_timeout() return code.

There will be future changes which move some more checks to
sja1105_dynamic_config_poll_valid(). It is important that we propagate
their exact return code (-ENOENT, -EINVAL), because callers of
sja1105_dynamic_config_read() depend on them.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-11 08:32:30 +01:00
Vladimir Oltean
02c652f546 net: dsa: sja1105: hide all multicast addresses from "bridge fdb show"
Commit 4d9423549501 ("net: dsa: sja1105: offload bridge port flags to
device") has partially hidden some multicast entries from showing up in
the "bridge fdb show" output, but it wasn't enough. Addresses which are
added through "bridge mdb add" still show up. Hide them all.

Fixes: 291d1e72b756 ("net: dsa: sja1105: Add support for FDB and MDB management")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-11 08:32:30 +01:00
Ciprian Regus
32530dba1b net:ethernet:adi:adin1110: Fix forwarding offload
Currently, when a new fdb entry is added (with both ports of the
ADIN2111 bridged), the driver configures the MAC filters for the wrong
port, which results in the forwarding being done by the host, and not
actually hardware offloaded.

The ADIN2111 offloads the forwarding by setting filters on the
destination MAC address of incoming frames. Based on these, they may be
routed to the other port. Thus, if a frame has to be forwarded from port
1 to port 2, the required configuration for the ADDR_FILT_UPRn register
should set the APPLY2PORT1 bit (instead of APPLY2PORT2, as it's
currently the case).

Fixes: bc93e19d088b ("net: ethernet: adi: Add ADIN1110 support")
Signed-off-by: Ciprian Regus <ciprian.regus@analog.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-11 08:30:22 +01:00