71724 Commits

Author SHA1 Message Date
Eyal Birger
ee9a113ab6 xfrm: interface: rename xfrm_interface.c to xfrm_interface_core.c
This change allows adding additional files to the xfrm_interface module.

Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Link: https://lore.kernel.org/r/20221203084659.1837829-2-eyal.birger@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2022-12-05 21:58:27 -08:00
Kees Cook
e329e71013 NFC: nci: Bounds check struct nfc_target arrays
While running under CONFIG_FORTIFY_SOURCE=y, syzkaller reported:

  memcpy: detected field-spanning write (size 129) of single field "target->sensf_res" at net/nfc/nci/ntf.c:260 (size 18)

This appears to be a legitimate lack of bounds checking in
nci_add_new_protocol(). Add the missing checks.

Reported-by: syzbot+210e196cef4711b65139@syzkaller.appspotmail.com
Link: https://lore.kernel.org/lkml/0000000000001c590f05ee7b3ff4@google.com
Fixes: 019c4fbaa790 ("NFC: Add NCI multiple targets support")
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20221202214410.never.693-kees@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-05 17:46:25 -08:00
Sudheer Mogilappagari
7112a04664 ethtool: add netlink based get rss support
Add netlink based support for "ethtool -x <dev> [context x]"
command by implementing ETHTOOL_MSG_RSS_GET netlink message.
This is equivalent to functionality provided via ETHTOOL_GRSSH
in ioctl path. It sends RSS table, hash key and hash function
of an interface to user space.

This patch implements existing functionality available
in ioctl path and enables addition of new RSS context
based parameters in future.

Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Link: https://lore.kernel.org/r/20221202002555.241580-1-sudheer.mogilappagari@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-05 17:25:00 -08:00
Christian Schoenebeck
a31b3cffbd net/9p: fix response size check in p9_check_errors()
Since commit 60ece0833b6c ("net/9p: allocate appropriate reduced message
buffers") it is no longer appropriate to check server's response size
against msize. Check against the previously allocated buffer capacity
instead.

- Omit this size check entirely for zero-copy messages, as those always
  allocate 4k (P9_ZC_HDR_SZ) linear buffers which are not used for actual
  payload and can be much bigger than 4k.

- Replace p9_debug() by pr_err() to make sure this message is always
  printed in case this error is triggered.

- Add 9p message type to error message to ease investigation.

Link: https://lkml.kernel.org/r/e0edec84b1c80119ae937ce854b4f5f6dbe2d08c.1669144861.git.linux_oss@crudebyte.com
Signed-off-by: Christian Schoenebeck <linux_oss@crudebyte.com>
Tested-by: Stefano Stabellini <sstabellini@kernel.org>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
2022-12-06 07:31:18 +09:00
Christian Schoenebeck
8e4c2eee1e net/9p: distinguish zero-copy requests
Add boolean `zc` member to struct p9_fcall to distinguish zero-copy
messages (not using the linear `sdata` buffer for message payload) from
regular messages (which do copy message payload to `sdata` before being
further processed).

This new member is appended to end of structure to avoid inserting huge
padding in generated layout.

Link: https://lkml.kernel.org/r/8f2a5c12a446c3b544da64e0b1550e1fb2d6f972.1669144861.git.linux_oss@crudebyte.com
Signed-off-by: Christian Schoenebeck <linux_oss@crudebyte.com>
Tested-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
2022-12-06 07:30:55 +09:00
David S. Miller
27e521c59e rxrpc io-thread part 3
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAmOIsPkACgkQ+7dXa6fL
 C2v9hA//SghS7kndRPwQ1Rf2PtZrc5+RKxhJaztZ0aTxLcS6HF1DMWBn0gL00h02
 noODj1nr59Ptb6NoJ7PON1clx6ZjVSGjyRiYkGM35v2w5GxFNcABg/bvXxS0egI5
 cThFzP/t4rPw7a/jhiOyXkbLnUqxUptgFMxE0gZ8I/xWt9ZcBdS5DNkIOZzfa8xR
 Lo4zxqRCc2Yoeb1o0PjwNnwxMyGAuh8crH0Wuv0YLSqKwZdTq3mITMeqa9ZyqeQX
 2GAdkZ37ER6zDzFCP9c8eTbJQm8OJFY3NP+1NmTBu9VtZEFOPJZrHUMi6zMsQH4U
 9KgufEwwvxXafpm1EtG5zlm2oJm1T57Hbv0YKN1Qr6HjILi/g/8IBarIu4wOrxjo
 x7os1bgbXMmXVIQpXdHJw2tNPl3gspzEN7ysa5y7VnFj/729YlDOFOV7f7yurfsb
 Tqln9A/mvHbSFkK7bJEGD9J+/OWtifsIYrTtJjd/zbDqK4f5PQo3dRM0ALupLFA/
 D8n7FrHOCZrQyKrXLQeFASSYy4iILtMGXs7oJXbbXOBg5OIQfzbTQDSl+68aQoWZ
 qsiM6bguQDDhjMQCzt12rZ7TGXhL7TDIxMRaI9cEdnJLgJg3M+Bvws9+HLCBjTNU
 Pz6Q8RhCrKbvSb3MaiFrpudGzJQPnme9JIgAKXnfXvyWMRRA244=
 =WjuN
 -----END PGP SIGNATURE-----

Merge tag 'rxrpc-next-20221201-b' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

David Howells says:

====================
rxrpc: Increasing SACK size and moving away from softirq, parts 2 & 3

Here are the second and third parts of patches in the process of moving
rxrpc from doing a lot of its stuff in softirq context to doing it in an
I/O thread in process context and thereby making it easier to support a
larger SACK table.

The full description is in the description for the first part[1] which is
already in net-next.

The second part includes some cleanups, adds some testing and overhauls
some tracing:

 (1) Remove declaration of rxrpc_kernel_call_is_complete() as the
     definition is no longer present.

 (2) Remove the knet() and kproto() macros in favour of using tracepoints.

 (3) Remove handling of duplicate packets from recvmsg.  The input side
     isn't now going to insert overlapping/duplicate packets into the
     recvmsg queue.

 (4) Don't use the rxrpc_conn_parameters struct in the rxrpc_connection or
     rxrpc_bundle structs - rather put the members in directly.

 (5) Extract the abort code from a received abort packet right up front
     rather than doing it in multiple places later.

 (6) Use enums and symbol lists rather than __builtin_return_address() to
     indicate where a tracepoint was triggered for local, peer, conn, call
     and skbuff tracing.

 (7) Add a refcount tracepoint for the rxrpc_bundle struct.

 (8) Implement an in-kernel server for the AFS rxperf testing program to
     talk to (enabled by a Kconfig option).

This is tagged as rxrpc-next-20221201-a.

The third part introduces the I/O thread and switches various bits over to
running there:

 (1) Fix call timers and call and connection workqueues to not hold refs on
     the rxrpc_call and rxrpc_connection structs to thereby avoid messy
     cleanup when the last ref is put in softirq mode.

 (2) Split input.c so that the call packet processing bits are separate
     from the received packet distribution bits.  Call packet processing
     gets bumped over to the call event handler.

 (3) Create a per-local endpoint I/O thread.  Barring some tiny bits that
     still get done in softirq context, all packet reception, processing
     and transmission is done in this thread.  That will allow a load of
     locking to be removed.

 (4) Perform packet processing and error processing from the I/O thread.

 (5) Provide a mechanism to process call event notifications in the I/O
     thread rather than queuing a work item for that call.

 (6) Move data and ACK transmission into the I/O thread.  ACKs can then be
     transmitted at the point they're generated rather than getting
     delegated from softirq context to some process context somewhere.

 (7) Move call and local processor event handling into the I/O thread.

 (8) Move cwnd degradation to after packets have been transmitted so that
     they don't shorten the window too quickly.

A bunch of simplifications can then be done:

 (1) The input_lock is no longer necessary as exclusion is achieved by
     running the code in the I/O thread only.

 (2) Don't need to use sk->sk_receive_queue.lock to guard socket state
     changes as the socket mutex should suffice.

 (3) Don't take spinlocks in RCU callback functions as they get run in
     softirq context and thus need _bh annotations.

 (4) RCU is then no longer needed for the peer's error_targets list.

 (5) Simplify the skbuff handling in the receive path by dropping the ref
     in the basic I/O thread loop and getting an extra ref as and when we
     need to queue the packet for recvmsg or another context.

 (6) Get the peer address earlier in the input process and pass it to the
     users so that we only do it once.

This is tagged as rxrpc-next-20221201-b.

Changes:
========
ver #2)
 - Added a patch to change four assertions into warnings in rxrpc_read()
   and fixed a checker warning from a __user annotation that should have
   been removed..
 - Change a min() to min_t() in rxperf as PAGE_SIZE doesn't seem to match
   type size_t on i386.
 - Three error handling issues in rxrpc_new_incoming_call():
   - If not DATA or not seq #1, should drop the packet, not abort.
   - Fix a goto that went to the wrong place, dropping a non-held lock.
   - Fix an rcu_read_lock that should've been an unlock.

Tested-by: Marc Dionne <marc.dionne@auristor.com>
Tested-by: kafs-testing+fedora36_64checkkafs-build-144@auristor.com
Link: https://lore.kernel.org/r/166794587113.2389296.16484814996876530222.stgit@warthog.procyon.org.uk/ [1]
Link: https://lore.kernel.org/r/166982725699.621383.2358362793992993374.stgit@warthog.procyon.org.uk/ # v1
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-05 10:58:17 +00:00
Leon Romanovsky
f3da86dc2c xfrm: add support to HW update soft and hard limits
Both in RX and TX, the traffic that performs IPsec packet offload
transformation is accounted by HW. It is needed to properly handle
hard limits that require to drop the packet.

It means that XFRM core needs to update internal counters with the one
that accounted by the HW, so new callbacks are introduced in this patch.

In case of soft or hard limit is occurred, the driver should call to
xfrm_state_check_expire() that will perform key rekeying exactly as
done by XFRM core.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-12-05 10:38:31 +01:00
Leon Romanovsky
3c611d40c6 xfrm: speed-up lookup of HW policies
Devices that implement IPsec packet offload mode should offload SA and
policies too. In RX path, it causes to the situation that HW will always
have higher priority over any SW policies.

It means that we don't need to perform any search of inexact policies
and/or priority checks if HW policy was discovered. In such situation,
the HW will catch the packets anyway and HW can still implement inexact
lookups.

In case specific policy is not found, we will continue with packet lookup and
check for existence of HW policies in inexact list.

HW policies are added to the head of SPD to ensure fast lookup, as XFRM
iterates over all policies in the loop.

The same solution of adding HW SAs at the begging of the list is applied
to SA database too. However, we don't need to change lookups as they are
sorted by insertion order and not priority.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-12-05 10:37:33 +01:00
Leon Romanovsky
f8a70afafc xfrm: add TX datapath support for IPsec packet offload mode
In IPsec packet mode, the device is going to encrypt and encapsulate
packets that are associated with offloaded policy. After successful
policy lookup to indicate if packets should be offloaded or not,
the stack forwards packets to the device to do the magic.

Signed-off-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Huy Nguyen <huyn@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-12-05 10:34:49 +01:00
Leon Romanovsky
919e43fad5 xfrm: add an interface to offload policy
Extend netlink interface to add and delete XFRM policy from the device.
This functionality is a first step to implement packet IPsec offload solution.

Signed-off-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-12-05 10:33:13 +01:00
Leon Romanovsky
62f6eca5de xfrm: allow state packet offload mode
Allow users to configure xfrm states with packet offload mode.
The packet mode must be requested both for policy and state, and
such requires us to do not implement fallback.

We explicitly return an error if requested packet mode can't
be configured.

Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-12-05 10:32:44 +01:00
Leon Romanovsky
d14f28b8c1 xfrm: add new packet offload flag
In the next patches, the xfrm core code will be extended to support
new type of offload - packet offload. In that mode, both policy and state
should be specially configured in order to perform whole offloaded data
path.

Full offload takes care of encryption, decryption, encapsulation and
other operations with headers.

As this mode is new for XFRM policy flow, we can "start fresh" with flag
bits and release first and second bit for future use.

Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-12-05 10:30:47 +01:00
Wei Yongjun
b3d72d3135 mac802154: fix missing INIT_LIST_HEAD in ieee802154_if_add()
Kernel fault injection test reports null-ptr-deref as follows:

BUG: kernel NULL pointer dereference, address: 0000000000000008
RIP: 0010:cfg802154_netdev_notifier_call+0x120/0x310 include/linux/list.h:114
Call Trace:
 <TASK>
 raw_notifier_call_chain+0x6d/0xa0 kernel/notifier.c:87
 call_netdevice_notifiers_info+0x6e/0xc0 net/core/dev.c:1944
 unregister_netdevice_many_notify+0x60d/0xcb0 net/core/dev.c:1982
 unregister_netdevice_queue+0x154/0x1a0 net/core/dev.c:10879
 register_netdevice+0x9a8/0xb90 net/core/dev.c:10083
 ieee802154_if_add+0x6ed/0x7e0 net/mac802154/iface.c:659
 ieee802154_register_hw+0x29c/0x330 net/mac802154/main.c:229
 mcr20a_probe+0xaaa/0xcb1 drivers/net/ieee802154/mcr20a.c:1316

ieee802154_if_add() allocates wpan_dev as netdev's private data, but not
init the list in struct wpan_dev. cfg802154_netdev_notifier_call() manage
the list when device register/unregister, and may lead to null-ptr-deref.

Use INIT_LIST_HEAD() on it to initialize it correctly.

Fixes: fcf39e6e88e9 ("ieee802154: add wpan_dev_list")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Alexander Aring <aahringo@redhat.com>

Link: https://lore.kernel.org/r/20221130091705.1831140-1-weiyongjun@huaweicloud.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2022-12-05 09:53:08 +01:00
Eric Dumazet
0a182f8d60 bpf, sockmap: fix race in sock_map_free()
sock_map_free() calls release_sock(sk) without owning a reference
on the socket. This can cause use-after-free as syzbot found [1]

Jakub Sitnicki already took care of a similar issue
in sock_hash_free() in commit 75e68e5bf2c7 ("bpf, sockhash:
Synchronize delete from bucket list on map free")

[1]
refcount_t: decrement hit 0; leaking memory.
WARNING: CPU: 0 PID: 3785 at lib/refcount.c:31 refcount_warn_saturate+0x17c/0x1a0 lib/refcount.c:31
Modules linked in:
CPU: 0 PID: 3785 Comm: kworker/u4:6 Not tainted 6.1.0-rc7-syzkaller-00103-gef4d3ea40565 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
Workqueue: events_unbound bpf_map_free_deferred
RIP: 0010:refcount_warn_saturate+0x17c/0x1a0 lib/refcount.c:31
Code: 68 8b 31 c0 e8 75 71 15 fd 0f 0b e9 64 ff ff ff e8 d9 6e 4e fd c6 05 62 9c 3d 0a 01 48 c7 c7 80 bb 68 8b 31 c0 e8 54 71 15 fd <0f> 0b e9 43 ff ff ff 89 d9 80 e1 07 80 c1 03 38 c1 0f 8c a2 fe ff
RSP: 0018:ffffc9000456fb60 EFLAGS: 00010246
RAX: eae59bab72dcd700 RBX: 0000000000000004 RCX: ffff8880207057c0
RDX: 0000000000000000 RSI: 0000000000000201 RDI: 0000000000000000
RBP: 0000000000000004 R08: ffffffff816fdabd R09: fffff520008adee5
R10: fffff520008adee5 R11: 1ffff920008adee4 R12: 0000000000000004
R13: dffffc0000000000 R14: ffff88807b1c6c00 R15: 1ffff1100f638dcf
FS: 0000000000000000(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b30c30000 CR3: 000000000d08e000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
__refcount_dec include/linux/refcount.h:344 [inline]
refcount_dec include/linux/refcount.h:359 [inline]
__sock_put include/net/sock.h:779 [inline]
tcp_release_cb+0x2d0/0x360 net/ipv4/tcp_output.c:1092
release_sock+0xaf/0x1c0 net/core/sock.c:3468
sock_map_free+0x219/0x2c0 net/core/sock_map.c:356
process_one_work+0x81c/0xd10 kernel/workqueue.c:2289
worker_thread+0xb14/0x1330 kernel/workqueue.c:2436
kthread+0x266/0x300 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Fixes: 7e81a3530206 ("bpf: Sockmap, ensure sock lock held during tear down")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Song Liu <songliubraving@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/r/20221202111640.2745533-1-edumazet@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-12-04 18:53:51 -08:00
Toke Høiland-Jørgensen
578ce69ffd bpf: Add dummy type reference to nf_conn___init to fix type deduplication
The bpf_ct_set_nat_info() kfunc is defined in the nf_nat.ko module, and
takes as a parameter the nf_conn___init struct, which is allocated through
the bpf_xdp_ct_alloc() helper defined in the nf_conntrack.ko module.
However, because kernel modules can't deduplicate BTF types between each
other, and the nf_conn___init struct is not referenced anywhere in vmlinux
BTF, this leads to two distinct BTF IDs for the same type (one in each
module). This confuses the verifier, as described here:

https://lore.kernel.org/all/87leoh372s.fsf@toke.dk/

As a workaround, add an explicit BTF_TYPE_EMIT for the type in
net/filter.c, so the type definition gets included in vmlinux BTF. This
way, both modules can refer to the same type ID (as they both build on top
of vmlinux BTF), and the verifier is no longer confused.

v2:

- Use BTF_TYPE_EMIT (which is a statement so it has to be inside a function
  definition; use xdp_func_proto() for this, since this is mostly
  xdp-related).

Fixes: 820dc0523e05 ("net: netfilter: move bpf_ct_set_nat_info kfunc in nf_nat_bpf.c")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20221201123939.696558-1-toke@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-12-04 18:52:20 -08:00
Heiner Kallweit
d93607082e net: add netdev_sw_irq_coalesce_default_on()
Add a helper for drivers wanting to set SW IRQ coalescing
by default. The related sysfs attributes can be used to
override the default values.

Follow Jakub's suggestion and put this functionality into
net core so that drivers wanting to use software interrupt
coalescing per default don't have to open-code it.

Note that this function needs to be called before the
netdevice is registered.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-03 21:48:36 +00:00
Artem Chernyshev
8948876335 net: dsa: sja1105: Check return value
Return NULL if we got unexpected value from skb_trim_rcsum() in
sja1110_rcv_inband_control_extension()

Fixes: 4913b8ebf8a9 ("net: dsa: add support for the SJA1110 native tagging protocol")
Signed-off-by: Artem Chernyshev <artem.chernyshev@red-soft.ru>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20221201140032.26746-3-artem.chernyshev@red-soft.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-02 20:46:52 -08:00
Artem Chernyshev
d4edb50688 net: dsa: hellcreek: Check return value
Return NULL if we got unexpected value from skb_trim_rcsum()
in hellcreek_rcv()

Fixes: 01ef09caad66 ("net: dsa: Add tag handling for Hirschmann Hellcreek switches")
Signed-off-by: Artem Chernyshev <artem.chernyshev@red-soft.ru>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Link: https://lore.kernel.org/r/20221201140032.26746-2-artem.chernyshev@red-soft.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-02 20:46:52 -08:00
Artem Chernyshev
3d8fdcbf1f net: dsa: ksz: Check return value
Return NULL if we got unexpected value from skb_trim_rcsum()
in ksz_common_rcv()

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: bafe9ba7d908 ("net: dsa: ksz: Factor out common tag code")
Signed-off-by: Artem Chernyshev <artem.chernyshev@red-soft.ru>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20221201140032.26746-1-artem.chernyshev@red-soft.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-02 20:46:51 -08:00
Eric Dumazet
55fb80d518 tcp: use 2-arg optimal variant of kfree_rcu()
kfree_rcu(1-arg) should be avoided as much as possible,
since this is only possible from sleepable contexts,
and incurr extra rcu barriers.

I wish the 1-arg variant of kfree_rcu() would
get a distinct name, like kfree_rcu_slow()
to avoid it being abused.

Fixes: 459837b522f7 ("net/tcp: Disable TCP-MD5 static key on tcp_md5sig_info destruction")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Dmitry Safonov <dima@arista.com>
Link: https://lore.kernel.org/r/20221202052847.2623997-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-02 20:44:45 -08:00
Jakub Kicinski
edd4e25a23 wireless-next patches for v6.2
Third set of patches for v6.2. mt76 has a new driver for mt7996 Wi-Fi 7
 devices and iwlwifi also got initial Wi-Fi 7 support. Otherwise
 smaller features and fixes.
 
 Major changes:
 
 ath10k
 
 * store WLAN firmware version in SMEM image table
 
 mt76
 
 * mt7996: new driver for MediaTek Wi-Fi 7 (802.11be) devices
 
 * mt7986, mt7915: enable Wireless Ethernet Dispatch (WED) offload support
 
 * mt7915: add ack signal support
 
 * mt7915: enable coredump support
 
 * mt7921: remain_on_channel support
 
 * mt7921: channel context support
 
 iwlwifi
 
 * enable Wi-Fi 7 Extremely High Throughput (EHT) PHY capabilities
 
 * 320 MHz channels support
 -----BEGIN PGP SIGNATURE-----
 
 iQFFBAABCgAvFiEEiBjanGPFTz4PRfLobhckVSbrbZsFAmOKcMARHGt2YWxvQGtl
 cm5lbC5vcmcACgkQbhckVSbrbZv3cgf+KjlbxtCZvEIfK+jsd2/VK635ucUdC1d5
 QZB5SCHyVCqTMEsBBw0WCmFdfnqQRQUE9Qe5s0hlwhyrjLP4FQ6/jGTarFvRV43E
 xO8jJd7e4mnVVoQySeKIRfvtYPFKT5GpaDVs4ytfdSs+KYoCE7akMBcvHVO8Fr2M
 MepdqyoJakhRybFUJZMts8W8IsBikv9hdnb2Mr/E32JFLeP6ggs9tKCZKBbpxyXk
 BzfYkDMXffFl95prlmy4rXP223FjvgUuRNWaatseR7S6A/Ik9Xk3B1qv3mtocPZF
 LiTlFtmn3qkgyX5bfm6NRe/2FqgRUYfIrN0XtVw6Sy8WUe1GCf3opA==
 =pkqE
 -----END PGP SIGNATURE-----

Merge tag 'wireless-next-2022-12-02' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next

Kalle Valo says:

====================
wireless-next patches for v6.2

Third set of patches for v6.2. mt76 has a new driver for mt7996 Wi-Fi 7
devices and iwlwifi also got initial Wi-Fi 7 support. Otherwise
smaller features and fixes.

Major changes:

ath10k
 - store WLAN firmware version in SMEM image table

mt76
 - mt7996: new driver for MediaTek Wi-Fi 7 (802.11be) devices
 - mt7986, mt7915: enable Wireless Ethernet Dispatch (WED) offload support
 - mt7915: add ack signal support
 - mt7915: enable coredump support
 - mt7921: remain_on_channel support
 - mt7921: channel context support

iwlwifi
 - enable Wi-Fi 7 Extremely High Throughput (EHT) PHY capabilities
 - 320 MHz channels support

* tag 'wireless-next-2022-12-02' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (144 commits)
  wifi: ath10k: fix QCOM_SMEM dependency
  wifi: mt76: mt7921e: add pci .shutdown() support
  wifi: mt76: mt7915: mmio: fix naming convention
  wifi: mt76: mt7996: add support to configure spatial reuse parameter set
  wifi: mt76: mt7996: enable ack signal support
  wifi: mt76: mt7996: enable use_cts_prot support
  wifi: mt76: mt7915: rely on band_idx of mt76_phy
  wifi: mt76: mt7915: enable per bandwidth power limit support
  wifi: mt76: mt7915: introduce mt7915_get_power_bound()
  mt76: mt7915: Fix PCI device refcount leak in mt7915_pci_init_hif2()
  wifi: mt76: do not send firmware FW_FEATURE_NON_DL region
  wifi: mt76: mt7921: Add missing __packed annotation of struct mt7921_clc
  wifi: mt76: fix coverity overrun-call in mt76_get_txpower()
  wifi: mt76: mt7996: add driver for MediaTek Wi-Fi 7 (802.11be) devices
  wifi: mt76: mt76x0: remove dead code in mt76x0_phy_get_target_power
  wifi: mt76: mt7915: fix band_idx usage
  wifi: mt76: mt7915: enable .sta_set_txpwr support
  wifi: mt76: mt7915: add basedband Txpower info into debugfs
  wifi: mt76: mt7915: add support to configure spatial reuse parameter set
  wifi: mt76: mt7915: add missing MODULE_PARM_DESC
  ...
====================

Link: https://lore.kernel.org/r/20221202214254.D0D3DC433C1@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-02 20:33:30 -08:00
Luiz Augusto von Dentz
b5ca338751 Bluetooth: Fix crash when replugging CSR fake controllers
It seems fake CSR 5.0 clones can cause the suspend notifier to be
registered twice causing the following kernel panic:

[   71.986122] Call Trace:
[   71.986124]  <TASK>
[   71.986125]  blocking_notifier_chain_register+0x33/0x60
[   71.986130]  hci_register_dev+0x316/0x3d0 [bluetooth 99b5497ea3d09708fa1366c1dc03288bf3cca8da]
[   71.986154]  btusb_probe+0x979/0xd85 [btusb e1e0605a4f4c01984a4b9c8ac58c3666ae287477]
[   71.986159]  ? __pm_runtime_set_status+0x1a9/0x300
[   71.986162]  ? ktime_get_mono_fast_ns+0x3e/0x90
[   71.986167]  usb_probe_interface+0xe3/0x2b0
[   71.986171]  really_probe+0xdb/0x380
[   71.986174]  ? pm_runtime_barrier+0x54/0x90
[   71.986177]  __driver_probe_device+0x78/0x170
[   71.986180]  driver_probe_device+0x1f/0x90
[   71.986183]  __device_attach_driver+0x89/0x110
[   71.986186]  ? driver_allows_async_probing+0x70/0x70
[   71.986189]  bus_for_each_drv+0x8c/0xe0
[   71.986192]  __device_attach+0xb2/0x1e0
[   71.986195]  bus_probe_device+0x92/0xb0
[   71.986198]  device_add+0x422/0x9a0
[   71.986201]  ? sysfs_merge_group+0xd4/0x110
[   71.986205]  usb_set_configuration+0x57a/0x820
[   71.986208]  usb_generic_driver_probe+0x4f/0x70
[   71.986211]  usb_probe_device+0x3a/0x110
[   71.986213]  really_probe+0xdb/0x380
[   71.986216]  ? pm_runtime_barrier+0x54/0x90
[   71.986219]  __driver_probe_device+0x78/0x170
[   71.986221]  driver_probe_device+0x1f/0x90
[   71.986224]  __device_attach_driver+0x89/0x110
[   71.986227]  ? driver_allows_async_probing+0x70/0x70
[   71.986230]  bus_for_each_drv+0x8c/0xe0
[   71.986232]  __device_attach+0xb2/0x1e0
[   71.986235]  bus_probe_device+0x92/0xb0
[   71.986237]  device_add+0x422/0x9a0
[   71.986239]  ? _dev_info+0x7d/0x98
[   71.986242]  ? blake2s_update+0x4c/0xc0
[   71.986246]  usb_new_device.cold+0x148/0x36d
[   71.986250]  hub_event+0xa8a/0x1910
[   71.986255]  process_one_work+0x1c4/0x380
[   71.986259]  worker_thread+0x51/0x390
[   71.986262]  ? rescuer_thread+0x3b0/0x3b0
[   71.986264]  kthread+0xdb/0x110
[   71.986266]  ? kthread_complete_and_exit+0x20/0x20
[   71.986268]  ret_from_fork+0x1f/0x30
[   71.986273]  </TASK>
[   71.986274] ---[ end trace 0000000000000000 ]---
[   71.986284] btusb: probe of 2-1.6:1.0 failed with error -17

Link: https://bugzilla.kernel.org/show_bug.cgi?id=216683
Cc: stable@vger.kernel.org
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Tested-by: Leonardo Eugênio <lelgenio@disroot.org>
2022-12-02 13:22:56 -08:00
Chen Zhongjin
2f3957c7eb Bluetooth: Fix not cleanup led when bt_init fails
bt_init() calls bt_leds_init() to register led, but if it fails later,
bt_leds_cleanup() is not called to unregister it.

This can cause panic if the argument "bluetooth-power" in text is freed
and then another led_trigger_register() tries to access it:

BUG: unable to handle page fault for address: ffffffffc06d3bc0
RIP: 0010:strcmp+0xc/0x30
  Call Trace:
    <TASK>
    led_trigger_register+0x10d/0x4f0
    led_trigger_register_simple+0x7d/0x100
    bt_init+0x39/0xf7 [bluetooth]
    do_one_initcall+0xd0/0x4e0

Fixes: e64c97b53bc6 ("Bluetooth: Add combined LED trigger for controller power")
Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-12-02 13:09:31 -08:00
Chethan T N
828cea2b71 Bluetooth: Fix support for Read Local Supported Codecs V2
Handling of Read Local Supported Codecs was broken during the
HCI serialization design change patches.

Fixes: d0b137062b2d ("Bluetooth: hci_sync: Rework init stages")
Signed-off-by: Chethan T N <chethan.tumkur.narayan@intel.com>
Signed-off-by: Kiran K <kiran.k@intel.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-12-02 13:09:31 -08:00
Sungwoo Kim
bcd70260ef Bluetooth: L2CAP: Fix u8 overflow
By keep sending L2CAP_CONF_REQ packets, chan->num_conf_rsp increases
multiple times and eventually it will wrap around the maximum number
(i.e., 255).
This patch prevents this by adding a boundary check with
L2CAP_MAX_CONF_RSP

Btmon log:
Bluetooth monitor ver 5.64
= Note: Linux version 6.1.0-rc2 (x86_64)                               0.264594
= Note: Bluetooth subsystem version 2.22                               0.264636
@ MGMT Open: btmon (privileged) version 1.22                  {0x0001} 0.272191
= New Index: 00:00:00:00:00:00 (Primary,Virtual,hci0)          [hci0] 13.877604
@ RAW Open: 9496 (privileged) version 2.22                   {0x0002} 13.890741
= Open Index: 00:00:00:00:00:00                                [hci0] 13.900426
(...)
> ACL Data RX: Handle 200 flags 0x00 dlen 1033             #32 [hci0] 14.273106
        invalid packet size (12 != 1033)
        08 00 01 00 02 01 04 00 01 10 ff ff              ............
> ACL Data RX: Handle 200 flags 0x00 dlen 1547             #33 [hci0] 14.273561
        invalid packet size (14 != 1547)
        0a 00 01 00 04 01 06 00 40 00 00 00 00 00        ........@.....
> ACL Data RX: Handle 200 flags 0x00 dlen 2061             #34 [hci0] 14.274390
        invalid packet size (16 != 2061)
        0c 00 01 00 04 01 08 00 40 00 00 00 00 00 00 04  ........@.......
> ACL Data RX: Handle 200 flags 0x00 dlen 2061             #35 [hci0] 14.274932
        invalid packet size (16 != 2061)
        0c 00 01 00 04 01 08 00 40 00 00 00 07 00 03 00  ........@.......
= bluetoothd: Bluetooth daemon 5.43                                   14.401828
> ACL Data RX: Handle 200 flags 0x00 dlen 1033             #36 [hci0] 14.275753
        invalid packet size (12 != 1033)
        08 00 01 00 04 01 04 00 40 00 00 00              ........@...

Signed-off-by: Sungwoo Kim <iam@sung-woo.kim>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-12-02 13:09:30 -08:00
Mateusz Jończyk
696bd36221 Bluetooth: silence a dmesg error message in hci_request.c
On kernel 6.1-rcX, I have been getting the following dmesg error message
on every boot, resume from suspend and rfkill unblock of the Bluetooth
device:

	Bluetooth: hci0: HCI_REQ-0xfcf0

After some investigation, it turned out to be caused by
commit dd50a864ffae ("Bluetooth: Delete unreferenced hci_request code")
which modified hci_req_add() in net/bluetooth/hci_request.c to always
print an error message when it is executed. In my case, the function was
executed by msft_set_filter_enable() in net/bluetooth/msft.c, which
provides support for Microsoft vendor opcodes.

As explained by Brian Gix, "the error gets logged because it is using a
deprecated (but still working) mechanism to issue HCI opcodes" [1]. So
this is just a debugging tool to show that a deprecated function is
executed. As such, it should not be included in the mainline kernel.
See for example
commit 771c035372a0 ("deprecate the '__deprecated' attribute warnings entirely and for good")
Additionally, this error message is cryptic and the user is not able to
do anything about it.

[1]
Link: https://lore.kernel.org/lkml/beb8dcdc3aee4c5c833aa382f35995f17e7961a1.camel@intel.com/

Fixes: dd50a864ffae ("Bluetooth: Delete unreferenced hci_request code")
Signed-off-by: Mateusz Jończyk <mat.jonczyk@o2.pl>
Cc: Brian Gix <brian.gix@intel.com>
Cc: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-12-02 13:09:30 -08:00
Wang ShaoBo
7e7df2c10c Bluetooth: hci_conn: add missing hci_dev_put() in iso_listen_bis()
hci_get_route() takes reference, we should use hci_dev_put() to release
it when not need anymore.

Fixes: f764a6c2c1e4 ("Bluetooth: ISO: Add broadcast support")
Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-12-02 13:09:30 -08:00
Wang ShaoBo
747da1308b Bluetooth: 6LoWPAN: add missing hci_dev_put() in get_l2cap_conn()
hci_get_route() takes reference, we should use hci_dev_put() to release
it when not need anymore.

Fixes: 6b8d4a6a0314 ("Bluetooth: 6LoWPAN: Use connected oriented channel instead of fixed one")
Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-12-02 13:09:30 -08:00
Ismael Ferreras Morezuelas
42d7731e3e Bluetooth: btusb: Fix CSR clones again by re-adding ERR_DATA_REPORTING quirk
A patch series by a Qualcomm engineer essentially removed my
quirk/workaround because they thought it was unnecessary.

It wasn't, and it broke everything again:

https://patchwork.kernel.org/project/netdevbpf/list/?series=661703&archive=both&state=*

He argues that the quirk is not necessary because the code should check
if the dongle says if it's supported or not. The problem is that for
these Chinese CSR clones they say that it would work:

= New Index: 00:00:00:00:00:00 (Primary,USB,hci0)
= Open Index: 00:00:00:00:00:00
< HCI Command: Read Local Version Information (0x04|0x0001) plen 0
> HCI Event: Command Complete (0x0e) plen 12
> [hci0] 11.276039
      Read Local Version Information (0x04|0x0001) ncmd 1
        Status: Success (0x00)
        HCI version: Bluetooth 5.0 (0x09) - Revision 2064 (0x0810)
        LMP version: Bluetooth 5.0 (0x09) - Subversion 8978 (0x2312)
        Manufacturer: Cambridge Silicon Radio (10)
...
< HCI Command: Read Local Supported Features (0x04|0x0003) plen 0
> HCI Event: Command Complete (0x0e) plen 68
> [hci0] 11.668030
      Read Local Supported Commands (0x04|0x0002) ncmd 1
        Status: Success (0x00)
        Commands: 163 entries
          ...
          Read Default Erroneous Data Reporting (Octet 18 - Bit 2)
          Write Default Erroneous Data Reporting (Octet 18 - Bit 3)
          ...
...
< HCI Command: Read Default Erroneous Data Reporting (0x03|0x005a) plen 0
= Close Index: 00:1A:7D:DA:71:XX

So bring it back wholesale.

Fixes: 63b1a7dd38bf ("Bluetooth: hci_sync: Remove HCI_QUIRK_BROKEN_ERR_DATA_REPORTING")
Fixes: e168f6900877 ("Bluetooth: btusb: Remove HCI_QUIRK_BROKEN_ERR_DATA_REPORTING for fake CSR")
Fixes: 766ae2422b43 ("Bluetooth: hci_sync: Check LMP feature bit instead of quirk")
Cc: stable@vger.kernel.org
Cc: Zijun Hu <quic_zijuhu@quicinc.com>
Cc: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Cc: Hans de Goede <hdegoede@redhat.com>
Tested-by: Ismael Ferreras Morezuelas <swyterzone@gmail.com>
Signed-off-by: Ismael Ferreras Morezuelas <swyterzone@gmail.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-12-02 13:09:30 -08:00
Dominique Martinet
f15e006b83 9p/xen: do not memcpy header into req->rc
while 'h' is packed and can be assumed to match the request payload,
req->rc is a struct p9_fcall which is not packed and that memcpy
could be wrong.

Fix this by copying each fields individually instead.

Reported-by: Christian Schoenebeck <linux_oss@crudebyte.com>
Reviewed-by: Christian Schoenebeck <linux_oss@crudebyte.com>
Suggested-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Link: https://lkml.kernel.org/r/alpine.DEB.2.22.394.2211211454540.1049131@ubuntu-linux-20-04-desktop
Link: https://lkml.kernel.org/r/20221122001025.119121-1-asmadeus@codewreck.org
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
2022-12-03 00:04:37 +09:00
Schspa Shi
26273ade77 9p: set req refcount to zero to avoid uninitialized usage
When a new request is allocated, the refcount will be zero if it is
reused, but if the request is newly allocated from slab, it is not fully
initialized before being added to idr.

If the p9_read_work got a response before the refcount initiated. It will
use a uninitialized req, which will result in a bad request data struct.

Here is the logs from syzbot.

Corrupted memory at 0xffff88807eade00b [ 0xff 0x07 0x00 0x00 0x00 0x00
0x00 0x00 . . . . . . . . ] (in kfence-#110):
 p9_fcall_fini net/9p/client.c:248 [inline]
 p9_req_put net/9p/client.c:396 [inline]
 p9_req_put+0x208/0x250 net/9p/client.c:390
 p9_client_walk+0x247/0x540 net/9p/client.c:1165
 clone_fid fs/9p/fid.h:21 [inline]
 v9fs_fid_xattr_set+0xe4/0x2b0 fs/9p/xattr.c:118
 v9fs_xattr_set fs/9p/xattr.c:100 [inline]
 v9fs_xattr_handler_set+0x6f/0x120 fs/9p/xattr.c:159
 __vfs_setxattr+0x119/0x180 fs/xattr.c:182
 __vfs_setxattr_noperm+0x129/0x5f0 fs/xattr.c:216
 __vfs_setxattr_locked+0x1d3/0x260 fs/xattr.c:277
 vfs_setxattr+0x143/0x340 fs/xattr.c:309
 setxattr+0x146/0x160 fs/xattr.c:617
 path_setxattr+0x197/0x1c0 fs/xattr.c:636
 __do_sys_setxattr fs/xattr.c:652 [inline]
 __se_sys_setxattr fs/xattr.c:648 [inline]
 __ia32_sys_setxattr+0xc0/0x160 fs/xattr.c:648
 do_syscall_32_irqs_on arch/x86/entry/common.c:112 [inline]
 __do_fast_syscall_32+0x65/0xf0 arch/x86/entry/common.c:178
 do_fast_syscall_32+0x33/0x70 arch/x86/entry/common.c:203
 entry_SYSENTER_compat_after_hwframe+0x70/0x82

Below is a similar scenario, the scenario in the syzbot log looks more
complicated than this one, but this patch can fix it.

     T21124                   p9_read_work
======================== second trans =================================
p9_client_walk
  p9_client_rpc
    p9_client_prepare_req
      p9_tag_alloc
        req = kmem_cache_alloc(p9_req_cache, GFP_NOFS);
        tag = idr_alloc
        << preempted >>
        req->tc.tag = tag;
                            /* req->[refcount/tag] == uninitialized */
                            m->rreq = p9_tag_lookup(m->client, m->rc.tag);
                              /* increments uninitalized refcount */

        refcount_set(&req->refcount, 2);
                            /* cb drops one ref */
                            p9_client_cb(req)
                            /* reader thread drops its ref:
                               request is incorrectly freed */
                            p9_req_put(req)
    /* use after free and ref underflow */
    p9_req_put(req)

To fix it, we can initialize the refcount to zero before add to idr.

Link: https://lkml.kernel.org/r/20221201033310.18589-1-schspa@gmail.com
Cc: stable@vger.kernel.org # 6.0+ due to 6cda12864cb0 ("9p: Drop kref usage")
Fixes: 728356dedeff ("9p: Add refcount to p9_req_t")
Reported-by: syzbot+8f1060e2aaf8ca55220b@syzkaller.appspotmail.com
Signed-off-by: Schspa Shi <schspa@gmail.com>
Reviewed-by: Christian Schoenebeck <linux_oss@crudebyte.com>
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
2022-12-02 23:59:20 +09:00
Christophe JAILLET
31fff92c9c 9p/net: Remove unneeded idr.h #include
The 9p net files don't use IDR or IDA functionalities. So there is no point
in including <linux/idr.h>.
Remove it.

Link: https://lkml.kernel.org/r/9e386018601d7e4a9e5d7da8fc3e9555ebb25c87.1669560387.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Christian Schoenebeck <linux_oss@crudebyte.com>
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
2022-12-02 23:59:20 +09:00
Jiri Pirko
47b438cc27 net: devlink: convert port_list into xarray
Some devlink instances may contain thousands of ports. Storing them in
linked list and looking them up is not scalable. Convert the linked list
into xarray.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-02 10:37:03 +00:00
Sebastian Andrzej Siewior
20d3c1e9b8 hsr: Use a single struct for self_node.
self_node_db is a list_head with one entry of struct hsr_node. The
purpose is to hold the two MAC addresses of the node itself.
It is convenient to recycle the structure. However having a list_head
and fetching always the first entry is not really optimal.

Created a new data strucure contaning the two MAC addresses named
hsr_self_node. Access that structure like an RCU protected pointer so
it can be replaced on the fly without blocking the reader.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-01 20:26:22 -08:00
Sebastian Andrzej Siewior
5c7aa13210 hsr: Synchronize sequence number updates.
hsr_register_frame_out() compares new sequence_nr vs the old one
recorded in hsr_node::seq_out and if the new sequence_nr is higher then
it will be written to hsr_node::seq_out as the new value.

This operation isn't locked so it is possible that two frames with the
same sequence number arrive (via the two slave devices) and are fed to
hsr_register_frame_out() at the same time. Both will pass the check and
update the sequence counter later to the same value. As a result the
content of the same packet is fed into the stack twice.

This was noticed by running ping and observing DUP being reported from
time to time.

Instead of using the hsr_priv::seqnr_lock for the whole receive path (as
it is for sending in the master node) add an additional lock that is only
used for sequence number checks and updates.

Add a per-node lock that is used during sequence number reads and
updates.

Fixes: f421436a591d3 ("net/hsr: Add support for the High-availability Seamless Redundancy protocol (HSRv0)")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-01 20:26:21 -08:00
Sebastian Andrzej Siewior
06afd2c31d hsr: Synchronize sending frames to have always incremented outgoing seq nr.
Sending frames via the hsr (master) device requires a sequence number
which is tracked in hsr_priv::sequence_nr and protected by
hsr_priv::seqnr_lock. Each time a new frame is sent, it will obtain a
new id and then send it via the slave devices.
Each time a packet is sent (via hsr_forward_do()) the sequence number is
checked via hsr_register_frame_out() to ensure that a frame is not
handled twice. This make sense for the receiving side to ensure that the
frame is not injected into the stack twice after it has been received
from both slave ports.

There is no locking to cover the sending path which means the following
scenario is possible:

  CPU0				CPU1
  hsr_dev_xmit(skb1)		hsr_dev_xmit(skb2)
   fill_frame_info()             fill_frame_info()
    hsr_fill_frame_info()         hsr_fill_frame_info()
     handle_std_frame()            handle_std_frame()
      skb1's sequence_nr = 1
                                    skb2's sequence_nr = 2
   hsr_forward_do()              hsr_forward_do()

                                   hsr_register_frame_out(, 2)  // okay, send)

    hsr_register_frame_out(, 1) // stop, lower seq duplicate

Both skbs (or their struct hsr_frame_info) received an unique id.
However since skb2 was sent before skb1, the higher sequence number was
recorded in hsr_register_frame_out() and the late arriving skb1 was
dropped and never sent.

This scenario has been observed in a three node HSR setup, with node1 +
node2 having ping and iperf running in parallel. From time to time ping
reported a missing packet. Based on tracing that missing ping packet did
not leave the system.

It might be possible (didn't check) to drop the sequence number check on
the sending side. But if the higher sequence number leaves on wire
before the lower does and the destination receives them in that order
and it will drop the packet with the lower sequence number and never
inject into the stack.
Therefore it seems the only way is to lock the whole path from obtaining
the sequence number and sending via dev_queue_xmit() and assuming the
packets leave on wire in the same order (and don't get reordered by the
NIC).

Cover the whole path for the master interface from obtaining the ID
until after it has been forwarded via hsr_forward_skb() to ensure the
skbs are sent to the NIC in the order of the assigned sequence numbers.

Fixes: f421436a591d3 ("net/hsr: Add support for the High-availability Seamless Redundancy protocol (HSRv0)")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-01 20:26:21 -08:00
Sebastian Andrzej Siewior
d5c7652eb1 hsr: Disable netpoll.
The hsr device is a software device. Its
net_device_ops::ndo_start_xmit() routine will process the packet and
then pass the resulting skb to dev_queue_xmit().
During processing, hsr acquires a lock with spin_lock_bh()
(hsr_add_node()) which needs to be promoted to the _irq() suffix in
order to avoid a potential deadlock.
Then there are the warnings in dev_queue_xmit() (due to
local_bh_disable() with disabled interrupts) left.

Instead trying to address those (there is qdisc and…) for netpoll sake,
just disable netpoll on hsr.

Disable netpoll on hsr and replace the _irqsave() locking with _bh().

Fixes: f421436a591d3 ("net/hsr: Add support for the High-availability Seamless Redundancy protocol (HSRv0)")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-01 20:26:21 -08:00
Sebastian Andrzej Siewior
0c74d9f79e hsr: Avoid double remove of a node.
Due to the hashed-MAC optimisation one problem become visible:
hsr_handle_sup_frame() walks over the list of available nodes and merges
two node entries into one if based on the information in the supervision
both MAC addresses belong to one node. The list-walk happens on a RCU
protected list and delete operation happens under a lock.

If the supervision arrives on both slave interfaces at the same time
then this delete operation can occur simultaneously on two CPUs. The
result is the first-CPU deletes the from the list and the second CPUs
BUGs while attempting to dereference a poisoned list-entry. This happens
more likely with the optimisation because a new node for the mac_B entry
is created once a packet has been received and removed (merged) once the
supervision frame has been received.

Avoid removing/ cleaning up a hsr_node twice by adding a `removed' field
which is set to true after the removal and checked before the removal.

Fixes: f266a683a4804 ("net/hsr: Better frame dispatch")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-01 20:26:21 -08:00
Sebastian Andrzej Siewior
5aa2820177 hsr: Add a rcu-read lock to hsr_forward_skb().
hsr_forward_skb() a skb and keeps information in an on-stack
hsr_frame_info. hsr_get_node() assigns hsr_frame_info::node_src which is
from a RCU list. This pointer is used later in hsr_forward_do().
I don't see a reason why this pointer can't vanish midway since there is
no guarantee that hsr_forward_skb() is invoked from an RCU read section.

Use rcu_read_lock() to protect hsr_frame_info::node_src from its
assignment until it is no longer used.

Fixes: f266a683a4804 ("net/hsr: Better frame dispatch")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-01 20:26:21 -08:00
Sebastian Andrzej Siewior
e012764ceb Revert "net: hsr: use hlist_head instead of list_head for mac addresses"
The hlist optimisation (which not only uses hlist_head instead of
list_head but also splits hsr_priv::node_db into an array of 256 slots)
does not consider the "node merge":
Upon starting the hsr network (with three nodes) a packet that is
sent from node1 to node3 will also be sent from node1 to node2 and then
forwarded to node3.
As a result node3 will receive 2 packets because it is not able
to filter out the duplicate. Each packet received will create a new
struct hsr_node with macaddress_A only set the MAC address it received
from (the two MAC addesses from node1).
At some point (early in the process) two supervision frames will be
received from node1. They will be processed by hsr_handle_sup_frame()
and one frame will leave early ("Node has already been merged") and does
nothing. The other frame will be merged as portB and have its MAC
address written to macaddress_B and the hsr_node (that was created for
it as macaddress_A) will be removed.
From now on HSR is able to identify a duplicate because both packets
sent from one node will result in the same struct hsr_node because
hsr_get_node() will find the MAC address either on macaddress_A or
macaddress_B.

Things get tricky with the optimisation: If sender's MAC address is
saved as macaddress_A then the lookup will work as usual. If the MAC
address has been merged into macaddress_B of another hsr_node then the
lookup won't work because it is likely that the data structure is in
another bucket. This results in creating a new struct hsr_node and not
recognising a possible duplicate.

A way around it would be to add another hsr_node::mac_list_B and attach
it to the other bucket to ensure that this hsr_node will be looked up
either via macaddress_A _or_ macaddress_B.

I however prefer to revert it because it sounds like an academic problem
rather than real life workload plus it adds complexity. I'm not an HSR
expert with what is usual size of a network but I would guess 40 to 60
nodes. With 10.000 nodes and assuming 60us for pass-through (from node
to node) then it would take almost 600ms for a packet to almost wrap
around which sounds a lot.

Revert the hash MAC addresses optimisation.

Fixes: 4acc45db71158 ("net: hsr: use hlist_head instead of list_head for mac addresses")
Cc: Juhee Kang <claudiajkang@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-01 20:26:20 -08:00
Xin Long
7d802c8098 sctp: delete free member from struct sctp_sched_ops
After commit 9ed7bfc79542 ("sctp: fix memory leak in
sctp_stream_outq_migrate()"), sctp_sched_set_sched() is the only
place calling sched->free(), and it can actually be replaced by
sched->free_sid() on each stream, and yet there's already a loop
to traverse all streams in sctp_sched_set_sched().

This patch adds a function sctp_sched_free_sched() where it calls
sched->free_sid() for each stream to replace sched->free() calls
in sctp_sched_set_sched() and then deletes the unused free member
from struct sctp_sched_ops.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Link: https://lore.kernel.org/r/e10aac150aca2686cb0bd0570299ec716da5a5c0.1669849471.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-01 20:14:23 -08:00
Geliang Tang
f8c9dfbd87 mptcp: add pm listener events
This patch adds two new MPTCP netlink event types for PM listening
socket create and close, named MPTCP_EVENT_LISTENER_CREATED and
MPTCP_EVENT_LISTENER_CLOSED.

Add a new function mptcp_event_pm_listener() to push the new events
with family, port and addr to userspace.

Invoke mptcp_event_pm_listener() with MPTCP_EVENT_LISTENER_CREATED in
mptcp_listen() and mptcp_pm_nl_create_listen_socket(), invoke it with
MPTCP_EVENT_LISTENER_CLOSED in __mptcp_close_ssk().

Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-01 20:06:06 -08:00
Dmitry Safonov
c5b8b515a2 net/tcp: Separate initialization of twsk
Convert BUG_ON() to WARN_ON_ONCE() and warn as well for unlikely
static key int overflow error-path.

Signed-off-by: Dmitry Safonov <dima@arista.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-01 15:53:05 -08:00
Dmitry Safonov
b389d1affc net/tcp: Do cleanup on tcp_md5_key_copy() failure
If the kernel was short on (atomic) memory and failed to allocate it -
don't proceed to creation of request socket. Otherwise the socket would
be unsigned and userspace likely doesn't expect that the TCP is not
MD5-signed anymore.

Signed-off-by: Dmitry Safonov <dima@arista.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-01 15:53:05 -08:00
Dmitry Safonov
459837b522 net/tcp: Disable TCP-MD5 static key on tcp_md5sig_info destruction
To do that, separate two scenarios:
- where it's the first MD5 key on the system, which means that enabling
  of the static key may need to sleep;
- copying of an existing key from a listening socket to the request
  socket upon receiving a signed TCP segment, where static key was
  already enabled (when the key was added to the listening socket).

Now the life-time of the static branch for TCP-MD5 is until:
- last tcp_md5sig_info is destroyed
- last socket in time-wait state with MD5 key is closed.

Which means that after all sockets with TCP-MD5 keys are gone, the
system gets back the performance of disabled md5-key static branch.

While at here, provide static_key_fast_inc() helper that does ref
counter increment in atomic fashion (without grabbing cpus_read_lock()
on CONFIG_JUMP_LABEL=y). This is needed to add a new user for
a static_key when the caller controls the lifetime of another user.

Signed-off-by: Dmitry Safonov <dima@arista.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-01 15:53:05 -08:00
Dmitry Safonov
f62c7517ff net/tcp: Separate tcp_md5sig_info allocation into tcp_md5sig_info_add()
Add a helper to allocate tcp_md5sig_info, that will help later to
do/allocate things when info allocated, once per socket.

Signed-off-by: Dmitry Safonov <dima@arista.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-01 15:53:05 -08:00
Felix Fietkau
94b9b9de05 wifi: mac80211: fix and simplify unencrypted drop check for mesh
ieee80211_drop_unencrypted is called from ieee80211_rx_h_mesh_fwding and
ieee80211_frame_allowed.

Since ieee80211_rx_h_mesh_fwding can forward packets for other mesh nodes
and is called earlier, it needs to check the decryptions status and if the
packet is using the control protocol on its own, instead of deferring to
the later call from ieee80211_frame_allowed.

Because of that, ieee80211_drop_unencrypted has a mesh specific check
that skips over the mesh header in order to check the payload protocol.
This code is invalid when called from ieee80211_frame_allowed, since that
happens after the 802.11->802.3 conversion.

Fix this by moving the mesh specific check directly into
ieee80211_rx_h_mesh_fwding.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20221201135730.19723-1-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-12-01 15:11:11 +01:00
Felix Fietkau
7d360f6061 wifi: mac80211: add support for restricting netdev features per vif
This can be used to selectively disable feature flags for checksum offload,
scatter/gather or GSO by changing vif->netdev_features.
Removing features from vif->netdev_features does not affect the netdev
features themselves, but instead fixes up skbs in the tx path so that the
offloads are not needed in the driver.

Aside from making it easier to deal with vif type based hardware limitations,
this also makes it possible to optimize performance on hardware without native
GSO support by declaring GSO support in hw->netdev_features and removing it
from vif->netdev_features. This allows mac80211 to handle GSO segmentation
after the sta lookup, but before itxq enqueue, thus reducing the number of
unnecessary sta lookups, as well as some other per-packet processing.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20221010094338.78070-1-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-12-01 15:09:10 +01:00
Kieran Frewen
209d70d34a wifi: mac80211: update TIM for S1G specification changes
Updates to the TIM information element to match changes made in the
IEEE Std 802.11ah-2020.

Signed-off-by: Kieran Frewen <kieran.frewen@morsemicro.com>
Co-developed-by: Gilad Itzkovitch <gilad.itzkovitch@morsemicro.com>
Signed-off-by: Gilad Itzkovitch <gilad.itzkovitch@morsemicro.com>
Link: https://lore.kernel.org/r/20221106221602.25714-1-gilad.itzkovitch@morsemicro.com
[use skb_put_data/skb_put_u8]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-12-01 15:09:10 +01:00
Johannes Berg
8950b5988a wifi: mac80211: don't parse multi-BSSID in assoc resp
It's not valid to have the multiple BSSID element in the
association response (per 802.11 REVme D1.0), so don't
try to parse it there, but only in the fallback beacon
elements if needed.

The other case that was parsing association requests was
already changed in a previous commit.

Change-Id: I659d2ef1253e079cc71c46a017044e116e31c024
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-12-01 15:09:10 +01:00