Commit Graph

76640 Commits

Author SHA1 Message Date
Eric Dumazet
026763ece8 ipv6: raw: check sk->sk_rcvbuf earlier
There is no point cloning an skb and having to free the clone
if the receive queue of the raw socket is full.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20240307162943.2523817-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-08 11:39:29 -08:00
Ido Schimmel
5d9b7cb383 nexthop: Simplify dump error handling
The only error that can happen during a nexthop dump is insufficient
space in the skb caring the netlink messages (EMSGSIZE). If this happens
and some messages were already filled in, the nexthop code returns the
skb length to signal the netlink core that more objects need to be
dumped.

After commit b5a899154a ("netlink: handle EMSGSIZE errors in the
core") there is no need to handle this error in the nexthop code as it
is now handled in the core.

Simplify the code and simply return the error to the core.

No regressions in nexthop tests:

 # ./fib_nexthops.sh
 Tests passed: 234
 Tests failed:   0

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240307154727.3555462-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-08 11:39:12 -08:00
Eric Dumazet
1cface552a net: add skb_data_unref() helper
Similar to skb_unref(), add skb_data_unref() to save an expensive
atomic operation (and cache line dirtying) when last reference
on shinfo->dataref is released.

I saw this opportunity on hosts with RAW sockets accidentally
bound to UDP protocol, forcing an skb_clone() on all received packets.

These RAW sockets had their receive queue full, so all clone
packets were immediately dropped.

When UDP recvmsg() consumes later the original skb, skb_release_data()
is hitting atomic_sub_return() quite badly, because skb->clone
has been set permanently.

Note that this patch helps TCP TX performance, because
TCP stack also use (fast) clones.

This means that at least one of the two packets (the main skb or
its clone) will no longer have to perform this atomic operation
in skb_release_data().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240307123446.2302230-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-08 11:38:45 -08:00
Jakub Kicinski
75c2946db3 wireless-next patches for v6.9
The fourth "new features" pull request for v6.9 with changes both in
 stack and in drivers. The theme in this pull request is to fix sparse
 warnings but we still have some left in wireless subsystem. Otherwise
 quite normal.
 
 Major changes:
 
 rtw89
 
 * NL80211_EXT_FEATURE_SCAN_RANDOM_SN support
 
 * NL80211_EXT_FEATURE_SET_SCAN_DWELL support
 
 rtw88
 
 * support for more rtw8811cu and rtw8821cu devices
 
 mt76
 
 * mt76x2u: add Netgear WNDA3100v3 USB
 
 * mt7915: newer ADIE version support
 
 * mt7925: radio temperature sensor support
 
 * mt7996: remove GCMP IGTK offload
 -----BEGIN PGP SIGNATURE-----
 
 iQFFBAABCgAvFiEEiBjanGPFTz4PRfLobhckVSbrbZsFAmXq4hARHGt2YWxvQGtl
 cm5lbC5vcmcACgkQbhckVSbrbZtOawf9Gf2FAi56zA/4vKJPE/mZzRvNodj/u9WL
 mEX3KERw744IEmWY0yXEAyvzKkkNqUUtmdUbbsnXnnEtzsVZ2oRmOZdXsvEW3vOD
 IEsjWY/405MBWyuBttAa6orBSgelr99k86HzoLN86s52HmliVDhr2EUnYIf2O++9
 SVhHFKE4BMVCO6hlyEg419K9M2VhWtBDNYweoXAfn8Y1byAw6Pt6WunjRuGwJG5n
 qvcrZcFCFSa3daPpx0uIA/yiSjZlq0hwVC3r/PnoX/r1FDR8tS2ecvC2rP3MaZJ+
 1x3IcNvwC97D80wvdW+f+qKtV4OXZefsZpzJJpvREH8FbAgYLDef0Q==
 =gln7
 -----END PGP SIGNATURE-----

Merge tag 'wireless-next-2024-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next

Kalle Valo says:

====================
wireless-next patches for v6.9

The fourth "new features" pull request for v6.9 with changes both in
stack and in drivers. The theme in this pull request is to fix sparse
warnings but we still have some left in wireless subsystem. Otherwise
quite normal.

Major changes:

rtw89
 * NL80211_EXT_FEATURE_SCAN_RANDOM_SN support
 * NL80211_EXT_FEATURE_SET_SCAN_DWELL support

rtw88
 * support for more rtw8811cu and rtw8821cu devices

mt76
 * mt76x2u: add Netgear WNDA3100v3 USB
 * mt7915: newer ADIE version support
 * mt7925: radio temperature sensor support
 * mt7996: remove GCMP IGTK offload

* tag 'wireless-next-2024-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (125 commits)
  wifi: rtw89: wow: move release offload packet earlier for WoWLAN mode
  wifi: rtw89: wow: set security engine options for 802.11ax chips only
  wifi: rtw89: update suspend/resume for different generation
  wifi: rtw89: wow: update config mac function with different generation
  wifi: rtw89: update DMA function with different generation
  wifi: rtw89: wow: update WoWLAN status register for different generation
  wifi: rtw89: wow: update WoWLAN reason register for different chips
  wifi: brcm80211: handle pmk_op allocation failure
  wifi: rtw89: coex: Add coexistence policy to decrease WiFi packet CRC-ERR
  wifi: rtw89: coex: When Bluetooth not available don't set power/gain
  wifi: rtw89: coex: add return value to ensure H2C command is success or not
  wifi: rtw89: coex: Reorder H2C command index to align with firmware
  wifi: rtw89: coex: add BTC ctrl_info version 7 and related logic
  wifi: rtw89: coex: add init_info H2C command format version 7
  wifi: rtw89: 8922a: add coexistence helpers of SW grant
  wifi: rtw89: mac: add coexistence helpers {cfg/get}_plt
  wifi: cw1200: restore endian swapping
  wifi: wlcore: sdio: Rate limit wl12xx_sdio_raw_{read,write}() failures warns
  wifi: rtlwifi: Remove rtl_intf_ops.read_efuse_byte
  wifi: rtw88: 8821c: Fix false alarm count
  ...
====================

Link: https://lore.kernel.org/r/20240308100429.B8EA2C433F1@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-08 09:05:49 -08:00
Luiz Augusto von Dentz
3d1c16e920 Bluetooth: hci_sync: Fix UAF in hci_acl_create_conn_sync
This fixes the following error caused by hci_conn being freed while
hcy_acl_create_conn_sync is pending:

==================================================================
BUG: KASAN: slab-use-after-free in hci_acl_create_conn_sync+0xa7/0x2e0
Write of size 2 at addr ffff888002ae0036 by task kworker/u3:0/848

CPU: 0 PID: 848 Comm: kworker/u3:0 Not tainted 6.8.0-rc6-g2ab3e8d67fc1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc38
04/01/2014
Workqueue: hci0 hci_cmd_sync_work
Call Trace:
 <TASK>
 dump_stack_lvl+0x21/0x70
 print_report+0xce/0x620
 ? preempt_count_sub+0x13/0xc0
 ? __virt_addr_valid+0x15f/0x310
 ? hci_acl_create_conn_sync+0xa7/0x2e0
 kasan_report+0xdf/0x110
 ? hci_acl_create_conn_sync+0xa7/0x2e0
 hci_acl_create_conn_sync+0xa7/0x2e0
 ? __pfx_hci_acl_create_conn_sync+0x10/0x10
 ? __pfx_lock_release+0x10/0x10
 ? __pfx_hci_acl_create_conn_sync+0x10/0x10
 hci_cmd_sync_work+0x138/0x1c0
 process_one_work+0x405/0x800
 ? __pfx_lock_acquire+0x10/0x10
 ? __pfx_process_one_work+0x10/0x10
 worker_thread+0x37b/0x670
 ? __pfx_worker_thread+0x10/0x10
 kthread+0x19b/0x1e0
 ? kthread+0xfe/0x1e0
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x2f/0x50
 ? __pfx_kthread+0x10/0x10
 ret_from_fork_asm+0x1a/0x30
 </TASK>

Allocated by task 847:
 kasan_save_stack+0x33/0x60
 kasan_save_track+0x14/0x30
 __kasan_kmalloc+0x8f/0xa0
 hci_conn_add+0xc6/0x970
 hci_connect_acl+0x309/0x410
 pair_device+0x4fb/0x710
 hci_sock_sendmsg+0x933/0xef0
 sock_write_iter+0x2c3/0x2d0
 do_iter_readv_writev+0x21a/0x2e0
 vfs_writev+0x21c/0x7b0
 do_writev+0x14a/0x180
 do_syscall_64+0x77/0x150
 entry_SYSCALL_64_after_hwframe+0x6c/0x74

Freed by task 847:
 kasan_save_stack+0x33/0x60
 kasan_save_track+0x14/0x30
 kasan_save_free_info+0x3b/0x60
 __kasan_slab_free+0xfa/0x150
 kfree+0xcb/0x250
 device_release+0x58/0xf0
 kobject_put+0xbb/0x160
 hci_conn_del+0x281/0x570
 hci_conn_hash_flush+0xfc/0x130
 hci_dev_close_sync+0x336/0x960
 hci_dev_close+0x10e/0x140
 hci_sock_ioctl+0x14a/0x5c0
 sock_ioctl+0x58a/0x5d0
 __x64_sys_ioctl+0x480/0xf60
 do_syscall_64+0x77/0x150
 entry_SYSCALL_64_after_hwframe+0x6c/0x74

Fixes: 45340097ce ("Bluetooth: hci_conn: Only do ACL connections sequentially")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-03-08 11:06:14 -05:00
Frédéric Danis
2ab3e8d67f Bluetooth: Fix eir name length
According to Section 1.2 of Core Specification Supplement Part A the
complete or short name strings are defined as utf8s, which should not
include the trailing NULL for variable length array as defined in Core
Specification Vol1 Part E Section 2.9.3.

Removing the trailing NULL allows PTS to retrieve the random address based
on device name, e.g. for SM/PER/KDU/BV-02-C, SM/PER/KDU/BV-08-C or
GAP/BROB/BCST/BV-03-C.

Fixes: f61851f64b ("Bluetooth: Fix append max 11 bytes of name to scan rsp data")
Signed-off-by: Frédéric Danis <frederic.danis@collabora.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-03-08 10:22:17 -05:00
Eric Dumazet
155549a668 ipv6: remove RTNL protection from inet6_dump_addr()
We can now remove RTNL acquisition while running
inet6_dump_addr(), inet6_dump_ifmcaddr()
and inet6_dump_ifacaddr().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 11:15:36 +00:00
Eric Dumazet
9cc4cc329d ipv6: use xa_array iterator to implement inet6_dump_addr()
inet6_dump_addr() can use the new xa_array iterator
for better scalability.

Make it ready for RCU-only protection.
RTNL use is removed in the following patch.

Also properly return 0 at the end of a dump to avoid
and extra recvmsg() to get NLMSG_DONE.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 11:15:36 +00:00
Eric Dumazet
46f5182dd7 ipv6: make in6_dump_addrs() lockless
in6_dump_addrs() is called with RCU protection.

There is no need holding idev->lock to iterate through unicast addresses.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 11:15:35 +00:00
Eric Dumazet
f0a7da7020 ipv6: make inet6_fill_ifaddr() lockless
Make inet6_fill_ifaddr() lockless, and add approriate annotations
on ifa->tstamp, ifa->valid_lft, ifa->preferred_lft, ifa->ifa_proto
and ifa->rt_priority.

Also constify 2nd argument of inet6_fill_ifaddr(), inet6_fill_ifmcaddr()
and inet6_fill_ifacaddr().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 11:15:35 +00:00
David S. Miller
3dbf6d67f2 ipsec-next-2024-03-06
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEH7ZpcWbFyOOp6OJbrB3Eaf9PW7cFAmXoQdQACgkQrB3Eaf9P
 W7dnTQ//RnTEaOPgTsHzhSwVOfWhsWkHx2xqUAlPNY8W2jrzxGgAIknPzobivvRJ
 U2bYPXDocDHUJAHIELUlu+lzATEz8baBN5zK5a+pPx5hXJlf5UI95linNZ5rEIiV
 RoxLpicnJqtWn1oMZ8d7Y0CknsLR/f4ruiVApzoifk1JaXC/zX8FcqqKsSPwVlqA
 GKy4+f71rNrIE9fbBAqDpmt6RuyRp/5yXPHLBoZlEXfYrYU1JOG8b/HLtGMD0SzV
 yHbDcgRPtbkWgAwNO/zxSDKa+PZr7NbVgakDzyHK+TltpU+6cOsajCaSXHWwsTBB
 +AebDschYY1H49oQe4bwLbNdGY+4lFvXxtk02sa8eM5a104MWxxTEB1QGAEri6gQ
 biAh3xTTbDpls26qkm97iZ6LlDE6pVIzF744buOYedvR8gjjoLt1z1PId05wMYGB
 A/4P6WkM8I1CZL++ODVfT8qR2N6lwFAQ6AM/eqHLvc6QpZ5Hm3lQAdLz1tK6QlCP
 MIV9uuNz8dFPrX1QifmLGojjdedB+4ASglxffOaoqRpHnMgHgzWTOux8tSFpuJGu
 mIYO/Dv5sHMdH8Jm+xXX1549bRzR+KGuqjXPxOSiO1jbOb5VC5ZDd3LVWb7fpDid
 K4eaU4Bo4R3eiCo1Bapt/1jKV1YFuyBKqTvObCDslVuN3Fu9d7I=
 =e4aa
 -----END PGP SIGNATURE-----

Merge tag 'ipsec-next-2024-03-06' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next

Steffen Klassert says:

====================
1) Introduce forwarding of ICMP Error messages. That is specified
   in RFC 4301 but was never implemented. From Antony Antony.

2) Use KMEM_CACHE instead of kmem_cache_create in xfrm6_tunnel_init()
   and xfrm_policy_init(). From Kunwu Chan.

3) Do not allocate stats in the xfrm interface driver, this can be done
   on net core now. From Breno Leitao.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 10:56:05 +00:00
Ido Schimmel
5072ae00ae net: nexthop: Expose nexthop group HW stats to user space
Add netlink support for reading NH group hardware stats.

Stats collection is done through a new notifier,
NEXTHOP_EVENT_HW_STATS_REPORT_DELTA. Drivers that implement HW counters for
a given NH group are thereby asked to collect the stats and report back to
core by calling nh_grp_hw_stats_report_delta(). This is similar to what
netdevice L3 stats do.

Besides exposing number of packets that passed in the HW datapath, also
include information on whether any driver actually realizes the counters.
The core can tell based on whether it got any _report_delta() reports from
the drivers. This allows enabling the statistics at the group at any time,
with drivers opting into supporting them. This is also in line with what
netdevice L3 stats are doing.

So as not to waste time and space, tie the collection and reporting of HW
stats with a new op flag, NHA_OP_FLAG_DUMP_HW_STATS.

Co-developed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Kees Cook <keescook@chromium.org> # For the __counted_by bits
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 10:35:47 +00:00
Ido Schimmel
746c19a52e net: nexthop: Add ability to enable / disable hardware statistics
Add netlink support for enabling collection of HW statistics on nexthop
groups.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 10:35:47 +00:00
Ido Schimmel
5877786fcf net: nexthop: Add hardware statistics notifications
Add hw_stats field to several notifier structures to communicate to the
drivers that HW statistics should be configured for nexthops within a given
group.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 10:35:47 +00:00
Ido Schimmel
95fedd7685 net: nexthop: Expose nexthop group stats to user space
Add netlink support for reading NH group stats.

This data is only for statistics of the traffic in the SW datapath. HW
nexthop group statistics will be added in the following patches.

Emission of the stats is keyed to a new op_stats flag to avoid cluttering
the netlink message with stats if the user doesn't need them:
NHA_OP_FLAG_DUMP_STATS.

Co-developed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 10:35:47 +00:00
Ido Schimmel
f4676ea74b net: nexthop: Add nexthop group entry stats
Add nexthop group entry stats to count the number of packets forwarded
via each nexthop in the group. The stats will be exposed to user space
for better data path observability in the next patch.

The per-CPU stats pointer is placed at the beginning of 'struct
nh_grp_entry', so that all the fields accessed for the data path reside
on the same cache line:

struct nh_grp_entry {
        struct nexthop *           nh;                   /*     0     8 */
        struct nh_grp_entry_stats * stats;               /*     8     8 */
        u8                         weight;               /*    16     1 */

        /* XXX 7 bytes hole, try to pack */

        union {
                struct {
                        atomic_t   upper_bound;          /*    24     4 */
                } hthr;                                  /*    24     4 */
                struct {
                        struct list_head uw_nh_entry;    /*    24    16 */
                        u16        count_buckets;        /*    40     2 */
                        u16        wants_buckets;        /*    42     2 */
                } res;                                   /*    24    24 */
        };                                               /*    24    24 */
        struct list_head           nh_list;              /*    48    16 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        struct nexthop *           nh_parent;            /*    64     8 */

        /* size: 72, cachelines: 2, members: 6 */
        /* sum members: 65, holes: 1, sum holes: 7 */
        /* last cacheline: 8 bytes */
};

Co-developed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 10:35:46 +00:00
Petr Machata
a207eab103 net: nexthop: Add NHA_OP_FLAGS
In order to add per-nexthop statistics, but still not increase netlink
message size for consumers that do not care about them, there needs to be a
toggle through which the user indicates their desire to get the statistics.
To that end, add a new attribute, NHA_OP_FLAGS. The idea is to be able to
use the attribute for carrying of arbitrary operation-specific flags, i.e.
not make it specific for get / dump.

Add the new attribute to get and dump policies, but do not actually allow
any flags yet -- those will come later as the flags themselves are defined.
Add the necessary parsing code.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 10:35:46 +00:00
Petr Machata
2118f9390d net: nexthop: Adjust netlink policy parsing for a new attribute
A following patch will introduce a new attribute, op-specific flags to
adjust the behavior of an operation. Different operations will recognize
different flags.

- To make the differentiation possible, stop sharing the policies for get
  and del operations.

- To allow querying for presence of the attribute, have all the attribute
  arrays sized to NHA_MAX, regardless of what is permitted by policy, and
  pass the corresponding value to nlmsg_parse() as well.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 10:35:46 +00:00
Jakub Kicinski
6025b9135f net: dqs: add NIC stall detector based on BQL
softnet_data->time_squeeze is sometimes used as a proxy for
host overload or indication of scheduling problems. In practice
this statistic is very noisy and has hard to grasp units -
e.g. is 10 squeezes a second to be expected, or high?

Delaying network (NAPI) processing leads to drops on NIC queues
but also RTT bloat, impacting pacing and CA decisions.
Stalls are a little hard to detect on the Rx side, because
there may simply have not been any packets received in given
period of time. Packet timestamps help a little bit, but
again we don't know if packets are stale because we're
not keeping up or because someone (*cough* cgroups)
disabled IRQs for a long time.

We can, however, use Tx as a proxy for Rx stalls. Most drivers
use combined Rx+Tx NAPIs so if Tx gets starved so will Rx.
On the Tx side we know exactly when packets get queued,
and completed, so there is no uncertainty.

This patch adds stall checks to BQL. Why BQL? Because
it's a convenient place to add such checks, already
called by most drivers, and it has copious free space
in its structures (this patch adds no extra cache
references or dirtying to the fast path).

The algorithm takes one parameter - max delay AKA stall
threshold and increments a counter whenever NAPI got delayed
for at least that amount of time. It also records the length
of the longest stall.

To be precise every time NAPI has not polled for at least
stall thrs we check if there were any Tx packets queued
between last NAPI run and now - stall_thrs/2.

Unlike the classic Tx watchdog this mechanism does not
ignore stalls caused by Tx being disabled, or loss of link.
I don't think the check is worth the complexity, and
stall is a stall, whether due to host overload, flow
control, link down... doesn't matter much to the application.

We have been running this detector in production at Meta
for 2 years, with the threshold of 8ms. It's the lowest
value where false positives become rare. There's still
a constant stream of reported stalls (especially without
the ksoftirqd deferral patches reverted), those who like
their stall metrics to be 0 may prefer higher value.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 10:23:26 +00:00
Eric Dumazet
b0ec2abf98 net: ip_tunnel: make sure to pull inner header in ip_tunnel_rcv()
Apply the same fix than ones found in :

8d975c15c0 ("ip6_tunnel: make sure to pull inner header in __ip6_tnl_rcv()")
1ca1ba465e ("geneve: make sure to pull inner header in geneve_rx()")

We have to save skb->network_header in a temporary variable
in order to be able to recompute the network_header pointer
after a pskb_inet_may_pull() call.

pskb_inet_may_pull() makes sure the needed headers are in skb->head.

syzbot reported:
BUG: KMSAN: uninit-value in __INET_ECN_decapsulate include/net/inet_ecn.h:253 [inline]
 BUG: KMSAN: uninit-value in INET_ECN_decapsulate include/net/inet_ecn.h:275 [inline]
 BUG: KMSAN: uninit-value in IP_ECN_decapsulate include/net/inet_ecn.h:302 [inline]
 BUG: KMSAN: uninit-value in ip_tunnel_rcv+0xed9/0x2ed0 net/ipv4/ip_tunnel.c:409
  __INET_ECN_decapsulate include/net/inet_ecn.h:253 [inline]
  INET_ECN_decapsulate include/net/inet_ecn.h:275 [inline]
  IP_ECN_decapsulate include/net/inet_ecn.h:302 [inline]
  ip_tunnel_rcv+0xed9/0x2ed0 net/ipv4/ip_tunnel.c:409
  __ipgre_rcv+0x9bc/0xbc0 net/ipv4/ip_gre.c:389
  ipgre_rcv net/ipv4/ip_gre.c:411 [inline]
  gre_rcv+0x423/0x19f0 net/ipv4/ip_gre.c:447
  gre_rcv+0x2a4/0x390 net/ipv4/gre_demux.c:163
  ip_protocol_deliver_rcu+0x264/0x1300 net/ipv4/ip_input.c:205
  ip_local_deliver_finish+0x2b8/0x440 net/ipv4/ip_input.c:233
  NF_HOOK include/linux/netfilter.h:314 [inline]
  ip_local_deliver+0x21f/0x490 net/ipv4/ip_input.c:254
  dst_input include/net/dst.h:461 [inline]
  ip_rcv_finish net/ipv4/ip_input.c:449 [inline]
  NF_HOOK include/linux/netfilter.h:314 [inline]
  ip_rcv+0x46f/0x760 net/ipv4/ip_input.c:569
  __netif_receive_skb_one_core net/core/dev.c:5534 [inline]
  __netif_receive_skb+0x1a6/0x5a0 net/core/dev.c:5648
  netif_receive_skb_internal net/core/dev.c:5734 [inline]
  netif_receive_skb+0x58/0x660 net/core/dev.c:5793
  tun_rx_batched+0x3ee/0x980 drivers/net/tun.c:1556
  tun_get_user+0x53b9/0x66e0 drivers/net/tun.c:2009
  tun_chr_write_iter+0x3af/0x5d0 drivers/net/tun.c:2055
  call_write_iter include/linux/fs.h:2087 [inline]
  new_sync_write fs/read_write.c:497 [inline]
  vfs_write+0xb6b/0x1520 fs/read_write.c:590
  ksys_write+0x20f/0x4c0 fs/read_write.c:643
  __do_sys_write fs/read_write.c:655 [inline]
  __se_sys_write fs/read_write.c:652 [inline]
  __x64_sys_write+0x93/0xd0 fs/read_write.c:652
  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
  do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x63/0x6b

Uninit was created at:
  __alloc_pages+0x9a6/0xe00 mm/page_alloc.c:4590
  alloc_pages_mpol+0x62b/0x9d0 mm/mempolicy.c:2133
  alloc_pages+0x1be/0x1e0 mm/mempolicy.c:2204
  skb_page_frag_refill+0x2bf/0x7c0 net/core/sock.c:2909
  tun_build_skb drivers/net/tun.c:1686 [inline]
  tun_get_user+0xe0a/0x66e0 drivers/net/tun.c:1826
  tun_chr_write_iter+0x3af/0x5d0 drivers/net/tun.c:2055
  call_write_iter include/linux/fs.h:2087 [inline]
  new_sync_write fs/read_write.c:497 [inline]
  vfs_write+0xb6b/0x1520 fs/read_write.c:590
  ksys_write+0x20f/0x4c0 fs/read_write.c:643
  __do_sys_write fs/read_write.c:655 [inline]
  __se_sys_write fs/read_write.c:652 [inline]
  __x64_sys_write+0x93/0xd0 fs/read_write.c:652
  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
  do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x63/0x6b

Fixes: c544193214 ("GRE: Refactor GRE tunneling code.")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 10:14:15 +00:00
Shiming Cheng
c4386ab4f6 ipv6: fib6_rules: flush route cache when rule is changed
When rule policy is changed, ipv6 socket cache is not refreshed.
The sock's skb still uses a outdated route cache and was sent to
a wrong interface.

To avoid this error we should update fib node's version when
rule is changed. Then skb's route will be reroute checked as
route cache version is already different with fib node version.
The route cache is refreshed to match the latest rule.

Fixes: 101367c2f8 ("[IPV6]: Policy Routing Rules")
Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com>
Signed-off-by: Lena Wang <lena.wang@mediatek.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 10:10:35 +00:00
Jakub Kicinski
92f8b1f5ca netdev: add queue stat for alloc failures
Rx alloc failures are commonly counted by drivers.
Support reporting those via netdev-genl queue stats.

Acked-by: Stanislav Fomichev <sdf@google.com>
Reviewed-by: Amritha Nambiar <amritha.nambiar@intel.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240306195509.1502746-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 21:13:26 -08:00
Jakub Kicinski
ab63a2387c netdev: add per-queue statistics
The ethtool-nl family does a good job exposing various protocol
related and IEEE/IETF statistics which used to get dumped under
ethtool -S, with creative names. Queue stats don't have a netlink
API, yet, and remain a lion's share of ethtool -S output for new
drivers. Not only is that bad because the names differ driver to
driver but it's also bug-prone. Intuitively drivers try to report
only the stats for active queues, but querying ethtool stats
involves multiple system calls, and the number of stats is
read separately from the stats themselves. Worse still when user
space asks for values of the stats, it doesn't inform the kernel
how big the buffer is. If number of stats increases in the meantime
kernel will overflow user buffer.

Add a netlink API for dumping queue stats. Queue information is
exposed via the netdev-genl family, so add the stats there.
Support per-queue and sum-for-device dumps. Latter will be useful
when subsequent patches add more interesting common stats than
just bytes and packets.

The API does not currently distinguish between HW and SW stats.
The expectation is that the source of the stats will either not
matter much (good packets) or be obvious (skb alloc errors).

Acked-by: Stanislav Fomichev <sdf@google.com>
Reviewed-by: Amritha Nambiar <amritha.nambiar@intel.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240306195509.1502746-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 21:13:25 -08:00
Eric Dumazet
ce7f49ab74 net: move rps_sock_flow_table to net_hotdata
rps_sock_flow_table and rps_cpu_mask are used in fast path.

Move them to net_hotdata for better cache locality.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240306160031.874438-19-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 21:12:43 -08:00
Eric Dumazet
490a79faf9 net: introduce include/net/rps.h
Move RPS related structures and helpers from include/linux/netdevice.h
and include/net/sock.h to a new include file.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240306160031.874438-18-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 21:12:43 -08:00
Eric Dumazet
df51b84564 ipv6: move tcp_ipv6_hash_secret and udp_ipv6_hash_secret to net_hotdata
Use a 32bit hole in "struct net_offload" to store
the remaining 32bit secrets used by TCPv6 and UDPv6.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240306160031.874438-17-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 21:12:43 -08:00
Eric Dumazet
5af674bb90 ipv6: move inet6_ehash_secret and udp6_ehash_secret into net_hotdata
"struct inet6_protocol" has a 32bit hole in 32bit arches.

Use it to store the 32bit secret used by UDP and TCP,
to increase cache locality in rx path.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240306160031.874438-16-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 21:12:43 -08:00
Eric Dumazet
6e0735723a inet: move inet_ehash_secret and udp_ehash_secret into net_hotdata
"struct net_protocol" has a 32bit hole in 32bit arches.

Use it to store the 32bit secret used by UDP and TCP,
to increase cache locality in rx path.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240306160031.874438-15-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 21:12:43 -08:00
Eric Dumazet
571bf020be inet: move tcp_protocol and udp_protocol to net_hotdata
These structures are read in rx path, move them to net_hotdata
for better cache locality.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240306160031.874438-14-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 21:12:43 -08:00
Eric Dumazet
4ea0875b9d ipv6: move tcpv6_protocol and udpv6_protocol to net_hotdata
These structures are read in rx path, move them to net_hotdata
for better cache locality.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240306160031.874438-13-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 21:12:43 -08:00
Eric Dumazet
6a55ca6b01 udp: move udpv4_offload and udpv6_offload to net_hotdata
These structures are used in GRO and GSO paths.
Move them to net_hodata for better cache locality.

v2: udpv6_offload definition depends on CONFIG_INET=y

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240306160031.874438-12-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 21:12:43 -08:00
Eric Dumazet
aa70d2d16f net: move skbuff_cache(s) to net_hotdata
skbuff_cache, skbuff_fclone_cache and skb_small_head_cache
are used in rx/tx fast paths.

Move them to net_hotdata for better cache locality.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240306160031.874438-11-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 21:12:42 -08:00
Eric Dumazet
71c0de9bac net: move dev_rx_weight to net_hotdata
dev_rx_weight is read from process_backlog().

Move it to net_hotdata for better cache locality.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240306160031.874438-10-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 21:12:42 -08:00
Eric Dumazet
26722dc74b net: move dev_tx_weight to net_hotdata
dev_tx_weight is used in tx fast path.

Move it to net_hotdata for better cache locality.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240306160031.874438-9-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 21:12:42 -08:00
Eric Dumazet
0139806eeb net: move tcpv4_offload and tcpv6_offload to net_hotdata
These are used in TCP fast paths.

Move them into net_hotdata for better cache locality.

v2: tcpv6_offload definition depends on CONFIG_INET

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240306160031.874438-8-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 21:12:42 -08:00
Eric Dumazet
61a0be1a53 net: move ip_packet_offload and ipv6_packet_offload to net_hotdata
These structures are used in GRO and GSO paths.

v2: ipv6_packet_offload definition depends on CONFIG_INET

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240306160031.874438-7-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 21:12:42 -08:00
Eric Dumazet
edbc666cdc net: move netdev_max_backlog to net_hotdata
netdev_max_backlog is used in rx fat path.

Move it to net_hodata for better cache locality.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240306160031.874438-6-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 21:12:42 -08:00
Eric Dumazet
0b91fa4bfb net: move ptype_all into net_hotdata
ptype_all is used in rx/tx fast paths.

Move it to net_hotdata for better cache locality.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240306160031.874438-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 21:12:41 -08:00
Eric Dumazet
f59b5416c3 net: move netdev_tstamp_prequeue into net_hotdata
netdev_tstamp_prequeue is used in rx path.

Move it to net_hotdata for better cache locality.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240306160031.874438-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 21:12:41 -08:00
Eric Dumazet
ae6e22f7b7 net: move netdev_budget and netdev_budget to net_hotdata
netdev_budget and netdev_budget are used in rx path (net_rx_action())

Move them into net_hotdata for better cache locality.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240306160031.874438-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 21:12:41 -08:00
Eric Dumazet
2658b5a8a4 net: introduce struct net_hotdata
Instead of spreading networking critical fields
all over the places, add a custom net_hotdata
structure so that we can precisely control its layout.

In this first patch, move :

- gro_normal_batch used in rx (GRO stack)
- offload_base used in rx and tx (GRO and TSO stacks)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240306160031.874438-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 21:12:41 -08:00
Jakub Kicinski
c3874bbec9 rxrpc changes
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAmXnrwoACgkQ+7dXa6fL
 C2vtoA//UstImRxievynpzfaFy14vCY+qVgW3n88sDT9ITP/jCQH/hMorIft7B2K
 W7xN15OrWJ7SmfSSiWgZRTPgc0y5J/25woinZk9zSrFyOPfYXEJUy/Rw2VpQQ1eU
 lN1ArHvSzj/wWYZjb5FfzNds2mO9dpSvgP8gsBPaN/49OX4ZDpfwQO0ho20c8qE3
 VEUAoEPSYFd9EUJ4NBvXhn5q0Z1dp/tl3sQZdOFIN3EZnQBloPFFkvBbDTuX2jtK
 Q7F6cgPc/BWfQ+U4OwyPOA3QBrTPMmWCUtM1QCvYV+cj1O7xGaof372Fqin0E7z9
 NVIN4fVk3SIW/LRLhpyHoVOc0jHIona9gAsawfRzSIz7sPdeoXy0oOFHyKWrf3dH
 TdrdnPPkUaGLPbTv2/l6lzz+uCkMlQ6BOZV+y6gm1+OcM7BEvYBeA6RShPKEk0vo
 XsCwn6Pl9y8TZTOpcjal0b+M6mR5Nz0hYzcdCjmBc3lY8ueURPdvaNyLgmAtMu5c
 udsq8yMxaXyc3BblrtTTdV+sqdycTfU6/CoNY44uS4X9H+MsFYBgoba+HSvekUWj
 b/DCE2YfjQXdRmdlfGhyHSjnvOZohd8wq7i9TEOc8PSNXHo0CV/U8Q6pgsRrwSpG
 7DecipkrdFPYqzOl8PRIwWg/VwNGEI5ESzDD9Svs+cODSYJXJjE=
 =Egry
 -----END PGP SIGNATURE-----

Merge tag 'rxrpc-iothread-20240305' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

David Howells says:

====================
Here are some changes to AF_RXRPC:

 (1) Cache the transmission serial number of ACK and DATA packets in the
     rxrpc_txbuf struct and log this in the retransmit tracepoint.

 (2) Don't use atomics on rxrpc_txbuf::flags[*] and cache the intended wire
     header flags there too to avoid duplication.

 (3) Cache the wire checksum in rxrpc_txbuf to make it easier to create
     jumbo packets in future (which will require altering the wire header
     to a jumbo header and restoring it back again for retransmission).

 (4) Fix the protocol names in the wire ACK trailer struct.

 (5) Strip all the barriers and atomics out of the call timer tracking[*].

 (6) Remove atomic handling from call->tx_transmitted and
     call->acks_prev_seq[*].

 (7) Don't bother resetting the DF flag after UDP packet transmission.  To
     change it, we now call directly into UDP code, so it's quick just to
     set it every time.

 (8) Merge together the DF/non-DF branches of the DATA transmission to
     reduce duplication in the code.

 (9) Add a kvec array into rxrpc_txbuf and start moving things over to it.
     This paves the way for using page frags.

(10) Split (sub)packet preparation and timestamping out of the DATA
     transmission function.  This helps pave the way for future jumbo
     packet generation.

(11) In rxkad, don't pick values out of the wire header stored in
     rxrpc_txbuf, buf rather find them elsewhere so we can remove the wire
     header from there.

(12) Move rxrpc_send_ACK() to output.c so that it can be merged with
     rxrpc_send_ack_packet().

(13) Use rxrpc_txbuf::kvec[0] to access the wire header for the packet
     rather than directly accessing the copy in rxrpc_txbuf.  This will
     allow that to be removed to a page frag.

(14) Switch from keeping the transmission buffers in rxrpc_txbuf allocated
     in the slab to allocating them using page fragment allocators.  There
     are separate allocators for DATA packets (which persist for a while)
     and control packets (which are discarded immediately).

     We can then turn on MSG_SPLICE_PAGES when transmitting DATA and ACK
     packets.

     We can also get rid of the RCU cleanup on rxrpc_txbufs, preferring
     instead to release the page frags as soon as possible.

(15) Parse received packets before handling timeouts as the former may
     reset the latter.

(16) Make sure we don't retransmit DATA packets after all the packets have
     been ACK'd.

(17) Differentiate traces for PING ACK transmission.

(18) Switch to keeping timeouts as ktime_t rather than a number of jiffies
     as the latter is too coarse a granularity.  Only set the call timer at
     the end of the call event function from the aggregate of all the
     timeouts, thereby reducing the number of timer calls made.  In future,
     it might be possible to reduce the number of timers from one per call
     to one per I/O thread and to use a high-precision timer.

(19) Record RTT probes after successful transmission rather than recording
     it before and then cancelling it after if unsuccessful[*].  This
     allows a number of calls to get the current time to be removed.

(20) Clean up the resend algorithm as there's now no need to walk the
     transmission buffer under lock[*].  DATA packets can be retransmitted
     as soon as they're found rather than being queued up and transmitted
     when the locked is dropped.

(21) When initially parsing a received ACK packet, extract some of the
     fields from the ack info to the skbuff private data.  This makes it
     easier to do path MTU discovery in the future when the call to which a
     PING RESPONSE ACK refers has been deallocated.

[*] Possible with the move of almost all code from softirq context to the
    I/O thread.

Link: https://lore.kernel.org/r/20240301163807.385573-1-dhowells@redhat.com/ # v1
Link: https://lore.kernel.org/r/20240304084322.705539-1-dhowells@redhat.com/ # v2

* tag 'rxrpc-iothread-20240305' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: (21 commits)
  rxrpc: Extract useful fields from a received ACK to skb priv data
  rxrpc: Clean up the resend algorithm
  rxrpc: Record probes after transmission and reduce number of time-gets
  rxrpc: Use ktimes for call timeout tracking and set the timer lazily
  rxrpc: Differentiate PING ACK transmission traces.
  rxrpc: Don't permit resending after all Tx packets acked
  rxrpc: Parse received packets before dealing with timeouts
  rxrpc: Do zerocopy using MSG_SPLICE_PAGES and page frags
  rxrpc: Use rxrpc_txbuf::kvec[0] instead of rxrpc_txbuf::wire
  rxrpc: Move rxrpc_send_ACK() to output.c with rxrpc_send_ack_packet()
  rxrpc: Don't pick values out of the wire header when setting up security
  rxrpc: Split up the DATA packet transmission function
  rxrpc: Add a kvec[] to the rxrpc_txbuf struct
  rxrpc: Merge together DF/non-DF branches of data Tx function
  rxrpc: Do lazy DF flag resetting
  rxrpc: Remove atomic handling on some fields only used in I/O thread
  rxrpc: Strip barriers and atomics off of timer tracking
  rxrpc: Fix the names of the fields in the ACK trailer struct
  rxrpc: Note cksum in txbuf
  rxrpc: Convert rxrpc_txbuf::flags into a mask and don't use atomics
  ...
====================

Link: https://lore.kernel.org/r/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 20:59:58 -08:00
Eric Dumazet
02e24903e5 netlink: let core handle error cases in dump operations
After commit b5a899154a ("netlink: handle EMSGSIZE errors
in the core"), we can remove some code that was not 100 % correct
anyway.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240306102426.245689-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 20:48:22 -08:00
Christoph Paasch
8edbd39601 mpls: Do not orphan the skb
We observed that TCP-pacing was falling back to the TCP-layer pacing
instead of utilizing sch_fq for the pacing. This causes significant
CPU-usage due to the hrtimer running on a per-TCP-connection basis.

The issue is that mpls_xmit() calls skb_orphan() and thus sets
skb->sk to NULL. Which implies that many of the goodies of TCP won't
work. Pacing falls back to TCP-layer pacing. TCP Small Queues does not
work, ...

It is safe to remove this call to skb_orphan() in mpls_xmit() as there
really is not reason for it to be there. It appears that this call to
skb_orphan comes from the very initial implementation of MPLS.

Cc: Roopa Prabhu <roopa@nvidia.com>
Reported-by: Craig Taylor <cmtaylor@apple.com>
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
Link: https://lore.kernel.org/r/20240306181117.77419-1-cpaasch@apple.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 20:42:13 -08:00
Florian Fainelli
4f6473ad60 net: dsa: Leverage core stats allocator
With commit 34d21de99c ("net: Move {l,t,d}stats allocation to core
and convert veth & vrf"), stats allocation could be done on net core
instead of in this driver.

With this new approach, the driver doesn't have to bother with error
handling (allocation failure checking, making sure free happens in the
right spot, etc). This is core responsibility now.

Remove the allocation in the DSA user network device code and leverage
the network core allocation instead.

Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Breno Leitao <leitao@debian.org>
Link: https://lore.kernel.org/r/20240306200416.2973179-1-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 20:37:13 -08:00
Justin Swartz
7a04ff1277 net: x25: remove dead links from Kconfig
Remove the "You can read more about X.25 at" links provided in
Kconfig as they have not pointed at any relevant pages for quite
a while.

An old copy of https://www.sangoma.com/tutorials/x25/ can be
retrieved via https://archive.org/web/ but nothing useful seems
to have been preserved for http://docwiki.cisco.com/wiki/X.25

For the sake of necromancy and those who really did want to
read more about X.25, a previous incarnation of Kconfig included
a link to:
http://www.cisco.com/univercd/cc/td/doc/product/software/ios11/cbook/cx25.htm

Which can still be read at:
https://web.archive.org/web/20071013101232/http://cisco.com/en/US/docs/ios/11_0/router/configuration/guide/cx25.html

Signed-off-by: Justin Swartz <justin.swartz@risingedge.co.za>
Acked-by: Martin Schiller <ms@dev.tdt.de>
Link: https://lore.kernel.org/r/20240306112659.25375-1-justin.swartz@risingedge.co.za
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 20:24:35 -08:00
Jakub Kicinski
e3afe5dd3a Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

No conflicts.

Adjacent changes:

net/core/page_pool_user.c
  0b11b1c5c3 ("netdev: let netlink core handle -EMSGSIZE errors")
  429679dcf7 ("page_pool: fix netlink dump stop/resume")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 10:29:36 -08:00
Luiz Augusto von Dentz
42ed95de82 Bluetooth: ISO: Align broadcast sync_timeout with connection timeout
This aligns broadcast sync_timeout with existing connection timeouts
which are 20 seconds long.

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-03-07 11:58:17 -05:00
Paolo Abeni
d5b8aff73d netfilter pull request 24-03-07
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEN9lkrMBJgcdVAPub1V2XiooUIOQFAmXpIkoACgkQ1V2XiooU
 IOTEeQ/8DjOKAZW1Tbb6AVNdBI2DxEv3Nl94IkPpTXFOHV2apReuxrZlx0qS/Tda
 FutYLap9SmU0koIgUp3ZDZV8eqk9YlPERmAYog2zB9AHiCyfT5xSUmj9zZCE5l4N
 yFHQ665wl8Iz4TDAoSL75ZRKhOhdaDt4WBThtUkMQHhL+lNDtXSWuQBDtle1q8CF
 Edu0OPlcG6/KMu55XgSXcbvWj6ka9RZjCO5Z5D3ZG6UzNOTCZeb+o8o+K+qKTOyB
 /V5OHWTlEU1D7M6twa8qG6n/ce3sVTh9XoZRAEaqHBMkjwNr/VOeO7Pvb23hVy+j
 ZKATYQsle4gXGJqjwcXXG4K8BWMtR8CiweK85+cBXNPasjJOcGrxy0W04X0vBXWt
 xmJ5ou/0PgYv/0RT/JzN4wJw5MccM7RXXElxNjmZS0zkzEPfMKhqMPbYcAEQaPaF
 CyscYDtVrIeOqHBl/HFqbN0ZwdUIQ4nF57vsUVvn8bsevdJaqWP8VxTtAW0U1ST7
 lPJkmeBiqjDezIHbt3wu2+sdlkrxwgJT3puxyyFP/FA0oiWyTfN7OYlWCUssEKTs
 9MwFL5flgmNnEwvZ8iVqjI/Pcf8W/Rx92eou0YayMuhmfJJZ9NjFERdob7QIvgfP
 XG51rrB1a2y/a08o47URmG5u6219WPBwXO/PYs83M3vahg7FKHc=
 =UZiH
 -----END PGP SIGNATURE-----

Merge tag 'nf-24-03-07' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains fixes for net:

Patch #1 disallows anonymous sets with timeout, except for dynamic sets.
         Anonymous sets with timeouts using the pipapo set backend makes
         no sense from userspace perspective.

Patch #2 rejects constant sets with timeout which has no practical usecase.
         This kind of set, once bound, contains elements that expire but
         no new elements can be added.

Patch #3 restores custom conntrack expectations with NFPROTO_INET,
         from Florian Westphal.

Patch #4 marks rhashtable anonymous set with timeout as dead from the
         commit path to avoid that async GC collects these elements. Rules
         that refers to the anonymous set get released with no mutex held
         from the commit path.

Patch #5 fixes a UBSAN shift overflow in H.323 conntrack helper,
         from Lena Wang.

netfilter pull request 24-03-07

* tag 'nf-24-03-07' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nf_conntrack_h323: Add protection for bmp length out of range
  netfilter: nf_tables: mark set as dead when unbinding anonymous set with timeout
  netfilter: nft_ct: fix l3num expectations with inet pseudo family
  netfilter: nf_tables: reject constant set with timeout
  netfilter: nf_tables: disallow anonymous set with timeout flag
====================

Link: https://lore.kernel.org/r/20240307021545.149386-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-07 11:06:14 +01:00
Jason Xing
d380ce7005 netrom: Fix data-races around sysctl_net_busy_read
We need to protect the reader reading the sysctl value because the
value can be changed concurrently.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-07 10:36:58 +01:00
Jason Xing
bc76645ebd netrom: Fix a data-race around sysctl_netrom_link_fails_count
We need to protect the reader reading the sysctl value because the
value can be changed concurrently.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-07 10:36:58 +01:00
Jason Xing
b5dffcb8f7 netrom: Fix a data-race around sysctl_netrom_routing_control
We need to protect the reader reading the sysctl value because the
value can be changed concurrently.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-07 10:36:58 +01:00
Jason Xing
f99b494b40 netrom: Fix a data-race around sysctl_netrom_transport_no_activity_timeout
We need to protect the reader reading the sysctl value because the
value can be changed concurrently.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-07 10:36:58 +01:00
Jason Xing
a2e7068414 netrom: Fix a data-race around sysctl_netrom_transport_requested_window_size
We need to protect the reader reading the sysctl value because the
value can be changed concurrently.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-07 10:36:58 +01:00
Jason Xing
43547d8699 netrom: Fix a data-race around sysctl_netrom_transport_busy_delay
We need to protect the reader reading the sysctl value because the
value can be changed concurrently.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-07 10:36:58 +01:00
Jason Xing
806f462ba9 netrom: Fix a data-race around sysctl_netrom_transport_acknowledge_delay
We need to protect the reader reading the sysctl value because the
value can be changed concurrently.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-07 10:36:58 +01:00
Jason Xing
e799299aaf netrom: Fix a data-race around sysctl_netrom_transport_maximum_tries
We need to protect the reader reading the sysctl value because the
value can be changed concurrently.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-07 10:36:58 +01:00
Jason Xing
60a7a152ab netrom: Fix a data-race around sysctl_netrom_transport_timeout
We need to protect the reader reading the sysctl value because the
value can be changed concurrently.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-07 10:36:58 +01:00
Jason Xing
119cae5ea3 netrom: Fix data-races around sysctl_netrom_network_ttl_initialiser
We need to protect the reader reading the sysctl value because the
value can be changed concurrently.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-07 10:36:58 +01:00
Jason Xing
cfd9f4a740 netrom: Fix a data-race around sysctl_netrom_obsolescence_count_initialiser
We need to protect the reader reading the sysctl value
because the value can be changed concurrently.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-07 10:36:58 +01:00
Jason Xing
958d6145a6 netrom: Fix a data-race around sysctl_netrom_default_path_quality
We need to protect the reader reading sysctl_netrom_default_path_quality
because the value can be changed concurrently.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-07 10:36:58 +01:00
Jakub Kicinski
811b3f9b2a ipsec-2024-03-06
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEH7ZpcWbFyOOp6OJbrB3Eaf9PW7cFAmXoPHUACgkQrB3Eaf9P
 W7ehmxAAoemzwIDP0wcDi7U68Za7wBC7CbV6WoVmDNRsO+BwnqlCtd7+B9hi0Qd0
 h+KCVYw5EbUJbsHcuefj/QMNO46ueZZLswRIEMKlkZOHdC8TTGzjYmjLnkOHKTCm
 wpJ9QSrnBoy3MUcWbZCJh4BZXsTftbu1fHRWy9GdBERXYfHqWdQCq/ZMAgv3IwLF
 KwZahoGZwCDkmWOpshbBRGj0lnONzZ3mW//bN5EB71rSi33gPEtABYBSw9E9sdMw
 uZg/xRnHMhS5CQHRnFEqVUiqu3wDJYgs3kQIDFhC1T2w94GBF/R+HzzFiBKLzxr1
 Dk17avoNexSYRThJfCk6fMbXT4GVaUSKSG6KI4CRLna/wAIb4QEVDPEdk1ybOxRy
 eoUfo7GXkVqhJpqnOX0Sl3262DnhQ/syhmv3sWXmoSpa630mDuFleVuVJE81dtMu
 jSfaXY7BNpEwTwj8kzabKq5cLkt4T4dAnXf0ao1ATNCzcFkUjSIWN5ylnOEtdVe/
 wEZYp8oc1kPEDU0RC8LzpaEooTQlPceeIAZca7a/lAhnRreGyP9j56Y2oM9YS76o
 JPRrznWAdXVa0gX81zGOjZDHWSyLgpJZ9z5MaAdquP9uhXS4buu+pxxoUcFEfLJd
 FV+yY9j3o3l0HLL6j/y6EZy/QzWPV+K9NAhokNL/WTwuOZ2Lq5Y=
 =FnLp
 -----END PGP SIGNATURE-----

Merge tag 'ipsec-2024-03-06' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec

Steffen Klassert says:

====================
pull request (net): ipsec 2024-03-06

1) Clear the ECN bits flowi4_tos in decode_session4().
   This was already fixed but the bug was reintroduced
   when decode_session4() switched to us the flow dissector.
   From Guillaume Nault.

2) Fix UDP encapsulation in the TX path with packet offload mode.
   From Leon Romanovsky,

3) Avoid clang fortify warning in copy_to_user_tmpl().
   From Nathan Chancellor.

4) Fix inter address family tunnel in packet offload mode.
   From Mike Yu.

* tag 'ipsec-2024-03-06' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
  xfrm: set skb control buffer based on packet offload as well
  xfrm: fix xfrm child route lookup for packet offload
  xfrm: Avoid clang fortify warning in copy_to_user_tmpl()
  xfrm: Pass UDP encapsulation in TX packet offload
  xfrm: Clear low order bits of ->flowi4_tos in decode_session4().
====================

Link: https://lore.kernel.org/r/20240306100438.3953516-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-06 20:55:21 -08:00
Heiner Kallweit
d7933a2c7f ethtool: remove ethtool_eee_use_linkmodes
After 292fac464b ("net: ethtool: eee: Remove legacy _u32 from keee")
this function has no user any longer.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/b4ff9b51-092b-4d44-bfce-c95342a05b51@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-06 20:40:20 -08:00
Geliang Tang
af250c27ea mptcp: drop lookup_by_id in lookup_addr
When the lookup_by_id parameter of __lookup_addr() is true, it's the same
as __lookup_addr_by_id(), it can be replaced by __lookup_addr_by_id()
directly. So drop this parameter, let __lookup_addr() only looks up address
on the local address list by comparing addresses in it, not address ids.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240305-upstream-net-next-20240304-mptcp-misc-cleanup-v1-4-c436ba5e569b@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-06 20:24:10 -08:00
Geliang Tang
a4d68b1602 mptcp: set error messages for set_flags
In addition to returning the error value, this patch also sets an error
messages with GENL_SET_ERR_MSG or NL_SET_ERR_MSG_ATTR both for pm_netlink.c
and pm_userspace.c. It will help the userspace to identify the issue.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240305-upstream-net-next-20240304-mptcp-misc-cleanup-v1-3-c436ba5e569b@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-06 20:24:10 -08:00
Geliang Tang
6a42477fe4 mptcp: update set_flags interfaces
This patch updates set_flags interfaces, make it more similar to the
interfaces of dump_addr and get_addr:

 mptcp_pm_set_flags(struct sk_buff *skb, struct genl_info *info)
 mptcp_pm_nl_set_flags(struct sk_buff *skb, struct genl_info *info)
 mptcp_userspace_pm_set_flags(struct sk_buff *skb, struct genl_info *info)

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240305-upstream-net-next-20240304-mptcp-misc-cleanup-v1-2-c436ba5e569b@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-06 20:24:10 -08:00
Geliang Tang
d5dfbfa2f8 mptcp: drop duplicate header inclusions
The headers net/tcp.h, net/genetlink.h and uapi/linux/mptcp.h are included
in protocol.h already, no need to include them again directly. This patch
removes these duplicate header inclusions.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240305-upstream-net-next-20240304-mptcp-misc-cleanup-v1-1-c436ba5e569b@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-06 20:24:10 -08:00
Lena Wang
767146637e netfilter: nf_conntrack_h323: Add protection for bmp length out of range
UBSAN load reports an exception of BRK#5515 SHIFT_ISSUE:Bitwise shifts
that are out of bounds for their data type.

vmlinux   get_bitmap(b=75) + 712
<net/netfilter/nf_conntrack_h323_asn1.c:0>
vmlinux   decode_seq(bs=0xFFFFFFD008037000, f=0xFFFFFFD008037018, level=134443100) + 1956
<net/netfilter/nf_conntrack_h323_asn1.c:592>
vmlinux   decode_choice(base=0xFFFFFFD0080370F0, level=23843636) + 1216
<net/netfilter/nf_conntrack_h323_asn1.c:814>
vmlinux   decode_seq(f=0xFFFFFFD0080371A8, level=134443500) + 812
<net/netfilter/nf_conntrack_h323_asn1.c:576>
vmlinux   decode_choice(base=0xFFFFFFD008037280, level=0) + 1216
<net/netfilter/nf_conntrack_h323_asn1.c:814>
vmlinux   DecodeRasMessage() + 304
<net/netfilter/nf_conntrack_h323_asn1.c:833>
vmlinux   ras_help() + 684
<net/netfilter/nf_conntrack_h323_main.c:1728>
vmlinux   nf_confirm() + 188
<net/netfilter/nf_conntrack_proto.c:137>

Due to abnormal data in skb->data, the extension bitmap length
exceeds 32 when decoding ras message then uses the length to make
a shift operation. It will change into negative after several loop.
UBSAN load could detect a negative shift as an undefined behaviour
and reports exception.
So we add the protection to avoid the length exceeding 32. Or else
it will return out of range error and stop decoding.

Fixes: 5e35941d99 ("[NETFILTER]: Add H.323 conntrack/NAT helper")
Signed-off-by: Lena Wang <lena.wang@mediatek.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-03-07 03:10:35 +01:00
Pablo Neira Ayuso
552705a365 netfilter: nf_tables: mark set as dead when unbinding anonymous set with timeout
While the rhashtable set gc runs asynchronously, a race allows it to
collect elements from anonymous sets with timeouts while it is being
released from the commit path.

Mingi Cho originally reported this issue in a different path in 6.1.x
with a pipapo set with low timeouts which is not possible upstream since
7395dfacff ("netfilter: nf_tables: use timestamp to check for set
element timeout").

Fix this by setting on the dead flag for anonymous sets to skip async gc
in this case.

According to 08e4c8c591 ("netfilter: nf_tables: mark newset as dead on
transaction abort"), Florian plans to accelerate abort path by releasing
objects via workqueue, therefore, this sets on the dead flag for abort
path too.

Cc: stable@vger.kernel.org
Fixes: 5f68718b34 ("netfilter: nf_tables: GC transaction API to avoid race with control plane")
Reported-by: Mingi Cho <mgcho.minic@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-03-07 03:08:26 +01:00
Florian Westphal
9999378996 netfilter: nft_ct: fix l3num expectations with inet pseudo family
Following is rejected but should be allowed:

table inet t {
        ct expectation exp1 {
                [..]
                l3proto ip

Valid combos are:
table ip t, l3proto ip
table ip6 t, l3proto ip6
table inet t, l3proto ip OR l3proto ip6

Disallow inet pseudeo family, the l3num must be a on-wire protocol known
to conntrack.

Retain NFPROTO_INET case to make it clear its rejected
intentionally rather as oversight.

Fixes: 8059918a13 ("netfilter: nft_ct: sanitize layer 3 and 4 protocol number in custom expectations")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-03-07 00:12:34 +01:00
Pablo Neira Ayuso
5f4fc4bd5c netfilter: nf_tables: reject constant set with timeout
This set combination is weird: it allows for elements to be
added/deleted, but once bound to the rule it cannot be updated anymore.
Eventually, all elements expire, leading to an empty set which cannot
be updated anymore. Reject this flags combination.

Cc: stable@vger.kernel.org
Fixes: 761da2935d ("netfilter: nf_tables: add set timeout API support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-03-07 00:02:42 +01:00
Pablo Neira Ayuso
16603605b6 netfilter: nf_tables: disallow anonymous set with timeout flag
Anonymous sets are never used with timeout from userspace, reject this.
Exception to this rule is NFT_SET_EVAL to ensure legacy meters still work.

Cc: stable@vger.kernel.org
Fixes: 761da2935d ("netfilter: nf_tables: add set timeout API support")
Reported-by: lonial con <kongln9170@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-03-07 00:02:42 +01:00
Vinicius Peixoto
48201a3b3f Bluetooth: Add new quirk for broken read key length on ATS2851
The ATS2851 controller erroneously reports support for the "Read
Encryption Key Length" HCI command. This makes it unable to connect
to any devices, since this command is issued by the kernel during the
connection process in response to an "Encryption Change" HCI event.

Add a new quirk (HCI_QUIRK_BROKEN_ENC_KEY_SIZE) to hint that the command
is unsupported, preventing it from interrupting the connection process.

This is the error log from btmon before this patch:

> HCI Event: Encryption Change (0x08) plen 4
        Status: Success (0x00)
        Handle: 2048 Address: ...
        Encryption: Enabled with E0 (0x01)
< HCI Command: Read Encryption Key Size (0x05|0x0008) plen 2
        Handle: 2048 Address: ...
> HCI Event: Command Status (0x0f) plen 4
      Read Encryption Key Size (0x05|0x0008) ncmd 1
        Status: Unknown HCI Command (0x01)

Signed-off-by: Vinicius Peixoto <nukelet64@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-03-06 17:27:14 -05:00
Roman Smirnov
a310d74dce Bluetooth: mgmt: remove NULL check in add_ext_adv_params_complete()
Remove the cmd pointer NULL check in add_ext_adv_params_complete()
because it occurs earlier in add_ext_adv_params(). This check is
also unnecessary because the pointer is dereferenced just before it.

Found by Linux Verification Center (linuxtesting.org) with Svace.

Signed-off-by: Roman Smirnov <r.smirnov@omp.ru>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-03-06 17:27:13 -05:00
Roman Smirnov
3237da12a3 Bluetooth: mgmt: remove NULL check in mgmt_set_connectable_complete()
Remove the cmd pointer NULL check in mgmt_set_connectable_complete()
because it occurs earlier in set_connectable(). This check is also
unnecessary because the pointer is dereferenced just before it.

Found by Linux Verification Center (linuxtesting.org) with Svace.

Signed-off-by: Roman Smirnov <r.smirnov@omp.ru>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-03-06 17:27:12 -05:00
Dan Carpenter
18d88f0fd8 Bluetooth: ISO: Clean up returns values in iso_connect_ind()
This function either returns 0 or HCI_LM_ACCEPT.  Make it clearer which
returns are which and delete the "lm" variable because it is no longer
required.

Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-03-06 17:27:09 -05:00
Pauli Virtanen
947ec0d002 Bluetooth: fix use-after-free in accessing skb after sending it
hci_send_cmd_sync first sends skb and then tries to clone it.  However,
the driver may have already freed the skb at that point.

Fix by cloning the sent_cmd cloned just above, instead of the original.

Log:
================================================================
BUG: KASAN: slab-use-after-free in __copy_skb_header+0x1a/0x240
...
Call Trace: ..
 __skb_clone+0x59/0x2c0
 hci_cmd_work+0x3b3/0x3d0 [bluetooth]
 process_one_work+0x459/0x900
...
Allocated by task 129: ...
 __alloc_skb+0x1ae/0x220
 __hci_cmd_sync_sk+0x44c/0x7a0 [bluetooth]
 __hci_cmd_sync_status+0x24/0xb0 [bluetooth]
 set_cig_params_sync+0x778/0x7d0 [bluetooth]
...
Freed by task 0: ...
 kmem_cache_free+0x157/0x3c0
 __usb_hcd_giveback_urb+0x11e/0x1e0
 usb_giveback_urb_bh+0x1ad/0x2a0
 tasklet_action_common.isra.0+0x259/0x4a0
 __do_softirq+0x15b/0x5a7
================================================================

Fixes: 2615fd9a7c ("Bluetooth: hci_sync: Fix overwriting request callback")
Signed-off-by: Pauli Virtanen <pav@iki.fi>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-03-06 17:26:58 -05:00
Luiz Augusto von Dentz
f7b94bdc1e Bluetooth: af_bluetooth: Fix deadlock
Attemting to do sock_lock on .recvmsg may cause a deadlock as shown
bellow, so instead of using sock_sock this uses sk_receive_queue.lock
on bt_sock_ioctl to avoid the UAF:

INFO: task kworker/u9:1:121 blocked for more than 30 seconds.
      Not tainted 6.7.6-lemon #183
Workqueue: hci0 hci_rx_work
Call Trace:
 <TASK>
 __schedule+0x37d/0xa00
 schedule+0x32/0xe0
 __lock_sock+0x68/0xa0
 ? __pfx_autoremove_wake_function+0x10/0x10
 lock_sock_nested+0x43/0x50
 l2cap_sock_recv_cb+0x21/0xa0
 l2cap_recv_frame+0x55b/0x30a0
 ? psi_task_switch+0xeb/0x270
 ? finish_task_switch.isra.0+0x93/0x2a0
 hci_rx_work+0x33a/0x3f0
 process_one_work+0x13a/0x2f0
 worker_thread+0x2f0/0x410
 ? __pfx_worker_thread+0x10/0x10
 kthread+0xe0/0x110
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x2c/0x50
 ? __pfx_kthread+0x10/0x10
 ret_from_fork_asm+0x1b/0x30
 </TASK>

Fixes: 2e07e8348e ("Bluetooth: af_bluetooth: Fix Use-After-Free in bt_sock_recvmsg")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-03-06 17:26:25 -05:00
Luiz Augusto von Dentz
0f0639b4d6 Bluetooth: bnep: Fix out-of-bound access
This fixes attempting to access past ethhdr.h_source, although it seems
intentional to copy also the contents of h_proto this triggers
out-of-bound access problems with the likes of static analyzer, so this
instead just copy ETH_ALEN and then proceed to use put_unaligned to copy
h_proto separetely.

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-03-06 17:26:24 -05:00
Luiz Augusto von Dentz
a6e06258f4 Bluetooth: msft: Fix memory leak
Fix leaking buffer allocated to send MSFT_OP_LE_MONITOR_ADVERTISEMENT.

Fixes: 9e14606d8f ("Bluetooth: msft: Extended monitor tracking by address filter")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-03-06 17:26:23 -05:00
Luiz Augusto von Dentz
81137162bf Bluetooth: hci_core: Fix possible buffer overflow
struct hci_dev_info has a fixed size name[8] field so in the event that
hdev->name is bigger than that strcpy would attempt to write past its
size, so this fixes this problem by switching to use strscpy.

Fixes: dcda165706 ("Bluetooth: hci_core: Fix build warnings")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-03-06 17:26:22 -05:00
Luiz Augusto von Dentz
2615fd9a7c Bluetooth: hci_sync: Fix overwriting request callback
In a few cases the stack may generate commands as responses to events
which would happen to overwrite the sent_cmd, so this attempts to store
the request in req_skb so even if sent_cmd is replaced with a new
command the pending request will remain in stored in req_skb.

Fixes: 6a98e3836f ("Bluetooth: Add helper for serialized HCI command execution")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-03-06 17:26:20 -05:00
Luiz Augusto von Dentz
22cbf4f84c Bluetooth: hci_sync: Use QoS to determine which PHY to scan
This used the hci_conn QoS to determine which PHY to scan when creating
a PA Sync.

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-03-06 17:26:20 -05:00
Luiz Augusto von Dentz
bba71ef13b Bluetooth: hci_sync: Use address filtering when HCI_PA_SYNC is set
If HCI_PA_SYNC flag is set it means there is a Periodic Advertising
Synchronization pending, so this attempts to locate the address passed
to HCI_OP_LE_PA_CREATE_SYNC and program it in the accept list so only
reports with that address are processed.

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-03-06 17:26:19 -05:00
Iulia Tanasescu
168d9bf9c7 Bluetooth: ISO: Reassemble PA data for bcast sink
This adds support to reassemble PA data for a Broadcast Sink
listening socket. This is needed in case the BASE is received
fragmented in multiple PA reports.

PA data is first reassembled inside the hcon, before the BASE
is extracted and stored inside the socket. The length of the
le_per_adv_data hcon array has been raised to 1650, to accommodate
the maximum PA data length that can come fragmented, according to
spec.

Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-03-06 17:26:19 -05:00
Iulia Tanasescu
02171da6e8 Bluetooth: ISO: Add hcon for listening bis sk
This creates a hcon instance at bis listen, before the PA sync
procedure is started.

Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-03-06 17:26:18 -05:00
Luiz Augusto von Dentz
f7cbce60a3 Bluetooth: hci_sync: Fix UAF on create_le_conn_complete
While waiting for hci_dev_lock the hci_conn object may be cleanup
causing the following trace:

BUG: KASAN: slab-use-after-free in hci_connect_le_scan_cleanup+0x29/0x350
Read of size 8 at addr ffff888001a50a30 by task kworker/u3:1/111

CPU: 0 PID: 111 Comm: kworker/u3:1 Not tainted
6.8.0-rc2-00701-g8179b15ab3fd-dirty #6418
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc38
04/01/2014
Workqueue: hci0 hci_cmd_sync_work
Call Trace:
 <TASK>
 dump_stack_lvl+0x21/0x70
 print_report+0xce/0x620
 ? preempt_count_sub+0x13/0xc0
 ? __virt_addr_valid+0x15f/0x310
 ? hci_connect_le_scan_cleanup+0x29/0x350
 kasan_report+0xdf/0x110
 ? hci_connect_le_scan_cleanup+0x29/0x350
 hci_connect_le_scan_cleanup+0x29/0x350
 create_le_conn_complete+0x25c/0x2c0

Fixes: 881559af5f ("Bluetooth: hci_sync: Attempt to dequeue connection attempt")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-03-06 17:25:21 -05:00
Luiz Augusto von Dentz
7453847fb2 Bluetooth: hci_sync: Fix UAF on hci_abort_conn_sync
Fixes the following trace where hci_acl_create_conn_sync attempts to
call hci_abort_conn_sync after timeout:

BUG: KASAN: slab-use-after-free in hci_abort_conn_sync
(net/bluetooth/hci_sync.c:5439)
Read of size 2 at addr ffff88800322c032 by task kworker/u3:2/36

Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc38
04/01/2014
Workqueue: hci0 hci_cmd_sync_work
Call Trace:
<TASK>
dump_stack_lvl (./arch/x86/include/asm/irqflags.h:26
./arch/x86/include/asm/irqflags.h:67 ./arch/x86/include/asm/irqflags.h:127
lib/dump_stack.c:107)
print_report (mm/kasan/report.c:378 mm/kasan/report.c:488)
? preempt_count_sub (kernel/sched/core.c:5889)
? __virt_addr_valid (./arch/x86/include/asm/preempt.h:103 (discriminator 1)
./include/linux/rcupdate.h:865 (discriminator 1)
./include/linux/mmzone.h:2026 (discriminator 1)
arch/x86/mm/physaddr.c:65 (discriminator 1))
? hci_abort_conn_sync (net/bluetooth/hci_sync.c:5439)
kasan_report (mm/kasan/report.c:603)
? hci_abort_conn_sync (net/bluetooth/hci_sync.c:5439)
hci_abort_conn_sync (net/bluetooth/hci_sync.c:5439)
? __pfx_hci_abort_conn_sync (net/bluetooth/hci_sync.c:5433)
hci_acl_create_conn_sync (net/bluetooth/hci_sync.c:6681)

Fixes: 45340097ce ("Bluetooth: hci_conn: Only do ACL connections sequentially")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-03-06 17:24:28 -05:00
Ricardo B. Marliere
412b894a18 Bluetooth: constify the struct device_type usage
Since commit aed65af1cc ("drivers: make device_type const"), the driver
core can properly handle constant struct device_type. Move the bt_type and
bnep_type variables to be constant structures as well, placing it into
read-only memory which can not be modified at runtime.

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-03-06 17:24:07 -05:00
Luiz Augusto von Dentz
881559af5f Bluetooth: hci_sync: Attempt to dequeue connection attempt
If connection is still queued/pending in the cmd_sync queue it means no
command has been generated and it should be safe to just dequeue the
callback when it is being aborted.

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-03-06 17:24:06 -05:00
Luiz Augusto von Dentz
505ea2b295 Bluetooth: hci_sync: Add helper functions to manipulate cmd_sync queue
This adds functions to queue, dequeue and lookup into the cmd_sync
list.

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-03-06 17:24:05 -05:00
Luiz Augusto von Dentz
5f641f03ab Bluetooth: hci_conn: Fix UAF Write in __hci_acl_create_connection_sync
This fixes the UAF on __hci_acl_create_connection_sync caused by
connection abortion, it uses the same logic as to LE_LINK which uses
hci_cmd_sync_cancel to prevent the callback to run if the connection is
abort prematurely.

Reported-by: syzbot+3f0a39be7a2035700868@syzkaller.appspotmail.com
Fixes: 45340097ce ("Bluetooth: hci_conn: Only do ACL connections sequentially")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-03-06 17:23:52 -05:00
Luiz Augusto von Dentz
bf98feea5b Bluetooth: hci_conn: Always use sk_timeo as conn_timeout
This aligns the use socket sk_timeo as conn_timeout when initiating a
connection and then use it when scheduling the resulting HCI command,
that way the command is actually aborted synchronously thus not
blocking commands generated by hci_abort_conn_sync to inform the
controller the connection is to be aborted.

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-03-06 17:22:41 -05:00
Lukas Bulwahn
f4b0c2b4cd Bluetooth: hci_event: Remove code to removed CONFIG_BT_HS
Commit cec9f3c5561d ("Bluetooth: Remove BT_HS") removes config BT_HS, but
misses two "ifdef BT_HS" blocks in hci_event.c.

Remove this dead code from this removed config option.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-03-06 17:22:41 -05:00
Jonas Dreßler
4aa42119d9 Bluetooth: Remove pending ACL connection attempts
With the last commit we moved to using the hci_sync queue for "Create
Connection" requests, removing the need for retrying the paging after
finished/failed "Create Connection" requests and after the end of
inquiries.

hci_conn_check_pending() was used to trigger this retry, we can remove it
now.

Note that we can also remove the special handling for COMMAND_DISALLOWED
errors in the completion handler of "Create Connection", because "Create
Connection" requests are now always serialized.

This is somewhat reverting commit 4c67bc74f0 ("[Bluetooth] Support
concurrent connect requests").

With this, the BT_CONNECT2 state of ACL hci_conn objects should now be
back to meaning only one thing: That we received a "Connection Request"
from another device (see hci_conn_request_evt), but the response to that
is going to be deferred.

Signed-off-by: Jonas Dreßler <verdre@v0yd.nl>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-03-06 17:22:40 -05:00
Jonas Dreßler
45340097ce Bluetooth: hci_conn: Only do ACL connections sequentially
Pretty much all bluetooth chipsets only support paging a single device at
a time, and if they don't reject a secondary "Create Connection" request
while another is still ongoing, they'll most likely serialize those
requests in the firware.

With commit 4c67bc74f0 ("[Bluetooth] Support concurrent connect
requests") we started adding some serialization of our own in case the
adapter returns "Command Disallowed" HCI error.

This commit was using the BT_CONNECT2 state for the serialization, this
state is also used for a few more things (most notably to indicate we're
waiting for an inquiry to cancel) and therefore a bit unreliable. Also
not all BT firwares would respond with "Command Disallowed" on too many
connection requests, some will also respond with "Hardware Failure"
(BCM4378), and others will error out later and send a "Connect Complete"
event with error "Rejected Limited Resources" (Marvell 88W8897).

We can clean things up a bit and also make the serialization more reliable
by using our hci_sync machinery to always do "Create Connection" requests
in a sequential manner.

This is very similar to what we're already doing for establishing LE
connections, and it works well there.

Note that this causes a test failure in mgmt-tester (test "Pair Device
- Power off 1") because the hci_abort_conn_sync() changes the error we
return on timeout of the "Create Connection". We'll fix this on the
mgmt-tester side by adjusting the expected error for the test.

Signed-off-by: Jonas Dreßler <verdre@v0yd.nl>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-03-06 17:22:40 -05:00
Luiz Augusto von Dentz
eeda1bf97b Bluetooth: hci_event: Fix not indicating new connection for BIG Sync
BIG Sync (aka. Broadcast sink) requires to inform that the device is
connected when a data path is active otherwise userspace could attempt
to free resources allocated to the device object while scanning.

Fixes: 1d11d70d1f ("Bluetooth: ISO: Pass BIG encryption info through QoS")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-03-06 17:22:39 -05:00
Luiz Augusto von Dentz
e7b02296fb Bluetooth: Remove BT_HS
High Speed, Alternate MAC and PHY (AMP) extension, has been removed from
Bluetooth Core specification on 5.3:

https://www.bluetooth.com/blog/new-core-specification-v5-3-feature-enhancements/

Fixes: 244bc37759 ("Bluetooth: Add BT_HS config option")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-03-06 17:22:39 -05:00
Christophe JAILLET
9c16d0c8d9 Bluetooth: Remove usage of the deprecated ida_simple_xx() API
ida_alloc() and ida_free() should be preferred to the deprecated
ida_simple_get() and ida_simple_remove().

Note that the upper limit of ida_simple_get() is exclusive, but the one of
ida_alloc_max() is inclusive. So a -1 has been added when needed.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-03-06 17:22:38 -05:00
Luiz Augusto von Dentz
63298d6e75 Bluetooth: hci_core: Cancel request on command timeout
If command has timed out call __hci_cmd_sync_cancel to notify the
hci_req since it will inevitably cause a timeout.

This also rework the code around __hci_cmd_sync_cancel since it was
wrongly assuming it needs to cancel timer as well, but sometimes the
timers have not been started or in fact they already had timed out in
which case they don't need to be cancel yet again.

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-03-06 17:22:38 -05:00