1249376 Commits

Author SHA1 Message Date
Paolo Abeni
63e4b9d693 netfilter pull request 24-02-08
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEN9lkrMBJgcdVAPub1V2XiooUIOQFAmXEuicACgkQ1V2XiooU
 IOSbvA/9F2BC9TYKAh23/0EFbD4jOl4e26YE4E+Eu8AteoQ/nD+oI+mtWgw2hVXg
 zXvm1vfIc02jGuGfcPZ+EIv/dkznnDqqUpUGa4ixtgvRw2bKkb2kKMlrFsjzsihj
 yabXydwhxYE9b4Ch2AmRyApTLRMocte1IJ3ci4YUXwf68wZlOe2bIG5wyzGkFpjF
 QZN/Rr14UKjC57EYNdUG9UdybWSqSKD23LPZSaLvi6wxoZd8cIcIkng5K4N0WVKF
 lNskuNFY+j+bJz2Yn3mWIlCoM3R1N2B04t7wRkYnKWkSuwymG3O7JC3RUQaZDBZw
 8AogEbvXaIY3nxyN4lHZ/jzM/QzNB1WHlPx6RjWKHoNhnas+xuBYrjCdJZwtEu8g
 xs27Tjk3QtCIuaMuhN0RFqiq93MqZD/qx++kwMwJA0Wrg76MLPpf8yEWwVGYcAEG
 0EWa61UfPezbcVkW8XveW6lgDfcOIOpBevxDQ3Nf7JB0AcbVBks7oDpGwDc5Pdz5
 6y7WQIilxUtu9bHODUxrshxgTBwsocVkXUTIogCihUC+SgSZF+/G796c9Iy5/kPq
 BtmSNJOJyCbnivkqKTLF0Pv0BplOv7W1sx2/fo+IfRXYTHoXVjHe1BYP0Ck3WEtS
 9EPsFlI5f4AOtnPF3JrTPec9PvuHyVN+8aOPi82wlKiayJcXy1I=
 =Rh2n
 -----END PGP SIGNATURE-----

Merge tag 'nf-24-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Narrow down target/match revision to u8 in nft_compat.

2) Bail out with unused flags in nft_compat.

3) Restrict layer 4 protocol to u16 in nft_compat.

4) Remove static in pipapo get command that slipped through when
   reducing set memory footprint.

5) Follow up incremental fix for the ipset performance regression,
   this includes the missing gc cancellation, from Jozsef Kadlecsik.

6) Allow to filter by zone 0 in ctnetlink, do not interpret zone 0
   as no filtering, from Felix Huettner.

7) Reject direction for NFT_CT_ID.

8) Use timestamp to check for set element expiration while transaction
   is handled to prevent garbage collection from removing set elements
   that were just added by this transaction. Packet path and netlink
   dump/get path still use current time to check for expiration.

9) Restore NF_REPEAT in nfnetlink_queue, from Florian Westphal.

10) map_index needs to be percpu and per-set, not just percpu.
    At this time its possible for a pipapo set to fill the all-zero part
    with ones and take the 'might have bits set' as 'start-from-zero' area.
    From Florian Westphal. This includes three patches:

    - Change scratchpad area to a structure that provides space for a
      per-set-and-cpu toggle and uses it of the percpu one.

    - Add a new free helper to prepare for the next patch.

    - Remove the scratch_aligned pointer and makes AVX2 implementation
      use the exact same memory addresses for read/store of the matching
      state.

netfilter pull request 24-02-08

* tag 'nf-24-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nft_set_pipapo: remove scratch_aligned pointer
  netfilter: nft_set_pipapo: add helper to release pcpu scratch area
  netfilter: nft_set_pipapo: store index in scratch maps
  netfilter: nft_set_rbtree: skip end interval element from gc
  netfilter: nfnetlink_queue: un-break NF_REPEAT
  netfilter: nf_tables: use timestamp to check for set element timeout
  netfilter: nft_ct: reject direction for ct id
  netfilter: ctnetlink: fix filtering for zone 0
  netfilter: ipset: Missing gc cancellations fixed
  netfilter: nft_set_pipapo: remove static in nft_pipapo_get()
  netfilter: nft_compat: restrict match/target protocol to u16
  netfilter: nft_compat: reject unused compat flag
  netfilter: nft_compat: narrow down revision to unsigned 8-bits
====================

Link: https://lore.kernel.org/r/20240208112834.1433-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-08 12:56:40 +01:00
Florian Westphal
5a8cdf6fd8 netfilter: nft_set_pipapo: remove scratch_aligned pointer
use ->scratch for both avx2 and the generic implementation.

After previous change the scratch->map member is always aligned properly
for AVX2, so we can just use scratch->map in AVX2 too.

The alignoff delta is stored in the scratchpad so we can reconstruct
the correct address to free the area again.

Fixes: 7400b063969b ("nft_set_pipapo: Introduce AVX2-based lookup implementation")
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-02-08 12:24:02 +01:00
Florian Westphal
47b1c03c3c netfilter: nft_set_pipapo: add helper to release pcpu scratch area
After next patch simple kfree() is not enough anymore, so add
a helper for it.

Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-02-08 12:10:19 +01:00
Florian Westphal
76313d1a4a netfilter: nft_set_pipapo: store index in scratch maps
Pipapo needs a scratchpad area to keep state during matching.
This state can be large and thus cannot reside on stack.

Each set preallocates percpu areas for this.

On each match stage, one scratchpad half starts with all-zero and the other
is inited to all-ones.

At the end of each stage, the half that starts with all-ones is
always zero.  Before next field is tested, pointers to the two halves
are swapped, i.e.  resmap pointer turns into fill pointer and vice versa.

After the last field has been processed, pipapo stashes the
index toggle in a percpu variable, with assumption that next packet
will start with the all-zero half and sets all bits in the other to 1.

This isn't reliable.

There can be multiple sets and we can't be sure that the upper
and lower half of all set scratch map is always in sync (lookups
can be conditional), so one set might have swapped, but other might
not have been queried.

Thus we need to keep the index per-set-and-cpu, just like the
scratchpad.

Note that this bug fix is incomplete, there is a related issue.

avx2 and normal implementation might use slightly different areas of the
map array space due to the avx2 alignment requirements, so
m->scratch (generic/fallback implementation) and ->scratch_aligned
(avx) may partially overlap. scratch and scratch_aligned are not distinct
objects, the latter is just the aligned address of the former.

After this change, write to scratch_align->map_index may write to
scratch->map, so this issue becomes more prominent, we can set to 1
a bit in the supposedly-all-zero area of scratch->map[].

A followup patch will remove the scratch_aligned and makes generic and
avx code use the same (aligned) area.

Its done in a separate change to ease review.

Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges")
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-02-08 12:10:19 +01:00
Pablo Neira Ayuso
60c0c230c6 netfilter: nft_set_rbtree: skip end interval element from gc
rbtree lazy gc on insert might collect an end interval element that has
been just added in this transactions, skip end interval elements that
are not yet active.

Fixes: f718863aca46 ("netfilter: nft_set_rbtree: fix overlap expiration walk")
Cc: stable@vger.kernel.org
Reported-by: lonial con <kongln9170@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-02-08 12:10:19 +01:00
Florian Westphal
f82777e8ce netfilter: nfnetlink_queue: un-break NF_REPEAT
Only override userspace verdict if the ct hook returns something
other than ACCEPT.

Else, this replaces NF_REPEAT (run all hooks again) with NF_ACCEPT
(move to next hook).

Fixes: 6291b3a67ad5 ("netfilter: conntrack: convert nf_conntrack_update to netfilter verdicts")
Reported-by: l.6diay@passmail.com
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-02-08 12:10:19 +01:00
Pablo Neira Ayuso
7395dfacff netfilter: nf_tables: use timestamp to check for set element timeout
Add a timestamp field at the beginning of the transaction, store it
in the nftables per-netns area.

Update set backend .insert, .deactivate and sync gc path to use the
timestamp, this avoids that an element expires while control plane
transaction is still unfinished.

.lookup and .update, which are used from packet path, still use the
current time to check if the element has expired. And .get path and dump
also since this runs lockless under rcu read size lock. Then, there is
async gc which also needs to check the current time since it runs
asynchronously from a workqueue.

Fixes: c3e1b005ed1c ("netfilter: nf_tables: add set element timeout support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-02-08 12:10:19 +01:00
Pablo Neira Ayuso
38ed1c7062 netfilter: nft_ct: reject direction for ct id
Direction attribute is ignored, reject it in case this ever needs to be
supported

Fixes: 3087c3f7c23b ("netfilter: nft_ct: Add ct id support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-02-08 12:10:19 +01:00
Felix Huettner
fa173a1b4e netfilter: ctnetlink: fix filtering for zone 0
previously filtering for the default zone would actually skip the zone
filter and flush all zones.

Fixes: eff3c558bb7e ("netfilter: ctnetlink: support filtering by zone")
Reported-by: Ilya Maximets <i.maximets@ovn.org>
Closes: https://lore.kernel.org/netdev/2032238f-31ac-4106-8f22-522e76df5a12@ovn.org/
Signed-off-by: Felix Huettner <felix.huettner@mail.schwarz>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-02-08 12:10:18 +01:00
Alexandra Winter
2fe8a23643 s390/qeth: Fix potential loss of L3-IP@ in case of network issues
Symptom:
In case of a bad cable connection (e.g. dirty optics) a fast sequence of
network DOWN-UP-DOWN-UP could happen. UP triggers recovery of the qeth
interface. In case of a second DOWN while recovery is still ongoing, it
can happen that the IP@ of a Layer3 qeth interface is lost and will not
be recovered by the second UP.

Problem:
When registration of IP addresses with Layer 3 qeth devices fails, (e.g.
because of bad address format) the respective IP address is deleted from
its hash-table in the driver. If registration fails because of a ENETDOWN
condition, the address should stay in the hashtable, so a subsequent
recovery can restore it.

3caa4af834df ("qeth: keep ip-address after LAN_OFFLINE failure")
fixes this for registration failures during normal operation, but not
during recovery.

Solution:
Keep L3-IP address in case of ENETDOWN in qeth_l3_recover_ip(). For
consistency with qeth_l3_add_ip() we also keep it in case of EADDRINUSE,
i.e. for some reason the card already/still has this address registered.

Fixes: 4a71df50047f ("qeth: new qeth device driver")
Cc: stable@vger.kernel.org
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Link: https://lore.kernel.org/r/20240206085849.2902775-1-wintera@linux.ibm.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-08 12:10:09 +01:00
Jozsef Kadlecsik
27c5a095e2 netfilter: ipset: Missing gc cancellations fixed
The patch fdb8e12cc2cc ("netfilter: ipset: fix performance regression
in swap operation") missed to add the calls to gc cancellations
at the error path of create operations and at module unload. Also,
because the half of the destroy operations now executed by a
function registered by call_rcu(), neither NFNL_SUBSYS_IPSET mutex
or rcu read lock is held and therefore the checking of them results
false warnings.

Fixes: 97f7cf1cd80e ("netfilter: ipset: fix performance regression in swap operation")
Reported-by: syzbot+52bbc0ad036f6f0d4a25@syzkaller.appspotmail.com
Reported-by: Brad Spengler <spender@grsecurity.net>
Reported-by: Стас Ничипорович <stasn77@gmail.com>
Tested-by: Brad Spengler <spender@grsecurity.net>
Tested-by: Стас Ничипорович <stasn77@gmail.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-02-08 12:09:23 +01:00
Ratheesh Kannoth
db010ff670 octeontx2-af: Initialize maps.
kmalloc_array() without __GFP_ZERO flag does not initialize
memory to zero. This causes issues. Use kcalloc() for maps and
bitmap_zalloc() for bitmaps.

Fixes: dd7842878633 ("octeontx2-af: Add new devlink param to configure maximum usable NIX block LFs")
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Reviewed-by: Brett Creeley <bcreeley@amd.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240206024000.1070260-1-rkannoth@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-08 12:03:02 +01:00
Paolo Abeni
03fa49a386 Merge branch 'cpsw-enable-mac_managed_pm-to-fix-mdio'
Sinthu Raja says:

====================
CPSW: enable mac_managed_pm to fix mdio

This patch fix the resume/suspend issue on CPSW interface.

Reference from the foloowing patchwork:
https://lore.kernel.org/netdev/20221014144729.1159257-2-shenwei.wang@nxp.com/T/

V1: https://patchwork.kernel.org/project/netdevbpf/patch/20240122083414.6246-1-sinthu.raja@ti.com/
V2: https://patchwork.kernel.org/project/netdevbpf/patch/20240122093326.7618-1-sinthu.raja@ti.com/
====================

Link: https://lore.kernel.org/r/20240206005928.15703-1-sinthu.raja@ti.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-08 11:33:22 +01:00
Sinthu Raja
bc4ce46b1e net: ethernet: ti: cpsw: enable mac_managed_pm to fix mdio
The below commit  introduced a WARN when phy state is not in the states:
PHY_HALTED, PHY_READY and PHY_UP.
commit 744d23c71af3 ("net: phy: Warn about incorrect mdio_bus_phy_resume() state")

When cpsw resumes, there have port in PHY_NOLINK state, so the below
warning comes out. Set mac_managed_pm be true to tell mdio that the phy
resume/suspend is managed by the mac, to fix the following warning:

WARNING: CPU: 0 PID: 965 at drivers/net/phy/phy_device.c:326 mdio_bus_phy_resume+0x140/0x144
CPU: 0 PID: 965 Comm: sh Tainted: G           O       6.1.46-g247b2535b2 #1
Hardware name: Generic AM33XX (Flattened Device Tree)
 unwind_backtrace from show_stack+0x18/0x1c
 show_stack from dump_stack_lvl+0x24/0x2c
 dump_stack_lvl from __warn+0x84/0x15c
 __warn from warn_slowpath_fmt+0x1a8/0x1c8
 warn_slowpath_fmt from mdio_bus_phy_resume+0x140/0x144
 mdio_bus_phy_resume from dpm_run_callback+0x3c/0x140
 dpm_run_callback from device_resume+0xb8/0x2b8
 device_resume from dpm_resume+0x144/0x314
 dpm_resume from dpm_resume_end+0x14/0x20
 dpm_resume_end from suspend_devices_and_enter+0xd0/0x924
 suspend_devices_and_enter from pm_suspend+0x2e0/0x33c
 pm_suspend from state_store+0x74/0xd0
 state_store from kernfs_fop_write_iter+0x104/0x1ec
 kernfs_fop_write_iter from vfs_write+0x1b8/0x358
 vfs_write from ksys_write+0x78/0xf8
 ksys_write from ret_fast_syscall+0x0/0x54
Exception stack(0xe094dfa8 to 0xe094dff0)
dfa0:                   00000004 005c3fb8 00000001 005c3fb8 00000004 00000001
dfc0: 00000004 005c3fb8 b6f6bba0 00000004 00000004 0059edb8 00000000 00000000
dfe0: 00000004 bed918f0 b6f09bd3 b6e89a66

Cc: <stable@vger.kernel.org> # v6.0+
Fixes: 744d23c71af3 ("net: phy: Warn about incorrect mdio_bus_phy_resume() state")
Fixes: fba863b81604 ("net: phy: make PHY PM ops a no-op if MAC driver manages PHY PM")
Signed-off-by: Sinthu Raja <sinthu.raja@ti.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-08 11:33:14 +01:00
Sinthu Raja
9def04e759 net: ethernet: ti: cpsw_new: enable mac_managed_pm to fix mdio
The below commit  introduced a WARN when phy state is not in the states:
PHY_HALTED, PHY_READY and PHY_UP.
commit 744d23c71af3 ("net: phy: Warn about incorrect mdio_bus_phy_resume() state")

When cpsw_new resumes, there have port in PHY_NOLINK state, so the below
warning comes out. Set mac_managed_pm be true to tell mdio that the phy
resume/suspend is managed by the mac, to fix the following warning:

WARNING: CPU: 0 PID: 965 at drivers/net/phy/phy_device.c:326 mdio_bus_phy_resume+0x140/0x144
CPU: 0 PID: 965 Comm: sh Tainted: G           O       6.1.46-g247b2535b2 #1
Hardware name: Generic AM33XX (Flattened Device Tree)
 unwind_backtrace from show_stack+0x18/0x1c
 show_stack from dump_stack_lvl+0x24/0x2c
 dump_stack_lvl from __warn+0x84/0x15c
 __warn from warn_slowpath_fmt+0x1a8/0x1c8
 warn_slowpath_fmt from mdio_bus_phy_resume+0x140/0x144
 mdio_bus_phy_resume from dpm_run_callback+0x3c/0x140
 dpm_run_callback from device_resume+0xb8/0x2b8
 device_resume from dpm_resume+0x144/0x314
 dpm_resume from dpm_resume_end+0x14/0x20
 dpm_resume_end from suspend_devices_and_enter+0xd0/0x924
 suspend_devices_and_enter from pm_suspend+0x2e0/0x33c
 pm_suspend from state_store+0x74/0xd0
 state_store from kernfs_fop_write_iter+0x104/0x1ec
 kernfs_fop_write_iter from vfs_write+0x1b8/0x358
 vfs_write from ksys_write+0x78/0xf8
 ksys_write from ret_fast_syscall+0x0/0x54
Exception stack(0xe094dfa8 to 0xe094dff0)
dfa0:                   00000004 005c3fb8 00000001 005c3fb8 00000004 00000001
dfc0: 00000004 005c3fb8 b6f6bba0 00000004 00000004 0059edb8 00000000 00000000
dfe0: 00000004 bed918f0 b6f09bd3 b6e89a66

Cc: <stable@vger.kernel.org> # v6.0+
Fixes: 744d23c71af3 ("net: phy: Warn about incorrect mdio_bus_phy_resume() state")
Fixes: fba863b81604 ("net: phy: make PHY PM ops a no-op if MAC driver manages PHY PM")
Signed-off-by: Sinthu Raja <sinthu.raja@ti.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-08 11:32:53 +01:00
Pablo Neira Ayuso
ab0beafd52 netfilter: nft_set_pipapo: remove static in nft_pipapo_get()
This has slipped through when reducing memory footprint for set
elements, remove it.

Fixes: 9dad402b89e8 ("netfilter: nf_tables: expose opaque set element as struct nft_elem_priv")
Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-02-08 10:27:41 +01:00
Pablo Neira Ayuso
d694b75489 netfilter: nft_compat: restrict match/target protocol to u16
xt_check_{match,target} expects u16, but NFTA_RULE_COMPAT_PROTO is u32.

NLA_POLICY_MAX(NLA_BE32, 65535) cannot be used because .max in
nla_policy is s16, see 3e48be05f3c7 ("netlink: add attribute range
validation to policy").

Fixes: 0ca743a55991 ("netfilter: nf_tables: add compatibility layer for x_tables")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-02-07 22:02:52 +01:00
Pablo Neira Ayuso
292781c3c5 netfilter: nft_compat: reject unused compat flag
Flag (1 << 0) is ignored is set, never used, reject it it with EINVAL
instead.

Fixes: 0ca743a55991 ("netfilter: nf_tables: add compatibility layer for x_tables")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-02-07 22:02:51 +01:00
Pablo Neira Ayuso
36fa8d6971 netfilter: nft_compat: narrow down revision to unsigned 8-bits
xt_find_revision() expects u8, restrict it to this datatype.

Fixes: 0ca743a55991 ("netfilter: nf_tables: add compatibility layer for x_tables")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-02-07 22:02:51 +01:00
Jakub Kicinski
335bac1daa wireless fixes for v6.8-rc4
This time we have unusually large wireless pull request. Several
 functionality fixes to both stack and iwlwifi. Lots of fixes to
 warnings, especially to MODULE_DESCRIPTION().
 -----BEGIN PGP SIGNATURE-----
 
 iQFFBAABCgAvFiEEiBjanGPFTz4PRfLobhckVSbrbZsFAmXCAewRHGt2YWxvQGtl
 cm5lbC5vcmcACgkQbhckVSbrbZtbOAf+PFH5X248YVlVgeamCDVYNsFO0bgsyBN1
 xHDWxxxLFv5uKuRvjx5oa4C4tzRs7kH8h3u5SUqTkIvMcHbzpVe6oBNMDknOtS96
 lQ+lmfuIlKMvfmdG3QK3GyAcn57detHZ0InLsM5OmOdcUexmyJa7qc8qbqKZvX3d
 jux3ISQbyHyisnWHGHwWgKIY0UtZiesAuf/Ug09Ah0WW6KMLgNSoDc9/mQlKoGu7
 gZpmB1xsJaPBPB0Tb1kD5XrGouGnDqN9obS4HH9fCDZwPDBTLDjuYtrtQU9cqtk5
 4EUhuoeefTuep8ZJQgxn05j4D5NWQj9VCEvhaRwoF+Iz+t2/HLHuKQ==
 =4UCw
 -----END PGP SIGNATURE-----

Merge tag 'wireless-2024-02-06' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless

Kalle Valo says:

====================
wireless fixes for v6.8-rc4

This time we have unusually large wireless pull request. Several
functionality fixes to both stack and iwlwifi. Lots of fixes to
warnings, especially to MODULE_DESCRIPTION().

* tag 'wireless-2024-02-06' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: (31 commits)
  wifi: mt76: mt7996: fix fortify warning
  wifi: brcmfmac: Adjust n_channels usage for __counted_by
  wifi: iwlwifi: do not announce EPCS support
  wifi: iwlwifi: exit eSR only after the FW does
  wifi: iwlwifi: mvm: fix a battery life regression
  wifi: mac80211: accept broadcast probe responses on 6 GHz
  wifi: mac80211: adding missing drv_mgd_complete_tx() call
  wifi: mac80211: fix waiting for beacons logic
  wifi: mac80211: fix unsolicited broadcast probe config
  wifi: mac80211: initialize SMPS mode correctly
  wifi: mac80211: fix driver debugfs for vif type change
  wifi: mac80211: set station RX-NSS on reconfig
  wifi: mac80211: fix RCU use in TDLS fast-xmit
  wifi: mac80211: improve CSA/ECSA connection refusal
  wifi: cfg80211: detect stuck ECSA element in probe resp
  wifi: iwlwifi: remove extra kernel-doc
  wifi: fill in MODULE_DESCRIPTION()s for mt76 drivers
  wifi: fill in MODULE_DESCRIPTION()s for wilc1000
  wifi: fill in MODULE_DESCRIPTION()s for wl18xx
  wifi: fill in MODULE_DESCRIPTION()s for p54spi
  ...
====================

Link: https://lore.kernel.org/r/20240206095722.CD9D2C433F1@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-07 10:34:51 -08:00
Jesse Brandeburg
75428f537d net: intel: fix old compiler regressions
The kernel build regressions/improvements email contained a couple of
issues with old compilers (in fact all the reports were on different
architectures, but all gcc 5.5) and the FIELD_PREP() and FIELD_GET()
conversions. They're all because an integer #define that should have
been declared as unsigned, was shifted to the point that it could set
the sign bit.

The fix just involves making sure the defines use the "U" identifier on
the constants to make sure they're unsigned. Should make the checkers
happier.

Confirmed with objdump before/after that there is no change to the
binaries.

Issues were reported as follows:
./drivers/net/ethernet/intel/ice/ice_base.c:238:7: note: in expansion of macro 'FIELD_GET'
      (FIELD_GET(GLINT_CTL_ITR_GRAN_25_M, regval) == ICE_ITR_GRAN_US))
       ^
./include/linux/compiler_types.h:435:38: error: call to '__compiletime_assert_1093' declared with attribute error: FIELD_GET: mask is not constant
drivers/net/ethernet/intel/ice/ice_nvm.c:709:16: note: in expansion of macro ‘FIELD_GET’
  orom->major = FIELD_GET(ICE_OROM_VER_MASK, combo_ver);
                ^
./include/linux/compiler_types.h:435:38: error: call to ‘__compiletime_assert_796’ declared with attribute error: FIELD_GET: mask is not constant
drivers/net/ethernet/intel/ice/ice_common.c:945:18: note: in expansion of macro ‘FIELD_GET’
  u8 max_agg_bw = FIELD_GET(GL_PWR_MODE_CTL_CAR_MAX_BW_M,
                  ^
./include/linux/compiler_types.h:435:38: error: call to ‘__compiletime_assert_420’ declared with attribute error: FIELD_GET: mask is not constant
drivers/net/ethernet/intel/i40e/i40e_dcb.c:458:8: note: in expansion of macro ‘FIELD_GET’
  oui = FIELD_GET(I40E_LLDP_TLV_OUI_MASK, ouisubtype);
        ^

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Closes: https://lore.kernel.org/lkml/d03e90ca-8485-4d1b-5ec1-c3398e0e8da@linux-m68k.org/ #i40e #ice
Fixes: 62589808d73b ("i40e: field get conversion")
Fixes: 5a259f8e0baf ("ice: field get conversion")
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Link: https://lore.kernel.org/r/20240206022906.2194214-1-jesse.brandeburg@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-07 09:15:27 -08:00
Allison Henderson
5001bfe927 MAINTAINERS: Maintainer change for rds
At this point, Santosh has moved onto other things and I am happy
to take over the role of rds maintainer. Update the MAINTAINERS
accordingly.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Link: https://lore.kernel.org/r/20240205190343.112436-1-allison.henderson@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-07 09:05:46 -08:00
Jakub Kicinski
4b00d0c513 selftests: cmsg_ipv6: repeat the exact packet
cmsg_ipv6 test requests tcpdump to capture 4 packets,
and sends until tcpdump quits. Only the first packet
is "real", however, and the rest are basic UDP packets.
So if tcpdump doesn't start in time it will miss
the real packet and only capture the UDP ones.

This makes the test fail on slow machine (no KVM or with
debug enabled) 100% of the time, while it passes in fast
environments.

Repeat the "real" / expected packet.

Fixes: 9657ad09e1fa ("selftests: net: test IPV6_TCLASS")
Fixes: 05ae83d5a4a2 ("selftests: net: test IPV6_HOPLIMIT")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-07 13:38:14 +00:00
Petr Tesarik
38cc3c6dcc net: stmmac: protect updates of 64-bit statistics counters
As explained by a comment in <linux/u64_stats_sync.h>, write side of struct
u64_stats_sync must ensure mutual exclusion, or one seqcount update could
be lost on 32-bit platforms, thus blocking readers forever. Such lockups
have been observed in real world after stmmac_xmit() on one CPU raced with
stmmac_napi_poll_tx() on another CPU.

To fix the issue without introducing a new lock, split the statics into
three parts:

1. fields updated only under the tx queue lock,
2. fields updated only during NAPI poll,
3. fields updated only from interrupt context,

Updates to fields in the first two groups are already serialized through
other locks. It is sufficient to split the existing struct u64_stats_sync
so that each group has its own.

Note that tx_set_ic_bit is updated from both contexts. Split this counter
so that each context gets its own, and calculate their sum to get the total
value in stmmac_get_ethtool_stats().

For the third group, multiple interrupts may be processed by different CPUs
at the same time, but interrupts on the same CPU will not nest. Move fields
from this group to a newly created per-cpu struct stmmac_pcpu_stats.

Fixes: 133466c3bbe1 ("net: stmmac: use per-queue 64 bit statistics where necessary")
Link: https://lore.kernel.org/netdev/Za173PhviYg-1qIn@torres.zugschlus.de/t/
Cc: stable@vger.kernel.org
Signed-off-by: Petr Tesarik <petr@tesarici.cz>
Reviewed-by: Jisheng Zhang <jszhang@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-07 09:00:34 +00:00
Eric Dumazet
cb88cb53ba ppp_async: limit MRU to 64K
syzbot triggered a warning [1] in __alloc_pages():

WARN_ON_ONCE_GFP(order > MAX_PAGE_ORDER, gfp)

Willem fixed a similar issue in commit c0a2a1b0d631 ("ppp: limit MRU to 64K")

Adopt the same sanity check for ppp_async_ioctl(PPPIOCSMRU)

[1]:

 WARNING: CPU: 1 PID: 11 at mm/page_alloc.c:4543 __alloc_pages+0x308/0x698 mm/page_alloc.c:4543
Modules linked in:
CPU: 1 PID: 11 Comm: kworker/u4:0 Not tainted 6.8.0-rc2-syzkaller-g41bccc98fb79 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
Workqueue: events_unbound flush_to_ldisc
pstate: 204000c5 (nzCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
 pc : __alloc_pages+0x308/0x698 mm/page_alloc.c:4543
 lr : __alloc_pages+0xc8/0x698 mm/page_alloc.c:4537
sp : ffff800093967580
x29: ffff800093967660 x28: ffff8000939675a0 x27: dfff800000000000
x26: ffff70001272ceb4 x25: 0000000000000000 x24: ffff8000939675c0
x23: 0000000000000000 x22: 0000000000060820 x21: 1ffff0001272ceb8
x20: ffff8000939675e0 x19: 0000000000000010 x18: ffff800093967120
x17: ffff800083bded5c x16: ffff80008ac97500 x15: 0000000000000005
x14: 1ffff0001272cebc x13: 0000000000000000 x12: 0000000000000000
x11: ffff70001272cec1 x10: 1ffff0001272cec0 x9 : 0000000000000001
x8 : ffff800091c91000 x7 : 0000000000000000 x6 : 000000000000003f
x5 : 00000000ffffffff x4 : 0000000000000000 x3 : 0000000000000020
x2 : 0000000000000008 x1 : 0000000000000000 x0 : ffff8000939675e0
Call trace:
  __alloc_pages+0x308/0x698 mm/page_alloc.c:4543
  __alloc_pages_node include/linux/gfp.h:238 [inline]
  alloc_pages_node include/linux/gfp.h:261 [inline]
  __kmalloc_large_node+0xbc/0x1fc mm/slub.c:3926
  __do_kmalloc_node mm/slub.c:3969 [inline]
  __kmalloc_node_track_caller+0x418/0x620 mm/slub.c:4001
  kmalloc_reserve+0x17c/0x23c net/core/skbuff.c:590
  __alloc_skb+0x1c8/0x3d8 net/core/skbuff.c:651
  __netdev_alloc_skb+0xb8/0x3e8 net/core/skbuff.c:715
  netdev_alloc_skb include/linux/skbuff.h:3235 [inline]
  dev_alloc_skb include/linux/skbuff.h:3248 [inline]
  ppp_async_input drivers/net/ppp/ppp_async.c:863 [inline]
  ppp_asynctty_receive+0x588/0x186c drivers/net/ppp/ppp_async.c:341
  tty_ldisc_receive_buf+0x12c/0x15c drivers/tty/tty_buffer.c:390
  tty_port_default_receive_buf+0x74/0xac drivers/tty/tty_port.c:37
  receive_buf drivers/tty/tty_buffer.c:444 [inline]
  flush_to_ldisc+0x284/0x6e4 drivers/tty/tty_buffer.c:494
  process_one_work+0x694/0x1204 kernel/workqueue.c:2633
  process_scheduled_works kernel/workqueue.c:2706 [inline]
  worker_thread+0x938/0xef4 kernel/workqueue.c:2787
  kthread+0x288/0x310 kernel/kthread.c:388
  ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:860

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-and-tested-by: syzbot+c5da1f087c9e4ec6c933@syzkaller.appspotmail.com
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20240205171004.1059724-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-06 18:45:52 -08:00
Jiri Pirko
58086721b7 devlink: avoid potential loop in devlink_rel_nested_in_notify_work()
In case devlink_rel_nested_in_notify_work() can not take the devlink
lock mutex. Convert the work to delayed work and in case of reschedule
do it jiffie later and avoid potential looping.

Suggested-by: Paolo Abeni <pabeni@redhat.com>
Fixes: c137743bce02 ("devlink: introduce object and nested devlink relationship infra")
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240205171114.338679-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-06 18:45:16 -08:00
Kuniyuki Iwashima
1279f9d9de af_unix: Call kfree_skb() for dead unix_(sk)->oob_skb in GC.
syzbot reported a warning [0] in __unix_gc() with a repro, which
creates a socketpair and sends one socket's fd to itself using the
peer.

  socketpair(AF_UNIX, SOCK_STREAM, 0, [3, 4]) = 0
  sendmsg(4, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\360", iov_len=1}],
          msg_iovlen=1, msg_control=[{cmsg_len=20, cmsg_level=SOL_SOCKET,
                                      cmsg_type=SCM_RIGHTS, cmsg_data=[3]}],
          msg_controllen=24, msg_flags=0}, MSG_OOB|MSG_PROBE|MSG_DONTWAIT|MSG_ZEROCOPY) = 1

This forms a self-cyclic reference that GC should finally untangle
but does not due to lack of MSG_OOB handling, resulting in memory
leak.

Recently, commit 11498715f266 ("af_unix: Remove io_uring code for
GC.") removed io_uring's dead code in GC and revealed the problem.

The code was executed at the final stage of GC and unconditionally
moved all GC candidates from gc_candidates to gc_inflight_list.
That papered over the reported problem by always making the following
WARN_ON_ONCE(!list_empty(&gc_candidates)) false.

The problem has been there since commit 2aab4b969002 ("af_unix: fix
struct pid leaks in OOB support") added full scm support for MSG_OOB
while fixing another bug.

To fix this problem, we must call kfree_skb() for unix_sk(sk)->oob_skb
if the socket still exists in gc_candidates after purging collected skb.

Then, we need to set NULL to oob_skb before calling kfree_skb() because
it calls last fput() and triggers unix_release_sock(), where we call
duplicate kfree_skb(u->oob_skb) if not NULL.

Note that the leaked socket remained being linked to a global list, so
kmemleak also could not detect it.  We need to check /proc/net/protocol
to notice the unfreed socket.

[0]:
WARNING: CPU: 0 PID: 2863 at net/unix/garbage.c:345 __unix_gc+0xc74/0xe80 net/unix/garbage.c:345
Modules linked in:
CPU: 0 PID: 2863 Comm: kworker/u4:11 Not tainted 6.8.0-rc1-syzkaller-00583-g1701940b1a02 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024
Workqueue: events_unbound __unix_gc
RIP: 0010:__unix_gc+0xc74/0xe80 net/unix/garbage.c:345
Code: 8b 5c 24 50 e9 86 f8 ff ff e8 f8 e4 22 f8 31 d2 48 c7 c6 30 6a 69 89 4c 89 ef e8 97 ef ff ff e9 80 f9 ff ff e8 dd e4 22 f8 90 <0f> 0b 90 e9 7b fd ff ff 48 89 df e8 5c e7 7c f8 e9 d3 f8 ff ff e8
RSP: 0018:ffffc9000b03fba0 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffffc9000b03fc10 RCX: ffffffff816c493e
RDX: ffff88802c02d940 RSI: ffffffff896982f3 RDI: ffffc9000b03fb30
RBP: ffffc9000b03fce0 R08: 0000000000000001 R09: fffff52001607f66
R10: 0000000000000003 R11: 0000000000000002 R12: dffffc0000000000
R13: ffffc9000b03fc10 R14: ffffc9000b03fc10 R15: 0000000000000001
FS:  0000000000000000(0000) GS:ffff8880b9400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005559c8677a60 CR3: 000000000d57a000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 process_one_work+0x889/0x15e0 kernel/workqueue.c:2633
 process_scheduled_works kernel/workqueue.c:2706 [inline]
 worker_thread+0x8b9/0x12a0 kernel/workqueue.c:2787
 kthread+0x2c6/0x3b0 kernel/kthread.c:388
 ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:242
 </TASK>

Reported-by: syzbot+fa3ef895554bdbfd1183@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=fa3ef895554bdbfd1183
Fixes: 2aab4b969002 ("af_unix: fix struct pid leaks in OOB support")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240203183149.63573-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-06 18:33:07 -08:00
Furong Xu
1ce2654d87 net: stmmac: xgmac: fix a typo of register name in DPP safety handling
DDPP is copied from Synopsys Data book:

DDPP: Disable Data path Parity Protection.
    When it is 0x0, Data path Parity Protection is enabled.
    When it is 0x1, Data path Parity Protection is disabled.

The macro name should be XGMAC_DPP_DISABLE.

Fixes: 46eba193d04f ("net: stmmac: xgmac: fix handling of DPP safety error for DMA channels")
Signed-off-by: Furong Xu <0x1207@gmail.com>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
Link: https://lore.kernel.org/r/20240203053133.1129236-1-0x1207@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-06 13:57:28 +01:00
Dmitry Safonov
b083d24fcf selftests/net: Amend per-netns counter checks
Selftests here check not only that connect()/accept() for
TCP-AO/TCP-MD5/non-signed-TCP combinations do/don't establish
connections, but also counters: those are per-AO-key, per-socket and
per-netns.

The counters are checked on the server's side, as the server listener
has TCP-AO/TCP-MD5/no keys for different peers. All tests run in
the same namespaces with the same veth pair, created in test_init().

After close() in both client and server, the sides go through
the regular FIN/ACK + FIN/ACK sequence, which goes in the background.
If the selftest has already started a new testing scenario, read
per-netns counters - it may fail in the end iff it doesn't expect
the TCPAOGood per-netns counters go up during the test.

Let's just kill both TCP-AO sides - that will avoid any asynchronous
background TCP-AO segments going to either sides.

Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/all/20240201132153.4d68f45e@kernel.org/T/#u
Fixes: 6f0c472a6815 ("selftests/net: Add TCP-AO + TCP-MD5 + no sign listen socket tests")
Signed-off-by: Dmitry Safonov <dima@arista.com>
Link: https://lore.kernel.org/r/20240202-unsigned-md5-netns-counters-v1-1-8b90c37c0566@arista.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-06 10:35:29 +01:00
Shigeru Yoshida
3871aa01e1 tipc: Check the bearer type before calling tipc_udp_nl_bearer_add()
syzbot reported the following general protection fault [1]:

general protection fault, probably for non-canonical address 0xdffffc0000000010: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000080-0x0000000000000087]
...
RIP: 0010:tipc_udp_is_known_peer+0x9c/0x250 net/tipc/udp_media.c:291
...
Call Trace:
 <TASK>
 tipc_udp_nl_bearer_add+0x212/0x2f0 net/tipc/udp_media.c:646
 tipc_nl_bearer_add+0x21e/0x360 net/tipc/bearer.c:1089
 genl_family_rcv_msg_doit+0x1fc/0x2e0 net/netlink/genetlink.c:972
 genl_family_rcv_msg net/netlink/genetlink.c:1052 [inline]
 genl_rcv_msg+0x561/0x800 net/netlink/genetlink.c:1067
 netlink_rcv_skb+0x16b/0x440 net/netlink/af_netlink.c:2544
 genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
 netlink_unicast_kernel net/netlink/af_netlink.c:1341 [inline]
 netlink_unicast+0x53b/0x810 net/netlink/af_netlink.c:1367
 netlink_sendmsg+0x8b7/0xd70 net/netlink/af_netlink.c:1909
 sock_sendmsg_nosec net/socket.c:730 [inline]
 __sock_sendmsg+0xd5/0x180 net/socket.c:745
 ____sys_sendmsg+0x6ac/0x940 net/socket.c:2584
 ___sys_sendmsg+0x135/0x1d0 net/socket.c:2638
 __sys_sendmsg+0x117/0x1e0 net/socket.c:2667
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0x40/0x110 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x63/0x6b

The cause of this issue is that when tipc_nl_bearer_add() is called with
the TIPC_NLA_BEARER_UDP_OPTS attribute, tipc_udp_nl_bearer_add() is called
even if the bearer is not UDP.

tipc_udp_is_known_peer() called by tipc_udp_nl_bearer_add() assumes that
the media_ptr field of the tipc_bearer has an udp_bearer type object, so
the function goes crazy for non-UDP bearers.

This patch fixes the issue by checking the bearer type before calling
tipc_udp_nl_bearer_add() in tipc_nl_bearer_add().

Fixes: ef20cd4dd163 ("tipc: introduce UDP replicast")
Reported-and-tested-by: syzbot+5142b87a9abc510e14fa@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=5142b87a9abc510e14fa [1]
Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
Reviewed-by: Tung Nguyen <tung.q.nguyen@dektech.com.au>
Link: https://lore.kernel.org/r/20240131152310.4089541-1-syoshida@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-06 08:49:26 +01:00
Felix Fietkau
0647903efb wifi: mt76: mt7996: fix fortify warning
Copy cck and ofdm separately in order to avoid __read_overflow2_field
warning.

Fixes: f75e4779d215 ("wifi: mt76: mt7996: add txpower setting support")
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://msgid.link/20240203132446.54790-1-nbd@nbd.name
2024-02-05 20:00:45 +02:00
Paolo Abeni
a19747c3b9 selftests: net: let big_tcp test cope with slow env
In very slow environments, most big TCP cases including
segmentation and reassembly of big TCP packets have a good
chance to fail: by default the TCP client uses write size
well below 64K. If the host is low enough autocorking is
unable to build real big TCP packets.

Address the issue using much larger write operations.

Note that is hard to observe the issue without an extremely
slow and/or overloaded environment; reduce the TCP transfer
time to allow for much easier/faster reproducibility.

Fixes: 6bb382bcf742 ("selftests: add a selftest for big tcp")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-05 12:36:16 +00:00
David S. Miller
645eb54331 Merge branch 'rxrpc-fixes'
David Howells says:

====================
rxrpc: Miscellaneous fixes

Here some miscellaneous fixes for AF_RXRPC:

 (1) The zero serial number has a special meaning in an ACK packet serial
     reference, so skip it when assigning serial numbers to transmitted
     packets.

 (2) Don't set the reference serial number in a delayed ACK as the ACK
     cannot be used for RTT calculation.

 (3) Don't emit a DUP ACK response to a PING RESPONSE ACK coming back to a
     call that completed in the meantime.

 (4) Fix the counting of acks and nacks in ACK packet to better drive
     congestion management.  We want to know if there have been new
     acks/nacks since the last ACK packet, not that there are still
     acks/nacks.  This is more complicated as we have to save the old SACK
     table and compare it.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-05 12:34:07 +00:00
David Howells
41b7fa157e rxrpc: Fix counting of new acks and nacks
Fix the counting of new acks and nacks when parsing a packet - something
that is used in congestion control.

As the code stands, it merely notes if there are any nacks whereas what we
really should do is compare the previous SACK table to the new one,
assuming we get two successive ACK packets with nacks in them.  However, we
really don't want to do that if we can avoid it as the tables might not
correspond directly as one may be shifted from the other - something that
will only get harder to deal with once extended ACK tables come into full
use (with a capacity of up to 8192).

Instead, count the number of nacks shifted out of the old SACK, the number
of nacks retained in the portion still active and the number of new acks
and nacks in the new table then calculate what we need.

Note this ends up a bit of an estimate as the Rx protocol allows acks to be
withdrawn by the receiver and packets requested to be retransmitted.

Fixes: d57a3a151660 ("rxrpc: Save last ACK's SACK table rather than marking txbufs")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-05 12:34:07 +00:00
David Howells
6f769f2282 rxrpc: Fix response to PING RESPONSE ACKs to a dead call
Stop rxrpc from sending a DUP ACK in response to a PING RESPONSE ACK on a
dead call.  We may have initiated the ping but the call may have beaten the
response to completion.

Fixes: 18bfeba50dfd ("rxrpc: Perform terminal call ACK/ABORT retransmission from conn processor")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-05 12:34:06 +00:00
David Howells
e7870cf13d rxrpc: Fix delayed ACKs to not set the reference serial number
Fix the construction of delayed ACKs to not set the reference serial number
as they can't be used as an RTT reference.

Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-05 12:34:06 +00:00
David Howells
f31041417b rxrpc: Fix generation of serial numbers to skip zero
In the Rx protocol, every packet generated is marked with a per-connection
monotonically increasing serial number.  This number can be referenced in
an ACK packet generated in response to an incoming packet - thereby
allowing the sender to use this for RTT determination, amongst other
things.

However, if the reference field in the ACK is zero, it doesn't refer to any
incoming packet (it could be a ping to find out if a packet got lost, for
example) - so we shouldn't generate zero serial numbers.

Fix the generation of serial numbers to retry if it comes up with a zero.

Furthermore, since the serial numbers are only ever allocated within the
I/O thread this connection is bound to, there's no need for atomics so
remove that too.

Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-05 12:34:06 +00:00
David S. Miller
fdeba0b57d Merge branch 'nfp-fixes'
Louis Peens says:

====================
nfp: a few simple driver fixes

This is combining a few unrelated one-liner fixes which have been
floating around internally into a single series. I'm not sure what is
the least amount of overhead for reviewers, this or a separate
submission per-patch? I guess it probably depends on personal
preference, but please let me know if there is a strong preference to
rather split these in the future.

Summary:

Patch1: Fixes an old issue which was hidden because 0 just so happens to
        be the correct value.
Patch2: Fixes a corner case for flower offloading with bond ports
Patch3: Re-enables the 'NETDEV_XDP_ACT_REDIRECT', which was accidentally
        disabled after a previous refactor.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-05 11:11:09 +00:00
James Hershaw
0f4d6f011b nfp: enable NETDEV_XDP_ACT_REDIRECT feature flag
Enable previously excluded xdp feature flag for NFD3 devices. This
feature flag is required in order to bind nfp interfaces to an xdp
socket and the nfp driver does in fact support the feature.

Fixes: 66c0e13ad236 ("drivers: net: turn on XDP features")
Cc: stable@vger.kernel.org # 6.3+
Signed-off-by: James Hershaw <james.hershaw@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-05 11:11:09 +00:00
Daniel de Villiers
1a1c13303f nfp: flower: prevent re-adding mac index for bonded port
When physical ports are reset (either through link failure or manually
toggled down and up again) that are slaved to a Linux bond with a tunnel
endpoint IP address on the bond device, not all tunnel packets arriving
on the bond port are decapped as expected.

The bond dev assigns the same MAC address to itself and each of its
slaves. When toggling a slave device, the same MAC address is therefore
offloaded to the NFP multiple times with different indexes.

The issue only occurs when re-adding the shared mac. The
nfp_tunnel_add_shared_mac() function has a conditional check early on
that checks if a mac entry already exists and if that mac entry is
global: (entry && nfp_tunnel_is_mac_idx_global(entry->index)). In the
case of a bonded device (For example br-ex), the mac index is obtained,
and no new index is assigned.

We therefore modify the conditional in nfp_tunnel_add_shared_mac() to
check if the port belongs to the LAG along with the existing checks to
prevent a new global mac index from being re-assigned to the slave port.

Fixes: 20cce8865098 ("nfp: flower: enable MAC address sharing for offloadable devs")
CC: stable@vger.kernel.org # 5.1+
Signed-off-by: Daniel de Villiers <daniel.devilliers@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-05 11:11:09 +00:00
Daniel Basilio
b3d4f7f228 nfp: use correct macro for LengthSelect in BAR config
The 1st and 2nd expansion BAR configuration registers are configured,
when the driver starts up, in variables 'barcfg_msix_general' and
'barcfg_msix_xpb', respectively. The 'LengthSelect' field is ORed in
from bit 0, which is incorrect. The 'LengthSelect' field should
start from bit 27.

This has largely gone un-noticed because
NFP_PCIE_BAR_PCIE2CPP_LengthSelect_32BIT happens to be 0.

Fixes: 4cb584e0ee7d ("nfp: add CPP access core")
Cc: stable@vger.kernel.org # 4.11+
Signed-off-by: Daniel Basilio <daniel.basilio@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-05 11:11:09 +00:00
Eric Dumazet
eef00a82c5 inet: read sk->sk_family once in inet_recv_error()
inet_recv_error() is called without holding the socket lock.

IPv6 socket could mutate to IPv4 with IPV6_ADDRFORM
socket option and trigger a KCSAN warning.

Fixes: f4713a3dfad0 ("net-timestamp: make tcp_recvmsg call ipv6_recv_error for AF_INET6 socks")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-04 16:06:53 +00:00
Shradha Gupta
9cae43da98 hv_netvsc: Register VF in netvsc_probe if NET_DEVICE_REGISTER missed
If hv_netvsc driver is unloaded and reloaded, the NET_DEVICE_REGISTER
handler cannot perform VF register successfully as the register call
is received before netvsc_probe is finished. This is because we
register register_netdevice_notifier() very early( even before
vmbus_driver_register()).
To fix this, we try to register each such matching VF( if it is visible
as a netdevice) at the end of netvsc_probe.

Cc: stable@vger.kernel.org
Fixes: 85520856466e ("hv_netvsc: Fix race of register_netdevice_notifier and VF register")
Suggested-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-04 13:38:08 +00:00
Zhipeng Lu
b09b58e31b octeontx2-pf: Fix a memleak otx2_sq_init
When qmem_alloc and pfvf->hw_ops->sq_aq_init fails, sq->sg should be
freed to prevent memleak.

Fixes: c9c12d339d93 ("octeontx2-pf: Add support for PTP clock")
Signed-off-by: Zhipeng Lu <alexious@zju.edu.cn>
Acked-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-03 12:48:20 +00:00
Zhipeng Lu
f3616173bf atm: idt77252: fix a memleak in open_card_ubr0
When alloc_scq fails, card->vcs[0] (i.e. vc) should be freed. Otherwise,
in the following call chain:

idt77252_init_one
  |-> idt77252_dev_open
        |-> open_card_ubr0
              |-> alloc_scq [failed]
  |-> deinit_card
        |-> vfree(card->vcs);

card->vcs is freed and card->vcs[0] is leaked.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Zhipeng Lu <alexious@zju.edu.cn>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-03 12:46:13 +00:00
Antoine Tenart
d75abeec40 tunnels: fix out of bounds access when building IPv6 PMTU error
If the ICMPv6 error is built from a non-linear skb we get the following
splat,

  BUG: KASAN: slab-out-of-bounds in do_csum+0x220/0x240
  Read of size 4 at addr ffff88811d402c80 by task netperf/820
  CPU: 0 PID: 820 Comm: netperf Not tainted 6.8.0-rc1+ #543
  ...
   kasan_report+0xd8/0x110
   do_csum+0x220/0x240
   csum_partial+0xc/0x20
   skb_tunnel_check_pmtu+0xeb9/0x3280
   vxlan_xmit_one+0x14c2/0x4080
   vxlan_xmit+0xf61/0x5c00
   dev_hard_start_xmit+0xfb/0x510
   __dev_queue_xmit+0x7cd/0x32a0
   br_dev_queue_push_xmit+0x39d/0x6a0

Use skb_checksum instead of csum_partial who cannot deal with non-linear
SKBs.

Fixes: 4cb47a8644cc ("tunnels: PMTU discovery support for directly bridged IP packets")
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-03 12:43:19 +00:00
Gerhard Engleder
d7f5fb33cf tsnep: Fix mapping for zero copy XDP_TX action
For XDP_TX action xdp_buff is converted to xdp_frame. The conversion is
done by xdp_convert_buff_to_frame(). The memory type of the resulting
xdp_frame depends on the memory type of the xdp_buff. For page pool
based xdp_buff it produces xdp_frame with memory type
MEM_TYPE_PAGE_POOL. For zero copy XSK pool based xdp_buff it produces
xdp_frame with memory type MEM_TYPE_PAGE_ORDER0.

tsnep_xdp_xmit_back() is not prepared for that and uses always the page
pool buffer type TSNEP_TX_TYPE_XDP_TX. This leads to invalid mappings
and the transmission of undefined data.

Improve tsnep_xdp_xmit_back() to use the generic buffer type
TSNEP_TX_TYPE_XDP_NDO for zero copy XDP_TX.

Fixes: 3fc2333933fd ("tsnep: Add XDP socket zero-copy RX support")
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-03 12:40:02 +00:00
Jakub Kicinski
010e03db4d Merge branch 'selftests-net-more-fixes'
Paolo Abeni says:

====================
selftests: net: more fixes

Another small bunch of fixes, addressing issues outlined by the
netdev CI.

The first 2 patches are just rebased.

The following 2 are new fixes, for even more problems that surfaced
meanwhile.
====================

Link: https://lore.kernel.org/r/cover.1706812005.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-02 21:11:21 -08:00
Paolo Abeni
691bb4e49c selftests: net: avoid just another constant wait
Using hard-coded constant timeout to wait for some expected
event is deemed to fail sooner or later, especially in slow
env.

Our CI has spotted another of such race:
   # TEST: ipv6: cleanup of cached exceptions - nexthop objects          [FAIL]
   #   can't delete veth device in a timely manner, PMTU dst likely leaked

Replace the crude sleep with a loop looking for the expected condition
at low interval for a much longer range.

Fixes: b3cc4f8a8a41 ("selftests: pmtu: add explicit tests for PMTU exceptions cleanup")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/fd5c745e9bb665b724473af6a9373a8c2a62b247.1706812005.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-02 21:11:21 -08:00
Paolo Abeni
e71e016ad0 selftests: net: fix tcp listener handling in pmtu.sh
The pmtu.sh test uses a few TCP listener in a problematic way:
It hard-codes a constant timeout to wait for the listener starting-up
in background. That introduces unneeded latency and on very slow and
busy host it can fail.

Additionally the test starts again the same listener in the same
namespace on the same port, just after the previous connection
completed. Fast host can attempt starting the new server before the
old one really closed the socket.

Address the issues using the wait_local_port_listen helper and
explicitly waiting for the background listener process exit.

Fixes: 136a1b434bbb ("selftests: net: test vxlan pmtu exceptions with tcp")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/f8e8f6d44427d8c45e9f6a71ee1a321047452087.1706812005.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-02 21:11:21 -08:00