74655 Commits

Author SHA1 Message Date
Jakub Kicinski
5e5d8b94a4 netfilter pull request 23-10-25
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEN9lkrMBJgcdVAPub1V2XiooUIOQFAmU45rAACgkQ1V2XiooU
 IORtEQ//U91FHPqc1KpJi5lAnXXAXaDji6RjZ080bwk4H3oXc2moc71SiGEgggGs
 POZEnN0sNJXfUacdG23pQGLnrT1iQpG927mzV01W9HhyZEopO4g+mRt5ymt/qmvO
 Q9MKWuNOlJCD5blPyKxU7VF3LsQynaPST1IbuPI1NVKiqNUpIpAWC1G+Ofpt67QY
 Tq7KiJDX0yc+51OFT9Ahs3piSbzC5bl0yC4iynajPxziv+rUiJW5ym2GM24G2rNh
 /SD4EeJkArdFa3I4Kf15Hnj9809qQP22PDhoQ2Hzzr7XbveArmPjaI0UQ39uV5Jr
 1/lFP3iQMBsj04dI/xRLBHJHb2WZvlNa+btV/RHuaw1TEnYevdarMl3Lh0q7p5sT
 3M4JBbk0+bq7ZXWmDBT48ZQs4S5UqMscunZXKg2k0fZPn/rSlASAZ3TAXZuF0avp
 KLQGQsjeBX/zgmQqhq37/oD+YV13LCtEqC0xz4WgX9WpVvgyMR3LFcsHQcZBAVUN
 PJenvgmpdo8sbhABOXsURJPVDo0JzS4xZhrPyIKaojTo33KfQ/1Z5Ef0EOkbs75+
 6wMoUTdvcZK+Y5f6hvMQ/XOu7XNz0sVZlfBjAhFrVU/TsbprviQCN8QB1IQNHclm
 5A93VnID0WPCSAmOmaIdMlcJka4wKv4irI+Iv8vNlQXqV7dXuzQ=
 =r+0z
 -----END PGP SIGNATURE-----

Merge tag 'nf-23-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

This patch contains two late Netfilter's flowtable fixes for net:

1) Flowtable GC pushes back packets to classic path in every GC run,
   ie. every second. This is because NF_FLOW_HW_ESTABLISHED is only
   used by sched/act_ct (never set) and IPS_SEEN_REPLY might be unset
   by the time the flow is offloaded (this status bit is only reliable
   in the sched/act_ct datapath).

2) sched/act_ct logic to push back packets to classic path to reevaluate
   if UDP flow is unidirectional only applies if IPS_HW_OFFLOAD_BIT is
   set on and no hardware offload request is pending to be handled.
   From Vlad Buslov.

These two patches fixes two problems that were introduced in the
previous 6.5 development cycle.

* tag 'nf-23-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  net/sched: act_ct: additional checks for outdated flows
  netfilter: flowtable: GC pushes back packets to classic path
====================

Link: https://lore.kernel.org/r/20231025100819.2664-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-25 16:02:06 -07:00
Alexandru Matei
53b08c4985 vsock/virtio: initialize the_virtio_vsock before using VQs
Once VQs are filled with empty buffers and we kick the host, it can send
connection requests. If the_virtio_vsock is not initialized before,
replies are silently dropped and do not reach the host.

virtio_transport_send_pkt() can queue packets once the_virtio_vsock is
set, but they won't be processed until vsock->tx_run is set to true. We
queue vsock->send_pkt_work when initialization finishes to send those
packets queued earlier.

Fixes: 0deab087b16a ("vsock/virtio: use RCU to avoid use-after-free on the_virtio_vsock")
Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20231024191742.14259-1-alexandru.matei@uipath.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-25 15:49:47 -07:00
Deming Wang
1711435e3e net: ipv6: fix typo in comments
The word "advertize" should be replaced by "advertise".

Signed-off-by: Deming Wang <wangdeming@inspur.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-25 10:38:07 +01:00
Deming Wang
197f9fba96 net: ipv4: fix typo in comments
The word "advertize" should be replaced by "advertise".

Signed-off-by: Deming Wang <wangdeming@inspur.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-25 10:38:07 +01:00
Vlad Buslov
a63b662212 net/sched: act_ct: additional checks for outdated flows
Current nf_flow_is_outdated() implementation considers any flow table flow
which state diverged from its underlying CT connection status for teardown
which can be problematic in the following cases:

- Flow has never been offloaded to hardware in the first place either
because flow table has hardware offload disabled (flag
NF_FLOWTABLE_HW_OFFLOAD is not set) or because it is still pending on 'add'
workqueue to be offloaded for the first time. The former is incorrect, the
later generates excessive deletions and additions of flows.

- Flow is already pending to be updated on the workqueue. Tearing down such
flows will also generate excessive removals from the flow table, especially
on highly loaded system where the latency to re-offload a flow via 'add'
workqueue can be quite high.

When considering a flow for teardown as outdated verify that it is both
offloaded to hardware and doesn't have any pending updates.

Fixes: 41f2c7c342d3 ("net/sched: act_ct: Fix promotion of offloaded unreplied tuple")
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-10-25 11:35:57 +02:00
Pablo Neira Ayuso
735795f68b netfilter: flowtable: GC pushes back packets to classic path
Since 41f2c7c342d3 ("net/sched: act_ct: Fix promotion of offloaded
unreplied tuple"), flowtable GC pushes back flows with IPS_SEEN_REPLY
back to classic path in every run, ie. every second. This is because of
a new check for NF_FLOW_HW_ESTABLISHED which is specific of sched/act_ct.

In Netfilter's flowtable case, NF_FLOW_HW_ESTABLISHED never gets set on
and IPS_SEEN_REPLY is unreliable since users decide when to offload the
flow before, such bit might be set on at a later stage.

Fix it by adding a custom .gc handler that sched/act_ct can use to
deal with its NF_FLOW_HW_ESTABLISHED bit.

Fixes: 41f2c7c342d3 ("net/sched: act_ct: Fix promotion of offloaded unreplied tuple")
Reported-by: Vladimir Smelhaus <vl.sm@email.cz>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-10-25 11:35:46 +02:00
Jakub Kicinski
00d67093e4 Three more fixes:
- don't drop all unprotected public action frames since
    some don't have a protected dual
  - fix pointer confusion in scanning code
  - fix warning in some connections with multiple links
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAmU3naIACgkQ10qiO8sP
 aADUNA//clFAsaH6A94tD6Hgyafi9idBpsHERkYE4RaGmiID34yOYDInvkoDmsy1
 WG7wEdNjnsYDBrX0eG1x7WSrQRLhs76U0HnBP9tFYIeygnLuul2/UNRFkK6EwQfn
 OJbVJdjQdL/c8p129DUr5JKhavbc4ovY2acECLRY54n1fAYlnn6u7SWsOyCu0zl7
 wXSQ5pzYHu5lFM5LSFj6mC7U8b/aFQ5r9XsNHwGz4YVvd5cEdLYc/y5bAK6xAIxz
 jJcJLV088QikAcYmgIS7MNQuKrMudNjCEDWtqM23N9pO//QjsbOag2q02JmfxWyv
 4YJy42G/0K0/wjwCpIZig2OOE5iKDKCJ+dvNBUaCnnTn1ARQSXSDgAYkmWReCrZu
 DUpvn8Be3fgULtaUC0QQ3R1oCTVJyKYTD55Ofcy3Pj1qt+1lmhgLp1qyezemJcfJ
 p2sv5GLwyPLcOUTjeqTgP57xoJl2JV9vUVey9xvk2dMl0vS5qpfIf3FR7R0+HtlZ
 UIrveQOLMsKAaamk59RaSpfg4vJoCqaabu97f/lHRc5WdaeURSlUw0rU9xdqc/P+
 GTX7ubKoMiMEx11v25JdTE3eniFGxu28cojqScryvFo6WIlkYp4cbNxtRb4i9rOX
 ZJXQWCE6YJZ90VlR/a8lpJnTXjntQT5vBtH7vhqAneN2TJN74h8=
 =AvmQ
 -----END PGP SIGNATURE-----

Merge tag 'wireless-2023-10-24' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless

Johannes Berg says:

====================
Three more fixes:
 - don't drop all unprotected public action frames since
   some don't have a protected dual
 - fix pointer confusion in scanning code
 - fix warning in some connections with multiple links

* tag 'wireless-2023-10-24' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
  wifi: mac80211: don't drop all unprotected public action frames
  wifi: cfg80211: fix assoc response warning on failed links
  wifi: cfg80211: pass correct pointer to rdev_inform_bss()
====================

Link: https://lore.kernel.org/r/20231024103540.19198-2-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-24 13:10:53 -07:00
Moritz Wanzenböck
7798b59409 net/handshake: fix file ref count in handshake_nl_accept_doit()
If req->hr_proto->hp_accept() fail, we call fput() twice:
Once in the error path, but also a second time because sock->file
is at that point already associated with the file descriptor. Once
the task exits, as it would probably do after receiving an error
reading from netlink, the fd is closed, calling fput() a second time.

To fix, we move installing the file after the error path for the
hp_accept() call. In the case of errors we simply put the unused fd.
In case of success we can use fd_install() to link the sock->file
to the reserved fd.

Fixes: 7ea9c1ec66bc ("net/handshake: Fix handshake_dup() ref counting")
Signed-off-by: Moritz Wanzenböck <moritz.wanzenboeck@linbit.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://lore.kernel.org/r/20231019125847.276443-1-moritz.wanzenboeck@linbit.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-23 10:19:33 -07:00
Avraham Stern
91535613b6 wifi: mac80211: don't drop all unprotected public action frames
Not all public action frames have a protected variant. When MFP is
enabled drop only public action frames that have a dual protected
variant.

Fixes: 76a3059cf124 ("wifi: mac80211: drop some unprotected action frames")
Signed-off-by: Avraham Stern <avraham.stern@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20231016145213.2973e3c8d3bb.I6198b8d3b04cf4a97b06660d346caec3032f232a@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-10-23 13:25:30 +02:00
Johannes Berg
c434b2be2d wifi: cfg80211: fix assoc response warning on failed links
The warning here shouldn't be done before we even set the
bss field (or should've used the input data). Move the
assignment before the warning to fix it.

We noticed this now because of Wen's bugfix, where the bug
fixed there had previously hidden this other bug.

Fixes: 53ad07e9823b ("wifi: cfg80211: support reporting failed links")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-10-23 13:25:30 +02:00
Ben Greear
3e3929ef88 wifi: cfg80211: pass correct pointer to rdev_inform_bss()
Confusing struct member names here resulted in passing
the wrong pointer, causing crashes. Pass the correct one.

Fixes: eb142608e2c4 ("wifi: cfg80211: use a struct for inform_single_bss data")
Signed-off-by: Ben Greear <greearb@candelatech.com>
Link: https://lore.kernel.org/r/20231021154827.1142734-1-greearb@candelatech.com
[rewrite commit message, add fixes]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-10-23 13:24:51 +02:00
Fred Chen
d2a0fc372a tcp: fix wrong RTO timeout when received SACK reneging
This commit fix wrong RTO timeout when received SACK reneging.

When an ACK arrived pointing to a SACK reneging, tcp_check_sack_reneging()
will rearm the RTO timer for min(1/2*srtt, 10ms) into to the future.

But since the commit 62d9f1a6945b ("tcp: fix TLP timer not set when
CA_STATE changes from DISORDER to OPEN") merged, the tcp_set_xmit_timer()
is moved after tcp_fastretrans_alert()(which do the SACK reneging check),
so the RTO timeout will be overwrited by tcp_set_xmit_timer() with
icsk_rto instead of 1/2*srtt.

Here is a packetdrill script to check this bug:
0     socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0    bind(3, ..., ...) = 0
+0    listen(3, 1) = 0

// simulate srtt to 100ms
+0    < S 0:0(0) win 32792 <mss 1000, sackOK,nop,nop,nop,wscale 7>
+0    > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
+.1    < . 1:1(0) ack 1 win 1024

+0    accept(3, ..., ...) = 4

+0    write(4, ..., 10000) = 10000
+0    > P. 1:10001(10000) ack 1

// inject sack
+.1    < . 1:1(0) ack 1 win 257 <sack 1001:10001,nop,nop>
+0    > . 1:1001(1000) ack 1

// inject sack reneging
+.1    < . 1:1(0) ack 1001 win 257 <sack 9001:10001,nop,nop>

// we expect rto fired in 1/2*srtt (50ms)
+.05    > . 1001:2001(1000) ack 1

This fix remove the FLAG_SET_XMIT_TIMER from ack_flag when
tcp_check_sack_reneging() set RTO timer with 1/2*srtt to avoid
being overwrited later.

Fixes: 62d9f1a6945b ("tcp: fix TLP timer not set when CA_STATE changes from DISORDER to OPEN")
Signed-off-by: Fred Chen <fred.chenchen03@gmail.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Tested-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-22 11:47:44 +01:00
Eric Dumazet
a9beb7e81b neighbour: fix various data-races
1) tbl->gc_thresh1, tbl->gc_thresh2, tbl->gc_thresh3 and tbl->gc_interval
   can be written from sysfs.

2) tbl->last_flush is read locklessly from neigh_alloc()

3) tbl->proxy_queue.qlen is read locklessly from neightbl_fill_info()

4) neightbl_fill_info() reads cpu stats that can be changed concurrently.

Fixes: c7fb64db001f ("[NETLINK]: Neighbour table configuration and statistics via rtnetlink")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20231019122104.1448310-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-20 17:41:29 -07:00
Eric Dumazet
72bf4f1767 net: do not leave an empty skb in write queue
Under memory stress conditions, tcp_sendmsg_locked()
might call sk_stream_wait_memory(), thus releasing the socket lock.

If a fresh skb has been allocated prior to this,
we should not leave it in the write queue otherwise
tcp_write_xmit() could panic.

This apparently does not happen often, but a future change
in __sk_mem_raise_allocated() that Shakeel and others are
considering would increase chances of being hurt.

Under discussion is to remove this controversial part:

    /* Fail only if socket is _under_ its sndbuf.
     * In this case we cannot block, so that we have to fail.
     */
    if (sk->sk_wmem_queued + size >= sk->sk_sndbuf) {
        /* Force charge with __GFP_NOFAIL */
        if (memcg_charge && !charged) {
            mem_cgroup_charge_skmem(sk->sk_memcg, amt,
                gfp_memcg_charge() | __GFP_NOFAIL);
        }
        return 1;
    }

Fixes: fdfc5c8594c2 ("tcp: remove empty skb from write queue in error cases")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Link: https://lore.kernel.org/r/20231019112457.1190114-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-20 17:40:10 -07:00
Linus Torvalds
ce55c22ec8 Including fixes from bluetooth, netfilter, WiFi.
Feels like an up-tick in regression fixes, mostly for older releases.
 The hfsc fix, tcp_disconnect() and Intel WWAN fixes stand out as fairly
 clear-cut user reported regressions. The mlx5 DMA bug was causing strife
 for 390x folks. The fixes themselves are not particularly scary, tho.
 No open investigations / outstanding reports at the time of writing.
 
 Current release - regressions:
 
  - eth: mlx5: perform DMA operations in the right locations,
    make devices usable on s390x, again
 
  - sched: sch_hfsc: upgrade 'rt' to 'sc' when it becomes a inner curve,
    previous fix of rejecting invalid config broke some scripts
 
  - rfkill: reduce data->mtx scope in rfkill_fop_open, avoid deadlock
 
  - revert "ethtool: Fix mod state of verbose no_mask bitset",
    needs more work
 
 Current release - new code bugs:
 
  - tcp: fix listen() warning with v4-mapped-v6 address
 
 Previous releases - regressions:
 
  - tcp: allow tcp_disconnect() again when threads are waiting,
    it was denied to plug a constant source of bugs but turns out
    .NET depends on it
 
  - eth: mlx5: fix double-free if buffer refill fails under OOM
 
  - revert "net: wwan: iosm: enable runtime pm support for 7560",
    it's causing regressions and the WWAN team at Intel disappeared
 
  - tcp: tsq: relax tcp_small_queue_check() when rtx queue contains
    a single skb, fix single-stream perf regression on some devices
 
 Previous releases - always broken:
 
  - Bluetooth:
    - fix issues in legacy BR/EDR PIN code pairing
    - correctly bounds check and pad HCI_MON_NEW_INDEX name
 
  - netfilter:
    - more fixes / follow ups for the large "commit protocol" rework,
      which went in as a fix to 6.5
    - fix null-derefs on netlink attrs which user may not pass in
 
  - tcp: fix excessive TLP and RACK timeouts from HZ rounding
    (bless Debian for keeping HZ=250 alive)
 
  - net: more strict VIRTIO_NET_HDR_GSO_UDP_L4 validation, prevent
    letting frankenstein UDP super-frames from getting into the stack
 
  - net: fix interface altnames when ifc moves to a new namespace
 
  - eth: qed: fix the size of the RX buffers
 
  - mptcp: avoid sending RST when closing the initial subflow
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmUxawUACgkQMUZtbf5S
 Irt75w/+MC2ilnFQhfoQqpCxL2uHWOKzhenUz2PoNKR60OKgLLmTq8YBYcpyCVAQ
 wBQaNzNu1/yjF5BG7aNS/j0suGeJsOfhfoahQvXlLaat9NuDqxTpaeoT5FZ7eNQw
 RZ8+CJug3BRDV0TkaJH9UDVC/nfJTsnsGWNIhNXYGPsuveqAUun+xrnN8ZbvZIrn
 6D9rMF+u9SdVO+ANCquXBC7+CWEWiJS1ljUrU7BRNiv/9FSlnPQtjOdpuKleeBO8
 4usMS7TezHNgRdiAKC8GjSGUiIkIIMJT4y4wuczBEQAD4Pkki9UpBrui97ozOj7h
 W4N7UOuPlUBIardvKNoYz9rZyiFXBcPPm0GruHiuCqpxyqmzoFgv2XJsb/6KfzNn
 Dyro+lvh8smtbFHvFqiwaNu5y8ucfClaowvR4gjSe2KcB7hIpwNkh6vWC6OMGJK3
 hiKHnDrnXBQMbnP1YfiJ4feLmm3UYCG8eFdv/ULZT0a9TzZ7fKfzAfywUwD+/O8Y
 +S28Hr9srdDCHO7ih/gF3Wq9wtnnLy8QEkpt7cpXjRDj0uWH8JkHU3YEIGF2814Y
 LNVGmX9y6RcgrHNp03K1PmUcgAzhTTuV9QRoQKEucuBT5AK9ALDQ8YomnWDWDgrp
 UOdJPi1RUTsmqslADF15wZ9W5Ki/cDnUJsE4HU/MtnM2w95C49Q=
 =z1F6
 -----END PGP SIGNATURE-----

Merge tag 'net-6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from bluetooth, netfilter, WiFi.

  Feels like an up-tick in regression fixes, mostly for older releases.
  The hfsc fix, tcp_disconnect() and Intel WWAN fixes stand out as
  fairly clear-cut user reported regressions. The mlx5 DMA bug was
  causing strife for 390x folks. The fixes themselves are not
  particularly scary, tho. No open investigations / outstanding reports
  at the time of writing.

  Current release - regressions:

   - eth: mlx5: perform DMA operations in the right locations, make
     devices usable on s390x, again

   - sched: sch_hfsc: upgrade 'rt' to 'sc' when it becomes a inner
     curve, previous fix of rejecting invalid config broke some scripts

   - rfkill: reduce data->mtx scope in rfkill_fop_open, avoid deadlock

   - revert "ethtool: Fix mod state of verbose no_mask bitset", needs
     more work

  Current release - new code bugs:

   - tcp: fix listen() warning with v4-mapped-v6 address

  Previous releases - regressions:

   - tcp: allow tcp_disconnect() again when threads are waiting, it was
     denied to plug a constant source of bugs but turns out .NET depends
     on it

   - eth: mlx5: fix double-free if buffer refill fails under OOM

   - revert "net: wwan: iosm: enable runtime pm support for 7560", it's
     causing regressions and the WWAN team at Intel disappeared

   - tcp: tsq: relax tcp_small_queue_check() when rtx queue contains a
     single skb, fix single-stream perf regression on some devices

  Previous releases - always broken:

   - Bluetooth:
      - fix issues in legacy BR/EDR PIN code pairing
      - correctly bounds check and pad HCI_MON_NEW_INDEX name

   - netfilter:
      - more fixes / follow ups for the large "commit protocol" rework,
        which went in as a fix to 6.5
      - fix null-derefs on netlink attrs which user may not pass in

   - tcp: fix excessive TLP and RACK timeouts from HZ rounding (bless
     Debian for keeping HZ=250 alive)

   - net: more strict VIRTIO_NET_HDR_GSO_UDP_L4 validation, prevent
     letting frankenstein UDP super-frames from getting into the stack

   - net: fix interface altnames when ifc moves to a new namespace

   - eth: qed: fix the size of the RX buffers

   - mptcp: avoid sending RST when closing the initial subflow"

* tag 'net-6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (94 commits)
  Revert "ethtool: Fix mod state of verbose no_mask bitset"
  selftests: mptcp: join: no RST when rm subflow/addr
  mptcp: avoid sending RST when closing the initial subflow
  mptcp: more conservative check for zero probes
  tcp: check mptcp-level constraints for backlog coalescing
  selftests: mptcp: join: correctly check for no RST
  net: ti: icssg-prueth: Fix r30 CMDs bitmasks
  selftests: net: add very basic test for netdev names and namespaces
  net: move altnames together with the netdevice
  net: avoid UAF on deleted altname
  net: check for altname conflicts when changing netdev's netns
  net: fix ifname in netlink ntf during netns move
  net: ethernet: ti: Fix mixed module-builtin object
  net: phy: bcm7xxx: Add missing 16nm EPHY statistics
  ipv4: fib: annotate races around nh->nh_saddr_genid and nh->nh_saddr
  tcp_bpf: properly release resources on error paths
  net/sched: sch_hfsc: upgrade 'rt' to 'sc' when it becomes a inner curve
  net: mdio-mux: fix C45 access returning -EIO after API change
  tcp: tsq: relax tcp_small_queue_check() when rtx queue contains a single skb
  octeon_ep: update BQL sent bytes before ringing doorbell
  ...
2023-10-19 12:08:18 -07:00
Kory Maincent
524515020f Revert "ethtool: Fix mod state of verbose no_mask bitset"
This reverts commit 108a36d07c01edbc5942d27c92494d1c6e4d45a0.

It was reported that this fix breaks the possibility to remove existing WoL
flags. For example:
~$ ethtool lan2
...
        Supports Wake-on: pg
        Wake-on: d
...
~$ ethtool -s lan2 wol gp
~$ ethtool lan2
...
        Wake-on: pg
...
~$ ethtool -s lan2 wol d
~$ ethtool lan2
...
        Wake-on: pg
...

This worked correctly before this commit because we were always updating
a zero bitmap (since commit 6699170376ab ("ethtool: fix application of
verbose no_mask bitset"), that is) so that the rest was left zero
naturally. But now the 1->0 change (old_val is true, bit not present in
netlink nest) no longer works.

Reported-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reported-by: Michal Kubecek <mkubecek@suse.cz>
Closes: https://lore.kernel.org/netdev/20231019095140.l6fffnszraeb6iiw@lion.mk-sys.cz/
Cc: stable@vger.kernel.org
Fixes: 108a36d07c01 ("ethtool: Fix mod state of verbose no_mask bitset")
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Link: https://lore.kernel.org/r/20231019-feature_ptp_bitset_fix-v1-1-70f3c429a221@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-19 09:27:12 -07:00
Geliang Tang
14c56686a6 mptcp: avoid sending RST when closing the initial subflow
When closing the first subflow, the MPTCP protocol unconditionally
calls tcp_disconnect(), which in turn generates a reset if the subflow
is established.

That is unexpected and different from what MPTCP does with MPJ
subflows, where resets are generated only on FASTCLOSE and other edge
scenarios.

We can't reuse for the first subflow the same code in place for MPJ
subflows, as MPTCP clean them up completely via a tcp_close() call,
while must keep the first subflow socket alive for later re-usage, due
to implementation constraints.

This patch adds a new helper __mptcp_subflow_disconnect() that
encapsulates, a logic similar to tcp_close, issuing a reset only when
the MPTCP_CF_FASTCLOSE flag is set, and performing a clean shutdown
otherwise.

Fixes: c2b2ae3925b6 ("mptcp: handle correctly disconnect() failures")
Cc: stable@vger.kernel.org
Reviewed-by: Matthieu Baerts <matttbe@kernel.org>
Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20231018-send-net-20231018-v1-4-17ecb002e41d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-19 09:10:00 -07:00
Paolo Abeni
72377ab2d6 mptcp: more conservative check for zero probes
Christoph reported that the MPTCP protocol can find the subflow-level
write queue unexpectedly not empty while crafting a zero-window probe,
hitting a warning:

------------[ cut here ]------------
WARNING: CPU: 0 PID: 188 at net/mptcp/protocol.c:1312 mptcp_sendmsg_frag+0xc06/0xe70
Modules linked in:
CPU: 0 PID: 188 Comm: kworker/0:2 Not tainted 6.6.0-rc2-g1176aa719d7a #47
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
Workqueue: events mptcp_worker
RIP: 0010:mptcp_sendmsg_frag+0xc06/0xe70 net/mptcp/protocol.c:1312
RAX: 47d0530de347ff6a RBX: 47d0530de347ff6b RCX: ffff8881015d3c00
RDX: ffff8881015d3c00 RSI: 47d0530de347ff6b RDI: 47d0530de347ff6b
RBP: 47d0530de347ff6b R08: ffffffff8243c6a8 R09: ffffffff82042d9c
R10: 0000000000000002 R11: ffffffff82056850 R12: ffff88812a13d580
R13: 0000000000000001 R14: ffff88812b375e50 R15: ffff88812bbf3200
FS:  0000000000000000(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000695118 CR3: 0000000115dfc001 CR4: 0000000000170ef0
Call Trace:
 <TASK>
 __subflow_push_pending+0xa4/0x420 net/mptcp/protocol.c:1545
 __mptcp_push_pending+0x128/0x3b0 net/mptcp/protocol.c:1614
 mptcp_release_cb+0x218/0x5b0 net/mptcp/protocol.c:3391
 release_sock+0xf6/0x100 net/core/sock.c:3521
 mptcp_worker+0x6e8/0x8f0 net/mptcp/protocol.c:2746
 process_scheduled_works+0x341/0x690 kernel/workqueue.c:2630
 worker_thread+0x3a7/0x610 kernel/workqueue.c:2784
 kthread+0x143/0x180 kernel/kthread.c:388
 ret_from_fork+0x4d/0x60 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:304
 </TASK>

The root cause of the issue is that expectations are wrong: e.g. due
to MPTCP-level re-injection we can hit the critical condition.

Explicitly avoid the zero-window probe when the subflow write queue
is not empty and drop the related warnings.

Reported-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/444
Fixes: f70cad1085d1 ("mptcp: stop relying on tcp_tx_skb_cache")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20231018-send-net-20231018-v1-3-17ecb002e41d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-19 09:10:00 -07:00
Paolo Abeni
6db8a37dfc tcp: check mptcp-level constraints for backlog coalescing
The MPTCP protocol can acquire the subflow-level socket lock and
cause the tcp backlog usage. When inserting new skbs into the
backlog, the stack will try to coalesce them.

Currently, we have no check in place to ensure that such coalescing
will respect the MPTCP-level DSS, and that may cause data stream
corruption, as reported by Christoph.

Address the issue by adding the relevant admission check for coalescing
in tcp_add_backlog().

Note the issue is not easy to reproduce, as the MPTCP protocol tries
hard to avoid acquiring the subflow-level socket lock.

Fixes: 648ef4b88673 ("mptcp: Implement MPTCP receive path")
Cc: stable@vger.kernel.org
Reported-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/420
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20231018-send-net-20231018-v1-2-17ecb002e41d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-19 09:10:00 -07:00
Jakub Kicinski
8e15aee621 net: move altnames together with the netdevice
The altname nodes are currently not moved to the new netns
when netdevice itself moves:

  [ ~]# ip netns add test
  [ ~]# ip -netns test link add name eth0 type dummy
  [ ~]# ip -netns test link property add dev eth0 altname some-name
  [ ~]# ip -netns test link show dev some-name
  2: eth0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
      link/ether 1e:67:ed:19:3d:24 brd ff:ff:ff:ff:ff:ff
      altname some-name
  [ ~]# ip -netns test link set dev eth0 netns 1
  [ ~]# ip link
  ...
  3: eth0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
      link/ether 02:40:88:62:ec:b8 brd ff:ff:ff:ff:ff:ff
      altname some-name
  [ ~]# ip li show dev some-name
  Device "some-name" does not exist.

Remove them from the hash table when device is unlisted
and add back when listed again.

Fixes: 36fbf1e52bd3 ("net: rtnetlink: add linkprop commands to add and delete alternative ifnames")
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-19 15:51:16 +02:00
Jakub Kicinski
1a83f4a7c1 net: avoid UAF on deleted altname
Altnames are accessed under RCU (dev_get_by_name_rcu())
but freed by kfree() with no synchronization point.

Each node has one or two allocations (node and a variable-size
name, sometimes the name is netdev->name). Adding rcu_heads
here is a bit tedious. Besides most code which unlists the names
already has rcu barriers - so take the simpler approach of adding
synchronize_rcu(). Note that the one on the unregistration path
(which matters more) is removed by the next fix.

Fixes: ff92741270bf ("net: introduce name_node struct to be used in hashlist")
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-19 15:51:16 +02:00
Jakub Kicinski
7663d52209 net: check for altname conflicts when changing netdev's netns
It's currently possible to create an altname conflicting
with an altname or real name of another device by creating
it in another netns and moving it over:

 [ ~]$ ip link add dev eth0 type dummy

 [ ~]$ ip netns add test
 [ ~]$ ip -netns test link add dev ethX netns test type dummy
 [ ~]$ ip -netns test link property add dev ethX altname eth0
 [ ~]$ ip -netns test link set dev ethX netns 1

 [ ~]$ ip link
 ...
 3: eth0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
     link/ether 02:40:88:62:ec:b8 brd ff:ff:ff:ff:ff:ff
 ...
 5: ethX: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
     link/ether 26:b7:28:78:38:0f brd ff:ff:ff:ff:ff:ff
     altname eth0

Create a macro for walking the altnames, this hopefully makes
it clearer that the list we walk contains only altnames.
Which is otherwise not entirely intuitive.

Fixes: 36fbf1e52bd3 ("net: rtnetlink: add linkprop commands to add and delete alternative ifnames")
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-19 15:51:16 +02:00
Jakub Kicinski
311cca4066 net: fix ifname in netlink ntf during netns move
dev_get_valid_name() overwrites the netdev's name on success.
This makes it hard to use in prepare-commit-like fashion,
where we do validation first, and "commit" to the change
later.

Factor out a helper which lets us save the new name to a buffer.
Use it to fix the problem of notification on netns move having
incorrect name:

 5: eth0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
     link/ether be:4d:58:f9:d5:40 brd ff:ff:ff:ff:ff:ff
 6: eth1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
     link/ether 1e:4a:34:36:e3:cd brd ff:ff:ff:ff:ff:ff

 [ ~]# ip link set dev eth0 netns 1 name eth1

ip monitor inside netns:
 Deleted inet eth0
 Deleted inet6 eth0
 Deleted 5: eth1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
     link/ether be:4d:58:f9:d5:40 brd ff:ff:ff:ff:ff:ff new-netnsid 0 new-ifindex 7

Name is reported as eth1 in old netns for ifindex 5, already renamed.

Fixes: d90310243fd7 ("net: device name allocation cleanups")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-19 15:51:16 +02:00
Jakub Kicinski
9b9ac46c6c netfilter pr 2023-18-10
-----BEGIN PGP SIGNATURE-----
 
 iQJBBAABCAArFiEEgKkgxbID4Gn1hq6fcJGo2a1f9gAFAmUvzlINHGZ3QHN0cmxl
 bi5kZQAKCRBwkajZrV/2APelD/9b9FP1jNIX11OqPqmOL0bgGczU5OW3iI6VxLOx
 1qygw4pXHjtLGfNBEA7QtckbkFgXIxou/JvgavYBXY5mGDHDEJuV3JnX/HqIxKy9
 czAahqkdQ94/a0mI9wFuPY5ZeLTF/yGAe1iLKKWf4f//QwTc7dCfDsGjdKeKLHwL
 ye0f0MRwIwojTjBCf+szlacYrVE+Jgog+GFKX6F3eUtA+JOhpspwWNCPt5W/IQg/
 Q9Bl0dRQeqbIUzaB2YxvIFxB80NG4dg31ThYfgFW8IBz6bTYm3Vml4iorGSkYAqD
 c4PjKY/46/cpCAzkLd+5c+WE0YLMlq2u52KJXThUd5Lpzy3QG8KG5AMR1vXDVw2s
 7fLNeJGH+4zHDY3nXVTqCA/IV2x2sNwgfF6VZHvvkqpKhL5euaKJGu3a7/KGOhoI
 JO1Q6CBaTpZ3HuBNJoTI/uMeZJySlfv7080pHIypy7mYtTyChQC7/IvYpaI9yFwU
 Ka7wIR5MOJ/npaj6ZpVkbNQLntt+SbJINyTiJax5vO63FJ5kSrdgPZiR1PwtMHpZ
 br1e+2eFz8foxaEf6ADL/5QwfIvfglT4fVk/flsukJdyrsPxI2H7kYm/6G8Fcbu3
 TOZQmsNvqnqUpnQLXYe/YbtKoIvTAZOoGy/rcj+IIl2ym/2DiwbWx2nye6uwufIH
 UlP5gA==
 =M/4J
 -----END PGP SIGNATURE-----

Merge tag 'nf-23-10-18' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Florian Westphal says:

====================
netfilter: updates for net

First patch, from Phil Sutter, reduces number of audit notifications
when userspace requests to re-set stateful objects.
This change also comes with a selftest update.

Second patch, also from Phil, moves the nftables audit selftest
to its own netns to avoid interference with the init netns.

Third patch, from Pablo Neira, fixes an inconsistency with the "rbtree"
set backend: When set element X has expired, a request to delete element
X should fail (like with all other backends).

Finally, patch four, also from Pablo, reverts a recent attempt to speed
up abort of a large pending update with the "pipapo" set backend.

It could cause stray references to remain in the set, which then
results in a double-free.

* tag 'nf-23-10-18' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nf_tables: revert do not remove elements if set backend implements .abort
  netfilter: nft_set_rbtree: .deactivate fails if element has expired
  selftests: netfilter: Run nft_audit.sh in its own netns
  netfilter: nf_tables: audit log object reset once per table
====================

Link: https://lore.kernel.org/r/20231018125605.27299-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-18 18:17:51 -07:00
Jakub Kicinski
88343fbe5a A few more fixes:
* prevent value bounce/glitch in rfkill GPIO probe
  * fix lockdep report in rfkill
  * fix error path leak in mac80211 key handling
  * use system_unbound_wq for wiphy work since it
    can take longer
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAmUvhKQACgkQ10qiO8sP
 aAB+3A//eNEP5W9ZigsZ5pPYuxKyXoctyV3Qp8yChYWHEE46JbbnQsvvvGa6SYFu
 TzWS3dShLqXU83pWHrvNSuWo/AySaoT4kDp7c8TK7GIed8MHxguU9QXaPVqQiNEG
 tmAVwPz5y8Y4NzjytBU0MjuAfbWo/HTIs5nKxBhNTJp3imqlezCnhYz4pMjc0rcN
 CnWKH8bYW+ubqnJVi0l01GDwtGs4xROf+43is6sA8zu0zA3gx+V0BJdx3J7t1Jrd
 qCBmi8ZSFcUZ/GLqYE04MdlHGRrzGR5p+Rq+/MsF3AUoobIrbkVIXXP0VoPpLAqf
 xZwccVhWXI1ZGM3T5L91hrhONcwoCA67m79qhMcr2KZZ0flhBjNLVqCXXRAXn0Hs
 /gynjrvGVaZMYlJegAd0iOIAOn9FjYcdcYgYhDRFdbRCt7eCRSj3YMuO9rFhX6kL
 tbmIII6vMoPunzSwUeNI1CXUmJoTJMnzwWVPC0EZqT2gymhBAeS/GK8ELC3lG1dz
 aXoxX2dCHMk2rv86B3g/Pk2j0smxCNd+hXjFVfJIJ+t1xp8TvuEI8AGrae1nNzck
 c2cPn34u/5PLkO5bJqu2JgF9qAHgArqfPIFIKLZNcpAOvqCNWqSozc5KtNQ4ASMt
 f/cnPGQUxcHekAFsJWB9aqwW3oPAUkWR6tJ8jYHxZC47oGs/qnY=
 =Sth1
 -----END PGP SIGNATURE-----

Merge tag 'wireless-2023-10-18' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless

Johannes Berg says:

====================
A few more fixes:
 * prevent value bounce/glitch in rfkill GPIO probe
 * fix lockdep report in rfkill
 * fix error path leak in mac80211 key handling
 * use system_unbound_wq for wiphy work since it
   can take longer

* tag 'wireless-2023-10-18' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
  net: rfkill: reduce data->mtx scope in rfkill_fop_open
  net: rfkill: gpio: prevent value glitch during probe
  wifi: mac80211: fix error path key leak
  wifi: cfg80211: use system_unbound_wq for wiphy work
====================

Link: https://lore.kernel.org/r/20231018071041.8175-2-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-18 18:14:25 -07:00
Eric Dumazet
195374d893 ipv4: fib: annotate races around nh->nh_saddr_genid and nh->nh_saddr
syzbot reported a data-race while accessing nh->nh_saddr_genid [1]

Add annotations, but leave the code lazy as intended.

[1]
BUG: KCSAN: data-race in fib_select_path / fib_select_path

write to 0xffff8881387166f0 of 4 bytes by task 6778 on cpu 1:
fib_info_update_nhc_saddr net/ipv4/fib_semantics.c:1334 [inline]
fib_result_prefsrc net/ipv4/fib_semantics.c:1354 [inline]
fib_select_path+0x292/0x330 net/ipv4/fib_semantics.c:2269
ip_route_output_key_hash_rcu+0x659/0x12c0 net/ipv4/route.c:2810
ip_route_output_key_hash net/ipv4/route.c:2644 [inline]
__ip_route_output_key include/net/route.h:134 [inline]
ip_route_output_flow+0xa6/0x150 net/ipv4/route.c:2872
send4+0x1f5/0x520 drivers/net/wireguard/socket.c:61
wg_socket_send_skb_to_peer+0x94/0x130 drivers/net/wireguard/socket.c:175
wg_socket_send_buffer_to_peer+0xd6/0x100 drivers/net/wireguard/socket.c:200
wg_packet_send_handshake_initiation drivers/net/wireguard/send.c:40 [inline]
wg_packet_handshake_send_worker+0x10c/0x150 drivers/net/wireguard/send.c:51
process_one_work kernel/workqueue.c:2630 [inline]
process_scheduled_works+0x5b8/0xa30 kernel/workqueue.c:2703
worker_thread+0x525/0x730 kernel/workqueue.c:2784
kthread+0x1d7/0x210 kernel/kthread.c:388
ret_from_fork+0x48/0x60 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304

read to 0xffff8881387166f0 of 4 bytes by task 6759 on cpu 0:
fib_result_prefsrc net/ipv4/fib_semantics.c:1350 [inline]
fib_select_path+0x1cb/0x330 net/ipv4/fib_semantics.c:2269
ip_route_output_key_hash_rcu+0x659/0x12c0 net/ipv4/route.c:2810
ip_route_output_key_hash net/ipv4/route.c:2644 [inline]
__ip_route_output_key include/net/route.h:134 [inline]
ip_route_output_flow+0xa6/0x150 net/ipv4/route.c:2872
send4+0x1f5/0x520 drivers/net/wireguard/socket.c:61
wg_socket_send_skb_to_peer+0x94/0x130 drivers/net/wireguard/socket.c:175
wg_socket_send_buffer_to_peer+0xd6/0x100 drivers/net/wireguard/socket.c:200
wg_packet_send_handshake_initiation drivers/net/wireguard/send.c:40 [inline]
wg_packet_handshake_send_worker+0x10c/0x150 drivers/net/wireguard/send.c:51
process_one_work kernel/workqueue.c:2630 [inline]
process_scheduled_works+0x5b8/0xa30 kernel/workqueue.c:2703
worker_thread+0x525/0x730 kernel/workqueue.c:2784
kthread+0x1d7/0x210 kernel/kthread.c:388
ret_from_fork+0x48/0x60 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304

value changed: 0x959d3217 -> 0x959d3218

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 6759 Comm: kworker/u4:15 Not tainted 6.6.0-rc4-syzkaller-00029-gcbf3a2cb156a #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/06/2023
Workqueue: wg-kex-wg1 wg_packet_handshake_send_worker

Fixes: 436c3b66ec98 ("ipv4: Invalidate nexthop cache nh_saddr more correctly.")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20231017192304.82626-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-18 18:11:31 -07:00
Paolo Abeni
68b54aeff8 tcp_bpf: properly release resources on error paths
In the blamed commit below, I completely forgot to release the acquired
resources before erroring out in the TCP BPF code, as reported by Dan.

Address the issues by replacing the bogus return with a jump to the
relevant cleanup code.

Fixes: 419ce133ab92 ("tcp: allow again tcp_disconnect() when threads are waiting")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/r/8f99194c698bcef12666f0a9a999c58f8b1cb52c.1697557782.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-18 18:09:31 -07:00
Pedro Tammela
a13b67c9a0 net/sched: sch_hfsc: upgrade 'rt' to 'sc' when it becomes a inner curve
Christian Theune says:
   I upgraded from 6.1.38 to 6.1.55 this morning and it broke my traffic shaping script,
   leaving me with a non-functional uplink on a remote router.

A 'rt' curve cannot be used as a inner curve (parent class), but we were
allowing such configurations since the qdisc was introduced. Such
configurations would trigger a UAF as Budimir explains:
   The parent will have vttree_insert() called on it in init_vf(),
   but will not have vttree_remove() called on it in update_vf()
   because it does not have the HFSC_FSC flag set.

The qdisc always assumes that inner classes have the HFSC_FSC flag set.
This is by design as it doesn't make sense 'qdisc wise' for an 'rt'
curve to be an inner curve.

Budimir's original patch disallows users to add classes with a 'rt'
parent, but this is too strict as it breaks users that have been using
'rt' as a inner class. Another approach, taken by this patch, is to
upgrade the inner 'rt' into a 'sc', warning the user in the process.
It avoids the UAF reported by Budimir while also being more permissive
to bad scripts/users/code using 'rt' as a inner class.

Users checking the `tc class ls [...]` or `tc class get [...]` dumps would
observe the curve change and are potentially breaking with this change.

v1->v2: https://lore.kernel.org/all/20231013151057.2611860-1-pctammela@mojatatu.com/
- Correct 'Fixes' tag and merge with revert (Jakub)

Cc: Christian Theune <ct@flyingcircus.io>
Cc: Budimir Markovic <markovicbudimir@gmail.com>
Fixes: b3d26c5702c7 ("net/sched: sch_hfsc: Ensure inner classes have fsc curve")
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20231017143602.3191556-1-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-18 18:08:28 -07:00
Eric Dumazet
f921a4a5bf tcp: tsq: relax tcp_small_queue_check() when rtx queue contains a single skb
In commit 75eefc6c59fd ("tcp: tsq: add a shortcut in tcp_small_queue_check()")
we allowed to send an skb regardless of TSQ limits being hit if rtx queue
was empty or had a single skb, in order to better fill the pipe
when/if TX completions were slow.

Then later, commit 75c119afe14f ("tcp: implement rb-tree based
retransmit queue") accidentally removed the special case for
one skb in rtx queue.

Stefan Wahren reported a regression in single TCP flow throughput
using a 100Mbit fec link, starting from commit 65466904b015 ("tcp: adjust
TSO packet sizes based on min_rtt"). This last commit only made the
regression more visible, because it locked the TCP flow on a particular
behavior where TSQ prevented two skbs being pushed downstream,
adding silences on the wire between each TSO packet.

Many thanks to Stefan for his invaluable help !

Fixes: 75c119afe14f ("tcp: implement rb-tree based retransmit queue")
Link: https://lore.kernel.org/netdev/7f31ddc8-9971-495e-a1f6-819df542e0af@gmx.net/
Reported-by: Stefan Wahren <wahrenst@gmx.net>
Tested-by: Stefan Wahren <wahrenst@gmx.net>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Link: https://lore.kernel.org/r/20231017124526.4060202-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-18 18:06:36 -07:00
Pablo Neira Ayuso
f86fb94011 netfilter: nf_tables: revert do not remove elements if set backend implements .abort
nf_tables_abort_release() path calls nft_set_elem_destroy() for
NFT_MSG_NEWSETELEM which releases the element, however, a reference to
the element still remains in the working copy.

Fixes: ebd032fa8818 ("netfilter: nf_tables: do not remove elements if set backend implements .abort")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-18 13:47:32 +02:00
Pablo Neira Ayuso
d111692a59 netfilter: nft_set_rbtree: .deactivate fails if element has expired
This allows to remove an expired element which is not possible in other
existing set backends, this is more noticeable if gc-interval is high so
expired elements remain in the tree. On-demand gc also does not help in
this case, because this is delete element path. Return NULL if element
has expired.

Fixes: 8d8540c4f5e0 ("netfilter: nft_set_rbtree: add timeout support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-18 13:47:32 +02:00
Phil Sutter
1baf0152f7 netfilter: nf_tables: audit log object reset once per table
When resetting multiple objects at once (via dump request), emit a log
message per table (or filled skb) and resurrect the 'entries' parameter
to contain the number of objects being logged for.

To test the skb exhaustion path, perform some bulk counter and quota
adds in the kselftest.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Acked-by: Paul Moore <paul@paul-moore.com> (Audit)
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-18 13:43:40 +02:00
Gavrilov Ilia
1d30162f35 net: pktgen: Fix interface flags printing
Device flags are displayed incorrectly:
1) The comparison (i == F_FLOW_SEQ) is always false, because F_FLOW_SEQ
is equal to (1 << FLOW_SEQ_SHIFT) == 2048, and the maximum value
of the 'i' variable is (NR_PKT_FLAG - 1) == 17. It should be compared
with FLOW_SEQ_SHIFT.

2) Similarly to the F_IPSEC flag.

3) Also add spaces to the print end of the string literal "spi:%u"
to prevent the output from merging with the flag that follows.

Found by InfoTeCS on behalf of Linux Verification Center
(linuxtesting.org) with SVACE.

Fixes: 99c6d3d20d62 ("pktgen: Remove brute-force printing of flags")
Signed-off-by: Gavrilov Ilia <Ilia.Gavrilov@infotecs.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-18 10:12:30 +01:00
Jakub Kicinski
f6c7b42243 ipsec-2023-10-17
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEH7ZpcWbFyOOp6OJbrB3Eaf9PW7cFAmUuRZ4ACgkQrB3Eaf9P
 W7e7eg//fgQux/sJoq2Qf7T8cvVt3NSpOLXB43NRsIb9nUyMNN/cYTtypYM/+0FM
 f0zxmwtUkj9Mx/IL2nNzK/h8lMtJeKfy14tt8lAX2ux5oJltCcTiLgdp3C4HWxOh
 9DrXMGIryr0apedkcStSzhRoJBd8giFViAQZE8NwO5EKG8WLJNVvw0YzlNgM0WOk
 5y/sN1uXeUmUCYDkOGcdt6yIyV+GIo/nEg/XOY8LkaQDzKveDypydj0dPGL/Dj8U
 MhczTCf1WdbTHh0dmITIdX/yZ8/bPNfV3EzAtAaYgceVh1DSq8/F9buRXz2oxxvh
 r8Igv+640+SoWEtIsVoXHx7KIZ7LckasrJv1IoKRXGVV25LjnR+bxGVQNyUjdd6+
 Cb11Q/m66fp/k7B+++b6eFVlKvr9O4jBogb3kfGpqMiwm6LNM+CJicdAfM8/GEgM
 ejnVNSEaChhdgGvrXVbKhI2ACatwbPrt6d4PN0d9fpUbZJnuBZl4QHElLNSBznjO
 xJUF8LvPjFGx6vj3YFnBg5f3uNH35nR2jQxPsx+yWdBEFqbxBt7yKr/pftwP1gxD
 ZLHF8JbOT4J+96ClDRFnlWFdY+Phq0fg5S5WhPhNGQ4rj9rsjAS8ZMcHWZeA7iZ2
 0w5P/DlFmmnw6CC+Xr+wQJlgB1tZ2sdC8nAK6gxQJ3eFVv4wnhk=
 =yHtO
 -----END PGP SIGNATURE-----

Merge tag 'ipsec-2023-10-17' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec

Steffen Klassert says:

====================
pull request (net): ipsec 2023-10-17

1) Fix a slab-use-after-free in xfrm_policy_inexact_list_reinsert.
   From Dong Chenchen.

2) Fix data-races in the xfrm interfaces dev->stats fields.
   From Eric Dumazet.

3) Fix a data-race in xfrm_gen_index.
   From Eric Dumazet.

4) Fix an inet6_dev refcount underflow.
   From Zhang Changzhong.

5) Check the return value of pskb_trim in esp_remove_trailer
   for esp4 and esp6. From Ma Ke.

6) Fix a data-race in xfrm_lookup_with_ifid.
   From Eric Dumazet.

* tag 'ipsec-2023-10-17' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
  xfrm: fix a data-race in xfrm_lookup_with_ifid()
  net: ipv4: fix return value check in esp_remove_trailer
  net: ipv6: fix return value check in esp_remove_trailer
  xfrm6: fix inet6_dev refcount underflow problem
  xfrm: fix a data-race in xfrm_gen_index()
  xfrm: interface: use DEV_STATS_INC()
  net: xfrm: skip policies marked as dead while reinserting policies
====================

Link: https://lore.kernel.org/r/20231017083723.1364940-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-17 18:21:13 -07:00
Neal Cardwell
1c2709cfff tcp: fix excessive TLP and RACK timeouts from HZ rounding
We discovered from packet traces of slow loss recovery on kernels with
the default HZ=250 setting (and min_rtt < 1ms) that after reordering,
when receiving a SACKed sequence range, the RACK reordering timer was
firing after about 16ms rather than the desired value of roughly
min_rtt/4 + 2ms. The problem is largely due to the RACK reorder timer
calculation adding in TCP_TIMEOUT_MIN, which is 2 jiffies. On kernels
with HZ=250, this is 2*4ms = 8ms. The TLP timer calculation has the
exact same issue.

This commit fixes the TLP transmit timer and RACK reordering timer
floor calculation to more closely match the intended 2ms floor even on
kernels with HZ=250. It does this by adding in a new
TCP_TIMEOUT_MIN_US floor of 2000 us and then converting to jiffies,
instead of the current approach of converting to jiffies and then
adding th TCP_TIMEOUT_MIN value of 2 jiffies.

Our testing has verified that on kernels with HZ=1000, as expected,
this does not produce significant changes in behavior, but on kernels
with the default HZ=250 the latency improvement can be large. For
example, our tests show that for HZ=250 kernels at low RTTs this fix
roughly halves the latency for the RACK reorder timer: instead of
mostly firing at 16ms it mostly fires at 8ms.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Fixes: bb4d991a28cc ("tcp: adjust tail loss probe timeout")
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20231015174700.2206872-1-ncardwell.sw@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-17 17:25:42 -07:00
Jakub Kicinski
2b10740ce7 bluetooth pull request for net:
- Fix race when opening vhci device
  - Avoid memcmp() out of bounds warning
  - Correctly bounds check and pad HCI_MON_NEW_INDEX name
  - Fix using memcmp when comparing keys
  - Ignore error return for hci_devcd_register() in btrtl
  - Always check if connection is alive before deleting
  - Fix a refcnt underflow problem for hci_conn
 -----BEGIN PGP SIGNATURE-----
 
 iQJNBAABCAA3FiEE7E6oRXp8w05ovYr/9JCA4xAyCykFAmUqBjIZHGx1aXoudm9u
 LmRlbnR6QGludGVsLmNvbQAKCRD0kIDjEDILKeVmD/95NHJd61l0JpQw9yYNjDc+
 8C2tAa4rDrbooqfhgS3PPy4SQ9X07KO9VRKaQVEjnVahnfIsbE++sptBWjMSQz94
 8cRde2NJs2uUnl0Eovv9hcE/dgscXu5Lgt/fCCOyY6YwbsyFW5FmYcNUSzOLu83i
 6KuPigt2n/bFC7pnB/VROmO3WusNVFX1J8unZa/MvMK5wn6DQkJeZgJlFlgJSfeg
 Tou4SwI8ormUsDAjV3+dVl7vZVNpyX4mv5cR6RlwHEtvcnkFIj6+ZE96M1BwOBrm
 zbc1OyzrXt5jXuSORlOQEAfjZUab+ygrv1tCgpVkQf+vnIiREsN/57NS6J/kmbc9
 tbOqfJwYokRwjKtXsQ2bc0Jm5kvA0j/9Mrep4E3UZpTv2LwjGTaN/wXGNu1qW8+d
 LTgzqwmgx/PwqC4oiAFrT7guPDwoHzFqSwnvdhkdr04dH19zk7IjIm8ZrTruPDDk
 3wiMOijCtx31RBlRPmfb2njostkS+ysrW1bw3vQmfs04djvcgar0MqKdauXIuBvE
 QMB+dyvPwCFQ8OYHQwfuwYfdHSgG7v+f35IEeflB4OISbZI4aFWkQnjvcloFpnmP
 m/vo+y86ORlxJiHlJypgwqxZ42nAxpGUCwxxhiMDeiqp4GzxUYcwK5RteOcDkDTb
 /ukSEQz/suO3Fya2kTz4mw==
 =SY+z
 -----END PGP SIGNATURE-----

Merge tag 'for-net-2023-10-13' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth

Luiz Augusto von Dentz says:

====================
bluetooth pull request for net:

 - Fix race when opening vhci device
 - Avoid memcmp() out of bounds warning
 - Correctly bounds check and pad HCI_MON_NEW_INDEX name
 - Fix using memcmp when comparing keys
 - Ignore error return for hci_devcd_register() in btrtl
 - Always check if connection is alive before deleting
 - Fix a refcnt underflow problem for hci_conn

* tag 'for-net-2023-10-13' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
  Bluetooth: hci_sock: Correctly bounds check and pad HCI_MON_NEW_INDEX name
  Bluetooth: avoid memcmp() out of bounds warning
  Bluetooth: hci_sock: fix slab oob read in create_monitor_event
  Bluetooth: btrtl: Ignore error return for hci_devcd_register()
  Bluetooth: hci_event: Fix coding style
  Bluetooth: hci_event: Fix using memcmp when comparing keys
  Bluetooth: Fix a refcnt underflow problem for hci_conn
  Bluetooth: hci_sync: always check if connection is alive before deleting
  Bluetooth: Reject connection with the device which has same BD_ADDR
  Bluetooth: hci_event: Ignore NULL link key
  Bluetooth: ISO: Fix invalid context error
  Bluetooth: vhci: Fix race when opening vhci device
====================

Link: https://lore.kernel.org/r/20231014031336.1664558-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-16 18:02:20 -07:00
Krzysztof Kozlowski
7937609cd3 nfc: nci: fix possible NULL pointer dereference in send_acknowledge()
Handle memory allocation failure from nci_skb_alloc() (calling
alloc_skb()) to avoid possible NULL pointer dereference.

Reported-by: 黄思聪 <huangsicong@iie.ac.cn>
Fixes: 391d8a2da787 ("NFC: Add NCI over SPI receive")
Cc: <stable@vger.kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20231013184129.18738-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-16 17:34:53 -07:00
Christoph Paasch
503930f8e1 netlink: Correct offload_xstats size
rtnl_offload_xstats_get_size_hw_s_info_one() conditionalizes the
size-computation for IFLA_OFFLOAD_XSTATS_HW_S_INFO_USED based on whether
or not the device has offload_xstats enabled.

However, rtnl_offload_xstats_fill_hw_s_info_one() is adding the u8 for
that field uncondtionally.

syzkaller triggered a WARNING in rtnl_stats_get due to this:
------------[ cut here ]------------
WARNING: CPU: 0 PID: 754 at net/core/rtnetlink.c:5982 rtnl_stats_get+0x2f4/0x300
Modules linked in:
CPU: 0 PID: 754 Comm: syz-executor148 Not tainted 6.6.0-rc2-g331b78eb12af #45
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
RIP: 0010:rtnl_stats_get+0x2f4/0x300 net/core/rtnetlink.c:5982
Code: ff ff 89 ee e8 7d 72 50 ff 83 fd a6 74 17 e8 33 6e 50 ff 4c 89 ef be 02 00 00 00 e8 86 00 fa ff e9 7b fe ff ff e8 1c 6e 50 ff <0f> 0b eb e5 e8 73 79 7b 00 0f 1f 00 90 90 90 90 90 90 90 90 90 90
RSP: 0018:ffffc900006837c0 EFLAGS: 00010293
RAX: ffffffff81cf7f24 RBX: ffff8881015d9000 RCX: ffff888101815a00
RDX: 0000000000000000 RSI: 00000000ffffffa6 RDI: 00000000ffffffa6
RBP: 00000000ffffffa6 R08: ffffffff81cf7f03 R09: 0000000000000001
R10: ffff888101ba47b9 R11: ffff888101815a00 R12: ffff8881017dae00
R13: ffff8881017dad00 R14: ffffc90000683ab8 R15: ffffffff83c1f740
FS:  00007fbc22dbc740(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000046 CR3: 000000010264e003 CR4: 0000000000170ef0
Call Trace:
 <TASK>
 rtnetlink_rcv_msg+0x677/0x710 net/core/rtnetlink.c:6480
 netlink_rcv_skb+0xea/0x1c0 net/netlink/af_netlink.c:2545
 netlink_unicast+0x430/0x500 net/netlink/af_netlink.c:1342
 netlink_sendmsg+0x4fc/0x620 net/netlink/af_netlink.c:1910
 sock_sendmsg+0xa8/0xd0 net/socket.c:730
 ____sys_sendmsg+0x22a/0x320 net/socket.c:2541
 ___sys_sendmsg+0x143/0x190 net/socket.c:2595
 __x64_sys_sendmsg+0xd8/0x150 net/socket.c:2624
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x47/0xa0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x6e/0xd8
RIP: 0033:0x7fbc22e8d6a9
Code: 5c c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 4f 37 0d 00 f7 d8 64 89 01 48
RSP: 002b:00007ffc4320e778 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00000000004007d0 RCX: 00007fbc22e8d6a9
RDX: 0000000000000000 RSI: 0000000020000000 RDI: 0000000000000003
RBP: 0000000000000001 R08: 0000000000000000 R09: 00000000004007d0
R10: 0000000000000008 R11: 0000000000000246 R12: 00007ffc4320e898
R13: 00007ffc4320e8a8 R14: 00000000004004a0 R15: 00007fbc22fa5a80
 </TASK>
---[ end trace 0000000000000000 ]---

Which didn't happen prior to commit bf9f1baa279f ("net: add dedicated
kmem_cache for typical/small skb->head") as the skb always was large
enough.

Fixes: 0e7788fd7622 ("net: rtnetlink: Add UAPI for obtaining L3 offload xstats")
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/20231013041448.8229-1-cpaasch@apple.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-16 17:32:59 -07:00
Dust Li
4abbd2e3c1 net/smc: return the right falback reason when prefix checks fail
In the smc_listen_work(), if smc_listen_prfx_check() failed,
the real reason: SMC_CLC_DECL_DIFFPREFIX was dropped, and
SMC_CLC_DECL_NOSMCDEV was returned.

Althrough this is also kind of SMC_CLC_DECL_NOSMCDEV, but return
the real reason is much friendly for debugging.

Fixes: e49300a6bf62 ("net/smc: add listen processing for SMC-Rv2")
Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Link: https://lore.kernel.org/r/20231012123729.29307-1-dust.li@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-16 16:37:40 -07:00
Kees Cook
cb3871b1cd Bluetooth: hci_sock: Correctly bounds check and pad HCI_MON_NEW_INDEX name
The code pattern of memcpy(dst, src, strlen(src)) is almost always
wrong. In this case it is wrong because it leaves memory uninitialized
if it is less than sizeof(ni->name), and overflows ni->name when longer.

Normally strtomem_pad() could be used here, but since ni->name is a
trailing array in struct hci_mon_new_index, compilers that don't support
-fstrict-flex-arrays=3 can't tell how large this array is via
__builtin_object_size(). Instead, open-code the helper and use sizeof()
since it will work correctly.

Additionally mark ni->name as __nonstring since it appears to not be a
%NUL terminated C string.

Cc: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Cc: Edward AD <twuufnxlz@gmail.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: linux-bluetooth@vger.kernel.org
Cc: netdev@vger.kernel.org
Fixes: 18f547f3fc07 ("Bluetooth: hci_sock: fix slab oob read in create_monitor_event")
Link: https://lore.kernel.org/lkml/202310110908.F2639D3276@keescook/
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-10-13 20:06:33 -07:00
Arnd Bergmann
9d1a3c7474 Bluetooth: avoid memcmp() out of bounds warning
bacmp() is a wrapper around memcpy(), which contain compile-time
checks for buffer overflow. Since the hci_conn_request_evt() also calls
bt_dev_dbg() with an implicit NULL pointer check, the compiler is now
aware of a case where 'hdev' is NULL and treats this as meaning that
zero bytes are available:

In file included from net/bluetooth/hci_event.c:32:
In function 'bacmp',
    inlined from 'hci_conn_request_evt' at net/bluetooth/hci_event.c:3276:7:
include/net/bluetooth/bluetooth.h:364:16: error: 'memcmp' specified bound 6 exceeds source size 0 [-Werror=stringop-overread]
  364 |         return memcmp(ba1, ba2, sizeof(bdaddr_t));
      |                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Add another NULL pointer check before the bacmp() to ensure the compiler
understands the code flow enough to not warn about it.  Since the patch
that introduced the warning is marked for stable backports, this one
should also go that way to avoid introducing build regressions.

Fixes: 1ffc6f8cc332 ("Bluetooth: Reject connection with the device which has same BD_ADDR")
Cc: Kees Cook <keescook@chromium.org>
Cc: "Lee, Chun-Yi" <jlee@suse.com>
Cc: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: stable@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-10-13 20:04:45 -07:00
Edward AD
18f547f3fc Bluetooth: hci_sock: fix slab oob read in create_monitor_event
When accessing hdev->name, the actual string length should prevail

Reported-by: syzbot+c90849c50ed209d77689@syzkaller.appspotmail.com
Fixes: dcda165706b9 ("Bluetooth: hci_core: Fix build warnings")
Signed-off-by: Edward AD <twuufnxlz@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-10-13 20:03:04 -07:00
Luiz Augusto von Dentz
35d91d95a0 Bluetooth: hci_event: Fix coding style
This fixes the following code style problem:

ERROR: that open brace { should be on the previous line
+	if (!bacmp(&hdev->bdaddr, &ev->bdaddr))
+	{

Fixes: 1ffc6f8cc332 ("Bluetooth: Reject connection with the device which has same BD_ADDR")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-10-13 20:02:35 -07:00
Luiz Augusto von Dentz
b541260615 Bluetooth: hci_event: Fix using memcmp when comparing keys
memcmp is not consider safe to use with cryptographic secrets:

 'Do  not  use memcmp() to compare security critical data, such as
 cryptographic secrets, because the required CPU time depends on the
 number of equal bytes.'

While usage of memcmp for ZERO_KEY may not be considered a security
critical data, it can lead to more usage of memcmp with pairing keys
which could introduce more security problems.

Fixes: 455c2ff0a558 ("Bluetooth: Fix BR/EDR out-of-band pairing with only initiator data")
Fixes: 33155c4aae52 ("Bluetooth: hci_event: Ignore NULL link key")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-10-13 20:00:25 -07:00
Jakub Kicinski
f50ee3a003 nf pull request 2023-10-12
-----BEGIN PGP SIGNATURE-----
 
 iQJBBAABCAArFiEEgKkgxbID4Gn1hq6fcJGo2a1f9gAFAmUnsR0NHGZ3QHN0cmxl
 bi5kZQAKCRBwkajZrV/2AJaDD/9WqsWX8KK+QaXMNdhuY2TYicRUrzAObeCOkPzU
 8M0S0pZiSsfZO4Tzlo30krSoLERnBYsedngEIzLHHlQ/V+65H8h9jY1p1oGZFKIg
 5ui9ZQbcvWLqKWTOaCneZuEjsGxodWvVokVubSAWw9XExuYqbfo8WOOKBZzAX4mt
 J7KznGlShGHithvssWrk8n2FQr8+ZqzRrJjpoNfQcKU1a+GEIiQvIIfm4f3P6baj
 yiuP2dXWsUXuDK63JmMUK2k3w1e/XwjPpCZ+uyoNxPCggu0X0u//MpM71CNpGAZ4
 g9V5dn+p6WsLh1gQ4QvWGgJqXPm8xU1mRux5WLnp5dHVwm+8bgGGy0O8QC+s9H63
 15pZmeaGcTd6zp3FxbnCWcWFbcW1jsYoscLrMyqjtgDmGIOBxQfia2oCyaN3Z37M
 vjRYtNFQbU6TbKZc0tKlh79geUykZL+ibtVuBiHkniZkAMOQXqUBkh3K1yfJ1j76
 w/GjIm+wFKn0KKij27mzebwTJNLbXdXWem5bzUg6lhC0X57Mcf3+L8VbNWPv220x
 drsI9uzw+YPE7neVbmXQ8hHyJKelG5cF0xgfykLT+6DnadiCpdrlrxwiMCP/POQ5
 XIPkjgyi1T3k/5Sv4kab7cZxbq1A14URJHASidT/JN2NCYlMV1A1CaeE1vhrZ5Z4
 BHgDRw==
 =mZQY
 -----END PGP SIGNATURE-----

Merge tag 'nf-23-10-12' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Florian Westphal says:

====================
netfilter updates for net

Patch 1, from Pablo Neira Ayuso, fixes a performance regression
(since 6.4) when a large pending set update has to be canceled towards
the end of the transaction.

Patch 2 from myself, silences an incorrect compiler warning reported
with a few (older) compiler toolchains.

Patch 3, from Kees Cook, adds __counted_by annotation to
nft_pipapo set backend type.  I took this for net instead of -next
given infra is already in place and no actual code change is made.

Patch 4, from Pablo Neira Ayso, disables timeout resets on
stateful element reset.  The rest should only affect internal object
state, e.g. reset a quota or counter, but not affect a pending timeout.

Patches 5 and 6 fix NULL dereferences in 'inner header' match,
control plane doesn't test for netlink attribute presence before
accessing them. Broken since feature was added in 6.2, fixes from
Xingyuan Mo.

Last patch, from myself, fixes a bogus rule match when skb has
a 0-length mac header, in this case we'd fetch data from network
header instead of canceling rule evaluation.  This is a day 0 bug,
present since nftables was merged in 3.13.

* tag 'nf-23-10-12' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nft_payload: fix wrong mac header matching
  nf_tables: fix NULL pointer dereference in nft_expr_inner_parse()
  nf_tables: fix NULL pointer dereference in nft_inner_init()
  netfilter: nf_tables: do not refresh timeout when resetting element
  netfilter: nf_tables: Annotate struct nft_pipapo_match with __counted_by
  netfilter: nfnetlink_log: silence bogus compiler warning
  netfilter: nf_tables: do not remove elements if set backend implements .abort
====================

Link: https://lore.kernel.org/r/20231012085724.15155-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-13 17:50:58 -07:00
Albert Huang
c68681ae46 net/smc: fix smc clc failed issue when netdevice not in init_net
If the netdevice is within a container and communicates externally
through network technologies such as VxLAN, we won't be able to find
routing information in the init_net namespace. To address this issue,
we need to add a struct net parameter to the smc_ib_find_route function.
This allow us to locate the routing information within the corresponding
net namespace, ensuring the correct completion of the SMC CLC interaction.

Fixes: e5c4744cfb59 ("net/smc: add SMC-Rv2 connection establishment")
Signed-off-by: Albert Huang <huangjie.albert@bytedance.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Link: https://lore.kernel.org/r/20231011074851.95280-1-huangjie.albert@bytedance.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-13 16:52:02 -07:00
Paolo Abeni
419ce133ab tcp: allow again tcp_disconnect() when threads are waiting
As reported by Tom, .NET and applications build on top of it rely
on connect(AF_UNSPEC) to async cancel pending I/O operations on TCP
socket.

The blamed commit below caused a regression, as such cancellation
can now fail.

As suggested by Eric, this change addresses the problem explicitly
causing blocking I/O operation to terminate immediately (with an error)
when a concurrent disconnect() is executed.

Instead of tracking the number of threads blocked on a given socket,
track the number of disconnect() issued on such socket. If such counter
changes after a blocking operation releasing and re-acquiring the socket
lock, error out the current operation.

Fixes: 4faeee0cf8a5 ("tcp: deny tcp_disconnect() when threads are waiting")
Reported-by: Tom Deseyn <tdeseyn@redhat.com>
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1886305
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/f3b95e47e3dbed840960548aebaa8d954372db41.1697008693.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-13 16:49:32 -07:00
Linus Torvalds
a1ef447dee Fixes for an overreaching WARN_ON, two error paths and a switch to
kernel_connect() which recently grown protection against someone using
 BPF to rewrite the address.  All but one marked for stable.
 -----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmUpYoQTHGlkcnlvbW92
 QGdtYWlsLmNvbQAKCRBKf944AhHziwvmCACK13dkAaupcHyteYPloBgtJLNixR3X
 6++nHCOXGtE7cK6n1snobFQgp/d5BqSKAeymyLqDjJOJVqG/5n8FR1gwcuY/Ogdj
 Aju2Mkt7R/R/V6kvmCbqGSwiOvxZP1gBFkKcluRNQkFNP3boKw4vmJGq29Rabbyr
 66NfZSETsR/H4JhAWvUyVmffrvxIx11THnvmrAnprGVKoK72HwOQFx0H4KLD2Hio
 aN/yiiR15yDS6hL8dfilt+QGO6o+ZWgvRl1GOzqjwISgfeUUYVknMJXrEf+20kHs
 kX3tWaLWVZsU1FyVFRO5HPYLAREjALXstFO4K1OHPERM7EipWNinIHoS
 =wyYP
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-6.6-rc6' of https://github.com/ceph/ceph-client

Pull ceph fixes from Ilya Dryomov:
 "Fixes for an overreaching WARN_ON, two error paths and a switch to
  kernel_connect() which recently grown protection against someone using
  BPF to rewrite the address.

  All but one marked for stable"

* tag 'ceph-for-6.6-rc6' of https://github.com/ceph/ceph-client:
  ceph: fix type promotion bug on 32bit systems
  libceph: use kernel_connect()
  ceph: remove unnecessary IS_ERR() check in ceph_fname_to_usr()
  ceph: fix incorrect revoked caps assert in ceph_fill_file_size()
2023-10-13 11:27:31 -07:00
Kuniyuki Iwashima
8702cf12e6 tcp: Fix listen() warning with v4-mapped-v6 address.
syzbot reported a warning [0] introduced by commit c48ef9c4aed3 ("tcp: Fix
bind() regression for v4-mapped-v6 non-wildcard address.").

After the cited commit, a v4 socket's address matches the corresponding
v4-mapped-v6 tb2 in inet_bind2_bucket_match_addr(), not vice versa.

During X.X.X.X -> ::ffff:X.X.X.X order bind()s, the second bind() uses
bhash and conflicts properly without checking bhash2 so that we need not
check if a v4-mapped-v6 sk matches the corresponding v4 address tb2 in
inet_bind2_bucket_match_addr().  However, the repro shows that we need
to check that in a no-conflict case.

The repro bind()s two sockets to the 2-tuples using SO_REUSEPORT and calls
listen() for the first socket:

  from socket import *

  s1 = socket()
  s1.setsockopt(SOL_SOCKET, SO_REUSEPORT, 1)
  s1.bind(('127.0.0.1', 0))

  s2 = socket(AF_INET6)
  s2.setsockopt(SOL_SOCKET, SO_REUSEPORT, 1)
  s2.bind(('::ffff:127.0.0.1', s1.getsockname()[1]))

  s1.listen()

The second socket should belong to the first socket's tb2, but the second
bind() creates another tb2 bucket because inet_bind2_bucket_find() returns
NULL in inet_csk_get_port() as the v4-mapped-v6 sk does not match the
corresponding v4 address tb2.

  bhash2[] -> tb2(::ffff:X.X.X.X) -> tb2(X.X.X.X)

Then, listen() for the first socket calls inet_csk_get_port(), where the
v4 address matches the v4-mapped-v6 tb2 and WARN_ON() is triggered.

To avoid that, we need to check if v4-mapped-v6 sk address matches with
the corresponding v4 address tb2 in inet_bind2_bucket_match().

The same checks are needed in inet_bind2_bucket_addr_match() too, so we
can move all checks there and call it from inet_bind2_bucket_match().

Note that now tb->family is just an address family of tb->(v6_)?rcv_saddr
and not of sockets in the bucket.  This could be refactored later by
defining tb->rcv_saddr as tb->v6_rcv_saddr.s6_addr32[3] and prepending
::ffff: when creating v4 tb2.

[0]:
WARNING: CPU: 0 PID: 5049 at net/ipv4/inet_connection_sock.c:587 inet_csk_get_port+0xf96/0x2350 net/ipv4/inet_connection_sock.c:587
Modules linked in:
CPU: 0 PID: 5049 Comm: syz-executor288 Not tainted 6.6.0-rc2-syzkaller-00018-g2cf0f7156238 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/04/2023
RIP: 0010:inet_csk_get_port+0xf96/0x2350 net/ipv4/inet_connection_sock.c:587
Code: 7c 24 08 e8 4c b6 8a 01 31 d2 be 88 01 00 00 48 c7 c7 e0 94 ae 8b e8 59 2e a3 f8 2e 2e 2e 31 c0 e9 04 fe ff ff e8 ca 88 d0 f8 <0f> 0b e9 0f f9 ff ff e8 be 88 d0 f8 49 8d 7e 48 e8 65 ca 5a 00 31
RSP: 0018:ffffc90003abfbf0 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff888026429100 RCX: 0000000000000000
RDX: ffff88807edcbb80 RSI: ffffffff88b73d66 RDI: ffff888026c49f38
RBP: ffff888026c49f30 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff9260f200
R13: ffff888026c49880 R14: 0000000000000000 R15: ffff888026429100
FS:  00005555557d5380(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000045ad50 CR3: 0000000025754000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 inet_csk_listen_start+0x155/0x360 net/ipv4/inet_connection_sock.c:1256
 __inet_listen_sk+0x1b8/0x5c0 net/ipv4/af_inet.c:217
 inet_listen+0x93/0xd0 net/ipv4/af_inet.c:239
 __sys_listen+0x194/0x270 net/socket.c:1866
 __do_sys_listen net/socket.c:1875 [inline]
 __se_sys_listen net/socket.c:1873 [inline]
 __x64_sys_listen+0x53/0x80 net/socket.c:1873
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f3a5bce3af9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 c1 17 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffc1a1c79e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000032
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3a5bce3af9
RDX: 00007f3a5bce3af9 RSI: 0000000000000000 RDI: 0000000000000003
RBP: 00007f3a5bd565f0 R08: 0000000000000006 R09: 0000000000000006
R10: 0000000000000006 R11: 0000000000000246 R12: 0000000000000001
R13: 431bde82d7b634db R14: 0000000000000001 R15: 0000000000000001
 </TASK>

Fixes: c48ef9c4aed3 ("tcp: Fix bind() regression for v4-mapped-v6 non-wildcard address.")
Reported-by: syzbot+71e724675ba3958edb31@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=71e724675ba3958edb31
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20231010013814.70571-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-13 10:01:49 -07:00
Eric Dumazet
de5724ca38 xfrm: fix a data-race in xfrm_lookup_with_ifid()
syzbot complains about a race in xfrm_lookup_with_ifid() [1]

When preparing commit 0a9e5794b21e ("xfrm: annotate data-race
around use_time") I thought xfrm_lookup_with_ifid() was modifying
a still private structure.

[1]
BUG: KCSAN: data-race in xfrm_lookup_with_ifid / xfrm_lookup_with_ifid

write to 0xffff88813ea41108 of 8 bytes by task 8150 on cpu 1:
xfrm_lookup_with_ifid+0xce7/0x12d0 net/xfrm/xfrm_policy.c:3218
xfrm_lookup net/xfrm/xfrm_policy.c:3270 [inline]
xfrm_lookup_route+0x3b/0x100 net/xfrm/xfrm_policy.c:3281
ip6_dst_lookup_flow+0x98/0xc0 net/ipv6/ip6_output.c:1246
send6+0x241/0x3c0 drivers/net/wireguard/socket.c:139
wg_socket_send_skb_to_peer+0xbd/0x130 drivers/net/wireguard/socket.c:178
wg_socket_send_buffer_to_peer+0xd6/0x100 drivers/net/wireguard/socket.c:200
wg_packet_send_handshake_initiation drivers/net/wireguard/send.c:40 [inline]
wg_packet_handshake_send_worker+0x10c/0x150 drivers/net/wireguard/send.c:51
process_one_work kernel/workqueue.c:2630 [inline]
process_scheduled_works+0x5b8/0xa30 kernel/workqueue.c:2703
worker_thread+0x525/0x730 kernel/workqueue.c:2784
kthread+0x1d7/0x210 kernel/kthread.c:388
ret_from_fork+0x48/0x60 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304

write to 0xffff88813ea41108 of 8 bytes by task 15867 on cpu 0:
xfrm_lookup_with_ifid+0xce7/0x12d0 net/xfrm/xfrm_policy.c:3218
xfrm_lookup net/xfrm/xfrm_policy.c:3270 [inline]
xfrm_lookup_route+0x3b/0x100 net/xfrm/xfrm_policy.c:3281
ip6_dst_lookup_flow+0x98/0xc0 net/ipv6/ip6_output.c:1246
send6+0x241/0x3c0 drivers/net/wireguard/socket.c:139
wg_socket_send_skb_to_peer+0xbd/0x130 drivers/net/wireguard/socket.c:178
wg_socket_send_buffer_to_peer+0xd6/0x100 drivers/net/wireguard/socket.c:200
wg_packet_send_handshake_initiation drivers/net/wireguard/send.c:40 [inline]
wg_packet_handshake_send_worker+0x10c/0x150 drivers/net/wireguard/send.c:51
process_one_work kernel/workqueue.c:2630 [inline]
process_scheduled_works+0x5b8/0xa30 kernel/workqueue.c:2703
worker_thread+0x525/0x730 kernel/workqueue.c:2784
kthread+0x1d7/0x210 kernel/kthread.c:388
ret_from_fork+0x48/0x60 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304

value changed: 0x00000000651cd9d1 -> 0x00000000651cd9d2

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 15867 Comm: kworker/u4:58 Not tainted 6.6.0-rc4-syzkaller-00016-g5e62ed3b1c8a #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/06/2023
Workqueue: wg-kex-wg2 wg_packet_handshake_send_worker

Fixes: 0a9e5794b21e ("xfrm: annotate data-race around use_time")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2023-10-13 07:57:27 +02:00