Commit Graph

74869 Commits

Author SHA1 Message Date
Ilan Peer
0914468adf wifi: cfg80211: Fix 6GHz scan configuration
When the scan request includes a non-broadcast BSSID, do not include
entries that do not match the given BSSID when adding the scan
parameters for 6GHz collocated scanning.
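A minimal userspace sketch of the filtering rule above. The `struct coloc_ap` type and `filter_coloc_aps()` are hypothetical stand-ins, not cfg80211 code. The kernel has real helpers for these comparisons (`is_broadcast_ether_addr()` and `ether_addr_equal()`), but the sketch uses `memcmp()` so that it builds on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical collocated AP entry; only the BSSID matters here. */
struct coloc_ap {
	unsigned char bssid[ETH_ALEN];
};

static bool is_broadcast_addr(const unsigned char *addr)
{
	static const unsigned char bcast[ETH_ALEN] = {
		0xff, 0xff, 0xff, 0xff, 0xff, 0xff
	};

	return memcmp(addr, bcast, ETH_ALEN) == 0;
}

/* Copy into 'out' the entries that may be scanned: all of them for a
 * broadcast request BSSID, otherwise only exact matches.
 * Returns the number of entries kept. */
static size_t filter_coloc_aps(const unsigned char *req_bssid,
			       const struct coloc_ap *in, size_t n,
			       struct coloc_ap *out)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++) {
		if (!is_broadcast_addr(req_bssid) &&
		    memcmp(in[i].bssid, req_bssid, ETH_ALEN) != 0)
			continue; /* does not match the requested BSSID */
		out[kept++] = in[i];
	}
	return kept;
}
```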

Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230918140607.6d31d2a96baf.I6c4e3e3075d1d1878ee41f45190fdc6b86f18708@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-25 08:41:11 +02:00
Colin Ian King
5b43bd71f4 wifi: cfg80211: make read-only array centers_80mhz static const
Don't populate the read-only array centers_80mhz on the stack;
instead, make it static const.
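A short sketch of the `static const` pattern. The function name is hypothetical, and the table holds only the 5 GHz 80 MHz channel centers (channels 42, 58, 106, 122, 138, 155 and 171, computed as 5000 + 5 × channel MHz). It is not claimed to match the contents of the kernel's `centers_80mhz` array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if 'freq' (MHz) is the center of a 5 GHz 80 MHz channel. */
static bool is_80mhz_center(unsigned int freq)
{
	/* static const: placed in read-only data once, instead of being
	 * rebuilt on the stack on every call. */
	static const unsigned short centers[] = {
		5210, 5290, 5530, 5610, 5690, 5775, 5855,
	};

	for (size_t i = 0; i < sizeof(centers) / sizeof(centers[0]); i++)
		if (centers[i] == freq)
			return true;
	return false;
}
```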

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://lore.kernel.org/r/20230919095205.24949-1-colin.i.king@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-25 08:40:35 +02:00
Johannes Berg
d097ae01eb wifi: mac80211: fix potential key leak
When returning from ieee80211_key_link(), the key needs to
have been either freed or successfully installed. This was missed
in a number of error paths; fix it.
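The invariant above ("freed or installed on every return") is commonly enforced with a single error label. Below is a hypothetical sketch, not mac80211 code: `key_link()`, its two failure conditions and the single key slot are all invented for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct key { int idx; };

static struct key *installed_key; /* hypothetical single key slot */
static int keys_freed;            /* counts frees, for checking */

static void key_free(struct key *key)
{
	free(key);
	keys_freed++;
}

/* Hypothetical key_link(): takes ownership of 'key'. On every path the
 * key is either installed (returns 0) or freed (returns -errno). */
static int key_link(struct key *key, bool have_sta, bool slot_busy)
{
	int ret;

	if (!have_sta) {
		ret = -ENOENT;
		goto err_free;
	}
	if (slot_busy) {
		ret = -EBUSY;
		goto err_free;
	}
	installed_key = key;
	return 0;

err_free:
	key_free(key); /* error paths must not leak the key */
	return ret;
}
```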

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-25 08:40:07 +02:00
Johannes Berg
31db78a492 wifi: mac80211: fix potential key use-after-free
When ieee80211_key_link() is called by ieee80211_gtk_rekey_add()
but returns 0 due to KRACK protection (identical key reinstall),
ieee80211_gtk_rekey_add() will still return a pointer into the
key, which is a potential use-after-free. This normally doesn't
happen, since it's only called by iwlwifi for WoWLAN rekey offload,
which has its own KRACK protection, but it is still better to fix.
Do that by returning an error code and converting it to success
only at the cfg80211 boundary, leaving the error in place for bad
callers of ieee80211_gtk_rekey_add().
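A hypothetical sketch of the approach described above. The commit message does not name the error code, so `-EALREADY` is an assumption, and all function names are stand-ins. An identical reinstall frees the new key and reports an error. The rekey helper therefore never returns a dangling pointer, and only the cfg80211-facing entry point maps the error back to success.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

struct key { unsigned char material[16]; };

static struct key *cur_key; /* hypothetical: currently installed key */

/* Consumes 'key'. Identical reinstall (KRACK protection): free the new
 * key and report -EALREADY (assumed code) instead of returning 0. */
static int key_link(struct key *key)
{
	if (cur_key && memcmp(cur_key->material, key->material,
			      sizeof(key->material)) == 0) {
		free(key);
		return -EALREADY;
	}
	free(cur_key); /* replace the old key */
	cur_key = key;
	return 0;
}

static struct key *key_alloc(const unsigned char *material)
{
	struct key *key = malloc(sizeof(*key));

	if (key)
		memcpy(key->material, material, sizeof(key->material));
	return key;
}

/* Rekey helper: returns the installed key, or NULL with *err set;
 * it never hands back a pointer to a freed key. */
static struct key *gtk_rekey_add(const unsigned char *material, int *err)
{
	struct key *key = key_alloc(material);

	if (!key) {
		*err = -ENOMEM;
		return NULL;
	}
	*err = key_link(key);
	return *err ? NULL : key;
}

/* cfg80211-facing entry point: only here does an identical reinstall
 * count as success. */
static int cfg_add_key(const unsigned char *material)
{
	struct key *key = key_alloc(material);
	int err;

	if (!key)
		return -ENOMEM;
	err = key_link(key);
	return err == -EALREADY ? 0 : err;
}
```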

Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Fixes: fdf7cb4185 ("mac80211: accept key reinstall without changing anything")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-25 08:40:04 +02:00
Paolo Abeni
e9cbc89067 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

No conflicts.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-21 21:49:45 +02:00
Linus Torvalds
27bbf45eae Networking fixes for 6.6-rc2, including fixes from netfilter and bpf
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge tag 'net-6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from netfilter and bpf.

  Current release - regressions:

   - bpf: adjust size_index according to the value of KMALLOC_MIN_SIZE

   - netfilter: fix entries val in rule reset audit log

   - eth: stmmac: fix incorrect rxq|txq_stats reference

  Previous releases - regressions:

   - ipv4: fix null-deref in ipv4_link_failure

   - netfilter:
      - fix several GC related issues
      - fix race between IPSET_CMD_CREATE and IPSET_CMD_SWAP

   - eth: team: fix null-ptr-deref when team device type is changed

   - eth: i40e: fix VF VLAN offloading when port VLAN is configured

   - eth: ionic: fix 16bit math issue when PAGE_SIZE >= 64KB

  Previous releases - always broken:

   - core: fix ETH_P_1588 flow dissector

   - mptcp: fix several connection hang-up conditions

   - bpf:
      - avoid deadlock when using queue and stack maps from NMI
      - add override check to kprobe multi link attach

   - hsr: properly parse HSRv1 supervisor frames.

   - eth: igc: fix infinite initialization loop with early XDP redirect

   - eth: octeon_ep: fix tx dma unmap len values in SG

   - eth: hns3: fix GRE checksum offload issue"

* tag 'net-6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (87 commits)
  sfc: handle error pointers returned by rhashtable_lookup_get_insert_fast()
  igc: Expose tx-usecs coalesce setting to user
  octeontx2-pf: Do xdp_do_flush() after redirects.
  bnxt_en: Flush XDP for bnxt_poll_nitroa0()'s NAPI
  net: ena: Flush XDP packets on error.
  net/handshake: Fix memory leak in __sock_create() and sock_alloc_file()
  net: hinic: Fix warning-hinic_set_vlan_fliter() warn: variable dereferenced before check 'hwdev'
  netfilter: ipset: Fix race between IPSET_CMD_CREATE and IPSET_CMD_SWAP
  netfilter: nf_tables: fix memleak when more than 255 elements expired
  netfilter: nf_tables: disable toggling dormant table state more than once
  vxlan: Add missing entries to vxlan_get_size()
  net: rds: Fix possible NULL-pointer dereference
  team: fix null-ptr-deref when team device type is changed
  net: bridge: use DEV_STATS_INC()
  net: hns3: add 5ms delay before clear firmware reset irq source
  net: hns3: fix fail to delete tc flower rules during reset issue
  net: hns3: only enable unicast promisc when mac table full
  net: hns3: fix GRE checksum offload issue
  net: hns3: add cmdq check for vf periodic service task
  net: stmmac: fix incorrect rxq|txq_stats reference
  ...
2023-09-21 11:28:16 -07:00
Arseniy Krasnov
581512a6dc vsock/virtio: MSG_ZEROCOPY flag support
This adds handling of the MSG_ZEROCOPY flag on the transmission path:

1) If this flag is set and zerocopy transmission is possible (enabled
   in socket options and allowed by the transport), a non-linear skb
   is created and filled with the pages of the user's buffer. These
   pages are locked in memory by 'get_user_pages()'.
2) Changes how the skb is owned: instead of 'skb_set_owner_sk_safe()',
   it calls 'skb_set_owner_w()'. The reason for this change is that
   '__zerocopy_sg_from_iter()' increments the socket's 'sk_wmem_alloc',
   so to decrement this field correctly, the proper skb destructor is
   needed: 'sock_wfree()'. This destructor is set by 'skb_set_owner_w()'.
3) Adds a new callback to 'struct virtio_transport': 'can_msgzerocopy'.
   If this callback is set, the transport needs an extra check to
   decide whether it can send the given number of buffers in zerocopy
   mode. Currently, the only transport that needs this callback is
   virtio, because it adds new buffers to the virtio queue and we need
   to check that the number of these buffers is less than the size of
   the queue (as required by the virtio spec). The vhost and loopback
   transports don't need this check.
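A sketch of the queue-size check in item 3, under stated assumptions: a 4096-byte page, one virtqueue buffer per user page, and the "less than the queue size" comparison exactly as worded in the commit message. The real callback's signature and its precise boundary condition are not given here, so `sketch_can_msgzerocopy()` is hypothetical. On the userspace side, zerocopy is normally enabled with the `SO_ZEROCOPY` socket option before sending with `MSG_ZEROCOPY`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size */

/* Number of pages touched by the user buffer [addr, addr + len); with
 * zerocopy, each page is assumed to become one virtqueue buffer. */
static size_t pages_spanned(uintptr_t addr, size_t len)
{
	if (len == 0)
		return 0;
	return (addr + len - 1) / SKETCH_PAGE_SIZE - addr / SKETCH_PAGE_SIZE + 1;
}

/* Hypothetical stand-in for 'can_msgzerocopy': per the commit message,
 * the number of buffers must be less than the queue size. */
static bool sketch_can_msgzerocopy(size_t bufs_num, size_t queue_size)
{
	return bufs_num < queue_size;
}
```

Note that an unaligned buffer can span one page more than its length suggests, which is why the sketch counts pages from the address rather than from the length alone.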

Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-21 12:34:00 +02:00
Arseniy Krasnov
4b0bf10eb0 vsock/virtio: non-linear skb handling for tap
For the tap device, a new skb is created and the data from the
current skb is copied to it. This adds copying of data from a
non-linear skb to the new skb.
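A simplified model of "copy a non-linear skb into a new flat buffer": the linear head comes first, then each fragment in order. `struct sketch_skb` and `copy_to_flat()` are hypothetical. In the kernel, a helper such as `skb_copy_bits()` performs this kind of gather copy, but the sketch does not claim that this patch uses it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_FRAGS 4

struct frag {
	const unsigned char *data;
	size_t len;
};

/* Hypothetical, simplified skb: a linear head plus page fragments. */
struct sketch_skb {
	const unsigned char *head;
	size_t head_len;
	struct frag frags[MAX_FRAGS];
	size_t nr_frags;
};

/* Copy the whole payload (head, then each fragment) into 'dst' of size
 * 'cap'. Returns bytes copied, or 0 if the payload does not fit. */
static size_t copy_to_flat(const struct sketch_skb *skb,
			   unsigned char *dst, size_t cap)
{
	size_t total = skb->head_len;
	size_t off;

	for (size_t i = 0; i < skb->nr_frags; i++)
		total += skb->frags[i].len;
	if (total > cap)
		return 0;

	if (skb->head_len)
		memcpy(dst, skb->head, skb->head_len);
	off = skb->head_len;
	for (size_t i = 0; i < skb->nr_frags; i++) {
		memcpy(dst + off, skb->frags[i].data, skb->frags[i].len);
		off += skb->frags[i].len;
	}
	return off;
}
```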

Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-21 12:34:00 +02:00
Arseniy Krasnov
64c99d2d6a vsock/virtio: support to send non-linear skb
For a non-linear skb, use the pages from its fragment array as
buffers in the virtio tx queue. These pages are already pinned by
'get_user_pages()' when such an skb is created.
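A hypothetical sketch of turning a non-linear skb into a scatter-gather list for the tx virtqueue without copying the payload. The layout is an assumption made for illustration: one entry for the packet header, one for any linear payload, and one per fragment page. It is not the driver's actual code.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FRAGS 4
#define MAX_SG (2 + MAX_FRAGS)

struct frag { void *page_data; size_t len; };

/* Hypothetical, simplified skb for this sketch. */
struct sketch_skb {
	void *hdr;      /* packet header */
	size_t hdr_len;
	void *lin;      /* linear payload, may be empty */
	size_t lin_len;
	struct frag frags[MAX_FRAGS]; /* pages already pinned */
	size_t nr_frags;
};

struct sg_entry { void *addr; size_t len; };

/* Build the scatter-gather list: header, linear part if present, then
 * one entry per fragment page. Returns the number of entries used. */
static size_t build_tx_sg(const struct sketch_skb *skb, struct sg_entry *sg)
{
	size_t n = 0;

	sg[n++] = (struct sg_entry){ skb->hdr, skb->hdr_len };
	if (skb->lin_len)
		sg[n++] = (struct sg_entry){ skb->lin, skb->lin_len };
	for (size_t i = 0; i < skb->nr_frags; i++)
		sg[n++] = (struct sg_entry){ skb->frags[i].page_data,
					     skb->frags[i].len };
	return n;
}
```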

Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-21 12:34:00 +02:00
Arseniy Krasnov
0df7cd3c13 vsock/virtio/vhost: read data from non-linear skb
This is a preparation patch for MSG_ZEROCOPY support. It adds handling
of non-linear skbs by replacing direct calls of 'memcpy_to_msg()' with
'skb_copy_datagram_iter()'. The main advantage of the latter is that it
can handle the paged part of the skb by using 'kmap()' on each page,
while for an skb without pages it behaves like a simple copy to the
iov iterator. This patch also adds a new field to the skb control
block: the current offset in the skb from which to read the next
portion of data (whether the skb is linear or not). The idea behind
this field is that 'skb_copy_datagram_iter()' handles both types of
skb internally; it just needs the offset from which to copy data from
the given skb. This offset is incremented on each read from the skb.
This approach simplifies handling of both linear and non-linear skbs,
because otherwise a linear skb would need 'skb_pull()' after reading
data from it, while a non-linear one would need 'data_len' updated.
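A simplified model of the offset-based read. The skb is never modified (no `skb_pull()`, no `data_len` update). Each read copies from the stored offset across the head and fragments, then advances the offset. `struct sketch_skb`, `copy_from()` and `sketch_read()` are hypothetical. `copy_from()` only imitates the "copy from an offset" behavior that the text attributes to `skb_copy_datagram_iter()`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_FRAGS 4

struct frag { const unsigned char *data; size_t len; };

/* Hypothetical skb: linear head + fragments, plus a read offset that
 * plays the role of the new control-block field. */
struct sketch_skb {
	const unsigned char *head;
	size_t head_len;
	struct frag frags[MAX_FRAGS];
	size_t nr_frags;
	size_t offset; /* bytes consumed by earlier reads */
};

static size_t skb_total_len(const struct sketch_skb *skb)
{
	size_t len = skb->head_len;

	for (size_t i = 0; i < skb->nr_frags; i++)
		len += skb->frags[i].len;
	return len;
}

/* Copy 'len' bytes starting at logical position 'pos'. Segment 0 is the
 * head, segments 1..nr_frags are the fragments. */
static void copy_from(const struct sketch_skb *skb, size_t pos,
		      unsigned char *dst, size_t len)
{
	for (size_t seg = 0; len > 0 && seg <= skb->nr_frags; seg++) {
		const unsigned char *base =
			seg ? skb->frags[seg - 1].data : skb->head;
		size_t seg_len = seg ? skb->frags[seg - 1].len : skb->head_len;
		size_t n;

		if (pos >= seg_len) {
			pos -= seg_len; /* skip this segment entirely */
			continue;
		}
		n = seg_len - pos < len ? seg_len - pos : len;
		memcpy(dst, base + pos, n);
		dst += n;
		len -= n;
		pos = 0;
	}
}

/* Read up to 'len' bytes from the current offset and advance it. */
static size_t sketch_read(struct sketch_skb *skb, unsigned char *dst,
			  size_t len)
{
	size_t left = skb_total_len(skb) - skb->offset;

	if (len > left)
		len = left;
	copy_from(skb, skb->offset, dst, len);
	skb->offset += len;
	return len;
}
```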

Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-21 12:34:00 +02:00
Paolo Abeni
ecf4392600 netfilter PR 2023-09-20

Merge tag 'nf-23-09-20' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Florian Westphal says:

====================
netfilter updates for net

The following three patches fix regressions in the netfilter subsystem:

1. Reject attempts to repeatedly toggle the 'dormant' flag in a single
   transaction.  Doing so makes nf_tables lose track of the real state
   vs. the desired state.  This ends with an attempt to unregister hooks
   that were never registered in the first place, which yields a splat.

2. Fix element counting in the new nftables garbage collection infra
   that came with 6.5: more than 255 expired elements wrap a counter,
   which results in a memory leak.

3. Since 6.4 ipset can BUG when a set is renamed while a CREATE command
   is in progress, fix from Jozsef Kadlecsik.
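Item 2 describes a counter that wraps after 255. The sketch below only illustrates 8-bit wraparound. The commit message gives neither the counter's actual type nor the exact path from the wrap to the leak, so both functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Counting expired elements in an 8-bit counter: the 256th element
 * wraps it back to 0, so later accounting no longer matches reality. */
static unsigned int count_expired_u8(unsigned int expired)
{
	uint8_t cnt = 0;

	for (unsigned int i = 0; i < expired; i++)
		cnt++; /* wraps modulo 256 */
	return cnt;
}

/* The same count in a wider type does not wrap for these sizes. */
static unsigned int count_expired_wide(unsigned int expired)
{
	unsigned int cnt = 0;

	for (unsigned int i = 0; i < expired; i++)
		cnt++;
	return cnt;
}
```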

* tag 'nf-23-09-20' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: ipset: Fix race between IPSET_CMD_CREATE and IPSET_CMD_SWAP
  netfilter: nf_tables: fix memleak when more than 255 elements expired
  netfilter: nf_tables: disable toggling dormant table state more than once
====================

Link: https://lore.kernel.org/r/20230920084156.4192-1-fw@strlen.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-21 11:09:45 +02:00
Luiz Augusto von Dentz
b938790e70 Bluetooth: hci_codec: Fix leaking content of local_codecs
The following memory leak can be observed when the controller supports
codecs, which are stored in the local_codecs list, but the list
elements are never freed:

unreferenced object 0xffff88800221d840 (size 32):
  comm "kworker/u3:0", pid 36, jiffies 4294898739 (age 127.060s)
  hex dump (first 32 bytes):
    f8 d3 02 03 80 88 ff ff 80 d8 21 02 80 88 ff ff  ..........!.....
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffffb324f557>] __kmalloc+0x47/0x120
    [<ffffffffb39ef37d>] hci_codec_list_add.isra.0+0x2d/0x160
    [<ffffffffb39ef643>] hci_read_codec_capabilities+0x183/0x270
    [<ffffffffb39ef9ab>] hci_read_supported_codecs+0x1bb/0x2d0
    [<ffffffffb39f162e>] hci_read_local_codecs_sync+0x3e/0x60
    [<ffffffffb39ff1b3>] hci_dev_open_sync+0x943/0x11e0
    [<ffffffffb396d55d>] hci_power_on+0x10d/0x3f0
    [<ffffffffb30c99b4>] process_one_work+0x404/0x800
    [<ffffffffb30ca134>] worker_thread+0x374/0x670
    [<ffffffffb30d9108>] kthread+0x188/0x1c0
    [<ffffffffb304db6b>] ret_from_fork+0x2b/0x50
    [<ffffffffb300206a>] ret_from_fork_asm+0x1a/0x30
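The leak above comes from list entries that are allocated (via `hci_codec_list_add()` in the backtrace) but never released. Below is a hypothetical sketch of the missing teardown. It uses a plain singly linked list instead of the kernel's `list_head`, and `codec_list_add()`/`codec_list_clear()` are stand-in names.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical cached codec entry on a singly linked list. */
struct codec_entry {
	struct codec_entry *next;
	unsigned char id;
};

static struct codec_entry *local_codecs;

static int codec_list_add(unsigned char id)
{
	struct codec_entry *e = malloc(sizeof(*e));

	if (!e)
		return -1;
	e->id = id;
	e->next = local_codecs;
	local_codecs = e;
	return 0;
}

/* Free every cached entry; without a call like this on teardown, each
 * entry allocated by codec_list_add() leaks. Returns entries freed. */
static unsigned int codec_list_clear(void)
{
	unsigned int n = 0;

	while (local_codecs) {
		struct codec_entry *e = local_codecs;

		local_codecs = e->next;
		free(e);
		n++;
	}
	return n;
}
```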

Cc: stable@vger.kernel.org
Fixes: 8961987f3f ("Bluetooth: Enumerate local supported codec and cache details")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-09-20 11:03:11 -07:00
Luiz Augusto von Dentz
dcda165706 Bluetooth: hci_core: Fix build warnings
This fixes the following warnings:

net/bluetooth/hci_core.c: In function ‘hci_register_dev’:
net/bluetooth/hci_core.c:2620:54: warning: ‘%d’ directive output may
be truncated writing between 1 and 10 bytes into a region of size 5
[-Wformat-truncation=]
 2620 |         snprintf(hdev->name, sizeof(hdev->name), "hci%d", id);
      |                                                      ^~
net/bluetooth/hci_core.c:2620:50: note: directive argument in the range
[0, 2147483647]
 2620 |         snprintf(hdev->name, sizeof(hdev->name), "hci%d", id);
      |                                                  ^~~~~~~
net/bluetooth/hci_core.c:2620:9: note: ‘snprintf’ output between 5 and
14 bytes into a destination of size 8
 2620 |         snprintf(hdev->name, sizeof(hdev->name), "hci%d", id);
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
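The numbers in the warning follow from the format: "hci" plus up to 10 digits for an `int` in [0, 2147483647], plus the terminating NUL, gives 5 to 14 bytes. An 8-byte buffer therefore only holds ids up to 9999. The sketch shows how `snprintf()` truncates and returns the length the full output would have had. The helper name is hypothetical, and this does not describe how the commit itself silenced the warning.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

#define NAME_SIZE 8 /* destination size reported in the warning */

/* snprintf() returns the untruncated length, so a return value >= size
 * means the name was cut short. */
static int format_hci_name(char *buf, size_t size, int id)
{
	return snprintf(buf, size, "hci%d", id);
}
```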

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-09-20 11:02:41 -07:00
Ying Hsu
1d8e801422 Bluetooth: Avoid redundant authentication
While executing the Android 13 CTS Verifier Secure Server test on a
ChromeOS device, it was observed that the Bluetooth host initiates
authentication for an RFCOMM connection after SSP completes.
When this happens, some Intel Bluetooth controllers, like AC9560, would
disconnect with "Connection Rejected due to Security Reasons (0x0e)".

Historically, BlueZ did not mandate this authentication while an
authenticated combination key was already in use for the connection.
This behavior changed with commit 7b5a9241b7
("Bluetooth: Introduce requirements for security level 4").
This patch addresses the disconnection issue described above by
restoring the previous behavior.
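A hypothetical sketch of the restored rule: skip a new authentication when an authenticated combination key is already in use. The link key type values come from the Bluetooth Core Specification (0x05 and 0x08 are the authenticated P-192 and P-256 combination keys). The decision function is a simplification that ignores the requested security level, which the real kernel logic also takes into account.

```c
#include <assert.h>
#include <stdbool.h>

/* Link key types as defined by the Bluetooth Core Specification. */
enum link_key_type {
	LK_COMBINATION      = 0x00,
	LK_UNAUTH_COMB_P192 = 0x04,
	LK_AUTH_COMB_P192   = 0x05,
	LK_UNAUTH_COMB_P256 = 0x07,
	LK_AUTH_COMB_P256   = 0x08,
};

static bool key_is_authenticated(enum link_key_type type)
{
	return type == LK_AUTH_COMB_P192 || type == LK_AUTH_COMB_P256;
}

/* Hypothetical decision: only start authentication if the connection is
 * not already using an authenticated combination key. */
static bool need_authentication(bool key_in_use, enum link_key_type type)
{
	return !(key_in_use && key_is_authenticated(type));
}
```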

Signed-off-by: Ying Hsu <yinghsu@chromium.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-09-20 11:02:22 -07:00
Luiz Augusto von Dentz
e0275ea521 Bluetooth: ISO: Fix handling of listen for unicast
iso_listen_cis() shall only return -EADDRINUSE if the listening socket
has its destination set to BDADDR_ANY; if the destination is set to a
specific address, the socket is for broadcast and shall be ignored.
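A hypothetical sketch of the conflict rule above. `sketch_bdaddr_t`, `struct iso_listener` and `cis_listen_check()` are stand-ins, and matching listeners by source address is an assumption, not something the commit message states. Only an existing listener whose destination is the all-zero address (BDADDR_ANY) blocks a new unicast listener.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

typedef struct { unsigned char b[6]; } sketch_bdaddr_t;

static const sketch_bdaddr_t any_addr; /* all zeros, like BDADDR_ANY */

/* Hypothetical listening ISO socket. */
struct iso_listener {
	sketch_bdaddr_t src;
	sketch_bdaddr_t dst;
};

static int addr_cmp(const sketch_bdaddr_t *a, const sketch_bdaddr_t *b)
{
	return memcmp(a, b, sizeof(*a));
}

/* A new CIS listener on 'src' conflicts only with an existing listener
 * on the same source whose destination is BDADDR_ANY; listeners with a
 * specific destination are broadcast listeners and are ignored. */
static int cis_listen_check(const sketch_bdaddr_t *src,
			    const struct iso_listener *ls, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (addr_cmp(&ls[i].src, src))
			continue; /* different local address (assumed check) */
		if (addr_cmp(&ls[i].dst, &any_addr))
			continue; /* broadcast listener: ignore */
		return -EADDRINUSE;
	}
	return 0;
}
```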

Fixes: f764a6c2c1 ("Bluetooth: ISO: Add broadcast support")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-09-20 11:02:02 -07:00
Ying Hsu
c7eaf80bfb Bluetooth: Fix hci_link_tx_to RCU lock usage
Syzbot found a bug "BUG: sleeping function called from invalid context
at kernel/locking/mutex.c:580". It happens because hci_link_tx_to()
holds an RCU read lock while calling hci_disconnect(), which takes a
mutex since commit a13f316e90 ("Bluetooth: hci_conn: Consolidate code
for aborting connections"). Here's an example call trace:

   __dump_stack lib/dump_stack.c:88 [inline]
   dump_stack_lvl+0xfc/0x174 lib/dump_stack.c:106
   ___might_sleep+0x4a9/0x4d3 kernel/sched/core.c:9663
   __mutex_lock_common kernel/locking/mutex.c:576 [inline]
   __mutex_lock+0xc7/0x6e7 kernel/locking/mutex.c:732
   hci_cmd_sync_queue+0x3a/0x287 net/bluetooth/hci_sync.c:388
   hci_abort_conn+0x2cd/0x2e4 net/bluetooth/hci_conn.c:1812
   hci_disconnect+0x207/0x237 net/bluetooth/hci_conn.c:244
   hci_link_tx_to net/bluetooth/hci_core.c:3254 [inline]
   __check_timeout net/bluetooth/hci_core.c:3419 [inline]
   __check_timeout+0x310/0x361 net/bluetooth/hci_core.c:3399
   hci_sched_le net/bluetooth/hci_core.c:3602 [inline]
   hci_tx_work+0xe8f/0x12d0 net/bluetooth/hci_core.c:3652
   process_one_work+0x75c/0xba1 kernel/workqueue.c:2310
   worker_thread+0x5b2/0x73a kernel/workqueue.c:2457
   kthread+0x2f7/0x30b kernel/kthread.c:319
   ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:298

This patch releases the RCU read lock before calling hci_disconnect()
and reacquires it afterward to fix the bug.
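A toy model of the fix pattern. It is not real RCU: a depth counter stands in for the read-side critical section, and `might_sleep_sketch()` records any sleep attempted inside it. The fixed loop drops the lock around the sleeping call and retakes it afterwards. In real RCU code, pointers obtained before the unlock may no longer be valid after it, which this model does not capture.

```c
#include <assert.h>

/* Toy stand-in for RCU read-side nesting depth (not real RCU). */
static int rcu_depth;
static int sleeps_in_atomic; /* counts "sleeping in invalid context" */

static void rcu_read_lock_sketch(void)   { rcu_depth++; }
static void rcu_read_unlock_sketch(void) { rcu_depth--; }

/* Models might_sleep(): sleeping inside the read side is a bug. */
static void might_sleep_sketch(void)
{
	if (rcu_depth > 0)
		sleeps_in_atomic++;
}

/* Hypothetical disconnect that takes a mutex, i.e. may sleep. */
static void disconnect_sketch(void)
{
	might_sleep_sketch();
}

/* Buggy shape: the sleeping call happens under the read lock. */
static void link_tx_timeout_buggy(int nconns, int timed_out)
{
	rcu_read_lock_sketch();
	for (int i = 0; i < nconns; i++)
		if (i == timed_out)
			disconnect_sketch();
	rcu_read_unlock_sketch();
}

/* Fixed shape: drop the read lock around the sleeping call. */
static void link_tx_timeout_fixed(int nconns, int timed_out)
{
	rcu_read_lock_sketch();
	for (int i = 0; i < nconns; i++) {
		if (i != timed_out)
			continue;
		rcu_read_unlock_sketch();
		disconnect_sketch();
		rcu_read_lock_sketch();
	}
	rcu_read_unlock_sketch();
}
```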

Fixes: a13f316e90 ("Bluetooth: hci_conn: Consolidate code for aborting connections")
Signed-off-by: Ying Hsu <yinghsu@chromium.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-09-20 11:01:42 -07:00
Luiz Augusto von Dentz
941c998b42 Bluetooth: hci_sync: Fix handling of HCI_QUIRK_STRICT_DUPLICATE_FILTER
When HCI_QUIRK_STRICT_DUPLICATE_FILTER is set, LE scanning requires
periodic restarts of the scanning procedure, as the controller would
consider a previously found device a duplicate despite RSSI changes.
However, in order to set the scan timeout properly, le_scan_restart
needs to be synchronous, so it shall not use hci_cmd_sync_queue(),
which defers the command processing to cmd_sync_work.
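A toy model of why deferral matters for ordering. It is not the hci_sync machinery: a tiny work queue and an event log stand in for `hci_cmd_sync_queue()` and `cmd_sync_work`. When the restart is queued, the caller arms the timeout before the restart has actually run. A synchronous call keeps the intended order.

```c
#include <assert.h>
#include <string.h>

/* Toy event log and work queue; not the real Bluetooth code. */
static char event_log[64];
static void (*work_queue[4])(void);
static int work_count;

static void log_event(const char *ev)
{
	strncat(event_log, ev, sizeof(event_log) - strlen(event_log) - 1);
}

static void scan_restart(void)     { log_event("restart;"); }
static void arm_scan_timeout(void) { log_event("timeout;"); }

static void queue_work_sketch(void (*fn)(void))
{
	assert(work_count < 4);
	work_queue[work_count++] = fn;
}

/* Runs queued work later, like a worker picking it up. */
static void run_worker_sketch(void)
{
	for (int i = 0; i < work_count; i++)
		work_queue[i]();
	work_count = 0;
}

/* Deferred: the timeout is armed before the scan actually restarts. */
static void restart_deferred(void)
{
	queue_work_sketch(scan_restart);
	arm_scan_timeout();
	run_worker_sketch();
}

/* Synchronous: the scan restarts before the timeout is armed. */
static void restart_synchronous(void)
{
	scan_restart();
	arm_scan_timeout();
}
```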

Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/linux-bluetooth/578e6d7afd676129decafba846a933f5@agner.ch/#t
Fixes: 27d54b778a ("Bluetooth: Rework le_scan_restart for hci_sync")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-09-20 11:00:49 -07:00
Yao Xiao
cbaabbcdcb Bluetooth: Delete unused hci_req_prepare_suspend() declaration
hci_req_prepare_suspend() has been deprecated in favor of
hci_suspend_sync().

Fixes: 182ee45da0 ("Bluetooth: hci_sync: Rework hci_suspend_notifier")
Signed-off-by: Yao Xiao <xiaoyao@rock-chips.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-09-20 10:55:29 -07:00
Jinjie Ruan
4a0f07d71b net/handshake: Fix memory leak in __sock_create() and sock_alloc_file()
With CONFIG_DEBUG_KMEMLEAK=y and CONFIG_DEBUG_KMEMLEAK_AUTO_SCAN=y,
running modprobe handshake-test and then rmmod handshake-test reports
the memory leak below.

The struct socket_alloc allocated by alloc_inode_sb() in
__sock_create() is not freed, and neither is the struct dentry
allocated by __d_alloc() in sock_alloc_file().
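A toy userspace model of the ownership problem. It is not kernel VFS or socket code, and every `*_sketch` name is hypothetical. One assumption is built in: once a file is attached, `sock_release()` leaves the socket memory to the file and frees nothing. The model then shows that only dropping the file reference (the `fput()` path) releases the socket, the dentry and the file together.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

static int live_objects; /* tracked allocations still alive */

struct sketch_file;

struct sketch_socket {
	struct sketch_file *file; /* set once a file is attached */
};

struct sketch_file {
	struct sketch_socket *sock;
	void *dentry;
};

static void *alloc_tracked(size_t size)
{
	void *p = malloc(size);

	assert(p);
	live_objects++;
	return p;
}

static void free_tracked(void *p)
{
	free(p);
	live_objects--;
}

static struct sketch_socket *sock_create_sketch(void)
{
	struct sketch_socket *sock = alloc_tracked(sizeof(*sock));

	sock->file = NULL;
	return sock;
}

/* Attach a file (with a stand-in dentry) that now owns the socket. */
static struct sketch_file *sock_alloc_file_sketch(struct sketch_socket *sock)
{
	struct sketch_file *file = alloc_tracked(sizeof(*file));

	file->sock = sock;
	file->dentry = alloc_tracked(16);
	sock->file = file;
	return file;
}

/* Assumption: with a file attached, releasing the socket leaves its
 * memory to the file, so nothing is freed here. */
static void sock_release_sketch(struct sketch_socket *sock)
{
	if (!sock->file) {
		free_tracked(sock);
		return;
	}
	sock->file = NULL;
}

/* Dropping the last file reference frees socket, dentry and file. */
static void fput_sketch(struct sketch_file *file)
{
	free_tracked(file->sock);
	free_tracked(file->dentry);
	free_tracked(file);
}
```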

fput() calls file->f_op->release(), which here is sock_close() and in
turn calls __sock_release(); fput() also calls dput(dentry) to free the
struct dentry. So replace sock_release() with fput() to fix the memory
leak below. With this patch applied, the following leak is no longer
detected.

unreferenced object 0xffff888109165840 (size 768):
  comm "kunit_try_catch", pid 1852, jiffies 4294685807 (age 976.262s)
  hex dump (first 32 bytes):
    01 00 00 00 01 00 5a 5a 20 00 00 00 00 00 00 00  ......ZZ .......
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff8397993f>] sock_alloc_inode+0x1f/0x1b0
    [<ffffffff81a2cb5b>] alloc_inode+0x5b/0x1a0
    [<ffffffff81a32bed>] new_inode_pseudo+0xd/0x70
    [<ffffffff8397889c>] sock_alloc+0x3c/0x260
    [<ffffffff83979b46>] __sock_create+0x66/0x3d0
    [<ffffffffa0209ba2>] 0xffffffffa0209ba2
    [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
    [<ffffffff81236fc6>] kthread+0x2b6/0x380
    [<ffffffff81096afd>] ret_from_fork+0x2d/0x70
    [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff88810f472008 (size 192):
  comm "kunit_try_catch", pid 1852, jiffies 4294685808 (age 976.261s)
  hex dump (first 32 bytes):
    00 00 50 40 02 00 00 00 00 00 00 00 00 00 00 00  ..P@............
    00 00 00 00 00 00 00 00 08 20 47 0f 81 88 ff ff  ......... G.....
  backtrace:
    [<ffffffff81a1ff11>] __d_alloc+0x31/0x8a0
    [<ffffffff81a2910e>] d_alloc_pseudo+0xe/0x50
    [<ffffffff819d549e>] alloc_file_pseudo+0xce/0x210
    [<ffffffff83978582>] sock_alloc_file+0x42/0x1b0
    [<ffffffffa0209bbb>] 0xffffffffa0209bbb
    [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
    [<ffffffff81236fc6>] kthread+0x2b6/0x380
    [<ffffffff81096afd>] ret_from_fork+0x2d/0x70
    [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff88810958e580 (size 224):
  comm "kunit_try_catch", pid 1852, jiffies 4294685808 (age 976.261s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 03 00 2e 08 01 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff819d4b90>] alloc_empty_file+0x50/0x160
    [<ffffffff819d4cf9>] alloc_file+0x59/0x730
    [<ffffffff819d5524>] alloc_file_pseudo+0x154/0x210
    [<ffffffff83978582>] sock_alloc_file+0x42/0x1b0
    [<ffffffffa0209bbb>] 0xffffffffa0209bbb
    [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
    [<ffffffff81236fc6>] kthread+0x2b6/0x380
    [<ffffffff81096afd>] ret_from_fork+0x2d/0x70
    [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff88810926dc88 (size 192):
  comm "kunit_try_catch", pid 1854, jiffies 4294685809 (age 976.271s)
  hex dump (first 32 bytes):
    00 00 50 40 02 00 00 00 00 00 00 00 00 00 00 00  ..P@............
    00 00 00 00 00 00 00 00 88 dc 26 09 81 88 ff ff  ..........&.....
  backtrace:
    [<ffffffff81a1ff11>] __d_alloc+0x31/0x8a0
    [<ffffffff81a2910e>] d_alloc_pseudo+0xe/0x50
    [<ffffffff819d549e>] alloc_file_pseudo+0xce/0x210
    [<ffffffff83978582>] sock_alloc_file+0x42/0x1b0
    [<ffffffffa0208fdc>] 0xffffffffa0208fdc
    [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
    [<ffffffff81236fc6>] kthread+0x2b6/0x380
    [<ffffffff81096afd>] ret_from_fork+0x2d/0x70
    [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff88810a241380 (size 224):
  comm "kunit_try_catch", pid 1854, jiffies 4294685809 (age 976.271s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 03 00 2e 08 01 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff819d4b90>] alloc_empty_file+0x50/0x160
    [<ffffffff819d4cf9>] alloc_file+0x59/0x730
    [<ffffffff819d5524>] alloc_file_pseudo+0x154/0x210
    [<ffffffff83978582>] sock_alloc_file+0x42/0x1b0
    [<ffffffffa0208fdc>] 0xffffffffa0208fdc
    [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
    [<ffffffff81236fc6>] kthread+0x2b6/0x380
    [<ffffffff81096afd>] ret_from_fork+0x2d/0x70
    [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff888109165040 (size 768):
  comm "kunit_try_catch", pid 1856, jiffies 4294685811 (age 976.269s)
  hex dump (first 32 bytes):
    01 00 00 00 01 00 5a 5a 20 00 00 00 00 00 00 00  ......ZZ .......
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff8397993f>] sock_alloc_inode+0x1f/0x1b0
    [<ffffffff81a2cb5b>] alloc_inode+0x5b/0x1a0
    [<ffffffff81a32bed>] new_inode_pseudo+0xd/0x70
    [<ffffffff8397889c>] sock_alloc+0x3c/0x260
    [<ffffffff83979b46>] __sock_create+0x66/0x3d0
    [<ffffffffa0208860>] 0xffffffffa0208860
    [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
    [<ffffffff81236fc6>] kthread+0x2b6/0x380
    [<ffffffff81096afd>] ret_from_fork+0x2d/0x70
    [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff88810926d568 (size 192):
  comm "kunit_try_catch", pid 1856, jiffies 4294685811 (age 976.269s)
  hex dump (first 32 bytes):
    00 00 50 40 02 00 00 00 00 00 00 00 00 00 00 00  ..P@............
    00 00 00 00 00 00 00 00 68 d5 26 09 81 88 ff ff  ........h.&.....
  backtrace:
    [<ffffffff81a1ff11>] __d_alloc+0x31/0x8a0
    [<ffffffff81a2910e>] d_alloc_pseudo+0xe/0x50
    [<ffffffff819d549e>] alloc_file_pseudo+0xce/0x210
    [<ffffffff83978582>] sock_alloc_file+0x42/0x1b0
    [<ffffffffa0208879>] 0xffffffffa0208879
    [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
    [<ffffffff81236fc6>] kthread+0x2b6/0x380
    [<ffffffff81096afd>] ret_from_fork+0x2d/0x70
    [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff88810a240580 (size 224):
  comm "kunit_try_catch", pid 1856, jiffies 4294685811 (age 976.347s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 03 00 2e 08 01 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff819d4b90>] alloc_empty_file+0x50/0x160
    [<ffffffff819d4cf9>] alloc_file+0x59/0x730
    [<ffffffff819d5524>] alloc_file_pseudo+0x154/0x210
    [<ffffffff83978582>] sock_alloc_file+0x42/0x1b0
    [<ffffffffa0208879>] 0xffffffffa0208879
    [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
    [<ffffffff81236fc6>] kthread+0x2b6/0x380
    [<ffffffff81096afd>] ret_from_fork+0x2d/0x70
    [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff888109164c40 (size 768):
  comm "kunit_try_catch", pid 1858, jiffies 4294685816 (age 976.342s)
  hex dump (first 32 bytes):
    01 00 00 00 01 00 5a 5a 20 00 00 00 00 00 00 00  ......ZZ .......
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff8397993f>] sock_alloc_inode+0x1f/0x1b0
    [<ffffffff81a2cb5b>] alloc_inode+0x5b/0x1a0
    [<ffffffff81a32bed>] new_inode_pseudo+0xd/0x70
    [<ffffffff8397889c>] sock_alloc+0x3c/0x260
    [<ffffffff83979b46>] __sock_create+0x66/0x3d0
    [<ffffffffa0208541>] 0xffffffffa0208541
    [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
    [<ffffffff81236fc6>] kthread+0x2b6/0x380
    [<ffffffff81096afd>] ret_from_fork+0x2d/0x70
    [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff88810926cd18 (size 192):
  comm "kunit_try_catch", pid 1858, jiffies 4294685816 (age 976.342s)
  hex dump (first 32 bytes):
    00 00 50 40 02 00 00 00 00 00 00 00 00 00 00 00  ..P@............
    00 00 00 00 00 00 00 00 18 cd 26 09 81 88 ff ff  ..........&.....
  backtrace:
    [<ffffffff81a1ff11>] __d_alloc+0x31/0x8a0
    [<ffffffff81a2910e>] d_alloc_pseudo+0xe/0x50
    [<ffffffff819d549e>] alloc_file_pseudo+0xce/0x210
    [<ffffffff83978582>] sock_alloc_file+0x42/0x1b0
    [<ffffffffa020855a>] 0xffffffffa020855a
    [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
    [<ffffffff81236fc6>] kthread+0x2b6/0x380
    [<ffffffff81096afd>] ret_from_fork+0x2d/0x70
    [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff88810a240200 (size 224):
  comm "kunit_try_catch", pid 1858, jiffies 4294685816 (age 976.342s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 03 00 2e 08 01 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff819d4b90>] alloc_empty_file+0x50/0x160
    [<ffffffff819d4cf9>] alloc_file+0x59/0x730
    [<ffffffff819d5524>] alloc_file_pseudo+0x154/0x210
    [<ffffffff83978582>] sock_alloc_file+0x42/0x1b0
    [<ffffffffa020855a>] 0xffffffffa020855a
    [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
    [<ffffffff81236fc6>] kthread+0x2b6/0x380
    [<ffffffff81096afd>] ret_from_fork+0x2d/0x70
    [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff888109164840 (size 768):
  comm "kunit_try_catch", pid 1860, jiffies 4294685817 (age 976.416s)
  hex dump (first 32 bytes):
    01 00 00 00 01 00 5a 5a 20 00 00 00 00 00 00 00  ......ZZ .......
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff8397993f>] sock_alloc_inode+0x1f/0x1b0
    [<ffffffff81a2cb5b>] alloc_inode+0x5b/0x1a0
    [<ffffffff81a32bed>] new_inode_pseudo+0xd/0x70
    [<ffffffff8397889c>] sock_alloc+0x3c/0x260
    [<ffffffff83979b46>] __sock_create+0x66/0x3d0
    [<ffffffffa02093e2>] 0xffffffffa02093e2
    [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
    [<ffffffff81236fc6>] kthread+0x2b6/0x380
    [<ffffffff81096afd>] ret_from_fork+0x2d/0x70
    [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff88810926cab8 (size 192):
  comm "kunit_try_catch", pid 1860, jiffies 4294685817 (age 976.416s)
  hex dump (first 32 bytes):
    00 00 50 40 02 00 00 00 00 00 00 00 00 00 00 00  ..P@............
    00 00 00 00 00 00 00 00 b8 ca 26 09 81 88 ff ff  ..........&.....
  backtrace:
    [<ffffffff81a1ff11>] __d_alloc+0x31/0x8a0
    [<ffffffff81a2910e>] d_alloc_pseudo+0xe/0x50
    [<ffffffff819d549e>] alloc_file_pseudo+0xce/0x210
    [<ffffffff83978582>] sock_alloc_file+0x42/0x1b0
    [<ffffffffa02093fb>] 0xffffffffa02093fb
    [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
    [<ffffffff81236fc6>] kthread+0x2b6/0x380
    [<ffffffff81096afd>] ret_from_fork+0x2d/0x70
    [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff88810a240040 (size 224):
  comm "kunit_try_catch", pid 1860, jiffies 4294685817 (age 976.416s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 03 00 2e 08 01 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff819d4b90>] alloc_empty_file+0x50/0x160
    [<ffffffff819d4cf9>] alloc_file+0x59/0x730
    [<ffffffff819d5524>] alloc_file_pseudo+0x154/0x210
    [<ffffffff83978582>] sock_alloc_file+0x42/0x1b0
    [<ffffffffa02093fb>] 0xffffffffa02093fb
    [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
    [<ffffffff81236fc6>] kthread+0x2b6/0x380
    [<ffffffff81096afd>] ret_from_fork+0x2d/0x70
    [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff888109166440 (size 768):
  comm "kunit_try_catch", pid 1862, jiffies 4294685819 (age 976.489s)
  hex dump (first 32 bytes):
    01 00 00 00 01 00 5a 5a 20 00 00 00 00 00 00 00  ......ZZ .......
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff8397993f>] sock_alloc_inode+0x1f/0x1b0
    [<ffffffff81a2cb5b>] alloc_inode+0x5b/0x1a0
    [<ffffffff81a32bed>] new_inode_pseudo+0xd/0x70
    [<ffffffff8397889c>] sock_alloc+0x3c/0x260
    [<ffffffff83979b46>] __sock_create+0x66/0x3d0
    [<ffffffffa02097c1>] 0xffffffffa02097c1
    [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
    [<ffffffff81236fc6>] kthread+0x2b6/0x380
    [<ffffffff81096afd>] ret_from_fork+0x2d/0x70
    [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff88810926c398 (size 192):
  comm "kunit_try_catch", pid 1862, jiffies 4294685819 (age 976.489s)
  hex dump (first 32 bytes):
    00 00 50 40 02 00 00 00 00 00 00 00 00 00 00 00  ..P@............
    00 00 00 00 00 00 00 00 98 c3 26 09 81 88 ff ff  ..........&.....
  backtrace:
    [<ffffffff81a1ff11>] __d_alloc+0x31/0x8a0
    [<ffffffff81a2910e>] d_alloc_pseudo+0xe/0x50
    [<ffffffff819d549e>] alloc_file_pseudo+0xce/0x210
    [<ffffffff83978582>] sock_alloc_file+0x42/0x1b0
    [<ffffffffa02097da>] 0xffffffffa02097da
    [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
    [<ffffffff81236fc6>] kthread+0x2b6/0x380
    [<ffffffff81096afd>] ret_from_fork+0x2d/0x70
    [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff888107e0b8c0 (size 224):
  comm "kunit_try_catch", pid 1862, jiffies 4294685819 (age 976.489s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 03 00 2e 08 01 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff819d4b90>] alloc_empty_file+0x50/0x160
    [<ffffffff819d4cf9>] alloc_file+0x59/0x730
    [<ffffffff819d5524>] alloc_file_pseudo+0x154/0x210
    [<ffffffff83978582>] sock_alloc_file+0x42/0x1b0
    [<ffffffffa02097da>] 0xffffffffa02097da
    [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
    [<ffffffff81236fc6>] kthread+0x2b6/0x380
    [<ffffffff81096afd>] ret_from_fork+0x2d/0x70
    [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff888109164440 (size 768):
  comm "kunit_try_catch", pid 1864, jiffies 4294685821 (age 976.487s)
  hex dump (first 32 bytes):
    01 00 00 00 01 00 5a 5a 20 00 00 00 00 00 00 00  ......ZZ .......
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff8397993f>] sock_alloc_inode+0x1f/0x1b0
    [<ffffffff81a2cb5b>] alloc_inode+0x5b/0x1a0
    [<ffffffff81a32bed>] new_inode_pseudo+0xd/0x70
    [<ffffffff8397889c>] sock_alloc+0x3c/0x260
    [<ffffffff83979b46>] __sock_create+0x66/0x3d0
    [<ffffffffa020824e>] 0xffffffffa020824e
    [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
    [<ffffffff81236fc6>] kthread+0x2b6/0x380
    [<ffffffff81096afd>] ret_from_fork+0x2d/0x70
    [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff88810f4cf698 (size 192):
  comm "kunit_try_catch", pid 1864, jiffies 4294685821 (age 976.501s)
  hex dump (first 32 bytes):
    00 00 50 40 02 00 00 00 00 00 00 00 00 00 00 00  ..P@............
    00 00 00 00 00 00 00 00 98 f6 4c 0f 81 88 ff ff  ..........L.....
  backtrace:
    [<ffffffff81a1ff11>] __d_alloc+0x31/0x8a0
    [<ffffffff81a2910e>] d_alloc_pseudo+0xe/0x50
    [<ffffffff819d549e>] alloc_file_pseudo+0xce/0x210
    [<ffffffff83978582>] sock_alloc_file+0x42/0x1b0
    [<ffffffffa0208267>] 0xffffffffa0208267
    [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
    [<ffffffff81236fc6>] kthread+0x2b6/0x380
    [<ffffffff81096afd>] ret_from_fork+0x2d/0x70
    [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff888107e0b000 (size 224):
  comm "kunit_try_catch", pid 1864, jiffies 4294685821 (age 976.501s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 03 00 2e 08 01 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff819d4b90>] alloc_empty_file+0x50/0x160
    [<ffffffff819d4cf9>] alloc_file+0x59/0x730
    [<ffffffff819d5524>] alloc_file_pseudo+0x154/0x210
    [<ffffffff83978582>] sock_alloc_file+0x42/0x1b0
    [<ffffffffa0208267>] 0xffffffffa0208267
    [<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
    [<ffffffff81236fc6>] kthread+0x2b6/0x380
    [<ffffffff81096afd>] ret_from_fork+0x2d/0x70
    [<ffffffff81003511>] ret_from_fork_asm+0x11/0x20

Fixes: 88232ec1ec ("net/handshake: Add Kunit tests for the handshake consumer API")
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-20 11:54:49 +01:00
Colin Ian King
6c0da84063 wifi: cfg80211: make read-only array centers_80mhz static const
Don't populate the read-only array centers_80mhz on the stack, instead make
it static const.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-20 11:52:13 +01:00
Jozsef Kadlecsik
7433b6d2af netfilter: ipset: Fix race between IPSET_CMD_CREATE and IPSET_CMD_SWAP
Kyle Zeng reported that there is a race between IPSET_CMD_ADD and IPSET_CMD_SWAP
in netfilter/ip_set, which can lead to the invocation of `__ip_set_put` on a
wrong `set`, triggering the `BUG_ON(set->ref == 0);` check in it.

The race is caused by using the wrong reference counter, i.e. the ref counter instead
of ref_netlink.

Fixes: 24e227896b ("netfilter: ipset: Add schedule point in call_ad().")
Reported-by: Kyle Zeng <zengyhkyle@gmail.com>
Closes: https://lore.kernel.org/netfilter-devel/ZPZqetxOmH+w%2Fmyc@westworld/#r
Tested-by: Kyle Zeng <zengyhkyle@gmail.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-09-20 10:35:24 +02:00
Florian Westphal
cf5000a778 netfilter: nf_tables: fix memleak when more than 255 elements expired
When more than 255 elements expired we're supposed to switch to a new gc
container structure.

This never happens: u8 type will wrap before reaching the boundary
and nft_trans_gc_space() always returns true.

This means we recycle the initial gc container structure and
lose track of the elements that came before.

While at it, don't deref 'gc' after we've passed it to call_rcu.

Fixes: 5f68718b34 ("netfilter: nf_tables: GC transaction API to avoid race with control plane")
Reported-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-09-20 10:35:23 +02:00
Florian Westphal
c9bd26513b netfilter: nf_tables: disable toggling dormant table state more than once
nft -f -<<EOF
add table ip t
add table ip t { flags dormant; }
add chain ip t c { type filter hook input priority 0; }
add table ip t
EOF

This triggers a splat from nf core on the next table delete because we lose
track of the right hook register state:

WARNING: CPU: 2 PID: 1597 at net/netfilter/core.c:501 __nf_unregister_net_hook
RIP: 0010:__nf_unregister_net_hook+0x41b/0x570
 nf_unregister_net_hook+0xb4/0xf0
 __nf_tables_unregister_hook+0x160/0x1d0
[..]

The above should leave the table in the *active* state, but in fact no
hooks were registered.

Reject on/off/on games rather than attempting to fix this.

Fixes: 179d9ba555 ("netfilter: nf_tables: fix table flag updates")
Reported-by: "Lee, Cherie-Anne" <cherie.lee@starlabs.sg>
Cc: Bing-Jhong Billy Jheng <billy@starlabs.sg>
Cc: info@starlabs.sg
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-09-20 10:35:23 +02:00
Artem Chernyshev
f1d95df0f3 net: rds: Fix possible NULL-pointer dereference
In rds_rdma_cm_event_handler_cmn(), check whether the conn pointer exists
before dereferencing it as an rdma_set_service_type() argument.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: fd261ce6a3 ("rds: rdma: update rdma transport for tos")
Signed-off-by: Artem Chernyshev <artem.chernyshev@red-soft.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-20 08:49:03 +01:00
Eric Dumazet
fa17a6d8a5 ipv6: lockless IPV6_ADDR_PREFERENCES implementation
We have data races while reading np->srcprefs.

Switch the field to a plain byte, add READ_ONCE()
and WRITE_ONCE() annotations where needed,
and IPV6_ADDR_PREFERENCES setsockopt() can now be lockless.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230918142321.1794107-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-19 18:21:44 +02:00
Eric Dumazet
44bdb313da net: bridge: use DEV_STATS_INC()
syzbot/KCSAN reported data-races in br_handle_frame_finish() [1].
This function can run from multiple cpus without mutual exclusion.

Adopt SMP safe DEV_STATS_INC() to update dev->stats fields.

Also handle updates to dev->stats.tx_dropped while we are at it.

[1]
BUG: KCSAN: data-race in br_handle_frame_finish / br_handle_frame_finish

read-write to 0xffff8881374b2178 of 8 bytes by interrupt on cpu 1:
br_handle_frame_finish+0xd4f/0xef0 net/bridge/br_input.c:189
br_nf_hook_thresh+0x1ed/0x220
br_nf_pre_routing_finish_ipv6+0x50f/0x540
NF_HOOK include/linux/netfilter.h:304 [inline]
br_nf_pre_routing_ipv6+0x1e3/0x2a0 net/bridge/br_netfilter_ipv6.c:178
br_nf_pre_routing+0x526/0xba0 net/bridge/br_netfilter_hooks.c:508
nf_hook_entry_hookfn include/linux/netfilter.h:144 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:272 [inline]
br_handle_frame+0x4c9/0x940 net/bridge/br_input.c:417
__netif_receive_skb_core+0xa8a/0x21e0 net/core/dev.c:5417
__netif_receive_skb_one_core net/core/dev.c:5521 [inline]
__netif_receive_skb+0x57/0x1b0 net/core/dev.c:5637
process_backlog+0x21f/0x380 net/core/dev.c:5965
__napi_poll+0x60/0x3b0 net/core/dev.c:6527
napi_poll net/core/dev.c:6594 [inline]
net_rx_action+0x32b/0x750 net/core/dev.c:6727
__do_softirq+0xc1/0x265 kernel/softirq.c:553
run_ksoftirqd+0x17/0x20 kernel/softirq.c:921
smpboot_thread_fn+0x30a/0x4a0 kernel/smpboot.c:164
kthread+0x1d7/0x210 kernel/kthread.c:388
ret_from_fork+0x48/0x60 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304

read-write to 0xffff8881374b2178 of 8 bytes by interrupt on cpu 0:
br_handle_frame_finish+0xd4f/0xef0 net/bridge/br_input.c:189
br_nf_hook_thresh+0x1ed/0x220
br_nf_pre_routing_finish_ipv6+0x50f/0x540
NF_HOOK include/linux/netfilter.h:304 [inline]
br_nf_pre_routing_ipv6+0x1e3/0x2a0 net/bridge/br_netfilter_ipv6.c:178
br_nf_pre_routing+0x526/0xba0 net/bridge/br_netfilter_hooks.c:508
nf_hook_entry_hookfn include/linux/netfilter.h:144 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:272 [inline]
br_handle_frame+0x4c9/0x940 net/bridge/br_input.c:417
__netif_receive_skb_core+0xa8a/0x21e0 net/core/dev.c:5417
__netif_receive_skb_one_core net/core/dev.c:5521 [inline]
__netif_receive_skb+0x57/0x1b0 net/core/dev.c:5637
process_backlog+0x21f/0x380 net/core/dev.c:5965
__napi_poll+0x60/0x3b0 net/core/dev.c:6527
napi_poll net/core/dev.c:6594 [inline]
net_rx_action+0x32b/0x750 net/core/dev.c:6727
__do_softirq+0xc1/0x265 kernel/softirq.c:553
do_softirq+0x5e/0x90 kernel/softirq.c:454
__local_bh_enable_ip+0x64/0x70 kernel/softirq.c:381
__raw_spin_unlock_bh include/linux/spinlock_api_smp.h:167 [inline]
_raw_spin_unlock_bh+0x36/0x40 kernel/locking/spinlock.c:210
spin_unlock_bh include/linux/spinlock.h:396 [inline]
batadv_tt_local_purge+0x1a8/0x1f0 net/batman-adv/translation-table.c:1356
batadv_tt_purge+0x2b/0x630 net/batman-adv/translation-table.c:3560
process_one_work kernel/workqueue.c:2630 [inline]
process_scheduled_works+0x5b8/0xa30 kernel/workqueue.c:2703
worker_thread+0x525/0x730 kernel/workqueue.c:2784
kthread+0x1d7/0x210 kernel/kthread.c:388
ret_from_fork+0x48/0x60 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304

value changed: 0x00000000000d7190 -> 0x00000000000d7191

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 14848 Comm: kworker/u4:11 Not tainted 6.6.0-rc1-syzkaller-00236-gad8a69f361b9 #0

Fixes: 1c29fc4989 ("[BRIDGE]: keep track of received multicast packets")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Roopa Prabhu <roopa@nvidia.com>
Cc: Nikolay Aleksandrov <razor@blackwall.org>
Cc: bridge@lists.linux-foundation.org
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20230918091351.1356153-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-19 13:35:15 +02:00
Linus Torvalds
2cf0f71562 NFS Client Bugfixes for Linux 6.6
Bugfixes:
   * Various O_DIRECT related fixes from Trond
     * Error handling
     * Locking issues
     * Use the correct commit info for joining page groups
     * Fixes for rescheduling IO
   * Sunrpc bad verifier fixes
     * Report EINVAL errors from connect()
     * Revalidate creds that the server has rejected
     * Revert "SUNRPC: Fail faster on bad verifier"
   * Fix pNFS session trunking when MDS=DS
   * Fix zero-value filehandles for post-open getattr operations
   * Fix compiler warning about tautological comparisons
     * Revert "SUNRPC: clean up integer overflow check" before Trond's fix
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAmUIjesACgkQ18tUv7Cl
 QOv/MBAAxxZrW45qlTsEMvhnFXu1cvt08IdqivWURuSxK6oV31tARMqw00MxzhsF
 ubNX4hynDNmsWYJLryHTTGwUJzaNQOhAGjU46AhwFgS3Lj2wefeuR3IYuEDgQcHF
 8uXrrT/GWRIG4a/Uu1IxodfgMvgUPV+HGFc5BQFGkXGqPQdgbdxBDtOtyU1QlmaV
 rqKnxPgUibEZwVoz8+PNsN+pwNjKSaSR3pH53KKS6HUBAM69sT4jJiJDO/UGSlzb
 F9U1DHPeyltHXVAaQXUjHlsjh49NbIjzD5/T8xKWscvS7rbQFCcJkUl3MBHyvSEr
 eixx7muqOMCEm4lNFJqHIY7XrEN+50wi92tKj05Bz9wrnp4L84cxSV/KNZrKNA3T
 LCeIm4DGynPcLPopEUm9lqZMrabwvdV4wZSiybB6p/F/5ksVyaMnwTLzZZ1+NhkB
 OglNhF4L6m18/Ai/bY6XOixzgzWfcmYfXRhq7YoeO/JSvixG9Sk2crC6ryjWadgo
 xbitjjCYXHl3MUkH2TcaQoMGFhmvSShXg//5YXpxm3/0C3EVPa44T7/aFJqelL2p
 kkVsLcjcevAOwBHxehYLofL6c3GhLqRnwmTV7yqgJqfZf8uGxeQhXCMqOmv57/c0
 3iJJM4Llb14B+t3xl3T8MotT4WejzlHDnJKrscipfXJwRT7UDIM=
 =M6aI
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-6.6-2' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client fixes from Anna Schumaker:
 "Various O_DIRECT related fixes from Trond:
   - Error handling
   - Locking issues
   - Use the correct commit info for joining page groups
   - Fixes for rescheduling IO

  Sunrpc bad verifier fixes:
   - Report EINVAL errors from connect()
   - Revalidate creds that the server has rejected
   - Revert "SUNRPC: Fail faster on bad verifier"

  Misc:
   - Fix pNFS session trunking when MDS=DS
   - Fix zero-value filehandles for post-open getattr operations
   - Fix compiler warning about tautological comparisons
   - Revert 'SUNRPC: clean up integer overflow check' before Trond's fix"

* tag 'nfs-for-6.6-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  SUNRPC: Silence compiler complaints about tautological comparisons
  Revert "SUNRPC: clean up integer overflow check"
  NFSv4.1: fix zero value filehandle in post open getattr
  NFSv4.1: fix pnfs MDS=DS session trunking
  Revert "SUNRPC: Fail faster on bad verifier"
  SUNRPC: Mark the cred for revalidation if the server rejects it
  NFS/pNFS: Report EINVAL errors from connect() to the server
  NFS: More fixes for nfs_direct_write_reschedule_io()
  NFS: Use the correct commit info in nfs_join_page_group()
  NFS: More O_DIRECT accounting fixes for error paths
  NFS: Fix O_DIRECT locking issues
  NFS: Fix error handling for O_DIRECT write scheduling
2023-09-18 12:16:34 -07:00
Peter Lafreniere
71273c46a3 ax25: Kconfig: Update link for linux-ax25.org
http://linux-ax25.org has been down for nearly a year. Its official
replacement is https://linux-ax25.in-berlin.de. Change all references to
the old site in the ax25 Kconfig to its replacement.

Link: https://marc.info/?m=166792551600315
Signed-off-by: Peter Lafreniere <peter@n8pjl.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-18 12:56:58 +01:00
Paolo Abeni
27e5ccc2d5 mptcp: fix dangling connection hang-up
According to RFC 8684 section 3.3:

  A connection is not closed unless [...] or an implementation-specific
  connection-level send timeout.

Currently the MPTCP protocol does not implement such a timeout, and
connections timing out at the TCP level never move to the close state.

Introduce a catch-up condition at subflow close time to move the
MPTCP socket to close, too.

That additionally allows removing similar existing code inside the worker.

Finally, allow some additional timeout for plain ESTABLISHED mptcp
sockets, as the protocol allows creating new subflows even at that
point and making the connection functional again.

This issue has actually been present since the beginning, but it is basically
impossible to solve without a long chain of functional prerequisites
topped by commit bbd49d114d ("mptcp: consolidate transition to
TCP_CLOSE in mptcp_do_fastclose()"). When backporting this patch,
please also backport that commit.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/430
Fixes: e16163b6e2 ("mptcp: refactor shutdown and close")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-18 12:47:56 +01:00
Paolo Abeni
f6909dc1c1 mptcp: rename timer related helper to less confusing names
The msk socket uses two different timers to track close-related
events and retransmissions. The existing helpers do not indicate
clearly which timer they actually touch, making the related code
quite confusing.

Rename the existing helpers to avoid such confusion. No
functional change intended.

This patch is linked to the next one ("mptcp: fix dangling connection
hang-up"). The two patches are supposed to be backported together.

Cc: stable@vger.kernel.org # v5.11+
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-18 12:47:56 +01:00
Paolo Abeni
9f1a98813b mptcp: process pending subflow error on close
On incoming TCP reset, subflow closing could happen before error
propagation. That in turn could cause the socket error to be ignored
and a socket state transition to be missed, as reported by Daire-Byrne.

Address the issues by explicitly checking for a subflow socket error at
close time. To avoid code duplication, factor out of __mptcp_error_report()
a new helper implementing the relevant bits.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/429
Fixes: 15cc104533 ("mptcp: deliver ssk errors to msk")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-18 12:47:56 +01:00
Paolo Abeni
d5fbeff1ab mptcp: move __mptcp_error_report in protocol.c
This will simplify the next patch ("mptcp: process pending subflow error
on close").

No functional change intended.

Cc: stable@vger.kernel.org # v5.12+
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-18 12:47:56 +01:00
Paolo Abeni
6bec041147 mptcp: fix bogus receive window shrinkage with multiple subflows
In case multiple subflows race to update the mptcp-level receive
window, the subflow losing the race should use the window value
provided by the "winning" subflow to update its own tcp-level
rcv_wnd.

For that purpose, the current code bogusly uses the mptcp-level rcv_wnd
value as observed before the update attempt. In unlucky circumstances
that may lead to TCP-level window shrinkage, stalling the other end.

Address the issue by feeding the correct value to the rcv_wnd update.

Fixes: f3589be0c4 ("mptcp: never shrink offered window")
Cc: stable@vger.kernel.org
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/427
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-18 12:47:55 +01:00
Kees Cook
1cb6422eca ceph: Annotate struct ceph_monmap with __counted_by
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run time via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).

As found with Coccinelle[1], add __counted_by for struct ceph_monmap.
Additionally, since the element count member must be set before accessing
the annotated flexible array member, move its initialization earlier.

[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci

Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Xiubo Li <xiubli@redhat.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: ceph-devel@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-18 10:39:29 +01:00
Gustavo A. R. Silva
2506a91734 tipc: Use size_add() in calls to struct_size()
If, for any reason, the open-coded arithmetic causes a wraparound,
the protection that `struct_size()` adds against potential integer
overflows is defeated. Fix this by hardening the call to `struct_size()`
with `size_add()`.

Fixes: e034c6d23b ("tipc: Use struct_size() helper")
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-18 09:38:32 +01:00
Gustavo A. R. Silva
a2713257ee tls: Use size_add() in call to struct_size()
If, for any reason, the open-coded arithmetic causes a wraparound,
the protection that `struct_size()` adds against potential integer
overflows is defeated. Fix this by hardening the call to `struct_size()`
with `size_add()`.

Fixes: b89fec54fd ("tls: rx: wrap decrypt params in a struct")
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-18 09:37:23 +01:00
Wen Gong
ddd7f45c89 wifi: cfg80211: save power spectral density(psd) of regulatory rule
6 GHz regulatory domains introduce Power Spectral Density (PSD).
The PSD value of the regulatory rule should take effect
for the ieee80211_channels falling into that particular regulatory
rule. Save the values in the channels that have a PSD value and add
nl80211 attributes accordingly to handle them.

Co-developed-by: Aditya Kumar Singh <quic_adisi@quicinc.com>
Signed-off-by: Aditya Kumar Singh <quic_adisi@quicinc.com>
Signed-off-by: Wen Gong <quic_wgong@quicinc.com>
Link: https://lore.kernel.org/r/20230914082026.3709-1-quic_wgong@quicinc.com
[use hole in chan flags, reword docs]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-18 09:44:05 +02:00
Johannes Berg
2c3dfba4cf rfkill: sync before userspace visibility/changes
If userspace quickly opens /dev/rfkill after a new
instance was created, it might see the old state of
the instance from before the sync work runs and may
even _change_ the state, only to have the sync work
change it again.

Fix this by doing the sync inline where needed, not
just for /dev/rfkill but also for sysfs.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-18 09:36:57 +02:00
Sebastian Andrzej Siewior
fbd825fcd7 net: hsr: Add __packed to struct hsr_sup_tlv.
Struct hsr_sup_tlv describes a HW layout and therefore it needs a __packed
attribute to ensure the compiler does not add any padding.
Given the size and __packed attribute of the structs that use
hsr_sup_tlv, this change has no functional impact.

Add __packed to struct hsr_sup_tlv.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-18 08:26:19 +01:00
Lukasz Majewski
295de650d3 net: hsr: Properly parse HSRv1 supervisor frames.
While adding support for parsing the redbox supervision frames, the
author added `pull_size' and `total_pull_size' to track the number of
bytes that were pulled from the skb while parsing it, so they
can be reverted/pushed back at the end.
In the process, a copy&paste error probably occurred and for the HSRv1 case
the ethhdr was used instead of the hsr_tag. Later the hsr_tag was used
instead of hsr_sup_tag. The latter error didn't matter because both
structs have the same size, so HSRv0 was still working. It did, however,
break HSRv1 parsing because struct ethhdr is larger than struct hsr_tag.

Reinstate the old pulling flow and pull first ethhdr, hsr_tag in v1 case
followed by hsr_sup_tag.

[bigeasy: commit message]

Fixes: eafaa88b3e ("net: hsr: Add support for redbox supervision frames")
Suggested-by: Tristram.Ha@microchip.com
Signed-off-by: Lukasz Majewski <lukma@denx.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-18 08:26:19 +01:00
Eric Dumazet
6af289746a dccp: fix dccp_v4_err()/dccp_v6_err() again
dh->dccph_x is the 9th byte (offset 8) in "struct dccp_hdr",
not in the "byte 7" as Jann claimed.

We need to make sure the ICMP messages are big enough,
using more standard ways (no more assumptions).

syzbot reported:
BUG: KMSAN: uninit-value in pskb_may_pull_reason include/linux/skbuff.h:2667 [inline]
BUG: KMSAN: uninit-value in pskb_may_pull include/linux/skbuff.h:2681 [inline]
BUG: KMSAN: uninit-value in dccp_v6_err+0x426/0x1aa0 net/dccp/ipv6.c:94
pskb_may_pull_reason include/linux/skbuff.h:2667 [inline]
pskb_may_pull include/linux/skbuff.h:2681 [inline]
dccp_v6_err+0x426/0x1aa0 net/dccp/ipv6.c:94
icmpv6_notify+0x4c7/0x880 net/ipv6/icmp.c:867
icmpv6_rcv+0x19d5/0x30d0
ip6_protocol_deliver_rcu+0xda6/0x2a60 net/ipv6/ip6_input.c:438
ip6_input_finish net/ipv6/ip6_input.c:483 [inline]
NF_HOOK include/linux/netfilter.h:304 [inline]
ip6_input+0x15d/0x430 net/ipv6/ip6_input.c:492
ip6_mc_input+0xa7e/0xc80 net/ipv6/ip6_input.c:586
dst_input include/net/dst.h:468 [inline]
ip6_rcv_finish+0x5db/0x870 net/ipv6/ip6_input.c:79
NF_HOOK include/linux/netfilter.h:304 [inline]
ipv6_rcv+0xda/0x390 net/ipv6/ip6_input.c:310
__netif_receive_skb_one_core net/core/dev.c:5523 [inline]
__netif_receive_skb+0x1a6/0x5a0 net/core/dev.c:5637
netif_receive_skb_internal net/core/dev.c:5723 [inline]
netif_receive_skb+0x58/0x660 net/core/dev.c:5782
tun_rx_batched+0x83b/0x920
tun_get_user+0x564c/0x6940 drivers/net/tun.c:2002
tun_chr_write_iter+0x3af/0x5d0 drivers/net/tun.c:2048
call_write_iter include/linux/fs.h:1985 [inline]
new_sync_write fs/read_write.c:491 [inline]
vfs_write+0x8ef/0x15c0 fs/read_write.c:584
ksys_write+0x20f/0x4c0 fs/read_write.c:637
__do_sys_write fs/read_write.c:649 [inline]
__se_sys_write fs/read_write.c:646 [inline]
__x64_sys_write+0x93/0xd0 fs/read_write.c:646
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Uninit was created at:
slab_post_alloc_hook+0x12f/0xb70 mm/slab.h:767
slab_alloc_node mm/slub.c:3478 [inline]
kmem_cache_alloc_node+0x577/0xa80 mm/slub.c:3523
kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:559
__alloc_skb+0x318/0x740 net/core/skbuff.c:650
alloc_skb include/linux/skbuff.h:1286 [inline]
alloc_skb_with_frags+0xc8/0xbd0 net/core/skbuff.c:6313
sock_alloc_send_pskb+0xa80/0xbf0 net/core/sock.c:2795
tun_alloc_skb drivers/net/tun.c:1531 [inline]
tun_get_user+0x23cf/0x6940 drivers/net/tun.c:1846
tun_chr_write_iter+0x3af/0x5d0 drivers/net/tun.c:2048
call_write_iter include/linux/fs.h:1985 [inline]
new_sync_write fs/read_write.c:491 [inline]
vfs_write+0x8ef/0x15c0 fs/read_write.c:584
ksys_write+0x20f/0x4c0 fs/read_write.c:637
__do_sys_write fs/read_write.c:649 [inline]
__se_sys_write fs/read_write.c:646 [inline]
__x64_sys_write+0x93/0xd0 fs/read_write.c:646
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

CPU: 0 PID: 4995 Comm: syz-executor153 Not tainted 6.6.0-rc1-syzkaller-00014-ga747acc0b752 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/04/2023

Fixes: 977ad86c2a ("dccp: Fix out of bounds access in DCCP error handler")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jann Horn <jannh@google.com>
Reviewed-by: Jann Horn <jannh@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-18 07:10:31 +01:00
Johnathan Mantey
3780bb2931 ncsi: Propagate carrier gain/loss events to the NCSI controller
Report the carrier/no-carrier state for the network interface
shared between the BMC and the passthrough channel. Without this
functionality the BMC is unable to reconfigure the NIC in the event
of a re-cabling to a different subnet.

Signed-off-by: Johnathan Mantey <johnathanx.mantey@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-18 07:06:05 +01:00
Kyle Zeng
0113d9c9d1 ipv4: fix null-deref in ipv4_link_failure
Currently, we assume the skb is associated with a device before calling
__ip_options_compile, which is not always the case if it is re-routed by
ipvs.
When skb->dev is NULL, dev_net(skb->dev) results in a NULL-pointer dereference.
This patch adds a check for the edge case and switches to using the net_device
from the rtable when skb->dev is NULL.

Fixes: ed0de45a10 ("ipv4: recompile ip options in ipv4_link_failure")
Suggested-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Kyle Zeng <zengyhkyle@gmail.com>
Cc: Stephen Suryaputra <ssuryaextr@gmail.com>
Cc: Vadim Fedorenko <vfedorenko@novek.ru>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17 15:14:58 +01:00
David S. Miller
685c6d5b2c Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
The following pull-request contains BPF updates for your *net-next* tree.

We've added 73 non-merge commits during the last 9 day(s) which contain
a total of 79 files changed, 5275 insertions(+), 600 deletions(-).

The main changes are:

1) Basic BTF validation in libbpf, from Andrii Nakryiko.

2) bpf_assert(), bpf_throw(), exceptions in bpf progs, from Kumar Kartikeya Dwivedi.

3) next_thread cleanups, from Oleg Nesterov.

4) Add mcpu=v4 support to arm32, from Puranjay Mohan.

5) Add support for __percpu pointers in bpf progs, from Yonghong Song.

6) Fix bpf tailcall interaction with bpf trampoline, from Leon Hwang.

7) Raise irq_work in bpf_mem_alloc while irqs are disabled to improve refill probability, from Hou Tao.

Please consider pulling these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git

Thanks a lot!

Also thanks to reporters, reviewers and testers of commits in this pull-request:

Alan Maguire, Andrey Konovalov, Dave Marchevsky, "Eric W. Biederman",
Jiri Olsa, Maciej Fijalkowski, Quentin Monnet, Russell King (Oracle),
Song Liu, Stanislav Fomichev, Yonghong Song
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17 15:12:06 +01:00
Jiri Pirko
c5e1bf8a51 devlink: introduce possibility to expose info about nested devlinks
In mlx5, there is a devlink instance created for PCI device. Also, one
separate devlink instance is created for auxiliary device that
represents the netdev of uplink port. This relation is currently
invisible to the devlink user.

Benefit from the rel infrastructure and allow a nested devlink
instance to set the relationship to the nested-in devlink instance.
Note that there may be many nested instances, therefore use an xarray to
hold the list of rel_indexes for individual nested instances.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17 14:01:47 +01:00
Jiri Pirko
9473bc0119 devlink: convert linecard nested devlink to new rel infrastructure
Benefit from the newly introduced rel infrastructure: treat the linecard
nested devlink instances in the same way as port function instances.
Convert the code to use the rel infrastructure.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17 14:01:47 +01:00
Jiri Pirko
0b7a2721e3 devlink: expose peer SF devlink instance
Introduce a new helper devl_port_fn_devlink_set() to be used by drivers
assigning a devlink instance to the peer devlink port function.

Expose this to the user via a new netlink attribute, nested under the
port function nest, that carries the devlink handle related to the port
function.

This is particularly helpful for the user to understand the relationship
between devlink instances created for SFs and the port functions
they belong to.

Note that the caller of devlink_port_notify() needs to hold the devlink
instance lock; put the assertion in devl_port_fn_devlink_set() to make
this requirement explicit. Also note the limitation that this assignment
is only allowed for registered objects.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17 14:01:47 +01:00
Jiri Pirko
c137743bce devlink: introduce object and nested devlink relationship infra
It is a bit tricky to maintain the relationship between devlink objects
and nested devlink instances due to the following aspects:

1) Locking. The devlink instance that contains the object must be
   locked first; only after that may the nested instance be locked.
2) Lifetimes. Objects (e.g. devlink port) may be removed before
   the nested devlink instance.
3) Notifications. If the nested instance changes (e.g. gets
   registered/unregistered), the nested-in object needs to send
   appropriate notifications.

Resolve this by introducing an xarray that holds 1:1 relationships
between a devlink object and the related nested devlink instance.
Use that xarray index to get the object/nested devlink instance on
the other side.

Provide necessary helpers:
devlink_rel_nested_in_add/clear() to add and clear the relationship.
devlink_rel_nested_in_notify() to call the nested-in object to send
	notifications during nested instance register/unregister/netns
	change.
devlink_rel_devlink_handle_put() to be used by nested-in object fill
	function to fill the nested handle.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17 14:01:47 +01:00
Jiri Pirko
1c2197c47a devlink: extend devlink_nl_put_nested_handle() with attrtype arg
As the next patch is going to call this helper to fill another type
of nested attribute, pass the attribute type as a function argument.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17 14:01:46 +01:00
Jiri Pirko
af1f1400af devlink: move devlink_nl_put_nested_handle() into netlink.c
As the next patch is going to call this helper from outside linecard.c,
move it to netlink.c.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17 14:01:46 +01:00
Jiri Pirko
ad99637ac9 devlink: put netnsid to nested handle
If the netns of the devlink instance and that of the nested devlink
instance differ, put the netnsid attr to indicate that.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17 14:01:46 +01:00
Jiri Pirko
d0b7e990f7 devlink: move linecard struct into linecard.c
Instead of exposing the linecard struct, expose a simple helper to get
the linecard index, which is all that is needed outside linecard.c. Move
the linecard struct to linecard.c and keep it private, similar to the
rest of the devlink objects.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17 14:01:46 +01:00
Jiri Pirko
5f18426928 netdev: expose DPLL pin handle for netdevice
In case a netdevice represents a SyncE port, the user needs to understand
the connection between the netdevice and the associated DPLL pin. There
might be multiple netdevices pointing to the same pin, in case of a VF/SF
implementation.

Add an IFLA Netlink attribute to nest the DPLL pin handle, similar to
how it is implemented for devlink port. Add a struct dpll_pin pointer
to netdev and protect access to it by RTNL. Expose netdev_dpll_pin_set()
and netdev_dpll_pin_clear() helpers to the drivers so they can set/clear
the DPLL pin relationship to netdev.

Note that during the lifetime of struct dpll_pin the pin handle does not
change. Therefore it is safe to access it locklessly. It is the driver's
responsibility to call netdev_dpll_pin_clear() before dpll_pin_put().

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17 11:50:20 +01:00
Aananth V
3868ab0f19 tcp: new TCP_INFO stats for RTO events
The 2023 SIGCOMM paper "Improving Network Availability with Protective
ReRoute" has indicated that Linux TCP's RTO-triggered txhash rehashing can
effectively reduce application disruption during outages. To better
measure the efficacy of this feature, this patch adds three more
detailed stats during RTO recovery and exports them via TCP_INFO.
Applications and monitoring systems can leverage this data to measure
the network path diversity and end-to-end repair latency during network
outages to improve their network infrastructure.

The following counters are added to tcp_sock in order to track RTO
events over the lifetime of a TCP socket.

1. u16 total_rto - Counts the total number of RTO timeouts.
2. u16 total_rto_recoveries - Counts the total number of RTO recoveries.
3. u32 total_rto_time - Counts the total time spent (ms) in RTO
                        recoveries. (time spent in CA_Loss and
                        CA_Recovery states)

To compute total_rto_time, we add a new u32 rto_stamp field to
tcp_sock. rto_stamp records the start timestamp (ms) of the last RTO
recovery (CA_Loss).

Corresponding fields are also added to the tcp_info struct.
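To make the bookkeeping above concrete, here is a minimal userspace sketch of how the three counters and rto_stamp could be maintained. The struct and function names (struct rto_stats, on_rto_timeout(), on_rto_recovered()) are hypothetical, timestamps are plain millisecond values passed in by the caller, and the sketch assumes a recovery episode is counted when its first RTO fires. It uses a separate in_recovery flag rather than relying on rto_stamp being non-zero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace model of the per-socket RTO counters. */
struct rto_stats {
	uint16_t total_rto;            /* RTO timeouts seen */
	uint16_t total_rto_recoveries; /* RTO recovery episodes */
	uint32_t total_rto_time;       /* ms spent in RTO recovery */
	uint32_t rto_stamp;            /* ms start of the current episode */
	uint8_t  in_recovery;          /* 1 while an episode is open */
};

/* Called on every RTO timeout; only the first one opens an episode. */
static void on_rto_timeout(struct rto_stats *s, uint32_t now_ms)
{
	if (!s->in_recovery) {
		s->in_recovery = 1;
		s->total_rto_recoveries++;
		s->rto_stamp = now_ms;
	}
	s->total_rto++;
}

/* Called when the connection leaves the loss/recovery state. */
static void on_rto_recovered(struct rto_stats *s, uint32_t now_ms)
{
	if (!s->in_recovery)
		return;
	s->total_rto_time += now_ms - s->rto_stamp;
	s->in_recovery = 0;
}
```

Two back-to-back timeouts inside one outage therefore count as two RTOs but one recovery, and the time is measured from the first timeout.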

Signed-off-by: Aananth V <aananthv@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-16 13:42:34 +01:00
Aananth V
e326578a21 tcp: call tcp_try_undo_recovery when an RTOd TFO SYNACK is ACKed
For passive TCP Fast Open sockets that had a SYN/ACK timeout and did not
send more data in SYN_RECV, upon receiving the final ACK in the 3WHS, the
congestion state may awkwardly stay in CA_Loss mode unless the CA state
was undone due to TCP timestamp checks. However, if
tcp_rcv_synrecv_state_fastopen() decides not to undo, then we should
enter CA_Open, because at that point we have received an ACK covering
the retransmitted SYNACKs. Currently, the icsk_ca_state is only set to
CA_Open after we receive an ACK for a data packet. This is because
tcp_ack() does not call tcp_fastretrans_alert() (and tcp_process_loss())
if !prior_packets.

Note that tcp_process_loss() calls tcp_try_undo_recovery(), so having
tcp_rcv_synrecv_state_fastopen() decide that if we're in CA_Loss we
should call tcp_try_undo_recovery() is consistent with that, and
low risk.

Fixes: dad8cea7ad ("tcp: fix TFO SYNACK undo to avoid double-timestamp-undo")
Signed-off-by: Aananth V <aananthv@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-16 13:42:34 +01:00
Andy Shevchenko
aabb4af9bb net: core: Use the bitmap API to allocate bitmaps
Use bitmap_zalloc() and bitmap_free() instead of hand-writing them.
It is less verbose and it improves the type checking and semantics.

While at it, add the missing header inclusion (should be bitops.h,
but with the above change it becomes bitmap.h).
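For readers unfamiliar with these helpers, a userspace stand-in shows what bitmap_zalloc() saves each caller from writing: rounding the bit count up to whole longs and zeroing the memory. The _user-suffixed names are hypothetical; the kernel versions also take a gfp_t argument.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define BITS_PER_LONG     (sizeof(unsigned long) * CHAR_BIT)
/* Round a bit count up to the number of longs needed to hold it. */
#define BITS_TO_LONGS(n)  (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Hypothetical stand-in for bitmap_zalloc(): the size calculation and
 * zeroing live in one place instead of at every call site. */
static unsigned long *bitmap_zalloc_user(unsigned int nbits)
{
	return calloc(BITS_TO_LONGS(nbits), sizeof(unsigned long));
}

/* Hypothetical stand-in for bitmap_free(). */
static void bitmap_free_user(unsigned long *bitmap)
{
	free(bitmap);
}
```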

Suggested-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230911154534.4174265-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-16 13:32:30 +01:00
David S. Miller
1612cc4b14 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Alexei Starovoitov says:

====================

The following pull-request contains BPF updates for your *net* tree.

We've added 21 non-merge commits during the last 8 day(s) which contain
a total of 21 files changed, 450 insertions(+), 36 deletions(-).

The main changes are:

1) Adjust bpf_mem_alloc buckets to match ksize(), from Hou Tao.

2) Check whether override is allowed in kprobe multi, from Jiri Olsa.

3) Fix btf_id symbol generation with ld.lld, from Jiri and Nick.

4) Fix potential deadlock when using queue and stack maps from NMI, from Toke Høiland-Jørgensen.

Please consider pulling these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git

Thanks a lot!

Also thanks to reporters, reviewers and testers of commits in this pull-request:

Alan Maguire, Biju Das, Björn Töpel, Dan Carpenter, Daniel Borkmann,
Eduard Zingerman, Hsin-Wei Hung, Marcus Seyfarth, Nathan Chancellor,
Satya Durga Srinivasu Prabhala, Song Liu, Stephen Rothwell
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-16 11:16:00 +01:00
Eric Dumazet
c123e0d30b net: add truesize debug checks in skb_{add|coalesce}_rx_frag()
It can be time-consuming to track down driver bugs that are only detected,
too late, by this confusing warning in skb_try_coalesce():

	WARN_ON_ONCE(delta < len);

Add sanity checks in skb_add_rx_frag() and skb_coalesce_rx_frag()
to better track the bug origin for CONFIG_DEBUG_NET=y builds.
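A minimal sketch of the kind of check described above, assuming it compares the fragment's byte count with its truesize when the fragment is added. The struct and function names are hypothetical, and the warn-once flag stands in for a debug-only warning macro.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace model of an skb's length accounting. */
struct skb_stub {
	unsigned int len;      /* payload bytes */
	unsigned int truesize; /* memory charged for the buffer */
};

/* Stand-in for a debug-only WARN_ON_ONCE(): fires at most once. */
static bool warned;

static void add_rx_frag(struct skb_stub *skb, unsigned int size,
			unsigned int truesize)
{
	/* A fragment cannot occupy less memory than the bytes it carries;
	 * catching this here points at the driver that added it, instead
	 * of surfacing later during coalescing. */
	if (size > truesize && !warned) {
		warned = true;
		fprintf(stderr, "add_rx_frag: size %u > truesize %u\n",
			size, truesize);
	}
	skb->len += size;
	skb->truesize += truesize;
}
```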

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-16 10:10:27 +01:00
Eric Dumazet
41862d12e7 net: use indirect call helpers for sk->sk_prot->release_cb()
When adding the sk->sk_prot->release_cb() call to __sk_flush_backlog(),
Paolo suggested using indirect call helpers to take care of the
CONFIG_RETPOLINE=y case.

It turns out Google has had such a mitigation in release_sock() for years;
it is time to make this public :)
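The indirect call helpers work by comparing the function pointer with the expected target and, on a match, calling that target directly, so the common case does not go through a retpoline thunk. Below is a simplified userspace rendition for illustration; the kernel's INDIRECT_CALL_1() in include/linux/indirect_call_wrapper.h also wraps the comparison in likely(), and struct sock_stub, tcp_release_cb_stub() and release_sock_stub() are hypothetical stand-ins.

```c
#include <assert.h>

/* Simplified INDIRECT_CALL_1(): if f is the expected target f1, call
 * f1 by name (a direct call); otherwise fall back to calling through
 * the pointer. Both operands are void, which C permits in ?: */
#define INDIRECT_CALL_1(f, f1, ...) \
	((f) == (f1) ? (f1)(__VA_ARGS__) : (f)(__VA_ARGS__))

struct sock_stub {
	void (*release_cb)(struct sock_stub *sk);
	int direct_hits; /* times the expected target ran */
	int other_hits;  /* times some other callback ran */
};

/* Stand-in for the common-case callback. */
static void tcp_release_cb_stub(struct sock_stub *sk) { sk->direct_hits++; }

/* Some other protocol's callback. */
static void other_release_cb(struct sock_stub *sk) { sk->other_hits++; }

static void release_sock_stub(struct sock_stub *sk)
{
	if (sk->release_cb)
		INDIRECT_CALL_1(sk->release_cb, tcp_release_cb_stub, sk);
}
```

Both paths run the same callback; only the call instruction differs, so protocols other than the expected one keep working unchanged.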

Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-16 10:09:43 +01:00
Stanislav Fomichev
a9c2a60854 bpf: expose information about supported xdp metadata kfunc
Add a new xdp-rx-metadata-features member to netdev netlink,
which exports a bitmask of supported kfuncs. Most of the patch
is autogenerated (headers); the only relevant parts are netdev.yaml
and the changes in netdev-genl.c to marshal into netlink.

Example output on veth:

$ ip link add veth0 type veth peer name veth1 # ifindex == 12
$ ./tools/net/ynl/samples/netdev 12

Select ifc ($ifindex; or 0 = dump; or -2 ntf check): 12
   veth1[12]    xdp-features (23): basic redirect rx-sg xdp-rx-metadata-features (3): timestamp hash xdp-zc-max-segs=0
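Because xdp-rx-metadata-features is a bitmask, the (3) in the output above is two set bits. A small decoding sketch follows. It assumes bit 0 is timestamp and bit 1 is hash (consistent with the veth output shown), and the enum and function names are illustrative rather than the generated UAPI header.

```c
#include <assert.h>
#include <string.h>

/* Assumed bit layout, matching "(3): timestamp hash" above. */
enum xdp_rx_meta_bit {
	XDP_RX_META_TIMESTAMP = 1u << 0,
	XDP_RX_META_HASH      = 1u << 1,
};

static const struct { unsigned int bit; const char *name; } meta_names[] = {
	{ XDP_RX_META_TIMESTAMP, "timestamp" },
	{ XDP_RX_META_HASH,      "hash" },
};

/* Write the space-separated names of the set bits into buf (len >= 1). */
static void decode_rx_meta(unsigned int mask, char *buf, size_t len)
{
	size_t i;

	buf[0] = '\0';
	for (i = 0; i < sizeof(meta_names) / sizeof(meta_names[0]); i++) {
		if (!(mask & meta_names[i].bit))
			continue;
		if (buf[0] != '\0')
			strncat(buf, " ", len - strlen(buf) - 1);
		strncat(buf, meta_names[i].name, len - strlen(buf) - 1);
	}
}
```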

Cc: netdev@vger.kernel.org
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230913171350.369987-3-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-09-15 11:26:58 -07:00
Stanislav Fomichev
fc45c5b642 bpf: make it easier to add new metadata kfunc
No functional changes.

Instead of having hand-crafted code in bpf_dev_bound_resolve_kfunc,
move the kfunc <> xmo handler relationship into XDP_METADATA_KFUNC_xxx.
This way, any time a new kfunc is added, we don't have to touch
bpf_dev_bound_resolve_kfunc.

Also document XDP_METADATA_KFUNC_xxx arguments since we now have
more than two and it might be confusing what is what.
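The approach described above is an X-macro: one list macro is expanded several times, each time with a different definition of the per-entry macro, so adding an entry updates every expansion at once. A minimal sketch, assuming a simplified three-argument entry; METADATA_KFUNC_LIST, struct metadata_ops, resolve_kfunc() and the demo handler are hypothetical names, not the kernel's XDP_METADATA_KFUNC_xxx definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Single source of truth: (enum id, kfunc name, ops member). */
#define METADATA_KFUNC_LIST                                          \
	METADATA_KFUNC(KFUNC_RX_TIMESTAMP, "rx_timestamp", rx_timestamp) \
	METADATA_KFUNC(KFUNC_RX_HASH, "rx_hash", rx_hash)

/* Expansion 1: the enum of kfunc ids. */
enum metadata_kfunc {
#define METADATA_KFUNC(id, name, member) id,
	METADATA_KFUNC_LIST
#undef METADATA_KFUNC
	KFUNC_COUNT
};

typedef int (*metadata_handler_t)(void);

/* A driver's ops table (hypothetical). */
struct metadata_ops {
	metadata_handler_t rx_timestamp;
	metadata_handler_t rx_hash;
};

/* Expansion 2: map an id to the driver's handler. Adding an entry to
 * METADATA_KFUNC_LIST needs no change here. */
static metadata_handler_t resolve_kfunc(const struct metadata_ops *ops,
					enum metadata_kfunc kfunc)
{
	switch (kfunc) {
#define METADATA_KFUNC(id, name, member) \
	case id:                         \
		return ops->member;
	METADATA_KFUNC_LIST
#undef METADATA_KFUNC
	default:
		return NULL;
	}
}

/* Expansion 3: printable names indexed by id. */
static const char *const kfunc_names[KFUNC_COUNT] = {
#define METADATA_KFUNC(id, name, member) [id] = name,
	METADATA_KFUNC_LIST
#undef METADATA_KFUNC
};

/* Example driver that only implements the timestamp kfunc. */
static int demo_rx_timestamp(void) { return 1; }
static const struct metadata_ops demo_ops = { .rx_timestamp = demo_rx_timestamp };
```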

Cc: netdev@vger.kernel.org
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230913171350.369987-2-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-09-15 11:26:58 -07:00
Tirthendu Sarkar
d609f3d228 xsk: add multi-buffer support for sockets sharing umem
Userspace applications indicate their multi-buffer capability to xsk
using XSK_USE_SG socket bind flag. For sockets using shared umem the
bind flag may contain XSK_USE_SG only for the first socket. For any
subsequent socket the only option supported is XDP_SHARED_UMEM.

Add an XDP_UMEM_SG_FLAG option to the umem config flags to store the
multi-buffer handling capability when the first socket indicates it with
the XSK_USE_SG option in its bind flags. Use this to derive the
multi-buffer capability for subsequent sockets in xsk core.
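A userspace model of the flow described above, for illustration only: the first bind records the SG capability in the shared umem, and later binds, which can only pass the shared-umem flag, read it back from there. The flag values and struct layouts are hypothetical stand-ins for XSK_USE_SG and XDP_UMEM_SG_FLAG.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values for illustration only. */
#define BIND_USE_SG   (1u << 4) /* stands in for XSK_USE_SG */
#define UMEM_SG_FLAG  (1u << 1) /* stands in for XDP_UMEM_SG_FLAG */

struct umem { unsigned int flags; };
struct xsk  { struct umem *umem; bool sg; };

/* First socket: record its SG capability in the shared umem. */
static void bind_first(struct xsk *xs, struct umem *umem,
		       unsigned int bind_flags)
{
	xs->umem = umem;
	if (bind_flags & BIND_USE_SG)
		umem->flags |= UMEM_SG_FLAG;
	xs->sg = bind_flags & BIND_USE_SG;
}

/* Later sockets only pass the shared-umem flag, so derive SG
 * capability from what the first socket stored in the umem. */
static void bind_shared(struct xsk *xs, struct umem *umem)
{
	xs->umem = umem;
	xs->sg = umem->flags & UMEM_SG_FLAG;
}
```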

Signed-off-by: Tirthendu Sarkar <tirthendu.sarkar@intel.com>
Fixes: 81470b5c3c ("xsk: introduce XSK_USE_SG bind flag for xsk socket")
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20230907035032.2627879-1-tirthendu.sarkar@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-15 11:00:22 -07:00
Ilya Leoshkevich
837723b22a netfilter, bpf: Adjust timeouts of non-confirmed CTs in bpf_ct_insert_entry()
bpf_nf testcase fails on s390x: bpf_skb_ct_lookup() cannot find the entry
that was added by bpf_ct_insert_entry() within the same BPF function.

The reason is that this entry is deleted by nf_ct_gc_expired().

The CT timeout starts ticking after the CT confirmation; therefore
nf_conn.timeout is initially set to the timeout value, and
__nf_conntrack_confirm() sets it to the deadline value.

bpf_ct_insert_entry() sets IPS_CONFIRMED_BIT, but does not adjust the
timeout, making its value meaningless and causing false positives.

Fix the problem by making bpf_ct_insert_entry() adjust the timeout,
like __nf_conntrack_confirm().
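The distinction between a relative timeout and an absolute deadline can be sketched as follows. This is a hypothetical userspace model: now stands in for the conntrack clock, and ct_expired() uses a signed-difference comparison, the usual wraparound-safe idiom for such clocks. It shows why setting the confirmed bit without converting the timeout makes a fresh entry look long expired.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: before confirmation `timeout` holds a relative
 * duration; after confirmation it holds an absolute deadline. */
struct ct {
	uint32_t timeout;
	bool confirmed;
};

/* What confirmation must do: relative duration -> absolute deadline. */
static void ct_confirm(struct ct *ct, uint32_t now)
{
	ct->timeout += now;
	ct->confirmed = true;
}

/* GC view: a confirmed entry is expired once its deadline has passed. */
static bool ct_expired(const struct ct *ct, uint32_t now)
{
	return ct->confirmed && (int32_t)(ct->timeout - now) <= 0;
}
```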

Fixes: 2cdaa3eefe ("netfilter: conntrack: restore IPS_CONFIRMED out of nf_conntrack_hash_check_insert()")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/bpf/20230830011128.1415752-3-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-15 10:17:55 -07:00
David S. Miller
615efed8b6 netfilter pull request 23-09-13
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEN9lkrMBJgcdVAPub1V2XiooUIOQFAmUCKboACgkQ1V2XiooU
 IOQ1CxAAqKwyeROJ7+qLvIBbwRFIQr70pPCjfY/GskP9aqljhth+e5TsKurWA12X
 wwbVhQ9xblvxarekR4B8lwGhvenYHk3l6R/3wuTMYPHFTXkE+mluGgljffaMwV+D
 YywK5hOkLenBZmxdjUfdJ87DJwAadcbLOABmEiSQ3hDxj3/xTBf7gToqlSwHtjCC
 JDC7vhxjosQHQSLhjqetfrUauz0OZAqldZ2is/FELYg56oCGKddGAZxnC4fQBnXx
 DzvRroP8f8bkqGjKwkt945bKiQ4Cz1frQE+YP1+pRk0rOkv70hhzH0JXIELQ5q9L
 RYLFfgkemp2HfBJ+y2PK8lBDailre4MdGdsAI5eWjBXgrl3jRBybioafhhUbJVIq
 Q3zIzXVgLQqXwSONBF2sfVssVZzhfjAzZQzzgw3wayhWj1WgwqsCb0EChvA4FJZ7
 HW4xyROeOV7GHoUAWCPcoeBiNJYKmGNWjkWwlT4q5LtYMyWWP9oYx2kOn9/JQ9QI
 Tth8QobntRr8Gw/f0awGULM2pcecCLyYhIoJtWctegFSN2ejrKiV9XItbxZ3G1in
 3pYSVgpyve9ZAvHmTSyvh+mjZ71X2ZebLyMADrWbsHrCXgIUSUkoksQd97XsffeZ
 noRVlLj0MlfRlUoorDQG3A+QxdQb+ZaHkBKTOEzouKOYEj6vylY=
 =TgRd
 -----END PGP SIGNATURE-----

Merge tag 'nf-23-09-13' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

netfilter pull request 23-09-13

====================

The following patchset contains Netfilter fixes for net:

1) Do not permit removing rules from a chain binding; otherwise a
   double rule release is possible, triggering a UaF. This rule
   deletion support does not make sense and userspace does not use
   it. The problem has existed since chain binding support was introduced.

2) rbtree GC worker only collects the elements that have expired.
   This operation is not destructive, therefore, turn write into
   read spinlock to avoid datapath contention due to GC worker run.
   This was not fixed in the recent GC fix batch in the 6.5 cycle.

3) pipapo set backend performs sync GC, therefore, catchall elements
   must use sync GC queue variant. This bug was introduced in the
   6.5 cycle with the recent GC fixes.

4) Stop the GC run if memory allocation fails in the pipapo set backend;
   otherwise a NULL pointer dereference of the GC transaction object
   might occur. This bug was introduced in the 6.5 cycle with the recent
   GC fixes.

5) rhash GC run uses an iterator that might hit EAGAIN to rewind,
   triggering double-collection of the same element. This bug was
   introduced in the 6.5 cycle with the recent GC fixes.

6) Do not permit removing elements from anonymous sets; this type of
   set is populated once and then bound to rules. This fix is
   similar to the chain binding patch coming first in this batch.
   The API has permitted this since the very beginning, but it has no
   use case from userspace.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-15 13:56:58 +01:00
Dan Carpenter
4fa5ce3e3a tcp: indent an if statement
Indent this if statement one tab.

Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-15 13:55:50 +01:00
Sasha Neftin
75ad80ed88 net/core: Fix ETH_P_1588 flow dissector
When a PTP ethernet raw frame with a size of more than 256 bytes followed
by a 0xff pattern is sent to __skb_flow_dissect, the nhoff value calculation
is wrong. For example, hdr->message_length takes the wrong value (0xffff),
which does not reflect the real header length. In this case, the 'nhoff'
value was overridden and the PTP header was badly dissected. This leads to
a kernel crash.

net/core: flow_dissector
net/core flow dissector nhoff = 0x0000000e
net/core flow dissector hdr->message_length = 0x0000ffff
net/core flow dissector nhoff = 0x0001000d (u16 overflow)
...
skb linear:   00000000: 00 a0 c9 00 00 00 00 a0 c9 00 00 00 88
skb frag:     00000000: f7 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

Using the size of the ptp_header struct allows the nhoff value to be
calculated correctly.

net/core flow dissector nhoff = 0x0000000e
net/core flow dissector nhoff = 0x00000030 (sizeof ptp_header)
...
skb linear:   00000000: 00 a0 c9 00 00 00 00 a0 c9 00 00 00 88 f7 ff ff
skb linear:   00000010: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
skb linear:   00000020: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
skb frag:     00000000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
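The offsets in the two traces above can be reproduced with plain arithmetic. Advancing by the frame's own message_length (0xffff) pushes nhoff from 0x0e to 0x1000d, while advancing by the fixed header size lands on 0x30. In the sketch below the function names are hypothetical. The 34-byte size is the packed size of struct ptp_header and matches the 0x0e to 0x30 step.

```c
#include <assert.h>
#include <stdint.h>

#define PTP_HDR_SIZE 34u /* packed sizeof(struct ptp_header) */

/* Buggy: trusts the frame's own message_length field. */
static uint32_t nhoff_from_length(uint32_t nhoff, uint16_t message_length)
{
	return nhoff + message_length;
}

/* Fixed: advance past the PTP header by its fixed size only. */
static uint32_t nhoff_from_hdr_size(uint32_t nhoff)
{
	return nhoff + PTP_HDR_SIZE;
}
```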

Kernel trace:
[   74.984279] ------------[ cut here ]------------
[   74.989471] kernel BUG at include/linux/skbuff.h:2440!
[   74.995237] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[   75.001098] CPU: 4 PID: 0 Comm: swapper/4 Tainted: G     U            5.15.85-intel-ese-standard-lts #1
[   75.011629] Hardware name: Intel Corporation A-Island (CPU:AlderLake)/A-Island (ID:06), BIOS SB_ADLP.01.01.00.01.03.008.D-6A9D9E73-dirty Mar 30 2023
[   75.026507] RIP: 0010:eth_type_trans+0xd0/0x130
[   75.031594] Code: 03 88 47 78 eb c7 8b 47 68 2b 47 6c 48 8b 97 c0 00 00 00 83 f8 01 7e 1b 48 85 d2 74 06 66 83 3a ff 74 09 b8 00 04 00 00 eb ab <0f> 0b b8 00 01 00 00 eb a2 48 85 ff 74 eb 48 8d 54 24 06 31 f6 b9
[   75.052612] RSP: 0018:ffff9948c0228de0 EFLAGS: 00010297
[   75.058473] RAX: 00000000000003f2 RBX: ffff8e47047dc300 RCX: 0000000000001003
[   75.066462] RDX: ffff8e4e8c9ea040 RSI: ffff8e4704e0a000 RDI: ffff8e47047dc300
[   75.074458] RBP: ffff8e4704e2acc0 R08: 00000000000003f3 R09: 0000000000000800
[   75.082466] R10: 000000000000000d R11: ffff9948c0228dec R12: ffff8e4715e4e010
[   75.090461] R13: ffff9948c0545018 R14: 0000000000000001 R15: 0000000000000800
[   75.098464] FS:  0000000000000000(0000) GS:ffff8e4e8fb00000(0000) knlGS:0000000000000000
[   75.107530] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   75.113982] CR2: 00007f5eb35934a0 CR3: 0000000150e0a002 CR4: 0000000000770ee0
[   75.121980] PKRU: 55555554
[   75.125035] Call Trace:
[   75.127792]  <IRQ>
[   75.130063]  ? eth_get_headlen+0xa4/0xc0
[   75.134472]  igc_process_skb_fields+0xcd/0x150
[   75.139461]  igc_poll+0xc80/0x17b0
[   75.143272]  __napi_poll+0x27/0x170
[   75.147192]  net_rx_action+0x234/0x280
[   75.151409]  __do_softirq+0xef/0x2f4
[   75.155424]  irq_exit_rcu+0xc7/0x110
[   75.159432]  common_interrupt+0xb8/0xd0
[   75.163748]  </IRQ>
[   75.166112]  <TASK>
[   75.168473]  asm_common_interrupt+0x22/0x40
[   75.173175] RIP: 0010:cpuidle_enter_state+0xe2/0x350
[   75.178749] Code: 85 c0 0f 8f 04 02 00 00 31 ff e8 39 6c 67 ff 45 84 ff 74 12 9c 58 f6 c4 02 0f 85 50 02 00 00 31 ff e8 52 b0 6d ff fb 45 85 f6 <0f> 88 b1 00 00 00 49 63 ce 4c 2b 2c 24 48 89 c8 48 6b d1 68 48 c1
[   75.199757] RSP: 0018:ffff9948c013bea8 EFLAGS: 00000202
[   75.205614] RAX: ffff8e4e8fb00000 RBX: ffffb948bfd23900 RCX: 000000000000001f
[   75.213619] RDX: 0000000000000004 RSI: ffffffff94206161 RDI: ffffffff94212e20
[   75.221620] RBP: 0000000000000004 R08: 000000117568973a R09: 0000000000000001
[   75.229622] R10: 000000000000afc8 R11: ffff8e4e8fb29ce4 R12: ffffffff945ae980
[   75.237628] R13: 000000117568973a R14: 0000000000000004 R15: 0000000000000000
[   75.245635]  ? cpuidle_enter_state+0xc7/0x350
[   75.250518]  cpuidle_enter+0x29/0x40
[   75.254539]  do_idle+0x1d9/0x260
[   75.258166]  cpu_startup_entry+0x19/0x20
[   75.262582]  secondary_startup_64_no_verify+0xc2/0xcb
[   75.268259]  </TASK>
[   75.270721] Modules linked in: 8021q snd_sof_pci_intel_tgl snd_sof_intel_hda_common tpm_crb snd_soc_hdac_hda snd_sof_intel_hda snd_hda_ext_core snd_sof_pci snd_sof snd_sof_xtensa_dsp snd_soc_acpi_intel_match snd_soc_acpi snd_soc_core snd_compress iTCO_wdt ac97_bus intel_pmc_bxt mei_hdcp iTCO_vendor_support snd_hda_codec_hdmi pmt_telemetry intel_pmc_core pmt_class snd_hda_intel x86_pkg_temp_thermal snd_intel_dspcfg snd_hda_codec snd_hda_core kvm_intel snd_pcm snd_timer kvm snd mei_me soundcore tpm_tis irqbypass i2c_i801 mei tpm_tis_core pcspkr intel_rapl_msr tpm i2c_smbus intel_pmt thermal sch_fq_codel uio uhid i915 drm_buddy video drm_display_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm fuse configfs
[   75.342736] ---[ end trace 3785f9f360400e3a ]---
[   75.347913] RIP: 0010:eth_type_trans+0xd0/0x130
[   75.352984] Code: 03 88 47 78 eb c7 8b 47 68 2b 47 6c 48 8b 97 c0 00 00 00 83 f8 01 7e 1b 48 85 d2 74 06 66 83 3a ff 74 09 b8 00 04 00 00 eb ab <0f> 0b b8 00 01 00 00 eb a2 48 85 ff 74 eb 48 8d 54 24 06 31 f6 b9
[   75.373994] RSP: 0018:ffff9948c0228de0 EFLAGS: 00010297
[   75.379860] RAX: 00000000000003f2 RBX: ffff8e47047dc300 RCX: 0000000000001003
[   75.387856] RDX: ffff8e4e8c9ea040 RSI: ffff8e4704e0a000 RDI: ffff8e47047dc300
[   75.395864] RBP: ffff8e4704e2acc0 R08: 00000000000003f3 R09: 0000000000000800
[   75.403857] R10: 000000000000000d R11: ffff9948c0228dec R12: ffff8e4715e4e010
[   75.411863] R13: ffff9948c0545018 R14: 0000000000000001 R15: 0000000000000800
[   75.419875] FS:  0000000000000000(0000) GS:ffff8e4e8fb00000(0000) knlGS:0000000000000000
[   75.428946] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   75.435403] CR2: 00007f5eb35934a0 CR3: 0000000150e0a002 CR4: 0000000000770ee0
[   75.443410] PKRU: 55555554
[   75.446477] Kernel panic - not syncing: Fatal exception in interrupt
[   75.453738] Kernel Offset: 0x11c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[   75.465794] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

Fixes: 4f1cc51f34 ("net: flow_dissector: Parse PTP L2 packet header")
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-15 10:40:04 +01:00
Eric Dumazet
859f8b265f ipv6: lockless IPV6_FLOWINFO_SEND implementation
np->sndflow reads are racy.

Use one bit from the atomic inet->inet_flags instead, so that
IPV6_FLOWINFO_SEND setsockopt() can be lockless.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-15 10:33:48 +01:00
Eric Dumazet
6b724bc430 ipv6: lockless IPV6_MTU_DISCOVER implementation
Most np->pmtudisc reads are racy.

Move this 3-bit field to a full byte, add annotations
and make IPV6_MTU_DISCOVER setsockopt() lockless.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-15 10:33:48 +01:00
Eric Dumazet
83cd5eb654 ipv6: lockless IPV6_ROUTER_ALERT_ISOLATE implementation
Reads from np->rtalert_isolate are racy.

Move this flag to inet->inet_flags to fix data-races.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-15 10:33:48 +01:00
Eric Dumazet
3cccda8db2 ipv6: move np->repflow to atomic flags
Move np->repflow to inet->inet_flags to fix data-races.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-15 10:33:48 +01:00
Eric Dumazet
3fa29971c6 ipv6: lockless IPV6_RECVERR implemetation
np->recverr is moved to inet->inet_flags to fix data-races.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-15 10:33:48 +01:00
Eric Dumazet
1086ca7cce ipv6: lockless IPV6_DONTFRAG implementation
Move np->dontfrag flag to inet->inet_flags to fix data-races.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-15 10:33:47 +01:00
Eric Dumazet
5121516b0c ipv6: lockless IPV6_AUTOFLOWLABEL implementation
Move np->autoflowlabel and np->autoflowlabel_set to inet->inet_flags,
to fix data-races.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-15 10:33:47 +01:00
Eric Dumazet
6559c0ff3b ipv6: lockless IPV6_MULTICAST_ALL implementation
Move np->mc_all to an atomic flag to fix data-races.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-15 10:33:47 +01:00
Eric Dumazet
dcae74622c ipv6: lockless IPV6_RECVERR_RFC4884 implementation
Move np->recverr_rfc4884 to an atomic flag to fix data-races.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-15 10:33:47 +01:00
Eric Dumazet
273784d3c5 ipv6: lockless IPV6_MINHOPCOUNT implementation
Add one missing READ_ONCE() annotation in do_ipv6_getsockopt()
and make IPV6_MINHOPCOUNT setsockopt() lockless.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-15 10:33:47 +01:00
Eric Dumazet
15f926c445 ipv6: lockless IPV6_MTU implementation
np->frag_size can be read/written without holding the socket lock.

Add missing annotations and make IPV6_MTU setsockopt() lockless.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-15 10:33:46 +01:00
Eric Dumazet
2da23eb07c ipv6: lockless IPV6_MULTICAST_HOPS implementation
This fixes data-races around np->mcast_hops
and makes IPV6_MULTICAST_HOPS lockless.

Note that np->mcast_hops is never negative,
thus it fits in a u8 field instead of an s16.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-15 10:33:46 +01:00
Eric Dumazet
d986f52124 ipv6: lockless IPV6_MULTICAST_LOOP implementation
Add inet6_{test|set|clear|assign}_bit() helpers.

Note that I am using bits from inet->inet_flags;
this might change in the future if we need more flags.

While solving data-races accessing np->mc_loop,
this patch also makes it possible to implement lockless
accesses to np->mcast_hops in the following patch.

Also constify sk_mc_loop() argument.
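As a rough userspace analogue of the inet6_{test|set|clear|assign}_bit() helpers, the sketch below keeps boolean socket options as bits in one atomic word, so each option can be read or updated without the socket lock. It uses C11 atomics instead of the kernel's bitops on inet->inet_flags, and the bit numbers and names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit numbers, for illustration only. */
enum { FLAG_MC_LOOP = 0, FLAG_RECVERR = 1 };

struct sock_flags { atomic_ulong flags; };

static bool flag_test(struct sock_flags *s, int bit)
{
	return atomic_load_explicit(&s->flags, memory_order_relaxed) &
	       (1UL << bit);
}

static void flag_set(struct sock_flags *s, int bit)
{
	atomic_fetch_or_explicit(&s->flags, 1UL << bit, memory_order_relaxed);
}

static void flag_clear(struct sock_flags *s, int bit)
{
	atomic_fetch_and_explicit(&s->flags, ~(1UL << bit),
				  memory_order_relaxed);
}

/* assign: set or clear depending on val, as a setsockopt() would. */
static void flag_assign(struct sock_flags *s, int bit, bool val)
{
	if (val)
		flag_set(s, bit);
	else
		flag_clear(s, bit);
}
```

Each update is a single atomic read-modify-write on the shared word, so concurrent updates to different options cannot overwrite each other, which a plain bitfield store could.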

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-15 10:33:46 +01:00
Eric Dumazet
b0adfba7ee ipv6: lockless IPV6_UNICAST_HOPS implementation
Some np->hop_limit accesses are racy when the socket lock is not held.

Add missing annotations and switch to full lockless implementation.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-15 10:33:46 +01:00
Paolo Abeni
f2fa1c812c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

No conflicts.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14 19:48:23 +02:00
Gavrilov Ilia
59bb1d6980 ipv6: mcast: Remove redundant comparison in igmp6_mcf_get_next()
The 'state->im' value will always be non-zero after
the 'while' statement, so the check can be removed.

Found by InfoTeCS on behalf of Linux Verification Center
(linuxtesting.org) with SVACE.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Gavrilov Ilia <Ilia.Gavrilov@infotecs.ru>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230912084100.1502379-1-Ilia.Gavrilov@infotecs.ru
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14 17:20:35 +02:00
Gavrilov Ilia
a613ed1afd ipv4: igmp: Remove redundant comparison in igmp_mcf_get_next()
The 'state->im' value will always be non-zero after
the 'while' statement, so the check can be removed.

Found by InfoTeCS on behalf of Linux Verification Center
(linuxtesting.org) with SVACE.

Signed-off-by: Gavrilov Ilia <Ilia.Gavrilov@infotecs.ru>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230912084039.1501984-1-Ilia.Gavrilov@infotecs.ru
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14 17:20:17 +02:00
Eric Dumazet
882af43a0f udplite: fix various data-races
udp->pcflag, udp->pcslen and udp->pcrlen reads/writes are racy.

Move udp->pcflag to udp->udp_flags for atomicity,
and add READ_ONCE()/WRITE_ONCE() annotations for pcslen and pcrlen.

Fixes: ba4e58eca8 ("[NET]: Supporting UDP-Lite (RFC 3828) in Linux")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14 16:16:36 +02:00
Eric Dumazet
729549aa35 udplite: remove UDPLITE_BIT
This flag is set but never read; we can remove it.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14 16:16:36 +02:00
Eric Dumazet
70a36f5713 udp: annotate data-races around udp->encap_type
syzbot/KCSAN complained about UDP_ENCAP_L2TPINUDP setsockopt() racing.

Add READ_ONCE()/WRITE_ONCE() to document races on this lockless field.
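For reference, here is a simplified userspace sketch of what these annotations do. A volatile access makes the compiler emit exactly one load or store and never cache or re-read the value. This is not the kernel's full definition (which adds more type handling), it relies on the GNU __typeof__ extension, and struct udp_sock_stub is hypothetical. The annotations add no locking; they mark the race as intended, which is how KCSAN distinguishes it from an accidental one.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified READ_ONCE()/WRITE_ONCE(): a single volatile access. */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

struct udp_sock_stub { uint8_t encap_type; };

/* Writer side, e.g. a setsockopt() running without the socket lock. */
static void set_encap(struct udp_sock_stub *up, uint8_t type)
{
	WRITE_ONCE(up->encap_type, type);
}

/* Lockless reader. */
static uint8_t get_encap(const struct udp_sock_stub *up)
{
	return READ_ONCE(up->encap_type);
}
```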

syzbot report was:
BUG: KCSAN: data-race in udp_lib_setsockopt / udp_lib_setsockopt

read-write to 0xffff8881083603fa of 1 bytes by task 16557 on cpu 0:
udp_lib_setsockopt+0x682/0x6c0
udp_setsockopt+0x73/0xa0 net/ipv4/udp.c:2779
sock_common_setsockopt+0x61/0x70 net/core/sock.c:3697
__sys_setsockopt+0x1c9/0x230 net/socket.c:2263
__do_sys_setsockopt net/socket.c:2274 [inline]
__se_sys_setsockopt net/socket.c:2271 [inline]
__x64_sys_setsockopt+0x66/0x80 net/socket.c:2271
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

read-write to 0xffff8881083603fa of 1 bytes by task 16554 on cpu 1:
udp_lib_setsockopt+0x682/0x6c0
udp_setsockopt+0x73/0xa0 net/ipv4/udp.c:2779
sock_common_setsockopt+0x61/0x70 net/core/sock.c:3697
__sys_setsockopt+0x1c9/0x230 net/socket.c:2263
__do_sys_setsockopt net/socket.c:2274 [inline]
__se_sys_setsockopt net/socket.c:2271 [inline]
__x64_sys_setsockopt+0x66/0x80 net/socket.c:2271
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

value changed: 0x01 -> 0x05

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 16554 Comm: syz-executor.5 Not tainted 6.5.0-rc7-syzkaller-00004-gf7757129e3de #0

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14 16:16:36 +02:00
Eric Dumazet
ac9a7f4ce5 udp: lockless UDP_ENCAP_L2TPINUDP / UDP_GRO
Move udp->encap_enabled to udp->udp_flags.

Add udp_test_and_set_bit() helper to allow lockless
udp_tunnel_encap_enable() implementation.
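
A minimal Python sketch of the lockless "enable once" pattern that a test-and-set helper allows. It assumes a single flags word whose atomic bit operations are emulated with a lock, because Python has no atomic bitops. The names UdpFlags, ENCAP_ENABLED and encap_enable() are hypothetical and are not the kernel's definitions.

```python
import threading

ENCAP_ENABLED = 3  # hypothetical bit number, for illustration only


class UdpFlags:
    """A word of flag bits; a lock emulates the kernel's atomic bitops."""

    def __init__(self):
        self._word = 0
        self._lock = threading.Lock()

    def test_bit(self, nr):
        return bool((self._word >> nr) & 1)

    def test_and_set_bit(self, nr):
        # Set bit nr and return its previous value, as one atomic step.
        with self._lock:
            old = self.test_bit(nr)
            self._word |= 1 << nr
            return old


enable_calls = 0  # how many times the one-time setup actually ran
enable_lock = threading.Lock()


def encap_enable(flags):
    """Hypothetical stand-in for a lockless udp_tunnel_encap_enable()."""
    global enable_calls
    # Only the caller that flips the bit from 0 to 1 does the setup,
    # so concurrent callers do not need the socket lock.
    if flags.test_and_set_bit(ENCAP_ENABLED):
        return
    with enable_lock:
        enable_calls += 1


flags = UdpFlags()
workers = [threading.Thread(target=encap_enable, args=(flags,)) for _ in range(8)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(enable_calls, flags.test_bit(ENCAP_ENABLED))  # 1 True
```

However many threads race, exactly one of them observes the bit as previously clear and performs the setup.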

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14 16:16:36 +02:00
Eric Dumazet
f5f52f0884 udp: move udp->accept_udp_{l4|fraglist} to udp->udp_flags
These are read locklessly, move them to udp_flags to fix data-races.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14 16:16:36 +02:00
Eric Dumazet
6d5a12eb91 udp: add missing WRITE_ONCE() around up->encap_rcv
UDP_ENCAP_ESPINUDP_NON_IKE setsockopt() writes over up->encap_rcv
while other cpus read it.

Fixes: 067b207b28 ("[UDP]: Cleanup UDP encapsulation code")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14 16:16:36 +02:00
Eric Dumazet
e1dc0615c6 udp: move udp->gro_enabled to udp->udp_flags
syzbot reported that udp->gro_enabled can be read locklessly.
Use one atomic bit from udp->udp_flags.

Fixes: e20cf8d3f1 ("udp: implement GRO for plain UDP sockets.")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14 16:16:36 +02:00
Eric Dumazet
bcbc1b1de8 udp: move udp->no_check6_rx to udp->udp_flags
syzbot reported that udp->no_check6_rx can be read locklessly.
Use one atomic bit from udp->udp_flags.

Fixes: 1c19448c9b ("net: Make enabling of zero UDP6 csums more restrictive")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14 16:16:36 +02:00
Eric Dumazet
a0002127cd udp: move udp->no_check6_tx to udp->udp_flags
syzbot reported that udp->no_check6_tx can be read locklessly.
Use one atomic bit from udp->udp_flags.

Fixes: 1c19448c9b ("net: Make enabling of zero UDP6 csums more restrictive")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14 16:16:36 +02:00
Eric Dumazet
81b36803ac udp: introduce udp->udp_flags
According to syzbot, it is time to use proper atomic flags
for various UDP flags.

Add udp_flags field, and convert udp->corkflag to first
bit in it.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14 16:16:36 +02:00
Kuniyuki Iwashima
a22730b1b4 kcm: Fix error handling for SOCK_DGRAM in kcm_sendmsg().
syzkaller found a memory leak in kcm_sendmsg(), and commit c821a88bd7
("kcm: Fix memory leak in error path of kcm_sendmsg()") suppressed it by
updating kcm_tx_msg(head)->last_skb if partial data is copied so that the
following sendmsg() will resume from the skb.

However, we cannot know how many bytes were copied when we get the error.
Thus, we could mess up the MSG_MORE queue.

When kcm_sendmsg() fails for SOCK_DGRAM, we should purge the queue, as we
do for UDP with udp_flush_pending_frames().
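
A toy Python model of the datagram rule described above: if a partial copy fails, drop the whole half-built MSG_MORE record instead of remembering where to resume. DgramTxSketch and its fail_at parameter are hypothetical; the kernel works on skbs and uses its own purge routine.

```python
class DgramTxSketch:
    """Toy model of a datagram sender that accumulates MSG_MORE data."""

    def __init__(self):
        self.pending = []  # fragments of the record being built

    def sendmsg(self, chunks, more, fail_at=None):
        """Queue chunks; fail_at simulates a copy error at that chunk index."""
        try:
            for i, chunk in enumerate(chunks):
                if fail_at == i:
                    raise OSError("copy failed part-way")
                self.pending.append(chunk)
        except OSError:
            # Datagram semantics: drop the whole half-built record instead of
            # resuming later from a fragment we cannot trust.
            self.pending.clear()
            raise
        if more:
            return None
        record, self.pending = b"".join(self.pending), []
        return record


tx = DgramTxSketch()
tx.sendmsg([b"ab"], more=True)
try:
    tx.sendmsg([b"cd", b"ef"], more=True, fail_at=1)
except OSError:
    pass
after_error = list(tx.pending)
record = tx.sendmsg([b"xy"], more=False)
print(after_error, record)  # [] b'xy'
```

The next datagram after the failure starts from a clean queue, so no stale fragment ("ab" or "cd") leaks into it.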

Even without this change, when the error occurred, the following sendmsg()
resumed from a wrong skb and the queue was messed up.  However, we have
yet to get such a report, and only syzkaller stumbled on it.  So, this
can be changed safely.

Note this does not change SOCK_SEQPACKET behaviour.

Fixes: c821a88bd7 ("kcm: Fix memory leak in error path of kcm_sendmsg()")
Fixes: ab7ac4eb98 ("kcm: Kernel Connection Multiplexor module")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230912022753.33327-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14 10:43:51 +02:00
Arseniy Krasnov
8ecf0cedc0 vsock: send SIGPIPE on write to shutdowned socket
POSIX requires sending SIGPIPE on a write to a SOCK_STREAM socket that was
shut down with the SHUT_WR flag, or whose peer was shut down with the SHUT_RD
flag. Also, we must not send SIGPIPE if the MSG_NOSIGNAL flag is set.
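
The rule above, restated as a small Python predicate. The function name and its boolean arguments are hypothetical; it only encodes the conditions listed in this commit message, and MSG_NOSIGNAL uses its Linux value.

```python
MSG_NOSIGNAL = 0x4000  # value of the flag on Linux


def should_send_sigpipe(is_stream, local_shut_wr, peer_shut_rd, msg_flags):
    """Return True if a failed write should also raise SIGPIPE."""
    if not is_stream:
        return False
    if msg_flags & MSG_NOSIGNAL:
        return False  # caller explicitly asked for no signal
    # Writing is impossible if we shut down our send side or the peer
    # shut down its receive side.
    return local_shut_wr or peer_shut_rd
```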

Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14 08:19:55 +02:00
Phil Sutter
7fb818f248 netfilter: nf_tables: Fix entries val in rule reset audit log
The value in idx and the number of rules handled in that particular
__nf_tables_dump_rules() call are not identical. The former is a cursor
to pick up from if multiple netlink messages are needed, so its value is
ever increasing. Fixing this is not just a matter of subtracting s_idx
from it, though: When resetting rules in multiple chains,
__nf_tables_dump_rules() is called for each and cb->args[0] is not
adjusted in between. Introduce a dedicated counter to record the number
of rules reset in this call in a less confusing way.
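
A simplified Python model of the difference between the resume cursor and a dedicated per-call counter. dump_rules_sketch() is hypothetical and flattens the kernel's per-chain calls into one loop; it only illustrates why idx is the wrong number to log.

```python
def dump_rules_sketch(chains, s_idx, budget):
    """Dump up to `budget` rules, resuming at cursor s_idx.

    Returns (idx, reset_count). idx is a resume cursor that keeps growing
    across messages; reset_count counts only the rules reset in this call,
    which is the number the audit log should report.
    """
    idx = 0
    reset_count = 0
    for chain in chains:
        for _rule in chain:
            if idx < s_idx:
                idx += 1  # already handled by an earlier message
                continue
            if reset_count == budget:
                return idx, reset_count  # message full: log, then resume here
            reset_count += 1
            idx += 1
    return idx, reset_count


chains = [["r1", "r2", "r3"], ["r4", "r5"]]
first = dump_rules_sketch(chains, s_idx=0, budget=3)
second = dump_rules_sketch(chains, s_idx=first[0], budget=3)
print(first, second)  # (3, 3) (5, 2)
```

In the second call the cursor ends at 5, but only 2 rules were reset in that call.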

While at it, prevent the direct return upon buffer exhaustion: any
rules previously dumped into that skb would evade audit logging
otherwise.

Fixes: 9b5ba5c9c5 ("netfilter: nf_tables: Unbreak audit log reset")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-09-13 21:57:50 +02:00
Florian Westphal
4908d5af16 netfilter: conntrack: fix extension size table
The size table is incorrect due to a copy-paste error;
it reserves more space than needed.

TSTAMP reserved 32 instead of 16 bytes.
TIMEOUT reserved 16 instead of 8 bytes.
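
The correct sizes can be checked with ctypes. This sketch assumes struct nf_conn_tstamp holds two 64-bit timestamps (start, stop) and struct nf_conn_timeout holds a single pointer; the Python class names are hypothetical mirrors of those structs.

```python
import ctypes


class NfConnTstamp(ctypes.Structure):
    # Assumed layout of struct nf_conn_tstamp: two 64-bit timestamps.
    _fields_ = [("start", ctypes.c_uint64), ("stop", ctypes.c_uint64)]


class NfConnTimeout(ctypes.Structure):
    # Assumed layout of struct nf_conn_timeout: one pointer to the policy.
    _fields_ = [("timeout", ctypes.c_void_p)]


print(ctypes.sizeof(NfConnTstamp))   # 16
print(ctypes.sizeof(NfConnTimeout))  # pointer size: 8 on a 64-bit build
```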

Fixes: 5f31edc067 ("netfilter: conntrack: move extension sizes into core")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-09-13 21:57:50 +02:00
Olga Kornievskaia
806a3bc421 NFSv4.1: fix pnfs MDS=DS session trunking
Currently, when GETDEVICEINFO returns multiple locations, each with a
different IP but with the same server identity as the MDS,
nfs4_set_ds_client() finds the existing nfs_client structure, which
has the MDS's max_connect value. If that value is 1, the 1st IP
on the DS's list gets dropped due to MDS trunking rules. The other
IPs are added because they fall under the pnfs trunking rules.

For the list of IPs, the 1st goes through nfs4_set_ds_client(),
which eventually calls nfs4_add_trunk() and then
rpc_clnt_test_and_add_xprt(), which has the check for MDS trunking.
The other IPs (after the 1st one) call rpc_clnt_add_xprt(),
which doesn't go through that check.

nfs4_add_trunk() is called when MDS trunking is happening, and it
needs to enforce the max_connect mount option of the
1st mount. However, this shouldn't be applied to the pnfs flow.

Instead, this patch proposes to treat MDS=DS as DS trunking and
make sure that the MDS's max_connect limit does not apply to the
1st IP returned in the GETDEVICEINFO list. It does so by
marking the newly created client with a new flag, NFS_CS_PNFS,
which is then used to pass the max_connect value to use into
rpc_clnt_test_and_add_xprt() instead of the existing rpc
client's max_connect value set by the MDS connection.

For example, suppose the mount was done without a max_connect value,
so the MDS's rpc client has cl_max_connect=1. When calling into
rpc_clnt_test_and_add_xprt(), instead of using the rpc client's value,
the caller passes in the max_connect value previously
set in the pnfs path (as part of handling the
GETDEVICEINFO list of IPs) in nfs4_set_ds_client().

However, when the NFS_CS_PNFS flag is not set, we know we
are doing MDS trunking (comparing a new IP of the same
server), so we set the max_connect value to the
existing MDS's value and pass that into
rpc_clnt_test_and_add_xprt().
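
A Python sketch of the limit selection described above. NfsClientSketch, trunk_max_connect() and the bit value of NFS_CS_PNFS are hypothetical, and the pnfs value 16 is just an example number.

```python
from dataclasses import dataclass

NFS_CS_PNFS = 1 << 0  # hypothetical bit value; only the flag's role matters


@dataclass
class NfsClientSketch:
    cl_flags: int
    cl_max_connect: int  # limit inherited from the MDS mount


def trunk_max_connect(clp, pnfs_max_connect):
    """Choose the limit to pass into rpc_clnt_test_and_add_xprt()."""
    if clp.cl_flags & NFS_CS_PNFS:
        # DS trunking, including MDS=DS: use the value set by the pnfs path.
        return pnfs_max_connect
    # MDS trunking: keep enforcing the max_connect of the original mount.
    return clp.cl_max_connect


mds = NfsClientSketch(cl_flags=0, cl_max_connect=1)
ds = NfsClientSketch(cl_flags=NFS_CS_PNFS, cl_max_connect=1)
print(trunk_max_connect(mds, 16), trunk_max_connect(ds, 16))  # 1 16
```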

Fixes: dc48e0abee ("SUNRPC enforce creation of no more than max_connect xprts")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-09-13 11:51:11 -04:00
Trond Myklebust
e86fcf0820 Revert "SUNRPC: Fail faster on bad verifier"
This reverts commit 0701214cd6.

The premise of this commit was incorrect. There are exactly 2 cases
where rpcauth_checkverf() will return an error:

1) If there was an XDR decode problem (i.e. garbage data).
2) If gss_validate() had a problem verifying the RPCSEC_GSS MIC.

In the second case, there are again 2 subcases:

a) The GSS context expires, in which case gss_validate() will force a
   new context negotiation on retry by invalidating the cred.
b) The sequence number check failed because an RPC call timed out, and
   the client retransmitted the request using a new sequence number,
   as required by RFC2203.

In neither subcase is this a fatal error.

Reported-by: Russell Cattelan <cattelan@thebarn.com>
Fixes: 0701214cd6 ("SUNRPC: Fail faster on bad verifier")
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-09-13 11:51:11 -04:00
Trond Myklebust
611fa42dfa SUNRPC: Mark the cred for revalidation if the server rejects it
If the server rejects the credential as being stale, or bad, then we
should mark it for revalidation before retransmitting.

Fixes: 7f5667a5f8 ("SUNRPC: Clean up rpc_verify_header()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-09-13 11:51:11 -04:00
Ping-Ke Shih
e160ab8516 wifi: mac80211: don't return unset power in ieee80211_get_tx_power()
We can get a UBSAN warning if ieee80211_get_tx_power() returns the
INT_MIN value mac80211 internally uses for "unset power level".

 UBSAN: signed-integer-overflow in net/wireless/nl80211.c:3816:5
 -2147483648 * 100 cannot be represented in type 'int'
 CPU: 0 PID: 20433 Comm: insmod Tainted: G        WC OE
 Call Trace:
  dump_stack+0x74/0x92
  ubsan_epilogue+0x9/0x50
  handle_overflow+0x8d/0xd0
  __ubsan_handle_mul_overflow+0xe/0x10
  nl80211_send_iface+0x688/0x6b0 [cfg80211]
  [...]
  cfg80211_register_wdev+0x78/0xb0 [cfg80211]
  cfg80211_netdev_notifier_call+0x200/0x620 [cfg80211]
  [...]
  ieee80211_if_add+0x60e/0x8f0 [mac80211]
  ieee80211_register_hw+0xda5/0x1170 [mac80211]

In this case, simply return an error instead, to indicate
that no data is available.
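
A Python sketch of why the sentinel overflows and how returning an error avoids it. nl80211 multiplies the dBm value by 100, and INT_MIN * 100 does not fit in a 32-bit int. get_tx_power_sketch() is hypothetical, and -EINVAL is used here only as an illustrative error code.

```python
INT_MIN = -(2 ** 31)
INT_MAX = 2 ** 31 - 1
EINVAL = 22  # illustrative error code


def fits_in_int(value):
    """Would value fit in a 32-bit signed C int?"""
    return INT_MIN <= value <= INT_MAX


def get_tx_power_sketch(txpower):
    """Return (err, dbm) and never leak the INT_MIN 'unset' sentinel."""
    if txpower == INT_MIN:
        return -EINVAL, None  # no data available yet
    return 0, txpower


print(fits_in_int(INT_MIN * 100))  # False: the overflow UBSAN reported
print(get_tx_power_sketch(INT_MIN), get_tx_power_sketch(20))  # (-22, None) (0, 20)
```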

Cc: Zong-Zhe Yang <kevin_yang@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://lore.kernel.org/r/20230203023636.4418-1-pkshih@realtek.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-13 16:29:24 +02:00
Stephen Douthit
3e99b4d282 wifi: mac80211: Sanity check tx bitrate if not provided by driver
If the driver doesn't fill NL80211_STA_INFO_TX_BITRATE in sta_set_sinfo()
then as a fallback sta->deflink.tx_stats.last_rate is used.  Unfortunately
there's no guarantee that this has actually been set before it's used.

Originally found when 'iw <dev> link' would always return a tx rate of
6Mbps regardless of actual link speed for the QCA9337 running firmware
WLAN.TF.2.1-00021-QCARMSWP-1 in my netbook.

Use the sanity check logic from ieee80211_fill_rx_status() and refactor
that to use the new inline function.
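
A Python sketch of the fallback guarded by a validity check. It assumes a cached rate counts as set only when its index is non-negative and its try count is positive; TxRate, rate_valid() and fill_tx_bitrate_sketch() are hypothetical stand-ins, not the mac80211 code.

```python
from dataclasses import dataclass


@dataclass
class TxRate:
    idx: int    # rate table index; negative is assumed to mean "not set"
    count: int  # tries at this rate; 0 is assumed to mean "not used"


def rate_valid(rate):
    # Treat the cached rate as usable only if it was actually filled in.
    return rate.idx >= 0 and rate.count > 0


def fill_tx_bitrate_sketch(sinfo, last_rate):
    """Use last_rate only if the driver left the field empty and it is valid."""
    if "tx_bitrate" not in sinfo and rate_valid(last_rate):
        sinfo["tx_bitrate"] = last_rate
    return sinfo
```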

Signed-off-by: Stephen Douthit <stephen.douthit@gmail.com>
Link: https://lore.kernel.org/r/20230213204024.3377-1-stephen.douthit@gmail.com
[change to bool ..._rate_valid() instead of int ..._rate_invalid()]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-13 16:24:05 +02:00
Pedro Tammela
ef765c2587 net/sched: cls_route: make netlink errors meaningful
Use netlink extended ack and parsing policies to return more meaningful
errors instead of relying solely on errnos.

Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-13 12:38:52 +01:00
Aditya Kumar Singh
30ca8b0c4d wifi: cfg80211: export DFS CAC time and usable state helper functions
cfg80211 has the cfg80211_chandef_dfs_usable() function to determine whether
at least one channel in the chandef is in the usable state. Also,
the cfg80211_chandef_dfs_cac_time() function returns the CAC
time required for the given chandef.

Make these two functions visible to drivers by exporting their symbols
to the global list of kernel symbols.

Lower level drivers can use these two functions to learn whether
CAC is required on the given chandef and for how long. For example,
drivers which maintain the CAC state internally can make use of these.

Signed-off-by: Aditya Kumar Singh <quic_adisi@quicinc.com>
Reviewed-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Link: https://lore.kernel.org/r/20230912051857.2284-2-quic_adisi@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-13 13:24:11 +02:00
Abhishek Kumar
b13b6bbfbb wifi: cfg80211: call reg_call_notifier on beacon hints
Currently the channel property updates are not propagated to the
driver. This causes issues in discovering hidden SSIDs and
connecting to them.
This change defines a new wiphy flag which, when enabled by the vendor
driver, causes the reg_call_notifier callback to be triggered on beacon
hints. This ensures that the channel property changes are visible
to the vendor driver, which changes the channels for active
scans. This fixes the discovery of hidden SSIDs.

Signed-off-by: Abhishek Kumar <kuabhs@chromium.org>
Link: https://lore.kernel.org/r/20230629035254.1.I059fe585f9f9e896c2d51028ef804d197c8c009e@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-13 12:34:01 +02:00
Raj Kumar Bhagat
13ba6794d2 wifi: cfg80211: allow reg update by driver even if wiphy->regd is set
Currently, a regulatory update by the driver is not allowed when
wiphy->regd is already set and driver_request->intersect is false.

During wiphy registration, some drivers (ath10k does this currently)
first register the world regulatory domain to cfg80211 using
wiphy_apply_custom_regulatory(). The driver then obtains the current
operating country and tries to update cfg80211 with the correct
regulatory domain using regulatory_hint().

But at this point, wiphy->regd is already set to the world regulatory
domain. Also, since this is the first request from the driver after the
world regulatory domain is set, driver_request->intersect will be
false. In this case, reg_set_rd_driver() does not allow the driver's
regulatory request to update cfg80211. This restricts the device
operation to the world regulatory domain.

This driver request to update the regulatory domain with the current
operating country is valid and should be applied to cfg80211. Hence,
allow a regulatory update by the driver even if wiphy->regd is already
set and driver_request->intersect is false.

Signed-off-by: Raj Kumar Bhagat <quic_rajkbhag@quicinc.com>
Reviewed-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Link: https://lore.kernel.org/r/20230421061312.13722-1-quic_rajkbhag@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-13 12:34:01 +02:00
Aloka Dixit
6bc5ddb2fd wifi: mac80211: additions to change_beacon()
Process FILS discovery and unsolicited broadcast probe response
transmission configurations in ieee80211_change_beacon().

Signed-off-by: Aloka Dixit <quic_alokad@quicinc.com>
Reviewed-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Link: https://lore.kernel.org/r/20230727174100.11721-6-quic_alokad@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-13 12:34:01 +02:00
Aloka Dixit
b2d431d43c wifi: nl80211: additions to NL80211_CMD_SET_BEACON
FILS discovery and unsolicited broadcast probe response templates
need to be updated along with beacon templates in some cases such as
the channel switch operation. Add the missing implementation.

Signed-off-by: Aloka Dixit <quic_alokad@quicinc.com>
Reviewed-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Link: https://lore.kernel.org/r/20230727174100.11721-5-quic_alokad@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-13 12:34:01 +02:00
Aloka Dixit
66f85d57b7 wifi: cfg80211: modify prototype for change_beacon
Modify the prototype for change_beacon() in struct cfg80211_ops to
accept cfg80211_ap_settings instead of cfg80211_beacon_data so that
it can process data in addition to beacons.
Modify the prototypes of ieee80211_change_beacon() and driver specific
functions accordingly.

Signed-off-by: Aloka Dixit <quic_alokad@quicinc.com>
Reviewed-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Link: https://lore.kernel.org/r/20230727174100.11721-4-quic_alokad@quicinc.com
[while at it, remove pointless "if (info)" check in tracing that just
 makes all the lines longer than they need be - it's never NULL]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-13 12:34:01 +02:00
Aloka Dixit
3b1c256eb4 wifi: mac80211: fixes in FILS discovery updates
FILS discovery configuration gets updated only if the maximum interval
is set to a non-zero value, hence there is no way to reset this value
to 0 once set. Replace the check for interval with a new flag which is
set only if the configuration should be updated.

Add similar changes for the unsolicited broadcast probe response handling.

Signed-off-by: Aloka Dixit <quic_alokad@quicinc.com>
Reviewed-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Link: https://lore.kernel.org/r/20230727174100.11721-3-quic_alokad@quicinc.com
[move NULL'ing to else branch to not have intermediate NULL visible]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-13 12:34:01 +02:00
Aloka Dixit
0cfaec2599 wifi: nl80211: fixes to FILS discovery updates
Add a new flag 'update' which is set to true during start_ap()
if (and only if) one of the following two conditions is met:
- Userspace passed an empty nested attribute which indicates that
  the feature should be disabled and templates deleted.
- Userspace passed all the parameters for the nested attribute.

Existing configuration will not be changed while the flag
remains false.

Add similar changes for unsolicited broadcast probe response
transmission.
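
A Python sketch of the 'update' flag semantics listed above. parse_fils_discovery() and the attribute names in REQUIRED are hypothetical shorthand for the nested netlink attributes. Because the flag, not a non-zero interval, decides whether to apply the configuration, a zero interval can now be set.

```python
REQUIRED = ("min_interval", "max_interval", "tmpl")  # hypothetical names


def parse_fils_discovery(attr):
    """Return a config dict whose 'update' flag says whether to touch state.

    attr is None when userspace omitted the nested attribute entirely.
    """
    cfg = {"update": False}
    if attr is None:
        return cfg  # absent: keep the existing configuration
    if not attr:
        cfg["update"] = True  # empty nest: disable and delete templates
        return cfg
    if all(key in attr for key in REQUIRED):
        cfg.update(attr)
        cfg["update"] = True  # complete set: apply it, even a zero interval
    return cfg  # partial set: 'update' stays False
```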

Signed-off-by: Aloka Dixit <quic_alokad@quicinc.com>
Reviewed-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Link: https://lore.kernel.org/r/20230727174100.11721-2-quic_alokad@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-13 12:34:00 +02:00
Tom Rix
e04b1973e2 wifi: lib80211: remove unused variables iv32 and iv16
clang with W=1 reports
net/wireless/lib80211_crypt_tkip.c:667:7: error: variable 'iv32'
  set but not used [-Werror,-Wunused-but-set-variable]
                u32 iv32 = tkey->tx_iv32;
                    ^
This variable is not used, so remove it.
Then remove a similar iv16 variable.
Change the comment because the unmodified value is returned.

Signed-off-by: Tom Rix <trix@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230517123310.873023-1-trix@redhat.com
[change commit log wrt. 'length', add comment in the code]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-13 12:33:58 +02:00
Johannes Berg
2400dfe23f wifi: mac80211: remove shifted rate support
We really cannot even get into this as we can't have
a BSS with a 5/10 MHz (scan) width, and therefore all
the code handling shifted rates cannot happen. Remove
it all, since it's broken anyway, at least with MLO.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-13 11:22:16 +02:00
Johannes Berg
5add321c32 wifi: cfg80211: remove scan_width support
There really isn't any support for scanning at different
channel widths than 20 MHz since there's no way to set it.
Remove this support for now; if somebody wants to maintain
this whole thing later, we can revisit how it should work.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-13 11:10:50 +02:00
Dmitry Antipov
22446b7ee2 wifi: wext: avoid extra calls to strlen() in ieee80211_bss()
Since 'sprintf()' returns the number of characters emitted, the
extra calls to 'strlen()' in 'ieee80211_bss()' may be dropped.

Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Link: https://lore.kernel.org/r/20230912035522.15947-1-dmantipov@yandex.ru
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-13 10:20:49 +02:00
Felix Fietkau
6e48ebffc2 wifi: mac80211: fix mesh id corruption on 32 bit systems
Since the changed field size was increased to u64, mesh_bss_info_changed
pulls invalid bits from the first 3 bytes of the mesh id, clears them, and
passes them on to ieee80211_link_info_change_notify, because
ifmsh->mbss_changed was not updated to match its size.
Fix this by turning ifmsh->mbss_changed into an unsigned long array with
64 bit size.
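
A Python sketch of a 64-bit change mask stored as an array of unsigned longs on a 32-bit system. The helpers are hypothetical stand-ins for bitmap-style storage. The point is that bits above 31 get their own word instead of spilling into whatever follows a single 32-bit field.

```python
BITS_PER_LONG = 32  # the 32-bit case from the bug report


def bits_to_longs(nbits):
    return (nbits + BITS_PER_LONG - 1) // BITS_PER_LONG


def set_bit(nr, words):
    words[nr // BITS_PER_LONG] |= 1 << (nr % BITS_PER_LONG)


def bitmap_to_u64(words):
    """Reassemble a 64-bit change mask from unsigned-long-sized words."""
    value = 0
    for i, word in enumerate(words):
        value |= word << (i * BITS_PER_LONG)
    return value


mbss_changed = [0] * bits_to_longs(64)  # two 32-bit words instead of one
set_bit(1, mbss_changed)
set_bit(40, mbss_changed)  # a change bit above 31 now has its own storage
print(hex(bitmap_to_u64(mbss_changed)))  # 0x10000000002
```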

Fixes: 15ddba5f43 ("wifi: mac80211: consistently use u64 for BSS changes")
Reported-by: Thomas Hühn <thomas.huehn@hs-nordhausen.de>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20230913050134.53536-1-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-13 10:14:44 +02:00
Kuniyuki Iwashima
c48ef9c4ae tcp: Fix bind() regression for v4-mapped-v6 non-wildcard address.
Since bhash2 was introduced, the example below does not work as expected.
These two bind() should conflict, but the 2nd bind() now succeeds.

  from socket import *

  s1 = socket(AF_INET6, SOCK_STREAM)
  s1.bind(('::ffff:127.0.0.1', 0))

  s2 = socket(AF_INET, SOCK_STREAM)
  s2.bind(('127.0.0.1', s1.getsockname()[1]))

During the 2nd bind() in inet_csk_get_port(), inet_bind2_bucket_find()
fails to find the 1st socket's tb2, so inet_bind2_bucket_create() allocates
a new tb2 for the 2nd socket.  Then, we call inet_csk_bind_conflict() that
checks conflicts in the new tb2 by inet_bhash2_conflict().  However, the
new tb2 does not include the 1st socket, thus the bind() finally succeeds.

In this case, inet_bind2_bucket_match() must check if AF_INET6 tb2 has
the conflicting v4-mapped-v6 address so that inet_bind2_bucket_find()
returns the 1st socket's tb2.
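
A simplified Python sketch of the address comparison the fix adds to the bucket match, using the standard ipaddress module. bucket_matches() is hypothetical; families are plain strings for readability, and netns, port and l3mdev comparisons are left out.

```python
import ipaddress


def bucket_matches(tb_family, tb_addr, sk_family, sk_addr):
    """Does a bind2 bucket (tb) hold the address a new socket wants?"""
    tb_ip = ipaddress.ip_address(tb_addr)
    sk_ip = ipaddress.ip_address(sk_addr)
    if tb_family == sk_family:
        return tb_ip == sk_ip
    if tb_family == "AF_INET6" and sk_family == "AF_INET":
        # An AF_INET6 bucket holding ::ffff:a.b.c.d must match a.b.c.d.
        return tb_ip.ipv4_mapped == sk_ip
    return False
```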

Note that if we bind two sockets to 127.0.0.1 and then ::FFFF:127.0.0.1,
the 2nd bind() fails properly for the same reason mentioned in the previous
commit.

Fixes: 28044fc1d4 ("net: Add a bhash2 table hashed by port and address")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-13 07:18:04 +01:00
Kuniyuki Iwashima
aa99e5f87b tcp: Fix bind() regression for v4-mapped-v6 wildcard address.
Andrei Vagin reported a bind() regression with strace logs.

If we bind() a TCPv6 socket to ::FFFF:0.0.0.0 and then bind() a TCPv4
socket to 127.0.0.1, the 2nd bind() should fail but now succeeds.

  from socket import *

  s1 = socket(AF_INET6, SOCK_STREAM)
  s1.bind(('::ffff:0.0.0.0', 0))

  s2 = socket(AF_INET, SOCK_STREAM)
  s2.bind(('127.0.0.1', s1.getsockname()[1]))

During the 2nd bind(), if tb->family is AF_INET6 and sk->sk_family is
AF_INET in inet_bind2_bucket_match_addr_any(), we still need to check
if tb has the v4-mapped-v6 wildcard address.
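
A simplified Python sketch of the wildcard check, under the assumption that an AF_INET6 bucket covers an AF_INET socket when it holds either :: or the v4-mapped wildcard ::ffff:0.0.0.0. bucket_matches_addr_any() is hypothetical and ignores ipv6only and the other bucket fields.

```python
import ipaddress


def bucket_matches_addr_any(tb_family, tb_addr, sk_family):
    """Is the bucket a wildcard bind that covers sockets of sk_family?"""
    tb_ip = ipaddress.ip_address(tb_addr)
    if tb_family == sk_family:
        return tb_ip.is_unspecified  # 0.0.0.0 or ::
    if tb_family == "AF_INET6" and sk_family == "AF_INET":
        # Besides ::, the v4-mapped wildcard ::ffff:0.0.0.0 covers IPv4 binds.
        return tb_ip.is_unspecified or (
            tb_ip.ipv4_mapped == ipaddress.IPv4Address("0.0.0.0"))
    return False
```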

The example above does not work after commit 5456262d2b ("net: Fix
incorrect address comparison when searching for a bind2 bucket"), but
the blamed change is not the commit.

Before the commit, the leading zeros of ::FFFF:0.0.0.0 were treated
as 0.0.0.0, and the sequence above worked by chance.  Technically, this
case has been broken since bhash2 was introduced.

Note that if we bind() two sockets to 127.0.0.1 and then ::FFFF:0.0.0.0,
the 2nd bind() fails properly because we fall back to using bhash to
detect conflicts for the v4-mapped-v6 address.

Fixes: 28044fc1d4 ("net: Add a bhash2 table hashed by port and address")
Reported-by: Andrei Vagin <avagin@google.com>
Closes: https://lore.kernel.org/netdev/ZPuYBOFC8zsK6r9T@google.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-13 07:18:04 +01:00
Kuniyuki Iwashima
c6d277064b tcp: Factorise sk_family-independent comparison in inet_bind2_bucket_match(_addr_any).
This is a prep patch to make the following patches, which touch
inet_bind2_bucket_match() and inet_bind2_bucket_match_addr_any(), cleaner.

Both functions have duplicated comparisons for netns, port, and l3mdev.
Let's factorise them.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-13 07:18:04 +01:00
David S. Miller
7e6cadf51a Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2023-09-11 (i40e, iavf)

This series contains updates to i40e and iavf drivers.

Andrii ensures all VSIs are cleaned up for remove in i40e.

Brett reworks logic for setting promiscuous mode that can, currently, cause
incorrect states on iavf.
---
v2:
 - Remove redundant i40e_vsi_free_q_vectors() and kfree() calls (patch 1)

v1: https://lore.kernel.org/netdev/20230905180521.887861-1-anthony.l.nguyen@intel.com/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-13 06:50:58 +01:00
Eric Dumazet
133c4c0d37 tcp: defer regular ACK while processing socket backlog
This idea came after a particular workload requested
the quickack attribute set on routes, and a performance
drop was noticed for large bulk transfers.

For high throughput flows, it is best to use one cpu
running the user thread issuing socket system calls,
and a separate cpu to process incoming packets from BH context.
(With TSO/GRO, the bottleneck is usually the 'user' cpu.)

The problem is that the user thread can spend a lot of time while holding
the socket lock, forcing the BH handler to queue most of the incoming
packets in the socket backlog.

Whenever the user thread releases the socket lock, it must first
process all accumulated packets in the backlog, potentially
adding latency spikes. Due to flood mitigation, having too many
packets in the backlog increases the chance of unexpected drops.

Backlog processing unfortunately shifts a fair amount of cpu cycles
from the BH cpu to the 'user' cpu, thus reducing max throughput.

This patch takes advantage of the backlog processing,
and the fact that ACKs are mostly cumulative.

The idea is to detect that we are processing the backlog
and defer all eligible ACKs into a single one,
sent from tcp_release_cb().

This saves cpu cycles on both sides, and network resources.

Performance of a single TCP flow on a 200Gbit NIC:

- Throughput is increased by 20% (100Gbit -> 120Gbit).
- Number of generated ACK per second shrinks from 240,000 to 40,000.
- Number of backlog drops per second shrinks from 230 to 0.

Benchmark context:
 - Regular netperf TCP_STREAM (no zerocopy)
 - Intel(R) Xeon(R) Platinum 8481C (Sapphire Rapids)
 - MAX_SKB_FRAGS = 17 (~60KB per GRO packet)

This feature is guarded by a new sysctl, and enabled by default:
 /proc/sys/net/ipv4/tcp_backlog_ack_defer
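
A toy Python model of the deferral idea: while the backlog is being drained, ACKs are only marked pending, and a single cumulative ACK is sent from a release callback. AckDeferSketch is hypothetical and simplifies TCP by assuming every segment would trigger its own ACK; defer_enabled merely stands in for the sysctl.

```python
class AckDeferSketch:
    """Toy model: coalesce ACKs generated while draining the backlog."""

    def __init__(self, defer_enabled=True):
        self.defer_enabled = defer_enabled  # stands in for the sysctl
        self.in_backlog = False
        self.ack_pending = False
        self.acks_sent = 0

    def send_ack(self):
        if self.defer_enabled and self.in_backlog:
            self.ack_pending = True  # cumulative: one later ACK covers all
        else:
            self.acks_sent += 1

    def process_backlog(self, n_packets):
        self.in_backlog = True
        for _ in range(n_packets):
            self.send_ack()  # simplification: every segment wants an ACK
        self.in_backlog = False
        self.release_cb()

    def release_cb(self):
        # Analogous to tcp_release_cb(): flush the single deferred ACK.
        if self.ack_pending:
            self.ack_pending = False
            self.acks_sent += 1


on = AckDeferSketch()
on.process_backlog(6)
off = AckDeferSketch(defer_enabled=False)
off.process_backlog(6)
print(on.acks_sent, off.acks_sent)  # 1 6
```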

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-12 19:10:01 +02:00
Eric Dumazet
4505dc2a52 net: call prot->release_cb() when processing backlog
__sk_flush_backlog() / sk_flush_backlog() are used
when TCP recvmsg()/sendmsg() process large chunks,
so that packets do not stay in the backlog too long.

It makes sense to call tcp_release_cb() to also
process actions held in sk->sk_tsq_flags for smoother
scheduling.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-12 19:10:01 +02:00
Eric Dumazet
b49d252216 tcp: no longer release socket ownership in tcp_release_cb()
This partially reverts c3f9b01849 ("tcp: tcp_release_cb()
should release socket ownership").

prequeue has been removed by Florian in commit e7942d0633
("tcp: remove prequeue support")

__tcp_checksum_complete_user() being gone, we no longer
have to release socket ownership in tcp_release_cb().

This is a prereq for the third patch in the series
("net: call prot->release_cb() when processing backlog").

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-12 19:10:01 +02:00
Zhengchao Shao
762c8dc7f2 net: dst: remove unnecessary input parameter in dst_alloc and dst_init
Since commit 1202cdd66531 ("Remove DECnet support from kernel") was
merged, all callers pass an initial_ref value of 1 when they call
dst_alloc(). Therefore, remove the initial_ref parameter from the
dst_alloc() declaration and use 1 in its place inside dst_alloc().
Likewise, all callers of dst_init() pass an initial_ref value of 1.
Therefore, remove the initial_ref parameter from dst_init() and
use the value 1 in its place inside dst_init().

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Link: https://lore.kernel.org/r/20230911125045.346390-1-shaozhengchao@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-12 11:42:25 +02:00
Liu Jian
cfaa80c91f net/tls: do not free tls_rec on async operation in bpf_exec_tx_verdict()
I got the warning below when doing fuzz testing:
BUG: KASAN: null-ptr-deref in scatterwalk_copychunks+0x320/0x470
Read of size 4 at addr 0000000000000008 by task kworker/u8:1/9

CPU: 0 PID: 9 Comm: kworker/u8:1 Tainted: G           OE
Hardware name: linux,dummy-virt (DT)
Workqueue: pencrypt_parallel padata_parallel_worker
Call trace:
 dump_backtrace+0x0/0x420
 show_stack+0x34/0x44
 dump_stack+0x1d0/0x248
 __kasan_report+0x138/0x140
 kasan_report+0x44/0x6c
 __asan_load4+0x94/0xd0
 scatterwalk_copychunks+0x320/0x470
 skcipher_next_slow+0x14c/0x290
 skcipher_walk_next+0x2fc/0x480
 skcipher_walk_first+0x9c/0x110
 skcipher_walk_aead_common+0x380/0x440
 skcipher_walk_aead_encrypt+0x54/0x70
 ccm_encrypt+0x13c/0x4d0
 crypto_aead_encrypt+0x7c/0xfc
 pcrypt_aead_enc+0x28/0x84
 padata_parallel_worker+0xd0/0x2dc
 process_one_work+0x49c/0xbdc
 worker_thread+0x124/0x880
 kthread+0x210/0x260
 ret_from_fork+0x10/0x18

This is because the value of rec_seq of tls_crypto_info configured by the
user program is too large, for example, 0xffffffffffffff. In addition, TLS
is asynchronously accelerated. When tls_do_encryption() returns
-EINPROGRESS and sk->sk_err is set to EBADMSG due to rec_seq overflow,
skmsg is released before the asynchronous encryption process ends. As a
result, the UAF problem occurs during the asynchronous processing of the
encryption module.

If the operation is asynchronous and the encryption module returns
EINPROGRESS, do not free the record information.
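
A Python sketch of the ownership rule. finish_encrypt() and its free_record callback are hypothetical, and return codes follow the kernel's negative-errno convention.

```python
import errno


def finish_encrypt(rc, rec, free_record):
    """Decide what happens to `rec` after the encrypt call returns.

    rc follows the kernel convention: 0, -EINPROGRESS, or another -errno.
    free_record is a hypothetical callback that releases the record.
    """
    if rc == -errno.EINPROGRESS:
        # The async engine still references rec, even if an error such as
        # EBADMSG was flagged on the socket meanwhile, so it must not be
        # freed here while encryption is in flight.
        return "pending"
    if rc < 0:
        free_record(rec)  # synchronous failure: nobody else holds rec
        return "freed"
    return "done"


freed = []
print(finish_encrypt(-errno.EINPROGRESS, "rec1", freed.append), freed)  # pending []
print(finish_encrypt(-errno.EBADMSG, "rec2", freed.append), freed)  # freed ['rec2']
```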

Fixes: 635d939817 ("net/tls: free record only on encryption error")
Signed-off-by: Liu Jian <liujian56@huawei.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/20230909081434.2324940-1-liujian56@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-12 09:51:49 +02:00
Johannes Berg
37c20b2eff wifi: cfg80211: fix cqm_config access race
Max Schulze reports crashes with brcmfmac. The reason seems
to be a race between userspace removing the CQM config and
the driver calling cfg80211_cqm_rssi_notify(), where if the
data is freed while cfg80211_cqm_rssi_notify() runs it will
crash since it assumes wdev->cqm_config is set. This can't
be fixed with a simple non-NULL check since there's nothing
we can do for locking easily, so use RCU instead to protect
the pointer, but that requires pulling the updates out into
an asynchronous worker so they can sleep and call back into
the driver.

Since we need to change the free anyway, also change it to
go back to the old settings if changing the settings fails.

Reported-and-tested-by: Max Schulze <max.schulze@online.de>
Closes: https://lore.kernel.org/r/ac96309a-8d8d-4435-36e6-6d152eb31876@online.de
Fixes: 4a4b816950 ("cfg80211: Accept multiple RSSI thresholds for CQM")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 16:43:35 +02:00
Johannes Berg
86a8db67a1 wifi: mac80211: fix channel switch link data
Use the correct link ID and per-link puncturing data instead
of hardcoding link ID 0 and using deflink puncturing.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230827135854.0b6a211c8e75.I5724d32bb2dae440888efbc47334d8c115db9d50@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 12:33:55 +02:00
Ilan Peer
563fe446ef wifi: mac80211: Do not force off-channel for management Tx with MLO
When user space transmits a management frame, it is expected to use
the MLD addresses if the connection is an MLD one. Thus, if the
management Tx uses the MLD address and no channel is configured,
off-channel should not be used (as one of the active links would be used).

Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230827135854.73c8efce252f.Ie4b0a842debb24ef25c5e6cb2ad69b9f46bc4b2a@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 12:33:54 +02:00
Johannes Berg
90668e3204 wifi: mac80211: take MBSSID/EHT data also from probe resp
The code that sets up the assoc link currently takes the BSS
element data from the beacon only. This is correct for some of
the data, notably the timing and the "have_beacon", but all the
data about MBSSID and EHT really doesn't need to be taken from
there. If the EHT puncturing is misconfigured on the AP and we
haven't received a beacon yet, this causes us to connect but
immediately disconnect upon receiving the first beacon, rather
than connecting without EHT in the first place.

Change the code to take MBSSID and EHT data also from the probe
response, for a better picture of what the BSS capabilities are
and to avoid that EHT puncturing problem.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230827135854.3c7e52d49482.Iba6b672f6dc74b45bba26bc497e953e27da43ef9@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 12:32:40 +02:00
Ilan Peer
0f99f08783 wifi: mac80211: Print local link address during authentication
To ease debugging, mostly in cases where authentication fails.

Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230827135854.9c08605e2691.I0032e9d6e01325862189e4a20b02ddbe8f2f5e75@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 12:32:40 +02:00
Johannes Berg
428e8976a1 wifi: mac80211: fix # of MSDU in A-MSDU calculation
During my refactoring I wanted to get rid of the switch,
but replaced it with the wrong calculation. Fix that.
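
The commit does not show the corrected formula. Assuming the IEEE 802.11 encoding of the 2-bit "Max Number Of MSDUs In A-MSDU" field in the Extended Capabilities element (0 = no limit, 1 = 32, 2 = 16, 3 = 8), the mapping can be sketched as a shift; the function name is hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper: decode the 2-bit "Max Number Of MSDUs In A-MSDU"
 * value into a subframe limit. Returns 0 for "no limit".
 */
static unsigned int max_amsdu_subframes(uint8_t val)
{
	val &= 0x3;			/* only two bits are defined */
	if (val == 0)
		return 0;		/* no limit advertised */
	return 4u << (4 - val);		/* 1 -> 32, 2 -> 16, 3 -> 8 */
}
```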

Fixes: 175ad2ec89 ("wifi: mac80211: limit A-MSDU subframes for client too")
Reported-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230827135854.51bf1b8b0adb.Iffbd337fdad2b86ae12f5a39c69fb82b517f7486@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 12:32:39 +02:00
Johannes Berg
2a53743989 wifi: cfg80211: reg: fix various kernel-doc issues
Clean up the kernel-doc comments in reg.h.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230827135854.36d7b52da0f5.I85fbfb3095613f4a0512493cbbdda881dc31be2c@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 12:32:39 +02:00
Johannes Berg
799f53e223 wifi: mac80211: fix various kernel-doc issues
There are various kernel-doc issues here, fix them.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230827135854.7ce9761f9ebb.I0f44e76c518f72135cc855c809bfa7a5e977b894@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 12:32:39 +02:00
Johannes Berg
fe5cb719e7 wifi: mac80211: remove unnecessary struct forward declaration
This just causes kernel-doc to complain at this spot, but
isn't actually needed anyway, so remove it.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230827135854.33a5591dfdeb.If4e7e1a1cb4c04f0afd83db7401c780404dca699@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 12:32:39 +02:00
Benjamin Berg
79aa3a09a7 wifi: mac80211: add more warnings about inserting sta info
The sta info needs to be inserted before its links may be modified.
Add a few warnings to prevent accidental usage of these functions.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230827135854.eeb43b3cc9e3.I5fd8236f70e64bf6268f33c883f7a878d963b83e@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 12:32:38 +02:00
Benjamin Berg
5806ef25bc wifi: cfg80211: add ieee80211_fragment_element to public API
This function will be used by the kunit tests within cfg80211. As it
is generally useful, move it from mac80211 to cfg80211.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230827135854.5af9391659f5.Ie534ed6591ba02be8572d4d7242394f29e3af04b@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 12:32:16 +02:00
Johannes Berg
ffbd0c8c1e wifi: mac80211: add an element parsing unit test
Add a unit test for the parsing of a fragmented sta profile
sub-element inside a fragmented multi-link element.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230827135854.333bc75df13f.I0ddfeb6a88a4d89e7c7850e8ef45a4b19b5a061a@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 12:32:16 +02:00
Johannes Berg
730eeb17bb wifi: cfg80211: add first kunit tests, for element defrag
Add a couple of tests for element defragmentation, to
see that the function works correctly.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230827135854.e2a5cead1816.I09f0edc19d162b54ee330991c728c1e9aa42ebf6@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 12:32:16 +02:00
Johannes Berg
43125539fc wifi: cfg80211: fix off-by-one in element defrag
If a fragment is the last element, it's erroneously not
accepted. Fix that.
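
A simplified sketch of element defragmentation, to show where such a boundary check sits. It assumes the IEEE 802.11 rule that an element with the maximum length of 255 may be continued by Fragment elements (ID 242); it ignores the Element ID Extension byte and is not the cfg80211 helper.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EID_FRAGMENT 242	/* IEEE 802.11 Fragment element ID */

/*
 * Copy the payload of the element at ies[0], plus any Fragment elements
 * that continue it, into out. Returns the payload length, or -1 if the
 * input is truncated or out is too small.
 */
static long defrag_element(const uint8_t *ies, size_t ies_len,
			   uint8_t *out, size_t out_len)
{
	const uint8_t *pos = ies;
	const uint8_t *end = ies + ies_len;
	size_t copied = 0;
	uint8_t len;

	if (ies_len < 2 || ies_len - 2 < ies[1])
		return -1;		/* first element does not fit */
	len = ies[1];

	for (;;) {
		if (out_len - copied < len)
			return -1;	/* output buffer too small */
		memcpy(out + copied, pos + 2, len);
		copied += len;
		pos += 2 + (size_t)len;

		/* Only a maximum-length element can be continued. */
		if (len != 255 || end - pos < 2 || pos[0] != EID_FRAGMENT)
			break;

		len = pos[1];
		/*
		 * A fragment whose data ends exactly at 'end' is valid; a
		 * stricter comparison here would reject a fragment that is
		 * the last element.
		 */
		if ((size_t)(end - pos) - 2 < len)
			return -1;
	}
	return (long)copied;
}
```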

Fixes: f837a653a0 ("wifi: cfg80211: add element defragmentation helper")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230827135854.adca9fbd3317.I6b2df45eb71513f3e48efd196ae3cddec362dc1c@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 12:32:15 +02:00
Emmanuel Grumbach
a469a5938d wifi: mac80211: add support for mld in ieee80211_chswitch_done
This allows finalizing the CSA per link.
In case the switch didn't work, tear down the MLD connection.
Also pass the ieee80211_bss_conf to post_channel_switch to let the
driver know which link completed the switch.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230828130311.3d3eacc88436.Ic2d14e2285aa1646216a56806cfd4a8d0054437c@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 12:31:31 +02:00
Aditya Kumar Singh
5112fa5027 wifi: cfg80211: validate AP phy operation before starting it
Many regulatory domains can mark HE/EHT operation as not permitted. In
such cases, the AP should not be allowed to start if it is using a
channel that has the corresponding "no operation" flag set. However,
currently there is no such check in place.

Fix this issue by validating the IEs supplied when starting the AP
against the channel flags.
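
A sketch of such a check, assuming per-channel "no HE" and "no EHT" flags; the flag names and values, the function and the -EINVAL return code are hypothetical and not the cfg80211 implementation.

```c
#include <errno.h>
#include <stdbool.h>

#define TOY_CHAN_NO_HE	(1u << 0)	/* hypothetical flag values */
#define TOY_CHAN_NO_EHT	(1u << 1)

/*
 * Refuse to start the AP if its IEs advertise an operation mode
 * (HE or EHT Operation element) that the channel does not permit.
 */
static int toy_validate_ap_phy_op(unsigned int chan_flags,
				  bool has_he_oper, bool has_eht_oper)
{
	if (has_he_oper && (chan_flags & TOY_CHAN_NO_HE))
		return -EINVAL;
	if (has_eht_oper && (chan_flags & TOY_CHAN_NO_EHT))
		return -EINVAL;
	return 0;
}
```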

Signed-off-by: Aditya Kumar Singh <quic_adisi@quicinc.com>
Reviewed-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Link: https://lore.kernel.org/r/20230905064857.1503-1-quic_adisi@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 12:18:55 +02:00
Wen Gong
234249d88b wifi: cfg80211/mac80211: hold link BSSes when assoc fails for MLO connection
When connecting to an MLO AP with more than one link and the AP's assoc
response is not successful, cfg80211_unhold_bss() is not called for any
of the links' cfg80211_bss except the primary link, i.e. the link used by
the latest successful association request. Thus the hold value of the
cfg80211_bss is not reset to 0 after the association fails, and
__cfg80211_bss_expire() will then never call __cfg80211_unlink_bss() for
that cfg80211_bss.

As a result, the AP always appears to exist, even if it is shut down or
reconfigured to another type, which leads to errors when connecting to
it again.

The details are as follows.

When connecting to a multi-link AP, cfg80211_mlme_assoc() calls
cfg80211_hold_bss() for the cfg80211_bss of each link. When the assoc
response from the AP is not successful (e.g. status_code == 1), the
ieee80211_link_data of each non-primary link (sdata->link[link_id]) is
NULL, because ieee80211_assoc_success()->ieee80211_vif_update_links()
is not called for those links.

Then struct cfg80211_rx_assoc_resp resp in cfg80211_rx_assoc_resp() and
struct cfg80211_connect_resp_params cr in __cfg80211_connect_result()
only have the data of the primary link, and finally
cfg80211_connect_result_release_bsses() only calls cfg80211_unhold_bss()
for the primary link. The cfg80211_bss of the other links is then never
freed, because its hold count stays > 0.

Hence, take the bss and status values from assoc_data, since it is
valid in this case. Also take the addr value from assoc_data when the
link is NULL, because the addrs in assoc_data and in the link both
represent the local link address and have the same value for a
successful connection.
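
A toy model of the reference-count imbalance described above, assuming a simple integer hold count per link BSS. The struct and functions are hypothetical stand-ins for cfg80211_bss, cfg80211_hold_bss() and cfg80211_unhold_bss(): every link held at assoc time must be released on failure, or the entry can never expire.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_bss {
	int hold;	/* number of active holders */
};

/* An entry may only be expired once nobody holds it. */
static bool toy_bss_can_expire(const struct toy_bss *b)
{
	return b->hold == 0;
}

/* Assoc start: hold the BSS of every requested link (NULL = unused). */
static void toy_hold_links(struct toy_bss *links[], size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (links[i])
			links[i]->hold++;
}

/* Assoc failure: release every link that was held, not only the primary. */
static void toy_unhold_links(struct toy_bss *links[], size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (links[i] && links[i]->hold > 0)
			links[i]->hold--;
}
```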

Fixes: 81151ce462 ("wifi: mac80211: support MLO authentication/association with one link")
Signed-off-by: Wen Gong <quic_wgong@quicinc.com>
Link: https://lore.kernel.org/r/20230825070055.28164-1-quic_wgong@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 12:16:52 +02:00
Johannes Berg
5ea82df1f5 wifi: mac80211: fix RCU usage warning in mesh fast-xmit
In mesh_fast_tx_flush_addr() we already hold the lock, so we
don't need additional hashtable RCU protection. Use the
rhashtable_lookup_fast() variant to avoid RCU protection
warnings.

Fixes: d5edb9ae8d ("wifi: mac80211: mesh fast xmit support")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 12:04:49 +02:00
Johannes Berg
e8c1841278 wifi: cfg80211: annotate iftype_data pointer with sparse
There were a number of cases in mac80211 and iwlwifi (at
least) that used the sband->iftype_data pointer directly,
instead of using the accessors to find the right array entry
to use.

Make sparse warn when such a thing is done.

To avoid needing a lot of casts, add two helper functions/macros:

 - ieee80211_set_sband_iftype_data()
 - for_each_sband_iftype_data()

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 11:27:23 +02:00
Johannes Berg
05f136220d wifi: mac80211: fix BA session teardown race
As previously reported by Alexander (whose commit 69403bad97
("wifi: mac80211: sdata can be NULL during AMPDU start") I'm
reverting as part of this commit), there's a race between station
destruction and aggregation setup: the aggregation setup can
happen while the station is being removed and queue the work
after ieee80211_sta_tear_down_BA_sessions() has already run in
__sta_info_destroy_part1(), so the worker will run with a now
freed station. In his case, this manifested as a NULL sdata
pointer, but really there's no guarantee whatsoever.

The real issue seems to be that it's possible at all to have a
situation where this occurs - we want to stop the BA sessions
when doing _part1, but we cannot be sure, and WLAN_STA_BLOCK_BA
isn't necessarily effective since we don't know that the setup
isn't concurrently running and already got past the check.

Simply call ieee80211_sta_tear_down_BA_sessions() again in the
second part of station destruction, since at that point really
nothing else can hold a reference to the station any more.

Also revert the sdata checks since those are just misleading at
this point.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 11:27:23 +02:00
Johannes Berg
b7600aae8a wifi: mac80211: fix TXQ error path and cleanup
We currently call ieee80211_txq_teardown_flows() as part
of ieee80211_remove_interfaces(), but that's not really
right in case of HW registration failures, specifically
rate control. Call it separately to fix that issue.

Reported-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 11:27:23 +02:00
Johannes Berg
076fc8775d wifi: cfg80211: remove wdev mutex
Since we're now protecting everything with the wiphy mutex
(and were really using it for almost everything before),
there's no longer any real reason to have a separate wdev
mutex. It may feel better, but really has no value.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 11:27:23 +02:00
Johannes Berg
a7614b482d wifi: mac80211: set wiphy for virtual monitors
Drivers might plausibly want to have this, but also the
locking assertions will need it later.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 11:27:22 +02:00
Johannes Berg
be0df01dae wifi: mac80211: reduce iflist_mtx
We now hold the wiphy mutex everywhere that we use or
needed the iflist_mtx, so we don't need this mutex any
more in mac80211. However, drivers may also iterate,
and in some cases (e.g. mt76) do so from high-priority
contexts. Thus, keep the mutex around but remove its
usage in mac80211 apart from those driver-visible parts
that are still needed.

Most of this change was done automatically with spatch,
with the parts that are still needed as described above
reverted manually.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 11:27:22 +02:00
Johannes Berg
0cd8080e46 wifi: mac80211: remove local->mtx
We now hold the wiphy mutex everywhere that we use or
needed the local->mtx, so we don't need this mutex any
more. Remove it.

Most of this change was done automatically with spatch.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 11:27:22 +02:00
Johannes Berg
463559b7c3 wifi: mac80211: remove ampdu_mlme.mtx
We now hold the wiphy mutex everywhere that we use or
needed the A-MPDU locking, so we don't need this mutex
any more. Remove it.

Most of this change was done automatically with spatch.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 11:27:22 +02:00
Johannes Berg
5435af6e6a wifi: mac80211: remove chanctx_mtx
We now hold the wiphy mutex everywhere that we use or
needed the chanctx_mtx, so we don't need this mutex any
more. Remove it.

Most of this change was done automatically with spatch.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 11:27:22 +02:00
Johannes Berg
2a8b665e6b wifi: mac80211: remove key_mtx
We now hold the wiphy mutex everywhere that we use or
needed the key_mtx, so we don't need this mutex any
more. Remove it.

Most of this change was done automatically with spatch.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 11:27:22 +02:00
Johannes Berg
4d3acf4311 wifi: mac80211: remove sta_mtx
We now hold the wiphy mutex everywhere that we use or
needed the sta_mtx, so we don't need this mutex any
more. Remove it.

Most of this change was done automatically with spatch.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 11:27:22 +02:00
Johannes Berg
a26787aa13 wifi: mac80211: take wiphy lock for MAC addr change
We want to ensure everything holds the wiphy lock,
so also extend that to the MAC change callback.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 11:27:22 +02:00
Johannes Berg
332e68bc55 wifi: mac80211: extend wiphy lock in interface removal
We want to extend the wiphy locking to the interface list,
so move that into the section locked with the wiphy lock.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 11:27:22 +02:00
Johannes Berg
7a53b71d8e wifi: mac80211: hold wiphy_lock around concurrency checks
We want to replace the locking in mac80211 by just the wiphy
mutex, so hold the lock here around concurrency checks for
the future where the chanctx_mtx used inside goes away.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 11:27:21 +02:00
Johannes Berg
01ca280d32 wifi: mac80211: ethtool: hold wiphy mutex
We should hold the wiphy mutex here since we're going
to call the driver and want to remove the sta_mtx.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 11:27:21 +02:00
Johannes Berg
1474bc87fe wifi: cfg80211: check wiphy mutex is held for wdev mutex
This might seem pretty pointless rather than changing the locking
immediately, but it seems safer to run for a while with checks and
the old locking scheme, and then remove the wdev lock later.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 11:27:21 +02:00
Johannes Berg
fa8809a519 wifi: cfg80211: hold wiphy lock in cfg80211_any_wiphy_oper_chan()
We have the RTNL here for the iteration, but we need to lock
each wiphy separately as well for using its data. Hold the
wiphy lock for all of the ones in the iteration. Note that
this implies we cannot already hold the wiphy mutex for the
wiphy passed by the argument, but that's true now.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 11:27:21 +02:00
Johannes Berg
beb2df475b wifi: cfg80211: sme: hold wiphy lock for wdev iteration
Since we will want to remove the wdev lock in the future,
lock the wiphy here to iterate and for checking the status
of the connections.

Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 11:27:21 +02:00
Johannes Berg
0320d68f56 wifi: cfg80211: reg: hold wiphy mutex for wdev iteration
Since we will want to remove the wdev lock in the future,
lock the wiphy here to iterate and check the flags.

Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 11:27:21 +02:00
Johannes Berg
0e8185ce1d wifi: mac80211: check wiphy mutex in ops
Check that we hold the wiphy mutex in the ops when
calling the driver, since we're now on our way to
always hold it, and simplify the locking.

Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 11:27:21 +02:00
Johannes Berg
b38579aeb5 wifi: mac80211: move color change finalize to wiphy work
Again, this should be per link and will then have cancellation
issues, so move it to a wiphy work.

Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 11:27:21 +02:00
Johannes Berg
d7074be64a wifi: mac80211: move CSA finalize to wiphy work
This work should be made per link as well, and then
will have cancellation issues. Moving it to a wiphy
work already fixes those beforehand.

Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 11:27:21 +02:00
Johannes Berg
a6add8bee6 wifi: mac80211: move filter reconfig to wiphy work
This again is intended for future cleanups that are
possible when mac80211 and drivers can assume the
wiphy is locked.

Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 11:27:20 +02:00
Johannes Berg
aca40a5fa6 wifi: mac80211: move tspec work to wiphy work
One more work that will now execute with the wiphy
locked, for future cleanups.

Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 11:27:20 +02:00
Johannes Berg
e3208fb739 wifi: mac80211: move key tailroom work to wiphy work
This way we hold the wiphy mutex there, as a step towards
removing some of the additional locks we have.

Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 11:27:20 +02:00
Johannes Berg
777b26002b wifi: mac80211: move TDLS work to wiphy work
Again, to have the wiphy locked for it.

Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 11:27:20 +02:00
Johannes Berg
eadfb54756 wifi: mac80211: move sched-scan stop work to wiphy work
This also has the wiphy locked here then. We need to use
the _locked version of cfg80211_sched_scan_stopped() now,
which also fixes an old deadlock there.

Fixes: a05829a722 ("cfg80211: avoid holding the RTNL when calling the driver")
Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 11:27:20 +02:00
Johannes Berg
9fa659f9f4 wifi: mac80211: move dynamic PS to wiphy work
Along with everything else, move the dynamic PS work
to be a wiphy work, to simplify locking later.

Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 11:27:20 +02:00
Johannes Berg
7206a94871 wifi: mac80211: move link activation work to wiphy work
We want to have the wiphy locked for these as well,
so move it to be a wiphy work.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 11:27:20 +02:00
Johannes Berg
97c19e42b2 wifi: mac80211: move offchannel works to wiphy work
Make the offchannel works wiphy works to have the
wiphy locked for executing them.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 11:27:20 +02:00
Johannes Berg
730538edc8 wifi: mac80211: lock wiphy in IP address notifier
Lock the wiphy in the IP address notifier as another
place that should have it locked before calling into
the driver. This needs a bit of attention since the
notifier can be called while the wiphy is already
locked, when we remove an interface. Handle this by
not running the notifier in this case, and instead
calling out to the driver directly.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 11:27:20 +02:00
Johannes Berg
ac2f7d6f27 wifi: mac80211: move monitor work to wiphy work
Again this serves to simplify the locking in mac80211
in the future, since this is a relatively complex work.

Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 11:27:19 +02:00
Johannes Berg
201712512c wifi: mac80211: move scan work to wiphy work
Move the scan work to wiphy work, which also simplifies
the way we handle the work vs. the scan configuration.

Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 11:27:19 +02:00
Johannes Berg
228e4f931b wifi: mac80211: move radar detect work to wiphy work
Move the radar detect work to wiphy work in order
to lock the wiphy for it without doing it manually.

Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 11:27:19 +02:00
Johannes Berg
766d2601a6 wifi: mac80211: move DFS CAC work to wiphy work
Move the DFS CAC work over to hold the wiphy lock
there without worry about work cancellation.

Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 11:27:19 +02:00
Johannes Berg
5549b0885d wifi: mac80211: add more ops assertions
Add more might_sleep() checks and check sdata-in-driver
for one additional place.

Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 11:27:19 +02:00
Johannes Berg
1b67211895 wifi: mac80211: convert A-MPDU work to wiphy work
Convert the A-MPDU work to wiphy work so it holds the
wiphy mutex and we can later guarantee that to drivers.
It might seem that we could run these concurrently for
different stations, but they're all on the ordered
mac80211 workqueue, so this shouldn't matter for that.

Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 11:27:19 +02:00
Johannes Berg
b920590f9a wifi: mac80211: flush wiphy work where appropriate
Before converting more works to wiphy work, add
flushing in mac80211 where we also flush the
mac80211 workqueue. Not needed in suspend since
cfg80211 will have taken care of it.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 11:27:19 +02:00
Johannes Berg
56cfb8ce1f wifi: cfg80211: add flush functions for wiphy work
There may sometimes be reasons to actually run the work
if it's pending, so add flush functions for both regular
and delayed wiphy work that will do this.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 11:27:19 +02:00
Johannes Berg
7483a2147a wifi: cfg80211: check RTNL when iterating devices
Add a new "for_each_rdev()" macro and check that we
hold the RTNL when calling it.
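
A user-space analogue of an iteration macro that checks a lock before walking a list. The names, the boolean standing in for lockdep state and the list layout are all hypothetical; the actual for_each_rdev() may be built differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_rdev {
	int idx;
	struct toy_rdev *next;
};

static bool toy_rtnl_held;		/* stand-in for "RTNL is held" */
static struct toy_rdev *toy_rdev_list;	/* global device list */

/* Check the lock once, then iterate; the assert fires if unlocked. */
#define toy_for_each_rdev(r) \
	for (assert(toy_rtnl_held), (r) = toy_rdev_list; (r); (r) = (r)->next)

static int toy_count_rdevs(void)
{
	struct toy_rdev *r;
	int n = 0;

	toy_for_each_rdev(r)
		n++;
	return n;
}
```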

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 11:27:19 +02:00
Johannes Berg
e911a8192e wifi: mac80211: lock wiphy for aggregation debugfs
Changing the aggregation status may call into the driver,
so lock the wiphy for this.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 11:27:18 +02:00
Johannes Berg
0ab6cba069 wifi: mac80211: hold wiphy lock in netdev/link debugfs
It's no longer really needed to ensure that the debugfs
file isn't going away, debugfs handles that. So there's
no point in holding dev_base_lock or RTNL here, but we
should instead hold the wiphy lock since drivers will
be allowed to depend on that. Do that, which requires
splitting the sdata and link macros a bit.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 11:27:18 +02:00
Johannes Berg
629ebb8532 wifi: mac80211: debugfs: lock wiphy instead of RTNL
Since we no longer really use the RTNL, there's no point
in locking it here. Most drivers don't really need to
have any locks here anyway, and the rest are probably
completely broken, but it's a debugfs-only callback so
it really doesn't matter much.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 11:27:18 +02:00
Johannes Berg
e3640a82e5 wifi: mac80211: fix SMPS status handling
The current SMPS status handling isn't per link, so we only
ever change the deflink. This is obviously wrong: the deflink
isn't even used for multi-link connections, and the request API
actually includes the link ID.

Use the new status_data changes to move the handling to the
right link. This also saves parsing the frame again on the
status report; instead, we can now just check whether it was
an SMPS frame.

Of course, move the worker to be a wiphy work so that we're
able to cancel it safely for the link.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 11:27:18 +02:00
Ilan Peer
e665ab9c5a wifi: mac80211: Fix SMPS handling in the context of MLO
When the connection is an MLO connection, an SMPS request should be
sent on a specific link, as SMPS is BSS specific, and the DA and BSSID
used for the action frame transmission should be the AP MLD address, as
the underlying driver is expected to perform the address translation
(based on the link ID).

Fix the SMPS request handling to use the AP MLD address and provide the
link ID for the request processing during Tx.

Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 11:27:18 +02:00
Johannes Berg
f498f6ab7a wifi: mac80211: rework ack_frame_id handling a bit
Take one more free bit to indicate it's IDR vs. internal
usage, to be able to carve out some bits here for other
internal usage, other than IDR handling with a full ACK
SKB, that is.
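
One possible shape of such a scheme, sketched under a hypothetical 16-bit layout (bit 15 as the internal flag, the low 15 bits as the value); the real field width and bit positions in mac80211 are not stated here.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_ACK_INTERNAL	0x8000u	/* hypothetical "internal usage" bit */
#define TOY_ACK_VALUE_MASK	0x7fffu	/* remaining bits carry the value */

/* Encode an IDR id: the internal bit stays clear. */
static uint16_t toy_ack_from_idr(uint16_t id)
{
	return (uint16_t)(id & TOY_ACK_VALUE_MASK);
}

/* Encode an internal payload: the internal bit is set. */
static uint16_t toy_ack_internal(uint16_t payload)
{
	return (uint16_t)(TOY_ACK_INTERNAL | (payload & TOY_ACK_VALUE_MASK));
}

static bool toy_ack_is_internal(uint16_t v)
{
	return v & TOY_ACK_INTERNAL;
}

static uint16_t toy_ack_value(uint16_t v)
{
	return (uint16_t)(v & TOY_ACK_VALUE_MASK);
}
```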

Reviewed-by: Benjamin Berg <benjamin.berg@intel.com>
Reviewed-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 11:27:18 +02:00
Johannes Berg
0499bead73 wifi: mac80211: tx: clarify conditions in if statement
This really just reformats the statement, but makes it
more readable.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-11 11:27:18 +02:00
Pablo Neira Ayuso
23a3bfd4ba netfilter: nf_tables: disallow element removal on anonymous sets
Anonymous sets need to be populated once at creation and are then
bound to a rule since 938154b93b ("netfilter: nf_tables: reject unbound
anonymous set before commit phase"); otherwise the transaction reports
EINVAL.

Userspace does not need to delete elements of anonymous sets that are
not yet bound, reject this with EOPNOTSUPP.

From the flush command path, skip anonymous sets; they are expected to
be bound already. Otherwise, EINVAL is hit at the end of this transaction
for unbound sets.
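
A minimal sketch of the two checks described above; the struct and functions are hypothetical stand-ins, not the nf_tables code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_set {
	bool anonymous;		/* created implicitly for a single rule */
	int nelems;
};

/* Explicit element deletion: refuse it for anonymous sets. */
static int toy_delsetelem(struct toy_set *set)
{
	if (set->anonymous)
		return -EOPNOTSUPP;
	if (set->nelems > 0)
		set->nelems--;
	return 0;
}

/* Flush path: skip anonymous sets, they are expected to be bound already. */
static int toy_flush_sets(struct toy_set *sets, size_t n)
{
	int flushed = 0;

	for (size_t i = 0; i < n; i++) {
		if (sets[i].anonymous)
			continue;
		sets[i].nelems = 0;
		flushed++;
	}
	return flushed;
}
```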

Fixes: 96518518cc ("netfilter: add nftables")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-09-11 11:27:13 +02:00
Shigeru Yoshida
c821a88bd7 kcm: Fix memory leak in error path of kcm_sendmsg()
syzbot reported a memory leak like below:

BUG: memory leak
unreferenced object 0xffff88810b088c00 (size 240):
  comm "syz-executor186", pid 5012, jiffies 4294943306 (age 13.680s)
  hex dump (first 32 bytes):
    00 89 08 0b 81 88 ff ff 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff83e5d5ff>] __alloc_skb+0x1ef/0x230 net/core/skbuff.c:634
    [<ffffffff84606e59>] alloc_skb include/linux/skbuff.h:1289 [inline]
    [<ffffffff84606e59>] kcm_sendmsg+0x269/0x1050 net/kcm/kcmsock.c:815
    [<ffffffff83e479c6>] sock_sendmsg_nosec net/socket.c:725 [inline]
    [<ffffffff83e479c6>] sock_sendmsg+0x56/0xb0 net/socket.c:748
    [<ffffffff83e47f55>] ____sys_sendmsg+0x365/0x470 net/socket.c:2494
    [<ffffffff83e4c389>] ___sys_sendmsg+0xc9/0x130 net/socket.c:2548
    [<ffffffff83e4c536>] __sys_sendmsg+0xa6/0x120 net/socket.c:2577
    [<ffffffff84ad7bb8>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
    [<ffffffff84ad7bb8>] do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
    [<ffffffff84c0008b>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

In kcm_sendmsg(), kcm_tx_msg(head)->last_skb is used as a cursor to append
newly allocated skbs to 'head'. If some bytes are copied, an error occurs,
and the code jumps to the out_error label, 'last_skb' is left unmodified.
A later kcm_sendmsg() will then use an obsolete 'last_skb' reference,
corrupting the 'head' frag_list and causing the leak.

This patch fixes this issue by properly updating the last allocated skb in
'last_skb'.
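
A toy model of the cursor handling, using a plain singly linked list in place of the skb frag_list; the types and functions are hypothetical. Updating the cursor as soon as each node is linked means an early error return leaves it pointing at the real tail, so the next call appends instead of overwriting part of the list.

```c
#include <stdlib.h>

struct toy_node {
	struct toy_node *next;
	int val;
};

struct toy_msg {
	struct toy_node *head;	/* stand-in for head's frag_list */
	struct toy_node *last;	/* stand-in for kcm_tx_msg(head)->last_skb */
};

/*
 * Append 'count' nodes, simulating an error before node 'fail_at'
 * (pass a negative value for no error). The cursor is updated on every
 * append, so it is valid on both the success and the error path.
 */
static int toy_append(struct toy_msg *m, int count, int fail_at)
{
	for (int i = 0; i < count; i++) {
		struct toy_node *n;

		if (i == fail_at)
			return -1;	/* error path: m->last is still the tail */
		n = calloc(1, sizeof(*n));
		if (!n)
			return -1;
		n->val = i;
		if (m->last)
			m->last->next = n;
		else
			m->head = n;
		m->last = n;
	}
	return 0;
}

static int toy_length(const struct toy_msg *m)
{
	int len = 0;

	for (const struct toy_node *n = m->head; n; n = n->next)
		len++;
	return len;
}

static void toy_free(struct toy_msg *m)
{
	struct toy_node *n = m->head;

	while (n) {
		struct toy_node *next = n->next;

		free(n);
		n = next;
	}
	m->head = m->last = NULL;
}
```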

Fixes: ab7ac4eb98 ("kcm: Kernel Connection Multiplexor module")
Reported-and-tested-by: syzbot+6f98de741f7dbbfc4ccb@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=6f98de741f7dbbfc4ccb
Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-11 10:03:08 +01:00
Ziyang Xuan
484b4833c6 hsr: Fix uninit-value access in fill_frame_info()
Syzbot reports the following uninit-value access problem.

=====================================================
BUG: KMSAN: uninit-value in fill_frame_info net/hsr/hsr_forward.c:601 [inline]
BUG: KMSAN: uninit-value in hsr_forward_skb+0x9bd/0x30f0 net/hsr/hsr_forward.c:616
 fill_frame_info net/hsr/hsr_forward.c:601 [inline]
 hsr_forward_skb+0x9bd/0x30f0 net/hsr/hsr_forward.c:616
 hsr_dev_xmit+0x192/0x330 net/hsr/hsr_device.c:223
 __netdev_start_xmit include/linux/netdevice.h:4889 [inline]
 netdev_start_xmit include/linux/netdevice.h:4903 [inline]
 xmit_one net/core/dev.c:3544 [inline]
 dev_hard_start_xmit+0x247/0xa10 net/core/dev.c:3560
 __dev_queue_xmit+0x34d0/0x52a0 net/core/dev.c:4340
 dev_queue_xmit include/linux/netdevice.h:3082 [inline]
 packet_xmit+0x9c/0x6b0 net/packet/af_packet.c:276
 packet_snd net/packet/af_packet.c:3087 [inline]
 packet_sendmsg+0x8b1d/0x9f30 net/packet/af_packet.c:3119
 sock_sendmsg_nosec net/socket.c:730 [inline]
 sock_sendmsg net/socket.c:753 [inline]
 __sys_sendto+0x781/0xa30 net/socket.c:2176
 __do_sys_sendto net/socket.c:2188 [inline]
 __se_sys_sendto net/socket.c:2184 [inline]
 __ia32_sys_sendto+0x11f/0x1c0 net/socket.c:2184
 do_syscall_32_irqs_on arch/x86/entry/common.c:112 [inline]
 __do_fast_syscall_32+0xa2/0x100 arch/x86/entry/common.c:178
 do_fast_syscall_32+0x37/0x80 arch/x86/entry/common.c:203
 do_SYSENTER_32+0x1f/0x30 arch/x86/entry/common.c:246
 entry_SYSENTER_compat_after_hwframe+0x70/0x82

Uninit was created at:
 slab_post_alloc_hook+0x12f/0xb70 mm/slab.h:767
 slab_alloc_node mm/slub.c:3478 [inline]
 kmem_cache_alloc_node+0x577/0xa80 mm/slub.c:3523
 kmalloc_reserve+0x148/0x470 net/core/skbuff.c:559
 __alloc_skb+0x318/0x740 net/core/skbuff.c:644
 alloc_skb include/linux/skbuff.h:1286 [inline]
 alloc_skb_with_frags+0xc8/0xbd0 net/core/skbuff.c:6299
 sock_alloc_send_pskb+0xa80/0xbf0 net/core/sock.c:2794
 packet_alloc_skb net/packet/af_packet.c:2936 [inline]
 packet_snd net/packet/af_packet.c:3030 [inline]
 packet_sendmsg+0x70e8/0x9f30 net/packet/af_packet.c:3119
 sock_sendmsg_nosec net/socket.c:730 [inline]
 sock_sendmsg net/socket.c:753 [inline]
 __sys_sendto+0x781/0xa30 net/socket.c:2176
 __do_sys_sendto net/socket.c:2188 [inline]
 __se_sys_sendto net/socket.c:2184 [inline]
 __ia32_sys_sendto+0x11f/0x1c0 net/socket.c:2184
 do_syscall_32_irqs_on arch/x86/entry/common.c:112 [inline]
 __do_fast_syscall_32+0xa2/0x100 arch/x86/entry/common.c:178
 do_fast_syscall_32+0x37/0x80 arch/x86/entry/common.c:203
 do_SYSENTER_32+0x1f/0x30 arch/x86/entry/common.c:246
 entry_SYSENTER_compat_after_hwframe+0x70/0x82

This happens because VLAN is not yet supported in the hsr driver. Fix it
by returning an error from fill_frame_info() when the protocol is ETH_P_8021Q.
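A minimal sketch of the "reject VLAN frames up front" idea, assuming a heavily reduced, hypothetical frame-info struct. The error code (-EINVAL) is an assumption for this sketch. The EtherType is kept in host byte order here; since the on-wire field is big-endian, kernel code typically compares it against htons(ETH_P_8021Q) instead.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* 802.1Q TPID; same value as ETH_P_8021Q in <linux/if_ether.h>. */
#define VLAN_TPID_SKETCH 0x8100

/* Hypothetical, heavily reduced frame-info record. */
struct frame_info_sketch {
	uint16_t proto;
};

/*
 * Reject VLAN-tagged frames before touching a VLAN header the driver
 * does not parse. 'proto' is in host byte order in this sketch.
 */
static int fill_frame_info_sketch(struct frame_info_sketch *fi, uint16_t proto)
{
	if (proto == VLAN_TPID_SKETCH)
		return -EINVAL;	/* assumed error code */
	fi->proto = proto;
	return 0;
}
```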

Fixes: 451d8123f8 ("net: prp: add packet handling support")
Reported-by: syzbot+bf7e6250c7ce248f3ec9@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=bf7e6250c7ce248f3ec9
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-11 08:28:36 +01:00
Guangguan Wang
f5146e3ef0 net/smc: use smc_lgr_list.lock to protect smc_lgr_list.list iterate in smcr_port_add
While smcr_port_add() is running, link groups may be added to or deleted
from smc_lgr_list.list at the same time, which may result in a kernel crash.
So, use smc_lgr_list.lock to protect the smc_lgr_list.list iteration in
smcr_port_add().

The crash call trace is shown below:
BUG: kernel NULL pointer dereference, address: 0000000000000000
PGD 0 P4D 0
Oops: 0000 [#1] SMP NOPTI
CPU: 0 PID: 559726 Comm: kworker/0:92 Kdump: loaded Tainted: G
Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 449e491 04/01/2014
Workqueue: events smc_ib_port_event_work [smc]
RIP: 0010:smcr_port_add+0xa6/0xf0 [smc]
RSP: 0000:ffffa5a2c8f67de0 EFLAGS: 00010297
RAX: 0000000000000001 RBX: ffff9935e0650000 RCX: 0000000000000000
RDX: 0000000000000010 RSI: ffff9935e0654290 RDI: ffff9935c8560000
RBP: 0000000000000000 R08: 0000000000000000 R09: ffff9934c0401918
R10: 0000000000000000 R11: ffffffffb4a5c278 R12: ffff99364029aae4
R13: ffff99364029aa00 R14: 00000000ffffffed R15: ffff99364029ab08
FS:  0000000000000000(0000) GS:ffff994380600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000f06a10003 CR4: 0000000002770ef0
PKRU: 55555554
Call Trace:
 smc_ib_port_event_work+0x18f/0x380 [smc]
 process_one_work+0x19b/0x340
 worker_thread+0x30/0x370
 ? process_one_work+0x340/0x340
 kthread+0x114/0x130
 ? __kthread_cancel_work+0x50/0x50
 ret_from_fork+0x1f/0x30
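The locking rule applied by this fix (writers and the iterating reader take the same list lock) can be sketched in userspace. The types and helpers below are hypothetical, and a pthread mutex stands in for the kernel's spinlock; the point is only that the walk holds the lock for its whole duration, so a concurrent add or delete cannot change the list under the walker.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical reduced link group and the list that tracks them. */
struct lgr {
	int id;
	struct lgr *next;
};

struct lgr_list {
	pthread_mutex_t lock;	/* stands in for smc_lgr_list.lock */
	struct lgr *head;
};

static void lgr_list_add(struct lgr_list *l, struct lgr *g)
{
	pthread_mutex_lock(&l->lock);
	g->next = l->head;
	l->head = g;
	pthread_mutex_unlock(&l->lock);
}

static void lgr_list_del(struct lgr_list *l, struct lgr *g)
{
	pthread_mutex_lock(&l->lock);
	for (struct lgr **pp = &l->head; *pp; pp = &(*pp)->next) {
		if (*pp == g) {
			*pp = g->next;	/* unlink */
			break;
		}
	}
	pthread_mutex_unlock(&l->lock);
}

/*
 * Port-add style walk: hold the same lock for the whole iteration so a
 * concurrent add or delete cannot modify the list while it is walked.
 */
static int lgr_list_count(struct lgr_list *l)
{
	int n = 0;

	pthread_mutex_lock(&l->lock);
	for (struct lgr *g = l->head; g; g = g->next)
		n++;
	pthread_mutex_unlock(&l->lock);
	return n;
}
```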

Fixes: 1f90a05d9f ("net/smc: add smcr_port_add() and smcr_link_up() processing")
Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-10 19:31:42 +01:00
Guangguan Wang
6912e72483 net/smc: bugfix for smcr v2 server connect success statistic
In the macro SMC_STAT_SERV_SUCC_INC, smcd_version is used to determine
whether to increase the v1 statistic or the v2 statistic. That is correct
for SMCD, but for SMCR, smcr_version should be used.
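The selection logic can be sketched as a function instead of a macro. The struct, field and constant names below are hypothetical (the version values are modeled as v1/v2 bit flags); the point is that the version field consulted must belong to the transport actually in use. With the original behavior, an SMC-R v2 connection whose smcd_version only had the v1 bit would have been counted as v1.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical version bits, modeled on SMC's v1/v2 flags. */
#define SMC_V1_SKETCH 0x1
#define SMC_V2_SKETCH 0x2

struct serv_stats_sketch {
	unsigned long v1_succ;
	unsigned long v2_succ;
};

struct init_info_sketch {
	bool is_smcd;			/* true: SMC-D, false: SMC-R */
	unsigned int smcd_version;
	unsigned int smcr_version;
};

/* Choose the version field that belongs to the transport in use. */
static void serv_succ_inc(struct serv_stats_sketch *s,
			  const struct init_info_sketch *ini)
{
	unsigned int version = ini->is_smcd ? ini->smcd_version
					    : ini->smcr_version;

	if (version & SMC_V2_SKETCH)
		s->v2_succ++;
	else
		s->v1_succ++;
}
```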

Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-10 19:31:42 +01:00
Liu Jian
ac28b1ec61 net: ipv4: fix one memleak in __inet_del_ifa()
I got the warning below while running a fuzzing test:
unregister_netdevice: waiting for bond0 to become free. Usage count = 2

It can be reproduced via:

ip link add bond0 type bond
sysctl -w net.ipv4.conf.bond0.promote_secondaries=1
ip addr add 4.117.174.103/0 scope 0x40 dev bond0
ip addr add 192.168.100.111/255.255.255.254 scope 0 dev bond0
ip addr add 0.0.0.4/0 scope 0x40 secondary dev bond0
ip addr del 4.117.174.103/0 scope 0x40 dev bond0
ip link delete bond0 type bond

In this reproduction test case, an incorrect 'last_prim' is found in
__inet_del_ifa(); as a result, the secondary address (0.0.0.4/0 scope 0x40)
is lost. The memory of the secondary address is leaked, and so are the
references to in_device and net_device.

Fix this problem by looking for 'last_prim' starting at the location of
the deleted IP and inserting the promoted IP at the location of 'last_prim'.

Fixes: 0ff60a4567 ("[IPV4]: Fix secondary IP addresses after promotion")
Signed-off-by: Liu Jian <liujian56@huawei.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-08 08:02:17 +01:00
Linus Torvalds
73be7fb14e Including fixes from netfilter and bpf.
Current release - regressions:
 
  - eth: stmmac: fix failure to probe without MAC interface specified
 
 Current release - new code bugs:
 
  - docs: netlink: fix missing classic_netlink doc reference
 
 Previous releases - regressions:
 
  - deal with integer overflows in kmalloc_reserve()
 
  - use sk_forward_alloc_get() in sk_get_meminfo()
 
  - bpf_sk_storage: fix the missing uncharge in sk_omem_alloc
 
  - fib: avoid warn splat in flow dissector after packet mangling
 
  - skb_segment: call zero copy functions before using skbuff frags
 
  - eth: sfc: check for zero length in EF10 RX prefix
 
 Previous releases - always broken:
 
  - af_unix: fix msg_controllen test in scm_pidfd_recv() for
    MSG_CMSG_COMPAT
 
  - xsk: fix xsk_build_skb() dereferencing possible ERR_PTR()
 
  - netfilter:
    - nft_exthdr: fix non-linear header modification
    - xt_u32, xt_sctp: validate user space input
    - nftables: exthdr: fix 4-byte stack OOB write
    - nfnetlink_osf: avoid OOB read
    - one more fix for the garbage collection work from last release
 
  - igmp: limit igmpv3_newpack() packet size to IP_MAX_MTU
 
  - bpf, sockmap: fix preempt_rt splat when using raw_spin_lock_t
 
  - handshake: fix null-deref in handshake_nl_done_doit()
 
  - ip: ignore dst hint for multipath routes to ensure packets
    are hashed across the nexthops
 
  - phy: micrel:
    - correct bit assignments for cable test errata
    - disable EEE according to the KSZ9477 errata
 
 Misc:
 
  - docs/bpf: document compile-once-run-everywhere (CO-RE) relocations
 
  - Revert "net: macsec: preserve ingress frame ordering", it appears
    to have been developed against an older kernel, problem doesn't
    exist upstream
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmT6R6wACgkQMUZtbf5S
 IrsmTg//TgmRjxSZ0lrPQtJwZR/eN3ZR2oQG3rwnssCx+YgHEGGxQsfT4KHEMacR
 ZgGDZVTpthUJkkACBPi8ZMoy++RdjEmlCcanfeDkGHoYGtiX1lhkofhLMn1KUHbI
 rIbP9EdNKxQT0SsBlw/U28pD5jKyqOgL23QobEwmcjLTdMpamb+qIsD6/xNv9tEj
 Tu4BdCIkhjxnBD622hsE3pFTG7oSn2WM6rf5NT1E43mJ3W8RrMcydSB27J7Oryo9
 l3nYMAhz0vQINS2WQ9eCT1/7GI6gg1nDtxFtrnV7ASvxayRBPIUr4kg1vT+Tixsz
 CZMnwVamEBIYl9agmj7vSji7d5nOUgXPhtWhwWUM2tRoGdeGw3vSi1pgDvRiUCHE
 PJ4UHv7goa2AgnOlOQCFtRybAu+9nmSGm7V+GkeGLnH7xbFsEa5smQ/+FSPJs8Dn
 Yf4q5QAhdN8tdnofRlrN/nCssoDF3cfmBsTJ7wo5h71gW+BWhsP58eDCJlXd/r8k
 +Qnvoe2kw27ktFR1tjsUDZ0AcSmeVARNwmXCOBYZsG4tEek8pLyj008mDvJvdfyn
 PGPn7Eo5DyaERlHVmPuebHXSyniDEPe2GLTmlHcGiRpGspoUHbB+HRiDAuRLMB9g
 pkL8RHpNfppnuUXeUoNy3rgEkYwlpTjZX0QHC6N8NQ76ccB6CNM=
 =YpmE
 -----END PGP SIGNATURE-----

Merge tag 'net-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking updates from Jakub Kicinski:
 "Including fixes from netfilter and bpf.

  Current release - regressions:

   - eth: stmmac: fix failure to probe without MAC interface specified

  Current release - new code bugs:

   - docs: netlink: fix missing classic_netlink doc reference

  Previous releases - regressions:

   - deal with integer overflows in kmalloc_reserve()

   - use sk_forward_alloc_get() in sk_get_meminfo()

   - bpf_sk_storage: fix the missing uncharge in sk_omem_alloc

   - fib: avoid warn splat in flow dissector after packet mangling

   - skb_segment: call zero copy functions before using skbuff frags

   - eth: sfc: check for zero length in EF10 RX prefix

  Previous releases - always broken:

   - af_unix: fix msg_controllen test in scm_pidfd_recv() for
     MSG_CMSG_COMPAT

   - xsk: fix xsk_build_skb() dereferencing possible ERR_PTR()

   - netfilter:
      - nft_exthdr: fix non-linear header modification
      - xt_u32, xt_sctp: validate user space input
      - nftables: exthdr: fix 4-byte stack OOB write
      - nfnetlink_osf: avoid OOB read
      - one more fix for the garbage collection work from last release

   - igmp: limit igmpv3_newpack() packet size to IP_MAX_MTU

   - bpf, sockmap: fix preempt_rt splat when using raw_spin_lock_t

   - handshake: fix null-deref in handshake_nl_done_doit()

   - ip: ignore dst hint for multipath routes to ensure packets are
     hashed across the nexthops

   - phy: micrel:
      - correct bit assignments for cable test errata
      - disable EEE according to the KSZ9477 errata

  Misc:

   - docs/bpf: document compile-once-run-everywhere (CO-RE) relocations

   - Revert "net: macsec: preserve ingress frame ordering", it appears
     to have been developed against an older kernel, problem doesn't
     exist upstream"

* tag 'net-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (95 commits)
  net: enetc: distinguish error from valid pointers in enetc_fixup_clear_rss_rfs()
  Revert "net: team: do not use dynamic lockdep key"
  net: hns3: remove GSO partial feature bit
  net: hns3: fix the port information display when sfp is absent
  net: hns3: fix invalid mutex between tc qdisc and dcb ets command issue
  net: hns3: fix debugfs concurrency issue between kfree buffer and read
  net: hns3: fix byte order conversion issue in hclge_dbg_fd_tcam_read()
  net: hns3: Support query tx timeout threshold by debugfs
  net: hns3: fix tx timeout issue
  net: phy: Provide Module 4 KSZ9477 errata (DS80000754C)
  netfilter: nf_tables: Unbreak audit log reset
  netfilter: ipset: add the missing IP_SET_HASH_WITH_NET0 macro for ip_set_hash_netportnet.c
  netfilter: nft_set_rbtree: skip sync GC for new elements in this transaction
  netfilter: nf_tables: uapi: Describe NFTA_RULE_CHAIN_ID
  netfilter: nfnetlink_osf: avoid OOB read
  netfilter: nftables: exthdr: fix 4-byte stack OOB write
  selftests/bpf: Check bpf_sk_storage has uncharged sk_omem_alloc
  bpf: bpf_sk_storage: Fix the missing uncharge in sk_omem_alloc
  bpf: bpf_sk_storage: Fix invalid wait context lockdep report
  s390/bpf: Pass through tail call counter in trampolines
  ...
2023-09-07 18:33:07 -07:00
Pablo Neira Ayuso
b079155faa netfilter: nft_set_hash: try later when GC hits EAGAIN on iteration
Skip the GC run if the iterator rewinds to the beginning with EAGAIN;
otherwise, GC might collect the same element more than once.
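One way to model this behavior is sketched below. It is a hypothetical userspace model, not the nft_set_hash code: 'events' is what a table walk yields, where a value >= 0 is an expired element id and -EAGAIN means the walk rewound to the beginning (a hash table walker can do this, e.g. after a resize). On -EAGAIN the pass drops what it gathered and reports it, so a later pass starts fresh instead of collecting elements twice.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical model of one GC pass over a walk that may rewind.
 * Collected element ids go to 'out'; '*nout' receives their count.
 */
static int gc_pass(const int *events, int n, int *out, int *nout)
{
	*nout = 0;
	for (int i = 0; i < n; i++) {
		if (events[i] == -EAGAIN) {
			*nout = 0;		/* discard this pass */
			return -EAGAIN;		/* caller retries later */
		}
		out[(*nout)++] = events[i];
	}
	return 0;
}
```

Without the early return, the rewinding walk in the test would collect elements 1 and 2 twice.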

Fixes: f6c383b8c3 ("netfilter: nf_tables: adapt set backend to use GC transaction API")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-09-08 03:26:58 +02:00
Pablo Neira Ayuso
6d365eabce netfilter: nft_set_pipapo: stop GC iteration if GC transaction allocation fails
nft_trans_gc_queue_sync() enqueues the GC transaction and it allocates a
new one. If this allocation fails, then stop this GC sync run and retry
later.
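The control flow described here (hand off a full batch, allocate a new one, and stop the run if that allocation fails) can be sketched as follows. Everything below is hypothetical: `gc_batch`, `gc_queue_sync()` and the `allocs_left` test hook only mimic the shape of the real API, and freeing a batch stands in for enqueueing it.

```c
#include <assert.h>
#include <stdlib.h>

#define GC_BATCH_SKETCH 4	/* hypothetical batch capacity */

struct gc_batch {
	int n;
	int elems[GC_BATCH_SKETCH];
};

/* Test hook: number of allocations allowed to succeed (-1 = unlimited). */
static int allocs_left = -1;

static struct gc_batch *gc_batch_alloc(void)
{
	if (allocs_left == 0)
		return NULL;
	if (allocs_left > 0)
		allocs_left--;
	return calloc(1, sizeof(struct gc_batch));
}

/*
 * If 'gc' is full, hand it off and return a freshly allocated batch.
 * Returns NULL if that allocation fails; the full batch is still handed off.
 */
static struct gc_batch *gc_queue_sync(struct gc_batch *gc, int *queued)
{
	if (gc->n < GC_BATCH_SKETCH)
		return gc;
	*queued += gc->n;
	free(gc);		/* stands in for enqueueing the batch */
	return gc_batch_alloc();
}

/* Collect 'nexpired' elements; stop the run if a new batch can't be allocated. */
static int gc_run(int nexpired, int *queued)
{
	struct gc_batch *gc = gc_batch_alloc();

	if (!gc)
		return -1;
	for (int i = 0; i < nexpired; i++) {
		gc = gc_queue_sync(gc, queued);
		if (!gc)
			return -1;	/* stop now, retry in a later run */
		gc->elems[gc->n++] = i;
	}
	*queued += gc->n;	/* hand off the final partial batch */
	free(gc);
	return 0;
}
```

The NULL check after gc_queue_sync() is the essential part: continuing the loop would dereference a NULL batch.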

Fixes: 5f68718b34 ("netfilter: nf_tables: GC transaction API to avoid race with control plane")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-09-08 03:26:58 +02:00
Pablo Neira Ayuso
4a9e12ea7e netfilter: nft_set_pipapo: call nft_trans_gc_queue_sync() in catchall GC
pipapo needs to enqueue GC transactions for catchall elements through
nft_trans_gc_queue_sync(). Add nft_trans_gc_catchall_sync() and
nft_trans_gc_catchall_async() to handle GC transaction queueing
accordingly.

Fixes: 5f68718b34 ("netfilter: nf_tables: GC transaction API to avoid race with control plane")
Fixes: f6c383b8c3 ("netfilter: nf_tables: adapt set backend to use GC transaction API")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-09-08 03:26:57 +02:00
Pablo Neira Ayuso
96b33300fb netfilter: nft_set_rbtree: use read spinlock to avoid datapath contention
rbtree GC does not modify the data structure; instead, it collects expired
elements and enqueues a GC transaction. Use a read spinlock instead to
avoid datapath contention while the GC worker is running.

Fixes: f6c383b8c3 ("netfilter: nf_tables: adapt set backend to use GC transaction API")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-09-08 03:26:57 +02:00
Pablo Neira Ayuso
f15f29fd47 netfilter: nf_tables: disallow rule removal from chain binding
Chain binding only requires the rule addition/insertion command within
the same transaction. Removal of rules from chain bindings within the
same transaction makes no sense; userspace does not utilize this
feature. Replace the nft_chain_is_bound() check with nft_chain_binding()
in rule deletion commands. The replace command implies a rule deletion,
so reject this command too.

The rule flush command can also safely rely on this nft_chain_binding()
check because unbound chains are not allowed since 62e1e94b24
("netfilter: nf_tables: reject unbound chain set before commit phase").

Fixes: d0e2c7de92 ("netfilter: nf_tables: add NFT_CHAIN_BINDING")
Reported-by: Kevin Rich <kevinrich1337@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-09-08 03:25:23 +02:00
Paolo Abeni
7153a404fb netfilter pull request 2023-09-06
-----BEGIN PGP SIGNATURE-----
 
 iQJBBAABCAArFiEEgKkgxbID4Gn1hq6fcJGo2a1f9gAFAmT4pk0NHGZ3QHN0cmxl
 bi5kZQAKCRBwkajZrV/2AFILEADUXdj4PWEZv0/1avh+cT14q4gM0KkNi0eWDePA
 kRRTvuVZd6CkAxMNW9EAa6koJdK7k7dq2Jax9MYgzvvkFBO5PH5HHvwQRjs+Ch4W
 WCckV83J0W2eTcPqSxTM6wbSL8p1a0rJSHVSG7N431dPQUJoMnmd4zW5q1P9qTkv
 gqE4iQoK2jGL01GOyuQLnh6fEH2TUviguXZPKxVymoy/pbBiDA1mAMQplyAJisAz
 c1lAb+/oW58xYpxTOCtlU3kqB4tPonUow8x8KkYrlZlrA8rxgr2WkVeg82eVXP9u
 X8DM/nSzxCY7aN3TVrS2m8dS1cK1Gj841atuPA2N9ot/R954h7VjxhvhW1IEpmY9
 P6EYvRrLtCZrif4cmbRY4n6UztFNo3FJeBgmM3pgiVQhCXm+1fpNnSTm3QvZY1YO
 6o5rQn7F7IiIzkBeQ5sPihQPvLqn+J/61ZyQWAdP/TAG+roC0ARPeEFjyBoRHttI
 uw4PcXuzrMjBzym816h8kAjJbVIygpZFIpA/QllN8d3rB96mWpFCILIBG9FnA91K
 Ie6IyRGjDKfe5TuJplSIfCdBrvOPqUjc+l7XAPz9QXL8W3MQnYZshHRs5ViSOaca
 iMhMScBYT31uBZ7KODYxgPcifr0EWSYIGceWnSCDOmRz+VJk1LSfvZTwvqOB+qZ3
 x01bTw==
 =Z3CA
 -----END PGP SIGNATURE-----

Merge tag 'nf-23-09-06' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Florian Westphal says:

====================
netfilter updates for net

This PR contains nf_tables updates for your *net* tree.
This time almost all fixes are for old bugs:

First patch fixes a 4-byte stack OOB write, from myself.
This was broken ever since nftables was switched from 128 to 32-bit
register addressing in v4.1.

2nd patch fixes an out-of-bounds read.
This has been broken ever since xt_osf got added in 2.6.31, the bug
was then just moved around during refactoring, from Wander Lairson Costa.

3rd patch adds a missing enum description, from Phil Sutter.

4th patch fixes a UaF in nftables that occurs when userspace adds
elements with a timeout so small that expiration happens while the
transaction is still in progress.  Fix from Pablo Neira Ayuso.

Patch 5 fixes a memory out-of-bounds access; this was
broken since v4.20. Patch from Kyle Zeng and Jozsef Kadlecsik.

Patch 6 fixes another bogus memory access when building audit
record. Bug added in the previous pull request, fix from Pablo.

netfilter pull request 2023-09-06

* tag 'nf-23-09-06' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nf_tables: Unbreak audit log reset
  netfilter: ipset: add the missing IP_SET_HASH_WITH_NET0 macro for ip_set_hash_netportnet.c
  netfilter: nft_set_rbtree: skip sync GC for new elements in this transaction
  netfilter: nf_tables: uapi: Describe NFTA_RULE_CHAIN_ID
  netfilter: nfnetlink_osf: avoid OOB read
  netfilter: nftables: exthdr: fix 4-byte stack OOB write
====================

Link: https://lore.kernel.org/r/20230906162525.11079-1-fw@strlen.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-07 11:47:15 +02:00
Jakub Kicinski
f16d411c29 bpf-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZPhEYwAKCRDbK58LschI
 g6puAQCGLE9S2d5xw68Fq2KLrTjTXC5trB9N25NWvZWqgXHAeQEA3wXxTC8eF0dG
 J24RwZC2vOC4hF3aNGeI0fv/6j7SKA0=
 =0IU4
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2023-09-06

We've added 9 non-merge commits during the last 6 day(s) which contain
a total of 12 files changed, 189 insertions(+), 44 deletions(-).

The main changes are:

1) Fix bpf_sk_storage to address an invalid wait context lockdep
   report and another one to address missing omem uncharge,
   from Martin KaFai Lau.

2) Two BPF recursion detection related fixes,
   from Sebastian Andrzej Siewior.

3) Fix tailcall limit enforcement in trampolines for s390 JIT,
   from Ilya Leoshkevich.

4) Fix a sockmap refcount race where skbs in sk_psock_backlog can
   be referenced after user space side has already skb_consumed them,
   from John Fastabend.

5) Fix BPF CI flake/race wrt sockmap vsock write test where
   the transport endpoint is not connected, from Xu Kuohai.

6) Follow-up doc fix to address a cross-link warning,
   from Eduard Zingerman.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests/bpf: Check bpf_sk_storage has uncharged sk_omem_alloc
  bpf: bpf_sk_storage: Fix the missing uncharge in sk_omem_alloc
  bpf: bpf_sk_storage: Fix invalid wait context lockdep report
  s390/bpf: Pass through tail call counter in trampolines
  bpf: Assign bpf_tramp_run_ctx::saved_run_ctx before recursion check.
  bpf: Invoke __bpf_prog_exit_sleepable_recur() on recursion in kern_sys_bpf().
  bpf, sockmap: Fix skb refcnt race after locking changes
  docs/bpf: Fix "file doesn't exist" warnings in {llvm_reloc,btf}.rst
  selftests/bpf: Fix a CI failure caused by vsock write
====================

Link: https://lore.kernel.org/r/20230906095117.16941-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-09-06 18:43:05 -07:00
Linus Torvalds
7ba2090ca6 Mixed with some fixes and cleanups, this brings in reasonably complete
fscrypt support to CephFS!  The list of things which don't work with
 encryption should be fairly short, mostly around the edges: fallocate
 (not supported well in CephFS to begin with), copy_file_range (requires
 re-encryption), non-default striping patterns.
 
 This was a multi-year effort principally by Jeff Layton with assistance
 from Xiubo Li, Luís Henriques and others, including several dependent
 changes in the MDS, netfs helper library and fscrypt framework itself.
 -----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmT4pl4THGlkcnlvbW92
 QGdtYWlsLmNvbQAKCRBKf944AhHzi5kzB/4sMgzZyUa3T1vA/G2pPvEkyy1qDxsW
 y+o4dDMWA9twcrBVpNuGd54wbXpmO/LAekHEdorjayH+f0zf10MsnP1ePz9WB3NG
 jr7RRujb+Gpd2OFYJXGSEbd3faTg8M2kpGCCrVe7SFNoyu8z9NwFItwWMog5aBjX
 ODGQrq+kA4ARA6xIqwzF5gP0zr+baT9rWhQdm7Xo9itWdosnbyDLJx1dpEfLuqBX
 te3SmifDzedn3Gw73hdNo/+ybw0kHARoK+RmXCTsoDDQw+JsoO9KxZF5Q8QcDELq
 2woPNp0Hl+Dm4MkzGnPxv56Qj8ZDViS59syXC0CfGRmu4nzF1Rw+0qn5
 =/WlE
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-6.6-rc1' of https://github.com/ceph/ceph-client

Pull ceph updates from Ilya Dryomov:
 "Mixed with some fixes and cleanups, this brings in reasonably complete
  fscrypt support to CephFS! The list of things which don't work with
  encryption should be fairly short, mostly around the edges: fallocate
  (not supported well in CephFS to begin with), copy_file_range
  (requires re-encryption), non-default striping patterns.

  This was a multi-year effort principally by Jeff Layton with
  assistance from Xiubo Li, Luís Henriques and others, including several
  dependent changes in the MDS, netfs helper library and fscrypt
  framework itself"

* tag 'ceph-for-6.6-rc1' of https://github.com/ceph/ceph-client: (53 commits)
  ceph: make num_fwd and num_retry to __u32
  ceph: make members in struct ceph_mds_request_args_ext a union
  rbd: use list_for_each_entry() helper
  libceph: do not include crypto/algapi.h
  ceph: switch ceph_lookup/atomic_open() to use new fscrypt helper
  ceph: fix updating i_truncate_pagecache_size for fscrypt
  ceph: wait for OSD requests' callbacks to finish when unmounting
  ceph: drop messages from MDS when unmounting
  ceph: update documentation regarding snapshot naming limitations
  ceph: prevent snapshot creation in encrypted locked directories
  ceph: add support for encrypted snapshot names
  ceph: invalidate pages when doing direct/sync writes
  ceph: plumb in decryption during reads
  ceph: add encryption support to writepage and writepages
  ceph: add read/modify/write to ceph_sync_write
  ceph: align data in pages in ceph_sync_write
  ceph: don't use special DIO path for encrypted inodes
  ceph: add truncate size handling support for fscrypt
  ceph: add object version support for sync read
  libceph: allow ceph_osdc_new_request to accept a multi-op read
  ...
2023-09-06 12:10:15 -07:00
Pablo Neira Ayuso
9b5ba5c9c5 netfilter: nf_tables: Unbreak audit log reset
Deliver the audit log from __nf_tables_dump_rules(): the table dereference
at the end of the table list loop might point to the list head, leading to
this crash.

[ 4137.407349] BUG: unable to handle page fault for address: 00000000001f3c50
[ 4137.407357] #PF: supervisor read access in kernel mode
[ 4137.407359] #PF: error_code(0x0000) - not-present page
[ 4137.407360] PGD 0 P4D 0
[ 4137.407363] Oops: 0000 [#1] PREEMPT SMP PTI
[ 4137.407365] CPU: 4 PID: 500177 Comm: nft Not tainted 6.5.0+ #277
[ 4137.407369] RIP: 0010:string+0x49/0xd0
[ 4137.407374] Code: ff 77 36 45 89 d1 31 f6 49 01 f9 66 45 85 d2 75 19 eb 1e 49 39 f8 76 02 88 07 48 83 c7 01 83 c6 01 48 83 c2 01 4c 39 cf 74 07 <0f> b6 02 84 c0 75 e2 4c 89 c2 e9 58 e5 ff ff 48 c7 c0 0e b2 ff 81
[ 4137.407377] RSP: 0018:ffff8881179737f0 EFLAGS: 00010286
[ 4137.407379] RAX: 00000000001f2c50 RBX: ffff888117973848 RCX: ffff0a00ffffff04
[ 4137.407380] RDX: 00000000001f3c50 RSI: 0000000000000000 RDI: 0000000000000000
[ 4137.407381] RBP: 0000000000000000 R08: 0000000000000000 R09: 00000000ffffffff
[ 4137.407383] R10: ffffffffffffffff R11: ffff88813584d200 R12: 0000000000000000
[ 4137.407384] R13: ffffffffa15cf709 R14: 0000000000000000 R15: ffffffffa15cf709
[ 4137.407385] FS:  00007fcfc18bb580(0000) GS:ffff88840e700000(0000) knlGS:0000000000000000
[ 4137.407387] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4137.407388] CR2: 00000000001f3c50 CR3: 00000001055b2001 CR4: 00000000001706e0
[ 4137.407390] Call Trace:
[ 4137.407392]  <TASK>
[ 4137.407393]  ? __die+0x1b/0x60
[ 4137.407397]  ? page_fault_oops+0x6b/0xa0
[ 4137.407399]  ? exc_page_fault+0x60/0x120
[ 4137.407403]  ? asm_exc_page_fault+0x22/0x30
[ 4137.407408]  ? string+0x49/0xd0
[ 4137.407410]  vsnprintf+0x257/0x4f0
[ 4137.407414]  kvasprintf+0x3e/0xb0
[ 4137.407417]  kasprintf+0x3e/0x50
[ 4137.407419]  nf_tables_dump_rules+0x1c0/0x360 [nf_tables]
[ 4137.407439]  ? __alloc_skb+0xc3/0x170
[ 4137.407442]  netlink_dump+0x170/0x330
[ 4137.407447]  __netlink_dump_start+0x227/0x300
[ 4137.407449]  nf_tables_getrule+0x205/0x390 [nf_tables]

Deliver audit log only once at the end of the rule dump+reset for
consistency with the set dump+reset.

Ensure that audit reset accesses the table under the RCU read-side lock.
The table list iteration holds the RCU read-side lock, but recent audit
code dereferences the table object outside of it.

Fixes: ea078ae910 ("netfilter: nf_tables: Audit log rule reset")
Fixes: 7e9be1124d ("netfilter: nf_tables: Audit log setelem reset")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-09-06 18:09:12 +02:00
Kyle Zeng
050d91c03b netfilter: ipset: add the missing IP_SET_HASH_WITH_NET0 macro for ip_set_hash_netportnet.c
The missing IP_SET_HASH_WITH_NET0 macro in ip_set_hash_netportnet can
lead to the use of a wrong `CIDR_POS(c)` when calculating array offsets,
which can cause an integer underflow and, as a result, a slab
out-of-bounds access.
This patch adds back the IP_SET_HASH_WITH_NET0 macro to
ip_set_hash_netportnet to address the issue.

Fixes: 886503f34d ("netfilter: ipset: actually allow allowable CIDR 0 in hash:net,port,net")
Suggested-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Kyle Zeng <zengyhkyle@gmail.com>
Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-09-06 18:09:12 +02:00
Pablo Neira Ayuso
2ee52ae94b netfilter: nft_set_rbtree: skip sync GC for new elements in this transaction
New elements in this transaction might expire before the transaction
ends. Skip sync GC for such elements; otherwise, the commit path might walk
over an already released object. Once the transaction is finished, async GC
will collect such expired elements.
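The skip rule can be sketched as a filter in a synchronous GC loop. The message does not say how the kernel identifies elements added in the current transaction; the `added_txn` field and `sync_gc()` helper below are hypothetical, assuming a plain transaction id per element. Expired elements added by the running transaction are left alone and will be picked up by async GC later.

```c
#include <assert.h>

/* Hypothetical element: expiry time plus the transaction that added it. */
struct elem_sketch {
	int expires;
	unsigned int added_txn;	/* id of the transaction that inserted it */
};

/*
 * Synchronous GC run from inside transaction 'cur_txn': collect expired
 * elements, but skip those added by that same transaction.
 */
static int sync_gc(const struct elem_sketch *e, int n, int now,
		   unsigned int cur_txn, int *out)
{
	int k = 0;

	for (int i = 0; i < n; i++) {
		if (e[i].added_txn == cur_txn)
			continue;	/* new in this transaction: skip */
		if (e[i].expires <= now)
			out[k++] = i;
	}
	return k;
}
```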

Fixes: f6c383b8c3 ("netfilter: nf_tables: adapt set backend to use GC transaction API")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-09-06 18:09:12 +02:00
Wander Lairson Costa
f4f8a78031 netfilter: nfnetlink_osf: avoid OOB read
The opt_num field is controlled by user mode and is not currently
validated inside the kernel. An attacker can take advantage of this to
trigger an OOB read and potentially leak information.

BUG: KASAN: slab-out-of-bounds in nf_osf_match_one+0xbed/0xd10 net/netfilter/nfnetlink_osf.c:88
Read of size 2 at addr ffff88804bc64272 by task poc/6431

CPU: 1 PID: 6431 Comm: poc Not tainted 6.0.0-rc4 #1
Call Trace:
 nf_osf_match_one+0xbed/0xd10 net/netfilter/nfnetlink_osf.c:88
 nf_osf_find+0x186/0x2f0 net/netfilter/nfnetlink_osf.c:281
 nft_osf_eval+0x37f/0x590 net/netfilter/nft_osf.c:47
 expr_call_ops_eval net/netfilter/nf_tables_core.c:214
 nft_do_chain+0x2b0/0x1490 net/netfilter/nf_tables_core.c:264
 nft_do_chain_ipv4+0x17c/0x1f0 net/netfilter/nft_chain_filter.c:23
 [..]

Also add validation to genre, subtype and version fields.
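A sketch of this kind of input validation, assuming a reduced, hypothetical fingerprint struct with fixed-size arrays. The count is checked against the capacity of opt[] before anything indexes it, and requiring the string fields to be NUL-terminated within their buffers is one plausible form of validating genre, subtype and version (an assumption for this sketch).

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define OPT_CAP_SKETCH 40	/* hypothetical capacity of opt[] */
#define NAME_LEN_SKETCH 32	/* hypothetical fixed string size */

/* Hypothetical, reduced fingerprint as received from user space. */
struct finger_sketch {
	unsigned int opt_num;
	unsigned short opt[OPT_CAP_SKETCH];
	char genre[NAME_LEN_SKETCH];
	char subtype[NAME_LEN_SKETCH];
	char version[NAME_LEN_SKETCH];
};

/*
 * Reject a count larger than the opt[] array, and string fields that are
 * not NUL-terminated within their buffer, before anything reads them.
 */
static int finger_validate(const struct finger_sketch *f)
{
	if (f->opt_num > sizeof(f->opt) / sizeof(f->opt[0]))
		return -EINVAL;
	if (!memchr(f->genre, '\0', sizeof(f->genre)) ||
	    !memchr(f->subtype, '\0', sizeof(f->subtype)) ||
	    !memchr(f->version, '\0', sizeof(f->version)))
		return -EINVAL;
	return 0;
}
```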

Fixes: 11eeef41d5 ("netfilter: passive OS fingerprint xtables match")
Reported-by: Lucas Leong <wmliang@infosec.exchange>
Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-09-06 18:07:49 +02:00
Florian Westphal
fd94d9dade netfilter: nftables: exthdr: fix 4-byte stack OOB write
If priv->len is a multiple of 4, then dst[len / 4] can write past
the destination array which leads to stack corruption.

This construct is necessary to clean the remainder of the register
in case ->len is NOT a multiple of the register size, so make it
conditional just like nft_payload.c does.
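The conditional clearing can be shown with a standalone helper. `copy_to_regs()` is hypothetical, not the nft_exthdr code. It copies 'len' bytes into an array of 32-bit registers and zeroes the last register only when len is not a multiple of 4. When len is a multiple of 4, dst[len / 4] is one element past the copied data, and it may also be past the end of the array, which is exactly the write the fix removes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy 'len' bytes into 32-bit registers. The partially filled last
 * register is cleared first so its unused tail bytes are zero, but only
 * when such a partial register exists (len % 4 != 0).
 */
static void copy_to_regs(uint32_t *dst, const uint8_t *src, unsigned int len)
{
	if (len % 4)
		dst[len / 4] = 0;
	memcpy(dst, src, len);
}
```

With an unconditional `dst[len / 4] = 0`, copying 8 bytes into a two-register destination would write a third register that does not exist.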

The bug was added in the 4.1 cycle and then copied/inherited when
tcp/sctp and ip option support was added.

Bug reported by Zero Day Initiative project (ZDI-CAN-21950,
ZDI-CAN-21951, ZDI-CAN-21961).

Fixes: 49499c3e6e ("netfilter: nf_tables: switch registers to 32 bit addressing")
Fixes: 935b7f6430 ("netfilter: nft_exthdr: add TCP option matching")
Fixes: 133dc203d7 ("netfilter: nft_exthdr: Support SCTP chunks")
Fixes: dbb5281a1f ("netfilter: nf_tables: add support for matching IPv4 options")
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-09-06 18:03:02 +02:00
Quan Tian
a5e2151ff9 net/ipv6: SKB symmetric hash should incorporate transport ports
__skb_get_hash_symmetric() was added to compute a symmetric hash over
the protocol, addresses and transport ports, by commit eb70db8756
("packet: Use symmetric hash for PACKET_FANOUT_HASH."). It uses
flow_keys_dissector_symmetric_keys as the flow_dissector to incorporate
IPv4 addresses, IPv6 addresses and ports. However, it should not pass the
FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL flag, which stops further dissection
when an IPv6 flow label is encountered, so transport ports are not
incorporated in that case.

As a consequence, the symmetric hash is based on 5-tuple for IPv4 but
3-tuple for IPv6 when a flow label is present. This caused problems, e.g.
for nft symhash and openvswitch l4_sym, which rely on the symmetric hash to
perform load balancing: different L4 flows between two given IPv6
addresses would always get the same symmetric hash, leading to uneven
traffic distribution.

Removing the use of FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL makes sure the
symmetric hash is based on 5-tuple for both IPv4 and IPv6 consistently.
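What a symmetric 5-tuple hash means can be sketched in a few lines. This is illustrative only. The tuple struct is hypothetical, FNV-1a is used purely for demonstration (the kernel uses its own hash function), and the endpoints are ordered jointly as (address, port) pairs, which is one way to get symmetry. Both directions of a flow hash alike, while two flows that differ only in a port hash differently, which is the property lost when dissection stopped at the flow label.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical IPv4 5-tuple in host byte order. */
struct tuple_sketch {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
	uint8_t proto;
};

/* Mix one 32-bit word into an FNV-1a state, byte by byte. */
static uint32_t fnv1a_u32(uint32_t h, uint32_t v)
{
	for (int i = 0; i < 4; i++) {
		h ^= (v >> (8 * i)) & 0xffu;
		h *= 16777619u;		/* FNV prime */
	}
	return h;
}

/*
 * Put the two endpoints in a canonical order so A->B and B->A give the
 * same hash, then mix in the protocol, both addresses and both ports.
 */
static uint32_t sym_hash(const struct tuple_sketch *t)
{
	uint32_t a1 = t->saddr, a2 = t->daddr;
	uint16_t p1 = t->sport, p2 = t->dport;

	if (a1 > a2 || (a1 == a2 && p1 > p2)) {
		uint32_t ta = a1; a1 = a2; a2 = ta;
		uint16_t tp = p1; p1 = p2; p2 = tp;
	}

	uint32_t h = 2166136261u;	/* FNV offset basis */
	h = fnv1a_u32(h, t->proto);
	h = fnv1a_u32(h, a1);
	h = fnv1a_u32(h, a2);
	h = fnv1a_u32(h, ((uint32_t)p1 << 16) | p2);
	return h;
}
```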

Fixes: eb70db8756 ("packet: Use symmetric hash for PACKET_FANOUT_HASH.")
Reported-by: Lars Ekman <uablrek@gmail.com>
Closes: https://github.com/antrea-io/antrea/issues/5457
Signed-off-by: Quan Tian <qtian@vmware.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-06 06:02:27 +01:00
Eric Dumazet
c3b704d4a4 igmp: limit igmpv3_newpack() packet size to IP_MAX_MTU
This is a follow up of commit 915d975b2f ("net: deal with integer
overflows in kmalloc_reserve()") based on David Laight feedback.

Back in 2010, I failed to realize malicious users could set dev->mtu
to arbitrary values. This mtu has since been limited to 0x7fffffff, but
regardless of how big dev->mtu is, it makes no sense for igmpv3_newpack()
to allocate more than IP_MAX_MTU and risk overflowing various skb fields.
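The clamp amounts to taking the minimum of the device MTU and the largest possible IPv4 packet. The helper below is a hypothetical sketch; the constant mirrors the kernel's IP_MAX_MTU value of 0xFFFF, the limit of the 16-bit IPv4 total-length field.

```c
#include <assert.h>

/* IPv4 total length is a 16-bit field; the kernel's IP_MAX_MTU is 0xFFFF. */
#define IP_MAX_MTU_SKETCH 0xFFFFu

/*
 * Size an IGMPv3 report buffer from the device MTU, but never beyond the
 * largest possible IPv4 packet, however large dev->mtu was set.
 */
static unsigned int igmp_alloc_size(unsigned int dev_mtu)
{
	return dev_mtu < IP_MAX_MTU_SKETCH ? dev_mtu : IP_MAX_MTU_SKETCH;
}
```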

Fixes: 57e1ab6ead ("igmp: refine skb allocations")
Link: https://lore.kernel.org/netdev/d273628df80f45428e739274ab9ecb72@AcuMS.aculab.com/
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: David Laight <David.Laight@ACULAB.COM>
Cc: Kyle Zeng <zengyhkyle@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-05 17:49:40 +01:00
Shigeru Yoshida
6ad40b36cd kcm: Destroy mutex in kcm_exit_net()
kcm_exit_net() should call mutex_destroy() on knet->mutex. This is especially
needed if CONFIG_DEBUG_MUTEXES is enabled.

Fixes: ab7ac4eb98 ("kcm: Kernel Connection Multiplexor module")
Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
Link: https://lore.kernel.org/r/20230902170708.1727999-1-syoshida@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-05 10:12:03 +02:00
valis
8fc134fee2 net: sched: sch_qfq: Fix UAF in qfq_dequeue()
When the plug qdisc is used as a class of the qfq qdisc, it could trigger a
UAF. This issue can be reproduced with the following commands:

  tc qdisc add dev lo root handle 1: qfq
  tc class add dev lo parent 1: classid 1:1 qfq weight 1 maxpkt 512
  tc qdisc add dev lo parent 1:1 handle 2: plug
  tc filter add dev lo parent 1: basic classid 1:1
  ping -c1 127.0.0.1

and boom:

[  285.353793] BUG: KASAN: slab-use-after-free in qfq_dequeue+0xa7/0x7f0
[  285.354910] Read of size 4 at addr ffff8880bad312a8 by task ping/144
[  285.355903]
[  285.356165] CPU: 1 PID: 144 Comm: ping Not tainted 6.5.0-rc3+ #4
[  285.357112] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[  285.358376] Call Trace:
[  285.358773]  <IRQ>
[  285.359109]  dump_stack_lvl+0x44/0x60
[  285.359708]  print_address_description.constprop.0+0x2c/0x3c0
[  285.360611]  kasan_report+0x10c/0x120
[  285.361195]  ? qfq_dequeue+0xa7/0x7f0
[  285.361780]  qfq_dequeue+0xa7/0x7f0
[  285.362342]  __qdisc_run+0xf1/0x970
[  285.362903]  net_tx_action+0x28e/0x460
[  285.363502]  __do_softirq+0x11b/0x3de
[  285.364097]  do_softirq.part.0+0x72/0x90
[  285.364721]  </IRQ>
[  285.365072]  <TASK>
[  285.365422]  __local_bh_enable_ip+0x77/0x90
[  285.366079]  __dev_queue_xmit+0x95f/0x1550
[  285.366732]  ? __pfx_csum_and_copy_from_iter+0x10/0x10
[  285.367526]  ? __pfx___dev_queue_xmit+0x10/0x10
[  285.368259]  ? __build_skb_around+0x129/0x190
[  285.368960]  ? ip_generic_getfrag+0x12c/0x170
[  285.369653]  ? __pfx_ip_generic_getfrag+0x10/0x10
[  285.370390]  ? csum_partial+0x8/0x20
[  285.370961]  ? raw_getfrag+0xe5/0x140
[  285.371559]  ip_finish_output2+0x539/0xa40
[  285.372222]  ? __pfx_ip_finish_output2+0x10/0x10
[  285.372954]  ip_output+0x113/0x1e0
[  285.373512]  ? __pfx_ip_output+0x10/0x10
[  285.374130]  ? icmp_out_count+0x49/0x60
[  285.374739]  ? __pfx_ip_finish_output+0x10/0x10
[  285.375457]  ip_push_pending_frames+0xf3/0x100
[  285.376173]  raw_sendmsg+0xef5/0x12d0
[  285.376760]  ? do_syscall_64+0x40/0x90
[  285.377359]  ? __static_call_text_end+0x136578/0x136578
[  285.378173]  ? do_syscall_64+0x40/0x90
[  285.378772]  ? kasan_enable_current+0x11/0x20
[  285.379469]  ? __pfx_raw_sendmsg+0x10/0x10
[  285.380137]  ? __sock_create+0x13e/0x270
[  285.380673]  ? __sys_socket+0xf3/0x180
[  285.381174]  ? __x64_sys_socket+0x3d/0x50
[  285.381725]  ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[  285.382425]  ? __rcu_read_unlock+0x48/0x70
[  285.382975]  ? ip4_datagram_release_cb+0xd8/0x380
[  285.383608]  ? __pfx_ip4_datagram_release_cb+0x10/0x10
[  285.384295]  ? preempt_count_sub+0x14/0xc0
[  285.384844]  ? __list_del_entry_valid+0x76/0x140
[  285.385467]  ? _raw_spin_lock_bh+0x87/0xe0
[  285.386014]  ? __pfx__raw_spin_lock_bh+0x10/0x10
[  285.386645]  ? release_sock+0xa0/0xd0
[  285.387148]  ? preempt_count_sub+0x14/0xc0
[  285.387712]  ? freeze_secondary_cpus+0x348/0x3c0
[  285.388341]  ? aa_sk_perm+0x177/0x390
[  285.388856]  ? __pfx_aa_sk_perm+0x10/0x10
[  285.389441]  ? check_stack_object+0x22/0x70
[  285.390032]  ? inet_send_prepare+0x2f/0x120
[  285.390603]  ? __pfx_inet_sendmsg+0x10/0x10
[  285.391172]  sock_sendmsg+0xcc/0xe0
[  285.391667]  __sys_sendto+0x190/0x230
[  285.392168]  ? __pfx___sys_sendto+0x10/0x10
[  285.392727]  ? kvm_clock_get_cycles+0x14/0x30
[  285.393328]  ? set_normalized_timespec64+0x57/0x70
[  285.393980]  ? _raw_spin_unlock_irq+0x1b/0x40
[  285.394578]  ? __x64_sys_clock_gettime+0x11c/0x160
[  285.395225]  ? __pfx___x64_sys_clock_gettime+0x10/0x10
[  285.395908]  ? _copy_to_user+0x3e/0x60
[  285.396432]  ? exit_to_user_mode_prepare+0x1a/0x120
[  285.397086]  ? syscall_exit_to_user_mode+0x22/0x50
[  285.397734]  ? do_syscall_64+0x71/0x90
[  285.398258]  __x64_sys_sendto+0x74/0x90
[  285.398786]  do_syscall_64+0x64/0x90
[  285.399273]  ? exit_to_user_mode_prepare+0x1a/0x120
[  285.399949]  ? syscall_exit_to_user_mode+0x22/0x50
[  285.400605]  ? do_syscall_64+0x71/0x90
[  285.401124]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[  285.401807] RIP: 0033:0x495726
[  285.402233] Code: ff ff ff f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 11 b8 2c 00 00 00 0f 09
[  285.404683] RSP: 002b:00007ffcc25fb618 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[  285.405677] RAX: ffffffffffffffda RBX: 0000000000000040 RCX: 0000000000495726
[  285.406628] RDX: 0000000000000040 RSI: 0000000002518750 RDI: 0000000000000000
[  285.407565] RBP: 00000000005205ef R08: 00000000005f8838 R09: 000000000000001c
[  285.408523] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000002517634
[  285.409460] R13: 00007ffcc25fb6f0 R14: 0000000000000003 R15: 0000000000000000
[  285.410403]  </TASK>
[  285.410704]
[  285.410929] Allocated by task 144:
[  285.411402]  kasan_save_stack+0x1e/0x40
[  285.411926]  kasan_set_track+0x21/0x30
[  285.412442]  __kasan_slab_alloc+0x55/0x70
[  285.412973]  kmem_cache_alloc_node+0x187/0x3d0
[  285.413567]  __alloc_skb+0x1b4/0x230
[  285.414060]  __ip_append_data+0x17f7/0x1b60
[  285.414633]  ip_append_data+0x97/0xf0
[  285.415144]  raw_sendmsg+0x5a8/0x12d0
[  285.415640]  sock_sendmsg+0xcc/0xe0
[  285.416117]  __sys_sendto+0x190/0x230
[  285.416626]  __x64_sys_sendto+0x74/0x90
[  285.417145]  do_syscall_64+0x64/0x90
[  285.417624]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[  285.418306]
[  285.418531] Freed by task 144:
[  285.418960]  kasan_save_stack+0x1e/0x40
[  285.419469]  kasan_set_track+0x21/0x30
[  285.419988]  kasan_save_free_info+0x27/0x40
[  285.420556]  ____kasan_slab_free+0x109/0x1a0
[  285.421146]  kmem_cache_free+0x1c2/0x450
[  285.421680]  __netif_receive_skb_core+0x2ce/0x1870
[  285.422333]  __netif_receive_skb_one_core+0x97/0x140
[  285.423003]  process_backlog+0x100/0x2f0
[  285.423537]  __napi_poll+0x5c/0x2d0
[  285.424023]  net_rx_action+0x2be/0x560
[  285.424510]  __do_softirq+0x11b/0x3de
[  285.425034]
[  285.425254] The buggy address belongs to the object at ffff8880bad31280
[  285.425254]  which belongs to the cache skbuff_head_cache of size 224
[  285.426993] The buggy address is located 40 bytes inside of
[  285.426993]  freed 224-byte region [ffff8880bad31280, ffff8880bad31360)
[  285.428572]
[  285.428798] The buggy address belongs to the physical page:
[  285.429540] page:00000000f4b77674 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xbad31
[  285.430758] flags: 0x100000000000200(slab|node=0|zone=1)
[  285.431447] page_type: 0xffffffff()
[  285.431934] raw: 0100000000000200 ffff88810094a8c0 dead000000000122 0000000000000000
[  285.432757] raw: 0000000000000000 00000000800c000c 00000001ffffffff 0000000000000000
[  285.433562] page dumped because: kasan: bad access detected
[  285.434144]
[  285.434320] Memory state around the buggy address:
[  285.434828]  ffff8880bad31180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  285.435580]  ffff8880bad31200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  285.436264] >ffff8880bad31280: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  285.436777]                                   ^
[  285.437106]  ffff8880bad31300: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
[  285.437616]  ffff8880bad31380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  285.438126] ==================================================================
[  285.438662] Disabling lock debugging due to kernel taint

Fix this by:
1. Changing sch_plug's .peek handler to qdisc_peek_dequeued(), a
function compatible with non-work-conserving qdiscs
2. Checking the return value of qdisc_dequeue_peeked() in sch_qfq.
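
A toy model of why a non-work-conserving child breaks a parent that trusts peek(). Everything here (struct toyq, the toyq_*() helpers, parent_dequeue()) is hypothetical; the stash field only roughly mirrors how qdisc_peek_dequeued() parks a dequeued packet until qdisc_dequeue_peeked() collects it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a plug-style child queue; not the kernel qdisc API. */
struct toyq {
	int pkts[8];
	size_t head, len;
	bool plugged; /* non-work-conserving: holds packets while plugged */
	int *stash;   /* packet already dequeued by a peek */
};

static int *toyq_dequeue(struct toyq *q)
{
	if (q->plugged || q->len == 0)
		return NULL;
	q->len--;
	return &q->pkts[q->head++];
}

/* Peek by dequeuing and stashing: a plugged queue yields NULL here instead
 * of advertising a packet it will not actually release. */
static int *toyq_peek_dequeued(struct toyq *q)
{
	if (!q->stash)
		q->stash = toyq_dequeue(q);
	return q->stash;
}

static int *toyq_dequeue_peeked(struct toyq *q)
{
	int *p = q->stash;

	if (p) {
		q->stash = NULL;
		return p;
	}
	return toyq_dequeue(q);
}

/* Parent scheduler: check both results instead of assuming success. */
int parent_dequeue(struct toyq *child)
{
	int *p;

	if (!toyq_peek_dequeued(child))
		return -1;
	p = toyq_dequeue_peeked(child);
	if (!p) /* analogous to the check added to sch_qfq */
		return -1;
	return *p;
}
```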

Fixes: 462dbc9101 ("pkt_sched: QFQ Plus: fair-queueing service at DRR cost")
Reported-by: valis <sec@valis.email>
Signed-off-by: valis <sec@valis.email>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20230901162237.11525-1-jhs@mojatatu.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-05 08:54:12 +02:00
Kuniyuki Iwashima
b192812905 af_unix: Fix data race around sk->sk_err.
As with sk->sk_shutdown shown in the previous patch, sk->sk_err can be
read locklessly by unix_dgram_sendmsg().

Let's use READ_ONCE() for sk_err as well.

Note that the writer side is marked by commit cc04410af7 ("af_unix:
annotate lockless accesses to sk->sk_err").
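
READ_ONCE()/WRITE_ONCE() have no exact user-space equivalent; C11 relaxed atomics play a similar role of marking an intentional lockless access. A sketch with a hypothetical struct toy_sock whose sk_err is written under a mutex and read without it:

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical socket-like object: writes happen under the lock, reads
 * do not, so both sides use marked (relaxed atomic) accesses. */
struct toy_sock {
	pthread_mutex_t lock;
	_Atomic int sk_err;
};

void toy_set_err(struct toy_sock *sk, int err)
{
	pthread_mutex_lock(&sk->lock);
	atomic_store_explicit(&sk->sk_err, err, memory_order_relaxed);
	pthread_mutex_unlock(&sk->lock);
}

/* Lockless reader, as on the unix_dgram_sendmsg() path. */
int toy_peek_err(struct toy_sock *sk)
{
	return atomic_load_explicit(&sk->sk_err, memory_order_relaxed);
}
```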

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-04 11:06:16 +01:00
Kuniyuki Iwashima
afe8764f76 af_unix: Fix data-races around sk->sk_shutdown.
sk->sk_shutdown is changed under unix_state_lock(sk), but
unix_dgram_sendmsg() calls two functions to read sk_shutdown locklessly.

  sock_alloc_send_pskb
  `- sock_wait_for_wmem

Let's use READ_ONCE() there.

Note that the writer side was marked by commit e1d09c2c2f ("af_unix:
Fix data races around sk->sk_shutdown.").

BUG: KCSAN: data-race in sock_alloc_send_pskb / unix_release_sock

write (marked) to 0xffff8880069af12c of 1 bytes by task 1 on cpu 1:
 unix_release_sock+0x75c/0x910 net/unix/af_unix.c:631
 unix_release+0x59/0x80 net/unix/af_unix.c:1053
 __sock_release+0x7d/0x170 net/socket.c:654
 sock_close+0x19/0x30 net/socket.c:1386
 __fput+0x2a3/0x680 fs/file_table.c:384
 ____fput+0x15/0x20 fs/file_table.c:412
 task_work_run+0x116/0x1a0 kernel/task_work.c:179
 resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
 exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
 exit_to_user_mode_prepare+0x174/0x180 kernel/entry/common.c:204
 __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
 syscall_exit_to_user_mode+0x1a/0x30 kernel/entry/common.c:297
 do_syscall_64+0x4b/0x90 arch/x86/entry/common.c:86
 entry_SYSCALL_64_after_hwframe+0x6e/0xd8

read to 0xffff8880069af12c of 1 bytes by task 28650 on cpu 0:
 sock_alloc_send_pskb+0xd2/0x620 net/core/sock.c:2767
 unix_dgram_sendmsg+0x2f8/0x14f0 net/unix/af_unix.c:1944
 unix_seqpacket_sendmsg net/unix/af_unix.c:2308 [inline]
 unix_seqpacket_sendmsg+0xba/0x130 net/unix/af_unix.c:2292
 sock_sendmsg_nosec net/socket.c:725 [inline]
 sock_sendmsg+0x148/0x160 net/socket.c:748
 ____sys_sendmsg+0x4e4/0x610 net/socket.c:2494
 ___sys_sendmsg+0xc6/0x140 net/socket.c:2548
 __sys_sendmsg+0x94/0x140 net/socket.c:2577
 __do_sys_sendmsg net/socket.c:2586 [inline]
 __se_sys_sendmsg net/socket.c:2584 [inline]
 __x64_sys_sendmsg+0x45/0x50 net/socket.c:2584
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x6e/0xd8

value changed: 0x00 -> 0x03

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 28650 Comm: systemd-coredum Not tainted 6.4.0-11989-g6843306689af #6
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-04 11:06:16 +01:00
Kuniyuki Iwashima
ade32bd8a7 af_unix: Fix data-race around unix_tot_inflight.
unix_tot_inflight is changed under spin_lock(unix_gc_lock), but
unix_release_sock() reads it locklessly.

Let's use READ_ONCE() for unix_tot_inflight.

Note that the writer side was marked by commit 9d6d7f1cb6 ("af_unix:
annote lockless accesses to unix_tot_inflight & gc_in_progress")

BUG: KCSAN: data-race in unix_inflight / unix_release_sock

write (marked) to 0xffffffff871852b8 of 4 bytes by task 123 on cpu 1:
 unix_inflight+0x130/0x180 net/unix/scm.c:64
 unix_attach_fds+0x137/0x1b0 net/unix/scm.c:123
 unix_scm_to_skb net/unix/af_unix.c:1832 [inline]
 unix_dgram_sendmsg+0x46a/0x14f0 net/unix/af_unix.c:1955
 sock_sendmsg_nosec net/socket.c:724 [inline]
 sock_sendmsg+0x148/0x160 net/socket.c:747
 ____sys_sendmsg+0x4e4/0x610 net/socket.c:2493
 ___sys_sendmsg+0xc6/0x140 net/socket.c:2547
 __sys_sendmsg+0x94/0x140 net/socket.c:2576
 __do_sys_sendmsg net/socket.c:2585 [inline]
 __se_sys_sendmsg net/socket.c:2583 [inline]
 __x64_sys_sendmsg+0x45/0x50 net/socket.c:2583
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x72/0xdc

read to 0xffffffff871852b8 of 4 bytes by task 4891 on cpu 0:
 unix_release_sock+0x608/0x910 net/unix/af_unix.c:671
 unix_release+0x59/0x80 net/unix/af_unix.c:1058
 __sock_release+0x7d/0x170 net/socket.c:653
 sock_close+0x19/0x30 net/socket.c:1385
 __fput+0x179/0x5e0 fs/file_table.c:321
 ____fput+0x15/0x20 fs/file_table.c:349
 task_work_run+0x116/0x1a0 kernel/task_work.c:179
 resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
 exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
 exit_to_user_mode_prepare+0x174/0x180 kernel/entry/common.c:204
 __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
 syscall_exit_to_user_mode+0x1a/0x30 kernel/entry/common.c:297
 do_syscall_64+0x4b/0x90 arch/x86/entry/common.c:86
 entry_SYSCALL_64_after_hwframe+0x72/0xdc

value changed: 0x00000000 -> 0x00000001

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 4891 Comm: systemd-coredum Not tainted 6.4.0-rc5-01219-gfa0e21fa4443 #5
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014

Fixes: 9305cfa444 ("[AF_UNIX]: Make unix_tot_inflight counter non-atomic")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-04 11:06:16 +01:00
Kuniyuki Iwashima
0bc36c0650 af_unix: Fix data-races around user->unix_inflight.
user->unix_inflight is changed under spin_lock(unix_gc_lock),
but too_many_unix_fds() reads it locklessly.

Let's annotate the write/read accesses to user->unix_inflight.

BUG: KCSAN: data-race in unix_attach_fds / unix_inflight

write to 0xffffffff8546f2d0 of 8 bytes by task 44798 on cpu 1:
 unix_inflight+0x157/0x180 net/unix/scm.c:66
 unix_attach_fds+0x147/0x1e0 net/unix/scm.c:123
 unix_scm_to_skb net/unix/af_unix.c:1827 [inline]
 unix_dgram_sendmsg+0x46a/0x14f0 net/unix/af_unix.c:1950
 unix_seqpacket_sendmsg net/unix/af_unix.c:2308 [inline]
 unix_seqpacket_sendmsg+0xba/0x130 net/unix/af_unix.c:2292
 sock_sendmsg_nosec net/socket.c:725 [inline]
 sock_sendmsg+0x148/0x160 net/socket.c:748
 ____sys_sendmsg+0x4e4/0x610 net/socket.c:2494
 ___sys_sendmsg+0xc6/0x140 net/socket.c:2548
 __sys_sendmsg+0x94/0x140 net/socket.c:2577
 __do_sys_sendmsg net/socket.c:2586 [inline]
 __se_sys_sendmsg net/socket.c:2584 [inline]
 __x64_sys_sendmsg+0x45/0x50 net/socket.c:2584
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x6e/0xd8

read to 0xffffffff8546f2d0 of 8 bytes by task 44814 on cpu 0:
 too_many_unix_fds net/unix/scm.c:101 [inline]
 unix_attach_fds+0x54/0x1e0 net/unix/scm.c:110
 unix_scm_to_skb net/unix/af_unix.c:1827 [inline]
 unix_dgram_sendmsg+0x46a/0x14f0 net/unix/af_unix.c:1950
 unix_seqpacket_sendmsg net/unix/af_unix.c:2308 [inline]
 unix_seqpacket_sendmsg+0xba/0x130 net/unix/af_unix.c:2292
 sock_sendmsg_nosec net/socket.c:725 [inline]
 sock_sendmsg+0x148/0x160 net/socket.c:748
 ____sys_sendmsg+0x4e4/0x610 net/socket.c:2494
 ___sys_sendmsg+0xc6/0x140 net/socket.c:2548
 __sys_sendmsg+0x94/0x140 net/socket.c:2577
 __do_sys_sendmsg net/socket.c:2586 [inline]
 __se_sys_sendmsg net/socket.c:2584 [inline]
 __x64_sys_sendmsg+0x45/0x50 net/socket.c:2584
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x6e/0xd8

value changed: 0x000000000000000c -> 0x000000000000000d

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 44814 Comm: systemd-coredum Not tainted 6.4.0-11989-g6843306689af #6
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014

Fixes: 712f4aad40 ("unix: properly account for FDs passed over unix sockets")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Willy Tarreau <w@1wt.eu>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-04 11:06:16 +01:00
John Fastabend
a454d84ee2 bpf, sockmap: Fix skb refcnt race after locking changes
There is a race where skbs from the sk_psock_backlog can be referenced
after the userspace side has already consumed the sk_buff with
consume_skb() and its refcnt has dropped to zero, causing a use after free.

The flow is the following:

  while ((skb = skb_peek(&psock->ingress_skb)))
    sk_psock_handle_skb(psock, skb, ..., ingress)
    if (!ingress) ...
    sk_psock_skb_ingress
       sk_psock_skb_ingress_enqueue(skb)
          msg->skb = skb
          sk_psock_queue_msg(psock, msg)
    skb_dequeue(&psock->ingress_skb)

sk_psock_queue_msg() puts the msg on the ingress_msg queue. This is
what the application reads when recvmsg() is called, and it can read it
any time after the msg is placed on the queue. The recvmsg hook will also
read msg->skb and, after user space reads the msg, will call
consume_skb(skb) on it, effectively freeing it.

But the race is in the flow above, where the backlog queue still holds a
reference to the skb and calls skb_dequeue(). If skb_dequeue() happens
after the user reads and frees the skb, we have a use after free.

The !ingress case does not suffer from this problem because it uses
sendmsg_*(sk, msg) which does not pass the sk_buff further down the
stack.

The following splat was observed with 'test_progs -t sockmap_listen':

  [ 1022.710250][ T2556] general protection fault, ...
  [...]
  [ 1022.712830][ T2556] Workqueue: events sk_psock_backlog
  [ 1022.713262][ T2556] RIP: 0010:skb_dequeue+0x4c/0x80
  [ 1022.713653][ T2556] Code: ...
  [...]
  [ 1022.720699][ T2556] Call Trace:
  [ 1022.720984][ T2556]  <TASK>
  [ 1022.721254][ T2556]  ? die_addr+0x32/0x80
  [ 1022.721589][ T2556]  ? exc_general_protection+0x25a/0x4b0
  [ 1022.722026][ T2556]  ? asm_exc_general_protection+0x22/0x30
  [ 1022.722489][ T2556]  ? skb_dequeue+0x4c/0x80
  [ 1022.722854][ T2556]  sk_psock_backlog+0x27a/0x300
  [ 1022.723243][ T2556]  process_one_work+0x2a7/0x5b0
  [ 1022.723633][ T2556]  worker_thread+0x4f/0x3a0
  [ 1022.723998][ T2556]  ? __pfx_worker_thread+0x10/0x10
  [ 1022.724386][ T2556]  kthread+0xfd/0x130
  [ 1022.724709][ T2556]  ? __pfx_kthread+0x10/0x10
  [ 1022.725066][ T2556]  ret_from_fork+0x2d/0x50
  [ 1022.725409][ T2556]  ? __pfx_kthread+0x10/0x10
  [ 1022.725799][ T2556]  ret_from_fork_asm+0x1b/0x30
  [ 1022.726201][ T2556]  </TASK>

To fix this, we add an skb_get() before passing the skb to be enqueued on
the ingress queue. This bumps the skb->users refcnt so that consume_skb()
and kfree_skb() will not immediately free the sk_buff. With this we can
be sure the skb is still around when we do the dequeue. Then we just
need to decrement the refcnt or free the skb in the backlog case, which
we do by calling kfree_skb() in the ingress case as well as the sendmsg
case.

Before the locking change from the Fixes tag, we had the sock locked, so we
couldn't race with the user and there was no issue here.
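
The reference-count reasoning of the fix can be modelled in user space. struct buf and the buf_*() helpers are hypothetical, single-threaded stand-ins for sk_buff, skb_get() and kfree_skb(); the kernel uses refcount_t under real concurrency.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Toy refcounted buffer standing in for an sk_buff. */
struct buf {
	int users;
	bool *freed; /* lets the caller observe when the object is released */
};

struct buf *buf_alloc(bool *freed_flag)
{
	struct buf *b = malloc(sizeof(*b));

	if (b) {
		b->users = 1;
		b->freed = freed_flag;
		*freed_flag = false;
	}
	return b;
}

/* Like skb_get(): take an extra reference. */
struct buf *buf_get(struct buf *b)
{
	b->users++;
	return b;
}

/* Like kfree_skb()/consume_skb(): drop one reference, free on the last. */
void buf_put(struct buf *b)
{
	if (--b->users == 0) {
		*b->freed = true;
		free(b);
	}
}
```

The extra reference taken before queueing to the reader keeps the object alive until the backlog drops its own reference after the dequeue.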

Fixes: 799aa7f98d ("skmsg: Avoid lock_sock() in sk_psock_backlog()")
Reported-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Xu Kuohai <xukuohai@huawei.com>
Tested-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20230901202137.214666-1-john.fastabend@gmail.com
2023-09-04 09:53:35 +02:00
Alex Henrie
f31867d0d9 net: ipv6/addrconf: avoid integer underflow in ipv6_create_tempaddr
The existing code incorrectly casted a negative value (the result of a
subtraction) to an unsigned value without checking. For example, if
/proc/sys/net/ipv6/conf/*/temp_prefered_lft was set to 1, the preferred
lifetime would jump to 4 billion seconds. On my machine and network the
shortest lifetime that avoided underflow was 3 seconds.
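
The underflow is ordinary unsigned wraparound. A sketch with hypothetical helpers: lifetime_unchecked() shows the wrap (1 - 3 becomes 4294967294, roughly the 4 billion seconds above), and lifetime_checked() shows one way to guard it, by subtracting in a wider signed type and clamping. The actual patch may handle the negative case differently.

```c
#include <stdint.h>

/* Hypothetical: a lifetime computed as a configured value minus an offset. */
uint32_t lifetime_unchecked(uint32_t cfg, uint32_t offset)
{
	return cfg - offset; /* wraps modulo 2^32 when offset > cfg */
}

/* Subtract in a wider signed type and clamp at zero before narrowing. */
uint32_t lifetime_checked(uint32_t cfg, uint32_t offset)
{
	int64_t v = (int64_t)cfg - (int64_t)offset;

	return v < 0 ? 0 : (uint32_t)v;
}
```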

Fixes: 76506a986d ("IPv6: fix DESYNC_FACTOR")
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-04 07:07:30 +01:00
Eric Dumazet
915d975b2f net: deal with integer overflows in kmalloc_reserve()
The blamed commit changed:
    ptr = kmalloc(size);
    if (ptr)
      size = ksize(ptr);

to:
    size = kmalloc_size_roundup(size);
    ptr = kmalloc(size);

This allowed various crashes, as reported by syzbot [1]
and Kyle Zeng.

The problem is that if @size is bigger than 0x80000001,
kmalloc_size_roundup(size) returns 2^32.

kmalloc_reserve() uses a 32bit variable (obj_size),
so 2^32 is truncated to 0.

kmalloc(0) returns ZERO_SIZE_PTR which is not handled by
skb allocations.
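
The truncation can be reproduced in user space. roundup_pow2() is a hypothetical simplification of the rounding, since the real kmalloc_size_roundup() rounds to slab bucket or page-order sizes. What matters is that, as stated above, the result for such sizes is 2^32, which a 32-bit variable cannot hold.

```c
#include <stdint.h>

/* Hypothetical stand-in for size round-up: next power of two, in 64 bits. */
uint64_t roundup_pow2(uint64_t size)
{
	uint64_t r = 1;

	if (size > (UINT64_C(1) << 63))
		return 0; /* result would not fit in 64 bits */
	while (r < size)
		r <<= 1;
	return r;
}

/* Storing the 64-bit result in a 32-bit variable, as obj_size did,
 * truncates 2^32 to 0. */
uint32_t obj_size_truncated(uint64_t size)
{
	return (uint32_t)roundup_pow2(size);
}
```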

The following trace can be triggered if netdev->mtu is set
close to 0x7fffffff.

We might in the future limit netdev->mtu to a more sensible
limit (like KMALLOC_MAX_SIZE).

This patch is based on a syzbot report, and also a report
and tentative fix from Kyle Zeng.

[1]
BUG: KASAN: user-memory-access in __build_skb_around net/core/skbuff.c:294 [inline]
BUG: KASAN: user-memory-access in __alloc_skb+0x3c4/0x6e8 net/core/skbuff.c:527
Write of size 32 at addr 00000000fffffd10 by task syz-executor.4/22554

CPU: 1 PID: 22554 Comm: syz-executor.4 Not tainted 6.1.39-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/03/2023
Call trace:
dump_backtrace+0x1c8/0x1f4 arch/arm64/kernel/stacktrace.c:279
show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:286
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x120/0x1a0 lib/dump_stack.c:106
print_report+0xe4/0x4b4 mm/kasan/report.c:398
kasan_report+0x150/0x1ac mm/kasan/report.c:495
kasan_check_range+0x264/0x2a4 mm/kasan/generic.c:189
memset+0x40/0x70 mm/kasan/shadow.c:44
__build_skb_around net/core/skbuff.c:294 [inline]
__alloc_skb+0x3c4/0x6e8 net/core/skbuff.c:527
alloc_skb include/linux/skbuff.h:1316 [inline]
igmpv3_newpack+0x104/0x1088 net/ipv4/igmp.c:359
add_grec+0x81c/0x1124 net/ipv4/igmp.c:534
igmpv3_send_cr net/ipv4/igmp.c:667 [inline]
igmp_ifc_timer_expire+0x1b0/0x1008 net/ipv4/igmp.c:810
call_timer_fn+0x1c0/0x9f0 kernel/time/timer.c:1474
expire_timers kernel/time/timer.c:1519 [inline]
__run_timers+0x54c/0x710 kernel/time/timer.c:1790
run_timer_softirq+0x28/0x4c kernel/time/timer.c:1803
_stext+0x380/0xfbc
____do_softirq+0x14/0x20 arch/arm64/kernel/irq.c:79
call_on_irq_stack+0x24/0x4c arch/arm64/kernel/entry.S:891
do_softirq_own_stack+0x20/0x2c arch/arm64/kernel/irq.c:84
invoke_softirq kernel/softirq.c:437 [inline]
__irq_exit_rcu+0x1c0/0x4cc kernel/softirq.c:683
irq_exit_rcu+0x14/0x78 kernel/softirq.c:695
el0_interrupt+0x7c/0x2e0 arch/arm64/kernel/entry-common.c:717
__el0_irq_handler_common+0x18/0x24 arch/arm64/kernel/entry-common.c:724
el0t_64_irq_handler+0x10/0x1c arch/arm64/kernel/entry-common.c:729
el0t_64_irq+0x1a0/0x1a4 arch/arm64/kernel/entry.S:584

Fixes: 12d6c1d3a2 ("skbuff: Proactively round up to kmalloc bucket size")
Reported-by: syzbot <syzkaller@googlegroups.com>
Reported-by: Kyle Zeng <zengyhkyle@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-04 06:49:31 +01:00
Linus Torvalds
8e1e49550d TTY/Serial driver changes for 6.6-rc1
Here is the big set of tty and serial driver changes for 6.6-rc1.
 
 Lots of cleanups in here this cycle, and some driver updates.  Short
 summary is:
   - Jiri's continued work to make the tty code and apis be a bit more
     sane with regards to modern kernel coding style and types
   - cpm_uart driver updates
   - n_gsm updates and fixes
   - meson driver updates
   - sc16is7xx driver updates
   - 8250 driver updates for different hardware types
   - qcom-geni driver fixes
   - tegra serial driver change
   - stm32 driver updates
   - synclink_gt driver cleanups
   - tty structure size reduction
 
 All of these have been in linux-next this week with no reported issues.
 The last bit of cleanups from Jiri and the tty structure size reduction
 came in last week, a bit late but as they were just style changes and
 size reductions, I figured they should get into this merge cycle so that
 others can work on top of them with no merge conflicts.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZPH+jA8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ykKyACgldt6QeenTN+6dXIHS/eQHtTKZwMAn3arSeXI
 QrUUnLFjOWyoX87tbMBQ
 =LVw0
 -----END PGP SIGNATURE-----

Merge tag 'tty-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial driver updates from Greg KH:
 "Here is the big set of tty and serial driver changes for 6.6-rc1.

  Lots of cleanups in here this cycle, and some driver updates. Short
  summary is:

   - Jiri's continued work to make the tty code and apis be a bit more
     sane with regards to modern kernel coding style and types

   - cpm_uart driver updates

   - n_gsm updates and fixes

   - meson driver updates

   - sc16is7xx driver updates

   - 8250 driver updates for different hardware types

   - qcom-geni driver fixes

   - tegra serial driver change

   - stm32 driver updates

   - synclink_gt driver cleanups

   - tty structure size reduction

  All of these have been in linux-next this week with no reported
  issues. The last bit of cleanups from Jiri and the tty structure size
  reduction came in last week, a bit late but as they were just style
  changes and size reductions, I figured they should get into this merge
  cycle so that others can work on top of them with no merge conflicts"

* tag 'tty-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (199 commits)
  tty: shrink the size of struct tty_struct by 40 bytes
  tty: n_tty: deduplicate copy code in n_tty_receive_buf_real_raw()
  tty: n_tty: extract ECHO_OP processing to a separate function
  tty: n_tty: unify counts to size_t
  tty: n_tty: use u8 for chars and flags
  tty: n_tty: simplify chars_in_buffer()
  tty: n_tty: remove unsigned char casts from character constants
  tty: n_tty: move newline handling to a separate function
  tty: n_tty: move canon handling to a separate function
  tty: n_tty: use MASK() for masking out size bits
  tty: n_tty: make n_tty_data::num_overrun unsigned
  tty: n_tty: use time_is_before_jiffies() in n_tty_receive_overrun()
  tty: n_tty: use 'num' for writes' counts
  tty: n_tty: use output character directly
  tty: n_tty: make flow of n_tty_receive_buf_common() a bool
  Revert "tty: serial: meson: Add a earlycon for the T7 SoC"
  Documentation: devices.txt: Fix minors for ttyCPM*
  Documentation: devices.txt: Remove ttySIOC*
  Documentation: devices.txt: Remove ttyIOC*
  serial: 8250_bcm7271: improve bcm7271 8250 port
  ...
2023-09-01 09:38:00 -07:00
Sriram Yagnaraman
8423be8926 ipv6: ignore dst hint for multipath routes
Using route hints when the nexthop is part of a multipath group causes
packets in the same receive batch to be sent to the same nexthop
irrespective of the multipath hash of the packet. So, do not extract a
route hint for packets whose destination is part of a multipath group.

A new SKB flag IP6SKB_MULTIPATH is introduced for this purpose: the flag is
set when the route is looked up in fib6_select_path(), and
ip6_can_use_hint() checks for its existence.
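
The flag-based check can be sketched as below. TOY_SKB_MULTIPATH, struct toy_pkt and both helpers are hypothetical; the real ip6_can_use_hint() and ip_extract_route_hint() test more conditions than this.

```c
#include <stdbool.h>

/* Hypothetical per-packet flag; the bit value is made up. */
#define TOY_SKB_MULTIPATH 0x1u

struct toy_pkt {
	unsigned int flags;
	unsigned int dst; /* destination the route lookup resolved */
};

/* Called after a full route lookup: remember whether it chose among
 * several nexthops using the packet's multipath hash. */
void toy_mark_route(struct toy_pkt *p, int num_nexthops)
{
	if (num_nexthops > 1)
		p->flags |= TOY_SKB_MULTIPATH;
}

/* A later packet in the same batch may reuse the previous route only if
 * that route was not picked by multipath hashing. */
bool toy_can_use_hint(const struct toy_pkt *hint, const struct toy_pkt *p)
{
	if (hint->flags & TOY_SKB_MULTIPATH)
		return false;
	return hint->dst == p->dst;
}
```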

Fixes: 197dbf24e3 ("ipv6: introduce and uses route look hints for list input.")
Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-01 08:11:51 +01:00
Sriram Yagnaraman
6ac66cb03a ipv4: ignore dst hint for multipath routes
Using route hints when the nexthop is part of a multipath group causes
packets in the same receive batch to be sent to the same nexthop
irrespective of the multipath hash of the packet. So, do not extract a
route hint for packets whose destination is part of a multipath group.

A new SKB flag IPSKB_MULTIPATH is introduced for this purpose: the flag is
set when the route is looked up in ip_mkroute_input(), and
ip_extract_route_hint() checks for its existence.

Fixes: 02b2494161 ("ipv4: use dst hint for ipv4 list receive")
Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-01 08:11:51 +01:00
Mohamed Khalfella
2ea35288c8 skbuff: skb_segment, Call zero copy functions before using skbuff frags
Commit bf5c25d608 ("skbuff: in skb_segment, call zerocopy functions
once per nskb") added the call to zero copy functions in skb_segment().
The change introduced a bug in skb_segment() because skb_orphan_frags()
may change the number of fragments or allocate new fragments
altogether, leaving nrfrags and frag pointing to the old values. This can
cause a panic with a stack trace like the one below.

[  193.894380] BUG: kernel NULL pointer dereference, address: 00000000000000bc
[  193.895273] CPU: 13 PID: 18164 Comm: vh-net-17428 Kdump: loaded Tainted: G           O      5.15.123+ #26
[  193.903919] RIP: 0010:skb_segment+0xb0e/0x12f0
[  194.021892] Call Trace:
[  194.027422]  <TASK>
[  194.072861]  tcp_gso_segment+0x107/0x540
[  194.082031]  inet_gso_segment+0x15c/0x3d0
[  194.090783]  skb_mac_gso_segment+0x9f/0x110
[  194.095016]  __skb_gso_segment+0xc1/0x190
[  194.103131]  netem_enqueue+0x290/0xb10 [sch_netem]
[  194.107071]  dev_qdisc_enqueue+0x16/0x70
[  194.110884]  __dev_queue_xmit+0x63b/0xb30
[  194.121670]  bond_start_xmit+0x159/0x380 [bonding]
[  194.128506]  dev_hard_start_xmit+0xc3/0x1e0
[  194.131787]  __dev_queue_xmit+0x8a0/0xb30
[  194.138225]  macvlan_start_xmit+0x4f/0x100 [macvlan]
[  194.141477]  dev_hard_start_xmit+0xc3/0x1e0
[  194.144622]  sch_direct_xmit+0xe3/0x280
[  194.147748]  __dev_queue_xmit+0x54a/0xb30
[  194.154131]  tap_get_user+0x2a8/0x9c0 [tap]
[  194.157358]  tap_sendmsg+0x52/0x8e0 [tap]
[  194.167049]  handle_tx_zerocopy+0x14e/0x4c0 [vhost_net]
[  194.173631]  handle_tx+0xcd/0xe0 [vhost_net]
[  194.176959]  vhost_worker+0x76/0xb0 [vhost]
[  194.183667]  kthread+0x118/0x140
[  194.190358]  ret_from_fork+0x1f/0x30
[  194.193670]  </TASK>

In this case, calling skb_orphan_frags() updated nr_frags, leaving the
nrfrags local variable in skb_segment() stale. This resulted in the code
hitting i >= nrfrags prematurely and trying to move to the next frag_skb
using the list_skb pointer, which was NULL, causing a kernel panic. Move
the call to the zero copy functions before using frags and nr_frags.
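
The ordering rule behind the fix, sketched with hypothetical types: orphan_frags() stands in for skb_orphan_frags() and may shrink the fragment count, so the count is cached only after it runs. Caching f->nr before the call would leave the local at 3 while the array now holds a single fragment, the same kind of staleness that hit nrfrags.

```c
#include <stddef.h>

/* Toy container whose fragments a helper may replace. */
struct frags {
	int vals[8];
	size_t nr;
};

/* Hypothetical stand-in for skb_orphan_frags(): coalesces all fragments
 * into one, changing both the array contents and the count. */
static void orphan_frags(struct frags *f)
{
	int sum = 0;

	for (size_t i = 0; i < f->nr; i++)
		sum += f->vals[i];
	f->vals[0] = sum;
	f->nr = 1;
}

/* Fixed ordering: run the mutating helper first, then cache the count. */
int sum_frags_fixed(struct frags *f)
{
	size_t nr;
	int total = 0;

	orphan_frags(f);
	nr = f->nr; /* read after the helper ran, so it cannot be stale */
	for (size_t i = 0; i < nr; i++)
		total += f->vals[i];
	return total;
}
```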

Fixes: bf5c25d608 ("skbuff: in skb_segment, call zerocopy functions once per nskb")
Signed-off-by: Mohamed Khalfella <mkhalfella@purestorage.com>
Reported-by: Amit Goyal <agoyal@purestorage.com>
Cc: stable@vger.kernel.org
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-01 08:09:07 +01:00
Eric Dumazet
251cd405a9 net: annotate data-races around sk->sk_bind_phc
sk->sk_bind_phc is read locklessly. Add corresponding annotations.

Fixes: d463126e23 ("net: sock: extend SO_TIMESTAMPING for PHC binding")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-01 07:27:33 +01:00
Eric Dumazet
e3390b30a5 net: annotate data-races around sk->sk_tsflags
sk->sk_tsflags can be read locklessly, add corresponding annotations.

Fixes: b9f40e21ef ("net-timestamp: move timestamp flags out of sk_flags")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-01 07:27:33 +01:00
Eric Dumazet
9531e4a83f mptcp: annotate data-races around msk->rmem_fwd_alloc
msk->rmem_fwd_alloc can be read locklessly.

Add mptcp_rmem_fwd_alloc_add(), similar to sk_forward_alloc_add(),
and appropriate READ_ONCE()/WRITE_ONCE() annotations.

Fixes: 6511882cdd ("mptcp: allocate fwd memory separately on the rx and tx path")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-01 07:27:33 +01:00
Eric Dumazet
5e6300e7b3 net: annotate data-races around sk->sk_forward_alloc
Every time sk->sk_forward_alloc is read locklessly,
add a READ_ONCE().

Add sk_forward_alloc_add() helper to centralize updates,
to reduce number of WRITE_ONCE().

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-01 07:27:33 +01:00
Eric Dumazet
66d58f046c net: use sk_forward_alloc_get() in sk_get_meminfo()
inet_sk_diag_fill() has been changed to use sk_forward_alloc_get(),
but sk_get_meminfo() was forgotten.

Fixes: 292e6077b0 ("net: introduce sk_forward_alloc_get()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-01 07:27:33 +01:00
Eric Dumazet
82ba0ff7bf net/handshake: fix null-ptr-deref in handshake_nl_done_doit()
We should not call trace_handshake_cmd_done_err() if socket lookup has failed.

Also we should call trace_handshake_cmd_done_err() before releasing the file,
otherwise dereferencing sock->sk can return garbage.
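The two ordering rules above (no trace when the lookup fails, and trace before the file reference is dropped) can be sketched as follows. Everything here is hypothetical: `struct sock_model`, `trace_done_err()`, `release()` and `done_doit()` only model the roles of the tracepoint, fput() and the netlink handler, and the `valid` flag stands in for memory that must not be touched after release.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical socket whose data is valid only while a reference is held. */
struct sock_model {
	bool valid;	/* false once the last reference is dropped */
	int sk_id;	/* stand-in for data reached via sock->sk */
};

static int traced_id = -1;	/* what the model tracepoint last observed */

/* Stand-in for the error tracepoint: it dereferences the socket. */
static void trace_done_err(const struct sock_model *sock)
{
	assert(sock->valid);	/* must never run on released memory */
	traced_id = sock->sk_id;
}

/* Stand-in for dropping the file reference. */
static void release(struct sock_model *sock)
{
	sock->valid = false;
}

static int done_doit(struct sock_model *sock, int status)
{
	if (sock == NULL)
		return -1;		/* lookup failed: nothing to trace */
	if (status != 0)
		trace_done_err(sock);	/* trace while the reference is held */
	release(sock);			/* only then drop the reference */
	return status;
}
```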

This also reverts 7afc6d0a10 ("net/handshake: Fix uninitialized local variable").

Unable to handle kernel paging request at virtual address dfff800000000003
KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f]
Mem abort info:
ESR = 0x0000000096000005
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x05: level 1 translation fault
Data abort info:
ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
CM = 0, WnR = 0, TnD = 0, TagAccess = 0
GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[dfff800000000003] address between user and kernel address ranges
Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
Modules linked in:
CPU: 1 PID: 5986 Comm: syz-executor292 Not tainted 6.5.0-rc7-syzkaller-gfe4469582053 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : handshake_nl_done_doit+0x198/0x9c8 net/handshake/netlink.c:193
lr : handshake_nl_done_doit+0x180/0x9c8
sp : ffff800096e37180
x29: ffff800096e37200 x28: 1ffff00012dc6e34 x27: dfff800000000000
x26: ffff800096e373d0 x25: 0000000000000000 x24: 00000000ffffffa8
x23: ffff800096e373f0 x22: 1ffff00012dc6e38 x21: 0000000000000000
x20: ffff800096e371c0 x19: 0000000000000018 x18: 0000000000000000
x17: 0000000000000000 x16: ffff800080516cc4 x15: 0000000000000001
x14: 1fffe0001b14aa3b x13: 0000000000000000 x12: 0000000000000000
x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000003
x8 : 0000000000000003 x7 : ffff800080afe47c x6 : 0000000000000000
x5 : 0000000000000000 x4 : 0000000000000000 x3 : ffff800080a88078
x2 : 0000000000000001 x1 : 00000000ffffffa8 x0 : 0000000000000000
Call trace:
handshake_nl_done_doit+0x198/0x9c8 net/handshake/netlink.c:193
genl_family_rcv_msg_doit net/netlink/genetlink.c:970 [inline]
genl_family_rcv_msg net/netlink/genetlink.c:1050 [inline]
genl_rcv_msg+0x96c/0xc50 net/netlink/genetlink.c:1067
netlink_rcv_skb+0x214/0x3c4 net/netlink/af_netlink.c:2549
genl_rcv+0x38/0x50 net/netlink/genetlink.c:1078
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x660/0x8d4 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x834/0xb18 net/netlink/af_netlink.c:1914
sock_sendmsg_nosec net/socket.c:725 [inline]
sock_sendmsg net/socket.c:748 [inline]
____sys_sendmsg+0x56c/0x840 net/socket.c:2494
___sys_sendmsg net/socket.c:2548 [inline]
__sys_sendmsg+0x26c/0x33c net/socket.c:2577
__do_sys_sendmsg net/socket.c:2586 [inline]
__se_sys_sendmsg net/socket.c:2584 [inline]
__arm64_sys_sendmsg+0x80/0x94 net/socket.c:2584
__invoke_syscall arch/arm64/kernel/syscall.c:37 [inline]
invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:51
el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:136
do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:155
el0_svc+0x58/0x16c arch/arm64/kernel/entry-common.c:678
el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:696
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:591
Code: 12800108 b90043e8 910062b3 d343fe68 (387b6908)

Fixes: 3b3009ea8a ("net/handshake: Create a NETLINK service for handling handshake requests")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-01 07:25:14 +01:00
Jakub Kicinski
ddaa935d33 bpf-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZPD2qwAKCRDbK58LschI
 gzy9APoCsV3B0rJCX2PnxoKmx7ZwAbEhWRHN3iDAGgEOwuAdLQEAi1Mafivr/4Rr
 WLi6AQOy+Erv7dAQRq2KbR2yE8rkEgg=
 =BJ9X
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2023-08-31

We've added 15 non-merge commits during the last 3 day(s) which contain
a total of 17 files changed, 468 insertions(+), 97 deletions(-).

The main changes are:

1) BPF selftest fixes: one flake and one related to clang18 testing,
   from Yonghong Song.

2) Fix a d_path BPF selftest failure after fast-forward from Linus'
   tree, from Jiri Olsa.

3) Fix a preempt_rt splat in sockmap when using raw_spin_lock_t,
   from John Fastabend.

4) Fix a xsk_diag_fill use-after-free race during socket cleanup,
   from Magnus Karlsson.

5) Fix xsk_build_skb to address a buggy dereference of an ERR_PTR(),
   from Tirthendu Sarkar.

6) Fix a bpftool build warning when compiled with -Wtype-limits,
   from Yafang Shao.

7) Several misc fixes and cleanups in standardization docs,
   from David Vernet.

8) Fix BPF selftest install to consider no_alu32/cpuv4/bpf-gcc flavors,
   from Björn Töpel.

9) Annotate a data race in bpf_long_memcpy for KCSAN, from Daniel Borkmann.

10) Extend documentation with a description for CO-RE relocations,
    from Eduard Zingerman.

11) Fix several invalid escape sequence warnings in bpf_doc.py script,
    from Vishal Chourasia.

12) Fix the instruction set doc wrt offset of BPF-to-BPF call,
    from Will Hawkins.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests/bpf: Include build flavors for install target
  bpf: Annotate bpf_long_memcpy with data_race
  selftests/bpf: Fix d_path test
  bpf, docs: Fix invalid escape sequence warnings in bpf_doc.py
  xsk: Fix xsk_diag use-after-free error during socket cleanup
  bpf, docs: s/eBPF/BPF in standards documents
  bpf, docs: Add abi.rst document to standardization subdirectory
  bpf, docs: Move linux-notes.rst to root bpf docs tree
  bpf, sockmap: Fix preempt_rt splat when using raw_spin_lock_t
  docs/bpf: Add description for CO-RE relocations
  bpf, docs: Correct source of offset for program-local call
  selftests/bpf: Fix flaky cgroup_iter_sleepable subtest
  xsk: Fix xsk_build_skb() error: 'skb' dereferencing possible ERR_PTR()
  bpftool: Fix build warnings with -Wtype-limits
  bpf: Prevent inlining of bpf_fentry_test7()
====================

Link: https://lore.kernel.org/r/20230831210019.14417-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-31 18:44:24 -07:00
Linus Torvalds
99d99825fc NFS Client Updates for Linux 6.6
New Features:
   * Enable the NFS v4.2 READ_PLUS operation by default
 
 Stable Fixes:
   * NFSv4/pnfs: minor fix for cleanup path in nfs4_get_device_info
   * NFS: Fix a potential data corruption
 
 Bugfixes:
   * Fix various READ_PLUS issues including:
     * smatch warnings
     * xdr size calculations
     * scratch buffer handling
     * 32bit / highmem xdr page handling
   * Fix checkpatch errors in file.c
   * Fix redundant readdir request after an EOF
   * Fix handling of COPY ERR_OFFLOAD_NO_REQ
   * Fix assignment of xprtdata.cred
 
 Cleanups:
   * Remove unused xprtrdma function declarations
   * Clean up an integer overflow check to avoid a warning
   * Clean up #includes in dns_resolve.c
   * Clean up nfs4_get_device_info so we don't pass a NULL pointer to __free_page()
   * Clean up sunrpc TCP socket timeout configuration
   * Guard against READDIR loops when entry names are too long
   * Use EXCHID4_FLAG_USE_PNFS_DS for DS servers
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAmTwzwYACgkQ18tUv7Cl
 QOtIhBAA+BOh7MB6yjlyctFxABJiXz2x2Dehxy7Ox15LfnyStqQAUEpk35CXWvjC
 iNxpZJ486+WrzM76WGEaRbECK9nTQLK1yacR3V1zpnDwHWIJA6VHN6qU4JAfSMu7
 XbhWkHWry6d7PXhvqHlaiYvPX2pF39wUzfH+vLlzS2QLIkpT6LnG0zVRJTQvLCmq
 zE5xD+NCQ1Dpo9VnouuzW7VVfm532hI7GQNrpo0E0vWKgeQD+/fOpDu23MW8A1Ua
 ZgVMAc7vScgDZH8/20Ze5PH4jAEB4gwEIzjreQlIXr7Tf+mE7qn435lgOuvdMQCW
 eHhdNriZ2X6HMLhNFFpup8bkRKGCCTooTHC1W66n9CuxIAuVT5DNwBbakpagHSZf
 J4ho81hEgBfc5zppISVjV6eFK4brM0rF9AliaIw8r/qGcMmO1CILi9tLGiheiJcT
 LuId7U2sE/vfIa6SiBt7rx37/MkrgLlAgjpk4dCRJQW+gKVBi09sMGnDlgaRvCZz
 T0WCsK4DgI9q2rScpwJYJbNWbC2Q8qUtYWW9LSRvwhbNdm/VbRnEHWA7eOwqqm8r
 KkkF4chyoTJqpnF3SjxT/lyFk6GwsD+wXafOmEeuFA6Si3dHDU9i3aUf+cCXhwRI
 uUzCUHYcCKnv4QVGPuAbIdxMgueNCuLoeWgTClVlqidv7GRyz7Y=
 =rjmq
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-6.6-1' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client updates from Anna Schumaker:
 "New Features:
   - Enable the NFS v4.2 READ_PLUS operation by default

  Stable Fixes:
   - NFSv4/pnfs: minor fix for cleanup path in nfs4_get_device_info
   - NFS: Fix a potential data corruption

  Bugfixes:
   - Fix various READ_PLUS issues including:
      - smatch warnings
      - xdr size calculations
      - scratch buffer handling
      - 32bit / highmem xdr page handling
   - Fix checkpatch errors in file.c
   - Fix redundant readdir request after an EOF
   - Fix handling of COPY ERR_OFFLOAD_NO_REQ
   - Fix assignment of xprtdata.cred

  Cleanups:
   - Remove unused xprtrdma function declarations
   - Clean up an integer overflow check to avoid a warning
   - Clean up #includes in dns_resolve.c
   - Clean up nfs4_get_device_info so we don't pass a NULL pointer
     to __free_page()
   - Clean up sunrpc TCP socket timeout configuration
   - Guard against READDIR loops when entry names are too long
   - Use EXCHID4_FLAG_USE_PNFS_DS for DS servers"

* tag 'nfs-for-6.6-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (22 commits)
  pNFS: Fix assignment of xprtdata.cred
  NFSv4.2: fix handling of COPY ERR_OFFLOAD_NO_REQ
  NFS: Guard against READDIR loop when entry names exceed MAXNAMELEN
  NFSv4.1: use EXCHGID4_FLAG_USE_PNFS_DS for DS server
  NFS/pNFS: Set the connect timeout for the pNFS flexfiles driver
  SUNRPC: Don't override connect timeouts in rpc_clnt_add_xprt()
  SUNRPC: Allow specification of TCP client connect timeout at setup
  SUNRPC: Refactor and simplify connect timeout
  SUNRPC: Set the TCP_SYNCNT to match the socket timeout
  NFS: Fix a potential data corruption
  nfs: fix redundant readdir request after get eof
  nfs/blocklayout: Use the passed in gfp flags
  filemap: Fix errors in file.c
  NFSv4/pnfs: minor fix for cleanup path in nfs4_get_device_info
  NFS: Move common includes outside ifdef
  SUNRPC: clean up integer overflow check
  xprtrdma: Remove unused function declaration rpcrdma_bc_post_recv()
  NFS: Enable the READ_PLUS operation by default
  SUNRPC: kmap() the xdr pages during decode
  NFSv4.2: Rework scratch handling for READ_PLUS (again)
  ...
2023-08-31 15:36:41 -07:00
Linus Torvalds
f35d170615 NFSD 6.6 Release Notes
I'm thrilled to announce that the Linux in-kernel NFS server now
 offers NFSv4 write delegations. A write delegation enables a client
 to cache data and metadata for a single file more aggressively,
 reducing network round trips and server workload. Many thanks to Dai
 Ngo for contributing this facility, and to Jeff Layton and Neil
 Brown for reviewing and testing it.
 
 This release also sees the removal of all support for DES- and
 triple-DES-based Kerberos encryption types in the kernel's SunRPC
 implementation. These encryption types have been deprecated by the
 Internet community for years and are considered insecure. This
 change affects both the in-kernel NFS client and server.
 
 The server's UDP and TCP socket transports have now fully adopted
 David Howells' new bio_vec iterator so that no more than one
 sendmsg() call is needed to transmit each RPC message. In
 particular, this helps kTLS optimize record boundaries when sending
 RPC-with-TLS replies, and it takes the server a baby step closer to
 handling file I/O via folios.
 
 We've begun work on overhauling the SunRPC thread scheduler to
 remove a costly linked-list walk when looking for an idle RPC
 service thread to wake. The pre-requisites are included in this
 release. Thanks to Neil Brown for his ongoing work on this
 improvement.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmTwoa0ACgkQM2qzM29m
 f5cZvw/8CmFVNC27aMrJEhRRhwwrXbLzUkWh9GCYkG98PHYiLxLTvZ6qELXAax/a
 UjSgIDSRcWl4z8M/tyBQtgsw7NADr+7XWqEoXR7HZ5pEEC/KNGM0oQWQ92ojjKYy
 JmHdB02uaDJfcd9ioFTU13cw7q2BQfoe2xLI8yqis2vcVSu92AM7aIw+cvJIpwQB
 inA3TIIsYTV/gPByXSfEtvmYACadoFiMvfvYwaWhjFS9MdSzFmcVG0Dp3EFIig29
 odmWEofcz6uIvUWvUswWEGdoSu7uOKIztSuAI4PlTwaofUaSKG6e5kmtpr3cLERD
 Uhg2lm5JgqkXBd7QHObNimJ4DtQzFwHmhA08qo8rd/zba75mn/Hr5IF0q3Rxs99J
 SRYHcAeP8afKn5Ge0yzoTgCNcqhfz8KLRfoCQX49mljr+muNxld4nMklD2KdUwJi
 XEB512/q3E4nUgopXZiSJYQYAq1CfdR5WpGipZ9X0XK9HZBDF/qhXGtk1YQuNWyj
 ZxJS3bfBza4oVIvP5/ehjCIQwOvqkcrC5zZGDIgDvw9Q6L3L1wqmVntsdCLCLRcJ
 jB4IOsj+DECfJ6w2vP2SZ3GeMtFnyuTQjhUTkjPuAKTBBiKo4Tj0o/agpfDYbWZy
 1l3a2yH5jqJgkm4MaVh3YHRJGc0ub0ccpIrs3QQ4jvjMLQ/3Gcs=
 =XGHs
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd updates from Chuck Lever:
 "I'm thrilled to announce that the Linux in-kernel NFS server now
  offers NFSv4 write delegations. A write delegation enables a client to
  cache data and metadata for a single file more aggressively, reducing
  network round trips and server workload. Many thanks to Dai Ngo for
  contributing this facility, and to Jeff Layton and Neil Brown for
  reviewing and testing it.

  This release also sees the removal of all support for DES- and
  triple-DES-based Kerberos encryption types in the kernel's SunRPC
  implementation. These encryption types have been deprecated by the
  Internet community for years and are considered insecure. This change
  affects both the in-kernel NFS client and server.

  The server's UDP and TCP socket transports have now fully adopted
  David Howells' new bio_vec iterator so that no more than one sendmsg()
  call is needed to transmit each RPC message. In particular, this helps
  kTLS optimize record boundaries when sending RPC-with-TLS replies, and
  it takes the server a baby step closer to handling file I/O via
  folios.

  We've begun work on overhauling the SunRPC thread scheduler to remove
  a costly linked-list walk when looking for an idle RPC service thread
  to wake. The pre-requisites are included in this release. Thanks to
  Neil Brown for his ongoing work on this improvement"

* tag 'nfsd-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (56 commits)
  Documentation: Add missing documentation for EXPORT_OP flags
  SUNRPC: Remove unused declaration rpc_modcount()
  SUNRPC: Remove unused declarations
  NFSD: da_addr_body field missing in some GETDEVICEINFO replies
  SUNRPC: Remove return value of svc_pool_wake_idle_thread()
  SUNRPC: make rqst_should_sleep() idempotent()
  SUNRPC: Clean up svc_set_num_threads
  SUNRPC: Count ingress RPC messages per svc_pool
  SUNRPC: Deduplicate thread wake-up code
  SUNRPC: Move trace_svc_xprt_enqueue
  SUNRPC: Add enum svc_auth_status
  SUNRPC: change svc_xprt::xpt_flags bits to enum
  SUNRPC: change svc_rqst::rq_flags bits to enum
  SUNRPC: change svc_pool::sp_flags bits to enum
  SUNRPC: change cache_head.flags bits to enum
  SUNRPC: remove timeout arg from svc_recv()
  SUNRPC: change svc_recv() to return void.
  SUNRPC: call svc_process() from svc_recv().
  nfsd: separate nfsd_last_thread() from nfsd_put()
  nfsd: Simplify code around svc_exit_thread() call in nfsd()
  ...
2023-08-31 15:32:18 -07:00
Magnus Karlsson
3e019d8a05 xsk: Fix xsk_diag use-after-free error during socket cleanup
Fix a use-after-free error that is possible if the xsk_diag interface
is used after the socket has been unbound from the device. This can
happen either due to the socket being closed or the device
disappearing. In the early days of AF_XDP, the way we tested that a
socket was not bound to a device was to simply check if the netdevice
pointer in the xsk socket structure was NULL. Later, a better system
was introduced by having an explicit state variable in the xsk socket
struct. For example, the state of a socket that is on the way to being
closed and has been unbound from the device is XSK_UNBOUND.

The commit in the Fixes tag below deleted the old way of signalling
that a socket is unbound, setting dev to NULL. This was done in the
belief that all code using the old way had been exterminated. That was
unfortunately not true, as the xsk diagnostics code was still using the
old way and thus does not work as intended when a socket is going
down. Fix this by introducing a test against the state variable. If
the socket is in the state XSK_UNBOUND, simply abort the diagnostics
netlink operation.
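A hypothetical model of the fix: the diag path tests the socket state instead of relying on a NULL device pointer. The enumerator names follow the states mentioned above, but `struct xsk_model`, `struct dev_model` and `diag_fill()` are invented for illustration and are not the xsk_diag code.

```c
#include <assert.h>
#include <stddef.h>

/* State names mirror the text; the enum itself is a hypothetical model. */
enum xsk_state_model { XSK_READY, XSK_BOUND, XSK_UNBOUND };

struct dev_model {
	int ifindex;
};

struct xsk_model {
	enum xsk_state_model state;
	/* Not cleared on unbind; in the real bug it could point to freed memory. */
	const struct dev_model *dev;
};

/* Returns 0 on success, -1 when the diag request is aborted. */
static int diag_fill(const struct xsk_model *xs, int *out_ifindex)
{
	assert(out_ifindex != NULL);

	/* Socket is being torn down: abort instead of touching xs->dev. */
	if (xs->state == XSK_UNBOUND)
		return -1;

	/* A never-bound socket has no device; report ifindex 0 for it. */
	*out_ifindex = xs->dev ? xs->dev->ifindex : 0;
	return 0;
}
```

The key design point is that the unbound socket still carries a non-NULL `dev`, so only the explicit state check keeps the diag path away from it.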

Fixes: 18b1ab7aa7 ("xsk: Fix race at socket teardown")
Reported-by: syzbot+822d1359297e2694f873@syzkaller.appspotmail.com
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: syzbot+822d1359297e2694f873@syzkaller.appspotmail.com
Tested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20230831100119.17408-1-magnus.karlsson@gmail.com
2023-08-31 13:21:11 +02:00
Eric Dumazet
a3e0fdf71b net: read sk->sk_family once in sk_mc_loop()
syzbot is playing with IPV6_ADDRFORM quite a lot these days,
and managed to hit the WARN_ON_ONCE(1) in sk_mc_loop().

We have many more similar issues to fix.

WARNING: CPU: 1 PID: 1593 at net/core/sock.c:782 sk_mc_loop+0x165/0x260
Modules linked in:
CPU: 1 PID: 1593 Comm: kworker/1:3 Not tainted 6.1.40-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023
Workqueue: events_power_efficient gc_worker
RIP: 0010:sk_mc_loop+0x165/0x260 net/core/sock.c:782
Code: 34 1b fd 49 81 c7 18 05 00 00 4c 89 f8 48 c1 e8 03 42 80 3c 20 00 74 08 4c 89 ff e8 25 36 6d fd 4d 8b 37 eb 13 e8 db 33 1b fd <0f> 0b b3 01 eb 34 e8 d0 33 1b fd 45 31 f6 49 83 c6 38 4c 89 f0 48
RSP: 0018:ffffc90000388530 EFLAGS: 00010246
RAX: ffffffff846d9b55 RBX: 0000000000000011 RCX: ffff88814f884980
RDX: 0000000000000102 RSI: ffffffff87ae5160 RDI: 0000000000000011
RBP: ffffc90000388550 R08: 0000000000000003 R09: ffffffff846d9a65
R10: 0000000000000002 R11: ffff88814f884980 R12: dffffc0000000000
R13: ffff88810dbee000 R14: 0000000000000010 R15: ffff888150084000
FS: 0000000000000000(0000) GS:ffff8881f6b00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000180 CR3: 000000014ee5b000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<IRQ>
[<ffffffff8507734f>] ip6_finish_output2+0x33f/0x1ae0 net/ipv6/ip6_output.c:83
[<ffffffff85062766>] __ip6_finish_output net/ipv6/ip6_output.c:200 [inline]
[<ffffffff85062766>] ip6_finish_output+0x6c6/0xb10 net/ipv6/ip6_output.c:211
[<ffffffff85061f8c>] NF_HOOK_COND include/linux/netfilter.h:298 [inline]
[<ffffffff85061f8c>] ip6_output+0x2bc/0x3d0 net/ipv6/ip6_output.c:232
[<ffffffff852071cf>] dst_output include/net/dst.h:444 [inline]
[<ffffffff852071cf>] ip6_local_out+0x10f/0x140 net/ipv6/output_core.c:161
[<ffffffff83618fb4>] ipvlan_process_v6_outbound drivers/net/ipvlan/ipvlan_core.c:483 [inline]
[<ffffffff83618fb4>] ipvlan_process_outbound drivers/net/ipvlan/ipvlan_core.c:529 [inline]
[<ffffffff83618fb4>] ipvlan_xmit_mode_l3 drivers/net/ipvlan/ipvlan_core.c:602 [inline]
[<ffffffff83618fb4>] ipvlan_queue_xmit+0x1174/0x1be0 drivers/net/ipvlan/ipvlan_core.c:677
[<ffffffff8361ddd9>] ipvlan_start_xmit+0x49/0x100 drivers/net/ipvlan/ipvlan_main.c:229
[<ffffffff84763fc0>] netdev_start_xmit include/linux/netdevice.h:4925 [inline]
[<ffffffff84763fc0>] xmit_one net/core/dev.c:3644 [inline]
[<ffffffff84763fc0>] dev_hard_start_xmit+0x320/0x980 net/core/dev.c:3660
[<ffffffff8494c650>] sch_direct_xmit+0x2a0/0x9c0 net/sched/sch_generic.c:342
[<ffffffff8494d883>] qdisc_restart net/sched/sch_generic.c:407 [inline]
[<ffffffff8494d883>] __qdisc_run+0xb13/0x1e70 net/sched/sch_generic.c:415
[<ffffffff8478c426>] qdisc_run+0xd6/0x260 include/net/pkt_sched.h:125
[<ffffffff84796eac>] net_tx_action+0x7ac/0x940 net/core/dev.c:5247
[<ffffffff858002bd>] __do_softirq+0x2bd/0x9bd kernel/softirq.c:599
[<ffffffff814c3fe8>] invoke_softirq kernel/softirq.c:430 [inline]
[<ffffffff814c3fe8>] __irq_exit_rcu+0xc8/0x170 kernel/softirq.c:683
[<ffffffff814c3f09>] irq_exit_rcu+0x9/0x20 kernel/softirq.c:695

Fixes: 7ad6848c7e ("ip: fix mc_loop checks for tunnels with multicast outer addresses")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230830101244.1146934-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-08-31 11:58:51 +02:00
Eric Dumazet
fce92af1c2 ipv4: annotate data-races around fi->fib_dead
syzbot complained about a data-race in fib_table_lookup() [1]

Add appropriate annotations to document it.

[1]
BUG: KCSAN: data-race in fib_release_info / fib_table_lookup

write to 0xffff888150f31744 of 1 bytes by task 1189 on cpu 0:
fib_release_info+0x3a0/0x460 net/ipv4/fib_semantics.c:281
fib_table_delete+0x8d2/0x900 net/ipv4/fib_trie.c:1777
fib_magic+0x1c1/0x1f0 net/ipv4/fib_frontend.c:1106
fib_del_ifaddr+0x8cf/0xa60 net/ipv4/fib_frontend.c:1317
fib_inetaddr_event+0x77/0x200 net/ipv4/fib_frontend.c:1448
notifier_call_chain kernel/notifier.c:93 [inline]
blocking_notifier_call_chain+0x90/0x200 kernel/notifier.c:388
__inet_del_ifa+0x4df/0x800 net/ipv4/devinet.c:432
inet_del_ifa net/ipv4/devinet.c:469 [inline]
inetdev_destroy net/ipv4/devinet.c:322 [inline]
inetdev_event+0x553/0xaf0 net/ipv4/devinet.c:1606
notifier_call_chain kernel/notifier.c:93 [inline]
raw_notifier_call_chain+0x6b/0x1c0 kernel/notifier.c:461
call_netdevice_notifiers_info net/core/dev.c:1962 [inline]
call_netdevice_notifiers_mtu+0xd2/0x130 net/core/dev.c:2037
dev_set_mtu_ext+0x30b/0x3e0 net/core/dev.c:8673
do_setlink+0x5be/0x2430 net/core/rtnetlink.c:2837
rtnl_setlink+0x255/0x300 net/core/rtnetlink.c:3177
rtnetlink_rcv_msg+0x807/0x8c0 net/core/rtnetlink.c:6445
netlink_rcv_skb+0x126/0x220 net/netlink/af_netlink.c:2549
rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:6463
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x56f/0x640 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x665/0x770 net/netlink/af_netlink.c:1914
sock_sendmsg_nosec net/socket.c:725 [inline]
sock_sendmsg net/socket.c:748 [inline]
sock_write_iter+0x1aa/0x230 net/socket.c:1129
do_iter_write+0x4b4/0x7b0 fs/read_write.c:860
vfs_writev+0x1a8/0x320 fs/read_write.c:933
do_writev+0xf8/0x220 fs/read_write.c:976
__do_sys_writev fs/read_write.c:1049 [inline]
__se_sys_writev fs/read_write.c:1046 [inline]
__x64_sys_writev+0x45/0x50 fs/read_write.c:1046
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

read to 0xffff888150f31744 of 1 bytes by task 21839 on cpu 1:
fib_table_lookup+0x2bf/0xd50 net/ipv4/fib_trie.c:1585
fib_lookup include/net/ip_fib.h:383 [inline]
ip_route_output_key_hash_rcu+0x38c/0x12c0 net/ipv4/route.c:2751
ip_route_output_key_hash net/ipv4/route.c:2641 [inline]
__ip_route_output_key include/net/route.h:134 [inline]
ip_route_output_flow+0xa6/0x150 net/ipv4/route.c:2869
send4+0x1e7/0x500 drivers/net/wireguard/socket.c:61
wg_socket_send_skb_to_peer+0x94/0x130 drivers/net/wireguard/socket.c:175
wg_socket_send_buffer_to_peer+0xd6/0x100 drivers/net/wireguard/socket.c:200
wg_packet_send_handshake_initiation drivers/net/wireguard/send.c:40 [inline]
wg_packet_handshake_send_worker+0x10c/0x150 drivers/net/wireguard/send.c:51
process_one_work+0x434/0x860 kernel/workqueue.c:2600
worker_thread+0x5f2/0xa10 kernel/workqueue.c:2751
kthread+0x1d7/0x210 kernel/kthread.c:389
ret_from_fork+0x2e/0x40 arch/x86/kernel/process.c:145
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304

value changed: 0x00 -> 0x01

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 21839 Comm: kworker/u4:18 Tainted: G W 6.5.0-syzkaller #0

Fixes: dccd9ecc37 ("ipv4: Do not use dead fib_info entries.")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230830095520.1046984-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-08-31 11:58:18 +02:00
Eric Dumazet
dc9511dd6f sctp: annotate data-races around sk->sk_wmem_queued
sk->sk_wmem_queued can be read locklessly from sctp_poll().

Use sk_wmem_queued_add() when the field is changed,
and add READ_ONCE() annotations in sctp_writeable()
and sctp_assocs_seq_show().

syzbot reported:

BUG: KCSAN: data-race in sctp_poll / sctp_wfree

read-write to 0xffff888149d77810 of 4 bytes by interrupt on cpu 0:
sctp_wfree+0x170/0x4a0 net/sctp/socket.c:9147
skb_release_head_state+0xb7/0x1a0 net/core/skbuff.c:988
skb_release_all net/core/skbuff.c:1000 [inline]
__kfree_skb+0x16/0x140 net/core/skbuff.c:1016
consume_skb+0x57/0x180 net/core/skbuff.c:1232
sctp_chunk_destroy net/sctp/sm_make_chunk.c:1503 [inline]
sctp_chunk_put+0xcd/0x130 net/sctp/sm_make_chunk.c:1530
sctp_datamsg_put+0x29a/0x300 net/sctp/chunk.c:128
sctp_chunk_free+0x34/0x50 net/sctp/sm_make_chunk.c:1515
sctp_outq_sack+0xafa/0xd70 net/sctp/outqueue.c:1381
sctp_cmd_process_sack net/sctp/sm_sideeffect.c:834 [inline]
sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1366 [inline]
sctp_side_effects net/sctp/sm_sideeffect.c:1198 [inline]
sctp_do_sm+0x12c7/0x31b0 net/sctp/sm_sideeffect.c:1169
sctp_assoc_bh_rcv+0x2b2/0x430 net/sctp/associola.c:1051
sctp_inq_push+0x108/0x120 net/sctp/inqueue.c:80
sctp_rcv+0x116e/0x1340 net/sctp/input.c:243
sctp6_rcv+0x25/0x40 net/sctp/ipv6.c:1120
ip6_protocol_deliver_rcu+0x92f/0xf30 net/ipv6/ip6_input.c:437
ip6_input_finish net/ipv6/ip6_input.c:482 [inline]
NF_HOOK include/linux/netfilter.h:303 [inline]
ip6_input+0xbd/0x1b0 net/ipv6/ip6_input.c:491
dst_input include/net/dst.h:468 [inline]
ip6_rcv_finish+0x1e2/0x2e0 net/ipv6/ip6_input.c:79
NF_HOOK include/linux/netfilter.h:303 [inline]
ipv6_rcv+0x74/0x150 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core net/core/dev.c:5452 [inline]
__netif_receive_skb+0x90/0x1b0 net/core/dev.c:5566
process_backlog+0x21f/0x380 net/core/dev.c:5894
__napi_poll+0x60/0x3b0 net/core/dev.c:6460
napi_poll net/core/dev.c:6527 [inline]
net_rx_action+0x32b/0x750 net/core/dev.c:6660
__do_softirq+0xc1/0x265 kernel/softirq.c:553
run_ksoftirqd+0x17/0x20 kernel/softirq.c:921
smpboot_thread_fn+0x30a/0x4a0 kernel/smpboot.c:164
kthread+0x1d7/0x210 kernel/kthread.c:389
ret_from_fork+0x2e/0x40 arch/x86/kernel/process.c:145
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304

read to 0xffff888149d77810 of 4 bytes by task 17828 on cpu 1:
sctp_writeable net/sctp/socket.c:9304 [inline]
sctp_poll+0x265/0x410 net/sctp/socket.c:8671
sock_poll+0x253/0x270 net/socket.c:1374
vfs_poll include/linux/poll.h:88 [inline]
do_pollfd fs/select.c:873 [inline]
do_poll fs/select.c:921 [inline]
do_sys_poll+0x636/0xc00 fs/select.c:1015
__do_sys_ppoll fs/select.c:1121 [inline]
__se_sys_ppoll+0x1af/0x1f0 fs/select.c:1101
__x64_sys_ppoll+0x67/0x80 fs/select.c:1101
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

value changed: 0x00019e80 -> 0x0000cc80

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 17828 Comm: syz-executor.1 Not tainted 6.5.0-rc7-syzkaller-00185-g28f20a19294d #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Xin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/r/20230830094519.950007-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-08-31 11:56:59 +02:00
Eric Dumazet
8c21ab1bae net/sched: fq_pie: avoid stalls in fq_pie_timer()
When setting a high number of flows (limit being 65536),
fq_pie_timer() currently uses too much time, as syzbot reported.

Add logic to yield the CPU every 2048 flows (less than 150 usec
on debug kernels).
It should also help by not blocking qdisc fast paths for too long.
Worst case (65536 flows) would need 31 jiffies for a complete scan.
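The arithmetic behind the 31-jiffy figure can be checked with a small model. This sketch assumes that each unfinished timer run re-arms the timer one jiffy later and that a full scan ends when the cursor wraps around; `BATCH`, `struct scan_model`, `timer_run()` and `jiffies_for_full_scan()` are hypothetical, and the per-flow PIE probability work is elided. With 65536 flows and 2048 flows per run, a scan takes 32 runs, hence 31 one-jiffy gaps.

```c
#include <assert.h>

#define BATCH 2048	/* flows handled per timer run, per the text */

/* Hypothetical model of the timer state; not the kernel's struct. */
struct scan_model {
	int flows_cnt;	/* number of configured flows */
	int cursor;	/* next flow to visit */
};

/* One timer run. Returns 1 once the scan has wrapped, 0 while flows remain. */
static int timer_run(struct scan_model *q)
{
	int todo = q->flows_cnt - q->cursor;

	if (todo > BATCH)
		todo = BATCH;
	q->cursor += todo;	/* per-flow work elided */
	assert(q->cursor <= q->flows_cnt);

	if (q->cursor >= q->flows_cnt) {
		q->cursor = 0;	/* wrap: next scan starts after the normal period */
		return 1;
	}
	return 0;
}

/* Jiffies between the first and last run of one complete scan. */
static int jiffies_for_full_scan(int flows_cnt)
{
	struct scan_model q = { flows_cnt, 0 };
	int elapsed = 0;

	/* Each unfinished run re-arms the timer one jiffy later. */
	while (!timer_run(&q))
		elapsed++;
	return elapsed;
}
```

Capping each run at a fixed batch bounds the time spent in a single timer callback, at the cost of spreading one full scan across several jiffies.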

Relevant extract from syzbot report:

rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 0-.... } 2663 jiffies s: 873 root: 0x1/.
rcu: blocking rcu_node structures (internal RCU debug):
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 PID: 5177 Comm: syz-executor273 Not tainted 6.5.0-syzkaller-00453-g727dbda16b83 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023
RIP: 0010:check_kcov_mode kernel/kcov.c:173 [inline]
RIP: 0010:write_comp_data+0x21/0x90 kernel/kcov.c:236
Code: 2e 0f 1f 84 00 00 00 00 00 65 8b 05 01 b2 7d 7e 49 89 f1 89 c6 49 89 d2 81 e6 00 01 00 00 49 89 f8 65 48 8b 14 25 80 b9 03 00 <a9> 00 01 ff 00 74 0e 85 f6 74 59 8b 82 04 16 00 00 85 c0 74 4f 8b
RSP: 0018:ffffc90000007bb8 EFLAGS: 00000206
RAX: 0000000000000101 RBX: ffffc9000dc0d140 RCX: ffffffff885893b0
RDX: ffff88807c075940 RSI: 0000000000000100 RDI: 0000000000000001
RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffc9000dc0d178
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS:  0000555555d54380(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f6b442f6130 CR3: 000000006fe1c000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <NMI>
 </NMI>
 <IRQ>
 pie_calculate_probability+0x480/0x850 net/sched/sch_pie.c:415
 fq_pie_timer+0x1da/0x4f0 net/sched/sch_fq_pie.c:387
 call_timer_fn+0x1a0/0x580 kernel/time/timer.c:1700

Fixes: ec97ecf1eb ("net: sched: add Flow Queue PIE packet scheduler")
Link: https://lore.kernel.org/lkml/00000000000017ad3f06040bf394@google.com/
Reported-by: syzbot+e46fbd5289363464bc13@syzkaller.appspotmail.com
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20230829123541.3745013-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-08-31 11:21:52 +02:00
Jakub Kicinski
4e60de1e47 netfilter pull request 23-08-31
Merge tag 'nf-23-08-31' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Fix mangling of TCP options with non-linear skbuff, from Xiao Liang.

2) OOB read in xt_sctp due to missing sanitization of array length field.
   From Wander Lairson Costa.

3) OOB read in xt_u32 due to missing sanitization of array length field.
   Also from Wander Lairson Costa.

All of the above have been broken for several releases.

4) Missing audit log for set element reset command, from Phil Sutter.

5) Missing audit log for rule reset command, also from Phil.

Audit log support for these reset commands is missing in 6.5.

* tag 'nf-23-08-31' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nf_tables: Audit log rule reset
  netfilter: nf_tables: Audit log setelem reset
  netfilter: xt_u32: validate user space input
  netfilter: xt_sctp: validate the flag_info count
  netfilter: nft_exthdr: Fix non-linear header modification
====================

Link: https://lore.kernel.org/r/20230830235935.465690-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-30 18:34:53 -07:00
Phil Sutter
ea078ae910 netfilter: nf_tables: Audit log rule reset
Resetting rules' stateful data happens outside of the transaction logic,
so 'get' and 'dump' handlers have to emit audit log entries themselves.

Fixes: 8daa8fde3f ("netfilter: nf_tables: Introduce NFT_MSG_GETRULE_RESET")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-08-31 01:29:28 +02:00
Phil Sutter
7e9be1124d netfilter: nf_tables: Audit log setelem reset
Since set element reset is not integrated into nf_tables' transaction
logic, an explicit log call is needed, similar to NFT_MSG_GETOBJ_RESET
handling.

For the sake of simplicity, catchall element reset will always generate
a dedicated log entry. This relieves nf_tables_dump_set() from having to
adjust the logged element count depending on whether a catchall element
was found or not.

Fixes: 079cd63321 ("netfilter: nf_tables: Introduce NFT_MSG_GETSETELEM_RESET")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-08-31 01:29:27 +02:00
Wander Lairson Costa
69c5d284f6 netfilter: xt_u32: validate user space input
The xt_u32 module doesn't validate the fields in the xt_u32 structure.
An attacker may take advantage of this to trigger an OOB read by setting
the size fields to values beyond the arrays' boundaries.

Add a checkentry function to validate the structure.

This was originally reported by the ZDI project (ZDI-CAN-18408).
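The kind of check described above can be sketched in userspace C. This is a minimal sketch, not the kernel's checkentry function. The struct names ending in _sketch and the value XT_U32_MAXSIZE = 10 are assumptions modelled loosely on the xt_u32 UAPI layout. The idea shown is that every user-supplied count (ntests, nnums, nvalues) must be checked against the size of the array it indexes before any per-packet loop trusts it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical mirror of the xt_u32 match data: each array
 * holds XT_U32_MAXSIZE + 1 entries and the n* counts come from user space. */
#define XT_U32_MAXSIZE 10
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

struct u32_location_sketch { uint32_t number; uint8_t nextop; };
struct u32_value_sketch { uint32_t min, max; };

struct u32_test_sketch {
	struct u32_location_sketch location[XT_U32_MAXSIZE + 1];
	struct u32_value_sketch value[XT_U32_MAXSIZE + 1];
	uint8_t nnums;   /* claimed number of valid entries in location[] */
	uint8_t nvalues; /* claimed number of valid entries in value[] */
};

struct u32_info_sketch {
	struct u32_test_sketch tests[XT_U32_MAXSIZE + 1];
	uint8_t ntests;  /* claimed number of valid entries in tests[] */
	uint8_t invert;
};

/* checkentry-style validation: reject any count that would let the
 * match loop index past the end of its array. */
static int u32_check_sketch(const struct u32_info_sketch *data)
{
	if (data->ntests > ARRAY_SIZE(data->tests))
		return -EINVAL;

	for (size_t i = 0; i < data->ntests; i++) {
		const struct u32_test_sketch *ct = &data->tests[i];

		if (ct->nnums > ARRAY_SIZE(ct->location) ||
		    ct->nvalues > ARRAY_SIZE(ct->value))
			return -EINVAL;
	}
	return 0;
}
```

The validation runs once, when the rule is installed, so the per-packet path does not pay for it.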

Fixes: 1b50b8a371 ("[NETFILTER]: Add u32 match")
Cc: stable@vger.kernel.org
Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-08-30 17:34:01 +02:00
Wander Lairson Costa
e994764976 netfilter: xt_sctp: validate the flag_info count
sctp_mt_check doesn't validate the flag_count field. An attacker can
take advantage of that to trigger an OOB read and leak memory
contents.

Add the field validation in the checkentry function.
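The sketch below shows why the per-packet lookup depends on this check. It is a hypothetical userspace model: the names NUM_FLAGS, struct sctp_info_sketch, sctp_check_sketch() and find_flag_sketch() are illustrative and are not the kernel's definitions. The lookup loop trusts flag_count, so the count must be bounded by the array size when the rule is installed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Assumed simplified layout: a fixed array plus a user-supplied count. */
#define NUM_FLAGS 4
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

struct flag_info_sketch { unsigned char chunktype, flag, flag_mask; };

struct sctp_info_sketch {
	struct flag_info_sketch flag_info[NUM_FLAGS];
	int flag_count; /* comes from user space, must not be trusted */
};

/* checkentry-style validation, run once when the rule is installed. */
static int sctp_check_sketch(const struct sctp_info_sketch *info)
{
	if (info->flag_count < 0 ||
	    (size_t)info->flag_count > ARRAY_SIZE(info->flag_info))
		return -EINVAL;
	return 0;
}

/* Per-packet lookup: reads flag_info[0..flag_count-1], which stays in
 * bounds only because sctp_check_sketch() accepted flag_count. */
static const struct flag_info_sketch *
find_flag_sketch(const struct sctp_info_sketch *info, unsigned char chunktype)
{
	for (int i = 0; i < info->flag_count; i++)
		if (info->flag_info[i].chunktype == chunktype)
			return &info->flag_info[i];
	return NULL;
}
```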

Fixes: 2e4e6a17af ("[NETFILTER] x_tables: Abstraction layer for {ip,ip6,arp}_tables")
Cc: stable@vger.kernel.org
Reported-by: Lucas Leong <wmliang@infosec.exchange>
Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-08-30 17:34:01 +02:00
Xiao Liang
28427f368f netfilter: nft_exthdr: Fix non-linear header modification
Fix the skb_ensure_writable() size. Don't use nft_tcp_header_pointer(),
to make it explicit that the pointers point into the packet (not a local buffer).

Fixes: 99d1712bc4 ("netfilter: exthdr: tcp option set support")
Fixes: 7890cbea66 ("netfilter: exthdr: add support for tcp option removal")
Cc: stable@vger.kernel.org
Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-08-30 17:34:01 +02:00
Heng Guo
e4da8c7897 net: ipv4, ipv6: fix IPSTATS_MIB_OUTOCTETS increment duplicated
commit edf391ff17 ("snmp: add missing counters for RFC 4293") had
already added OutOctets for RFC 4293. In commit 2d8dbb04c6 ("snmp: fix
OutOctets counter to include forwarded datagrams"), OutOctets was
counted again, but not removed from ip_output().

According to RFC 4293 "3.2.3. IP Statistics Tables",
ipIfStatsOutTransmits is not equal to ipIfStatsOutForwDatagrams, so the
statement "IPSTATS_MIB_OUTOCTETS must be incremented when incrementing" is
not accurate. In addition, IPSTATS_MIB_OUTOCTETS should be counted after
fragmentation.

This patch reverts commit 2d8dbb04c6 ("snmp: fix OutOctets counter to
include forwarded datagrams") and moves IPSTATS_MIB_OUTOCTETS to
ip_finish_output2 for ipv4.
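The following is a toy model of the accounting difference for one forwarded datagram. It is hypothetical and not kernel code: struct ip_stats, forward_old() and forward_new() only mimic where the counter is bumped. Before the fix, the forwarding path and ip_output() both added the full datagram length. After the fix, the counter is bumped once per fragment actually transmitted, in ip_finish_output2().

```c
#include <assert.h>
#include <stddef.h>

/* Toy model (hypothetical) of IPSTATS_MIB_OUTOCTETS for one forwarded
 * datagram. */
struct ip_stats {
	unsigned long out_octets;
};

/* Before the fix: counted in the forwarding path (added by 2d8dbb04c6)
 * and again in ip_output(), both before fragmentation. */
static void forward_old(struct ip_stats *s, size_t dgram_len)
{
	s->out_octets += dgram_len; /* forwarding path */
	s->out_octets += dgram_len; /* ip_output() */
}

/* After the fix: counted only in ip_finish_output2(), once per fragment
 * handed to the device, so each fragment's own header is included. */
static void forward_new(struct ip_stats *s, const size_t *frag_lens,
			size_t nfrags)
{
	for (size_t i = 0; i < nfrags; i++)
		s->out_octets += frag_lens[i];
}
```

For example, a 2000-byte IPv4 datagram with a 20-byte header, forwarded over a 1500-byte MTU, becomes fragments of 1500 and 520 bytes. The old scheme reports 4000 octets and the new one reports 2020.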

Reviewed-by: Filip Pudak <filip.pudak@windriver.com>
Signed-off-by: Heng Guo <heng.guo@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-30 09:44:09 +01:00
John Fastabend
35d2b7ffff bpf, sockmap: Fix preempt_rt splat when using raw_spin_lock_t
Sockmap and sockhash maps are collections of psocks, objects
representing a socket plus a set of metadata needed to manage
the BPF programs associated with the socket. These maps use
stab->lock to protect against concurrent operations on the maps,
e.g. trying to insert two objects into the array at the same
time in the same slot. Additionally, a sockhash map has a
bucket lock to protect iteration and insert/delete into the
hash entry.

Each psock has a psock->link which is a linked list of all the
maps that a psock is attached to. This allows a psock (socket)
to be included in multiple sockmap and sockhash maps. This
linked list is protected by psock->link_lock.

They _must_ be nested correctly to avoid deadlock:

  lock(stab->lock)
    : do BPF map operations and psock insert/delete
    lock(psock->link_lock)
       : add map to psock linked list of maps
    unlock(psock->link_lock)
  unlock(stab->lock)
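The nesting rule above can be expressed as a toy lock-ordering checker. This sketch is hypothetical and only loosely inspired by lockdep; the ranks, struct held_locks and the order_*() helpers are illustrative names. stab->lock gets a lower rank than psock->link_lock. Locks may be taken only in increasing rank order and released in reverse, so the inverted order, which could deadlock against a correctly ordered thread, is rejected.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock-ordering checker (hypothetical): each lock class has a rank,
 * acquisition must be in strictly increasing rank order, and release
 * must be in reverse (LIFO) order. */
enum lock_rank { RANK_NONE = 0, RANK_STAB_LOCK = 1, RANK_PSOCK_LINK_LOCK = 2 };

#define MAX_HELD 8

struct held_locks {
	enum lock_rank stack[MAX_HELD];
	int depth;
};

static enum lock_rank order_top(const struct held_locks *h)
{
	return h->depth ? h->stack[h->depth - 1] : RANK_NONE;
}

/* Refuse (and take nothing) if this acquisition would invert the order. */
static bool order_acquire(struct held_locks *h, enum lock_rank r)
{
	if (r <= order_top(h) || h->depth == MAX_HELD)
		return false;
	h->stack[h->depth++] = r;
	return true;
}

/* Only the most recently acquired lock may be released. */
static bool order_release(struct held_locks *h, enum lock_rank r)
{
	if (h->depth == 0 || order_top(h) != r)
		return false;
	h->depth--;
	return true;
}
```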

For non-PREEMPT_RT kernels, both raw_spin_lock_t and spin_lock_t
are guaranteed not to sleep. But with PREEMPT_RT kernels, the
spin_lock_t variants may sleep. In the current code we have
many patterns like this:

   rcu_critical_section:
      raw_spin_lock(stab->lock)
         spin_lock(psock->link_lock) <- may sleep ouch
         spin_unlock(psock->link_lock)
      raw_spin_unlock(stab->lock)
   rcu_critical_section

Nesting spin_lock() inside raw_spin_lock() violates the locking
rules for PREEMPT_RT kernels. Additionally, we do alloc(GFP_ATOMIC)
inside stab->lock, and those allocations might sleep on PREEMPT_RT
kernels. The result is splats like this:

./test_progs -t sockmap_basic
[   33.344330] bpf_testmod: loading out-of-tree module taints kernel.
[   33.441933]
[   33.442089] =============================
[   33.442421] [ BUG: Invalid wait context ]
[   33.442763] 6.5.0-rc5-01731-gec0ded2e0282 #4958 Tainted: G           O
[   33.443320] -----------------------------
[   33.443624] test_progs/2073 is trying to lock:
[   33.443960] ffff888102a1c290 (&psock->link_lock){....}-{3:3}, at: sock_map_update_common+0x2c2/0x3d0
[   33.444636] other info that might help us debug this:
[   33.444991] context-{5:5}
[   33.445183] 3 locks held by test_progs/2073:
[   33.445498]  #0: ffff88811a208d30 (sk_lock-AF_INET){+.+.}-{0:0}, at: sock_map_update_elem_sys+0xff/0x330
[   33.446159]  #1: ffffffff842539e0 (rcu_read_lock){....}-{1:3}, at: sock_map_update_elem_sys+0xf5/0x330
[   33.446809]  #2: ffff88810d687240 (&stab->lock){+...}-{2:2}, at: sock_map_update_common+0x177/0x3d0
[   33.447445] stack backtrace:
[   33.447655] CPU: 10 PID

To fix this, observe that we can't readily remove the allocations
(for that we would need to use/create something similar to
bpf_map_alloc), so convert raw_spin_lock_t to spin_lock_t. Note that
sock_map_update, which would trigger the allocation and potential
sleep, is only allowed through sys_bpf ops and via sock_ops, which
precludes hw interrupts and low-level atomic sections in PREEMPT_RT
kernels. On non-PREEMPT_RT kernels nothing changes here: spin lock
sections and alloc(GFP_ATOMIC) are still not sleepable.
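The reasoning behind the conversion can be checked with a toy model of the nesting rule. It is hypothetical and much simpler than lockdep's wait-context checking, and it covers only lock nesting, not the GFP_ATOMIC allocations. The rule: a lock that may sleep must not be taken while holding a lock that forbids sleeping. A raw spinlock always forbids sleeping. A spin_lock_t sleeps, and therefore stops forbidding sleep, only on PREEMPT_RT.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (hypothetical) of the PREEMPT_RT nesting rule. */
enum lock_kind { LK_RAW_SPIN, LK_SPIN };

/* On PREEMPT_RT, spin_lock_t becomes a sleeping lock; raw locks never do. */
static bool lock_may_sleep(enum lock_kind k, bool preempt_rt)
{
	return k == LK_SPIN && preempt_rt;
}

/* Holding a raw spinlock always forbids sleeping; holding a spin_lock_t
 * forbids it only on non-PREEMPT_RT kernels. */
static bool lock_forbids_sleep(enum lock_kind k, bool preempt_rt)
{
	return k == LK_RAW_SPIN || !preempt_rt;
}

/* May `inner` be acquired while `outer` is held? */
static bool nesting_ok(enum lock_kind outer, enum lock_kind inner,
		       bool preempt_rt)
{
	return !(lock_forbids_sleep(outer, preempt_rt) &&
		 lock_may_sleep(inner, preempt_rt));
}
```

In this model, raw_spin_lock(stab->lock) around spin_lock(psock->link_lock) fails only on PREEMPT_RT, matching the splat. After both become spin_lock_t, the nesting is valid on both kernel flavours.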

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230830053517.166611-1-john.fastabend@gmail.com
2023-08-30 09:58:42 +02:00
Tirthendu Sarkar
9d0a67b9d4 xsk: Fix xsk_build_skb() error: 'skb' dereferencing possible ERR_PTR()
Currently, xsk_build_skb() builds the skb in one of two possible ways
and then ends with common error handling.

We can distinguish four possible error paths and handling in xsk_build_skb():

 1. sock_alloc_send_skb fails: Retry (skb is NULL).
 2. skb_store_bits fails : Free skb and retry.
 3. MAX_SKB_FRAGS exceeded: Free skb, cleanup and drop packet.
 4. alloc_page fails for frag: Retry page allocation w/o freeing skb

1] and 3] can happen in xsk_build_skb_zerocopy(), which is one of the
two code paths responsible for building the skb. The common error path
in xsk_build_skb() assumes that whenever errno != -EAGAIN, skb is a
valid pointer. This is wrong: as the kernel test robot reports,
xsk_build_skb_zerocopy() returns other errno values while skb is
NULL.

To fix this, set -EOVERFLOW as the error when MAX_SKB_FRAGS is exceeded
and the packet needs to be dropped, in both xsk_build_skb() and
xsk_build_skb_zerocopy(), and use it to distinguish this case from all
other error cases. Also, add an explicit kfree_skb() for 3] so that
handling of 1], 2], and 3] becomes identical where allocation needs to be retried.
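The caller-side decision can be sketched with a userspace re-creation of the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() convention. In that convention, small negative errno values are encoded in the top MAX_ERRNO addresses. handle_build_result() and the xmit_action values are hypothetical names. The sketch shows only the rule the message describes: on failure, inspect the error code rather than dereferencing the skb, and treat -EOVERFLOW as the single "drop" signal and everything else as "retry".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace re-creation of the ERR_PTR() convention: the top MAX_ERRNO
 * addresses never hold valid objects, so they can carry -errno values. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

enum xmit_action { XMIT_SENT, XMIT_RETRY, XMIT_DROP };

/* Hypothetical handling of a build result: never dereference a failed
 * skb, only look at the encoded error. */
static enum xmit_action handle_build_result(void *skb_or_err)
{
	if (skb_or_err == NULL)
		return XMIT_RETRY;  /* e.g. allocation failed: retry */
	if (!IS_ERR(skb_or_err))
		return XMIT_SENT;   /* a real skb */
	if (PTR_ERR(skb_or_err) == -EOVERFLOW)
		return XMIT_DROP;   /* MAX_SKB_FRAGS exceeded: drop packet */
	return XMIT_RETRY;          /* any other error: let the app retry */
}
```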

Fixes: cf24f5a5fe ("xsk: add support for AF_XDP multi-buffer on Tx path")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Tirthendu Sarkar <tirthendu.sarkar@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Closes: https://lore.kernel.org/r/202307210434.OjgqFcbB-lkp@intel.com
Link: https://lore.kernel.org/bpf/20230823144713.2231808-1-tirthendu.sarkar@intel.com
2023-08-30 08:41:23 +02:00
Yonghong Song
32337c0a28 bpf: Prevent inlining of bpf_fentry_test7()
With the latest clang18, I hit test_progs failures for the following tests:

  #13/2    bpf_cookie/multi_kprobe_link_api:FAIL
  #13/3    bpf_cookie/multi_kprobe_attach_api:FAIL
  #13      bpf_cookie:FAIL
  #75      fentry_fexit:FAIL
  #76/1    fentry_test/fentry:FAIL
  #76      fentry_test:FAIL
  #80/1    fexit_test/fexit:FAIL
  #80      fexit_test:FAIL
  #110/1   kprobe_multi_test/skel_api:FAIL
  #110/2   kprobe_multi_test/link_api_addrs:FAIL
  #110/3   kprobe_multi_test/link_api_syms:FAIL
  #110/4   kprobe_multi_test/attach_api_pattern:FAIL
  #110/5   kprobe_multi_test/attach_api_addrs:FAIL
  #110/6   kprobe_multi_test/attach_api_syms:FAIL
  #110     kprobe_multi_test:FAIL

For example, for #13/2, the error messages are:

  [...]
  kprobe_multi_test_run:FAIL:kprobe_test7_result unexpected kprobe_test7_result: actual 0 != expected 1
  [...]
  kprobe_multi_test_run:FAIL:kretprobe_test7_result unexpected kretprobe_test7_result: actual 0 != expected 1

clang17 does not have this issue.

Further investigation shows that kernel func bpf_fentry_test7(), used in
the above tests, is inlined by the compiler although it is marked as
noinline.

  int noinline bpf_fentry_test7(struct bpf_fentry_test_t *arg)
  {
        return (long)arg;
  }

It is known that for simple functions like the above (e.g. just returning
a constant or an input argument), the clang compiler may still inline
a function marked noinline. Adding 'asm volatile ("")' at the beginning
of bpf_fentry_test7() prevents this inlining.
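Here is a userspace sketch of the workaround. The kernel's noinline macro expands to __attribute__((__noinline__)), so the attribute is written out directly. The struct body and the name fentry_test7_sketch() are placeholders. The compiler treats the empty __asm__ __volatile__ statement as having side effects, so the function no longer looks like a trivial "return the argument" body; per the commit, that is enough to stop clang18 from inlining it.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder; the real struct's fields don't matter here. */
struct bpf_fentry_test_t {
	struct bpf_fentry_test_t *a;
};

/* Same shape as the kernel function, plus the empty volatile asm, which
 * the optimizer must keep and cannot see through. */
__attribute__((noinline))
int fentry_test7_sketch(struct bpf_fentry_test_t *arg)
{
	__asm__ __volatile__("");
	return (long)arg;
}
```

Whether a particular compiler version would inline the function without the asm statement is compiler-dependent. The commit reports the problem with clang18 but not clang17.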

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20230826200843.2210074-1-yonghong.song@linux.dev
2023-08-30 08:36:17 +02:00
Linus Torvalds
c1b7fcf3f6 for-6.6/io_uring-2023-08-28
Merge tag 'for-6.6/io_uring-2023-08-28' of git://git.kernel.dk/linux

Pull io_uring updates from Jens Axboe:
 "Fairly quiet round in terms of features, mostly just improvements all
  over the map for existing code. In detail:

   - Initial support for socket operations through io_uring. Latter half
     of this will likely land with the 6.7 kernel, then allowing things
     like get/setsockopt (Breno)

   - Cleanup of the cancel code, and then adding support for canceling
     requests with the opcode as the key (me)

   - Improvements for the io-wq locking (me)

   - Fix affinity setting for SQPOLL based io-wq (me)

   - Remove the io_uring userspace code. These were added initially as
     copies from liburing, but all of them have since bitrotted and are
     way out of date at this point. Rather than attempt to keep them in
     sync, just get rid of them. People will have liburing available
     anyway for these examples. (Pavel)

   - Series improving the CQ/SQ ring caching (Pavel)

   - Misc fixes and cleanups (Pavel, Yue, me)"

* tag 'for-6.6/io_uring-2023-08-28' of git://git.kernel.dk/linux: (47 commits)
  io_uring: move iopoll ctx fields around
  io_uring: move multishot cqe cache in ctx
  io_uring: separate task_work/waiting cache line
  io_uring: banish non-hot data to end of io_ring_ctx
  io_uring: move non aligned field to the end
  io_uring: add option to remove SQ indirection
  io_uring: compact SQ/CQ heads/tails
  io_uring: force inline io_fill_cqe_req
  io_uring: merge iopoll and normal completion paths
  io_uring: reorder cqring_flush and wakeups
  io_uring: optimise extra io_get_cqe null check
  io_uring: refactor __io_get_cqe()
  io_uring: simplify big_cqe handling
  io_uring: cqe init hardening
  io_uring: improve cqe !tracing hot path
  io_uring/rsrc: Annotate struct io_mapped_ubuf with __counted_by
  io_uring/sqpoll: fix io-wq affinity when IORING_SETUP_SQPOLL is used
  io_uring: simplify io_run_task_work_sig return
  io_uring/rsrc: keep one global dummy_ubuf
  io_uring: never overflow io_aux_cqe
  ...
2023-08-29 20:11:33 -07:00