Commit Graph

75796 Commits

Author SHA1 Message Date
Kirill A. Shutemov
5e0a760b44 mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER
commit 23baf831a3 ("mm, treewide: redefine MAX_ORDER sanely") has
changed the definition of MAX_ORDER to be inclusive.  This has caused
issues with code that was not yet upstream and depended on the previous
definition.

To draw attention to the altered meaning of the define, rename MAX_ORDER
to MAX_PAGE_ORDER.

Link: https://lkml.kernel.org/r/20231228144704.14033-2-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-08 15:27:15 -08:00
Linus Torvalds
5db8752c3b vfs-6.8.iov_iter
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZZUzBQAKCRCRxhvAZXjc
 ot+3AQCZw1PBD4azVxFMWH76qwlAGoVIFug4+ogKU/iUa4VLygEA2FJh1vLJw5iI
 LpgBEIUTPVkwtzinAW94iJJo1Vr7NAI=
 =p6PB
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.8.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs iov_iter cleanups from Christian Brauner:
 "This contains a minor cleanup. The patches drop an unused argument
  from import_single_range() allowing to replace import_single_range()
  with import_ubuf() and dropping import_single_range() completely"

* tag 'vfs-6.8.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  iov_iter: replace import_single_range() with import_ubuf()
  iov_iter: remove unused 'iov' argument from import_single_range()
2024-01-08 11:43:04 -08:00
Linus Torvalds
c604110e66 vfs-6.8.misc
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZZUxRQAKCRCRxhvAZXjc
 ov/QAQDzvge3oQ9MEymmOiyzzcF+HhAXBr+9oEsYJjFc1p0TsgEA61gXjZo7F1jY
 KBqd6znOZCR+Waj0kIVJRAo/ISRBqQc=
 =0bRl
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.8.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull misc vfs updates from Christian Brauner:
 "This contains the usual miscellaneous features, cleanups, and fixes
  for vfs and individual fses.

  Features:

   - Add Jan Kara as VFS reviewer

   - Show correct device and inode numbers in proc/<pid>/maps for vma
     files on stacked filesystems. This is now easily doable thanks to
     the backing file work from the last cycles. This comes with
     selftests

  Cleanups:

   - Remove a redundant might_sleep() from wait_on_inode()

   - Initialize pointer with NULL, not 0

   - Clarify comment on access_override_creds()

   - Rework and simplify eventfd_signal() and eventfd_signal_mask()
     helpers

   - Process aio completions in batches to avoid needless wakeups

   - Completely decouple struct mnt_idmap from namespaces. We now only
     keep the actual idmapping around and don't stash references to
     namespaces

   - Reformat maintainer entries to indicate that a given subsystem
     belongs to fs/

   - Simplify fput() for files that were never opened

   - Get rid of various pointless file helpers

   - Rename various file helpers

   - Rename struct file members after SLAB_TYPESAFE_BY_RCU switch from
     last cycle

   - Make relatime_need_update() return bool

   - Use GFP_KERNEL instead of GFP_USER when allocating superblocks

   - Replace deprecated ida_simple_*() calls with their current ida_*()
     counterparts

  Fixes:

   - Fix comments on user namespace id mapping helpers. They aren't
     kernel doc comments so they shouldn't be using /**

   - s/Retuns/Returns/g in various places

   - Add missing parameter documentation on can_move_mount_beneath()

   - Rename i_mapping->private_data to i_mapping->i_private_data

   - Fix a false-positive lockdep warning in pipe_write() for watch
     queues

   - Improve __fget_files_rcu() code generation to improve performance

   - Only notify writer that pipe resizing has finished after setting
     pipe->max_usage otherwise writers are never notified that the pipe
     has been resized and hang

   - Fix some kernel docs in hfsplus

   - s/passs/pass/g in various places

   - Fix kernel docs in ntfs

   - Fix kcalloc() arguments order reported by gcc 14

   - Fix uninitialized value in reiserfs"

* tag 'vfs-6.8.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (36 commits)
  reiserfs: fix uninit-value in comp_keys
  watch_queue: fix kcalloc() arguments order
  ntfs: dir.c: fix kernel-doc function parameter warnings
  fs: fix doc comment typo fs tree wide
  selftests/overlayfs: verify device and inode numbers in /proc/pid/maps
  fs/proc: show correct device and inode numbers in /proc/pid/maps
  eventfd: Remove usage of the deprecated ida_simple_xx() API
  fs: super: use GFP_KERNEL instead of GFP_USER for super block allocation
  fs/hfsplus: wrapper.c: fix kernel-doc warnings
  fs: add Jan Kara as reviewer
  fs/inode: Make relatime_need_update return bool
  pipe: wakeup wr_wait after setting max_usage
  file: remove __receive_fd()
  file: stop exposing receive_fd_user()
  fs: replace f_rcuhead with f_task_work
  file: remove pointless wrapper
  file: s/close_fd_get_file()/file_close_fd()/g
  Improve __fget_files_rcu() code generation (and thus __fget_light())
  file: massage cleanup of files that failed to open
  fs/pipe: Fix lockdep false-positive in watchqueue pipe_write()
  ...
2024-01-08 10:26:08 -08:00
NeilBrown
1e3577a452 SUNRPC: discard sv_refcnt, and svc_get/svc_put
sv_refcnt is no longer useful.
lockd and nfs-cb only ever have the svc active when there are a non-zero
number of threads, so sv_refcnt mirrors sv_nrthreads.

nfsd also keeps the svc active between when a socket is added and when
the first thread is started, but we don't really need a refcount for
that.  We can simply not destroy the svc while there are any permanent
sockets attached.

So remove sv_refcnt and the get/put functions.
Instead of a final call to svc_put(), call svc_destroy() instead.
This is changed to also store NULL in the passed-in pointer to make it
easier to avoid use-after-free situations.

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:33 -05:00
NeilBrown
7b207ccd98 svc: don't hold reference for poolstats, only mutex.
A future patch will remove refcounting on svc_serv as it is of little
use.
It is currently used to keep the svc around while the pool_stats file is
open.
Change this to get the pointer, protected by the mutex, only in
seq_start, and the release the mutex in seq_stop.
This means that if the nfsd server is stopped and restarted while the
pool_stats file it open, then some pool stats info could be from the
first instance and some from the second.  This might appear odd, but is
unlikely to be a problem in practice.

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:33 -05:00
Dai Ngo
05a4b58301 SUNRPC: remove printk when back channel request not found
If the client interface is down, or there is a network partition between
the client and server that prevents the callback request to reach the
client, TCP on the server will keep re-transmitting the callback for about
~9 minutes before giving up and closing the connection.

If the connection between the client and the server is re-established
before the connection is closed and after the callback timed out (9 secs)
then the re-transmitted callback request will arrive at the client. When
the server receives the reply of the callback, receive_cb_reply prints the
"Got unrecognized reply..." message in the system log since the callback
request was already removed from the server xprt's recv_queue.

Even though this scenario has no effect on the server operation, a
malfunctioning or malicious client can fill up the server's system log.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:33 -05:00
Chuck Lever
d3dba53410 svcrdma: Implement multi-stage Read completion again
Having an nfsd thread waiting for an RDMA Read completion is
problematic if the Read responder (ie, the client) stops responding.
We need to go back to handling RDMA Reads by getting the svc scheduler
to call svc_rdma_recvfrom() a second time to finish building an RPC
message after a Read completion.

This is the final patch, and makes several changes that have to
happen concurrently:

1. svc_rdma_process_read_list no longer waits for a completion, but
   simply builds and posts the Read WRs.

2. svc_rdma_read_done() now queues a completed Read on
   sc_read_complete_q for later processing rather than calling
   complete().

3. The completed RPC message is no longer built in the
   svc_rdma_process_read_list() path. Finishing the message is now
   done in svc_rdma_recvfrom() when it notices work on the
   sc_read_complete_q. The "finish building this RPC message" code
   is removed from the svc_rdma_process_read_list() path.

This arrangement avoids the need for an nfsd thread to wait for an
RDMA Read non-interruptibly without a timeout. It's basically the
same code structure that Tom Tucker used for Read chunks along with
some clean-up and modernization.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:33 -05:00
Chuck Lever
ecba85e951 svcrdma: Copy construction of svc_rqst::rq_arg to rdma_read_complete()
Once a set of RDMA Reads are complete, the Read completion handler
will poke the transport to trigger a second call to
svc_rdma_recvfrom(). recvfrom() will then merge the RDMA Read
payloads with the previously received RPC header to form a completed
RPC Call message.

The new code is copied from the svc_rdma_process_read_list() path.
A subsequent patch will make use of this code and remove the code
that this was copied from (svc_rdma_rw.c).

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:33 -05:00
Chuck Lever
a937693a82 svcrdma: Add back svcxprt_rdma::sc_read_complete_q
Having an nfsd thread waiting for an RDMA Read completion is
problematic if the Read responder (ie, the client) stops responding.
We need to go back to handling RDMA Reads by allowing the nfsd
thread to return to the svc scheduler, then waking a second thread
finish the RPC message once the Read completion fires.

As a next step, add a list_head upon which completed Reads are queued.
A subsequent patch will make use of this queue.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:32 -05:00
Chuck Lever
4d9d69db89 svcrdma: Add back svc_rdma_recv_ctxt::rc_pages
Having an nfsd thread waiting for an RDMA Read completion is
problematic if the Read responder (the client) stops responding. We
need to go back to handling RDMA Reads by allowing the nfsd thread
to return to the svc scheduler, then waking a second thread finish
the RPC message once the Read completion fires.

To start with, restore the rc_pages field so that RDMA Read pages
can be managed across calls to svc_rdma_recvfrom().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:32 -05:00
Chuck Lever
fc2e69db82 svcrdma: Clean up comment in svc_rdma_accept()
The comment that starts "Qualify ..." applies to only some of the
following code paragraph. Re-arrange the lines so the comment makes
more sense.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:32 -05:00
Chuck Lever
b918bfcf37 svcrdma: Remove queue-shortening warnings
These won't have much diagnostic value for site administrators.
Since they can't be disabled, they become noise.

What's more, the subsequent rdma_create_qp() call adjusts the Send
Queue size (possibly downward) without warning, making the size
reported by these pr_warns inaccurate.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:32 -05:00
Chuck Lever
913cd7668f svcrdma: Remove pointer addresses shown in dprintk()
There are a couple of dprintk() call sites in svc_rdma_accept()
that show pointer addresses. These days, displayed pointer addresses
are hashed and thus have little or no diagnostic value, especially
for site administrators.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:32 -05:00
Chuck Lever
2a95ce479e svcrdma: Optimize svc_rdma_cc_init()
The atomic_inc_return() in svc_rdma_send_cid_init() is expensive.

Some svc_rdma_chunk_ctxt's now reside in long-lived container
structures. They don't need a fresh completion ID for every I/O
operation.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:32 -05:00
Chuck Lever
28ee0ec894 svcrdma: De-duplicate completion ID initialization helpers
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:31 -05:00
Chuck Lever
018f34051b svcrdma: Move the svc_rdma_cc_init() call
Now that the chunk_ctxt for Reads is no longer dynamically allocated
it can be initialized once for the life of the object that contains
it (struct svc_rdma_recv_ctxt).

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:31 -05:00
Chuck Lever
57666bbb4e svcrdma: Remove struct svc_rdma_read_info
The remaining fields of struct svc_rdma_read_info are no longer
referenced.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:31 -05:00
Chuck Lever
efd02cb0dd svcrdma: Update the synopsis of svc_rdma_read_special()
Since the RDMA Read I/O state is now contained in the recv_ctxt,
svc_rdma_read_special() can use that recv_ctxt to derive the
read_info rather than the other way around. This removes another
usage of the ri_readctxt field, enabling its removal in a
subsequent patch.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:31 -05:00
Chuck Lever
23bab3b22d svcrdma: Update the synopsis of svc_rdma_read_call_chunk()
Since the RDMA Read I/O state is now contained in the recv_ctxt,
svc_rdma_read_call_chunk() can use that recv_ctxt to derive the
read_info rather than the other way around. This removes another
usage of the ri_readctxt field, enabling its removal in a
subsequent patch.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:31 -05:00
Chuck Lever
740a3c895d svcrdma: Update synopsis of svc_rdma_read_multiple_chunks()
Since the RDMA Read I/O state is now contained in the recv_ctxt,
svc_rdma_read_multiple_chunks() can use that recv_ctxt to derive the
read_info rather than the other way around. This removes another
usage of the ri_readctxt field, enabling its removal in a
subsequent patch.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:31 -05:00
Chuck Lever
6518204d23 svcrdma: Update synopsis of svc_rdma_copy_inline_range()
Since the RDMA Read I/O state is now contained in the recv_ctxt,
svc_rdma_copy_inline_range() can use that recv_ctxt to derive the
read_info rather than the other way around. This removes another
usage of the ri_readctxt field, enabling its removal in a
subsequent patch.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:30 -05:00
Chuck Lever
6e4b9b8643 svcrdma: Update the synopsis of svc_rdma_read_data_item()
Since the RDMA Read I/O state is now contained in the recv_ctxt,
svc_rdma_build_read_data_item() can use that recv_ctxt to derive
that information rather than the other way around. This removes
another usage of the ri_readctxt field, enabling its removal in a
subsequent patch.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:30 -05:00
Chuck Lever
c7eb4feb1b svcrdma: Update synopsis of svc_rdma_read_chunk_range()
Since the RDMA Read I/O state is now contained in the recv_ctxt,
svc_rdma_build_read_chunk_range() can use that recv_ctxt to derive
that information rather than the other way around. This removes
another usage of the ri_readctxt field, enabling its removal in a
subsequent patch.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:30 -05:00
Chuck Lever
02e8fe1eca svcrdma: Update synopsis of svc_rdma_build_read_chunk()
Since the RDMA Read I/O state is now contained in the recv_ctxt,
svc_rdma_build_read_chunk() can use that recv_ctxt to derive that
information rather than the other way around. This removes another
usage of the ri_readctxt field, enabling its removal in a
subsequent patch.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:30 -05:00
Chuck Lever
fc20f19b4d svcrdma: Update synopsis of svc_rdma_build_read_segment()
Since the RDMA Read I/O state is now contained in the recv_ctxt,
svc_rdma_build_read_segment() can use the recv_ctxt to derive that
information rather than the other way around. This removes one usage
of the ri_readctxt field, enabling its removal in a subsequent
patch.

At the same time, the use of ri_rqst can similarly be replaced with
a passed-in function parameter.

Start with build_read_segment() because it is a common utility
function at the bottom of the Read chunk path.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:30 -05:00
Chuck Lever
919f6e790a svcrdma: Move read_info::ri_pageoff into struct svc_rdma_recv_ctxt
Further clean up: move the starting byte offset field into
svc_rdma_recv_ctxt.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:30 -05:00
Chuck Lever
8e12258268 svcrdma: Move svc_rdma_read_info::ri_pageno to struct svc_rdma_recv_ctxt
Further clean up: move the page index field into svc_rdma_recv_ctxt.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:29 -05:00
Chuck Lever
b1818412d0 svcrdma: Start moving fields out of struct svc_rdma_read_info
Since the request's svc_rdma_recv_ctxt will stay around for the
duration of the RDMA Read operation, the contents of struct
svc_rdma_read_info can reside in the request's svc_rdma_recv_ctxt
rather than being allocated separately. This will eventually save a
call to kmalloc() in a hot path.

Start this clean-up by moving the Read chunk's svc_rdma_chunk_ctxt.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:29 -05:00
Chuck Lever
6a04a43493 svcrdma: Move struct svc_rdma_chunk_ctxt to svc_rdma.h
Prepare for nestling these into the send and recv ctxts so they
no longer have to be allocated dynamically.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:29 -05:00
Chuck Lever
2cc0f23b53 svcrdma: Remove the svc_rdma_chunk_ctxt::cc_rdma field
In every instance, the pointer address in that field is now
available by other means.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:29 -05:00
Chuck Lever
bc8fd4e915 svcrdma: Pass a pointer to the transport to svc_rdma_cc_release()
Enable the eventual removal of the svc_rdma_chunk_ctxt::cc_rdma
field.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:29 -05:00
Chuck Lever
83fe6dd6a8 svcrdma: Explicitly pass the transport to svc_rdma_post_chunk_ctxt()
Enable the eventual removal of the svc_rdma_chunk_ctxt::cc_rdma
field.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:29 -05:00
Chuck Lever
4a68edd93f svcrdma: Explicitly pass the transport into Read chunk I/O paths
Enable the eventual removal of the svc_rdma_chunk_ctxt::cc_rdma
field.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:28 -05:00
Chuck Lever
c3899b7107 svcrdma: Explicitly pass the transport into Write chunk I/O paths
Enable the eventual removal of the svc_rdma_chunk_ctxt::cc_rdma
field.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:28 -05:00
Chuck Lever
c4fd9f4525 svcrdma: Acquire the svcxprt_rdma pointer from the CQ context
Enable the removal of the svc_rdma_chunk_ctxt::cc_rdma field in a
subsequent patch.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:28 -05:00
Chuck Lever
5ef6c66676 svcrdma: Reduce size of struct svc_rdma_rw_ctxt
SG_CHUNK_SIZE is 128, making struct svc_rdma_rw_ctxt + the first
SGL array more than 4200 bytes in length, pushing the memory
allocation well into order 1.

Even so, the RDMA rw core doesn't seem to use more than max_send_sge
entries in that array (typically 32 or less), so that is all wasted
space.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:28 -05:00
Chuck Lever
2dd6e29a3e svcrdma: Update some svcrdma DMA-related tracepoints
A send/recv_ctxt already records transport-related information
in the cq.id, thus there is no need to record the IP addresses of
the transport endpoints.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:28 -05:00
Chuck Lever
848760a9e7 svcrdma: DMA error tracepoints should report completion IDs
Update the DMA error flow tracepoints to report the completion ID of
the failing context. This ties the wait/failure to a particular
operation or request, which is more useful than knowing only the
failing transport.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:28 -05:00
Chuck Lever
ad3656bd84 svcrdma: SQ error tracepoints should report completion IDs
Update the Send Queue's error flow tracepoints to report the
completion ID of the waiting or failing context. This ties the
wait/failure to a particular operation or request, which is a little
more useful than knowing only the transport that is about to close.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:27 -05:00
Chuck Lever
be2acb1048 rpcrdma: Introduce a simple cid tracepoint class
De-duplicate some code, making it easier to add new tracepoints that
report only a completion ID.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:27 -05:00
Chuck Lever
907e34a7d0 svcrdma: Add lockdep class keys for transport locks
Two svcrdma-related transport locks can become quite contended.
Collate their use and make them easy to find in /proc/lock_stat for
better observability.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:27 -05:00
Chuck Lever
bfb81535c2 svcrdma: Clean up locking
There's no need to protect llist_entry() with a spin lock.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:27 -05:00
Chuck Lever
f09c36c8df svcrdma: Add an async version of svc_rdma_write_info_free()
DMA unmapping can take quite some time, so it should not be handled
in a single-threaded completion handler. Defer releasing write_info
structs to the recently-added workqueue.

With this patch, DMA unmapping can be handled in parallel, and it
does not cause head-of-queue blocking of Write completions.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:27 -05:00
Chuck Lever
ae225fe27b svcrdma: Add an async version of svc_rdma_send_ctxt_put()
DMA unmapping can take quite some time, so it should not be handled
in a single-threaded completion handler. Defer releasing send_ctxts
to the recently-added workqueue.

With this patch, DMA unmapping can be handled in parallel, and it
does not cause head-of-queue blocking of Send completions.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:27 -05:00
Chuck Lever
9c7e1a0658 svcrdma: Add a utility workqueue to svcrdma
To handle work in the background, set up an UNBOUND workqueue for
svcrdma. Subsequent patches will make use of it.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:26 -05:00
Chuck Lever
877118c667 svcrdma: Pre-allocate svc_rdma_recv_ctxt objects
The original reason for allocating svc_rdma_recv_ctxt objects during
Receive completion was to ensure the objects were allocated on the
NUMA node closest to the underlying IB device.

Since commit c5d68d25bd ("svcrdma: Clean up allocation of
svc_rdma_recv_ctxt"), however, the device's favored node is
explicitly passed to the memory allocator.

To enable switching Receive completion to soft IRQ context, move
memory allocation out of completion handling, since it can be
costly, and it can sleep.

A limited number of objects is now allocated at "accept" time.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:26 -05:00
Chuck Lever
b541dd554b svcrdma: Eliminate allocation of recv_ctxt objects in backchannel
The svc_rdma_recv_ctxt free list uses a lockless list to avoid the
need for a spin lock in the fast path. llist_del_first(), which is
used by svc_rdma_recv_ctxt_get(), requires serialization, however,
when there are multiple list producers that are unserialized.

I mistakenly thought there was only one caller of
svc_rdma_recv_ctxt_get() (svc_rdma_refresh_recvs()), thus explicit
serialization would not be necessary. But there is another caller:
svc_rdma_bc_sendto(), and these two are not serialized against each
other. I haven't seen ill effects that I could directly ascribe to
a lack of serialization. It's just an observation based on code
audit.

When DMA-mapping before sending a Reply, the passed-in struct
svc_rdma_recv_ctxt is used only for its write and reply PCLs. These
are currently always empty in the backchannel case. So, instead of
passing a full svc_rdma_recv_ctxt object to
svc_rdma_map_reply_msg(), let's pass in just the Write and Reply
PCLs.

This change makes it unnecessary for the backchannel to acquire a
dummy svc_rdma_recv_ctxt object when sending an RPC Call. The need
for svc_rdma_recv_ctxt free list serialization is now completely
avoided.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:26 -05:00
Chuck Lever
3587b5c753 SUNRPC: Remove RQ_SPLICE_OK
This flag is no longer used.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:26 -05:00
Chuck Lever
deb704281f SUNRPC: Add a server-side API for retrieving an RPC's pseudoflavor
NFSD will use this new API to determine whether nfsd_splice_read is
safe to use. This avoids the need to add a dependency to NFSD for
CONFIG_SUNRPC_GSS.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:25 -05:00
Eric Dumazet
d375b98e02 ip6_tunnel: fix NEXTHDR_FRAGMENT handling in ip6_tnl_parse_tlv_enc_lim()
syzbot pointed out [1] that NEXTHDR_FRAGMENT handling is broken.

Reading frag_off can only be done if we pulled enough bytes
to skb->head. Currently we might access garbage.

[1]
BUG: KMSAN: uninit-value in ip6_tnl_parse_tlv_enc_lim+0x94f/0xbb0
ip6_tnl_parse_tlv_enc_lim+0x94f/0xbb0
ipxip6_tnl_xmit net/ipv6/ip6_tunnel.c:1326 [inline]
ip6_tnl_start_xmit+0xab2/0x1a70 net/ipv6/ip6_tunnel.c:1432
__netdev_start_xmit include/linux/netdevice.h:4940 [inline]
netdev_start_xmit include/linux/netdevice.h:4954 [inline]
xmit_one net/core/dev.c:3548 [inline]
dev_hard_start_xmit+0x247/0xa10 net/core/dev.c:3564
__dev_queue_xmit+0x33b8/0x5130 net/core/dev.c:4349
dev_queue_xmit include/linux/netdevice.h:3134 [inline]
neigh_connected_output+0x569/0x660 net/core/neighbour.c:1592
neigh_output include/net/neighbour.h:542 [inline]
ip6_finish_output2+0x23a9/0x2b30 net/ipv6/ip6_output.c:137
ip6_finish_output+0x855/0x12b0 net/ipv6/ip6_output.c:222
NF_HOOK_COND include/linux/netfilter.h:303 [inline]
ip6_output+0x323/0x610 net/ipv6/ip6_output.c:243
dst_output include/net/dst.h:451 [inline]
ip6_local_out+0xe9/0x140 net/ipv6/output_core.c:155
ip6_send_skb net/ipv6/ip6_output.c:1952 [inline]
ip6_push_pending_frames+0x1f9/0x560 net/ipv6/ip6_output.c:1972
rawv6_push_pending_frames+0xbe8/0xdf0 net/ipv6/raw.c:582
rawv6_sendmsg+0x2b66/0x2e70 net/ipv6/raw.c:920
inet_sendmsg+0x105/0x190 net/ipv4/af_inet.c:847
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg net/socket.c:745 [inline]
____sys_sendmsg+0x9c2/0xd60 net/socket.c:2584
___sys_sendmsg+0x28d/0x3c0 net/socket.c:2638
__sys_sendmsg net/socket.c:2667 [inline]
__do_sys_sendmsg net/socket.c:2676 [inline]
__se_sys_sendmsg net/socket.c:2674 [inline]
__x64_sys_sendmsg+0x307/0x490 net/socket.c:2674
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0x44/0x110 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x63/0x6b

Uninit was created at:
slab_post_alloc_hook+0x129/0xa70 mm/slab.h:768
slab_alloc_node mm/slub.c:3478 [inline]
__kmem_cache_alloc_node+0x5c9/0x970 mm/slub.c:3517
__do_kmalloc_node mm/slab_common.c:1006 [inline]
__kmalloc_node_track_caller+0x118/0x3c0 mm/slab_common.c:1027
kmalloc_reserve+0x249/0x4a0 net/core/skbuff.c:582
pskb_expand_head+0x226/0x1a00 net/core/skbuff.c:2098
__pskb_pull_tail+0x13b/0x2310 net/core/skbuff.c:2655
pskb_may_pull_reason include/linux/skbuff.h:2673 [inline]
pskb_may_pull include/linux/skbuff.h:2681 [inline]
ip6_tnl_parse_tlv_enc_lim+0x901/0xbb0 net/ipv6/ip6_tunnel.c:408
ipxip6_tnl_xmit net/ipv6/ip6_tunnel.c:1326 [inline]
ip6_tnl_start_xmit+0xab2/0x1a70 net/ipv6/ip6_tunnel.c:1432
__netdev_start_xmit include/linux/netdevice.h:4940 [inline]
netdev_start_xmit include/linux/netdevice.h:4954 [inline]
xmit_one net/core/dev.c:3548 [inline]
dev_hard_start_xmit+0x247/0xa10 net/core/dev.c:3564
__dev_queue_xmit+0x33b8/0x5130 net/core/dev.c:4349
dev_queue_xmit include/linux/netdevice.h:3134 [inline]
neigh_connected_output+0x569/0x660 net/core/neighbour.c:1592
neigh_output include/net/neighbour.h:542 [inline]
ip6_finish_output2+0x23a9/0x2b30 net/ipv6/ip6_output.c:137
ip6_finish_output+0x855/0x12b0 net/ipv6/ip6_output.c:222
NF_HOOK_COND include/linux/netfilter.h:303 [inline]
ip6_output+0x323/0x610 net/ipv6/ip6_output.c:243
dst_output include/net/dst.h:451 [inline]
ip6_local_out+0xe9/0x140 net/ipv6/output_core.c:155
ip6_send_skb net/ipv6/ip6_output.c:1952 [inline]
ip6_push_pending_frames+0x1f9/0x560 net/ipv6/ip6_output.c:1972
rawv6_push_pending_frames+0xbe8/0xdf0 net/ipv6/raw.c:582
rawv6_sendmsg+0x2b66/0x2e70 net/ipv6/raw.c:920
inet_sendmsg+0x105/0x190 net/ipv4/af_inet.c:847
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg net/socket.c:745 [inline]
____sys_sendmsg+0x9c2/0xd60 net/socket.c:2584
___sys_sendmsg+0x28d/0x3c0 net/socket.c:2638
__sys_sendmsg net/socket.c:2667 [inline]
__do_sys_sendmsg net/socket.c:2676 [inline]
__se_sys_sendmsg net/socket.c:2674 [inline]
__x64_sys_sendmsg+0x307/0x490 net/socket.c:2674
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0x44/0x110 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x63/0x6b

CPU: 0 PID: 7345 Comm: syz-executor.3 Not tainted 6.7.0-rc8-syzkaller-00024-gac865f00af29 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023

Fixes: fbfa743a9d ("ipv6: fix ip6_tnl_parse_tlv_enc_lim()")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-07 15:27:29 +00:00
David Howells
4fc68c4c1a rxrpc: Fix skbuff cleanup of call's recvmsg_queue and rx_oos_queue
Fix rxrpc_cleanup_ring() to use rxrpc_purge_queue() rather than
skb_queue_purge() so that the count of outstanding skbuffs is correctly
updated when a failed call is cleaned up.

Without this rmmod may hang waiting for rxrpc_n_rx_skbs to become zero.

Fixes: 5d7edbc923 ("rxrpc: Get rid of the Rx ring")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-07 15:25:30 +00:00
Zhengchao Shao
c4a5ee9c09 fib: rules: remove repeated assignment in fib_nl2rule
In fib_nl2rule(), 'err' variable has been set to -EINVAL during
declaration, and no need to set the 'err' variable to -EINVAL again.
So, remove it.

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-07 15:16:19 +00:00
Pedro Tammela
405cd9fc6f net/sched: simplify tc_action_load_ops parameters
Instead of using two bools derived from a flags passed as arguments to
the parent function of tc_action_load_ops, just pass the flags itself
to tc_action_load_ops to simplify its parameters.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-07 14:58:26 +00:00
Ahmed Zaki
948f97f9d8 net: ethtool: reject unsupported RSS input xfrm values
RXFH input_xfrm currently has three supported values: 0 (clear all),
symmetric_xor and NO_CHANGE.

Reject any other value sent from user-space.

Fixes: 13e59344fb ("net: ethtool: add support for symmetric-xor RSS hash")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Link: https://lore.kernel.org/r/20240104212653.394424-1-ahmed.zaki@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-05 19:23:15 -08:00
Jakub Kicinski
8158a50f90 bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZZgrfgAKCRDbK58LschI
 g87JAQDu+oUG3aWnRJi+lJTK8vGnKRuBwUxgnI5Ze99N0tuPmAEAz1gpXLYP+fKE
 eqRhZGGhujdHC9if3Le+nG6nvf8Gvw0=
 =KPkZ
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2024-01-05

We've added 40 non-merge commits during the last 2 day(s) which contain
a total of 73 files changed, 1526 insertions(+), 951 deletions(-).

The main changes are:

1) Fix a memory leak when streaming AF_UNIX sockets were inserted
   into multiple sockmap slots/maps, from John Fastabend.

2) Fix gotol in s390 BPF JIT with large offsets, from Ilya Leoshkevich.

3) Fix reattachment branch in bpf_tracing_prog_attach() and reject
   the request if there is no valid attach_btf, from Jiri Olsa.

4) Remove deprecated bpfilter kernel leftovers given the project
   is developed in user space (https://github.com/facebook/bpfilter),
   from Quentin Deslandes.

5) Relax tracing BPF program recursive attach rules given right now
   it is not possible to create tracing program call cycles,
   from Dmitrii Dolgov.

6) Fix excessive memory consumption for the bpf_global_percpu_ma
   for systems with a large number of CPUs, from Yonghong Song.

7) Small x86 BPF JIT cleanup to reuse emit_nops instead of open-coding
   memcpy of x86_nops, from Leon Hwang.

8) Follow-up for libbpf to support __arg_ctx global function argument tag
   semantics to complement the merged kernel side, from Andrii Nakryiko.

9) Introduce "volatile compare" macros for BPF selftests in order
   to make the latter more robust against compiler optimization,
   from Alexei Starovoitov.

10) Small simplification in verifier's size checking of helper accesses
    along with additional selftests, from Andrei Matei.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (40 commits)
  selftests/bpf: Test re-attachment fix for bpf_tracing_prog_attach
  bpf: Fix re-attachment branch in bpf_tracing_prog_attach
  selftests/bpf: Add test for recursive attachment of tracing progs
  bpf: Relax tracing prog recursive attach rules
  bpf, x86: Use emit_nops to replace memcpy x86_nops
  selftests/bpf: Test gotol with large offsets
  selftests/bpf: Double the size of test_loader log
  s390/bpf: Fix gotol with large offsets
  bpfilter: remove bpfilter
  bpf: Remove unnecessary cpu == 0 check in memalloc
  selftests/bpf: add __arg_ctx BTF rewrite test
  selftests/bpf: add arg:ctx cases to test_global_funcs tests
  libbpf: implement __arg_ctx fallback logic
  libbpf: move BTF loading step after relocation step
  libbpf: move exception callbacks assignment logic into relocation step
  libbpf: use stable map placeholder FDs
  libbpf: don't rely on map->fd as an indicator of map being created
  libbpf: use explicit map reuse flag to skip map creation steps
  libbpf: make uniform use of btf__fd() accessor inside libbpf
  selftests/bpf: Add a selftest with > 512-byte percpu allocation size
  ...
====================

Link: https://lore.kernel.org/r/20240105170105.21070-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-05 19:15:32 -08:00
Richard Gobert
dff0b0161a net: gro: parse ipv6 ext headers without frag0 invalidation
The existing code always pulls the IPv6 header and sets the transport
offset initially. Then optionally again pulls any extension headers in
ipv6_gso_pull_exthdrs and sets the transport offset again on return from
that call. skb->data is set at the start of the first extension header
before calling ipv6_gso_pull_exthdrs, and must disable the frag0
optimization because that function uses pskb_may_pull/pskb_pull instead of
skb_gro_ helpers. It sets the GRO offset to the TCP header with
skb_gro_pull and sets the transport header. Then returns skb->data to its
position before this block.

This commit introduces a new helper function - ipv6_gro_pull_exthdrs -
which is used in ipv6_gro_receive to pull ipv6 ext headers instead of
ipv6_gso_pull_exthdrs. Thus, there is no modification of skb->data, all
operations use skb_gro_* helpers, and the frag0 fast path can be taken for
IPv6 packets with ext headers.

Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/504130f6-b56c-4dcc-882c-97942c59f5b7@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-05 08:11:49 -08:00
Richard Gobert
f2e3fc2158 net: gso: add HBH extension header offload support
This commit adds net_offload to IPv6 Hop-by-Hop extension headers (as it
is done for routing and dstopts) since it is supported in GSO and GRO.
This allows to remove specific HBH conditionals in GSO and GRO when
pulling and parsing an incoming packet.

Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/d4f8825a-1d55-4b12-9d67-a254dbbfa6ae@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-05 08:11:49 -08:00
Tao Liu
3f14b377d0 net/sched: act_ct: fix skb leak and crash on ooo frags
act_ct adds skb->users before defragmentation. If frags arrive in order,
the last frag's reference is reset in:

  inet_frag_reasm_prepare
    skb_morph

which is not straightforward.

However when frags arrive out of order, nobody unref the last frag, and
all frags are leaked. The situation is even worse, as initiating packet
capture can lead to a crash[0] when skb has been cloned and shared at the
same time.

Fix the issue by removing skb_get() before defragmentation. act_ct
returns TC_ACT_CONSUMED when defrag failed or in progress.

[0]:
[  843.804823] ------------[ cut here ]------------
[  843.809659] kernel BUG at net/core/skbuff.c:2091!
[  843.814516] invalid opcode: 0000 [#1] PREEMPT SMP
[  843.819296] CPU: 7 PID: 0 Comm: swapper/7 Kdump: loaded Tainted: G S 6.7.0-rc3 #2
[  843.824107] Hardware name: XFUSION 1288H V6/BC13MBSBD, BIOS 1.29 11/25/2022
[  843.828953] RIP: 0010:pskb_expand_head+0x2ac/0x300
[  843.833805] Code: 8b 70 28 48 85 f6 74 82 48 83 c6 08 bf 01 00 00 00 e8 38 bd ff ff 8b 83 c0 00 00 00 48 03 83 c8 00 00 00 e9 62 ff ff ff 0f 0b <0f> 0b e8 8d d0 ff ff e9 b3 fd ff ff 81 7c 24 14 40 01 00 00 4c 89
[  843.843698] RSP: 0018:ffffc9000cce07c0 EFLAGS: 00010202
[  843.848524] RAX: 0000000000000002 RBX: ffff88811a211d00 RCX: 0000000000000820
[  843.853299] RDX: 0000000000000640 RSI: 0000000000000000 RDI: ffff88811a211d00
[  843.857974] RBP: ffff888127d39518 R08: 00000000bee97314 R09: 0000000000000000
[  843.862584] R10: 0000000000000000 R11: ffff8881109f0000 R12: 0000000000000880
[  843.867147] R13: ffff888127d39580 R14: 0000000000000640 R15: ffff888170f7b900
[  843.871680] FS:  0000000000000000(0000) GS:ffff889ffffc0000(0000) knlGS:0000000000000000
[  843.876242] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  843.880778] CR2: 00007fa42affcfb8 CR3: 000000011433a002 CR4: 0000000000770ef0
[  843.885336] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  843.889809] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  843.894229] PKRU: 55555554
[  843.898539] Call Trace:
[  843.902772]  <IRQ>
[  843.906922]  ? __die_body+0x1e/0x60
[  843.911032]  ? die+0x3c/0x60
[  843.915037]  ? do_trap+0xe2/0x110
[  843.918911]  ? pskb_expand_head+0x2ac/0x300
[  843.922687]  ? do_error_trap+0x65/0x80
[  843.926342]  ? pskb_expand_head+0x2ac/0x300
[  843.929905]  ? exc_invalid_op+0x50/0x60
[  843.933398]  ? pskb_expand_head+0x2ac/0x300
[  843.936835]  ? asm_exc_invalid_op+0x1a/0x20
[  843.940226]  ? pskb_expand_head+0x2ac/0x300
[  843.943580]  inet_frag_reasm_prepare+0xd1/0x240
[  843.946904]  ip_defrag+0x5d4/0x870
[  843.950132]  nf_ct_handle_fragments+0xec/0x130 [nf_conntrack]
[  843.953334]  tcf_ct_act+0x252/0xd90 [act_ct]
[  843.956473]  ? tcf_mirred_act+0x516/0x5a0 [act_mirred]
[  843.959657]  tcf_action_exec+0xa1/0x160
[  843.962823]  fl_classify+0x1db/0x1f0 [cls_flower]
[  843.966010]  ? skb_clone+0x53/0xc0
[  843.969173]  tcf_classify+0x24d/0x420
[  843.972333]  tc_run+0x8f/0xf0
[  843.975465]  __netif_receive_skb_core+0x67a/0x1080
[  843.978634]  ? dev_gro_receive+0x249/0x730
[  843.981759]  __netif_receive_skb_list_core+0x12d/0x260
[  843.984869]  netif_receive_skb_list_internal+0x1cb/0x2f0
[  843.987957]  ? mlx5e_handle_rx_cqe_mpwrq_rep+0xfa/0x1a0 [mlx5_core]
[  843.991170]  napi_complete_done+0x72/0x1a0
[  843.994305]  mlx5e_napi_poll+0x28c/0x6d0 [mlx5_core]
[  843.997501]  __napi_poll+0x25/0x1b0
[  844.000627]  net_rx_action+0x256/0x330
[  844.003705]  __do_softirq+0xb3/0x29b
[  844.006718]  irq_exit_rcu+0x9e/0xc0
[  844.009672]  common_interrupt+0x86/0xa0
[  844.012537]  </IRQ>
[  844.015285]  <TASK>
[  844.017937]  asm_common_interrupt+0x26/0x40
[  844.020591] RIP: 0010:acpi_safe_halt+0x1b/0x20
[  844.023247] Code: ff 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 65 48 8b 04 25 00 18 03 00 48 8b 00 a8 08 75 0c 66 90 0f 00 2d 81 d0 44 00 fb f4 <fa> c3 0f 1f 00 89 fa ec 48 8b 05 ee 88 ed 00 a9 00 00 00 80 75 11
[  844.028900] RSP: 0018:ffffc90000533e70 EFLAGS: 00000246
[  844.031725] RAX: 0000000000004000 RBX: 0000000000000001 RCX: 0000000000000000
[  844.034553] RDX: ffff889ffffc0000 RSI: ffffffff828b7f20 RDI: ffff88a090f45c64
[  844.037368] RBP: ffff88a0901a2800 R08: ffff88a090f45c00 R09: 00000000000317c0
[  844.040155] R10: 00ec812281150475 R11: ffff889fffff0e04 R12: ffffffff828b7fa0
[  844.042962] R13: ffffffff828b7f20 R14: 0000000000000001 R15: 0000000000000000
[  844.045819]  acpi_idle_enter+0x7b/0xc0
[  844.048621]  cpuidle_enter_state+0x7f/0x430
[  844.051451]  cpuidle_enter+0x2d/0x40
[  844.054279]  do_idle+0x1d4/0x240
[  844.057096]  cpu_startup_entry+0x2a/0x30
[  844.059934]  start_secondary+0x104/0x130
[  844.062787]  secondary_startup_64_no_verify+0x16b/0x16b
[  844.065674]  </TASK>

Fixes: b57dc7c13e ("net/sched: Introduce action ct")
Signed-off-by: Tao Liu <taoliu828@163.com>
Link: https://lore.kernel.org/r/20231228081457.936732-1-taoliu828@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-05 08:08:24 -08:00
Jakub Kicinski
cb42010690 net: fill in MODULE_DESCRIPTION()s for CAIF
W=1 builds now warn if module is built without a MODULE_DESCRIPTION().
Add descriptions to all the CAIF sub-modules.

Link: https://lore.kernel.org/r/20240104144855.1320993-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-05 08:06:35 -08:00
Jakub Kicinski
b8549d8598 net: fill in MODULE_DESCRIPTION() for AF_PACKET
W=1 builds now warn if module is built without a MODULE_DESCRIPTION().
Add description to net/packet/af_packet.c

Acked-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20240104144119.1319055-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-05 08:06:35 -08:00
Jakub Kicinski
0ed6e95255 net: fill in MODULE_DESCRIPTION()s for DSA tags
W=1 builds now warn if module is built without a MODULE_DESCRIPTION().
Add descriptions to all the DSA tag modules.

The descriptions are copy/pasted Kconfig names, with s/^Tag/DSA tag/.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Acked-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Acked-by: Kurt Kanzenbach <kurt@linutronix.de>
Link: https://lore.kernel.org/r/20240104143759.1318137-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-05 08:06:19 -08:00
Jakub Kicinski
fc0caed81b net: fill in MODULE_DESCRIPTION()s for ATM
W=1 builds now warn if module is built without a MODULE_DESCRIPTION().
Add descriptions to all the ATM modules and drivers.

Link: https://lore.kernel.org/r/20240104143737.1317945-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-05 08:04:23 -08:00
Jiri Pirko
94e2557d08 net: sched: move block device tracking into tcf_block_get/put_ext()
Inserting the device to block xarray in qdisc_create() is not suitable
place to do this. As it requires use of tcf_block() callback, it causes
multiple issues. It is called for all qdisc types, which is incorrect.

So, instead, move it to more suitable place, which is tcf_block_get_ext()
and make sure it is only done for qdiscs that use block infrastructure
and also only for blocks which are shared.

Symmetrically, alter the cleanup path, move the xarray entry removal
into tcf_block_put_ext().

Fixes: 913b47d342 ("net/sched: Introduce tc block netdev tracking infra")
Reported-by: Ido Schimmel <idosch@nvidia.com>
Closes: https://lore.kernel.org/all/ZY1hBb8GFwycfgvd@shredder/
Reported-by: Kui-Feng Lee <sinquersw@gmail.com>
Closes: https://lore.kernel.org/all/ce8d3e55-b8bc-409c-ace9-5cf1c4f7c88e@gmail.com/
Reported-and-tested-by: syzbot+84339b9e7330daae4d66@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/0000000000007c85f5060dcc3a28@google.com/
Reported-and-tested-by: syzbot+806b0572c8d06b66b234@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/00000000000082f2f2060dcc3a92@google.com/
Reported-and-tested-by: syzbot+0039110f932d438130f9@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/0000000000007fbc8c060dcc3a5c@google.com/
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-05 11:19:08 +00:00
Jakub Kicinski
e63c1822ac Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

Conflicts:

drivers/net/ethernet/broadcom/bnxt/bnxt.c
  e009b2efb7 ("bnxt_en: Remove mis-applied code from bnxt_cfg_ntp_filters()")
  0f2b214779 ("bnxt_en: Fix compile error without CONFIG_RFS_ACCEL")
https://lore.kernel.org/all/20240105115509.225aa8a2@canb.auug.org.au/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04 18:06:46 -08:00
Jakub Kicinski
a180b0b1a6 Just a couple of more things over the holidays:
- first kunit tests for both cfg80211 and mac80211
  - a few multi-link fixes
  - DSCP mapping update
  - RCU fix
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAmWVcYQACgkQ10qiO8sP
 aABEdw//VOD/aR+ZXNwZBcRoJufHYCHu0h3gIKqEcB+W+e7dFs8OYQryHP/jo3AI
 brOjcP9Upooyk6h7TNszL/YvjLjRmEFnaz3mEb41xy62M2NGHyPcA5lB9J5gphpY
 uurhcj+SfxZeB0/YIVPR4Bwf/RTbDkWzJIWJ/f7mfXk2ELQby1ohzYCezK6p6f/p
 vP7w0zk4xFFKZChbyrccBGHl5/Q6oOFOeuwXE4h0J1skBWUCfxjUInGiU7fuQEWb
 FhJUzB1WSJPVKPzWuLC4bUvAcrwfv4JHCcjWECQyPr38+wrEj4DZle8XtLN/dEUB
 aE/2wKyFcbHmGTqRiBuaPhms439WKbave1yDEdOmuvcgyj2AMNkY6hod9Q5bWZJr
 L+MvJ50tTKJcUQNayA7pyGMdLr7lvMxkMsLPgPKGSLBMp/uqW/4SioXO4AqCfL7p
 T3Vw9Z1SMFI5WvSSRssIL8Sbl0LS+vMGEN76HZmxs5m4d0f5Hv9RRFFDmNqFyD4G
 iqKmYPi1XJc7QsoHtonyx+kByImFdl839RsU7WPh9GS045yNwlC3eWhZMgbuF/ol
 1iWNPcXogh9ABh8zgFaeWrLnDj2Kcix6vtec//7YM5cTYklyr0Ruo3bwKR3N00C3
 hNh5zVLd/LHoce6B0V5HAq+ajMnnGw6CT6pM7quF5srAJZLIKLM=
 =dHxp
 -----END PGP SIGNATURE-----

Merge tag 'wireless-next-2024-01-03' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next

Johannes Berg says:

====================
Just a couple of more things over the holidays:

 - first kunit tests for both cfg80211 and mac80211
 - a few multi-link fixes
 - DSCP mapping update
 - RCU fix

* tag 'wireless-next-2024-01-03' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next:
  wifi: mac80211: remove redundant ML element check
  wifi: cfg80211: parse all ML elements in an ML probe response
  wifi: cfg80211: correct comment about MLD ID
  wifi: cfg80211: Update the default DSCP-to-UP mapping
  wifi: cfg80211: tests: add some scanning related tests
  wifi: mac80211: kunit: extend MFP tests
  wifi: mac80211: kunit: generalize public action test
  wifi: mac80211: add kunit tests for public action handling
  kunit: add a convenience allocation wrapper for SKBs
  kunit: add parameter generation macro using description from array
  wifi: mac80211: fix spelling typo in comment
  wifi: cfg80211: fix RCU dereference in __cfg80211_bss_update
====================

Link: https://lore.kernel.org/r/20240103144423.52269-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04 17:00:08 -08:00
Linus Torvalds
1f874787ed Including fixes from wireless and netfilter.
Current release - regressions:
 
  - Revert "net: ipv6/addrconf: clamp preferred_lft to the minimum
    required", it caused issues on networks where routers send prefixes
    with preferred_lft=0
 
  - wifi:
    - iwlwifi: pcie: don't synchronize IRQs from IRQ, prevent deadlock
    - mac80211: fix re-adding debugfs entries during reconfiguration
 
 Current release - new code bugs:
 
  - tcp: print AO/MD5 messages only if there are any keys
 
 Previous releases - regressions:
 
  - virtio_net: fix missing dma unmap for resize, prevent OOM
 
 Previous releases - always broken:
 
  - mptcp: prevent tcp diag from closing listener subflows
 
  - nf_tables:
    - set transport header offset for egress hook, fix IPv4 mangling
    - skip set commit for deleted/destroyed sets, avoid double deactivation
 
  - nat: make sure action is set for all ct states, fix openvswitch
    matching on ICMP packets in related state
 
  - eth: mlxbf_gige: fix receive hang under heavy traffic
 
  - eth: r8169: fix PCI error on system resume for RTL8168FP
 
  - net: add missing getsockopt(SO_TIMESTAMPING_NEW) and cmsg handling
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmWW9ngACgkQMUZtbf5S
 Iru1fQ//fP/aAFlv65Vm5EQKFUxw4jGuNz214xQ7kaufyVxSy7CcUSRIZX7JLHmJ
 URRgIrYtuqoovIeNp5WqLNDcQgiuhpnnQamE2jnH4JiNdEtWADhUFwQVNle+zd6u
 9oGVSgOMYi10z26CPeQXTf97OtZH1HmowmnzdjvvgD0oUuCbxBfsfVjn7flnNY9O
 EePeMVasoFxFJasx1YnlNcVDAJsh3P/idp4nEkCrYcyBCebr8TkYFIDKy9q7U+xi
 +RfBwD5pEdBQD7bWTF9UlAM9R8bOVTQDsnjFhgXl+YFchg9RySi7ZYQCYWItIpEk
 oadYV4Bw4y9IFqoMDPKOsFCQvESNetSys7zIL9QoPDp1eEEl1JinYwdkz3xq5SE/
 sWN6XgERLOZHu5FlTwyEE4CHKofzW6wViFHsPnGSbdyfOjpnB1twuf/3hZWBMTuU
 Iza1m7kTjWuQTI9H8z4AuWL3Kyhn6ocGy2S0QJJNyUkBJ2w8/rFMUrSIsRjoUUX4
 y+UBznUt5OvsXyGlwISat/dYqtS5h7oVbAmLIlYi1yVURYQArUFTueeUy6y00YOd
 OymE3vOonoXxCbBNuHXWbd9C+RZZrPMoWab3K9DAvLXdZx1UHlDnscesMjTexwkB
 NxWAWobYyXBUCqyfNGsSzZ5Lc07w7ppqPP5uYQK7XGFWC+wJev0=
 =+se8
 -----END PGP SIGNATURE-----

Merge tag 'net-6.7-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from wireless and netfilter.

  We haven't accumulated much over the break. If it wasn't for the
  uninterrupted stream of fixes for Intel drivers this PR would be very
  slim. There was a handful of user reports, however, either they stood
  out because of the lower traffic or users have had more time to test
  over the break. The ones which are v6.7-relevant should be wrapped up.

  Current release - regressions:

   - Revert "net: ipv6/addrconf: clamp preferred_lft to the minimum
     required", it caused issues on networks where routers send prefixes
     with preferred_lft=0

   - wifi:
      - iwlwifi: pcie: don't synchronize IRQs from IRQ, prevent deadlock
      - mac80211: fix re-adding debugfs entries during reconfiguration

  Current release - new code bugs:

   - tcp: print AO/MD5 messages only if there are any keys

  Previous releases - regressions:

   - virtio_net: fix missing dma unmap for resize, prevent OOM

  Previous releases - always broken:

   - mptcp: prevent tcp diag from closing listener subflows

   - nf_tables:
      - set transport header offset for egress hook, fix IPv4 mangling
      - skip set commit for deleted/destroyed sets, avoid double deactivation

   - nat: make sure action is set for all ct states, fix openvswitch
     matching on ICMP packets in related state

   - eth: mlxbf_gige: fix receive hang under heavy traffic

   - eth: r8169: fix PCI error on system resume for RTL8168FP

   - net: add missing getsockopt(SO_TIMESTAMPING_NEW) and cmsg handling"

* tag 'net-6.7-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (52 commits)
  net/tcp: Only produce AO/MD5 logs if there are any keys
  net: Implement missing SO_TIMESTAMPING_NEW cmsg support
  bnxt_en: Remove mis-applied code from bnxt_cfg_ntp_filters()
  net: ravb: Wait for operating mode to be applied
  asix: Add check for usbnet_get_endpoints
  octeontx2-af: Re-enable MAC TX in otx2_stop processing
  octeontx2-af: Always configure NIX TX link credits based on max frame size
  net/smc: fix invalid link access in dumping SMC-R connections
  net/qla3xxx: fix potential memleak in ql_alloc_buffer_queues
  virtio_net: fix missing dma unmap for resize
  igc: Fix hicredit calculation
  ice: fix Get link status data length
  i40e: Restore VF MSI-X state during PCI reset
  i40e: fix use-after-free in i40e_aqc_add_filters()
  net: Save and restore msg_namelen in sock_sendmsg
  netfilter: nft_immediate: drop chain reference counter on error
  netfilter: nf_nat: fix action not being set for all ct states
  net: bcmgenet: Fix FCS generation for fragmented skbuffs
  mptcp: prevent tcp diag from closing listener subflows
  MAINTAINERS: add Geliang as reviewer for MPTCP
  ...
2024-01-04 16:34:50 -08:00
Jakub Kicinski
fe1eb24bd5 Revert "Introduce PHY listing and link_topology tracking"
This reverts commit 32bb4515e3.
This reverts commit d078d48063.
This reverts commit fcc4b105ca.
This reverts commit 345237dbc1.
This reverts commit 7db69ec9cf.
This reverts commit 95132a018f.
This reverts commit 63d5eaf35a.
This reverts commit c29451aefc.
This reverts commit 2ab0edb505.
This reverts commit dedd702a35.
This reverts commit 034fcc2103.
This reverts commit 9c5625f559.
This reverts commit 02018c544e.

Looks like we need more time for reviews, and incremental
changes will be hard to make sense of. So revert.

Link: https://lore.kernel.org/all/ZZP6FV5sXEf+xd58@shell.armlinux.org.uk/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04 16:05:47 -08:00
Jakub Kicinski
172b3fccf5 This pull request mainly brings support for dynamic associations in the
WPAN world. Thanks to the recent improvements it was possible to
 discover nearby devices, it is now also possible to associate with them
 to form a sub-network using a specific PAN ID. The support includes
 several functions, such as:
 * Requesting an association to a coordinator, waiting for the response
 * Sending a disassociation notification to a coordinator
 * Receiving an association request when we are coordinator, answering
   the request (for now all devices are accepted up to a limit, to be
   refined)
 * Sending a disassociation notification to a child
 * Users may request the list of associated devices (the parent and the
   children).
 
 Here are a few example of userspace calls that can be made:
 iwpan dev <dev> associate pan_id 2 coord $COORD
 iwpan dev <dev> list_associations
 iwpan dev <dev> disassociate ext_addr $COORD
 
 There are as well two patches from Uwe turning remove callbacks into
 void functions.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEE9HuaYnbmDhq/XIDIJWrqGEe9VoQFAmWCqiYACgkQJWrqGEe9
 VoQxEwf+KzU7esLoTgtIRG+wIQyMaBm8qZ/YNdWwgEiD+kwbybdy258etavumIMN
 HSKxRypXqJwR+R4b9l6yhrLmwn83HNR0L1oq6sHooBD+Zr90Ac8NNgONcF30iRQS
 jSiQ3A7H4kTqEaSsZe3olLotVnhTrLvL/TCXt83FPst551Fj0VSDyRnrbETEmq5F
 s+WJcB8rGYWY6avv5CbWcUDOPPq7yiZlpgcA+qTIDFMUfD90p+xqUZQ2hZKlscMB
 Ia/FPW3gUufWFGd16d1WBaHD38DoLobfo/6Ou/Sufp+ErW17ZOGl4wNnViXlOpwD
 0BUEO6vhtAx+83VYAC522PABWHkxyw==
 =THkg
 -----END PGP SIGNATURE-----

Merge tag 'ieee802154-for-net-next-2023-12-20' of gitolite.kernel.org:pub/scm/linux/kernel/git/wpan/wpan-next

Miquel Raynal says:

====================
This pull request mainly brings support for dynamic associations in
the WPAN world. Thanks to the recent improvements it was possible to
discover nearby devices, it is now also possible to associate with them
to form a sub-network using a specific PAN ID. The support includes
several functions, such as:

* Requesting an association to a coordinator, waiting for the response
* Sending a disassociation notification to a coordinator
* Receiving an association request when we are coordinator, answering
  the request (for now all devices are accepted up to a limit, to be
  refined)
* Sending a disassociation notification to a child
* Users may request the list of associated devices (the parent and the
  children).

Here are a few example of userspace calls that can be made:
 # iwpan dev <dev> associate pan_id 2 coord $COORD
 # iwpan dev <dev> list_associations
 # iwpan dev <dev> disassociate ext_addr $COORD

There are as well two patches from Uwe turning remove callbacks into
void functions.

* tag 'ieee802154-for-net-next-2023-12-20' of gitolite.kernel.org:pub/scm/linux/kernel/git/wpan/wpan-next:
  mac802154: Avoid new associations while disassociating
  ieee802154: Avoid confusing changes after associating
  mac802154: Only allow PAN controllers to process association requests
  mac802154: Use the PAN coordinator parameter when stamping packets
  mac80254: Provide real PAN coordinator info in beacons
  ieee802154: Give the user the association list
  mac802154: Handle disassociation notifications from peers
  mac802154: Follow the number of associated devices
  ieee802154: Add support for limiting the number of associated devices
  mac802154: Handle association requests from peers
  mac802154: Handle disassociations
  ieee802154: Add support for user disassociation requests
  mac802154: Handle associating
  ieee802154: Add support for user association requests
  ieee802154: Internal PAN management
  ieee802154: Let PAN IDs be reset
  ieee802154: hwsim: Convert to platform remove callback returning void
  ieee802154: fakelb: Convert to platform remove callback returning void
====================

Link: https://lore.kernel.org/r/20231220095556.4d9cef91@xps-13
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04 14:20:14 -08:00
Benjamin Coddington
57331a59ac NFSv4.1: Use the nfs_client's rpc timeouts for backchannel
For backchannel requests that lookup the appropriate nfs_client, use the
state-management rpc_clnt's rpc_timeout parameters for the backchannel's
response.  When the nfs_client cannot be found, fall back to using the
xprt's default timeout parameters.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Tested-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-01-04 17:01:01 -05:00
Benjamin Coddington
e6f533b615 SUNRPC: Fixup v4.1 backchannel request timeouts
After commit 59464b262f ("SUNRPC: SOFTCONN tasks should time out when on
the sending list"), any 4.1 backchannel tasks placed on the sending queue
would immediately return with -ETIMEDOUT since their req timers are zero.

Initialize the backchannel's rpc_rqst timeout parameters from the xprt's
default timeout settings.

Fixes: 59464b262f ("SUNRPC: SOFTCONN tasks should time out when on the sending list")
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Tested-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-01-04 17:00:52 -05:00
Quentin Deslandes
98e20e5e13 bpfilter: remove bpfilter
bpfilter was supposed to convert iptables filtering rules into
BPF programs on the fly, from the kernel, through a usermode
helper. The base code for the UMH was introduced in 2018, and
couple of attempts (2, 3) tried to introduce the BPF program
generate features but were abandoned.

bpfilter now sits in a kernel tree unused and unusable, occasionally
causing confusion amongst Linux users (4, 5).

As bpfilter is now developed in a dedicated repository on GitHub (6),
it was suggested a couple of times this year (LSFMM/BPF 2023,
LPC 2023) to remove the deprecated kernel part of the project. This
is the purpose of this patch.

[1]: https://lore.kernel.org/lkml/20180522022230.2492505-1-ast@kernel.org/
[2]: https://lore.kernel.org/bpf/20210829183608.2297877-1-me@ubique.spb.ru/#t
[3]: https://lore.kernel.org/lkml/20221224000402.476079-1-qde@naccy.de/
[4]: https://dxuuu.xyz/bpfilter.html
[5]: https://github.com/linuxkit/linuxkit/pull/3904
[6]: https://github.com/facebook/bpfilter

Signed-off-by: Quentin Deslandes <qde@naccy.de>
Link: https://lore.kernel.org/r/20231226130745.465988-1-qde@naccy.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-04 10:23:10 -08:00
Thomas Lange
382a32018b net: Implement missing SO_TIMESTAMPING_NEW cmsg support
Commit 9718475e69 ("socket: Add SO_TIMESTAMPING_NEW") added the new
socket option SO_TIMESTAMPING_NEW. However, it was never implemented in
__sock_cmsg_send thus breaking SO_TIMESTAMPING cmsg for platforms using
SO_TIMESTAMPING_NEW.

Fixes: 9718475e69 ("socket: Add SO_TIMESTAMPING_NEW")
Link: https://lore.kernel.org/netdev/6a7281bf-bc4a-4f75-bb88-7011908ae471@app.fastmail.com/
Signed-off-by: Thomas Lange <thomas@corelatus.se>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20240104085744.49164-1-thomas@corelatus.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04 08:18:55 -08:00
Olga Kornievskaia
98b4e51375 SUNRPC: fix _xprt_switch_find_current_entry logic
Fix the logic for picking current transport entry.

Fixes: 95d0d30c66 ("SUNRPC create an iterator to list only OFFLINE xprts")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-01-04 10:47:56 -05:00
Anna Schumaker
31b6290869 SUNRPC: Fix a suspicious RCU usage warning
I received the following warning while running cthon against an ontap
server running pNFS:

[   57.202521] =============================
[   57.202522] WARNING: suspicious RCU usage
[   57.202523] 6.7.0-rc3-g2cc14f52aeb7 #41492 Not tainted
[   57.202525] -----------------------------
[   57.202525] net/sunrpc/xprtmultipath.c:349 RCU-list traversed in non-reader section!!
[   57.202527]
               other info that might help us debug this:

[   57.202528]
               rcu_scheduler_active = 2, debug_locks = 1
[   57.202529] no locks held by test5/3567.
[   57.202530]
               stack backtrace:
[   57.202532] CPU: 0 PID: 3567 Comm: test5 Not tainted 6.7.0-rc3-g2cc14f52aeb7 #41492 5b09971b4965c0aceba19f3eea324a4a806e227e
[   57.202534] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 2/2/2022
[   57.202536] Call Trace:
[   57.202537]  <TASK>
[   57.202540]  dump_stack_lvl+0x77/0xb0
[   57.202551]  lockdep_rcu_suspicious+0x154/0x1a0
[   57.202556]  rpc_xprt_switch_has_addr+0x17c/0x190 [sunrpc ebe02571b9a8ceebf7d98e71675af20c19bdb1f6]
[   57.202596]  rpc_clnt_setup_test_and_add_xprt+0x50/0x180 [sunrpc ebe02571b9a8ceebf7d98e71675af20c19bdb1f6]
[   57.202621]  ? rpc_clnt_add_xprt+0x254/0x300 [sunrpc ebe02571b9a8ceebf7d98e71675af20c19bdb1f6]
[   57.202646]  rpc_clnt_add_xprt+0x27a/0x300 [sunrpc ebe02571b9a8ceebf7d98e71675af20c19bdb1f6]
[   57.202671]  ? __pfx_rpc_clnt_setup_test_and_add_xprt+0x10/0x10 [sunrpc ebe02571b9a8ceebf7d98e71675af20c19bdb1f6]
[   57.202696]  nfs4_pnfs_ds_connect+0x345/0x760 [nfsv4 c716d88496ded0ea6d289bbea684fa996f9b57a9]
[   57.202728]  ? __pfx_nfs4_test_session_trunk+0x10/0x10 [nfsv4 c716d88496ded0ea6d289bbea684fa996f9b57a9]
[   57.202754]  nfs4_fl_prepare_ds+0x75/0xc0 [nfs_layout_nfsv41_files e3a4187f18ae8a27b630f9feae6831b584a9360a]
[   57.202760]  filelayout_write_pagelist+0x4a/0x200 [nfs_layout_nfsv41_files e3a4187f18ae8a27b630f9feae6831b584a9360a]
[   57.202765]  pnfs_generic_pg_writepages+0xbe/0x230 [nfsv4 c716d88496ded0ea6d289bbea684fa996f9b57a9]
[   57.202788]  __nfs_pageio_add_request+0x3fd/0x520 [nfs 6c976fa593a7c2976f5a0aeb4965514a828e6902]
[   57.202813]  nfs_pageio_add_request+0x18b/0x390 [nfs 6c976fa593a7c2976f5a0aeb4965514a828e6902]
[   57.202831]  nfs_do_writepage+0x116/0x1e0 [nfs 6c976fa593a7c2976f5a0aeb4965514a828e6902]
[   57.202849]  nfs_writepages_callback+0x13/0x30 [nfs 6c976fa593a7c2976f5a0aeb4965514a828e6902]
[   57.202866]  write_cache_pages+0x265/0x450
[   57.202870]  ? __pfx_nfs_writepages_callback+0x10/0x10 [nfs 6c976fa593a7c2976f5a0aeb4965514a828e6902]
[   57.202891]  nfs_writepages+0x141/0x230 [nfs 6c976fa593a7c2976f5a0aeb4965514a828e6902]
[   57.202913]  do_writepages+0xd2/0x230
[   57.202917]  ? filemap_fdatawrite_wbc+0x5c/0x80
[   57.202921]  filemap_fdatawrite_wbc+0x67/0x80
[   57.202924]  filemap_write_and_wait_range+0xd9/0x170
[   57.202930]  nfs_wb_all+0x49/0x180 [nfs 6c976fa593a7c2976f5a0aeb4965514a828e6902]
[   57.202947]  nfs4_file_flush+0x72/0xb0 [nfsv4 c716d88496ded0ea6d289bbea684fa996f9b57a9]
[   57.202969]  __se_sys_close+0x46/0xd0
[   57.202972]  do_syscall_64+0x68/0x100
[   57.202975]  ? do_syscall_64+0x77/0x100
[   57.202976]  ? do_syscall_64+0x77/0x100
[   57.202979]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
[   57.202982] RIP: 0033:0x7fe2b12e4a94
[   57.202985] Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 80 3d d5 18 0e 00 00 74 13 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 44 c3 0f 1f 00 48 83 ec 18 89 7c 24 0c e8 c3
[   57.202987] RSP: 002b:00007ffe857ddb38 EFLAGS: 00000202 ORIG_RAX: 0000000000000003
[   57.202989] RAX: ffffffffffffffda RBX: 00007ffe857dfd68 RCX: 00007fe2b12e4a94
[   57.202991] RDX: 0000000000002000 RSI: 00007ffe857ddc40 RDI: 0000000000000003
[   57.202992] RBP: 00007ffe857dfc50 R08: 7fffffffffffffff R09: 0000000065650f49
[   57.202993] R10: 00007fe2b11f8300 R11: 0000000000000202 R12: 0000000000000000
[   57.202994] R13: 00007ffe857dfd80 R14: 00007fe2b1445000 R15: 0000000000000000
[   57.202999]  </TASK>

The problem seems to be that two out of three callers aren't taking the
rcu_read_lock() before calling the list_for_each_entry_rcu() function in
rpc_xprt_switch_has_addr(). I fix this by having
rpc_xprt_switch_has_addr() unconditionaly take the rcu_read_lock(),
which is okay to do recursively in the case that the lock has already
been taken by a caller.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-01-04 10:47:56 -05:00
Anna Schumaker
a902f3dec7 SUNRPC: Create a helper function for accessing the rpc_clnt's xprt_switch
This function takes the necessary rcu read lock to dereference the
client's rpc_xprt_switch and bump the reference count so it doesn't
disappear underneath us before returning. This does mean that callers
are responsible for calling xprt_switch_put() on the returned object
when they are done with it.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-01-04 10:47:56 -05:00
Anna Schumaker
5f1e77b228 SUNRPC: Remove unused function rpc_clnt_xprt_switch_put()
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-01-04 10:47:56 -05:00
Anna Schumaker
ec677b58f6 SUNRPC: Clean up unused variable in rpc_xprt_probe_trunked()
We don't use the rpc_xprt_switch anywhere in this function, so let's not
take an extra reference to in unnecessarily.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-01-04 10:47:56 -05:00
Eric Dumazet
a562c0a2d6 sctp: fix busy polling
Busy polling while holding the socket lock makes litle sense,
because incoming packets wont reach our receive queue.

Fixes: 8465a5fcd1 ("sctp: add support for busy polling to sctp protocol")
Reported-by: Jacob Moroni <jmoroni@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Cc: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-04 10:29:18 +00:00
Mina Almasry
b15a4cfe10 net: kcm: fix direct access to bv_len
Minor fix for kcm: code wanting to access the fields inside an skb
frag should use the skb_frag_*() helpers, instead of accessing the
fields directly.

Signed-off-by: Mina Almasry <almasrymina@google.com>
Link: https://lore.kernel.org/r/20240102205959.794513-1-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-03 18:37:22 -08:00
Mina Almasry
06d9b446c4 vsock/virtio: use skb_frag_*() helpers
Minor fix for virtio: code wanting to access the fields inside an skb
frag should use the skb_frag_*() helpers, instead of accessing the
fields directly. This allows for extensions where the underlying
memory is not a page.

Acked-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Mina Almasry <almasrymina@google.com>
Link: https://lore.kernel.org/r/20240102205905.793738-1-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-03 18:37:16 -08:00
Pedro Tammela
530496985c net/sched: sch_api: conditional netlink notifications
Implement conditional netlink notifications for Qdiscs and classes,
which were missing in the initial patches that targeted tc filters and
actions. Notifications will only be built after passing a check for
'rtnl_notify_needed()'.

For both Qdiscs and classes 'get' operations now call a dedicated
notification function as it was not possible to distinguish between
'create' and 'get' before. This distinction is necessary because 'get'
always send a notification.

Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20231229132642.1489088-2-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-03 18:36:30 -08:00
Pedro Tammela
c2a67de9bb net/sched: introduce ACT_P_BOUND return code
Bound actions always return '0' and as of today we rely on '0'
being returned in order to properly skip bound actions in
tcf_idr_insert_many. In order to further improve maintainability,
introduce the ACT_P_BOUND return code.

Actions are updated to return 'ACT_P_BOUND' instead of plain '0'.
tcf_idr_insert_many is then updated to check for 'ACT_P_BOUND'.

Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20231229132642.1489088-1-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-03 18:36:24 -08:00
Eric Dumazet
d3d344a1ca net-device: move xdp_prog to net_device_read_rx
xdp_prog is used in receive path, both from XDP enabled drivers
and from netif_elide_gro().

This patch also removes two 4-bytes holes.

Fixes: 43a71cd66b ("net-device: reorganize net_device fast path variables")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Coco Li <lixiaoyan@google.com>
Cc: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240102162220.750823-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-03 18:09:14 -08:00
Jakub Kicinski
cbc74fc025 netfilter pull request 24-01-03
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEN9lkrMBJgcdVAPub1V2XiooUIOQFAmWVRBgACgkQ1V2XiooU
 IOQKmQ//RxsxlOcFc0R1HbwUDduB31Yl05A30FbPmzN7Ma9I/XT3oPpWxExcBb6z
 baGjK7rlfJk6BOfwo8sHr+Fsz6nvnKTvKzxdNMRt42KD0KS+x/YjOWYaBJcULWRd
 4zRNQe5bWBu/BWBna05YnuQ0w0u3aXw6F/IWt9d+lObqILSpvNTk9Ju8vHjmOxWO
 pa5JhtIhrPNAp+DOaSiCR4wA/XJnj9+Io0h65Cq6GM/GZYeV18fNID6e22IIfojQ
 GAg6FjS4zeROAk+/iymaAtV9hbnXNLIeJwJVI34edJPjbWK7kzuxpRd1l8WzBU+P
 rNWcYJTxALsMh4Ger+oaSXhIExvTJr3yJpZWtwBXGKL9SDbKBJlCiO1I/+fkr4N8
 wfxKzC93AzLdRze4CK8r36veuZaAbsQuhgA3W1RiTZjdwBdo0CHJe8tIW4/qxXgE
 4F1vjdoA2q7u0DM+GVZ6FWe1B3mQWb3XD42WzdUJeHhpfKFxQ61mQ+35+TqNYq6+
 TeNleGF7BkpAjsFl0Gadhj9TQYSAPpY6rFaRzPjy1aFvHZdmPZhhyr8kCZQDjQHC
 zWIxYwfy2WyT4FLEkJsJxCwc674ehnJpMSWKt1/vduZ8o+fBasqQaikKu0I/01p1
 dTRp6pBC2Sazz582DFKwUVyL4R72uOf6NcFDsM2DMNZRijHWUMY=
 =BsTS
 -----END PGP SIGNATURE-----

Merge tag 'nf-24-01-03' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Fix nat packets in the related state in OVS, from Brad Cowie.

2) Drop chain reference counter on error path in case chain binding
   fails.

* tag 'nf-24-01-03' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nft_immediate: drop chain reference counter on error
  netfilter: nf_nat: fix action not being set for all ct states
====================

Link: https://lore.kernel.org/r/20240103113001.137936-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-03 18:05:23 -08:00
Wen Gu
9dbe086c69 net/smc: fix invalid link access in dumping SMC-R connections
A crash was found when dumping SMC-R connections. It can be reproduced
by following steps:

- environment: two RNICs on both sides.
- run SMC-R between two sides, now a SMC_LGR_SYMMETRIC type link group
  will be created.
- set the first RNIC down on either side and link group will turn to
  SMC_LGR_ASYMMETRIC_LOCAL then.
- run 'smcss -R' and the crash will be triggered.

 BUG: kernel NULL pointer dereference, address: 0000000000000010
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 8000000101fdd067 P4D 8000000101fdd067 PUD 10ce46067 PMD 0
 Oops: 0000 [#1] PREEMPT SMP PTI
 CPU: 3 PID: 1810 Comm: smcss Kdump: loaded Tainted: G W   E      6.7.0-rc6+ #51
 RIP: 0010:__smc_diag_dump.constprop.0+0x36e/0x620 [smc_diag]
 Call Trace:
  <TASK>
  ? __die+0x24/0x70
  ? page_fault_oops+0x66/0x150
  ? exc_page_fault+0x69/0x140
  ? asm_exc_page_fault+0x26/0x30
  ? __smc_diag_dump.constprop.0+0x36e/0x620 [smc_diag]
  smc_diag_dump_proto+0xd0/0xf0 [smc_diag]
  smc_diag_dump+0x26/0x60 [smc_diag]
  netlink_dump+0x19f/0x320
  __netlink_dump_start+0x1dc/0x300
  smc_diag_handler_dump+0x6a/0x80 [smc_diag]
  ? __pfx_smc_diag_dump+0x10/0x10 [smc_diag]
  sock_diag_rcv_msg+0x121/0x140
  ? __pfx_sock_diag_rcv_msg+0x10/0x10
  netlink_rcv_skb+0x5a/0x110
  sock_diag_rcv+0x28/0x40
  netlink_unicast+0x22a/0x330
  netlink_sendmsg+0x240/0x4a0
  __sock_sendmsg+0xb0/0xc0
  ____sys_sendmsg+0x24e/0x300
  ? copy_msghdr_from_user+0x62/0x80
  ___sys_sendmsg+0x7c/0xd0
  ? __do_fault+0x34/0x1a0
  ? do_read_fault+0x5f/0x100
  ? do_fault+0xb0/0x110
  __sys_sendmsg+0x4d/0x80
  do_syscall_64+0x45/0xf0
  entry_SYSCALL_64_after_hwframe+0x6e/0x76

When the first RNIC is set down, the lgr->lnk[0] will be cleared and an
asymmetric link will be allocated in lgr->link[SMC_LINKS_PER_LGR_MAX - 1]
by smc_llc_alloc_alt_link(). Then when we try to dump SMC-R connections
in __smc_diag_dump(), the invalid lgr->lnk[0] will be accessed, resulting
in this issue. So fix it by accessing the right link.

Fixes: f16a7dd5cf ("smc: netlink interface for SMC sockets")
Reported-by: henaumars <henaumars@sina.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=7616
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Link: https://lore.kernel.org/r/1703662835-53416-1-git-send-email-guwen@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-03 16:53:17 -08:00
John Fastabend
16b2f26498 bpf: sockmap, fix proto update hook to avoid dup calls
When sockets are added to a sockmap or sockhash we allocate and init a
psock. Then update the proto ops with sock_map_init_proto the flow is

  sock_hash_update_common
    sock_map_link
      psock = sock_map_psock_get_checked() <-returns existing psock
      sock_map_init_proto(sk, psock)       <- updates sk_proto

If the socket is already in a map this results in the sock_map_init_proto
being called multiple times on the same socket. We do this because when
a socket is added to multiple maps this might result in a new set of BPF
programs being attached to the socket requiring an updated ops struct.

This creates a rule where it must be safe to call psock_update_sk_prot
multiple times. When we added a fix for UAF through unix sockets in patch
4dd9a38a753fc we broke this rule by adding a sock_hold in that path
to ensure the sock is not released. The result is if a af_unix stream sock
is placed in multiple maps it results in a memory leak because we call
sock_hold multiple times with only a single sock_put on it.

Fixes: 8866730aed ("bpf, sockmap: af_unix stream sockets need to hold ref for pair sock")
Reported-by: Xingwei Lee <xrivendell7@gmail.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/r/20231221232327.43678-2-john.fastabend@gmail.com
2024-01-03 16:50:06 -08:00
Zhengchao Shao
b4c1d4d973 fib: remove unnecessary input parameters in fib_default_rule_add
When fib_default_rule_add is invoked, the value of the input parameter
'flags' is always 0. Rules uses kzalloc to allocate memory, so 'flags' has
been initialized to 0. Therefore, remove the input parameter 'flags' in
fib_default_rule_add.

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240102071519.3781384-1-shaozhengchao@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-03 16:42:48 -08:00
Johannes Berg
3aca362a4c wifi: mac80211: remove redundant ML element check
If "ml_basic" is assigned, we already know that the type
of ML element is basic, so we don't need to check again,
that check can never happen. Simplify the code.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Gregory Greenman <gregory.greenman@intel.com>
Reviewed-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240102213313.bb9b636e66f6.I7fc0897022142d46f39ac0b912a4f7b0f1b6ea26@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-01-03 15:35:38 +01:00
Benjamin Berg
d18125b640 wifi: cfg80211: parse all ML elements in an ML probe response
A probe response from a transmitting AP in an Multi-BSSID setup will
contain more than one Multi-Link element. Most likely, only one of these
elements contains per-STA profiles.

Fixes: 2481b5da9c ("wifi: cfg80211: handle BSS data contained in ML probe responses")
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240102213313.6635eb152735.I94289002d4a2f7b6b44dfa428344854e37b0b29c@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-01-03 15:35:36 +01:00
Benjamin Berg
2a0698f86d wifi: cfg80211: correct comment about MLD ID
The comment was referencing the wrong section of the documentation and
was also subtly wrong as it assumed the rules that apply when sending
probe requests directly to a nontransmitted AP. However, in that case
the response comes from the transmitting AP and the AP MLD ID will be
included.

Fixes: 2481b5da9c ("wifi: cfg80211: handle BSS data contained in ML probe responses")
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240102213313.0917ab4b5d7f.I76aff0e261a5de44ffb467e591a46597a30d7c0a@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-01-03 15:35:31 +01:00
Ilan Peer
6fdb8b8781 wifi: cfg80211: Update the default DSCP-to-UP mapping
The default DSCP-to-UP mapping method defined in RFC8325
applied to packets marked per recommendations in RFC4594 and
destined to 802.11 WLAN clients will yield a number of inconsistent
QoS mappings.

To handle this, modify the mapping of specific DSCP values for
which the default mapping will create inconsistencies, based on
the recommendations in section 4 in RFC8325.

Note: RFC8235 is used as it referenced by both IEEE802.11Revme_D4.0
and WFA QoS Management Specification.

Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Reviewed-by: Gregory Greenman <gregory.greenman@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20231218093005.3064013-1-ilan.peer@intel.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-01-03 15:35:26 +01:00
Benjamin Berg
9d027a35a5 wifi: cfg80211: tests: add some scanning related tests
This adds some scanning related tests, mainly exercising the ML element
parsing and inheritance.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://msgid.link/20231220151952.415232-7-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-01-03 15:35:22 +01:00
Johannes Berg
bbd97bbed0 wifi: mac80211: kunit: extend MFP tests
Extend the MFP tests to handle the case of deauth/disassoc
and robust action frames (that are not protected dual of
public action frames).

Reviewed-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://msgid.link/20231220151952.415232-6-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-01-03 15:35:19 +01:00
Johannes Berg
951c4684a3 wifi: mac80211: kunit: generalize public action test
Generalize the test to be able to handle arbitrary
action categories and non-action frames, for further
test expansion.

Reviewed-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://msgid.link/20231220151952.415232-5-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-01-03 15:35:13 +01:00
Johannes Berg
0738e55c38 wifi: mac80211: add kunit tests for public action handling
Check the logic in ieee80211_drop_unencrypted_mgmt()
according to a list of test cases derived from the
spec.

Reviewed-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://msgid.link/20231220151952.415232-4-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-01-03 15:35:09 +01:00
Zheng tan
a5bb4e1a37 wifi: mac80211: fix spelling typo in comment
Fix spelling of "attributes" in a comment.

Reported-by: k2ci <kernel-bot@kylinos.cn>
Signed-off-by: Zheng tan <tanzheng@kylinos.cn>
Link: https://msgid.link/20240102015418.3673858-1-tanzheng@kylinos.cn
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-01-03 15:34:56 +01:00
Edward Adam Davis
1184950e34 wifi: cfg80211: fix RCU dereference in __cfg80211_bss_update
Replace rcu_dereference() with rcu_access_pointer() since we hold
the lock here (and aren't in an RCU critical section).

Fixes: 32af9a9e10 ("wifi: cfg80211: free beacon_ies when overridden from hidden BSS")
Reported-and-tested-by: syzbot+864a269c27ee06b58374@syzkaller.appspotmail.com
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
Link: https://msgid.link/tencent_BF8F0DF0258C8DBF124CDDE4DD8D992DCF07@qq.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-01-03 15:34:53 +01:00
Lin Ma
2ab1efad60 net/sched: cls_api: complement tcf_tfilter_dump_policy
In function `tc_dump_tfilter`, the attributes array is parsed via
tcf_tfilter_dump_policy which only describes TCA_DUMP_FLAGS. However,
the NLA TCA_CHAIN is also accessed with `nla_get_u32`.

The access to TCA_CHAIN is introduced in commit 5bc1701881 ("net:
sched: introduce multichain support for filters") and no nla_policy is
provided for parsing at that point. Later on, tcf_tfilter_dump_policy is
introduced in commit f8ab1807a9 ("net: sched: introduce terse dump
flag") while still ignoring the fact that TCA_CHAIN needs a check. This
patch does that by complementing the policy to allow the access
discussed here can be safe as other cases just choose rtm_tca_policy as
the parsing policy.

Signed-off-by: Lin Ma <linma@zju.edu.cn>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-03 11:54:39 +00:00
Marc Dionne
01b2885d94 net: Save and restore msg_namelen in sock_sendmsg
Commit 86a7e0b69b ("net: prevent rewrite of msg_name in
sock_sendmsg()") made sock_sendmsg save the incoming msg_name pointer
and restore it before returning, to insulate the caller against
msg_name being changed by the called code.  If the address length
was also changed however, we may return with an inconsistent structure
where the length doesn't match the address, and attempts to reuse it may
lead to lost packets.

For example, a kernel that doesn't have commit 1c5950fc6f ("udp6: fix
potential access to stale information") will replace a v4 mapped address
with its ipv4 equivalent, and shorten namelen accordingly from 28 to 16.
If the caller attempts to reuse the resulting msg structure, it will have
the original ipv6 (v4 mapped) address but an incorrect v4 length.

Fixes: 86a7e0b69b ("net: prevent rewrite of msg_name in sock_sendmsg()")
Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-03 11:37:57 +00:00
Pablo Neira Ayuso
b29be0ca8e netfilter: nft_immediate: drop chain reference counter on error
In the init path, nft_data_init() bumps the chain reference counter,
decrement it on error by following the error path which calls
nft_data_release() to restore it.

Fixes: 4bedf9eee0 ("netfilter: nf_tables: fix chain binding transaction logic")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-01-03 11:17:17 +01:00