1250765 Commits

Author SHA1 Message Date
Josef Bacik
4b14885411 nfsd: make all of the nfsd stats per-network namespace
We have a global set of counters that we modify for all of the nfsd
operations, but now that we're exposing these stats across all network
namespaces we need to make the stats also be per-network namespace.  We
already have some caching stats that are per-network namespace, so move
these definitions into the same counter and then adjust all the helpers
and users of these stats to provide the appropriate nfsd_net struct so
that the stats are maintained for the per-network namespace objects.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01 09:12:10 -05:00
Josef Bacik
93483ac5fe nfsd: expose /proc/net/sunrpc/nfsd in net namespaces
We are running nfsd servers inside of containers with their own network
namespace, and we want to monitor these services using the stats found
in /proc.  However these are not exposed in the proc inside of the
container, so we have to bind mount the host /proc into our containers
to get at this information.

Separate out the stat counters init and the proc registration, and move
the proc registration into the pernet operations entry and exit points
so that these stats can be exposed inside of network namespaces.

This is an intermediate step, this just exposes the global counters in
the network namespace.  Subsequent patches will move these counters into
the per-network namespace container.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01 09:12:10 -05:00
Josef Bacik
d98416cc21 nfsd: rename NFSD_NET_* to NFSD_STATS_*
We're going to merge the stats all into per network namespace in
subsequent patches, rename these nn counters to be consistent with the
rest of the stats.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01 09:12:10 -05:00
Josef Bacik
418b9687de sunrpc: use the struct net as the svc proc private
nfsd is the only thing using this helper, and it doesn't use the private
currently.  When we switch to per-network namespace stats we will need
the struct net * in order to get to the nfsd_net.  Use the net as the
proc private so we can utilize this when we make the switch over.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01 09:12:09 -05:00
Josef Bacik
3f6ef182f1 sunrpc: remove ->pg_stats from svc_program
Now that this isn't used anywhere, remove it.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01 09:12:09 -05:00
Josef Bacik
f094323867 sunrpc: pass in the sv_stats struct through svc_create_pooled
Since only one service actually reports the rpc stats there's not much
of a reason to have a pointer to it in the svc_program struct.  Adjust
the svc_create_pooled function to take the sv_stats as an argument and
pass the struct through there as desired instead of getting it from the
svc_program->pg_stats.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01 09:12:08 -05:00
Josef Bacik
a2214ed588 nfsd: stop setting ->pg_stats for unused stats
A lot of places are setting a blank svc_stats in ->pg_stats and never
utilizing these stats.  Remove all of these extra structs as we're not
reporting these stats anywhere.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01 09:12:08 -05:00
Josef Bacik
ab42f4d9a2 sunrpc: don't change ->sv_stats if it doesn't exist
We check for the existence of ->sv_stats elsewhere except in the core
processing code.  It appears that only nfsd actual exports these values
anywhere, everybody else just has a write only copy of sv_stats in their
svc_program.  Add a check for ->sv_stats before every adjustment to
allow us to eliminate the stats struct from all the users who don't
report the stats.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01 09:12:07 -05:00
Jorge Mora
31e4bb8fb8 NFSD: fix LISTXATTRS returning more bytes than maxcount
The maxcount is the maximum number of bytes for the LISTXATTRS4resok
result. This includes the cookie and the count for the name array,
thus subtract 12 bytes from the maxcount: 8 (cookie) + 4 (array count)
when filling up the name array.

Fixes: 23e50fe3a5e6 ("nfsd: implement the xattr functions and en/decode logic")
Signed-off-by: Jorge Mora <mora@netapp.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01 09:12:07 -05:00
Jorge Mora
2f73f37d66 NFSD: fix LISTXATTRS returning a short list with eof=TRUE
If the XDR buffer is not large enough to fit all attributes
and the remaining bytes left in the XDR buffer (xdrleft) is
equal to the number of bytes for the current attribute, then
the loop will prematurely exit without setting eof to FALSE.
Also in this case, adding the eof flag to the buffer will
make the reply 4 bytes larger than lsxa_maxcount.

Need to check if there are enough bytes to fit not only the
next attribute name but also the eof as well.

Fixes: 23e50fe3a5e6 ("nfsd: implement the xattr functions and en/decode logic")
Signed-off-by: Jorge Mora <mora@netapp.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01 09:12:07 -05:00
Jorge Mora
61ab5e0758 NFSD: change LISTXATTRS cookie encoding to big-endian
Function nfsd4_listxattr_validate_cookie() expects the cookie
as an offset to the list thus it needs to be encoded in big-endian.

Fixes: 23e50fe3a5e6 ("nfsd: implement the xattr functions and en/decode logic")
Signed-off-by: Jorge Mora <mora@netapp.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01 09:12:06 -05:00
Jorge Mora
52a357db80 NFSD: fix nfsd4_listxattr_validate_cookie
If LISTXATTRS is sent with a correct cookie but a small maxcount,
this could lead function nfsd4_listxattr_validate_cookie to
return NFS4ERR_BAD_COOKIE. If maxcount = 20, then second check
on function gives RHS = 3 thus any cookie larger than 3 returns
NFS4ERR_BAD_COOKIE.

There is no need to validate the cookie on the return XDR buffer
since attribute referenced by cookie will be the first in the
return buffer.

Fixes: 23e50fe3a5e6 ("nfsd: implement the xattr functions and en/decode logic")
Signed-off-by: Jorge Mora <mora@netapp.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01 09:12:06 -05:00
NeilBrown
5ff318f645 nfsd: use __fput_sync() to avoid delayed closing of files.
Calling fput() directly or though filp_close() from a kernel thread like
nfsd causes the final __fput() (if necessary) to be called from a
workqueue.  This means that nfsd is not forced to wait for any work to
complete.  If the ->release or ->destroy_inode function is slow for any
reason, this can result in nfsd closing files more quickly than the
workqueue can complete the close and the queue of pending closes can
grow without bounces (30 million has been seen at one customer site,
though this was in part due to a slowness in xfs which has since been
fixed).

nfsd does not need this.  It is quite appropriate and safe for nfsd to
do its own close work.  There is no reason that close should ever wait
for nfsd, so no deadlock can occur.

It should be safe and sensible to change all fput() calls to
__fput_sync().  However in the interests of caution this patch only
changes two - the two that can be most directly affected by client
behaviour and could occur at high frequency.

- the fput() implicitly in flip_close() is changed to __fput_sync()
  by calling get_file() first to ensure filp_close() doesn't do
  the final fput() itself.  If is where files opened for IO are closed.

- the fput() in nfsd_read() is also changed.  This is where directories
  opened for readdir are closed.

This ensure that minimal fput work is queued to the workqueue.

This removes the need for the flush_delayed_fput() call in
nfsd_file_close_inode_sync()

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01 09:12:05 -05:00
NeilBrown
ffb4025961 nfsd: Don't leave work of closing files to a work queue
The work of closing a file can have non-trivial cost.  Doing it in a
separate work queue thread means that cost isn't imposed on the nfsd
threads and an imbalance can be created.  This can result in files being
queued for the work queue more quickly that the work queue can process
them, resulting in unbounded growth of the queue and memory exhaustion.

To avoid this work imbalance that exhausts memory, this patch moves all
closing of files into the nfsd threads.  This means that when the work
imposes a cost, that cost appears where it would be expected - in the
work of the nfsd thread.  A subsequent patch will ensure the final
__fput() is called in the same (nfsd) thread which calls filp_close().

Files opened for NFSv3 are never explicitly closed by the client and are
kept open by the server in the "filecache", which responds to memory
pressure, is garbage collected even when there is no pressure, and
sometimes closes files when there is particular need such as for rename.
These files currently have filp_close() called in a dedicated work
queue, so their __fput() can have no effect on nfsd threads.

This patch discards the work queue and instead has each nfsd thread call
flip_close() on as many as 8 files from the filecache each time it acts
on a client request (or finds there are no pending client requests).  If
there are more to be closed, more threads are woken.  This spreads the
work of __fput() over multiple threads and imposes any cost on those
threads.

The number 8 is somewhat arbitrary.  It needs to be greater than 1 to
ensure that files are closed more quickly than they can be added to the
cache.  It needs to be small enough to limit the per-request delays that
will be imposed on clients when all threads are busy closing files.

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01 09:12:05 -05:00
Chuck Lever
561141dd49 SUNRPC: Use a static buffer for the checksum initialization vector
Allocating and zeroing a buffer during every call to
krb5_etm_checksum() is inefficient. Instead, set aside a static
buffer that is the maximum crypto block size, and use a portion
(or all) of that.

Reported-by: Markus Elfring <Markus.Elfring@web.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01 09:12:05 -05:00
Zhipeng Lu
3cfcfc102a SUNRPC: fix some memleaks in gssx_dec_option_array
The creds and oa->data need to be freed in the error-handling paths after
their allocation. So this patch add these deallocations in the
corresponding paths.

Fixes: 1d658336b05f ("SUNRPC: Add RPC based upcall mechanism for RPCGSS auth")
Signed-off-by: Zhipeng Lu <alexious@zju.edu.cn>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01 09:12:04 -05:00
Zhipeng Lu
e67b652d8e SUNRPC: fix a memleak in gss_import_v2_context
The ctx->mech_used.data allocated by kmemdup is not freed in neither
gss_import_v2_context nor it only caller gss_krb5_import_sec_context,
which frees ctx on error.

Thus, this patch reform the last call of gss_import_v2_context to the
gss_krb5_import_ctx_v2, preventing the memleak while keepping the return
formation.

Fixes: 47d848077629 ("gss_krb5: handle new context format from gssd")
Signed-off-by: Zhipeng Lu <alexious@zju.edu.cn>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01 09:12:04 -05:00
Linus Torvalds
d206a76d7d Linux 6.8-rc6 v6.8-rc6 2024-02-25 15:46:06 -08:00
Linus Torvalds
e231dbd452 bcachefs fixes for 6.8-rc6
Some more mostly boring fixes, but some not
 
 User reported ones:
  - the BTREE_ITER_FILTER_SNAPSHOTS one fixes a really nasty performance
    bug; user reported an unter initially taking 2 seconds and then ~2
    minutes
 
  - kill a __GFP_NOFAIL in the buffered read path; this was a leftover
    from the trickier fix to kill __GFP_NOFAIL in readahead, where we
    can't return errors (and have to silently truncate the read
    ourselves).
 
    bcachefs can't use GFP_NOFAIL for folio state unlike iomap based
    filesystems because our folio state is just barely too big, 2MB
    hugepages cause us to exceed the 2 page threshhold for GFP_NOFAIL.
 
    additionally, the flags argument was just buggy, we weren't supplying
    GFP_KERNEL previously (!).
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmXbqqMACgkQE6szbY3K
 bnYjnhAApY0vT6eVIYrZ7JGR6tw++xw02xRkcNW4zFE8INAvxQor5TXMEKkJs9Ui
 owh8WZjydXe0FJPE+pROcHMfxkkup4yP2SafgzR8DGERBwZbV9x7hvUbdG90EngY
 V/MevV+vr6UaV7133sY70K8BqUA/yAlCmmtOQVFgGRprEtEPS4Ur3vYR5+IzA0N7
 OhNXu6LxzkYbrNp9qroCN2UEVgRDJ/Mtda6uHfIUrqOQMUhiq2og9kvzJXzIrW9l
 URxm4eFQtJe0Yz09Ppypve+FutJIbtuDEYbcMJNT9Ig7BosD5vDjy9nhp8A5Q1Uk
 oDWBbCJhDdSYSVC/EQY8bv0AaCkyCa7vshSoKq0fDCFJ8k+nQ1YMF5wNhfgJhtU9
 Tl2Qytphp9/dxkvpIsR/5iNhLply9xTka1Wkp3G+3QJk0c17Dftpvz0/WhKI0P2B
 d6y4mz/hfCtWoSQOJbJl3fM/ZVpjH54VHDmb7sGyb5f+bTUkX6OUoJ4os8MNKGcS
 GdpEoWt/IAQj69c7w8aama5TXJ4kYe0XtXwbHTRE4j1PIQJA5SPvVt+32spRtb6i
 1gIa94uWKYMuG2U0XGxookHfZZZaMQkl79oXJOYRiC589YVyZC1Lp5iqr027jHEQ
 1HacrWPekPfmrhchyIzpH1mHOgaS+FKoD7eKrkvj0QSxpwfwpbI=
 =KNWR
 -----END PGP SIGNATURE-----

Merge tag 'bcachefs-2024-02-25' of https://evilpiepirate.org/git/bcachefs

Pull bcachefs fixes from Kent Overstreet:
 "Some more mostly boring fixes, but some not

  User reported ones:

   - the BTREE_ITER_FILTER_SNAPSHOTS one fixes a really nasty
     performance bug; user reported an untar initially taking two
     seconds and then ~2 minutes

   - kill a __GFP_NOFAIL in the buffered read path; this was a leftover
     from the trickier fix to kill __GFP_NOFAIL in readahead, where we
     can't return errors (and have to silently truncate the read
     ourselves).

     bcachefs can't use GFP_NOFAIL for folio state unlike iomap based
     filesystems because our folio state is just barely too big, 2MB
     hugepages cause us to exceed the 2 page threshhold for GFP_NOFAIL.

     additionally, the flags argument was just buggy, we weren't
     supplying GFP_KERNEL previously (!)"

* tag 'bcachefs-2024-02-25' of https://evilpiepirate.org/git/bcachefs:
  bcachefs: fix bch2_save_backtrace()
  bcachefs: Fix check_snapshot() memcpy
  bcachefs: Fix bch2_journal_flush_device_pins()
  bcachefs: fix iov_iter count underflow on sub-block dio read
  bcachefs: Fix BTREE_ITER_FILTER_SNAPSHOTS on inodes btree
  bcachefs: Kill __GFP_NOFAIL in buffered read path
  bcachefs: fix backpointer_to_text() when dev does not exist
2024-02-25 15:31:57 -08:00
Kent Overstreet
5197728f81 bcachefs: fix bch2_save_backtrace()
Missed a call in the previous fix.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-25 15:45:36 -05:00
Linus Torvalds
70ff1fe626 Two documentation build fixes:
- The XFS online fsck documentation uses incredibly deeply nested
   subsection and list nesting; that broke the PDF docs build.  Tweak a
   parameter to tell LaTeX to allow the deeper nesting.
 
 - Fix a 6.8 PDF-build regression
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmXbi5QPHGNvcmJldEBs
 d24ubmV0AAoJEBdDWhNsDH5YZSMH/RIZh48S/Jh5mhjzqnKhGf1sFn6lSk8sFY3I
 uJqML/LPo6GYzX8WvYKlfyP9+UvrLiDcQF0Er6MeIhK6mhKE1Lp7w1YvRgeXcgFR
 H9DtxA4fJSGWlAaMqZBwsXjF2EFwjyxHtHUeNyaJ+YocHfrT6L9Cp9uBEvdT3Iye
 F191VpjWLrFD0DJEh64CcmNd3rggN5jeD/n24dbNOmnem1cak2brIIUeltdkUmQG
 48Hr27xqYF1QyVckfoRtnT/C3AyaCKbxRbTxeAjwUDjU+7nCsHf1MKltiFAZHnFs
 7ZLsOboLhmR+y9xiZUg7OlpRaVj1C+7JSYC+WSaNjwRkkIfJUu4=
 =MEzm
 -----END PGP SIGNATURE-----

Merge tag 'docs-6.8-fixes3' of git://git.lwn.net/linux

Pull two documentation build fixes from Jonathan Corbet:

 - The XFS online fsck documentation uses incredibly deeply nested
   subsection and list nesting; that broke the PDF docs build. Tweak a
   parameter to tell LaTeX to allow the deeper nesting.

 - Fix a 6.8 PDF-build regression

* tag 'docs-6.8-fixes3' of git://git.lwn.net/linux:
  docs: translations: use attribute to store current language
  docs: Instruct LaTeX to cope with deeper nesting
2024-02-25 10:58:12 -08:00
Linus Torvalds
c46ac50ebe USB fixes for 6.8-rc6
Here are some small USB fixes for 6.8-rc6 to resolve some reported
 problems.  These include:
   - regression fixes with typec tpcm code as reported by many
   - cdnsp and cdns3 driver fixes
   - usb role setting code bugfixes
   - build fix for uhci driver
   - ncm gadget driver bugfix
   - MAINTAINERS entry update
 
 All of these have been in linux-next all week with no reported issues
 and there is at least one fix in here that is in Thorsten's regression
 list that is being tracked.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZdtGEA8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ymzsgCg2IsWqIR72XUGsa5rrbRnskOP/G4An24BmUb6
 t34d0VjiHagZTFlfRx6g
 =eOL1
 -----END PGP SIGNATURE-----

Merge tag 'usb-6.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
 "Here are some small USB fixes for 6.8-rc6 to resolve some reported
  problems. These include:

   - regression fixes with typec tpcm code as reported by many

   - cdnsp and cdns3 driver fixes

   - usb role setting code bugfixes

   - build fix for uhci driver

   - ncm gadget driver bugfix

   - MAINTAINERS entry update

  All of these have been in linux-next all week with no reported issues
  and there is at least one fix in here that is in Thorsten's regression
  list that is being tracked"

* tag 'usb-6.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: typec: tpcm: Fix issues with power being removed during reset
  MAINTAINERS: Drop myself as maintainer of TYPEC port controller drivers
  usb: gadget: ncm: Avoid dropping datagrams of properly parsed NTBs
  Revert "usb: typec: tcpm: reset counter when enter into unattached state after try role"
  usb: gadget: omap_udc: fix USB gadget regression on Palm TE
  usb: dwc3: gadget: Don't disconnect if not started
  usb: cdns3: fix memory double free when handle zero packet
  usb: cdns3: fixed memory use after free at cdns3_gadget_ep_disable()
  usb: roles: don't get/set_role() when usb_role_switch is unregistered
  usb: roles: fix NULL pointer issue when put module's reference
  usb: cdnsp: fixed issue with incorrect detecting CDNSP family controllers
  usb: cdnsp: blocked some cdns3 specific code
  usb: uhci-grlib: Explicitly include linux/platform_device.h
2024-02-25 10:41:57 -08:00
Linus Torvalds
1e592e9536 TTY/Serial driver fixes for 6.8-rc6
Here are 3 small serial/tty driver fixes for 6.8-rc6 that resolve the
 following reported errors:
   - riscv hvc console driver fix that was reported by many
   - amba-pl011 serial driver fix for RS485 mode
   - stm32 serial driver fix for RS485 mode
 
 All of these have been in linux-next all week with no reported problems.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZdtGnA8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ymrqwCfSIsUj9GLazXJTTTgMz1I94HXLrQAnjq9QOtg
 EFt6xmUGcF4zFhnfSLal
 =/k5+
 -----END PGP SIGNATURE-----

Merge tag 'tty-6.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial driver fixes from Greg KH:
 "Here are three small serial/tty driver fixes for 6.8-rc6 that resolve
  the following reported errors:

   - riscv hvc console driver fix that was reported by many

   - amba-pl011 serial driver fix for RS485 mode

   - stm32 serial driver fix for RS485 mode

  All of these have been in linux-next all week with no reported
  problems"

* tag 'tty-6.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  serial: amba-pl011: Fix DMA transmission in RS485 mode
  serial: stm32: do not always set SER_RS485_RX_DURING_TX if RS485 is enabled
  tty: hvc: Don't enable the RISC-V SBI console by default
2024-02-25 10:35:41 -08:00
Linus Torvalds
1eee4ef38c - Make sure clearing CPU buffers using VERW happens at the latest possible
point in the return-to-userspace path, otherwise memory accesses after
   the VERW execution could cause data to land in CPU buffers again
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmXbG7IACgkQEsHwGGHe
 VUoEEg//d1qt/PEWCC23wMO6gLMl4J/e4ZQAuGOKGed/jUmOaQKpHJmpDMRc0li5
 llRYDdfE0ikmtQT3t9vQDs3xbWfT5bLMsijliRimb193FaS1HGlHMMS1nxhfjyfv
 MecbWfkwzX2JnrxJpsbfue+7kks3HyIXYsXV7kSFiHavk4F3GFQXYLO11pKbNQwN
 9UfjJDeVsrcWPGCHhoPKF5NHUnQKIA8ZC6g8yBq894AtdWOhFY7ePKBZefUWQQ1n
 myc5GJ3dKFICMCZvkMABtHYCmHU/W3y/6tPtnrz3kT8GdCIAHG+K9VRUfY1ml94H
 x327GoM3sEzHLsPizKy00/Uao+j6FOtv631LoDLsO2MF3sHoTZDaSgg5y2D/ZC7t
 IZdK3mUGtdINRhGiWWpdxyaMfkQ62cdZk8FkeYkRAewYS6WYSdMX3cPqFNy4Ss5u
 r3reMOD3JcxAatcqhHMXjARMfY+N08gQBpxBul3ejgH8t8aY7xJx6Vggty5kBlHZ
 7urV9jIRxSXfbBmOcYu6HP1ucFLWNSUQCBn7Imrh+5zbE1XVv7NaAWvT4Nmgb0/X
 57fHoYYSVwaJ0k3zWWM7QYEdcuJ7IZnVgTCQYx26Ec2AOQRxE9ose+awTLYtTbp1
 T+XaOlItHKMRzx9K46D7xHwmC5qiokFki3exp5vfGZxGyT3+t/c=
 =n5us
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_v6.8_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - Make sure clearing CPU buffers using VERW happens at the latest
   possible point in the return-to-userspace path, otherwise memory
   accesses after the VERW execution could cause data to land in CPU
   buffers again

* tag 'x86_urgent_for_v6.8_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  KVM/VMX: Move VERW closer to VMentry for MDS mitigation
  KVM/VMX: Use BT+JNC, i.e. EFLAGS.CF to select VMRESUME vs. VMLAUNCH
  x86/bugs: Use ALTERNATIVE() instead of mds_user_clear static key
  x86/entry_32: Add VERW just before userspace transition
  x86/entry_64: Add VERW just before userspace transition
  x86/bugs: Add asm helpers for executing VERW
2024-02-25 10:22:21 -08:00
Linus Torvalds
8c46ed3740 - Make sure GICv4 always gets initialized to prevent a kexec-ed kernel
from silently failing to set it up
 
 - Do not call bus_get_dev_root() for the mbigen irqchip as it always
   returns NULL - use NULL directly
 
 - Fix hardware interrupt number truncation when assigning MSI interrupts
 
 - Correct sending end-of-interrupt messages to disabled interrupts lines on
   RISC-V PLIC
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmXbGgcACgkQEsHwGGHe
 VUqrig//ay2UcLEi8CwxobHXIpuUq+pMt1pLhdDtyehTKx+T44GCwMFXGML8H27A
 8CszKTEJsRxXuUP1iXquECfYqYqGmOZHcIMCX0vDodezRriJXq3m549zdoVY6LIy
 m7x5mN4rfc8xaK/krSz0IKgCn7TZ7Nugw8zHE9PEJ7hj/exIA6EH2f1p0dbDc2z8
 PRWsexi39mVLEstOl7yf5+hys6RN07a+9+PFJrEWCC0bO5We9Z+m7gnpu3zUrwcO
 LlDAU6UwWhVc+xFipW9SFYEhCqtprdfUftf1OW2BLe1TM7pHxdvA3OwlT5ZxxN90
 h4wmQ084v08hcn8YpUkaK5fWEtT+1isD3/8dVUMSRQQ4jcjLiEAVdVOvKOmJNEeJ
 +MYqAktCoyay9ZYCrpZRRIVYfC4/FLMEPExPwILFM6nfMMVEBfkDqFyjGyw3uls7
 QT7eHSo121kQsZc5V/8SuU30f64w/vJbtaZIYOjSR9+hQMQ+i+8cMuV0RSvmDIa2
 1vGdhOLvG5Rk7Wy9xd9ITXbeq+z9KD/tH8BadyfARosvQTrg6aMhhqQRHWJtS6Pf
 Vg50yS6D8ETzNNZSPFABrjBKiqQmEo3ILlUpMbR8jGBaLggqZhTs8eBRj3+XIXp8
 UxgB2b47qKmZI9eaFzne9scnOGmQSRuZ7x8IrwworFGzChSzwpU=
 =cMur
 -----END PGP SIGNATURE-----

Merge tag 'irq_urgent_for_v6.8_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fixes from Borislav Petkov:

 - Make sure GICv4 always gets initialized to prevent a kexec-ed kernel
   from silently failing to set it up

 - Do not call bus_get_dev_root() for the mbigen irqchip as it always
   returns NULL - use NULL directly

 - Fix hardware interrupt number truncation when assigning MSI
   interrupts

 - Correct sending end-of-interrupt messages to disabled interrupts
   lines on RISC-V PLIC

* tag 'irq_urgent_for_v6.8_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/gic-v3-its: Do not assume vPE tables are preallocated
  irqchip/mbigen: Don't use bus_get_dev_root() to find the parent
  PCI/MSI: Prevent MSI hardware interrupt number truncation
  irqchip/sifive-plic: Enable interrupt if needed before EOI
2024-02-25 10:14:12 -08:00
Linus Torvalds
4ca0d9894f Change since last update:
- Fix page refcount leak when looking up specific inodes
    introduced by metabuf reworking.
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEQ0A6bDUS9Y+83NPFUXZn5Zlu5qoFAmXbQekRHHhpYW5nQGtl
 cm5lbC5vcmcACgkQUXZn5Zlu5qpkyg//cjKnuzI2gHL3ff1o0jiTMD11Y/EmwzQS
 C+JgE8WDMjUlVQpbwYnFWo/6se8qvk/fwTXPcRc45piF8YNZVchwsagxO626Kab7
 0xTX7ZUgjuth6ahrkAJZkJNMgO4mYf928uYB6EVWaTb4iF0iT6glEOsSVAR/cssL
 wYKbtgv3OP9t6fQcN/XL31hRkqr2MPk0y5Q27KyT4zo4lrx3xyah7Ndo3aEK/RcM
 +6FUwqRiDsgDF/Ga65ylDvEp9eA03OFNHBn4DrORe3B9KV75NkmSJf/8QVEceNV/
 9D072Hvt7/iyOq53AxWH3Jvp7aro/i0rvAHPbXZX4RVyqcJxaLYCQyrBFvQL0/Ie
 B+793Iua8zbkQCbZ85LpTGrxAb5WydlSJp10AuHTr2MO+wS8bBqf96Jp9x1MuS9D
 vqq7jbjwuZfnFUjpzu49GF6htG+WRVgY5TDzU6IAr3izXuqVxmz+zw5zExbGMmKm
 2S5lb+q68DOBP4YaelAQHh97k0pYDW3eQ6GhDD2FA4P/DOr8p3vszZFBeaqaL70v
 WS8z1wzOgEaleTRN4iRFlMCTvRLjOtDEaUNquohcHqJLe7w/DF7gzCU+0//8dVjD
 cZieFX8uZDtzzcjrlYWeTo8oHd0q2WYixbCN8P92/YXUFIVpfdl85ugyr9KvkZxd
 jKzpAy2LbSw=
 =/Bfj
 -----END PGP SIGNATURE-----

Merge tag 'erofs-for-6.8-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs

Pull erofs fix from Gao Xiang:

 - Fix page refcount leak when looking up specific inodes
   introduced by metabuf reworking

* tag 'erofs-for-6.8-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: fix refcount on the metabuf used for inode lookup
2024-02-25 09:53:13 -08:00
Linus Torvalds
66a97c2ec9 We still have some races in filesystem methods when exposed to RCU
pathwalk.  This series is a result of code audit (the second round
 of it) and it should deal with most of that stuff.  Exceptions: ntfs3
 ->d_hash()/->d_compare() and ceph_d_revalidate().  Up to maintainers (a
 note for NTFS folks - when documentation says that a method may not block,
 it *does* imply that blocking allocations are to be avoided.  Really).
 
 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZdroDAAKCRBZ7Krx/gZQ
 60dKAQCzp6rYr3ye4nylho9Rzu8LEpH04TuNf3C6JuyUaNHxHwEAvNLatZsyFnmV
 Ksp2Rg/IlKPNtQgYJ8xPxv9DFmNe8gI=
 =47Un
 -----END PGP SIGNATURE-----

Merge tag 'pull-fixes.pathwalk-rcu-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull RCU pathwalk fixes from Al Viro:
 "We still have some races in filesystem methods when exposed to RCU
  pathwalk. This series is a result of code audit (the second round of
  it) and it should deal with most of that stuff.

  Still pending: ntfs3 ->d_hash()/->d_compare() and ceph_d_revalidate().
  Up to maintainers (a note for NTFS folks - when documentation says
  that a method may not block, it *does* imply that blocking allocations
  are to be avoided. Really)"

[ More explanations for people who aren't familiar with the vagaries of
  RCU path walking: most of it is hidden from filesystems, but if a
  filesystem actively participates in the low-level path walking it
  needs to make sure the fields involved in that walk are RCU-safe.

  That "actively participate in low-level path walking" includes things
  like having its own ->d_hash()/->d_compare() routines, or by having
  its own directory permission function that doesn't just use the common
  helpers.  Having a ->d_revalidate() function will also have this issue.

  Note that instead of making everything RCU safe you can also choose to
  abort the RCU pathwalk if your operation cannot be done safely under
  RCU, but that obviously comes with a performance penalty. One common
  pattern is to allow the simple cases under RCU, and abort only if you
  need to do something more complicated.

  So not everything needs to be RCU-safe, and things like the inode etc
  that the VFS itself maintains obviously already are. But these fixes
  tend to be about properly RCU-delaying things like ->s_fs_info that
  are maintained by the filesystem and that got potentially released too
  early.   - Linus ]

* tag 'pull-fixes.pathwalk-rcu-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  ext4_get_link(): fix breakage in RCU mode
  cifs_get_link(): bail out in unsafe case
  fuse: fix UAF in rcu pathwalks
  procfs: make freeing proc_fs_info rcu-delayed
  procfs: move dropping pde and pid from ->evict_inode() to ->free_inode()
  nfs: fix UAF on pathwalk running into umount
  nfs: make nfs_set_verifier() safe for use in RCU pathwalk
  afs: fix __afs_break_callback() / afs_drop_open_mmap() race
  hfsplus: switch to rcu-delayed unloading of nls and freeing ->s_fs_info
  exfat: move freeing sbi, upcase table and dropping nls into rcu-delayed helper
  affs: free affs_sb_info with kfree_rcu()
  rcu pathwalk: prevent bogus hard errors from may_lookup()
  fs/super.c: don't drop ->s_user_ns until we free struct super_block itself
2024-02-25 09:29:05 -08:00
Linus Torvalds
9b24349279 A couple of fixes - revert of regression from this cycle
and a fix for erofs failure exit breakage (had been there since
 way back).
 
 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZdrkZAAKCRBZ7Krx/gZQ
 67D8AP0eM68yZvbThA/Hb5iElDh3Aogt1AW/QAu9/alkDVHr+wD+PKqhamC8WXGk
 b1QZ5AOHQFwzkzdF4738fdbujquBWQE=
 =Ra0D
 -----END PGP SIGNATURE-----

Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs fixes from Al Viro:
 "A couple of fixes - revert of regression from this cycle and a fix for
  erofs failure exit breakage (had been there since way back)"

* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  erofs: fix handling kern_mount() failure
  Revert "get rid of DCACHE_GENOCIDE"
2024-02-25 09:17:15 -08:00
Al Viro
9fa8e282c2 ext4_get_link(): fix breakage in RCU mode
1) errors from ext4_getblk() should not be propagated to caller
unless we are really sure that we would've gotten the same error
in non-RCU pathwalk.
2) we leak buffer_heads if ext4_getblk() is successful, but bh is
not uptodate.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-02-25 02:10:32 -05:00
Al Viro
0511fdb4a3 cifs_get_link(): bail out in unsafe case
->d_revalidate() bails out there, anyway.  It's not enough
to prevent getting into ->get_link() in RCU mode, but that
could happen only in a very contrieved setup.  Not worth
trying to do anything fancy here unless ->d_revalidate()
stops kicking out of RCU mode at least in some cases.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Acked-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-02-25 02:10:32 -05:00
Al Viro
053fc4f755 fuse: fix UAF in rcu pathwalks
->permission(), ->get_link() and ->inode_get_acl() might dereference
->s_fs_info (and, in case of ->permission(), ->s_fs_info->fc->user_ns
as well) when called from rcu pathwalk.

Freeing ->s_fs_info->fc is rcu-delayed; we need to make freeing ->s_fs_info
and dropping ->user_ns rcu-delayed too.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-02-25 02:10:32 -05:00
Al Viro
e31f0a57ae procfs: make freeing proc_fs_info rcu-delayed
makes proc_pid_ns() safe from rcu pathwalk (put_pid_ns()
is still synchronous, but that's not a problem - it does
rcu-delay everything that needs to be)

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-02-25 02:10:32 -05:00
Al Viro
47458802f6 procfs: move dropping pde and pid from ->evict_inode() to ->free_inode()
that keeps both around until struct inode is freed, making access
to them safe from rcu-pathwalk

Acked-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-02-25 02:10:32 -05:00
Al Viro
c1b967d03c nfs: fix UAF on pathwalk running into umount
NFS ->d_revalidate(), ->permission() and ->get_link() need to access
some parts of nfs_server when called in RCU mode:
	server->flags
	server->caps
	*(server->io_stats)
and, worst of all, call
	server->nfs_client->rpc_ops->have_delegation
(the last one - as NFS_PROTO(inode)->have_delegation()).  We really
don't want to RCU-delay the entire nfs_free_server() (it would have
to be done with schedule_work() from RCU callback, since it can't
be made to run from interrupt context), but actual freeing of
nfs_server and ->io_stats can be done via call_rcu() just fine.
nfs_client part is handled simply by making nfs_free_client() use
kfree_rcu().

Acked-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-02-25 02:10:32 -05:00
Al Viro
10a973fc4f nfs: make nfs_set_verifier() safe for use in RCU pathwalk
nfs_set_verifier() relies upon dentry being pinned; if that's
the case, grabbing ->d_lock stabilizes ->d_parent and guarantees
that ->d_parent points to a positive dentry.  For something
we'd run into in RCU mode that is *not* true - dentry might've
been through dentry_kill() just as we grabbed ->d_lock, with
its parent going through the same just as we get to into
nfs_set_verifier_locked().  It might get to detaching inode
(and zeroing ->d_inode) before nfs_set_verifier_locked() gets
to fetching that; we get an oops as the result.

That can happen in nfs{,4} ->d_revalidate(); the call chain in
question is nfs_set_verifier_locked() <- nfs_set_verifier() <-
nfs_lookup_revalidate_delegated() <- nfs{,4}_do_lookup_revalidate().
We have checked that the parent had been positive, but that's
done before we get to nfs_set_verifier() and it's possible for
memory pressure to pick our dentry as eviction candidate by that
time.  If that happens, back-to-back attempts to kill dentry and
its parent are quite normal.  Sure, in case of eviction we'll
fail the ->d_seq check in the caller, but we need to survive
until we return there...

Acked-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-02-25 02:10:31 -05:00
Al Viro
275655d320 afs: fix __afs_break_callback() / afs_drop_open_mmap() race
In __afs_break_callback() we might check ->cb_nr_mmap and if it's non-zero
do queue_work(&vnode->cb_work).  In afs_drop_open_mmap() we decrement
->cb_nr_mmap and do flush_work(&vnode->cb_work) if it reaches zero.

The trouble is, there's nothing to prevent __afs_break_callback() from
seeing ->cb_nr_mmap before the decrement and do queue_work() after both
the decrement and flush_work().  If that happens, we might be in trouble -
vnode might get freed before the queued work runs.

__afs_break_callback() is always done under ->cb_lock, so let's make
sure that ->cb_nr_mmap can change from non-zero to zero while holding
->cb_lock (the spinlock component of it - it's a seqlock and we don't
need to mess with the counter).

Acked-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-02-25 02:10:31 -05:00
Al Viro
af072cf683 hfsplus: switch to rcu-delayed unloading of nls and freeing ->s_fs_info
->d_hash() and ->d_compare() use those, so we need to delay freeing
them.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-02-25 02:10:31 -05:00
Al Viro
a13d1a4de3 exfat: move freeing sbi, upcase table and dropping nls into rcu-delayed helper
That stuff can be accessed by ->d_hash()/->d_compare(); as it is, we have
a hard-to-hit UAF if rcu pathwalk manages to get into ->d_hash() on a filesystem
that is in process of getting shut down.

Besides, having nls and upcase table cleanup moved from ->put_super() towards
the place where sbi is freed makes for simpler failure exits.

Acked-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-02-25 02:10:31 -05:00
Al Viro
529f89a9e4 affs: free affs_sb_info with kfree_rcu()
one of the flags in it is used by ->d_hash()/->d_compare()

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-02-25 02:10:31 -05:00
Al Viro
cdb67fdeed rcu pathwalk: prevent bogus hard errors from may_lookup()
If lazy call of ->permission() returns a hard error, check that
try_to_unlazy() succeeds before returning it.  That both makes
life easier for ->permission() instances and closes the race
in ENOTDIR handling - it is possible that positive d_can_lookup()
seen in link_path_walk() applies to the state *after* unlink() +
mkdir(), while nd->inode matches the state prior to that.

Normally seeing e.g. EACCES from permission check in rcu pathwalk
means that with some timings non-rcu pathwalk would've run into
the same; however, running into a non-executable regular file
in the middle of a pathname would not get to permission check -
it would fail with ENOTDIR instead.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-02-25 02:10:31 -05:00
Al Viro
583340de1d fs/super.c: don't drop ->s_user_ns until we free struct super_block itself
Avoids fun races in RCU pathwalk...  Same goes for freeing LSM shite
hanging off super_block's arse.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-02-25 02:10:31 -05:00
Kent Overstreet
c4333eb541 bcachefs: Fix check_snapshot() memcpy
check_snapshot() copies the bch_snapshot to a temporary to easily handle
older versions that don't have all the fields of the current version,
but it lacked a min() to correctly handle keys newer and larger than the
current version.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-24 20:47:47 -05:00
Kent Overstreet
097471f9e4 bcachefs: Fix bch2_journal_flush_device_pins()
If a journal write errored, the list of devices it was written to could
be empty - we're not supposed to mark an empty replicas list.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-24 20:46:48 -05:00
Brian Foster
b58b1b883b bcachefs: fix iov_iter count underflow on sub-block dio read
bch2_direct_IO_read() checks the request offset and size for sector
alignment and then falls through to a couple calculations to shrink
the size of the request based on the inode size. The problem is that
these checks round up to the fs block size, which runs the risk of
underflowing iter->count if the block size happens to be large
enough. This is triggered by fstest generic/361 with a 4k block
size, which subsequently leads to a crash. To avoid this crash,
check that the shorten length doesn't exceed the overall length of
the iter.

Fixes:
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Su Yue <glass.su@suse.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-24 20:45:24 -05:00
Kent Overstreet
204f45140f bcachefs: Fix BTREE_ITER_FILTER_SNAPSHOTS on inodes btree
If we're in FILTER_SNAPSHOTS mode and we start scanning a range of the
keyspace where no keys are visible in the current snapshot, we have a
problem - we'll scan for a very long time before scanning terminates.

Awhile back, this was fixed for most cases with peek_upto() (and
assertions that enforce that it's being used).

But the fix missed the fact that the inodes btree is different - every
key offset is in a different snapshot tree, not just the inode field.

Fixes:
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-24 20:41:46 -05:00
Kent Overstreet
04fee68dd9 bcachefs: Kill __GFP_NOFAIL in buffered read path
Recently, we fixed our __GFP_NOFAIL usage in the readahead path, but the
easy one in read_single_folio() (where wa can return an error) was
missed - oops.

Fixes:
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-24 20:41:42 -05:00
Kent Overstreet
1f626223a0 bcachefs: fix backpointer_to_text() when dev does not exist
Fixes:
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-24 20:41:37 -05:00
Linus Torvalds
ab0a97cffa powerpc fixes for 6.8 #4
- Fix a crash when hot adding a PCI device to an LPAR since recent changes.
 
  - Fix nested KVM level-2 guest reboot failure due to empty 'arch_compat'.
 
 Thanks to: Amit Machhiwal, Aneesh Kumar K.V (IBM), Brian King, Gaurav Batra,
 Vaibhav Jain.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmXaf8kTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgJUyEACK+gmbPJxVwRnt60e7l8SJrjaXE9ZH
 llI2ffSM63KOL0qeJlWj8EfNcmU3ihCnMpVS4VHAIvEvCq8KBdcHh7EfLmIjMSl2
 oHfcb+dpdnWpQUkI2jci6ptxv8Al7hmd5kP6X5TGn0XJaXWatW4tj7QDP574DCZY
 UfbHGvGnCVmlcxm3xhn7JzJT4Lgj9ehxgZEMHB68qfwU2rJgUXNITBzOsGwfq4c0
 wiIdzVGMuJQdaFLt0JLNP2KY8kpVsAeKRYgSsLrkDkp+tKOIPvgeBmC221pOb1Y4
 iZbpXR3h06WfFBnA2PJpdfuacN55/gF0U5BnyOj5tB7CTx65oTyX/BJLIocKWAlF
 hRRPVEYqBH0mZCHXn03+J/A2/iut1XxDlVMpEhjjZjb57CTYCApSpHer6shPw6PV
 QqG+iAr7lBCZ70qS2m2+mjPhgDf5TNlxlsip7S3xJJpJvScN/JyPDupDF7tQYuHD
 GjIOm8gSHL0i3F4L3NAZ9FtPRwxXcv85sWCTbE7jsfVOT/kkDGTwhx/vdPIR0la8
 agDv/Q0FyyFHWd5IS4oqXFop6P3boulHexdIec/MIhBrPSZ6oG0ySzj3rqdI7VYh
 7BYj/cHerKONR/BU6S3vKddh9RDD9Fe5ghEAgng2DxG+Za/m/M2v/76123/ZeZuw
 qQ8aiIvYOJ4DGA==
 =+tcX
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-6.8-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - Fix a crash when hot adding a PCI device to an LPAR since
   recent changes

 - Fix nested KVM level-2 guest reboot failure due to empty
   'arch_compat'

Thanks to Amit Machhiwal, Aneesh Kumar K.V (IBM), Brian King, Gaurav
Batra, and Vaibhav Jain.

* tag 'powerpc-6.8-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  KVM: PPC: Book3S HV: Fix L2 guest reboot failure due to empty 'arch_compat'
  powerpc/pseries/iommu: DLPAR add doesn't completely initialize pci_controller
2024-02-24 16:49:51 -08:00
Linus Torvalds
91403d50e9 IOMMU Fixes for Linux v6.8-rc5
Including:
 
 	- Intel VT-d fixes for nested domain handling:
 	  - Cache invalidation for changes in a parent domain
 	  - Dirty tracking setting for parent and nested domains
 	  - Fix a constant-out-of-range warning
 
 	- ARM SMMU fixes:
 	  - Fix CD allocation from atomic context when using SVA with SMMUv3
 	  - Revert the conversion of SMMUv2 to domain_alloc_paging(), as it
 	    breaks the boot for Qualcomm MSM8996 devices
 
 	- Restore SVA handle sharing in core code as it turned out there are
 	  still drivers relying on it
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmXaZ/QACgkQK/BELZcB
 GuPTaQ/9GJTAmfIktgPWqAgUOUa6fWpEsmChwVgpMDxw4fyQsJrKZ/YQfX8Vy17S
 TV9EyCFGALRdcE+VX41KtrvG5MWY5cHa4BlJ6/KaD3tilhViTsACN7JkGiWnlEIs
 4t6YefoaMvgdaJ4nqysFlHtlaVKHObTi6toyyCCHIywOcei2YqX3mqTIiPzUyflj
 dqx0HwBG5uz6q00JNbHVQHeMc8rIEvT61oMssQUMNt8KPvdNJl9OrZSRvXimHABU
 Vh0nMKYLAqHp40IvoXScA9Aj/DWTwE2346/Xpd6hnZ/yJvBlm6YQWKpJtjJz3z2Y
 ZnK+cmFPAaC0EE7dlEpN7hcwtEqumw1K+CJ4s8rfnNY5IdcY0DIRAVxCLsh2YH15
 rcupp3iNJMR3JeVUYHKe+mEHcYSyC9SABw01aq9NdEu1LRfXjLRrYFVh3yovpshV
 abXssThTFWQTfTvUs2Vt9zjVIinST3ogki+mxyfgqTuKYj9GY8P3eqYRFrdxGJnI
 mtwu0ByRwVpdNUBUWsOk+3sSdvPBsgd/Fchr8Gmpl5W2cAjPOetYn0th3hksG7tn
 9qyUDwTHDaQsXEONFwS44eP9KlccUUvvLhFHCBghDm5A6i/aDdnz6d1CdfPOdb4y
 DwOQem4AiOmeTBGEZA6Z/ZSKsqZGmpZditBveki0ImauxrUZXMU=
 =aVOB
 -----END PGP SIGNATURE-----

Merge tag 'iommu-fixes-v6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu fixes from Joerg Roedel:

 - Intel VT-d fixes for nested domain handling:

      - Cache invalidation for changes in a parent domain

      - Dirty tracking setting for parent and nested domains

      - Fix a constant-out-of-range warning

 - ARM SMMU fixes:

      - Fix CD allocation from atomic context when using SVA with SMMUv3

      - Revert the conversion of SMMUv2 to domain_alloc_paging(), as it
        breaks the boot for Qualcomm MSM8996 devices

 - Restore SVA handle sharing in core code as it turned out there are
   still drivers relying on it

* tag 'iommu-fixes-v6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/sva: Restore SVA handle sharing
  iommu/arm-smmu-v3: Do not use GFP_KERNEL under as spinlock
  iommu/vt-d: Fix constant-out-of-range warning
  iommu/vt-d: Set SSADE when attaching to a parent with dirty tracking
  iommu/vt-d: Add missing dirty tracking set for parent domain
  iommu/vt-d: Wrap the dirty tracking loop to be a helper
  iommu/vt-d: Remove domain parameter for intel_pasid_setup_dirty_tracking()
  iommu/vt-d: Add missing device iotlb flush for parent domain
  iommu/vt-d: Update iotlb in nested domain attach
  iommu/vt-d: Add missing iotlb flush for parent domain
  iommu/vt-d: Add __iommu_flush_iotlb_psi()
  iommu/vt-d: Track nested domains in parent
  Revert "iommu/arm-smmu: Convert to domain_alloc_paging()"
2024-02-24 15:59:26 -08:00
Linus Torvalds
ac389bc0ca cxl fixes for 6.8-rc6
- Fix NUMA initialization from ACPI CEDT.CFMWS
 
 - Fix region assembly failures due to async init order
 
 - Fix / simplify export of qos_class information
 
 - Fix cxl_acpi initialization vs single-window-init failures
 
 - Fix handling of repeated 'pci_channel_io_frozen' notifications
 
 - Workaround platforms that violate host-physical-address ==
   system-physical address assumptions
 
 - Defer CXL CPER notification handling to v6.9
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQSbo+XnGs+rwLz9XGXfioYZHlFsZwUCZdpH9gAKCRDfioYZHlFs
 ZwZlAQDE+PxTJnjCXDVnDylVF4yeJF2G/wSkH1CFVFVxa0OjhAD/ZFScS/nz/76l
 1IYYiiLqmVO5DdmJtfKtq16m7e1cZwc=
 =PuPF
 -----END PGP SIGNATURE-----

Merge tag 'cxl-fixes-6.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl

Pull cxl fixes from Dan Williams:
 "A collection of significant fixes for the CXL subsystem.

  The largest change in this set, that bordered on "new development", is
  the fix for the fact that the location of the new qos_class attribute
  did not match the Documentation. The fix ends up deleting more code
  than it added, and it has a new unit test to backstop basic errors in
  this interface going forward. So the "red-diff" and unit test saved
  the "rip it out and try again" response.

  In contrast, the new notification path for firmware reported CXL
  errors (CXL CPER notifications) has a locking context bug that can not
  be fixed with a red-diff. Given where the release cycle stands, it is
  not comfortable to squeeze in that fix in these waning days. So, that
  receives the "back it out and try again later" treatment.

  There is a regression fix in the code that establishes memory NUMA
  nodes for platform CXL regions. That has an ack from x86 folks. There
  are a couple more fixups for Linux to understand (reassemble) CXL
  regions instantiated by platform firmware. The policy around platforms
  that do not match host-physical-address with system-physical-address
  (i.e. systems that have an address translation mechanism between the
  address range reported in the ACPI CEDT.CFMWS and endpoint decoders)
  has been softened to abort driver load rather than teardown the memory
  range (can cause system hangs). Lastly, there is a robustness /
  regression fix for cases where the driver would previously continue in
  the face of error, and a fixup for PCI error notification handling.

  Summary:

   - Fix NUMA initialization from ACPI CEDT.CFMWS

   - Fix region assembly failures due to async init order

   - Fix / simplify export of qos_class information

   - Fix cxl_acpi initialization vs single-window-init failures

   - Fix handling of repeated 'pci_channel_io_frozen' notifications

   - Workaround platforms that violate host-physical-address ==
     system-physical address assumptions

   - Defer CXL CPER notification handling to v6.9"

* tag 'cxl-fixes-6.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
  cxl/acpi: Fix load failures due to single window creation failure
  acpi/ghes: Remove CXL CPER notifications
  cxl/pci: Fix disabling memory if DVSEC CXL Range does not match a CFMWS window
  cxl/test: Add support for qos_class checking
  cxl: Fix sysfs export of qos_class for memdev
  cxl: Remove unnecessary type cast in cxl_qos_class_verify()
  cxl: Change 'struct cxl_memdev_state' *_perf_list to single 'struct cxl_dpa_perf'
  cxl/region: Allow out of order assembly of autodiscovered regions
  cxl/region: Handle endpoint decoders in cxl_region_find_decoder()
  x86/numa: Fix the sort compare func used in numa_fill_memblks()
  x86/numa: Fix the address overlap check in numa_fill_memblks()
  cxl/pci: Skip to handle RAS errors if CXL.mem device is detached
2024-02-24 15:53:40 -08:00