1237378 Commits

Author SHA1 Message Date
Kirill A. Shutemov
fd37721803 mm, treewide: introduce NR_PAGE_ORDERS
NR_PAGE_ORDERS defines the number of page orders supported by the page
allocator, ranging from 0 to MAX_ORDER, MAX_ORDER + 1 in total.

NR_PAGE_ORDERS assists in defining arrays of page orders and allows for
more natural iteration over them.

[kirill.shutemov@linux.intel.com: fixup for kerneldoc warning]
  Link: https://lkml.kernel.org/r/20240101111512.7empzyifq7kxtzk3@box
Link: https://lkml.kernel.org/r/20231228144704.14033-1-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-08 15:27:15 -08:00
Linus Torvalds
fc5e5c5923 - Replace the paravirt patching functionality using the alternatives
infrastructure and remove the former
 
 - Misc other improvements
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmWbIJ0ACgkQEsHwGGHe
 VUrOSg//duslGRzbLkrjijYmlb9C9yjdf+7611biRNExxlOUbQvgMX8O5/qnvfhk
 ddqcyRJlg6GeooMI3ch25KRqdUbcgI5Zq1ftG7hFiX94gIiWhiHibYxtmnNGsDuy
 sq4R3nxqZxpTh2Seajh9AfXS4uINu1wsedLxuSyNiCBk2/BB7zqC3aPCyAtjJDdT
 tpPN+dx5s/rmPkdCcdFxwAgSxsohLjZgktdZIUCbVM+blbghDgPV1wjhCUimxUYx
 2woqS0zgOZb6eRcZgdXkuaNvOqODOwIZo3iebr6XSIb49YCC/KhuUlDDr/ZZGV3a
 CeMR42sa2CEdl0YHsnHEm8LswpUVqL6FBfFnC+TP0bAGRC+L4pNEaS6eQk+ktWf/
 +k6bvuBhuPgZG5KdxgQiJB8wogDxCXqQtI7MNLI0IuBS7GC2hjHb35WB6Qb8i+EB
 ffyIJYhK0shpVzo9H1efosmLfO+i2RPn5YZiC+S27MVqGbYEkW6713hr56vp58mK
 UaxXWoTnfRR9O5Hyf0xEf/qjZFO2zpmQRgXIM2ECzLxn7GNvsPtcHsYJmLJyeUXf
 749glVHz8Ue9HsDTCQB35Z9yRT4cVuz2FclwEt51Qlta8QtX8MRaL/l+F3FOrHoo
 CPgo/fzSav3Z/XpQpjbC7vI886azQivuX5i5J9LhO5IDc8O3OMs=
 =Q5U5
 -----END PGP SIGNATURE-----

Merge tag 'x86_paravirt_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 paravirt updates from Borislav Petkov:

 - Replace the paravirt patching functionality using the alternatives
   infrastructure and remove the former

 - Misc other improvements

* tag 'x86_paravirt_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/alternative: Correct feature bit debug output
  x86/paravirt: Remove no longer needed paravirt patching code
  x86/paravirt: Switch mixed paravirt/alternative calls to alternatives
  x86/alternative: Add indirect call patching
  x86/paravirt: Move some functions and defines to alternative.c
  x86/paravirt: Introduce ALT_NOT_XEN
  x86/paravirt: Make the struct paravirt_patch_site packed
  x86/paravirt: Use relative reference for the original instruction offset
2024-01-08 13:41:42 -08:00
Linus Torvalds
41a80ca4ae - Add an informational message which gets issued when IA32 emulation has
been disabled on the cmdline
 
 - Clarify in detail how /proc/cpuinfo is used on x86
 
 - Fix a theoretical overflow in num_digits()
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmWZbXYACgkQEsHwGGHe
 VUpvDhAAmyvuYTVxWlCwHVfAfsaP9CczhOVTKy72fAIyTAa93if0pzzgNKd83k3f
 V1g3e2azk9VBGiRqDrexz3jJ6lKgQ4OUSWC3p+ywI5PuFmOWDZt3Xj4tUb6wTitA
 5NuBBue/K1tm5AIXHRRX4b0yGCNQnp1nuuDeSci8R/Y8/41+6S1dxzPd4okVoegj
 1Fjkn6l3gfTjW11xXP+OHP758xOvsbO1vFpyQFH+i9gHBructV4AN0UpsIFBOOnX
 ySaVL5w2bd5bVyRoVcJzVuvBvOnRwyLrTDzOmSqn57xnCL1Yc/YvBU9voLjo99XX
 GUQRd/ezfwOiKjf4EcomZZDnL3yEDyEm9gcmRvTYCq0OBxEaI0TEtmsF87eQig3e
 xe4qbiiFGRbTNb7VjxqbELmXgELE8+euv7pk6NgScA2DZP36H1SRDKujU7jIiwBM
 pKYJZwyTMC9JkJ+u9dqK0vHPihLBowFlXwKunuhCmk5iTmpLtXDo5ItesI29P/6Z
 viuu4ja07/7t91BEXwWaJjnVlsqfJNY28g5NyPNUhwXBMWEV7bHApUIn4XaRjkj0
 wGzjD482+1TkfGHe5uIjM8dY9/+xJY/WIAO22liU4oUbGSmR/tFCwM6ZC4XeJfnP
 Q5aO9tcQBIrpIZMGMNd+eBvX2AnLFZ80l0iOHPayWBUoyhS8Wm8=
 =lzR+
 -----END PGP SIGNATURE-----

Merge tag 'x86_misc_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull misc x86 updates from Borislav Petkov:

 - Add an informational message which gets issued when IA32 emulation
   has been disabled on the cmdline

 - Clarify in detail how /proc/cpuinfo is used on x86

 - Fix a theoretical overflow in num_digits()

* tag 'x86_misc_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/ia32: State that IA32 emulation is disabled
  Documentation/x86: Document what /proc/cpuinfo is for
  x86/lib: Fix overflow when counting digits
2024-01-08 13:27:43 -08:00
Linus Torvalds
6e0b939180 - Correct minor issues after the microcode revision reporting sanitization
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmWZaMsACgkQEsHwGGHe
 VUpiig//cDmMbajaeInAVVEilmJnw+YYxKCpvgOGdUnL+dAxBJ2SehVeEGZLiMRs
 79o5X2In+tdTjXRH1oTxWJWSMx9MW4aGbDbgZDAJoPigdZYkFoWE2p81t5n11LNt
 xjMNcnGMbns1aNCdBqGlg0baQ/xJavdFPndziq7kUMS/f3QLRm8Jp7MA+zUkDgCf
 DPuJcyFVYq7ffmc9VuVEsrh6yNREOj88Ek1tatDobbOd6LA/a3N/aOjosEwCS9W1
 qBVcMSaNt34ySzHV4sd/Uw8Dj3LXmxe2a51O+nsy+AjjkmHPYU9F0jSK4UqZgI3X
 +NPl9o8eNz8ACnK9oF/OLogoK3R9ED8vsok0r8OdvziSpi0XXaAJFGJCIYYcwJWb
 5aJTVtwKCoOhdV80NIcwjY/QJLTji4LKK46w+HaHy8wNQhAWY3Kt3P4XAn7ERTh6
 vBjeeFQtJLmE+KKYcNC90VGf/efztN9AdtYxwdb8JkkFZwOyc7M2MPV74YvQI0iz
 O0FCf2PWbQ0hKXwsVvfwn5bZoNXyIcvRc+rGpYi0NaGaiUcq53JY00NateOyhG1Z
 ebN30AWeivqW4Mwz3izKddmeOwniWpCbbJHR8E3KGZH/Uy6S9F6UQY14y0NhucKq
 RWTwFs5+YWB6rsmJRtUPgkX18hRVl5AjURcBXK7WbR0cONntXHs=
 =0Ewr
 -----END PGP SIGNATURE-----

Merge tag 'x86_microcode_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 microcode updates from Borislav Petkov:

 - Correct minor issues after the microcode revision reporting
   sanitization

* tag 'x86_microcode_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/microcode/intel: Set new revision only after a successful update
  x86/microcode/intel: Remove redundant microcode late updated message
2024-01-08 13:24:30 -08:00
Linus Torvalds
1dee7f509d - The EDAC drivers part of the effort to make the ->remove() platform
driver callback return void
 
 - Add support for AMD AI accelerators
 
 - Add support for a number of Intel SoCs: Alder Lake-N, Raptor Lake-P,
   Meteor Lake-{P,PS}
 
 - Random fixes and cleanups all over the place
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmWZRIAACgkQEsHwGGHe
 VUpesxAAux/mgyqJ4lfzpP4I3LkDA7X9uU3Iv6j9NtS64TGvulT4ZMiSL/U/nqTY
 /jsJYNJE4yIXm2Bf/T3zx+Dxh6MmUZNEYzSgZyyNx9NPEkHi4gX+b9F2nUx9i66s
 D0Pn9IVbM1liNfGr2RSSsKwFa/5UBg+58nGF1DSUz+rOaQ0KOoeta9WE98ne4n81
 fC0o4RCzcPswy3NVhi3wPoBiAfyMYmrtvBcSezIbV6CaVJB8aFkvZK4xGKkEBFoJ
 ApVrSr6bEjJ7I7DdudisfIhLfU6iji7ZGfkwiouN2TIfVCaOZdAVan/OETpPLC6b
 IPNac1iIIl6v2OfxQp6s6b/Vq+dL+vVa4SW5tNj242s9eHunuevEcly37x4W254m
 vt91K9dvZI3I5DzAbVagw3Z9tFHbIH7uiSYyjMoyYw11EvwY+bZPXfZExvEUuPQK
 2f+VL4ELlBj0imrWSCur5hF1g8R2ejTjb+kYlTcULyaXB4ANeBVzeShff3qc/21g
 3QojVA7v+7xkPwURKxRwlqkwE113cz0Rr/kKjXhXnBOuPulX0T5eSzX8WExZzSCT
 4FsEvAwZPFt7iCTxB+cspykNxyxLqzMb8GZFG+Ji8szGNbVgiidn/QEQzohs8yBu
 aFenuLEM7XwvpPCjmLGGnRkFV9mBDfPATLegTFvMpPQVYhjPdp4=
 =Bptq
 -----END PGP SIGNATURE-----

Merge tag 'edac_updates_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras

Pull EDAC updates from Borislav Petkov:

 - The EDAC drivers part of the effort to make the ->remove() platform
   driver callback return void

 - Add support for AMD AI accelerators

 - Add support for a number of Intel SoCs: Alder Lake-N, Raptor Lake-P,
   Meteor Lake-{P,PS}

 - Random fixes and cleanups all over the place

* tag 'edac_updates_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: (39 commits)
  EDAC/skx_common: Filter out the invalid address
  EDAC, pnd2: Sort headers alphabetically
  EDAC, pnd2: Correct misleading error message in mk_region_mask()
  EDAC, pnd2: Apply bit macros and helpers where it makes sense
  EDAC, pnd2: Replace custom definition by one from sizes.h
  EDAC/igen6: Add Intel Meteor Lake-P SoCs support
  EDAC/igen6: Add Intel Meteor Lake-PS SoCs support
  EDAC/igen6: Add Intel Raptor Lake-P SoCs support
  EDAC/igen6: Add Intel Alder Lake-N SoCs support
  EDAC/igen6: Make get_mchbar() helper function
  EDAC/amd64: Add support for family 0x19, models 0x90-9f devices
  EDAC/mc: Add support for HBM3 memory type
  EDAC/{sb,i7core}_edac: Do not use a plain integer for a NULL pointer
  EDAC/armada_xp: Explicitly include correct DT includes
  EDAC/pci_sysfs: Use PCI_HEADER_TYPE_MASK instead of literals
  EDAC/thunderx: Fix possible out-of-bounds string access
  EDAC/fsl_ddr: Convert to platform remove callback returning void
  EDAC/zynqmp: Convert to platform remove callback returning void
  EDAC/xgene: Convert to platform remove callback returning void
  EDAC/ti: Convert to platform remove callback returning void
  ...
2024-01-08 13:22:30 -08:00
Linus Torvalds
5db8752c3b vfs-6.8.iov_iter
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZZUzBQAKCRCRxhvAZXjc
 ot+3AQCZw1PBD4azVxFMWH76qwlAGoVIFug4+ogKU/iUa4VLygEA2FJh1vLJw5iI
 LpgBEIUTPVkwtzinAW94iJJo1Vr7NAI=
 =p6PB
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.8.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs iov_iter cleanups from Christian Brauner:
 "This contains a minor cleanup. The patches drop an unused argument
  from import_single_range() allowing to replace import_single_range()
  with import_ubuf() and dropping import_single_range() completely"

* tag 'vfs-6.8.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  iov_iter: replace import_single_range() with import_ubuf()
  iov_iter: remove unused 'iov' argument from import_single_range()
2024-01-08 11:43:04 -08:00
Linus Torvalds
26458409a9 vfs-6.8.cachefiles
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZZU0egAKCRCRxhvAZXjc
 onAqAP9s2ohvjE4QE2ad7svXOzNWKesGcyDyoEBwBpt3Yq8hvAEA+J4xiaMBlRAg
 FmBobDwtcvOzxL1q+BbB3IsmmuFrRww=
 =ZS18
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.8.cachefiles' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs cachefiles updates from Christian Brauner:
 "This contains improvements for on-demand cachefiles.

  If the daemon crashes and the on-demand cachefiles fd is unexpectedly
  closed in-flight requests and subsequent read operations associated
  with the fd will fail with EIO. This causes issues in various
  scenarios as this failure is currently unrecoverable.

  The work contained in this pull request introduces a failover mode and
  enables the daemon to recover in-flight requested-related objects. A
  restarted daemon will be able to process requests as usual.

  This requires that in-flight requests are stored during daemon crash
  or while the daemon is offline. In addition, a handle to
  /dev/cachefiles needs to be stored.

  This can be done by e.g., systemd's fdstore (cf. [1]) which enables
  the restarted daemon to recover state.

  Three new states are introduced in this patchset:

   (1) CLOSE
       Object is closed by the daemon.

   (2) OPEN
       Object is open and ready for processing. IOW, the open request
       has been handled successfully.

   (3) REOPENING
       Object has been previously closed and is now reopened due to a
       read request.

  A restarted daemon can recover the /dev/cachefiles fd from systemd's
  fdstore and writes "restore" to the device. This causes the object
  state to be reset from CLOSE to REOPENING and reinitializes the
  object.

  The daemon may now handle the open request. Any in-flight operations
  are restored and handled avoiding interruptions for users"

Link: https://systemd.io/FILE_DESCRIPTOR_STORE [1]

* tag 'vfs-6.8.cachefiles' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  cachefiles: add restore command to recover inflight ondemand read requests
  cachefiles: narrow the scope of triggering EPOLLIN events in ondemand mode
  cachefiles: resend an open request if the read request's object is closed
  cachefiles: extract ondemand info field from cachefiles_object
  cachefiles: introduce object ondemand state
2024-01-08 11:26:50 -08:00
Linus Torvalds
bb93c5ed45 vfs-6.8.rw
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZZUzXQAKCRCRxhvAZXjc
 ogOtAQDpqUp1zY4dV/dZisCJ5xarZTsSZ1AvgmcxZBtS0NhbdgEAshWvYGA9ryS/
 ChL5jjtjjZDLhRA//reoFHTQIrdp2w8=
 =bF+R
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.8.rw' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs rw updates from Christian Brauner:
 "This contains updates from Amir for read-write backing file helpers
  for stacking filesystems such as overlayfs:

   - Fanotify is currently in the process of introducing pre content
     events. Roughly, a new permission event will be added indicating
     that it is safe to write to the file being accessed. These events
     are used by hierarchical storage managers to e.g., fill the content
     of files on first access.

     During that work we noticed that our current permission checking is
     inconsistent in rw_verify_area() and remap_verify_area().
     Especially in the splice code permission checking is done multiple
     times. For example, one time for the whole range and then again for
     partial ranges inside the iterator.

     In addition, we mostly do permission checking before we call
     file_start_write() except for a few places where we call it after.
     For pre-content events we need such permission checking to be done
     before file_start_write(). So this is a nice reason to clean this
     all up.

     After this series, all permission checking is done before
     file_start_write().

     As part of this cleanup we also massaged the splice code a bit. We
     got rid of a few helpers because we are alredy drowning in special
     read-write helpers. We also cleaned up the return types for splice
     helpers.

   - Introduce generic read-write helpers for backing files. This lifts
     some overlayfs code to common code so it can be used by the FUSE
     passthrough work coming in over the next cycles. Make Amir and
     Miklos the maintainers for this new subsystem of the vfs"

* tag 'vfs-6.8.rw' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (30 commits)
  fs: fix __sb_write_started() kerneldoc formatting
  fs: factor out backing_file_mmap() helper
  fs: factor out backing_file_splice_{read,write}() helpers
  fs: factor out backing_file_{read,write}_iter() helpers
  fs: prepare for stackable filesystems backing file helpers
  fsnotify: optionally pass access range in file permission hooks
  fsnotify: assert that file_start_write() is not held in permission hooks
  fsnotify: split fsnotify_perm() into two hooks
  fs: use splice_copy_file_range() inline helper
  splice: return type ssize_t from all helpers
  fs: use do_splice_direct() for nfsd/ksmbd server-side-copy
  fs: move file_start_write() into direct_splice_actor()
  fs: fork splice_file_range() from do_splice_direct()
  fs: create {sb,file}_write_not_started() helpers
  fs: create file_write_started() helper
  fs: create __sb_write_started() helper
  fs: move kiocb_start_write() into vfs_iocb_iter_write()
  fs: move permission hook out of do_iter_read()
  fs: move permission hook out of do_iter_write()
  fs: move file_start_write() into vfs_iter_write()
  ...
2024-01-08 11:11:51 -08:00
Linus Torvalds
8c9440fea7 vfs-6.8.mount
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZZU0CgAKCRCRxhvAZXjc
 osncAQDSJK0frJL+72NqXxa4YNzivrnuw6fhp5iaDAEqxdm8ygEAoJWyh7Rmkt8G
 drAXWGyGnCYqv7UgC6axLyciid7TxQg=
 =vJuv
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.8.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs mount updates from Christian Brauner:
 "This contains the work to retrieve detailed information about mounts
  via two new system calls. This is hopefully the beginning of the end
  of the saga that started with fsinfo() years ago.

  The LWN articles in [1] and [2] can serve as a summary so we can avoid
  rehashing everything here.

  At LSFMM in May 2022 we got into a room and agreed on what we want to
  do about fsinfo(). Basically, split it into pieces. This is the first
  part of that agreement. Specifically, it is concerned with retrieving
  information about mounts. So this only concerns the mount information
  retrieval, not the mount table change notification, or the extended
  filesystem specific mount option work. That is separate work.

  Currently mounts have a 32bit id. Mount ids are already in heavy use
  by libmount and other low-level userspace but they can't be relied
  upon because they're recycled very quickly. We agreed that mounts
  should carry a unique 64bit id by which they can be referenced
  directly. This is now implemented as part of this work.

  The new 64bit mount id is exposed in statx() through the new
  STATX_MNT_ID_UNIQUE flag. If the flag isn't raised the old mount id is
  returned. If it is raised and the kernel supports the new 64bit mount
  id the flag is raised in the result mask and the new 64bit mount id is
  returned. New and old mount ids do not overlap so they cannot be
  conflated.

  Two new system calls are introduced that operate on the 64bit mount
  id: statmount() and listmount(). A summary of the api and usage can be
  found on LWN as well (cf. [3]) but of course, I'll provide a summary
  here as well.

  Both system calls rely on struct mnt_id_req. Which is the request
  struct used to pass the 64bit mount id identifying the mount to
  operate on. It is extensible to allow for the addition of new
  parameters and for future use in other apis that make use of mount
  ids.

  statmount() mimicks the semantics of statx() and exposes a set flags
  that userspace may raise in mnt_id_req to request specific information
  to be retrieved. A statmount() call returns a struct statmount filled
  in with information about the requested mount. Supported requests are
  indicated by raising the request flag passed in struct mnt_id_req in
  the @mask argument in struct statmount.

  Currently we do support:

   - STATMOUNT_SB_BASIC:
     Basic filesystem info

   - STATMOUNT_MNT_BASIC
     Mount information (mount id, parent mount id, mount attributes etc)

   - STATMOUNT_PROPAGATE_FROM
     Propagation from what mount in current namespace

   - STATMOUNT_MNT_ROOT
     Path of the root of the mount (e.g., mount --bind /bla /mnt returns /bla)

   - STATMOUNT_MNT_POINT
     Path of the mount point (e.g., mount --bind /bla /mnt returns /mnt)

   - STATMOUNT_FS_TYPE
     Name of the filesystem type as the magic number isn't enough due to submounts

  The string options STATMOUNT_MNT_{ROOT,POINT} and STATMOUNT_FS_TYPE
  are appended to the end of the struct. Userspace can use the offsets
  in @fs_type, @mnt_root, and @mnt_point to reference those strings
  easily.

  The struct statmount reserves quite a bit of space currently for
  future extensibility. This isn't really a problem and if this bothers
  us we can just send a follow-up pull request during this cycle.

  listmount() is given a 64bit mount id via mnt_id_req just as
  statmount(). It takes a buffer and a size to return an array of the
  64bit ids of the child mounts of the requested mount. Userspace can
  thus choose to either retrieve child mounts for a mount in batches or
  iterate through the child mounts. For most use-cases it will be
  sufficient to just leave space for a few child mounts. But for big
  mount tables having an iterator is really helpful. Iterating through a
  mount table works by setting @param in mnt_id_req to the mount id of
  the last child mount retrieved in the previous listmount() call"

Link: https://lwn.net/Articles/934469 [1]
Link: https://lwn.net/Articles/829212 [2]
Link: https://lwn.net/Articles/950569 [3]

* tag 'vfs-6.8.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  add selftest for statmount/listmount
  fs: keep struct mnt_id_req extensible
  wire up syscalls for statmount/listmount
  add listmount(2) syscall
  statmount: simplify string option retrieval
  statmount: simplify numeric option retrieval
  add statmount(2) syscall
  namespace: extract show_path() helper
  mounts: keep list of mounts in an rbtree
  add unique mount ID
2024-01-08 10:57:34 -08:00
Linus Torvalds
3f6984e730 vfs-6.8.super
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZZUx4wAKCRCRxhvAZXjc
 osaNAQC/c+xXVfiq/pFbuK9MQLna4RGZaGcG9k312YniXbHq0AD9HAf4aPcZwPy1
 /wkD4pauj3UZ3f0xBSyazGBvAXyN0Qc=
 =iFAQ
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.8.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs super updates from Christian Brauner:
 "This contains the super work for this cycle including the long-awaited
  series by Jan to make it possible to prevent writing to mounted block
  devices:

   - Writing to mounted devices is dangerous and can lead to filesystem
     corruption as well as crashes. Furthermore syzbot comes with more
     and more involved examples how to corrupt block device under a
     mounted filesystem leading to kernel crashes and reports we can do
     nothing about. Add tracking of writers to each block device and a
     kernel cmdline argument which controls whether other writeable
     opens to block devices open with BLK_OPEN_RESTRICT_WRITES flag are
     allowed.

     Note that this effectively only prevents modification of the
     particular block device's page cache by other writers. The actual
     device content can still be modified by other means - e.g. by
     issuing direct scsi commands, by doing writes through devices lower
     in the storage stack (e.g. in case loop devices, DM, or MD are
     involved) etc. But blocking direct modifications of the block
     device page cache is enough to give filesystems a chance to perform
     data validation when loading data from the underlying storage and
     thus prevent kernel crashes.

     Syzbot can use this cmdline argument option to avoid uninteresting
     crashes. Also users whose userspace setup does not need writing to
     mounted block devices can set this option for hardening. We expect
     that this will be interesting to quite a few workloads.

     Btrfs is currently opted out of this because they still haven't
     merged patches we require for this to work from three kernel
     releases ago.

   - Reimplement block device freezing and thawing as holder operations
     on the block device.

     This allows us to extend block device freezing to all devices
     associated with a superblock and not just the main device. It also
     allows us to remove get_active_super() and thus another function
     that scans the global list of superblocks.

     Freezing via additional block devices only works if the filesystem
     chooses to use @fs_holder_ops for these additional devices as well.
     That currently only includes ext4 and xfs.

     Earlier releases switched get_tree_bdev() and mount_bdev() to use
     @fs_holder_ops. The remaining nilfs2 open-coded version of
     mount_bdev() has been converted to rely on @fs_holder_ops as well.
     So block device freezing for the main block device will continue to
     work as before.

     There should be no regressions in functionality. The only special
     case is btrfs where block device freezing for the main block device
     never worked because sb->s_bdev isn't set. Block device freezing
     for btrfs can be fixed once they can switch to @fs_holder_ops but
     that can happen whenever they're ready"

* tag 'vfs-6.8.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (27 commits)
  block: Fix a memory leak in bdev_open_by_dev()
  super: don't bother with WARN_ON_ONCE()
  super: massage wait event mechanism
  ext4: Block writes to journal device
  xfs: Block writes to log device
  fs: Block writes to mounted block devices
  btrfs: Do not restrict writes to btrfs devices
  block: Add config option to not allow writing to mounted devices
  block: Remove blkdev_get_by_*() functions
  bcachefs: Convert to bdev_open_by_path()
  fs: handle freezing from multiple devices
  fs: remove dead check
  nilfs2: simplify device handling
  fs: streamline thaw_super_locked
  ext4: simplify device handling
  xfs: simplify device handling
  fs: simplify setup_bdev_super() calls
  blkdev: comment fs_holder_ops
  porting: document block device freeze and thaw changes
  fs: remove unused helper
  ...
2024-01-08 10:43:51 -08:00
Linus Torvalds
c604110e66 vfs-6.8.misc
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZZUxRQAKCRCRxhvAZXjc
 ov/QAQDzvge3oQ9MEymmOiyzzcF+HhAXBr+9oEsYJjFc1p0TsgEA61gXjZo7F1jY
 KBqd6znOZCR+Waj0kIVJRAo/ISRBqQc=
 =0bRl
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.8.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull misc vfs updates from Christian Brauner:
 "This contains the usual miscellaneous features, cleanups, and fixes
  for vfs and individual fses.

  Features:

   - Add Jan Kara as VFS reviewer

   - Show correct device and inode numbers in proc/<pid>/maps for vma
     files on stacked filesystems. This is now easily doable thanks to
     the backing file work from the last cycles. This comes with
     selftests

  Cleanups:

   - Remove a redundant might_sleep() from wait_on_inode()

   - Initialize pointer with NULL, not 0

   - Clarify comment on access_override_creds()

   - Rework and simplify eventfd_signal() and eventfd_signal_mask()
     helpers

   - Process aio completions in batches to avoid needless wakeups

   - Completely decouple struct mnt_idmap from namespaces. We now only
     keep the actual idmapping around and don't stash references to
     namespaces

   - Reformat maintainer entries to indicate that a given subsystem
     belongs to fs/

   - Simplify fput() for files that were never opened

   - Get rid of various pointless file helpers

   - Rename various file helpers

   - Rename struct file members after SLAB_TYPESAFE_BY_RCU switch from
     last cycle

   - Make relatime_need_update() return bool

   - Use GFP_KERNEL instead of GFP_USER when allocating superblocks

   - Replace deprecated ida_simple_*() calls with their current ida_*()
     counterparts

  Fixes:

   - Fix comments on user namespace id mapping helpers. They aren't
     kernel doc comments so they shouldn't be using /**

   - s/Retuns/Returns/g in various places

   - Add missing parameter documentation on can_move_mount_beneath()

   - Rename i_mapping->private_data to i_mapping->i_private_data

   - Fix a false-positive lockdep warning in pipe_write() for watch
     queues

   - Improve __fget_files_rcu() code generation to improve performance

   - Only notify writer that pipe resizing has finished after setting
     pipe->max_usage otherwise writers are never notified that the pipe
     has been resized and hang

   - Fix some kernel docs in hfsplus

   - s/passs/pass/g in various places

   - Fix kernel docs in ntfs

   - Fix kcalloc() arguments order reported by gcc 14

   - Fix uninitialized value in reiserfs"

* tag 'vfs-6.8.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (36 commits)
  reiserfs: fix uninit-value in comp_keys
  watch_queue: fix kcalloc() arguments order
  ntfs: dir.c: fix kernel-doc function parameter warnings
  fs: fix doc comment typo fs tree wide
  selftests/overlayfs: verify device and inode numbers in /proc/pid/maps
  fs/proc: show correct device and inode numbers in /proc/pid/maps
  eventfd: Remove usage of the deprecated ida_simple_xx() API
  fs: super: use GFP_KERNEL instead of GFP_USER for super block allocation
  fs/hfsplus: wrapper.c: fix kernel-doc warnings
  fs: add Jan Kara as reviewer
  fs/inode: Make relatime_need_update return bool
  pipe: wakeup wr_wait after setting max_usage
  file: remove __receive_fd()
  file: stop exposing receive_fd_user()
  fs: replace f_rcuhead with f_task_work
  file: remove pointless wrapper
  file: s/close_fd_get_file()/file_close_fd()/g
  Improve __fget_files_rcu() code generation (and thus __fget_light())
  file: massage cleanup of files that failed to open
  fs/pipe: Fix lockdep false-positive in watchqueue pipe_write()
  ...
2024-01-08 10:26:08 -08:00
Dmitry Torokhov
1ab33c0314 asm-generic: make sparse happy with odd-sized put_unaligned_*()
__put_unaligned_be24() and friends use implicit casts to convert
larger-sized data to bytes, which trips sparse truncation warnings when
the argument is a constant:

    CC [M]  drivers/input/touchscreen/hynitron_cstxxx.o
    CHECK   drivers/input/touchscreen/hynitron_cstxxx.c
  drivers/input/touchscreen/hynitron_cstxxx.c: note: in included file (through arch/x86/include/generated/asm/unaligned.h):
  include/asm-generic/unaligned.h:119:16: warning: cast truncates bits from constant value (aa01a0 becomes a0)
  include/asm-generic/unaligned.h:120:20: warning: cast truncates bits from constant value (aa01 becomes 1)
  include/asm-generic/unaligned.h:119:16: warning: cast truncates bits from constant value (ab00d0 becomes d0)
  include/asm-generic/unaligned.h:120:20: warning: cast truncates bits from constant value (ab00 becomes 0)

To avoid this let's mask off upper bits explicitly, the resulting code
should be exactly the same, but it will keep sparse happy.

Reported-by: kernel test robot <lkp@intel.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Closes: https://lore.kernel.org/oe-kbuild-all/202401070147.gqwVulOn-lkp@intel.com/
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-01-08 09:48:35 -08:00
Rafael J. Wysocki
f1e5e46397 Merge branch 'pm-sleep'
Merge system-wide power management updates for 6.8-rc1:

 - Fix possible deadlocks in the core system-wide PM code that occur if
   device-handling functions cannot be executed asynchronously during
   resune from system-wide suspend (Rafael J. Wysocki).

 - Clean up unnecessary local variable initializations in multiple
   places in the hibernation code (Wang chaodong, Li zeming).

 - Adjust core hibernation code to avoid missing wakeup events that
   occur after saving an image to persistent storage (Chris Feng).

 - Update hibernation code to enforce correct ordering during image
   compression and decompression (Hongchen Zhang).

 - Use kmap_local_page() instead of kmap_atomic() in copy_data_page()
   during hibernation and restore (Chen Haonan).

 - Adjust documentation and code comments to reflect recent task freezer
   changes (Kevin Hao).

 - Repair excess function parameter description warning in the
   hibernation image-saving code (Randy Dunlap).

* pm-sleep:
  PM: sleep: Fix possible deadlocks in core system-wide PM code
  async: Introduce async_schedule_dev_nocall()
  async: Split async_schedule_node_domain()
  PM: hibernate: Repair excess function parameter description warning
  PM: sleep: Remove obsolete comment from unlock_system_sleep()
  Documentation: PM: Adjust freezing-of-tasks.rst to the freezer changes
  PM: hibernate: Use kmap_local_page() in copy_data_page()
  PM: hibernate: Enforce ordering during image compression/decompression
  PM: hibernate: Avoid missing wakeup events during hibernation
  PM: hibernate: Do not initialize error in snapshot_write_next()
  PM: hibernate: Do not initialize error in swap_write_page()
  PM: hibernate: Drop unnecessary local variable initialization
2024-01-08 13:42:48 +01:00
Rafael J. Wysocki
0b055cf441 Merge branches 'pm-cpuidle', 'pm-cpufreq' and 'pm-devfreq'
Merge cpuidle, cpufreq and devfreq updates for 6.8-rc1:

 - Add support for the Sierra Forest, Grand Ridge and Meteorlake SoCs to
   the intel_idle cpuidle driver (Artem Bityutskiy, Zhang Rui).

 - Do not enable interrupts when entering idle in the haltpoll cpuidle
   driver (Borislav Petkov).

 - Add Emerald Rapids support in no-HWP mode to the intel_pstate cpufreq
   driver (Zhenguo Yao).

 - Use EPP values programmed by the platform firmware as balance
   performance ones by default in intel_pstate (Srinivas Pandruvada).

 - Add a missing function return value check to the SCMI cpufreq driver
   to avoid unexpected behavior (Alexandra Diupina).

 - Fix parameter type warning in the armada-8k cpufreq driver (Gregory
   CLEMENT).

 - Rework trans_stat_show() in the devfreq core code to avoid buffer
   overflows (Christian Marangi).

 - Synchronize devfreq_monitor_[start/stop] so as to prevent a timer
   list corruption from occurring when devfreq governors are switched
   frequently (Mukesh Ojha).

* pm-cpuidle:
  cpuidle: haltpoll: Do not enable interrupts when entering idle
  intel_idle: add Sierra Forest SoC support
  intel_idle: add Grand Ridge SoC support
  intel_idle: Add Meteorlake support

* pm-cpufreq:
  cpufreq: intel_pstate: Add Emerald Rapids support in no-HWP mode
  cpufreq: armada-8k: Fix parameter type warning
  cpufreq: scmi: process the result of devm_of_clk_add_hw_provider()
  cpufreq: intel_pstate: Prioritize firmware-provided balance performance EPP

* pm-devfreq:
  PM / devfreq: Synchronize devfreq_monitor_[start/stop]
  PM / devfreq: Convert to use sysfs_emit_at() API
  PM / devfreq: Fix buffer overflow in trans_stat_show
2024-01-08 13:39:59 +01:00
Rafael J. Wysocki
4ee4ffccc0 OPP updates for 6.8
- Fix _set_required_opps when opp is NULL (Bryan O'Donoghue).
 - ti: Use device_get_match_data() (Rob Herring).
 - Minor cleanups around OPP level and other parts and call
   dev_pm_opp_set_opp() recursively for required OPPs (Viresh Kumar).
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEx73Crsp7f6M6scA70rkcPK6BEhwFAmWbxSkACgkQ0rkcPK6B
 EhzVHRAAhd5c0gD8xCeQt1A/RjJCxNp6Dp8wo1+oo6uv7vcTyXrQTsgwj4tfanxc
 wI5nCwKDRAGPg7CRnFzSGX6R+GiMMkZOhIR1icuHYwMrX1DOeC/6wKOIiXIGVoSr
 +vnB3okXIPanBna0TM4Z7JR5G33CW1GagG94H5md/CoWOFznfHWOCssxdFGiJw0K
 O+PKzM5ObQAdXtSXebo1ViH0xvbk0jpT8Ys5yU2qwGpf7gbGHDSLycJlFgWDkofR
 AOOG9jrMmKh7sPdtist5rbE7+mDF2XQVIbdYJF6SVwtYN4ZlHpbFTIEYw6EJjeJQ
 LMhIMpGhnm98TMIFit88F0nkclGVktDc7o+a3hm+ViN0zCDx/ucXSQyq6Ppwv3gX
 4kNBkheV6rFgoYesBD2gQyrL2vQAt8biQOAsg/reVl/jeb0O+/KjkBvF3P3DlFh7
 aKO6H6IWOtYoeU5UGAfKzR+Wl+pfywXaLRhydjqKiFT7BwKLDiW6r/HoOrutMqhJ
 cSNr879JU/NucB3EDy5A6P7ompfPfmKqRH6to6/m+MAfa9sngSZIkN6tczHiNEdn
 Lz+64SAFvEzkp4obPHGwthLWTbpVHIUrtCF8W8QZ7DU/2aJ9ZVCgSQwhAgb6liZS
 EzZoVPIQurzSpvYmbhraLxiTN0NChbRtAJ7N/JLq+dI5OLeABFY=
 =a5D2
 -----END PGP SIGNATURE-----

Merge tag 'opp-updates-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm into pm-opp

Merge OPP (Operating Performance Points) updates for 6.8 from Viresh
Kumar:

"- Fix _set_required_opps when opp is NULL (Bryan O'Donoghue).
 - ti: Use device_get_match_data() (Rob Herring).
 - Minor cleanups around OPP level and other parts and call
   dev_pm_opp_set_opp() recursively for required OPPs (Viresh Kumar)."

* tag 'opp-updates-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
  OPP: Rename 'rate_clk_single'
  OPP: Pass rounded rate to _set_opp()
  OPP: Relocate dev_pm_opp_sync_regulators()
  OPP: Move dev_pm_opp_icc_bw to internal opp.h
  OPP: Fix _set_required_opps when opp is NULL
  OPP: The level field is always of unsigned int type
  OPP: Check for invalid OPP in dev_pm_opp_find_level_ceil()
  OPP: Don't set OPP recursively for a parent genpd
  OPP: Call dev_pm_opp_set_opp() for required OPPs
  OPP: Use _set_opp_level() for single genpd case
  OPP: Level zero is valid
  opp: ti: Use device_get_match_data()
2024-01-08 13:17:17 +01:00
Ingo Molnar
cdb3033e19 Merge branch 'sched/urgent' into sched/core, to pick up pending v6.7 fixes for the v6.8 merge window
This fix didn't make it upstream in time, pick it up
for the v6.8 merge window.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-01-08 12:57:28 +01:00
Ingo Molnar
2b9d9e0a9b locking/mutex: Clarify that mutex_unlock(), and most other sleeping locks, can still use the lock object after it's unlocked
Clarify the mutex lock lifetime rules a bit more.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jann Horn <jannh@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231201121808.GL3818@noisy.programming.kicks-ass.net
2024-01-08 09:55:31 +01:00
Linus Torvalds
0dd3ee3112 Linux 6.7 v6.7 2024-01-07 12:18:38 -08:00
Linus Torvalds
52b1853b08 Improve the detection when to run atomic transfer handlers for kernels
with preemption disabled. This removes some false positive splats a
 number of users were seeing if their driver didn't have support for
 atomic transfers. Also, fix a typo in the docs while we are here.
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEOZGx6rniZ1Gk92RdFA3kzBSgKbYFAmWZUukPHHdzYUBrZXJu
 ZWwub3JnAAoJEBQN5MwUoCm2XUMP/Rm/6oiH7ASUOKpmLi42a68llEJuBCkK8oPW
 PmO/QmuTc6K1rDE6ehmBgTNeBQDkI7NdrUViu7xel1G0p57q5rZCdxl/BV7qmGjv
 aq46Y3omw06UybqceVlmx8NJe9r7+HXh+qJJ0UXg/+uA9VKd0jaiGo40wQjR6AxA
 Ow92+p6rUcFDGpEdXFRQJnov8p7GnCGRKumA593goju+sJRUVozjz0nifTVqdyA5
 +L3V02j5INx285sAngTPSHSn2vd0arxwumXKYusehfUtytzkI0RmBWebfqxbg5QE
 7WRnqtHe5aR78gr1RWWo/pI91yTtlfslGhj7uFQxcPpkKkyD57Z17N8rxS+u3rwH
 xAwvYG7CiWygee+q2RTDFmz49FNxRCeB3O5cjPp1Imco6MVG5ZOwZjIjn3CkDyvi
 CGQD8vXdiDfRBDSCI25XdORNsTBzrhSBAs412xGo3waflvgowtw076FYpXfNnAmZ
 z79lIlrR+Hg7ynZBs3qcjVOXvaj8rzXiLkfFWrBihGfbzVOTJv9cfetEpYcOcSaw
 Ae/wC3TKhvowdaFr43wYJrIsni93xdsCjHeIwQYozKbD6iiKWymwxzhWRDTM0cjP
 eSifFBoa1mzVqUNzSVzCHFbdKV0nlhZqxpZGrsTKhUm2EU0934sb9hBvn8o1weaR
 WRl0sHoU
 =LZdW
 -----END PGP SIGNATURE-----

Merge tag 'i2c-for-6.7-final' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
 "Improve the detection when to run atomic transfer handlers for kernels
  with preemption disabled. This removes some false positive splats a
  number of users were seeing if their driver didn't have support for
  atomic transfers.

  Also, fix a typo in the docs while we are here"

* tag 'i2c-for-6.7-final' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: core: Fix atomic xfer check for non-preempt config
  Documentation/i2c: fix spelling error in i2c-address-translators
2024-01-06 11:35:37 -08:00
Benjamin Bara
a3368e1186 i2c: core: Fix atomic xfer check for non-preempt config
Since commit aa49c90894d0 ("i2c: core: Run atomic i2c xfer when
!preemptible"), the whole reboot/power off sequence on non-preempt kernels
is using atomic i2c xfer, as !preemptible() always results to 1.

During device_shutdown(), the i2c might be used a lot and not all busses
have implemented an atomic xfer handler. This results in a lot of
avoidable noise, like:

[   12.687169] No atomic I2C transfer handler for 'i2c-0'
[   12.692313] WARNING: CPU: 6 PID: 275 at drivers/i2c/i2c-core.h:40 i2c_smbus_xfer+0x100/0x118
...

Fix this by allowing non-atomic xfer when the interrupts are enabled, as
it was before.

Link: https://lore.kernel.org/r/20231222230106.73f030a5@yea
Link: https://lore.kernel.org/r/20240102150350.3180741-1-mwalle@kernel.org
Link: https://lore.kernel.org/linux-i2c/13271b9b-4132-46ef-abf8-2c311967bb46@mailbox.org/
Fixes: aa49c90894d0 ("i2c: core: Run atomic i2c xfer when !preemptible")
Cc: stable@vger.kernel.org # v5.2+
Signed-off-by: Benjamin Bara <benjamin.bara@skidata.com>
Tested-by: Michael Walle <mwalle@kernel.org>
Tested-by: Tor Vic <torvic9@mailbox.org>
[wsa: removed a comment which needs more work, code is ok]
Signed-off-by: Wolfram Sang <wsa@kernel.org>
2024-01-06 14:10:10 +01:00
Linus Torvalds
95c8a35f1c 12 hotfixes. 2 are cc:stable and the remainder either address post-6.7
issues or aren't considered necessary for earlier kernel versions.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZZhbAwAKCRDdBJ7gKXxA
 jiFeAQD20H8wGT9hbMUYr/PxE4rOxyDXlhnv/mZ0Av3neve4SQD/YlgCFWYpEu8G
 F5rEAtq89UYE13qlgS3o9KjYOPzhtgQ=
 =vZ0P
 -----END PGP SIGNATURE-----

Merge tag 'mm-hotfixes-stable-2024-01-05-11-35' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc mm fixes from Andrew Morton:
 "12 hotfixes.

  Two are cc:stable and the remainder either address post-6.7 issues or
  aren't considered necessary for earlier kernel versions"

* tag 'mm-hotfixes-stable-2024-01-05-11-35' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm: shrinker: use kvzalloc_node() from expand_one_shrinker_info()
  mailmap: add entries for Mathieu Othacehe
  MAINTAINERS: change vmware.com addresses to broadcom.com
  arch/mm/fault: fix major fault accounting when retrying under per-VMA lock
  mm/mglru: skip special VMAs in lru_gen_look_around()
  MAINTAINERS: hand over hwpoison maintainership to Miaohe Lin
  MAINTAINERS: remove hugetlb maintainer Mike Kravetz
  mm: fix unmap_mapping_range high bits shift bug
  mm: memcg: fix split queue list crash when large folio migration
  mm: fix arithmetic for max_prop_frac when setting max_ratio
  mm: fix arithmetic for bdi min_ratio
  mm: align larger anonymous mappings on THP boundaries
2024-01-05 13:46:18 -08:00
Linus Torvalds
0d3ac66ed8 nfsd-6.7 fixes:
- Fix another regression in the NFSD administrative API
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmWYCHIACgkQM2qzM29m
 f5c7FRAAoXVRhP/vyua6PcfF8Je0bg83fhvqG8ppbLE6EjFzbMM7fZ9G5MudtY7/
 w4PuDJ8rhQFl8oQTy004qkqV3JfZGo7buCumKoj8IA8m67cQ0WyB6Lz1fSW9BGHq
 cc0PWQR/z88laRvCPViW+jOsLx5f+VvvyEsDUvEYdG6xXtTVhDfNxYQFPc3bRfd0
 Sf5cPBuBbMAlz9PZm/c276y46rkBP3+2ONx+zHz0k/ZR/EgJ0RoxmzZIrAUguCRy
 fAUW0xvrAuTMY6LMb/0YyKH6IOqYoImKWMGjhi0VFFG5AuxbmU7lLVo/S9tUlWMs
 aAcyEtyuVU1422dikF/FRtstswfz64fj+PqIwiq8qjt14WB+BX/tV1tjwAjuI2py
 YVt9T50FkhG42IG+Nkub7SXm6OuinfiUBcAilPMfBnSapT5l6BvVwrSXGIApcF87
 NmQvRqjO4tEYE6/aYsUV/jZGQVYPTspvhv5kzXMQnL/6h/KlhrNLvTF9gTJFs+xM
 ahY7I2nVmXayoLfrweaWX/a1VNMeYLvVOUPqm/DFiWWw51TqkmD8CHQwCsBiR0i9
 iVu10eh0x9gakPBDVFuCLG+BOL+vsUxfgbeAOJdHYC0s78YLRovyFsDTHxRNv7hW
 hDp+eblGUb+v5kNPbEZJKMmWXCwivOHnacc2EJyA1UcGRLL7I24=
 =TKjN
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.7-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fix from Chuck Lever:

 - Fix another regression in the NFSD administrative API

* tag 'nfsd-6.7-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  nfsd: drop the nfsd_put helper
2024-01-05 13:12:29 -08:00
Linus Torvalds
a4ab2706bb firewire fixes for 6.7-final
This pull request includes a single patch to suppress unexpected system
 reboot in AMD Ryzen machines with PCIe card consisting of Asmedia
 ASM1083/1085 and VT6306/6307/6308. When 1394 OHCI driver for the card
 accesses to a specific register in PCI memory space, the system reboot
 often occurs. The issue affects all versions of Linux kernel as long as
 the 1394 OHCI driver is included. The mechanism of unexpected system
 reboot is not clear, so the driver is changed to avoid the access itself
 when detecting the combination of hardware.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQE66IEYNDXNBPeGKSsLtaWM8LwEwUCZZf7jQAKCRCsLtaWM8Lw
 E7D2AP9JXU29+5HRNDYBonC1tEItRK8auF+wbAOe7HcOjRXoUgEA2ghLbA6I2bot
 iQ/VjH/45TPyo7YAfLw1Kx+vpaysIwU=
 =EGMc
 -----END PGP SIGNATURE-----

Merge tag 'firewire-fixes-6.7-final' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394

Pull firewire fix from Takashi Sakamoto:
 "A single patch to suppress unexpected system reboot in AMD Ryzen
  machines with PCIe card consisting of Asmedia ASM1083/1085 and
  VT6306/6307/6308.

  When the 1394 OHCI driver for the card accesses a specific register
  in PCI memory space, the system reboot often occurs.

  The issue affects all versions of Linux kernel as long as the 1394
  OHCI driver is included. The mechanism of unexpected system reboot is
  not clear, so the driver is changed to avoid the access itself when
  detecting the combination of hardware"

* tag 'firewire-fixes-6.7-final' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
  firewire: ohci: suppress unexpected system reboot in AMD Ryzen machines and ASM108x/VT630x PCIe cards
2024-01-05 12:26:26 -08:00
Linus Torvalds
6c23529c08 MMC core:
- Fix path when releasing the host by canceling the delayed work
  - Fix pause retune on all RPMB partitions
 
 MMC host:
  - meson-mx-sdhc: Fix HW hang during card initialization
  - sdhci-sprd: Fix eMMC init failure after HW reset
 -----BEGIN PGP SIGNATURE-----
 
 iQJLBAABCgA1FiEEugLDXPmKSktSkQsV/iaEJXNYjCkFAmWXzxQXHHVsZi5oYW5z
 c29uQGxpbmFyby5vcmcACgkQ/iaEJXNYjCnUiA/+Nirp2Jemjr/UqAvs4+yh++Ib
 wRaUtvPaZbGxB1JTPoGG5s5S8WZfz1neSax8xS++/A/PVFtpSDnNtUiY6YIZ4TPm
 E2PkhsCPDkgYfPtikMmzJbRvrh60EhdaWfd4A8BT/SsZnJxJHAwpvc5a3ayxbYxj
 TreNbnzK0JidJxaEtZQpjZ9fSxViMQ8OB04QpiWG2c5tXlgEJDX8R76nv5/4pdTK
 3y3jcy1FztUz0vAkAbQ0qvaP32PlJZ1Yei5mToHLOE0cYgwdtSfgYt89jT1GeePZ
 fiBS6ODURh5npZeS55DWq8lgQRbEzJxuDQcZgYoa85ZO5MDRFOlYdGdyvDB458s4
 HqUvZyO5aMIMwREXDXixynZj4WVde1XYDUGRoWu3N/pjfB/qVIMry+cTSNeGobhk
 yHvS3WCROXG3/wy4Nb5QN7r6UogaIoxITINP5vCgHpxUW8c1tYes/NmL05joeHUT
 61dxsXrIGMGMQaTeUuC6CTqbzhG/xOYLo51K8jVnAKshJsZlGdPj3tIyND4Xo4Mm
 MuVfCtEr7vyMwBjkOF2nDTdj5Kn67jmuNXY9zf6NNN+bgYcWC3zvrRk+OiVzz8mF
 MkjbIadWLJ2J5gpQtjYFpKs4oAUPth0wHiVI9c5mXhVR3qapA9v7kOTXycWUVrfu
 mO5oaJvuBzM3DSlwTJw=
 =SfSL
 -----END PGP SIGNATURE-----

Merge tag 'mmc-v6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc

Pull MMC fixes from Ulf Hansson:
 "MMC core:
   - Fix releasing the host by canceling the delayed work
   - Fix pause retune on all RPMB partitions

  MMC host:
   - meson-mx-sdhc: Fix HW hang during card initialization
   - sdhci-sprd: Fix eMMC init failure after HW reset"

* tag 'mmc-v6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: sdhci-sprd: Fix eMMC init failure after hw reset
  mmc: core: Cancel delayed work before releasing host
  mmc: rpmb: fixes pause retune on all RPMB partitions.
  mmc: meson-mx-sdhc: Fix initialization frozen issue
2024-01-05 12:12:33 -08:00
Linus Torvalds
2b5bd1498d drm fixes for 6.7 final (part 2 - correct version)
amdgpu:
 - DP MST fix
 - SMU 13.0.6 fixes
 - Fix displays on macbooks using vega12
 - Fix VSC and colorimetry on DP/eDP
 
 nouveau:
 - fix deadlock between fence signalling and irq paths
 - fix GSP memory leaks
 - fix GSP leftover debug
 - hide some GSP callback messages
 - fix GSP display disable path
 - fix GSP ACPI interaction
 - handle errors in ctrl messages
 - use errors info to fix DP link training
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmWXdSIACgkQDHTzWXnE
 hr5kOhAAlmRnc42SX802l+pLwP4uxFMySPVRRhrX0bfldwRdE5OnxuC0c86Foi3Z
 31AiQnq+oJJ35TuArwe4ktDehurkOIan3BBzQ3x62tJbx7GQzK69L5XwhRc6Hicd
 jLnMeQzKbCofG5MvnpcWLYlOYmYHjb4lluB1SAtngBRuueXZPBuhGQ+PvVNfK/mk
 1Wq2767P8xwzLBBqlF3j7+2N7iU5t2VEtjlgYYNYQI2wx3/GDVUrSUk64o+5wBAX
 Bw3yZ1lHuSV5Nczd2KKLxUoXAocqpn0GbVscD6TQ2i5m1ZHUMC6Fn1ijZHLSAqD2
 XIICUxOI0ieTREDeA8xTKUHGhz5db9QFfxE7zgZPSKvWeiL4y0gJZ0MxyoPIPTWK
 7WbPzHAQxI3iQeeOqcAbW4juIsXSAuhq6Ex8SP0lZ2O1ZGsLd3FVoy7i/xWr14ui
 LtOCMS9D5oK4aC+ZOo41LZZY4TgdLQ22EBbEnKChQi4Rt7t/++1hj/PmEE6vLigW
 WFk67tqAQFuA1kZfjnIu9/Oft+wMwD7/b29PjTCOdsi7c21gmOSWONjj8Xvd1NbK
 7WhRC5DZX6pgFHjb7jRqYxqa70IL91VzfUqAt63PSNv+/joP8r06Bqoi1tyUfxkJ
 8yPrWleT8f+skE2zkHwsw8UUqR5eKbTek8aeS0Mp5udNQZYB2dg=
 =94nf
 -----END PGP SIGNATURE-----

Merge tag 'drm-fixes-2024-01-05' of git://anongit.freedesktop.org/drm/drm

Pull more drm fixes from Dave Airlie:
 "The amdgpu ones are fairly normal, the one that is a bit large is a
  fix for a newly introduced IP in 6.7 so unlikely to cause regressions.

  The nouveau ones are mostly memory leaks and debugging cleanups from
  the GSP (new nvidia firmware) enablement. There are some GSP changes
  to the message passing code and a subsequent fix for eDP panel turn
  on, that means my laptop can turn on the panel in GSP mode. These are
  fairly low chance of disrupting things since GSP is new in 6.7. The
  final not all in GSP fix is a deadlock seen with i915/nouveau when GSP
  is used where the the fence and irq paths have locking inversions,
  I've pushed some irq enablement out to a workqueue, and this has seen
  some fairly decent testing.

  amdgpu:
   - DP MST fix
   - SMU 13.0.6 fixes
   - fix displays on macbooks using vega12
   - fix VSC and colorimetry on DP/eDP

  nouveau:
   - fix deadlock between fence signalling and irq paths
   - fix GSP memory leaks
   - fix GSP leftover debug
   - hide some GSP callback messages
   - fix GSP display disable path
   - fix GSP ACPI interaction
   - handle errors in ctrl messages
   - use errors info to fix DP link training"

* tag 'drm-fixes-2024-01-05' of git://anongit.freedesktop.org/drm/drm:
  drm/nouveau/dp: Honor GSP link training retry timeouts
  nouveau: push event block/allowing out of the fence context
  nouveau/gsp: always free the alloc messages on r535
  nouveau/gsp: don't free ctrl messages on errors
  nouveau/gsp: convert gsp errors to generic errors
  drm/nouveau/gsp: Fix ACPI MXDM/MXDS method invocations
  nouveau/gsp: free userd allocation.
  nouveau/gsp: free acpi object after use
  nouveau: fix disp disabling with GSP
  nouveau/gsp: drop some acpi related debug
  nouveau/gsp: add three notifier callbacks that we see in normal operation (v2)
  drm/amd/pm: Use gpu_metrics_v1_5 for SMUv13.0.6
  drm/amd/pm: Add gpu_metrics_v1_5
  drm/amd/pm: Add mem_busy_percent for GCv9.4.3 apu
  drm/amd/display: Fix sending VSC (+ colorimetry) packets for DP/eDP displays without PSR
  drm/amdgpu: skip gpu_info fw loading on navi12
  drm/amd/display: add nv12 bounding box
  drm/amd/pm: Update metric table for jpeg/vcn data
  drm/amd/pm: Use separate metric table for APU
  drm/amd/display: pbn_div need be updated for hotplug event
2024-01-05 12:02:20 -08:00
Yuntao Wang
6dff315972 crash_core: fix and simplify the logic of crash_exclude_mem_range()
The purpose of crash_exclude_mem_range() is to remove all memory ranges
that overlap with [mstart-mend].  However, the current logic only removes
the first overlapping memory range.

Commit a2e9a95d2190 ("kexec: Improve & fix crash_exclude_mem_range() to
handle overlapping ranges") attempted to address this issue, but it did
not fix all error cases.

Let's fix and simplify the logic of crash_exclude_mem_range().

Link: https://lkml.kernel.org/r/20240102144905.110047-4-ytcoode@gmail.com
Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05 10:45:25 -08:00
Yuntao Wang
61bb219f9d x86/crash: use SZ_1M macro instead of hardcoded value
Use SZ_1M macro instead of hardcoded 1<<20 to make code more readable.

Link: https://lkml.kernel.org/r/20240102144905.110047-3-ytcoode@gmail.com
Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05 10:45:25 -08:00
Yuntao Wang
83d4a42a91 x86/crash: remove the unused image parameter from prepare_elf_headers()
Patch series "crash: Some cleanups and fixes", v2.

This patchset includes two cleanups and one fix.


This patch (of 3):

The image parameter is no longer in use, remove it.  Also, tidy up the
code formatting.

Link: https://lkml.kernel.org/r/20240102144905.110047-1-ytcoode@gmail.com
Link: https://lkml.kernel.org/r/20240102144905.110047-2-ytcoode@gmail.com
Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05 10:45:25 -08:00
Suren Baghdasaryan
a5b7620bab selftests/mm: add separate UFFDIO_MOVE test for PMD splitting
Add a test for UFFDIO_MOVE ioctl operating on a hugepage which has to be
split because destination is marked with MADV_NOHUGEPAGE.  With this we
cover all 3 cases: normal page move, hugepage move, hugepage splitting
before move.

Link: https://lkml.kernel.org/r/20231230025636.2477429-1-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Nicolas Geoffray <ngeoffray@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: ZhangPeng <zhangpeng362@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05 10:17:49 -08:00
Muhammad Usama Anjum
8c9eea721a selftests/mm: skip test if application doesn't has root privileges
The test depends on writing to nr_hugepages which isn't possible without
root privileges.  So skip the test in this case.

Link: https://lkml.kernel.org/r/20240101083614.1076768-2-usama.anjum@collabora.com
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05 10:17:48 -08:00
Muhammad Usama Anjum
9a21701edc selftests/mm: conform test to TAP format output
Conform the layout, informational and status messages to TAP.  No
functional change is intended other than the layout of output messages.

Link: https://lkml.kernel.org/r/20240101083614.1076768-1-usama.anjum@collabora.com
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05 10:17:48 -08:00
Muhammad Usama Anjum
84ba3f226c selftests: mm: hugepage-mmap: conform to TAP format output
Conform the layout, informational and status messages to TAP.  No
functional change is intended other than the layout of output messages.

Link: https://lkml.kernel.org/r/20240102053223.2099572-1-usama.anjum@collabora.com
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05 10:17:48 -08:00
Muhammad Usama Anjum
cb6e7cae18 selftests/mm: gup_test: conform test to TAP format output
Conform the layout, informational and status messages to TAP.  No
functional change is intended other than the layout of output messages.

Link: https://lkml.kernel.org/r/20240102053807.2114200-1-usama.anjum@collabora.com
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05 10:17:48 -08:00
Muhammad Usama Anjum
e2cfedf4b0 mm/selftests: hugepage-mremap: conform test to TAP format output
Conform the layout, informational and status messages to TAP.  No
functional change is intended other than the layout of output messages.

Link: https://lkml.kernel.org/r/20240102081919.2325570-1-usama.anjum@collabora.com
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05 10:17:47 -08:00
Li Zhijian
b805ab3c69 mm/vmstat: move pgdemote_* out of CONFIG_NUMA_BALANCING
Demotion can work well without CONFIG_NUMA_BALANCING.  But the commit
23e9f0138963 ("mm/vmstat: move pgdemote_* to per-node stats") wrongly hid
it behind CONFIG_NUMA_BALANCING.

Fix it by moving them out of CONFIG_NUMA_BALANCING.

Link: https://lkml.kernel.org/r/20231229022651.3229174-1-lizhijian@fujitsu.com
Fixes: 23e9f0138963 ("mm/vmstat: move pgdemote_* to per-node stats")
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05 10:17:47 -08:00
Barry Song
fc8580edba mm: zsmalloc: return -ENOSPC rather than -EINVAL in zs_malloc while size is too large
This is the case the "compressed" data is larger than the original data,
it is better to return -ENOSPC which can help zswap record a poor compr
rather than an invalid request.  Then we get more friendly counting for
reject_compress_poor in debugfs.

 bool zswap_store(struct folio *folio)
 {
 	...
 	ret = zpool_malloc(zpool, dlen, gfp, &handle);
 	if (ret == -ENOSPC) {
 		zswap_reject_compress_poor++;
 		goto put_dstmem;
 	}
 	if (ret) {
 		zswap_reject_alloc_fail++;
 		goto put_dstmem;
 	}
 	...
 }

Also, zbud_alloc() and z3fold_alloc() are returning ENOSPC in the same
case, eg

 static int z3fold_alloc(struct z3fold_pool *pool, size_t size, gfp_t gfp,
 			unsigned long *handle)
 {
 	...
 	if (!size || (gfp & __GFP_HIGHMEM))
 		return -EINVAL;

 	if (size > PAGE_SIZE)
 		return -ENOSPC;
 	...
 }

Link: https://lkml.kernel.org/r/20231228061802.25280-1-v-songbaohua@oppo.com
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com>
Reviewed-by: Nhat Pham <nphamcs@gmail.com>
Acked-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Chris Li <chrisl@kernel.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05 10:17:47 -08:00
Matthew Wilcox (Oracle)
c701123bd6 mm/memcontrol: remove __mod_lruvec_page_state()
There are no more callers of __mod_lruvec_page_state(), so convert the
implementation to __lruvec_stat_mod_folio(), removing two calls to
compound_head() (one explicit, one hidden inside page_memcg()).

Link: https://lkml.kernel.org/r/20231228085748.1083901-7-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05 10:17:47 -08:00
Matthew Wilcox (Oracle)
b54d60b18e mm/khugepaged: use a folio more in collapse_file()
This function is not yet fully converted to the folio API, but this
removes a few uses of old APIs.

Link: https://lkml.kernel.org/r/20231228085748.1083901-6-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05 10:17:46 -08:00
Matthew Wilcox (Oracle)
82feeaa009 slub: use a folio in __kmalloc_large_node
Mirror the code in free_large_kmalloc() and alloc_pages_node() and use a
folio directly.  Avoid the use of folio_alloc() as that will set up an
rmappable folio which we do not want here.

Link: https://lkml.kernel.org/r/20231228085748.1083901-5-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05 10:17:46 -08:00
Matthew Wilcox (Oracle)
2443fb5bec slub: use folio APIs in free_large_kmalloc()
Save a few calls to compound_head() by using the folio APIs directly.

Link: https://lkml.kernel.org/r/20231228085748.1083901-4-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05 10:17:46 -08:00
Matthew Wilcox (Oracle)
8014c46ad9 slub: use alloc_pages_node() in alloc_slab_page()
For no apparent reason, we were open-coding alloc_pages_node() in this
function.

Link: https://lkml.kernel.org/r/20231228085748.1083901-3-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05 10:17:46 -08:00
Matthew Wilcox (Oracle)
e435ca8788 mm: remove inc/dec lruvec page state functions
Patch series "Remove some lruvec page accounting functions", v2.

Some functions are now unused; remove them.  Make
__mod_lruvec_page_state() unused and then remove it.


This patch (of 6):

All callers of these have been converted to their folio equivalents.

Link: https://lkml.kernel.org/r/20231228085748.1083901-1-willy@infradead.org
Link: https://lkml.kernel.org/r/20231228085748.1083901-2-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05 10:17:45 -08:00
Shakeel Butt
d4a5b369ad mm: ratelimit stat flush from workingset shrinker
One of our workloads (Postgres 14 + sysbench OLTP) regressed on newer
upstream kernel and on further investigation, it seems like the cause is
the always synchronous rstat flush in the count_shadow_nodes() added by
the commit f82e6bf9bb9b ("mm: memcg: use rstat for non-hierarchical
stats").  On further inspection it seems like we don't really need
accurate stats in this function as it was already approximating the amount
of appropriate shadow entries to keep for maintaining the refault
information.  Since there is already 2 sec periodic rstat flush, we don't
need exact stats here.  Let's ratelimit the rstat flush in this code path.

Link: https://lkml.kernel.org/r/20231228073055.4046430-1-shakeelb@google.com
Fixes: f82e6bf9bb9b ("mm: memcg: use rstat for non-hierarchical stats")
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Yosry Ahmed <yosryahmed@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05 10:17:45 -08:00
Andrey Konovalov
63b85ac56a kasan: stop leaking stack trace handles
Commit 773688a6cb24 ("kasan: use stack_depot_put for Generic mode") added
support for stack trace eviction for Generic KASAN.

However, that commit didn't evict stack traces when the object is not put
into quarantine.  As a result, some stack traces are never evicted from
the stack depot.

In addition, with the "kasan: save mempool stack traces" series, the free
stack traces for mempool objects are also not properly evicted from the
stack depot.

Fix both issues by:

1. Evicting all stack traces when an object if freed if it was not put
   into quarantine;

2. Always evicting an existing free stack trace when a new one is saved.

Also do a few related clean-ups:

- Do not zero out free track when initializing/invalidating free meta:
  set a value in shadow memory instead;

- Rename KASAN_SLAB_FREETRACK to KASAN_SLAB_FREE_META;

- Drop the kasan_init_cache_meta function as it's not used by KASAN;

- Add comments for the kasan_alloc_meta and kasan_free_meta structs.

[akpm@linux-foundation.org: make release_free_meta() and release_alloc_meta() static]
Link: https://lkml.kernel.org/r/20231226225121.235865-1-andrey.konovalov@linux.dev
Fixes: 773688a6cb24 ("kasan: use stack_depot_put for Generic mode")
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Marco Elver <elver@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05 10:17:45 -08:00
Kinsey Ho
7eb2d01a1b mm/mglru: remove CONFIG_TRANSPARENT_HUGEPAGE
Improve code readability by removing CONFIG_TRANSPARENT_HUGEPAGE,
since the compiler should be able to automatically optimize out the
code that promotes THPs during page table walks.

No functional changes.

Link: https://lkml.kernel.org/r/20231227141205.2200125-6-kinseyho@google.com
Signed-off-by: Kinsey Ho <kinseyho@google.com>
Co-developed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Tested-by: Donet Tom <donettom@linux.vnet.ibm.com>
Acked-by: Yu Zhao <yuzhao@google.com>
Cc: kernel test robot <lkp@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05 10:17:45 -08:00
Kinsey Ho
533c67e635 mm/mglru: add dummy pmd_dirty()
Add dummy pmd_dirty() for architectures that don't provide it.
This is similar to commit 6617da8fb565 ("mm: add dummy pmd_young()
for architectures not having it").

Link: https://lkml.kernel.org/r/20231227141205.2200125-5-kinseyho@google.com
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202312210606.1Etqz3M4-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202312210042.xQEiqlEh-lkp@intel.com/
Signed-off-by: Kinsey Ho <kinseyho@google.com>
Suggested-by: Yu Zhao <yuzhao@google.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Donet Tom <donettom@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05 10:17:44 -08:00
Kinsey Ho
745b13e647 mm/mglru: remove CONFIG_MEMCG
Remove CONFIG_MEMCG in a refactoring to improve code readability at
the cost of a few bytes in struct lru_gen_folio per node when
CONFIG_MEMCG=n.

Link: https://lkml.kernel.org/r/20231227141205.2200125-4-kinseyho@google.com
Signed-off-by: Kinsey Ho <kinseyho@google.com>
Co-developed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Tested-by: Donet Tom <donettom@linux.vnet.ibm.com>
Acked-by: Yu Zhao <yuzhao@google.com>
Cc: kernel test robot <lkp@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05 10:17:44 -08:00
Kinsey Ho
61dd3f246b mm/mglru: add CONFIG_LRU_GEN_WALKS_MMU
Add CONFIG_LRU_GEN_WALKS_MMU such that if disabled, the code that
walks page tables to promote pages into the youngest generation will
not be built.

Also improves code readability by adding two helper functions
get_mm_state() and get_next_mm().

Link: https://lkml.kernel.org/r/20231227141205.2200125-3-kinseyho@google.com
Signed-off-by: Kinsey Ho <kinseyho@google.com>
Co-developed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Tested-by: Donet Tom <donettom@linux.vnet.ibm.com>
Acked-by: Yu Zhao <yuzhao@google.com>
Cc: kernel test robot <lkp@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05 10:17:44 -08:00
Kinsey Ho
71ce1ab54a mm/mglru: add CONFIG_ARCH_HAS_HW_PTE_YOUNG
Patch series "mm/mglru: Kconfig cleanup", v4.

This series is the result of the following discussion:
https://lore.kernel.org/47066176-bd93-55dd-c2fa-002299d9e034@linux.ibm.com/

It mainly avoids building the code that walks page tables on CPUs that
use it, i.e., those don't support hardware accessed bit. Specifically,
it introduces a new Kconfig to guard some of functions added by
commit bd74fdaea146 ("mm: multi-gen LRU: support page table walks")
on CPUs like POWER9, on which the series was tested.


This patch (of 5):

Some architectures are able to set the accessed bit in PTEs when PTEs
are used as part of linear address translations.

Add CONFIG_ARCH_HAS_HW_PTE_YOUNG for such architectures to be able to
override arch_has_hw_pte_young().

Link: https://lkml.kernel.org/r/20231227141205.2200125-1-kinseyho@google.com
Link: https://lkml.kernel.org/r/20231227141205.2200125-2-kinseyho@google.com
Signed-off-by: Kinsey Ho <kinseyho@google.com>
Co-developed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Tested-by: Donet Tom <donettom@linux.vnet.ibm.com>
Acked-by: Yu Zhao <yuzhao@google.com>
Cc: kernel test robot <lkp@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05 10:17:44 -08:00
David Hildenbrand
9c5938694c mm/rmap: silence VM_WARN_ON_FOLIO() in __folio_rmap_sanity_checks()
Unfortunately, vm_insert_page() and friends and up passing
driver-allocated folios into folio_add_file_rmap_pte() using
insert_page_into_pte_locked().

While these driver-allocated folios can be compound pages (large folios),
they are not proper "rmappable" folios.

In these VM_MIXEDMAP VMAs, there isn't really the concept of a reverse
mapping, so long-term, we should clean that up and not call into rmap
code.

For the time being, document how we can end up in rmap code with large
folios that are not marked rmappable.

Link: https://lkml.kernel.org/r/793c5cee-d5fc-4eb1-86a2-39e05686233d@redhat.com
Fixes: 68f0320824fa ("mm/rmap: convert folio_add_file_rmap_range() into folio_add_file_rmap_[pte|ptes|pmd]()")
Reported-by: syzbot+50ef73537bbc393a25bb@syzkaller.appspotmail.com
Closes: https://lkml.kernel.org/r/000000000000014174060e09316e@google.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Yin Fengwei <fengwei.yin@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05 10:17:43 -08:00