39571 Commits

Author SHA1 Message Date
Linus Torvalds
b32273ee89 execve updates for v6.9-rc1
- Drop needless error path code in remove_arg_zero() (Li kunyu, Kees Cook)
 
 - binfmt_elf_efpic: Don't use missing interpreter's properties (Max Filippov)
 
 - Use /bin/bash for execveat selftests
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmXvlWUWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJueMEACVrxXuXlpozupTtixMzWkvoUjo
 bDmsyuX55PEmKwZXppD7cyxzHM0cdOzQmwMTBB8RWlMzZDMB/U6A8vxwKdoqGNT6
 8nQ7/+GkeZLL32BSf8rtMsCrnFx58elOzEuiogkUwz73G/fBe+tbbZAFsR7q5cvr
 6sHT9gP2Topycr01fHUwL41yDLZReCasxWdR+kYfn2akmpBGHpw12auHmZcVmWCc
 /uJTF4FUBt6Fa2h2OmQ3IByNZ50UoORfFkpP93ZaL1MUlILWMXo3DHOAM9vhowut
 PMa/9Blw86hZBIjKEkeeCIU83LSnI5PQCd7V+zCJmaslxkNPvoeH09rqHfGL37Pv
 DAOPpTEEm0l6ifunIAruSRmislBzQgO6n5ALPmMp4PcdBi5bbsk9PCLDEFwaTCeV
 9H4kZnPl00Q7yyEXwHSJi1FFF3/DM0ntXVND2KQJVzqrszB51lALkI8fypWvTb9h
 POmU7PrYEXdjiTcMsWarajHYeV/VjmY7vwzjl8lXiw5nWnLJYQua8TAx4dEhpM3z
 qwa5K2L724ncsgKkwDZPDA3DsUAN9jYK+eqRRi6kD5zWdTkBHVvdLQrBjkUhndw/
 DL2FkcLDewbHInEdbbIFOJUUmBxbRLcXEqb2nzQtiYIBQm4VqZFKTQqZVDWHF1UP
 +VeLTdDf6piwoP0cvQ==
 =MLV7
 -----END PGP SIGNATURE-----

Merge tag 'execve-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull execve updates from Kees Cook:

 - Drop needless error path code in remove_arg_zero() (Li kunyu, Kees
   Cook)

 - binfmt_elf_efpic: Don't use missing interpreter's properties (Max
   Filippov)

 - Use /bin/bash for execveat selftests

* tag 'execve-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  exec: Simplify remove_arg_zero() error path
  selftests/exec: Perform script checks with /bin/bash
  exec: Delete unnecessary statements in remove_arg_zero()
  fs: binfmt_elf_efpic: don't use missing interpreter's properties
2024-03-12 14:45:12 -07:00
Linus Torvalds
691632f0e8 s390 updates for 6.9 merge window
- Various virtual vs physical address usage fixes
 
 - Fix error handling in Processor Activity Instrumentation device driver, and
   export number of counters with a sysfs file
 
 - Allow for multiple events when Processor Activity Instrumentation counters
   are monitored in system wide sampling
 
 - Change multiplier and shift values of the Time-of-Day clock source to improve
   steering precision
 
 - Remove a couple of unneeded GFP_DMA flags from allocations
 
 - Disable mmap alignment if randomize_va_space is also disabled, to avoid a too
   small heap
 
 - Various changes to allow s390 to be compiled with LLVM=1, since ld.lld and
   llvm-objcopy will have proper s390 support witch clang 19
 
 - Add __uninitialized macro to Compiler Attributes. This is helpful with s390's
   FPU code where some users have up to 520 byte stack frames. Clearing such
   stack frames (if INIT_STACK_ALL_PATTERN or INIT_STACK_ALL_ZERO is enabled)
   before they are used contradicts the intention (performance improvement) of
   such code sections.
 
 - Convert switch_to() to an out-of-line function, and use the generic switch_to
   header file
 
 - Replace the usage of s390's debug feature with pr_debug() calls within the
   zcrypt device driver
 
 - Improve hotplug support of the Adjunct Processor device driver
 
 - Improve retry handling in the zcrypt device driver
 
 - Various changes to the in-kernel FPU code:
 
   - Make in-kernel FPU sections preemptible
 
   - Convert various larger inline assemblies and assembler files to C, mainly
     by using singe instruction inline assemblies. This increases readability,
     but also allows makes it easier to add proper instrumentation hooks
 
   - Cleanup of the header files
 
 - Provide fast variants of csum_partial() and csum_partial_copy_nocheck() based
   on vector instructions
 
 - Introduce and use a lock to synchronize accesses to zpci device data
   structures to avoid inconsistent states caused by concurrent accesses
 
 - Compile the kernel without -fPIE. This addresses the following problems if
   the kernel is compiled with -fPIE:
 
   - It uses dynamic symbols (.dynsym), for which the linker refuses to allow
     more than 64k sections. This can break features which use
     '-ffunction-sections' and '-fdata-sections', including kpatch-build and
     function granular KASLR
 
   - It unnecessarily uses GOT relocations, adding an extra layer of indirection
     for many memory accesses
 
 - Fix shared_cpu_list for CPU private L2 caches, which incorrectly were
   reported as globally shared
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEECMNfWEw3SLnmiLkZIg7DeRspbsIFAmXu3jEACgkQIg7DeRsp
 bsJC8A/9Gi9JSMKWpIDR4WE2MQGwP/PnYdEamtK6c9ewOjIR/UzRIyIM3J1pyV0L
 RwL8k7EBuv3f7shTcwfPzZWlnAwNwqr1UdcafjFNtHTig50YtdP5fBL33frKHBrm
 ATedlCjagojOuVbh1gB45WUgzjSSkPyn0vqwjjo4h6uEAQ35zMEWwCs5Hpajlkhi
 GCdJaiBLJcnhT96QGurQdke+MsrpGCzeBVBnA0qopQEWaQo8OdiAJ1uMD2WKbgPR
 817kNzvmE6nXnfd5JevYbaiLjK/HQUSw2dZUS6/fjuIrzTsZEUhSg4ECaprKXDg7
 5qiVVPNg4WbJAp0SsB+w7c4U99VxhbS7IVHXju18GrXw6SSAupdxIo7R7YiaT8vC
 YIXZ1uIQ4Vbts3w/UqWUczIl/ooQt2DdrWT5NDNA+84OlOM42rthzA3vznTWuPTb
 U21R7cZmN++hAUjR6s4aO2LfS7HQdnKL8nvJW2y99qSfrOXm+M973W2pDhYEVXQh
 ixQ/lxfQpbBT1yUGlquIErokCPB85VY6ZTdGu6Erziywf4CWGsT5CspyaQnX2KTJ
 s4CpFPnilrW3OnxmIkrM+pNJDun1nnkGA388Xq1NEKX8Oe65OMXEFNCb0kAHQ1ua
 vb6534Ib/iuPnxsGpz1sX9iRqtUd06aBovPcbwIvatHCSfkWws8=
 =KZ31
 -----END PGP SIGNATURE-----

Merge tag 's390-6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 updates from Heiko Carstens:

 - Various virtual vs physical address usage fixes

 - Fix error handling in Processor Activity Instrumentation device
   driver, and export number of counters with a sysfs file

 - Allow for multiple events when Processor Activity Instrumentation
   counters are monitored in system wide sampling

 - Change multiplier and shift values of the Time-of-Day clock source to
   improve steering precision

 - Remove a couple of unneeded GFP_DMA flags from allocations

 - Disable mmap alignment if randomize_va_space is also disabled, to
   avoid a too small heap

 - Various changes to allow s390 to be compiled with LLVM=1, since
   ld.lld and llvm-objcopy will have proper s390 support witch clang 19

 - Add __uninitialized macro to Compiler Attributes. This is helpful
   with s390's FPU code where some users have up to 520 byte stack
   frames. Clearing such stack frames (if INIT_STACK_ALL_PATTERN or
   INIT_STACK_ALL_ZERO is enabled) before they are used contradicts the
   intention (performance improvement) of such code sections.

 - Convert switch_to() to an out-of-line function, and use the generic
   switch_to header file

 - Replace the usage of s390's debug feature with pr_debug() calls
   within the zcrypt device driver

 - Improve hotplug support of the Adjunct Processor device driver

 - Improve retry handling in the zcrypt device driver

 - Various changes to the in-kernel FPU code:

     - Make in-kernel FPU sections preemptible

     - Convert various larger inline assemblies and assembler files to
       C, mainly by using singe instruction inline assemblies. This
       increases readability, but also allows makes it easier to add
       proper instrumentation hooks

     - Cleanup of the header files

 - Provide fast variants of csum_partial() and
   csum_partial_copy_nocheck() based on vector instructions

 - Introduce and use a lock to synchronize accesses to zpci device data
   structures to avoid inconsistent states caused by concurrent accesses

 - Compile the kernel without -fPIE. This addresses the following
   problems if the kernel is compiled with -fPIE:

     - It uses dynamic symbols (.dynsym), for which the linker refuses
       to allow more than 64k sections. This can break features which
       use '-ffunction-sections' and '-fdata-sections', including
       kpatch-build and function granular KASLR

     - It unnecessarily uses GOT relocations, adding an extra layer of
       indirection for many memory accesses

 - Fix shared_cpu_list for CPU private L2 caches, which incorrectly were
   reported as globally shared

* tag 's390-6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (117 commits)
  s390/tools: handle rela R_390_GOTPCDBL/R_390_GOTOFF64
  s390/cache: prevent rebuild of shared_cpu_list
  s390/crypto: remove retry loop with sleep from PAES pkey invocation
  s390/pkey: improve pkey retry behavior
  s390/zcrypt: improve zcrypt retry behavior
  s390/zcrypt: introduce retries on in-kernel send CPRB functions
  s390/ap: introduce mutex to lock the AP bus scan
  s390/ap: rework ap_scan_bus() to return true on config change
  s390/ap: clarify AP scan bus related functions and variables
  s390/ap: rearm APQNs bindings complete completion
  s390/configs: increase number of LOCKDEP_BITS
  s390/vfio-ap: handle hardware checkstop state on queue reset operation
  s390/pai: change sampling event assignment for PMU device driver
  s390/boot: fix minor comment style damages
  s390/boot: do not check for zero-termination relocation entry
  s390/boot: make type of __vmlinux_relocs_64_start|end consistent
  s390/boot: sanitize kaslr_adjust_relocs() function prototype
  s390/boot: simplify GOT handling
  s390: vmlinux.lds.S: fix .got.plt assertion
  s390/boot: workaround current 'llvm-objdump -t -j ...' behavior
  ...
2024-03-12 10:14:22 -07:00
Linus Torvalds
685d982112 Core x86 changes for v6.9:
- The biggest change is the rework of the percpu code,
   to support the 'Named Address Spaces' GCC feature,
   by Uros Bizjak:
 
    - This allows C code to access GS and FS segment relative
      memory via variables declared with such attributes,
      which allows the compiler to better optimize those accesses
      than the previous inline assembly code.
 
    - The series also includes a number of micro-optimizations
      for various percpu access methods, plus a number of
      cleanups of %gs accesses in assembly code.
 
    - These changes have been exposed to linux-next testing for
      the last ~5 months, with no known regressions in this area.
 
 - Fix/clean up __switch_to()'s broken but accidentally
   working handling of FPU switching - which also generates
   better code.
 
 - Propagate more RIP-relative addressing in assembly code,
   to generate slightly better code.
 
 - Rework the CPU mitigations Kconfig space to be less idiosyncratic,
   to make it easier for distros to follow & maintain these options.
 
 - Rework the x86 idle code to cure RCU violations and
   to clean up the logic.
 
 - Clean up the vDSO Makefile logic.
 
 - Misc cleanups and fixes.
 
 [ Please note that there's a higher number of merge commits in
   this branch (three) than is usual in x86 topic trees. This happened
   due to the long testing lifecycle of the percpu changes that
   involved 3 merge windows, which generated a longer history
   and various interactions with other core x86 changes that we
   felt better about to carry in a single branch. ]
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmXvB0gRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1jUqRAAqnEQPiabF5acQlHrwviX+cjSobDlqtH5
 9q2AQy9qaEHapzD0XMOxvFye6XIvehGOGxSPvk6CoviSxBND8rb56lvnsEZuLeBV
 Bo5QSIL2x42Zrvo11iPHwgXZfTIusU90sBuKDRFkYBAxY3HK2naMDZe8MAsYCUE9
 nwgHF8DDc/NYiSOXV8kosWoWpNIkoK/STyH5bvTQZMqZcwyZ49AIeP1jGZb/prbC
 e/rbnlrq5Eu6brpM7xo9kELO0Vhd34urV14KrrIpdkmUKytW2KIsyvW8D6fqgDBj
 NSaQLLcz0pCXbhF+8Nqvdh/1coR4L7Ymt08P1rfEjCsQgb/2WnSAGUQuC5JoGzaj
 ngkbFcZllIbD9gNzMQ1n4Aw5TiO+l9zxCqPC/r58Uuvstr+K9QKlwnp2+B3Q73Ft
 rojIJ04NJL6lCHdDgwAjTTks+TD2PT/eBWsDfJ/1pnUWttmv9IjMpnXD5sbHxoiU
 2RGGKnYbxXczYdq/ALYDWM6JXpfnJZcXL3jJi0IDcCSsb92xRvTANYFHnTfyzGfw
 EHkhbF4e4Vy9f6QOkSP3CvW5H26BmZS9DKG0J9Il5R3u2lKdfbb5vmtUmVTqHmAD
 Ulo5cWZjEznlWCAYSI/aIidmBsp9OAEvYd+X7Z5SBIgTfSqV7VWHGt0BfA1heiVv
 F/mednG0gGc=
 =3v4F
 -----END PGP SIGNATURE-----

Merge tag 'x86-core-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull core x86 updates from Ingo Molnar:

 - The biggest change is the rework of the percpu code, to support the
   'Named Address Spaces' GCC feature, by Uros Bizjak:

      - This allows C code to access GS and FS segment relative memory
        via variables declared with such attributes, which allows the
        compiler to better optimize those accesses than the previous
        inline assembly code.

      - The series also includes a number of micro-optimizations for
        various percpu access methods, plus a number of cleanups of %gs
        accesses in assembly code.

      - These changes have been exposed to linux-next testing for the
        last ~5 months, with no known regressions in this area.

 - Fix/clean up __switch_to()'s broken but accidentally working handling
   of FPU switching - which also generates better code

 - Propagate more RIP-relative addressing in assembly code, to generate
   slightly better code

 - Rework the CPU mitigations Kconfig space to be less idiosyncratic, to
   make it easier for distros to follow & maintain these options

 - Rework the x86 idle code to cure RCU violations and to clean up the
   logic

 - Clean up the vDSO Makefile logic

 - Misc cleanups and fixes

* tag 'x86-core-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits)
  x86/idle: Select idle routine only once
  x86/idle: Let prefer_mwait_c1_over_halt() return bool
  x86/idle: Cleanup idle_setup()
  x86/idle: Clean up idle selection
  x86/idle: Sanitize X86_BUG_AMD_E400 handling
  sched/idle: Conditionally handle tick broadcast in default_idle_call()
  x86: Increase brk randomness entropy for 64-bit systems
  x86/vdso: Move vDSO to mmap region
  x86/vdso/kbuild: Group non-standard build attributes and primary object file rules together
  x86/vdso: Fix rethunk patching for vdso-image-{32,64}.o
  x86/retpoline: Ensure default return thunk isn't used at runtime
  x86/vdso: Use CONFIG_COMPAT_32 to specify vdso32
  x86/vdso: Use $(addprefix ) instead of $(foreach )
  x86/vdso: Simplify obj-y addition
  x86/vdso: Consolidate targets and clean-files
  x86/bugs: Rename CONFIG_RETHUNK              => CONFIG_MITIGATION_RETHUNK
  x86/bugs: Rename CONFIG_CPU_SRSO             => CONFIG_MITIGATION_SRSO
  x86/bugs: Rename CONFIG_CPU_IBRS_ENTRY       => CONFIG_MITIGATION_IBRS_ENTRY
  x86/bugs: Rename CONFIG_CPU_UNRET_ENTRY      => CONFIG_MITIGATION_UNRET_ENTRY
  x86/bugs: Rename CONFIG_SLS                  => CONFIG_MITIGATION_SLS
  ...
2024-03-11 19:53:15 -07:00
Linus Torvalds
73f0d1d7b4 Two changes to simplify the x86 decoder logic a bit.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmXu9zERHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1g63w//RlHznVWzZE6XrL3kKc0kKLNlzvHwD84h
 V/5UC+lMzFgirULxnnleOL4/GePoubv4NppOgFnpSLpynVbd+m3Fv5yg550LTdnu
 acus7IbF7KUVpVYdCUXZQohhS+aAdG3QsWcATuuvxQHTzaxrp5G5OWYWSKT6xb2X
 2/oUq8oKXLC6XFNJVe8uEG6uqLx3U2AuUfgQ7uMRpZYiwCIeGTPBgXudL6yYhjIF
 TTHJ6kfTp+TeUnPX7WP2n0z917GrV5B4V/7jBcsMy90oHfAdqi+ibqgdO5hyiXgK
 s/jdSESoCXB6Hq108+R+hiq9NEe5GIv7472jaWLdsoq7lun85T/fHiME/HChOnZg
 yUZ/AeMQvhfpMxMFyomjObzTQAnHSwHZ8aqc1wG86+NoHACXwoWhhzvZ48zruhCj
 wxbn22p4E2fHq60++L24HaYIqi0C1tWNMr2i9xh9Beks6ZGHnPRK1FDXMwXu92fm
 LklAEu1aDIJA28RfDqH6vjY/I4dI0z2zP3foM42O0wOd5Kon1EIGk5U9Rs1R18+h
 NgoQFq0vpU+Y5wD2evgoUiaNnl90XI5KT+Jeq9VjNWswKN54ZSB94UprxK6uwPJ9
 LH2QX2yS48nuecErjZ2qacXF7K8tj0o0FV1HB/v2dUzTF+s/IPnp/aP10+aUknIu
 sKPLbgiXS5E=
 =H9XK
 -----END PGP SIGNATURE-----

Merge tag 'x86-asm-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 asm updates from Ingo Molnar:
 "Two changes to simplify the x86 decoder logic a bit"

* tag 'x86-asm-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/insn: Directly assign x86_64 state in insn_init()
  x86/insn: Remove superfluous checks from instruction decoding routines
2024-03-11 19:13:06 -07:00
Linus Torvalds
38b334fc76 - Add the x86 part of the SEV-SNP host support. This will allow the
kernel to be used as a KVM hypervisor capable of running SNP (Secure
   Nested Paging) guests. Roughly speaking, SEV-SNP is the ultimate goal
   of the AMD confidential computing side, providing the most
   comprehensive confidential computing environment up to date.
 
   This is the x86 part and there is a KVM part which did not get ready
   in time for the merge window so latter will be forthcoming in the next
   cycle.
 
 - Rework the early code's position-dependent SEV variable references in
   order to allow building the kernel with clang and -fPIE/-fPIC and
   -mcmodel=kernel
 
 - The usual set of fixes, cleanups and improvements all over the place
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmXvH0wACgkQEsHwGGHe
 VUrzmA//VS/n6dhHRnm/nAGngr4PeegkgV1OhyKYFfiZ272rT6P9QvblQrgcY0dc
 Ij1DOhEKlke51pTHvMOQ33B3P4Fuc0mx3dpCLY0up5V26kzQiKCjRKEkC4U1bcw8
 W4GqMejaR89bE14bYibmwpSib9T/uVsV65eM3xf1iF5UvsnoUaTziymDoy+nb43a
 B1pdd5vcl4mBNqXeEvt0qjg+xkMLpWUI9tJDB8mbMl/cnIFGgMZzBaY8oktHSROK
 QpuUnKegOgp1RXpfLbNjmZ2Q4Rkk4MNazzDzWq3EIxaRjXL3Qp507ePK7yeA2qa0
 J3jCBQc9E2j7lfrIkUgNIzOWhMAXM2YH5bvH6UrIcMi1qsWJYDmkp2MF1nUedjdf
 Wj16/pJbeEw1aKKIywJGwsmViSQju158vY3SzXG83U/A/Iz7zZRHFmC/ALoxZptY
 Bi7VhfcOSpz98PE3axnG8CvvxRDWMfzBr2FY1VmQbg6VBNo1Xl1aP/IH1I8iQNKg
 /laBYl/qP+1286TygF1lthYROb1lfEIJprgi2xfO6jVYUqPb7/zq2sm78qZRfm7l
 25PN/oHnuidfVfI/H3hzcGubjOG9Zwra8WWYBB2EEmelf21rT0OLqq+eS4T6pxFb
 GNVfc0AzG77UmqbrpkAMuPqL7LrGaSee4NdU3hkEdSphlx1/YTo=
 =c1ps
 -----END PGP SIGNATURE-----

Merge tag 'x86_sev_for_v6.9_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 SEV updates from Borislav Petkov:

 - Add the x86 part of the SEV-SNP host support.

   This will allow the kernel to be used as a KVM hypervisor capable of
   running SNP (Secure Nested Paging) guests. Roughly speaking, SEV-SNP
   is the ultimate goal of the AMD confidential computing side,
   providing the most comprehensive confidential computing environment
   up to date.

   This is the x86 part and there is a KVM part which did not get ready
   in time for the merge window so latter will be forthcoming in the
   next cycle.

 - Rework the early code's position-dependent SEV variable references in
   order to allow building the kernel with clang and -fPIE/-fPIC and
   -mcmodel=kernel

 - The usual set of fixes, cleanups and improvements all over the place

* tag 'x86_sev_for_v6.9_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
  x86/sev: Disable KMSAN for memory encryption TUs
  x86/sev: Dump SEV_STATUS
  crypto: ccp - Have it depend on AMD_IOMMU
  iommu/amd: Fix failure return from snp_lookup_rmpentry()
  x86/sev: Fix position dependent variable references in startup code
  crypto: ccp: Make snp_range_list static
  x86/Kconfig: Remove CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT
  Documentation: virt: Fix up pre-formatted text block for SEV ioctls
  crypto: ccp: Add the SNP_SET_CONFIG command
  crypto: ccp: Add the SNP_COMMIT command
  crypto: ccp: Add the SNP_PLATFORM_STATUS command
  x86/cpufeatures: Enable/unmask SEV-SNP CPU feature
  KVM: SEV: Make AVIC backing, VMSA and VMCB memory allocation SNP safe
  crypto: ccp: Add panic notifier for SEV/SNP firmware shutdown on kdump
  iommu/amd: Clean up RMP entries for IOMMU pages during SNP shutdown
  crypto: ccp: Handle legacy SEV commands when SNP is enabled
  crypto: ccp: Handle non-volatile INIT_EX data when SNP is enabled
  crypto: ccp: Handle the legacy TMR allocation when SNP is enabled
  x86/sev: Introduce an SNP leaked pages list
  crypto: ccp: Provide an API to issue SEV and SNP commands
  ...
2024-03-11 17:44:11 -07:00
Linus Torvalds
720c857907 Support for x86 Fast Return and Event Delivery (FRED):
FRED is a replacement for IDT event delivery on x86 and addresses most of
 the technical nightmares which IDT exposes:
 
  1) Exception cause registers like CR2 need to be manually preserved in
     nested exception scenarios.
 
  2) Hardware interrupt stack switching is suboptimal for nested exceptions
     as the interrupt stack mechanism rewinds the stack on each entry which
     requires a massive effort in the low level entry of #NMI code to handle
     this.
 
  3) No hardware distinction between entry from kernel or from user which
     makes establishing kernel context more complex than it needs to be
     especially for unconditionally nestable exceptions like NMI.
 
  4) NMI nesting caused by IRET unconditionally reenabling NMIs, which is a
     problem when the perf NMI takes a fault when collecting a stack trace.
 
  5) Partial restore of ESP when returning to a 16-bit segment
 
  6) Limitation of the vector space which can cause vector exhaustion on
     large systems.
 
  7) Inability to differentiate NMI sources
 
 FRED addresses these shortcomings by:
 
  1) An extended exception stack frame which the CPU uses to save exception
     cause registers. This ensures that the meta information for each
     exception is preserved on stack and avoids the extra complexity of
     preserving it in software.
 
  2) Hardware interrupt stack switching is non-rewinding if a nested
     exception uses the currently interrupt stack.
 
  3) The entry points for kernel and user context are separate and GS BASE
     handling which is required to establish kernel context for per CPU
     variable access is done in hardware.
 
  4) NMIs are now nesting protected. They are only reenabled on the return
     from NMI.
 
  5) FRED guarantees full restore of ESP
 
  6) FRED does not put a limitation on the vector space by design because it
     uses a central entry points for kernel and user space and the CPUstores
     the entry type (exception, trap, interrupt, syscall) on the entry stack
     along with the vector number. The entry code has to demultiplex this
     information, but this removes the vector space restriction.
 
     The first hardware implementations will still have the current
     restricted vector space because lifting this limitation requires
     further changes to the local APIC.
 
  7) FRED stores the vector number and meta information on stack which
     allows having more than one NMI vector in future hardware when the
     required local APIC changes are in place.
 
 The series implements the initial FRED support by:
 
  - Reworking the existing entry and IDT handling infrastructure to
    accomodate for the alternative entry mechanism.
 
  - Expanding the stack frame to accomodate for the extra 16 bytes FRED
    requires to store context and meta information
 
  - Providing FRED specific C entry points for events which have information
    pushed to the extended stack frame, e.g. #PF and #DB.
 
  - Providing FRED specific C entry points for #NMI and #MCE
 
  - Implementing the FRED specific ASM entry points and the C code to
    demultiplex the events
 
  - Providing detection and initialization mechanisms and the necessary
    tweaks in context switching, GS BASE handling etc.
 
 The FRED integration aims for maximum code reuse vs. the existing IDT
 implementation to the extent possible and the deviation in hot paths like
 context switching are handled with alternatives to minimalize the
 impact. The low level entry and exit paths are seperate due to the extended
 stack frame and the hardware based GS BASE swichting and therefore have no
 impact on IDT based systems.
 
 It has been extensively tested on existing systems and on the FRED
 simulation and as of now there are know outstanding problems.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmXuKPgTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoWyUEACevJMHU+Ot9zqBPizSWxByM1uunHbp
 bjQXhaFeskd3mt7k7HU6GsPRSmC3q4lliP1Y9ypfbU0DvYSI2h/PhMWizjhmot2y
 nIvFpl51r/NsI+JHx1oXcFetz0eGHEqBui/4YQ/swgOCMymYgfqgHhazXTdldV3g
 KpH9/8W3AeGvw79uzXFH9tjBzTkbvywpam3v0LYNDJWTCuDkilyo8PjhsgRZD4x3
 V9f1nLD7nSHZW8XLoktdJJ38bKwI2Lhao91NQ0ErwopekA4/9WphZEKsDpidUSXJ
 sn1O148oQ8X92IO2OaQje8XC5pLGr5GqQBGPWzRH56P/Vd3+WOwBxaFoU6Drxc5s
 tIe23ZjkVcpA8EEG7BQBZV1Un/NX7XaCCnMniOt0RauXw+1NaslX7t/tnUAh5F1V
 TWCH4D0I0oJ0qJ7kNliGn2BP3agYXOVg81xVEUjT6KfHcYU4ImUrwi+BkeNXuXtL
 Ch5ADnbYAcUjWLFnAmEmaRtfmfNGY5T7PeGFHW2RRkaOJ88v5g14Voo6gPJaDUPn
 wMQ0nLq1xN4xZWF6ZgfRqAhArvh20k38ZujRku5vXEqnhOugQ76TF2UYiFEwOXbQ
 8jcM+yEBLGgBz7tGMwmIAml6kfxaFF1KPpdrtcPxNkGlbE6KTSuIolLx2YGUvlSU
 6/O8nwZy49ckmQ==
 =Ib7w
 -----END PGP SIGNATURE-----

Merge tag 'x86-fred-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 FRED support from Thomas Gleixner:
 "Support for x86 Fast Return and Event Delivery (FRED).

  FRED is a replacement for IDT event delivery on x86 and addresses most
  of the technical nightmares which IDT exposes:

   1) Exception cause registers like CR2 need to be manually preserved
      in nested exception scenarios.

   2) Hardware interrupt stack switching is suboptimal for nested
      exceptions as the interrupt stack mechanism rewinds the stack on
      each entry which requires a massive effort in the low level entry
      of #NMI code to handle this.

   3) No hardware distinction between entry from kernel or from user
      which makes establishing kernel context more complex than it needs
      to be especially for unconditionally nestable exceptions like NMI.

   4) NMI nesting caused by IRET unconditionally reenabling NMIs, which
      is a problem when the perf NMI takes a fault when collecting a
      stack trace.

   5) Partial restore of ESP when returning to a 16-bit segment

   6) Limitation of the vector space which can cause vector exhaustion
      on large systems.

   7) Inability to differentiate NMI sources

  FRED addresses these shortcomings by:

   1) An extended exception stack frame which the CPU uses to save
      exception cause registers. This ensures that the meta information
      for each exception is preserved on stack and avoids the extra
      complexity of preserving it in software.

   2) Hardware interrupt stack switching is non-rewinding if a nested
      exception uses the currently interrupt stack.

   3) The entry points for kernel and user context are separate and GS
      BASE handling which is required to establish kernel context for
      per CPU variable access is done in hardware.

   4) NMIs are now nesting protected. They are only reenabled on the
      return from NMI.

   5) FRED guarantees full restore of ESP

   6) FRED does not put a limitation on the vector space by design
      because it uses a central entry points for kernel and user space
      and the CPUstores the entry type (exception, trap, interrupt,
      syscall) on the entry stack along with the vector number. The
      entry code has to demultiplex this information, but this removes
      the vector space restriction.

      The first hardware implementations will still have the current
      restricted vector space because lifting this limitation requires
      further changes to the local APIC.

   7) FRED stores the vector number and meta information on stack which
      allows having more than one NMI vector in future hardware when the
      required local APIC changes are in place.

  The series implements the initial FRED support by:

   - Reworking the existing entry and IDT handling infrastructure to
     accomodate for the alternative entry mechanism.

   - Expanding the stack frame to accomodate for the extra 16 bytes FRED
     requires to store context and meta information

   - Providing FRED specific C entry points for events which have
     information pushed to the extended stack frame, e.g. #PF and #DB.

   - Providing FRED specific C entry points for #NMI and #MCE

   - Implementing the FRED specific ASM entry points and the C code to
     demultiplex the events

   - Providing detection and initialization mechanisms and the necessary
     tweaks in context switching, GS BASE handling etc.

  The FRED integration aims for maximum code reuse vs the existing IDT
  implementation to the extent possible and the deviation in hot paths
  like context switching are handled with alternatives to minimalize the
  impact. The low level entry and exit paths are seperate due to the
  extended stack frame and the hardware based GS BASE swichting and
  therefore have no impact on IDT based systems.

  It has been extensively tested on existing systems and on the FRED
  simulation and as of now there are no outstanding problems"

* tag 'x86-fred-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits)
  x86/fred: Fix init_task thread stack pointer initialization
  MAINTAINERS: Add a maintainer entry for FRED
  x86/fred: Fix a build warning with allmodconfig due to 'inline' failing to inline properly
  x86/fred: Invoke FRED initialization code to enable FRED
  x86/fred: Add FRED initialization functions
  x86/syscall: Split IDT syscall setup code into idt_syscall_init()
  KVM: VMX: Call fred_entry_from_kvm() for IRQ/NMI handling
  x86/entry: Add fred_entry_from_kvm() for VMX to handle IRQ/NMI
  x86/entry/calling: Allow PUSH_AND_CLEAR_REGS being used beyond actual entry code
  x86/fred: Fixup fault on ERETU by jumping to fred_entrypoint_user
  x86/fred: Let ret_from_fork_asm() jmp to asm_fred_exit_user when FRED is enabled
  x86/traps: Add sysvec_install() to install a system interrupt handler
  x86/fred: FRED entry/exit and dispatch code
  x86/fred: Add a machine check entry stub for FRED
  x86/fred: Add a NMI entry stub for FRED
  x86/fred: Add a debug fault entry stub for FRED
  x86/idtentry: Incorporate definitions/declarations of the FRED entries
  x86/fred: Make exc_page_fault() work for FRED
  x86/fred: Allow single-step trap and NMI when starting a new task
  x86/fred: No ESPFIX needed when FRED is enabled
  ...
2024-03-11 16:00:17 -07:00
Linus Torvalds
d08c407f71 A large set of updates and features for timers and timekeeping:
- The hierarchical timer pull model
 
     When timer wheel timers are armed they are placed into the timer wheel
     of a CPU which is likely to be busy at the time of expiry. This is done
     to avoid wakeups on potentially idle CPUs.
 
     This is wrong in several aspects:
 
      1) The heuristics to select the target CPU are wrong by
         definition as the chance to get the prediction right is close
         to zero.
 
      2) Due to #1 it is possible that timers are accumulated on a
         single target CPU
 
      3) The required computation in the enqueue path is just overhead for
      	dubious value especially under the consideration that the vast
      	majority of timer wheel timers are either canceled or rearmed
      	before they expire.
 
     The timer pull model avoids the above by removing the target
     computation on enqueue and queueing timers always on the CPU on which
     they get armed.
 
     This is achieved by having separate wheels for CPU pinned timers and
     global timers which do not care about where they expire.
 
     As long as a CPU is busy it handles both the pinned and the global
     timers which are queued on the CPU local timer wheels.
 
     When a CPU goes idle it evaluates its own timer wheels:
 
       - If the first expiring timer is a pinned timer, then the global
       	timers can be ignored as the CPU will wake up before they expire.
 
       - If the first expiring timer is a global timer, then the expiry time
         is propagated into the timer pull hierarchy and the CPU makes sure
         to wake up for the first pinned timer.
 
     The timer pull hierarchy organizes CPUs in groups of eight at the
     lowest level and at the next levels groups of eight groups up to the
     point where no further aggregation of groups is required, i.e. the
     number of levels is log8(NR_CPUS). The magic number of eight has been
     established by experimention, but can be adjusted if needed.
 
     In each group one busy CPU acts as the migrator. It's only one CPU to
     avoid lock contention on remote timer wheels.
 
     The migrator CPU checks in its own timer wheel handling whether there
     are other CPUs in the group which have gone idle and have global timers
     to expire. If there are global timers to expire, the migrator locks the
     remote CPU timer wheel and handles the expiry.
 
     Depending on the group level in the hierarchy this handling can require
     to walk the hierarchy downwards to the CPU level.
 
     Special care is taken when the last CPU goes idle. At this point the
     CPU is the systemwide migrator at the top of the hierarchy and it
     therefore cannot delegate to the hierarchy. It needs to arm its own
     timer device to expire either at the first expiring timer in the
     hierarchy or at the first CPU local timer, which ever expires first.
 
     This completely removes the overhead from the enqueue path, which is
     e.g. for networking a true hotpath and trades it for a slightly more
     complex idle path.
 
     This has been in development for a couple of years and the final series
     has been extensively tested by various teams from silicon vendors and
     ran through extensive CI.
 
     There have been slight performance improvements observed on network
     centric workloads and an Intel team confirmed that this allows them to
     power down a die completely on a mult-die socket for the first time in
     a mostly idle scenario.
 
     There is only one outstanding ~1.5% regression on a specific overloaded
     netperf test which is currently investigated, but the rest is either
     positive or neutral performance wise and positive on the power
     management side.
 
   - Fixes for the timekeeping interpolation code for cross-timestamps:
 
     cross-timestamps are used for PTP to get snapshots from hardware timers
     and interpolated them back to clock MONOTONIC. The changes address a
     few corner cases in the interpolation code which got the math and logic
     wrong.
 
   - Simplifcation of the clocksource watchdog retry logic to automatically
     adjust to handle larger systems correctly instead of having more
     incomprehensible command line parameters.
 
   - Treewide consolidation of the VDSO data structures.
 
   - The usual small improvements and cleanups all over the place.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmXuAN0THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoVKXEADIR45rjR1Xtz32js7B53Y65O4WNoOQ
 6/ycWcswuGzg/h4QUpPSJ6gOGVmKSWwZi4n0P/VadCiXGSPPm0aUKsoRUt9DZsPY
 mtj2wjCSXKXiyhTl9OtrZME86ZAIGO1dQXa/sOHsiP5PCjgQkD0b5CYi1+B6eHDt
 1/Uo2Tb9g8VAPppq20V5Uo93GrPf642oyi3FCFrR1M112Uuak5DmqHJYiDpreNcG
 D5SgI+ykSiaUaVyHifvqijoJk0rYXkqEC6evl02477lJ/X0vVo2/M8XPS95BxHST
 s5Iruo4rP+qeAy8QvhZpoPX59fO0m/AgA7cf77XXAtOpVdLH+bs4ILsEbouAIOtv
 lsmRkcYt+TpvrZFHPAxks+6g3afuROiDtxD5sXXpVWxvofi8FwWqubdlqdsbw9MP
 ZCTNyzNyKL47QeDwBfSynYUL1RSyqsphtIwk4oeQklH9rwMAnW21hi30z15hQ0pQ
 FOVkmcwi79JNvl/G+jRkDzw7r8/zcHshWdSjyUM04CDjjnCDjQOFWSIjEPwbQjjz
 S4HXpJKJW963dBgs9Z84/Ctw1GwoBk1qedDWDJE1257Qvmo/Wpe/7GddWcazOGnN
 RRFMzGPbOqBDbjtErOKGU+iCisgNEvz2XK+TI16uRjWde7DxZpiTVYgNDrZ+/Pyh
 rQ23UBms6ZRR+A==
 =iQlu
 -----END PGP SIGNATURE-----

Merge tag 'timers-core-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer updates from Thomas Gleixner:
 "A large set of updates and features for timers and timekeeping:

   - The hierarchical timer pull model

     When timer wheel timers are armed they are placed into the timer
     wheel of a CPU which is likely to be busy at the time of expiry.
     This is done to avoid wakeups on potentially idle CPUs.

     This is wrong in several aspects:

       1) The heuristics to select the target CPU are wrong by
          definition as the chance to get the prediction right is
          close to zero.

       2) Due to #1 it is possible that timers are accumulated on
          a single target CPU

       3) The required computation in the enqueue path is just overhead
          for dubious value especially under the consideration that the
          vast majority of timer wheel timers are either canceled or
          rearmed before they expire.

     The timer pull model avoids the above by removing the target
     computation on enqueue and queueing timers always on the CPU on
     which they get armed.

     This is achieved by having separate wheels for CPU pinned timers
     and global timers which do not care about where they expire.

     As long as a CPU is busy it handles both the pinned and the global
     timers which are queued on the CPU local timer wheels.

     When a CPU goes idle it evaluates its own timer wheels:

       - If the first expiring timer is a pinned timer, then the global
         timers can be ignored as the CPU will wake up before they
         expire.

       - If the first expiring timer is a global timer, then the expiry
         time is propagated into the timer pull hierarchy and the CPU
         makes sure to wake up for the first pinned timer.

     The timer pull hierarchy organizes CPUs in groups of eight at the
     lowest level and at the next levels groups of eight groups up to
     the point where no further aggregation of groups is required, i.e.
     the number of levels is log8(NR_CPUS). The magic number of eight
     has been established by experimention, but can be adjusted if
     needed.

     In each group one busy CPU acts as the migrator. It's only one CPU
     to avoid lock contention on remote timer wheels.

     The migrator CPU checks in its own timer wheel handling whether
     there are other CPUs in the group which have gone idle and have
     global timers to expire. If there are global timers to expire, the
     migrator locks the remote CPU timer wheel and handles the expiry.

     Depending on the group level in the hierarchy this handling can
     require to walk the hierarchy downwards to the CPU level.

     Special care is taken when the last CPU goes idle. At this point
     the CPU is the systemwide migrator at the top of the hierarchy and
     it therefore cannot delegate to the hierarchy. It needs to arm its
     own timer device to expire either at the first expiring timer in
     the hierarchy or at the first CPU local timer, which ever expires
     first.

     This completely removes the overhead from the enqueue path, which
     is e.g. for networking a true hotpath and trades it for a slightly
     more complex idle path.

     This has been in development for a couple of years and the final
     series has been extensively tested by various teams from silicon
     vendors and ran through extensive CI.

     There have been slight performance improvements observed on network
     centric workloads and an Intel team confirmed that this allows them
     to power down a die completely on a mult-die socket for the first
     time in a mostly idle scenario.

     There is only one outstanding ~1.5% regression on a specific
     overloaded netperf test which is currently investigated, but the
     rest is either positive or neutral performance wise and positive on
     the power management side.

   - Fixes for the timekeeping interpolation code for cross-timestamps:

     cross-timestamps are used for PTP to get snapshots from hardware
     timers and interpolated them back to clock MONOTONIC. The changes
     address a few corner cases in the interpolation code which got the
     math and logic wrong.

   - Simplifcation of the clocksource watchdog retry logic to
     automatically adjust to handle larger systems correctly instead of
     having more incomprehensible command line parameters.

   - Treewide consolidation of the VDSO data structures.

   - The usual small improvements and cleanups all over the place"

* tag 'timers-core-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (62 commits)
  timer/migration: Fix quick check reporting late expiry
  tick/sched: Fix build failure for CONFIG_NO_HZ_COMMON=n
  vdso/datapage: Quick fix - use asm/page-def.h for ARM64
  timers: Assert no next dyntick timer look-up while CPU is offline
  tick: Assume timekeeping is correctly handed over upon last offline idle call
  tick: Shut down low-res tick from dying CPU
  tick: Split nohz and highres features from nohz_mode
  tick: Move individual bit features to debuggable mask accesses
  tick: Move got_idle_tick away from common flags
  tick: Assume the tick can't be stopped in NOHZ_MODE_INACTIVE mode
  tick: Move broadcast cancellation up to CPUHP_AP_TICK_DYING
  tick: Move tick cancellation up to CPUHP_AP_TICK_DYING
  tick: Start centralizing tick related CPU hotplug operations
  tick/sched: Don't clear ts::next_tick again in can_stop_idle_tick()
  tick/sched: Rename tick_nohz_stop_sched_tick() to tick_nohz_full_stop_tick()
  tick: Use IS_ENABLED() whenever possible
  tick/sched: Remove useless oneshot ifdeffery
  tick/nohz: Remove duplicate between lowres and highres handlers
  tick/nohz: Remove duplicate between tick_nohz_switch_to_nohz() and tick_setup_sched_timer()
  hrtimer: Select housekeeping CPU during migration
  ...
2024-03-11 14:38:26 -07:00
Linus Torvalds
ff887eb07c workqueue: Changes for v6.9
This cycle, a lot of workqueue changes including some that are significant
 and invasive.
 
 - During v6.6 cycle, unbound workqueues were updated so that they are more
   topology aware and flexible, which among other things improved workqueue
   behavior on modern multi-L3 CPUs. In the process, 636b927eba5b
   ("workqueue: Make unbound workqueues to use per-cpu pool_workqueues")
   switched unbound workqueues to use per-CPU frontend pool_workqueues as a
   part of increasing front-back mapping flexibility.
 
   An unwelcome side effect of this change was that this made max concurrency
   enforcement per-CPU blowing up the maximum number of allowed concurrent
   executions. I incorrectly assumed that this wouldn't cause practical
   problems as most unbound workqueue users are self-regulate max
   concurrency; however, there definitely are which don't (e.g. on IO paths)
   and the drastic increase in the allowed max concurrency led to noticeable
   perf regressions in some use cases.
 
   This is now addressed by separating out max concurrency enforcement to a
   separate struct - wq_node_nr_active - which makes @max_active consistently
   mean system-wide max concurrency regardless of the number of CPUs or
   (finally) NUMA nodes. This is a rather invasive and, in places, a bit
   clunky; however, the clunkiness rises from the the inherent requirement to
   handle the disagreement between the execution locality domain and max
   concurrency enforcement domain on some modern machines. See 5797b1c18919
   ("workqueue: Implement system-wide nr_active enforcement for unbound
   workqueues") for more details.
 
 - BH workqueue support is added. They are similar to per-CPU workqueues but
   execute work items in the softirq context. This is expected to replace
   tasklet. However, currently, it's missing the ability to disable and
   enable work items which is needed to convert many tasklet users. To avoid
   crowding this merge window too much, this will be included in the next
   merge window. A separate pull request will be sent for the couple
   conversion patches that are currently pending.
 
 - Waiman plugged a long-standing hole in workqueue CPU isolation where
   ordered workqueues didn't follow wq_unbound_cpumask updates. Ordered
   workqueues now follow the same rules as other unbound workqueues.
 
 - More CPU isolation improvements: Juri fixed another deficit in workqueue
   isolation where unbound rescuers don't respect wq_unbound_cpumask.
   Leonardo fixed delayed_work timers firing on isolated CPUs.
 
 - Other misc changes.
 -----BEGIN PGP SIGNATURE-----
 
 iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCZe7JCQ4cdGpAa2VybmVs
 Lm9yZwAKCRCxYfJx3gVYGcnqAP9UP8zEM1la19cilhboDumxmRWyRpV/egFOqsMP
 Y5PuoAEAtsBJtQWtm5w46+y+fk3nK2ugXlQio2gH0qQcxX6SdgQ=
 =/ovv
 -----END PGP SIGNATURE-----

Merge tag 'wq-for-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq

Pull workqueue updates from Tejun Heo:
 "This cycle, a lot of workqueue changes including some that are
  significant and invasive.

   - During v6.6 cycle, unbound workqueues were updated so that they are
     more topology aware and flexible, which among other things improved
     workqueue behavior on modern multi-L3 CPUs. In the process, commit
     636b927eba5b ("workqueue: Make unbound workqueues to use per-cpu
     pool_workqueues") switched unbound workqueues to use per-CPU
     frontend pool_workqueues as a part of increasing front-back mapping
     flexibility.

     An unwelcome side effect of this change was that this made max
     concurrency enforcement per-CPU blowing up the maximum number of
     allowed concurrent executions. I incorrectly assumed that this
     wouldn't cause practical problems as most unbound workqueue users
     are self-regulate max concurrency; however, there definitely are
     which don't (e.g. on IO paths) and the drastic increase in the
     allowed max concurrency led to noticeable perf regressions in some
     use cases.

     This is now addressed by separating out max concurrency enforcement
     to a separate struct - wq_node_nr_active - which makes @max_active
     consistently mean system-wide max concurrency regardless of the
     number of CPUs or (finally) NUMA nodes. This is a rather invasive
     and, in places, a bit clunky; however, the clunkiness rises from
     the the inherent requirement to handle the disagreement between the
     execution locality domain and max concurrency enforcement domain on
     some modern machines.

     See commit 5797b1c18919 ("workqueue: Implement system-wide
     nr_active enforcement for unbound workqueues") for more details.

   - BH workqueue support is added.

     They are similar to per-CPU workqueues but execute work items in
     the softirq context. This is expected to replace tasklet. However,
     currently, it's missing the ability to disable and enable work
     items which is needed to convert many tasklet users. To avoid
     crowding this merge window too much, this will be included in the
     next merge window. A separate pull request will be sent for the
     couple conversion patches that are currently pending.

   - Waiman plugged a long-standing hole in workqueue CPU isolation
     where ordered workqueues didn't follow wq_unbound_cpumask updates.
     Ordered workqueues now follow the same rules as other unbound
     workqueues.

   - More CPU isolation improvements: Juri fixed another deficit in
     workqueue isolation where unbound rescuers don't respect
     wq_unbound_cpumask. Leonardo fixed delayed_work timers firing on
     isolated CPUs.

   - Other misc changes"

* tag 'wq-for-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (54 commits)
  workqueue: Drain BH work items on hot-unplugged CPUs
  workqueue: Introduce from_work() helper for cleaner callback declarations
  workqueue: Control intensive warning threshold through cmdline
  workqueue: Make @flags handling consistent across set_work_data() and friends
  workqueue: Remove clear_work_data()
  workqueue: Factor out work_grab_pending() from __cancel_work_sync()
  workqueue: Clean up enum work_bits and related constants
  workqueue: Introduce work_cancel_flags
  workqueue: Use variable name irq_flags for saving local irq flags
  workqueue: Reorganize flush and cancel[_sync] functions
  workqueue: Rename __cancel_work_timer() to __cancel_timer_sync()
  workqueue: Use rcu_read_lock_any_held() instead of rcu_read_lock_held()
  workqueue: Cosmetic changes
  workqueue, irq_work: Build fix for !CONFIG_IRQ_WORK
  workqueue: Fix queue_work_on() with BH workqueues
  async: Use a dedicated unbound workqueue with raised min_active
  workqueue: Implement workqueue_set_min_active()
  workqueue: Fix kernel-doc comment of unplug_oldest_pwq()
  workqueue: Bind unbound workqueue rescuer to wq_unbound_cpumask
  kernel/workqueue: Let rescuers follow unbound wq cpumask changes
  ...
2024-03-11 12:50:42 -07:00
Linus Torvalds
b5683a37c8 vfs-6.9.pidfd
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZem4/wAKCRCRxhvAZXjc
 opnBAQCaQWwxjT0VLHebPniw6tel/KYlZ9jH9kBQwLrk1pembwEA+BsCY2C8YS4a
 75v9jOPxr+Z8j1SjxwwubcONPyqYXwQ=
 =+Wa3
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.9.pidfd' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull pdfd updates from Christian Brauner:

 - Until now pidfds could only be created for thread-group leaders but
   not for threads. There was no technical reason for this. We simply
   had no users that needed support for this. Now we do have users that
   need support for this.

   This introduces a new PIDFD_THREAD flag for pidfd_open(). If that
   flag is set pidfd_open() creates a pidfd that refers to a specific
   thread.

   In addition, we now allow clone() and clone3() to be called with
   CLONE_PIDFD | CLONE_THREAD which wasn't possible before.

   A pidfd that refers to an individual thread differs from a pidfd that
   refers to a thread-group leader:

    (1) Pidfds are pollable. A task may poll a pidfd and get notified
        when the task has exited.

        For thread-group leader pidfds the polling task is woken if the
        thread-group is empty. In other words, if the thread-group
        leader task exits when there are still threads alive in its
        thread-group the polling task will not be woken when the
        thread-group leader exits but rather when the last thread in the
        thread-group exits.

        For thread-specific pidfds the polling task is woken if the
        thread exits.

    (2) Passing a thread-group leader pidfd to pidfd_send_signal() will
        generate thread-group directed signals like kill(2) does.

        Passing a thread-specific pidfd to pidfd_send_signal() will
        generate thread-specific signals like tgkill(2) does.

        The default scope of the signal is thus determined by the type
        of the pidfd.

        Since use-cases exist where the default scope of the provided
        pidfd needs to be overriden the following flags are added to
        pidfd_send_signal():

         - PIDFD_SIGNAL_THREAD
           Send a thread-specific signal.

         - PIDFD_SIGNAL_THREAD_GROUP
           Send a thread-group directed signal.

         - PIDFD_SIGNAL_PROCESS_GROUP
           Send a process-group directed signal.

        The scope change will only work if the struct pid is actually
        used for this scope.

        For example, in order to send a thread-group directed signal the
        provided pidfd must be used as a thread-group leader and
        similarly for PIDFD_SIGNAL_PROCESS_GROUP the struct pid must be
        used as a process group leader.

 - Move pidfds from the anonymous inode infrastructure to a tiny pseudo
   filesystem. This will unblock further work that we weren't able to do
   simply because of the very justified limitations of anonymous inodes.
   Moving pidfds to a tiny pseudo filesystem allows for statx on pidfds
   to become useful for the first time. They can now be compared by
   inode number which are unique for the system lifetime.

   Instead of stashing struct pid in file->private_data we can now stash
   it in inode->i_private. This makes it possible to introduce concepts
   that operate on a process once all file descriptors have been closed.
   A concrete example is kill-on-last-close. Another side-effect is that
   file->private_data is now freed up for per-file options for pidfds.

   Now, each struct pid will refer to a different inode but the same
   struct pid will refer to the same inode if it's opened multiple
   times. In contrast to now where each struct pid refers to the same
   inode.

   The tiny pseudo filesystem is not visible anywhere in userspace
   exactly like e.g., pipefs and sockfs. There's no lookup, there's no
   complex inode operations, nothing. Dentries and inodes are always
   deleted when the last pidfd is closed.

   We allocate a new inode and dentry for each struct pid and we reuse
   that inode and dentry for all pidfds that refer to the same struct
   pid. The code is entirely optional and fairly small. If it's not
   selected we fallback to anonymous inodes. Heavily inspired by nsfs.

   The dentry and inode allocation mechanism is moved into generic
   infrastructure that is now shared between nsfs and pidfs. The
   path_from_stashed() helper must be provided with a stashing location,
   an inode number, a mount, and the private data that is supposed to be
   used and it will provide a path that can be passed to dentry_open().

   The helper will try retrieve an existing dentry from the provided
   stashing location. If a valid dentry is found it is reused. If not a
   new one is allocated and we try to stash it in the provided location.
   If this fails we retry until we either find an existing dentry or the
   newly allocated dentry could be stashed. Subsequent openers of the
   same namespace or task are then able to reuse it.

 - Currently it is only possible to get notified when a task has exited,
   i.e., become a zombie and userspace gets notified with EPOLLIN. We
   now also support waiting until the task has been reaped, notifying
   userspace with EPOLLHUP.

 - Ensure that ESRCH is reported for getfd if a task is exiting instead
   of the confusing EBADF.

 - Various smaller cleanups to pidfd functions.

* tag 'vfs-6.9.pidfd' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (23 commits)
  libfs: improve path_from_stashed()
  libfs: add stashed_dentry_prune()
  libfs: improve path_from_stashed() helper
  pidfs: convert to path_from_stashed() helper
  nsfs: convert to path_from_stashed() helper
  libfs: add path_from_stashed()
  pidfd: add pidfs
  pidfd: move struct pidfd_fops
  pidfd: allow to override signal scope in pidfd_send_signal()
  pidfd: change pidfd_send_signal() to respect PIDFD_THREAD
  signal: fill in si_code in prepare_kill_siginfo()
  selftests: add ESRCH tests for pidfd_getfd()
  pidfd: getfd should always report ESRCH if a task is exiting
  pidfd: clone: allow CLONE_THREAD | CLONE_PIDFD together
  pidfd: exit: kill the no longer used thread_group_exited()
  pidfd: change do_notify_pidfd() to use __wake_up(poll_to_key(EPOLLIN))
  pid: kill the obsolete PIDTYPE_PID code in transfer_pid()
  pidfd: kill the no longer needed do_notify_pidfd() in de_thread()
  pidfd_poll: report POLLHUP when pid_task() == NULL
  pidfd: implement PIDFD_THREAD flag for pidfd_open()
  ...
2024-03-11 10:21:06 -07:00
Linus Torvalds
7ea65c89d8 vfs-6.9.misc
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZem3wQAKCRCRxhvAZXjc
 otRMAQDeo8qsuuIAcS2KUicKqZR5yMVvrY9r4sQzf7YRcJo5HQD+NQXkKwQuv1VO
 OUeScsic/+I+136AgdjWnlEYO5dp0go=
 =4WKU
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.9.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull misc vfs updates from Christian Brauner:
 "Misc features, cleanups, and fixes for vfs and individual filesystems.

  Features:

   - Support idmapped mounts for hugetlbfs.

   - Add RWF_NOAPPEND flag for pwritev2(). This allows us to fix a bug
     where the passed offset is ignored if the file is O_APPEND. The new
     flag allows a caller to enforce that the offset is honored to
     conform to posix even if the file was opened in append mode.

   - Move i_mmap_rwsem in struct address_space to avoid false sharing
     between i_mmap and i_mmap_rwsem.

   - Convert efs, qnx4, and coda to use the new mount api.

   - Add a generic is_dot_dotdot() helper that's used by various
     filesystems and the VFS code instead of open-coding it multiple
     times.

   - Recently we've added stable offsets which allows stable ordering
     when iterating directories exported through NFS on e.g., tmpfs
     filesystems. Originally an xarray was used for the offset map but
     that caused slab fragmentation issues over time. This switches the
     offset map to the maple tree which has a dense mode that handles
     this scenario a lot better. Includes tests.

   - Finally merge the case-insensitive improvement series Gabriel has
     been working on for a long time. This cleanly propagates case
     insensitive operations through ->s_d_op which in turn allows us to
     remove the quite ugly generic_set_encrypted_ci_d_ops() operations.
     It also improves performance by trying a case-sensitive comparison
     first and then fallback to case-insensitive lookup if that fails.
     This also fixes a bug where overlayfs would be able to be mounted
     over a case insensitive directory which would lead to all sort of
     odd behaviors.

  Cleanups:

   - Make file_dentry() a simple accessor now that ->d_real() is
     simplified because of the backing file work we did the last two
     cycles.

   - Use the dedicated file_mnt_idmap helper in ntfs3.

   - Use smp_load_acquire/store_release() in the i_size_read/write
     helpers and thus remove the hack to handle i_size reads in the
     filemap code.

   - The SLAB_MEM_SPREAD is a nop now. Remove it from various places in
     fs/

   - It's no longer necessary to perform a second built-in initramfs
     unpack call because we retain the contents of the previous
     extraction. Remove it.

   - Now that we have removed various allocators kfree_rcu() always
     works with kmem caches and kmalloc(). So simplify various places
     that only use an rcu callback in order to handle the kmem cache
     case.

   - Convert the pipe code to use a lockdep comparison function instead
     of open-coding the nesting making lockdep validation easier.

   - Move code into fs-writeback.c that was located in a header but can
     be made static as it's only used in that one file.

   - Rewrite the alignment checking iterators for iovec and bvec to be
     easier to read, and also significantly more compact in terms of
     generated code. This saves 270 bytes of text on x86-64 (with
     clang-18) and 224 bytes on arm64 (with gcc-13). In profiles it also
     saves a bit of time for the same workload.

   - Switch various places to use KMEM_CACHE instead of
     kmem_cache_create().

   - Use inode_set_ctime_to_ts() in inode_set_ctime_current()

   - Use kzalloc() in name_to_handle_at() to avoid kernel infoleak.

   - Various smaller cleanups for eventfds.

  Fixes:

   - Fix various comments and typos, and unneeded initializations.

   - Fix stack allocation hack for clang in the select code.

   - Improve dump_mapping() debug code on a best-effort basis.

   - Fix build errors in various selftests.

   - Avoid wrap-around instrumentation in various places.

   - Don't allow user namespaces without an idmapping to be used for
     idmapped mounts.

   - Fix sysv sb_read() call.

   - Fix fallback implementation of the get_name() export operation"

* tag 'vfs-6.9.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (70 commits)
  hugetlbfs: support idmapped mounts
  qnx4: convert qnx4 to use the new mount api
  fs: use inode_set_ctime_to_ts to set inode ctime to current time
  libfs: Drop generic_set_encrypted_ci_d_ops
  ubifs: Configure dentry operations at dentry-creation time
  f2fs: Configure dentry operations at dentry-creation time
  ext4: Configure dentry operations at dentry-creation time
  libfs: Add helper to choose dentry operations at mount-time
  libfs: Merge encrypted_ci_dentry_ops and ci_dentry_ops
  fscrypt: Drop d_revalidate once the key is added
  fscrypt: Drop d_revalidate for valid dentries during lookup
  fscrypt: Factor out a helper to configure the lookup dentry
  ovl: Always reject mounting over case-insensitive directories
  libfs: Attempt exact-match comparison first during casefolded lookup
  efs: remove SLAB_MEM_SPREAD flag usage
  jfs: remove SLAB_MEM_SPREAD flag usage
  minix: remove SLAB_MEM_SPREAD flag usage
  openpromfs: remove SLAB_MEM_SPREAD flag usage
  proc: remove SLAB_MEM_SPREAD flag usage
  qnx6: remove SLAB_MEM_SPREAD flag usage
  ...
2024-03-11 09:38:17 -07:00
Linus Torvalds
97ec9715a8 linux_kselftest-kunit-6.9-rc1
This KUnit next update for Linux 6.9-rc1 consists of:
 
 -- fix to make kunit_bus_type const
 
 -- kunit tool change to Print UML command
 
 -- DRM device creation helpers are now using the new kunit device
    creation helpers. This change resulted in DRM helpers switching
    from using a platform_device, to a dedicated bus and device type
    used by kunit. kunit devices don't set DMA mask and this caused
    regression on some drm tests as they can't allocate DMA buffers.
    Fix this problem by setting DMA masks on the kunit device during
    initialization.
 
 -- KUnit has several macros which accept a log message, which can
    contain printf format specifiers. Some of these (the explicit
    log macros) already use the __printf() gcc attribute to ensure
    the format specifiers are valid, but those which could fail the
    test, and hence used __kunit_do_failed_assertion() behind the scenes,
    did not.
 
    These include: KUNIT_EXPECT_*_MSG(), KUNIT_ASSERT_*_MSG(), and
    KUNIT_FAIL()
 
    A 9 patch series adds the __printf() attribute, and fixes all of
    the issues uncovered.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmXpHUsACgkQCwJExA0N
 QxxucA//VQt+qPeYHysK75Vu9icGGD/apxwMQiKIygVf8Mxg9IN3L7mJDDRIJj3h
 kAY2yJG9MxiezvTK58pqV38A4l1pB2L/qqyDhdFbgD6XSgJS5beNh4Sne5gL2Okw
 lHJkkZGj+g65UKTIbzhMFVzPsHPvJLdJAG2GSJcS6m2GfaSCBoOmRvugZ1OM40d0
 ruxU6/ucR6u8jtN7fUTEuOSpfngJrUpBGer4i4+qC4mlI26XZ96oh35gFJTsE/CK
 2WAXqv+Y9WhdFTihMHUcy11CWEM7XrkSXdp5GsdEOvg2SpqyP6C7kVCZ9aYV0FFe
 LOo3D3rCA+WggMI5wJ51P0F3KkHu+mr+XBcl3TQ1de6mnX4+qZKJSyCt+69PzeIi
 3TGWGO9lKkFrZ4StYdfCy8M/ABIpWq/UqIQAPOYtpQAEkGSj7H6J4OK9SG3oH1Oa
 Jnn+yeTDr6B7v0gzkS57wBZg10uL+FG1MoOYqi7p1ZbyHc1lOPbx5AboPAh20cqW
 h4UEIg50aGvHT6VjAidzI/CfZDmgkusCEnip0c2HaCg+AMi03JD1lQZTOuM9S6os
 dkFrPoDGXyBQytyJmdWi6GKULn3DG8llFKDEGZnyYgszQS8hw21iqmK5/Kuit+sN
 oJpjdSmXwoG5h6R9hUWnz+NcjNe44F4P88agVyrBYk2nZu97IiY=
 =ilEb
 -----END PGP SIGNATURE-----

Merge tag 'linux_kselftest-kunit-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull KUnit updates from Shuah Khan:

 - fix to make kunit_bus_type const

 - kunit tool change to Print UML command

 - DRM device creation helpers are now using the new kunit device
   creation helpers. This change resulted in DRM helpers switching from
   using a platform_device, to a dedicated bus and device type used by
   kunit. kunit devices don't set DMA mask and this caused regression on
   some drm tests as they can't allocate DMA buffers. Fix this problem
   by setting DMA masks on the kunit device during initialization.

 - KUnit has several macros which accept a log message, which can
   contain printf format specifiers. Some of these (the explicit log
   macros) already use the __printf() gcc attribute to ensure the format
   specifiers are valid, but those which could fail the test, and hence
   used __kunit_do_failed_assertion() behind the scenes, did not.

   These include: KUNIT_EXPECT_*_MSG(), KUNIT_ASSERT_*_MSG(), and
   KUNIT_FAIL()

   A nine-patch series adds the __printf() attribute, and fixes all of
   the issues uncovered.

* tag 'linux_kselftest-kunit-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  kunit: Annotate _MSG assertion variants with gnu printf specifiers
  drm: tests: Fix invalid printf format specifiers in KUnit tests
  drm/xe/tests: Fix printf format specifiers in xe_migrate test
  net: test: Fix printf format specifier in skb_segment kunit test
  rtc: test: Fix invalid format specifier.
  time: test: Fix incorrect format specifier
  lib: memcpy_kunit: Fix an invalid format specifier in an assertion msg
  lib/cmdline: Fix an invalid format specifier in an assertion msg
  kunit: test: Log the correct filter string in executor_test
  kunit: Setup DMA masks on the kunit device
  kunit: make kunit_bus_type const
  kunit: Mark filter* params as rw
  kunit: tool: Print UML command
2024-03-11 09:32:28 -07:00
Linus Torvalds
d451b075f7 linux_kselftest-next-6.9-rc1
This kselftest next update for Linux 6.9-rc1 consists of:
 
 -- livepatch restructuring to move the module out of lib to be
    built as a out-of-tree modules during kselftest build. This
    change makes it easier change, debug and rebuild the tests by
    running make on the selftests/livepatch directory, which is not
    currently possible since the modules on lib/livepatch are build
    and installed using the main makefile modules target.
 
 -- livepatch restructuring fixes for problems found by kernel test
    robot. The change skips the test if kernel-devel isn't installed
    (default value of KDIR), or if KDIR variable passed doesn't exists.
 
 -- resctrl test restructuring and new non-contiguous CBMs CAT test
 
 -- new ktap_helpers to print diagnostic messages, pass/fail tests
    based on exit code, abort test, and finish the test.
 
 -- a new test verify power supply properties.
 
 -- a new ftrace to exercise function tracer across cpu hotplug.
 
 -- timeout increase for mqueue test to allow the test to run on
    i3.metal AWS instances.
 
 -- minor spelling corrections in several tests.
 
 -- missing gitignore files and changes to existing gitignore files.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmXo/kUACgkQCwJExA0N
 Qxy0aBAAk0SLA1ZIAdlNjo5B13C7GC7rFRrtaai9ReXSvU/X5TX9sD5T9DIULKdj
 Mcqi+oaP88GPSUZS+bn7DyVxKyuvHg/f4jWQwqZ34WxK4K1K+yt+3YhTnHZx7ezU
 6WIbUsD1Zs7tXXI2v76riHFbD3pfxZ+AXQaf/1cXDi4SpIpLkiqyeYWoWN5Z2rtJ
 BwMzrI2RBiLMox4g8F3Ey4BX+bOIYiiJq5bdl7gJVKcp74VdU3S7IyOuXFbSdcFR
 xxmFMxWGFOgRzexW0fmDWLudD2dII0XQAExSsl5xMnR/lmSh+lHWheoNgphQl050
 VcLmrPugWVJSioe0fHEgmDQXe3lPqDtepUg921tIlWvCmtR3Ur6+GpILTbSvQ4qp
 SK+2pt7nGSAT2UkRO/6/TYFG3mELADvj6tglj0b1SkIXmNiF+7OZ+hJ2XqyM7peo
 Z7gtmSmpbAotxp64Jj8HsNZLpCX0xdaxoTMEWPoG09fwTXY7Hy03yoWDKBKB4MZ9
 jBtNXDolhpEQ/ppSGFnRPzXuNVapYX28UY0cwBBVgke5jwB8SUnBEr2dbNnVU1q0
 y5uxtj/EFQzxSynB3eM1us2OuXvr5TfAWmKVpyE/cNC3WreHeA+Y2kN1dzv8hgpw
 o4NbltdF8F+a9qQF9B1XvjVhqa5By1esS1jOg96cJgGseAVWiQs=
 =G+DO
 -----END PGP SIGNATURE-----

Merge tag 'linux_kselftest-next-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kselftest update from Shuah Khan:

 - livepatch restructuring to move the module out of lib to be built as
   a out-of-tree modules during kselftest build. This makes it easier
   change, debug and rebuild the tests by running make on the
   selftests/livepatch directory, which is not currently possible since
   the modules on lib/livepatch are build and installed using the main
   makefile modules target.

 - livepatch restructuring fixes for problems found by kernel test
   robot. The change skips the test if kernel-devel isn't installed
   (default value of KDIR), or if KDIR variable passed doesn't exists.

 - resctrl test restructuring and new non-contiguous CBMs CAT test

 - new ktap_helpers to print diagnostic messages, pass/fail tests based
   on exit code, abort test, and finish the test.

 - a new test verify power supply properties.

 - a new ftrace to exercise function tracer across cpu hotplug.

 - timeout increase for mqueue test to allow the test to run on i3.metal
   AWS instances.

 - minor spelling corrections in several tests.

 - missing gitignore files and changes to existing gitignore files.

* tag 'linux_kselftest-next-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (57 commits)
  kselftest: Add basic test for probing the rust sample modules
  selftests: lib.mk: Do not process TEST_GEN_MODS_DIR
  selftests: livepatch: Avoid running the tests if kernel-devel is missing
  selftests: livepatch: Add initial .gitignore
  selftests/resctrl: Add non-contiguous CBMs CAT test
  selftests/resctrl: Add resource_info_file_exists()
  selftests/resctrl: Split validate_resctrl_feature_request()
  selftests/resctrl: Add a helper for the non-contiguous test
  selftests/resctrl: Add test groups and name L3 CAT test L3_CAT
  selftests: sched: Fix spelling mistake "hiearchy" -> "hierarchy"
  selftests/mqueue: Set timeout to 180 seconds
  selftests/ftrace: Add test to exercize function tracer across cpu hotplug
  selftest: ftrace: fix minor typo in log
  selftests: thermal: intel: workload_hint: add missing gitignore
  selftests: thermal: intel: power_floor: add missing gitignore
  selftests: uevent: add missing gitignore
  selftests: Add test to verify power supply properties
  selftests: ktap_helpers: Add a helper to finish the test
  selftests: ktap_helpers: Add a helper to abort the test
  selftests: ktap_helpers: Add helper to pass/fail test based on exit code
  ...
2024-03-11 09:25:33 -07:00
Linus Torvalds
137e0ec05a KVM GUEST_MEMFD fixes for 6.8:
- Make KVM_MEM_GUEST_MEMFD mutually exclusive with KVM_MEM_READONLY to
   avoid creating an inconsistent ABI (KVM_MEM_GUEST_MEMFD is not writable
   from userspace, so there would be no way to write to a read-only
   guest_memfd).
 
 - Update documentation for KVM_SW_PROTECTED_VM to make it abundantly
   clear that such VMs are purely for development and testing.
 
 - Limit KVM_SW_PROTECTED_VM guests to the TDP MMU, as the long term plan
   is to support confidential VMs with deterministic private memory (SNP
   and TDX) only in the TDP MMU.
 
 - Fix a bug in a GUEST_MEMFD dirty logging test that caused false passes.
 
 x86 fixes:
 
 - Fix missing marking of a guest page as dirty when emulating an atomic access.
 
 - Check for mmu_notifier invalidation events before faulting in the pfn,
   and before acquiring mmu_lock, to avoid unnecessary work and lock
   contention with preemptible kernels (including CONFIG_PREEMPT_DYNAMIC
   in non-preemptible mode).
 
 - Disable AMD DebugSwap by default, it breaks VMSA signing and will be
   re-enabled with a better VM creation API in 6.10.
 
 - Do the cache flush of converted pages in svm_register_enc_region() before
   dropping kvm->lock, to avoid a race with unregistering of the same region
   and the consequent use-after-free issue.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmXskdYUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroN1TAf/SUGf4QuYG7nnfgWDR+goFO6Gx7NE
 pJr3kAwv6d2f+qTlURfGjnX929pgZDLgoTkXTNeZquN6LjgownxMjBIpymVobvAD
 AKvqJS/ECpryuehXbeqlxJxJn+TrxJ5r4QeNILMHc3AOZoiUqM6xl3zFfXWDNWVo
 IazwT8P3d8wxiHAxv1eG6OVWHxbcg31068FVKRX3f/bWPbVwROJrPkCopmz2BJvU
 6KYdYcn2rkpDTEM3ouDC/6gxJ9vpSY3+nW7Q7dNtGtOH2+BddfSA6I0rphCQWCNs
 uXOxd5bDrC+KmkiULTPostuvwBgIm1k9wC2kW9A4P2VEf6Ay+ZHEdAOBJQ==
 =+MT/
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "KVM GUEST_MEMFD fixes for 6.8:

   - Make KVM_MEM_GUEST_MEMFD mutually exclusive with KVM_MEM_READONLY
     to avoid creating an inconsistent ABI (KVM_MEM_GUEST_MEMFD is not
     writable from userspace, so there would be no way to write to a
     read-only guest_memfd).

   - Update documentation for KVM_SW_PROTECTED_VM to make it abundantly
     clear that such VMs are purely for development and testing.

   - Limit KVM_SW_PROTECTED_VM guests to the TDP MMU, as the long term
     plan is to support confidential VMs with deterministic private
     memory (SNP and TDX) only in the TDP MMU.

   - Fix a bug in a GUEST_MEMFD dirty logging test that caused false
     passes.

  x86 fixes:

   - Fix missing marking of a guest page as dirty when emulating an
     atomic access.

   - Check for mmu_notifier invalidation events before faulting in the
     pfn, and before acquiring mmu_lock, to avoid unnecessary work and
     lock contention with preemptible kernels (including
     CONFIG_PREEMPT_DYNAMIC in non-preemptible mode).

   - Disable AMD DebugSwap by default, it breaks VMSA signing and will
     be re-enabled with a better VM creation API in 6.10.

   - Do the cache flush of converted pages in svm_register_enc_region()
     before dropping kvm->lock, to avoid a race with unregistering of
     the same region and the consequent use-after-free issue"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  SEV: disable SEV-ES DebugSwap by default
  KVM: x86/mmu: Retry fault before acquiring mmu_lock if mapping is changing
  KVM: SVM: Flush pages under kvm->lock to fix UAF in svm_register_enc_region()
  KVM: selftests: Add a testcase to verify GUEST_MEMFD and READONLY are exclusive
  KVM: selftests: Create GUEST_MEMFD for relevant invalid flags testcases
  KVM: x86/mmu: Restrict KVM_SW_PROTECTED_VM to the TDP MMU
  KVM: x86: Update KVM_SW_PROTECTED_VM docs to make it clear they're a WIP
  KVM: Make KVM_MEM_GUEST_MEMFD mutually exclusive with KVM_MEM_READONLY
  KVM: x86: Mark target gfn of emulated atomic instruction as dirty
2024-03-10 09:27:39 -07:00
Linus Torvalds
df4793505a Including fixes from bpf, ipsec and netfilter.
No solution yet for the stmmac issue mentioned in the last PR,
 but it proved to be a lockdep false positive, not a blocker.
 
 Current release - regressions:
 
   - dpll: move all dpll<>netdev helpers to dpll code, fix build
     regression with old compilers
 
 Current release - new code bugs:
 
   - page_pool: fix netlink dump stop/resume
 
 Previous releases - regressions:
 
   - bpf: fix verifier to check bpf_func_state->callback_depth when pruning
        states as otherwise unsafe programs could get accepted
 
   - ipv6: avoid possible UAF in ip6_route_mpath_notify()
 
   - ice: reconfig host after changing MSI-X on VF
 
   - mlx5:
     - e-switch, change flow rule destination checking
     - add a memory barrier to prevent a possible null-ptr-deref
     - switch to using _bh variant of of spinlock where needed
 
 Previous releases - always broken:
 
   - netfilter: nf_conntrack_h323: add protection for bmp length out of range
 
   - bpf: fix to zero-initialise xdp_rxq_info struct before running XDP
 	program in CPU map which led to random xdp_md fields
 
   - xfrm: fix UDP encapsulation in TX packet offload
 
   - netrom: fix data-races around sysctls
 
   - ice:
     - fix potential NULL pointer dereference in ice_bridge_setlink()
     - fix uninitialized dplls mutex usage
 
   - igc: avoid returning frame twice in XDP_REDIRECT
 
   - i40e: disable NAPI right after disabling irqs when handling xsk_pool
 
   - geneve: make sure to pull inner header in geneve_rx()
 
   - sparx5: fix use after free inside sparx5_del_mact_entry
 
   - dsa: microchip: fix register write order in ksz8_ind_write8()
 
 Misc:
 
   -  selftests: mptcp: fixes for diag.sh
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmXptoYSHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkK3IP+QGe1Q37l75YM8IPpihjNYvBTiP6VWv0
 3cKoI0kz2EF5zmt3RAPK1M/ea1GY1L4Fsa/tdV0b9BzP9xC3si7IdFLZLqXh5tUX
 tW5m1LIoPqYLXE2i7qtOS5omMuCqKm2gM7TURarJA0XsAGyu645bYiJeT5dybnZQ
 AuAsXKj9RM3AkcLiqB4PZjdDuG9vIQLi2wSIybP4KFGqY7UMRlkRKFYlu2rpF29s
 XPlR671chaX90sP4bNwf+qVr81Ebu9APmDA0a9tVFDkgEqhPezpRDGHr2Kj+W25s
 j3XXwoygL6gIpJKzRgHsugAaZjla82DpCuygPOcmtTEEtHmF6fn8mBebjY/QDL6w
 ibbcOYJpzPFccRfMyHiiwzjqcaj+Zc58DktFf3H4EnKJULPralhKyMoyPngiAo1Y
 wNIGlWR8SNLhJzyZMeFPMKsz3RnLiC5vMdXMFfZdyH1RHHib5L+8AVogya+SaVkF
 1J1DrrShOEddvlrbZbM8c/03WHkAJXSRD34oHW9c3PkZscSzHmB1xqI1bER6sc5U
 5FjuDnsQDQ61pa6pip2Ug71UOw6ZAwZJs6AgestI49caDvUpSKI7jg/F6Dle6wNT
 p2KVUWFoz5BQBXG8Ut7yWpWvoEmaHe0cEn03rqZSYFnltWgkNvWMRMhkzuroOHWO
 UmOnuVIQH9Vh
 =0bH0
 -----END PGP SIGNATURE-----

Merge tag 'net-6.8-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from bpf, ipsec and netfilter.

  No solution yet for the stmmac issue mentioned in the last PR, but it
  proved to be a lockdep false positive, not a blocker.

  Current release - regressions:

   - dpll: move all dpll<>netdev helpers to dpll code, fix build
     regression with old compilers

  Current release - new code bugs:

   - page_pool: fix netlink dump stop/resume

  Previous releases - regressions:

   - bpf: fix verifier to check bpf_func_state->callback_depth when
     pruning states as otherwise unsafe programs could get accepted

   - ipv6: avoid possible UAF in ip6_route_mpath_notify()

   - ice: reconfig host after changing MSI-X on VF

   - mlx5:
       - e-switch, change flow rule destination checking
       - add a memory barrier to prevent a possible null-ptr-deref
       - switch to using _bh variant of of spinlock where needed

  Previous releases - always broken:

   - netfilter: nf_conntrack_h323: add protection for bmp length out of
     range

   - bpf: fix to zero-initialise xdp_rxq_info struct before running XDP
     program in CPU map which led to random xdp_md fields

   - xfrm: fix UDP encapsulation in TX packet offload

   - netrom: fix data-races around sysctls

   - ice:
       - fix potential NULL pointer dereference in ice_bridge_setlink()
       - fix uninitialized dplls mutex usage

   - igc: avoid returning frame twice in XDP_REDIRECT

   - i40e: disable NAPI right after disabling irqs when handling
     xsk_pool

   - geneve: make sure to pull inner header in geneve_rx()

   - sparx5: fix use after free inside sparx5_del_mact_entry

   - dsa: microchip: fix register write order in ksz8_ind_write8()

  Misc:

   - selftests: mptcp: fixes for diag.sh"

* tag 'net-6.8-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (63 commits)
  net: pds_core: Fix possible double free in error handling path
  netrom: Fix data-races around sysctl_net_busy_read
  netrom: Fix a data-race around sysctl_netrom_link_fails_count
  netrom: Fix a data-race around sysctl_netrom_routing_control
  netrom: Fix a data-race around sysctl_netrom_transport_no_activity_timeout
  netrom: Fix a data-race around sysctl_netrom_transport_requested_window_size
  netrom: Fix a data-race around sysctl_netrom_transport_busy_delay
  netrom: Fix a data-race around sysctl_netrom_transport_acknowledge_delay
  netrom: Fix a data-race around sysctl_netrom_transport_maximum_tries
  netrom: Fix a data-race around sysctl_netrom_transport_timeout
  netrom: Fix data-races around sysctl_netrom_network_ttl_initialiser
  netrom: Fix a data-race around sysctl_netrom_obsolescence_count_initialiser
  netrom: Fix a data-race around sysctl_netrom_default_path_quality
  netfilter: nf_conntrack_h323: Add protection for bmp length out of range
  netfilter: nf_tables: mark set as dead when unbinding anonymous set with timeout
  netfilter: nft_ct: fix l3num expectations with inet pseudo family
  netfilter: nf_tables: reject constant set with timeout
  netfilter: nf_tables: disallow anonymous set with timeout flag
  net/rds: fix WARNING in rds_conn_connect_if_down
  net: dsa: microchip: fix register write order in ksz8_ind_write8()
  ...
2024-03-07 09:23:33 -08:00
Daniel Borkmann
0bfc0336e1 selftests/bpf: Fix up xdp bonding test wrt feature flags
Adjust the XDP feature flags for the bond device when no bond slave
devices are attached. After 9b0ed890ac2a ("bonding: do not report
NETDEV_XDP_ACT_XSK_ZEROCOPY"), the empty bond device must report 0
as flags instead of NETDEV_XDP_ACT_MASK.

  # ./vmtest.sh -- ./test_progs -t xdp_bond
  [...]
  [    3.983311] bond1 (unregistering): (slave veth1_1): Releasing backup interface
  [    3.995434] bond1 (unregistering): Released all slaves
  [    4.022311] bond2: (slave veth2_1): Releasing backup interface
  #507/1   xdp_bonding/xdp_bonding_attach:OK
  #507/2   xdp_bonding/xdp_bonding_nested:OK
  #507/3   xdp_bonding/xdp_bonding_features:OK
  #507/4   xdp_bonding/xdp_bonding_roundrobin:OK
  #507/5   xdp_bonding/xdp_bonding_activebackup:OK
  #507/6   xdp_bonding/xdp_bonding_xor_layer2:OK
  #507/7   xdp_bonding/xdp_bonding_xor_layer23:OK
  #507/8   xdp_bonding/xdp_bonding_xor_layer34:OK
  #507/9   xdp_bonding/xdp_bonding_redirect_multi:OK
  #507     xdp_bonding:OK
  Summary: 1/9 PASSED, 0 SKIPPED, 0 FAILED
  [    4.185255] bond2 (unregistering): Released all slaves
  [...]

Fixes: 9b0ed890ac2a ("bonding: do not report NETDEV_XDP_ACT_XSK_ZEROCOPY")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Message-ID: <20240305090829.17131-2-daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-05 16:19:42 -08:00
Eduard Zingerman
5c2bc5e2f8 selftests/bpf: test case for callback_depth states pruning logic
The test case was minimized from mailing list discussion [0].
It is equivalent to the following C program:

    struct iter_limit_bug_ctx { __u64 a; __u64 b; __u64 c; };

    static __naked void iter_limit_bug_cb(void)
    {
    	switch (bpf_get_prandom_u32()) {
    	case 1:  ctx->a = 42; break;
    	case 2:  ctx->b = 42; break;
    	default: ctx->c = 42; break;
    	}
    }

    int iter_limit_bug(struct __sk_buff *skb)
    {
    	struct iter_limit_bug_ctx ctx = { 7, 7, 7 };

    	bpf_loop(2, iter_limit_bug_cb, &ctx, 0);
    	if (ctx.a == 42 && ctx.b == 42 && ctx.c == 7)
    	  asm volatile("r1 /= 0;":::"r1");
    	return 0;
    }

The main idea is that each loop iteration changes one of the state
variables in a non-deterministic manner. Hence it is premature to
prune the states that have two iterations left comparing them to
states with one iteration left.
E.g. {{7,7,7}, callback_depth=0} can reach state {42,42,7},
while {{7,7,7}, callback_depth=1} can't.

[0] https://lore.kernel.org/bpf/9b251840-7cb8-4d17-bd23-1fc8071d8eef@linux.dev/

Acked-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20240222154121.6991-3-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-05 16:15:56 -08:00
Kees Cook
1710742994 selftests/exec: Perform script checks with /bin/bash
It seems some shells linked to /bin/sh don't have consistent behavior
with error codes on execution failures. Explicitly use /bin/bash so that
"not found" errors are correctly generated. Repeating the comment from
the test:

/*
 * Execute as a long pathname relative to "/".  If this is a script,
 * the interpreter will launch but fail to open the script because its
 * name ("/dev/fd/5/xxx....") is bigger than PATH_MAX.
 *
 * The failure code is usually 127 (POSIX: "If a command is not found,
 * the exit status shall be 127."), but some systems give 126 (POSIX:
 * "If the command name is found, but it is not an executable utility,
 * the exit status shall be 126."), so allow either.
 */

Reported-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Closes: https://lore.kernel.org/lkml/02c8bf8e-1934-44ab-a886-e065b37366a7@collabora.com/
Signed-off-by: Kees Cook <keescook@chromium.org>
---
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: linux-mm@kvack.org
Cc: linux-kselftest@vger.kernel.org
2024-03-05 16:06:01 -08:00
Laura Nao
5d94da7ff0 kselftest: Add basic test for probing the rust sample modules
Add new basic kselftest that checks if the available rust sample modules
can be added and removed correctly.

Signed-off-by: Laura Nao <laura.nao@collabora.com>
Reviewed-by: Sergio Gonzalez Collado <sergio.collado@gmail.com>
Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-03-04 13:13:04 -07:00
Matthieu Baerts (NGI0)
f05d2283d1 selftests: mptcp: diag: avoid extra waiting
When creating a lot of listener sockets, it is enough to wait only for
the last one, like we are doing before in diag.sh for other subtests.

If we do a check for each listener sockets, each time listing all
available sockets, it can take a very long time in very slow
environments, at the point we can reach some timeout.

When using the debug kconfig, the waiting time switches from more than
8 sec to 0.1 sec on my side. In slow/busy environments, and with a poll
timeout set to 30 ms, the waiting time could go up to ~100 sec because
the listener socket would timeout and stop, while the script would still
be checking one by one if all sockets are ready. The result is that
after having waited for everything to be ready, all sockets have been
stopped due to a timeout, and it is too late for the script to check how
many there were.

While at it, also removed ss options we don't need: we only need the
filtering options, to count how many listener sockets have been created.
We don't need to ask ss to display internal TCP information, and the
memory if the output is dropped by the 'wc -l' command anyway.

Fixes: b4b51d36bbaa ("selftests: mptcp: explicitly trigger the listener diag code-path")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/r/20240301063754.2ecefecf@kernel.org
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-04 13:05:15 +00:00
Geliang Tang
45bcc03465 selftests: mptcp: diag: return KSFT_FAIL not test_cnt
The test counter 'test_cnt' should not be returned in diag.sh, e.g. what
if only the 4th test fail? Will do 'exit 4' which is 'exit ${KSFT_SKIP}',
the whole test will be marked as skipped instead of 'failed'!

So we should do ret=${KSFT_FAIL} instead.

Fixes: df62f2ec3df6 ("selftests/mptcp: add diag interface tests")
Cc: stable@vger.kernel.org
Fixes: 42fb6cddec3b ("selftests: mptcp: more stable diag tests")
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-04 13:05:15 +00:00
Linus Torvalds
e4f7900095 powerpc fixes for 6.8 #5
- Fix IOMMU table initialisation when doing kdump over SR-IOV.
 
  - Fix incorrect RTAS function name for resetting TCE tables.
 
  - Fix fpu_signal selftest failures since a recent change.
 
 Thanks to: Gaurav Batra, Nathan Lynch.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmXkauwTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgLTSD/4/boNe5MtrGgk6RtxnyWJv/p9KCMcC
 rRTa5pSR6HFVK3m89V17O0onL8aKyIljO5P3rDS8X4SUhr41Z9b9/FUoHv277E7a
 f7hgX4/901DYiMLJEt9jkfmM30IxTYkPmlft0Uus/NiesNdTcdQOO4UScSysnZac
 0HM1POp32KSC2HQRc/i+WIshRnaZcC+f0PPTU/qfS/u7/pwK4eekWdayLNvvvSvH
 TfjV5Hu4JVDF7hBLsoY4PdqVQGVD3t3d1D5+UrhHuYzPx+Afc8rIDfJx/+o339W6
 ZTXrRPOfiticfhHQvMX1AgsYr3/A0BbZj/wCvv+pPsCjZPfox9XsW6CgiMKUxVDf
 ifrBhvNx0fICMf+cEjH9q2dcGwMba7dZTX5HBlSXR4xLeNitUI8pt6bYCaqw7UjH
 ohwl9aAI7Rl1hcW6qBaviKaIDqhbmj3N4B4ZdgMAKj2gnovbF9gUSG3ARLwEjsqB
 qfd0c3x6UUThj4vYfGC/iI1z1LCXC8myqof6EGArLTc13R9vFv+ycx5FwERvAxtY
 ALNBh6LMIkKI5z8ZrGWULoXHoS2QrE1SXpQ5ooH2g9n+vLibd37JdJNcrEMf/TJZ
 PluhRCJcEWkw98aUSzIoRFIFpZ2wMg/uzg3KZePhRus3GhgttTqFTbrcKHhyONBO
 pq/EDB8FizZ9/A==
 =MSDF
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-6.8-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - Fix IOMMU table initialisation when doing kdump over SR-IOV

 - Fix incorrect RTAS function name for resetting TCE tables

 - Fix fpu_signal selftest failures since a recent change

Thanks to Gaurav Batra and Nathan Lynch.

* tag 'powerpc-6.8-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  selftests/powerpc: Fix fpu_signal failures
  powerpc/rtas: use correct function name for resetting TCE tables
  powerpc/pseries/iommu: IOMMU table is not initialized for kdump over SR-IOV
2024-03-03 09:47:19 -08:00
Michael Ellerman
380cb2f4df selftests/powerpc: Fix fpu_signal failures
My recent commit e5d00aaac651 ("selftests/powerpc: Check all FPRs in
fpu_preempt") inadvertently broke the fpu_signal test.

It needs to take into account that fpu_preempt now loads 32 FPRs, so
enlarge darray.

Also use the newly added randomise_darray() to properly randomise darray.

Finally the checking done in signal_fpu_sig() needs to skip checking
f30/f31, because they are used as scratch registers in check_all_fprs(),
called by preempt_fpu(), and so could hold other values when the signal
is taken.

Fixes: e5d00aaac651 ("selftests/powerpc: Check all FPRs in fpu_preempt")
Reported-by: Spoorthy <spoorthy@linux.ibm.com>
Depends-on: 2ba107f6795d ("selftests/powerpc: Generate better bit patterns for FPU tests")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240301101035.1230024-1-mpe@ellerman.id.au
2024-03-01 22:15:30 +11:00
Linus Torvalds
87adedeba5 Including fixes from bluetooth, WiFi and netfilter.
We have one outstanding issue with the stmmac driver, which may
 be a LOCKDEP false positive, not a blocker.
 
 Current release - regressions:
 
  - netfilter: nf_tables: re-allow NFPROTO_INET in
    nft_(match/target)_validate()
 
  - eth: ionic: fix error handling in PCI reset code
 
 Current release - new code bugs:
 
  - eth: stmmac: complete meta data only when enabled, fix null-deref
 
  - kunit: fix again checksum tests on big endian CPUs
 
 Previous releases - regressions:
 
  - veth: try harder when allocating queue memory
 
  - Bluetooth:
    - hci_bcm4377: do not mark valid bd_addr as invalid
    - hci_event: fix handling of HCI_EV_IO_CAPA_REQUEST
 
 Previous releases - always broken:
 
  - info leak in __skb_datagram_iter() on netlink socket
 
  - mptcp:
    - map v4 address to v6 when destroying subflow
    - fix potential wake-up event loss due to sndbuf auto-tuning
    - fix double-free on socket dismantle
 
  - wifi: nl80211: reject iftype change with mesh ID change
 
  - fix small out-of-bound read when validating netlink be16/32 types
 
  - rtnetlink: fix error logic of IFLA_BRIDGE_FLAGS writing back
 
  - ipv6: fix potential "struct net" ref-leak in inet6_rtm_getaddr()
 
  - ip_tunnel: prevent perpetual headroom growth with huge number of
    tunnels on top of each other
 
  - mctp: fix skb leaks on error paths of mctp_local_output()
 
  - eth: ice: fixes for DPLL state reporting
 
  - dpll: rely on rcu for netdev_dpll_pin() to prevent UaF
 
  - eth: dpaa: accept phy-interface-type = "10gbase-r" in the device tree
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmXg6ioACgkQMUZtbf5S
 IrupHQ/+Jt9OK8AYiUUBpeE0E0pb4yHS4KuiGWChx2YECJCeeeU6Ko4gaPI6+Nyv
 mMh/3sVsLnX7w4OXp2HddMMiFGbd1ufIptS0T/EMhHbbg1h7Qr1jhpu8aM8pb9jM
 5DwjfTZijDaW84Oe+Kk9BOonxR6A+Df27O3PSEUtLk4JCy5nwEwUr9iCxgCla499
 3aLu5eWRw8PTSsJec4BK6hfCKWiA/6oBHS1pQPwYvWuBWFZe8neYHtvt3LUwo1HR
 DwN9gtMiGBzYSSQmk8V1diGIokn80G5Krdq4gXbhsLxIU0oEJA7ltGpqasxy/OCs
 KGLHcU5wCd3j42gZOzvBzzzj8RQyd2ZekyvCu7B5Rgy3fx6JWI1jLalsQ/tT9yQg
 VJgFM2AZBb1EEAw/P2DkVQ8Km8ZuVlGtzUoldvIY1deP1/LZFWc0PftA6ndT7Ldl
 wQwKPQtJ5DMzqEe3mwSjFkL+AiSmcCHCkpnGBIi4c7Ek2/GgT1HeUMwJPh0mBftz
 smlLch3jMH2YKk7AmH7l9o/Q9ypgvl+8FA+icLaX0IjtSbzz5Q7gNyhgE0w1Hdb2
 79q6SE3ETLG/dn75XMA1C0Wowrr60WKHwagMPUl57u9bchfUT8Ler/4Sd9DWn8Vl
 55YnGPWMLCkxgpk+DHXYOWjOBRszCkXrAA71NclMnbZ5cQ86JYY=
 =T2ty
 -----END PGP SIGNATURE-----

Merge tag 'net-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from bluetooth, WiFi and netfilter.

  We have one outstanding issue with the stmmac driver, which may be a
  LOCKDEP false positive, not a blocker.

  Current release - regressions:

   - netfilter: nf_tables: re-allow NFPROTO_INET in
     nft_(match/target)_validate()

   - eth: ionic: fix error handling in PCI reset code

  Current release - new code bugs:

   - eth: stmmac: complete meta data only when enabled, fix null-deref

   - kunit: fix again checksum tests on big endian CPUs

  Previous releases - regressions:

   - veth: try harder when allocating queue memory

   - Bluetooth:
      - hci_bcm4377: do not mark valid bd_addr as invalid
      - hci_event: fix handling of HCI_EV_IO_CAPA_REQUEST

  Previous releases - always broken:

   - info leak in __skb_datagram_iter() on netlink socket

   - mptcp:
      - map v4 address to v6 when destroying subflow
      - fix potential wake-up event loss due to sndbuf auto-tuning
      - fix double-free on socket dismantle

   - wifi: nl80211: reject iftype change with mesh ID change

   - fix small out-of-bound read when validating netlink be16/32 types

   - rtnetlink: fix error logic of IFLA_BRIDGE_FLAGS writing back

   - ipv6: fix potential "struct net" ref-leak in inet6_rtm_getaddr()

   - ip_tunnel: prevent perpetual headroom growth with huge number of
     tunnels on top of each other

   - mctp: fix skb leaks on error paths of mctp_local_output()

   - eth: ice: fixes for DPLL state reporting

   - dpll: rely on rcu for netdev_dpll_pin() to prevent UaF

   - eth: dpaa: accept phy-interface-type = '10gbase-r' in the device
     tree"

* tag 'net-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (73 commits)
  dpll: fix build failure due to rcu_dereference_check() on unknown type
  kunit: Fix again checksum tests on big endian CPUs
  tls: fix use-after-free on failed backlog decryption
  tls: separate no-async decryption request handling from async
  tls: fix peeking with sync+async decryption
  tls: decrement decrypt_pending if no async completion will be called
  gtp: fix use-after-free and null-ptr-deref in gtp_newlink()
  net: hsr: Use correct offset for HSR TLV values in supervisory HSR frames
  igb: extend PTP timestamp adjustments to i211
  rtnetlink: fix error logic of IFLA_BRIDGE_FLAGS writing back
  tools: ynl: fix handling of multiple mcast groups
  selftests: netfilter: add bridge conntrack + multicast test case
  netfilter: bridge: confirm multicast packets before passing them up the stack
  netfilter: nf_tables: allow NFPROTO_INET in nft_(match/target)_validate()
  Bluetooth: qca: Fix triggering coredump implementation
  Bluetooth: hci_qca: Set BDA quirk bit if fwnode exists in DT
  Bluetooth: qca: Fix wrong event type for patch config command
  Bluetooth: Enforce validation on max value of connection interval
  Bluetooth: hci_event: Fix handling of HCI_EV_IO_CAPA_REQUEST
  Bluetooth: mgmt: Fix limited discoverable off timeout
  ...
2024-02-29 12:40:20 -08:00
Paolo Abeni
b611b776a9 netfilter pull request 24-02-29
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEN9lkrMBJgcdVAPub1V2XiooUIOQFAmXfx4MACgkQ1V2XiooU
 IOSX+g//UHBfqYJASMMQJpWdMwWe7tB2m1LRzLYI+WUdUenK/MEylS7rNp/bwGkW
 42eDeGA0eov7kYNOY0rLB7lQBdUHwCpNZkdetTWFV9eHcEKA8cQ6OqcD1G8i41qg
 sCvObS+K/hq3f7fX9bJ9RvS5RvYoeuS1trw4mezhHwPS+1sj80v4FdqDOFCUqiT3
 65BfeoV65pVVteCRmJQxeeZ4Bepd4LRXW+VVyr3uXli/H87jqQOFxsOTqyXNEXIq
 jMYL0jnbYs0ARbNYXRYySLYQCWmbVXpfnt4JIBRP0S1e6Prby2hqUwJBeyNcXBAu
 CwBTjCEdLIV5G25EWTZWBYQdihct58s0GDRX078Sj/AozQJAWTxBEn0QLhKy2gvH
 2uspA0S2z1PS69hUvHfgGjDiBKw41T2O6D/12NBxI1DOYDLsk7ApE5tKqynUnUIj
 pOLUiolFnJd4JKnGZ/CTATpGi8KX/iSWdX8OElCpGOvKQgZyU8IXrydjcHnJz7b4
 AdsIfpjjZSdz2VU6ZmzLYJrWf6ukAchO5kYL2FIJt/eFEyGqDfwGL36FIO7YGcnu
 NPHtIF23Ldl+GIesc9UT08k+IOsfR9LMbUduJC6Dg63FDrEkFfOv+wXA1eURW3kS
 tq+eWs+QjlCeWG9FgW2NHj3+rGyjQbGOe+v1yTgl1x/BhXNV1cM=
 =2BRo
 -----END PGP SIGNATURE-----

Merge tag 'nf-24-02-29' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

Patch #1 restores NFPROTO_INET with nft_compat, from Ignat Korchagin.

Patch #2 fixes an issue with bridge netfilter and broadcast/multicast
packets.

There is a day 0 bug in br_netfilter when used with connection tracking.

Conntrack assumes that an nf_conn structure that is not yet added to
hash table ("unconfirmed"), is only visible by the current cpu that is
processing the sk_buff.

For bridge this isn't true, sk_buff can get cloned in between, and
clones can be processed in parallel on different cpu.

This patch disables NAT and conntrack helpers for multicast packets.

Patch #3 adds a selftest to cover for the br_netfilter bug.

netfilter pull request 24-02-29

* tag 'nf-24-02-29' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  selftests: netfilter: add bridge conntrack + multicast test case
  netfilter: bridge: confirm multicast packets before passing them up the stack
  netfilter: nf_tables: allow NFPROTO_INET in nft_(match/target)_validate()
====================

Link: https://lore.kernel.org/r/20240229000135.8780-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-29 12:16:08 +01:00
Jakub Kicinski
b6c65eb20f tools: ynl: fix handling of multiple mcast groups
We never increment the group number iterator, so all groups
get recorded into index 0 of the mcast_groups[] array.

As a result YNL can only handle using the last group.
For example using the "netdev" sample on kernel with
page pool commands results in:

  $ ./samples/netdev
  YNL: Multicast group 'mgmt' not found

Most families have only one multicast group, so this hasn't
been noticed. Plus perhaps developers usually test the last
group which would have worked.

Fixes: 86878f14d71a ("tools: ynl: user space helpers")
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://lore.kernel.org/r/20240226214019.1255242-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-28 15:24:34 -08:00
Florian Westphal
6523cf516c selftests: netfilter: add bridge conntrack + multicast test case
Add test case for multicast packet confirm race.
Without preceding patch, this should result in:

 WARNING: CPU: 0 PID: 38 at net/netfilter/nf_conntrack_core.c:1198 __nf_conntrack_confirm+0x3ed/0x5f0
 Workqueue: events_unbound macvlan_process_broadcast
 RIP: 0010:__nf_conntrack_confirm+0x3ed/0x5f0
  ? __nf_conntrack_confirm+0x3ed/0x5f0
  nf_confirm+0x2ad/0x2d0
  nf_hook_slow+0x36/0xd0
  ip_local_deliver+0xce/0x110
  __netif_receive_skb_one_core+0x4f/0x70
  process_backlog+0x8c/0x130
  [..]

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-02-29 00:22:48 +01:00
Marcos Paulo de Souza
539cd3f4da selftests: lib.mk: Do not process TEST_GEN_MODS_DIR
The directory itself doesn't need have path handling, since it's only to
mean where is the directory that contains modules to be built.

Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-27 17:03:42 -07:00
Marcos Paulo de Souza
54ee352679 selftests: livepatch: Avoid running the tests if kernel-devel is missing
By checking if KDIR is a valid directory we can safely skip the tests if
kernel-devel isn't installed (default value of KDIR), or if KDIR
variable passed doesn't exists.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202402191417.XULH88Ct-lkp@intel.com/
Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-27 17:03:35 -07:00
Marcos Paulo de Souza
8ab37b0d98 selftests: livepatch: Add initial .gitignore
Ignore the binary used to test livepatching a syscall.

Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-27 17:03:28 -07:00
Mickaël Salaün
ee8bd4a428 kunit: tool: Print UML command
As for the Qemu command, print the command used to run tests with UML.

Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-27 14:46:34 -07:00
Paolo Abeni
b4b51d36bb selftests: mptcp: explicitly trigger the listener diag code-path
The mptcp diag interface already experienced a few locking bugs
that lockdep and appropriate coverage have detected in advance.

Let's add a test-case triggering the relevant code path, to prevent
similar issues in the future.

Be careful to cope with very slow environments.

Note that we don't need an explicit timeout on the mptcp_connect
subprocess to cope with eventual bug/hang-up as the final cleanup
terminating the child processes will take care of that.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-10-162e87e48497@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26 18:41:56 -08:00
Geliang Tang
9480f388a2 selftests: mptcp: join: add ss mptcp support check
Commands 'ss -M' are used in script mptcp_join.sh to display only MPTCP
sockets. So it must be checked if ss tool supports MPTCP in this script.

Fixes: e274f7154008 ("selftests: mptcp: add subflow limits test-cases")
Cc: stable@vger.kernel.org
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-7-162e87e48497@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26 18:41:56 -08:00
Geliang Tang
7092dbee23 selftests: mptcp: rm subflow with v4/v4mapped addr
Now both a v4 address and a v4-mapped address are supported when
destroying a userspace pm subflow, this patch adds a second subflow
to "userspace pm add & remove address" test, and two subflows could
be removed two different ways, one with the v4mapped and one with v4.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/387
Fixes: 48d73f609dcc ("selftests: mptcp: update userspace pm addr tests")
Cc: stable@vger.kernel.org
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-2-162e87e48497@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26 18:41:55 -08:00
Jakub Kicinski
1a825e4cdf selftests: net: veth: test syncing GRO and XDP state while device is down
Test that we keep GRO flag in sync when XDP is disabled while
the device is closed.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26 11:34:13 +00:00
Linus Torvalds
ac389bc0ca cxl fixes for 6.8-rc6
- Fix NUMA initialization from ACPI CEDT.CFMWS
 
 - Fix region assembly failures due to async init order
 
 - Fix / simplify export of qos_class information
 
 - Fix cxl_acpi initialization vs single-window-init failures
 
 - Fix handling of repeated 'pci_channel_io_frozen' notifications
 
 - Workaround platforms that violate host-physical-address ==
   system-physical address assumptions
 
 - Defer CXL CPER notification handling to v6.9
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQSbo+XnGs+rwLz9XGXfioYZHlFsZwUCZdpH9gAKCRDfioYZHlFs
 ZwZlAQDE+PxTJnjCXDVnDylVF4yeJF2G/wSkH1CFVFVxa0OjhAD/ZFScS/nz/76l
 1IYYiiLqmVO5DdmJtfKtq16m7e1cZwc=
 =PuPF
 -----END PGP SIGNATURE-----

Merge tag 'cxl-fixes-6.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl

Pull cxl fixes from Dan Williams:
 "A collection of significant fixes for the CXL subsystem.

  The largest change in this set, that bordered on "new development", is
  the fix for the fact that the location of the new qos_class attribute
  did not match the Documentation. The fix ends up deleting more code
  than it added, and it has a new unit test to backstop basic errors in
  this interface going forward. So the "red-diff" and unit test saved
  the "rip it out and try again" response.

  In contrast, the new notification path for firmware reported CXL
  errors (CXL CPER notifications) has a locking context bug that can not
  be fixed with a red-diff. Given where the release cycle stands, it is
  not comfortable to squeeze in that fix in these waning days. So, that
  receives the "back it out and try again later" treatment.

  There is a regression fix in the code that establishes memory NUMA
  nodes for platform CXL regions. That has an ack from x86 folks. There
  are a couple more fixups for Linux to understand (reassemble) CXL
  regions instantiated by platform firmware. The policy around platforms
  that do not match host-physical-address with system-physical-address
  (i.e. systems that have an address translation mechanism between the
  address range reported in the ACPI CEDT.CFMWS and endpoint decoders)
  has been softened to abort driver load rather than teardown the memory
  range (can cause system hangs). Lastly, there is a robustness /
  regression fix for cases where the driver would previously continue in
  the face of error, and a fixup for PCI error notification handling.

  Summary:

   - Fix NUMA initialization from ACPI CEDT.CFMWS

   - Fix region assembly failures due to async init order

   - Fix / simplify export of qos_class information

   - Fix cxl_acpi initialization vs single-window-init failures

   - Fix handling of repeated 'pci_channel_io_frozen' notifications

   - Workaround platforms that violate host-physical-address ==
     system-physical address assumptions

   - Defer CXL CPER notification handling to v6.9"

* tag 'cxl-fixes-6.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
  cxl/acpi: Fix load failures due to single window creation failure
  acpi/ghes: Remove CXL CPER notifications
  cxl/pci: Fix disabling memory if DVSEC CXL Range does not match a CFMWS window
  cxl/test: Add support for qos_class checking
  cxl: Fix sysfs export of qos_class for memdev
  cxl: Remove unnecessary type cast in cxl_qos_class_verify()
  cxl: Change 'struct cxl_memdev_state' *_perf_list to single 'struct cxl_dpa_perf'
  cxl/region: Allow out of order assembly of autodiscovered regions
  cxl/region: Handle endpoint decoders in cxl_region_find_decoder()
  x86/numa: Fix the sort compare func used in numa_fill_memblks()
  x86/numa: Fix the address overlap check in numa_fill_memblks()
  cxl/pci: Skip to handle RAS errors if CXL.mem device is detached
2024-02-24 15:53:40 -08:00
Maciej Wieczor-Retman
ae638551ab selftests/resctrl: Add non-contiguous CBMs CAT test
Add tests for both L2 and L3 CAT to verify the return values
generated by writing non-contiguous CBMs don't contradict the
reported non-contiguous support information.

Use a logical XOR to confirm return value of write_schemata() and
non-contiguous CBMs support information match.

Signed-off-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-23 15:19:25 -07:00
Maciej Wieczor-Retman
74e76cbabd selftests/resctrl: Add resource_info_file_exists()
Feature checking done by resctrl_mon_feature_exists() covers features
represented by the feature name presence inside the 'mon_features' file
in /sys/fs/resctrl/info/L3_MON directory. There exists a different way
to represent feature support and that is by the presence of 0 or 1 in a
single file in the info/resource directory. In this case the filename
represents what feature support is being indicated.

Add a generic function to check file presence in the
/sys/fs/resctrl/info/<RESOURCE> directory.

Signed-off-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-23 15:19:20 -07:00
Maciej Wieczor-Retman
0061641648 selftests/resctrl: Split validate_resctrl_feature_request()
validate_resctrl_feature_request() is used to test both if a resource is
present in the info directory, and if a passed monitoring feature is
present in the mon_features file.

Refactor validate_resctrl_feature_request() into two smaller functions
that each accomplish one check to give feature checking more
granularity:
- Resource directory presence in the /sys/fs/resctrl/info directory.
- Feature name presence in the /sys/fs/resctrl/info/<RESOURCE>/mon_features
  file.

Signed-off-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-23 15:19:15 -07:00
Maciej Wieczor-Retman
e331ac141f selftests/resctrl: Add a helper for the non-contiguous test
The CAT non-contiguous selftests have to read the file responsible for
reporting support of non-contiguous CBMs in kernel (resctrl). Then the
test compares if that information matches what is reported by CPUID
output.

Add a generic helper function to read an unsigned number from
/sys/fs/resctrl/info/<RESOURCE>/<FILE>.

Signed-off-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-23 15:19:09 -07:00
Ilpo Järvinen
5339792bd6 selftests/resctrl: Add test groups and name L3 CAT test L3_CAT
To select test to run -t parameter can be used. However, -t cat
currently maps to L3 CAT test which will be confusing after more CAT
related tests will be added.

Allow selecting tests as groups and call L3 CAT test "L3_CAT", "CAT"
group will enable all CAT related tests.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-23 15:19:03 -07:00
Linus Torvalds
95e73fb16a 14 hotfixes. 10 are cc:stable and the remainder address post-6.7 issues
or aren't considered appropriate for backporting.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZdfSoQAKCRDdBJ7gKXxA
 jtdcAQCEMwrTnSVOfP0WxixL11HK8a1bJdezHBLWFUDscm4BXgD9EV4mgQ2yPDA6
 ka+Lf9MucZSpJ2QYZA4GoDqP4wefawA=
 =zI6b
 -----END PGP SIGNATURE-----

Merge tag 'mm-hotfixes-stable-2024-02-22-15-02' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "A batch of MM (and one non-MM) hotfixes.

  Ten are cc:stable and the remainder address post-6.7 issues or aren't
  considered appropriate for backporting"

* tag 'mm-hotfixes-stable-2024-02-22-15-02' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  kasan: guard release_free_meta() shadow access with kasan_arch_is_ready()
  mm/damon/lru_sort: fix quota status loss due to online tunings
  mm/damon/reclaim: fix quota stauts loss due to online tunings
  MAINTAINERS: mailmap: update Shakeel's email address
  mm/damon/sysfs-schemes: handle schemes sysfs dir removal before commit_schemes_quota_goals
  mm: memcontrol: clarify swapaccount=0 deprecation warning
  mm/memblock: add MEMBLOCK_RSRV_NOINIT into flagname[] array
  mm/zswap: invalidate duplicate entry when !zswap_enabled
  lib/Kconfig.debug: TEST_IOV_ITER depends on MMU
  mm/swap: fix race when skipping swapcache
  mm/swap_state: update zswap LRU's protection range with the folio locked
  selftests/mm: uffd-unit-test check if huge page size is 0
  mm/damon/core: check apply interval in damon_do_apply_schemes()
  mm: zswap: fix missing folio cleanup in writeback race path
2024-02-23 09:43:21 -08:00
Sean Christopherson
2dfd238303 KVM: selftests: Add a testcase to verify GUEST_MEMFD and READONLY are exclusive
Extend set_memory_region_test's invalid flags subtest to verify that
GUEST_MEMFD is incompatible with READONLY.  GUEST_MEMFD doesn't currently
support writes from userspace and KVM doesn't support emulated MMIO on
private accesses, and so KVM is supposed to reject the GUEST_MEMFD+READONLY
in order to avoid configuration that KVM can't support.

Link: https://lore.kernel.org/r/20240222190612.2942589-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-02-22 17:07:06 -08:00
Sean Christopherson
63e5c5a105 KVM: selftests: Create GUEST_MEMFD for relevant invalid flags testcases
Actually create a GUEST_MEMFD instance and pass it to KVM when doing
negative tests for KVM_SET_USER_MEMORY_REGION2 + KVM_MEM_GUEST_MEMFD.
Without a valid GUEST_MEMFD file descriptor, KVM_SET_USER_MEMORY_REGION2
will always fail with -EINVAL, resulting in false passes for any and all
tests of illegal combinations of KVM_MEM_GUEST_MEMFD and other flags.

Fixes: 5d74316466f4 ("KVM: selftests: Add a memory region subtest to validate invalid flags")
Link: https://lore.kernel.org/r/20240222190612.2942589-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-02-22 17:07:06 -08:00
Linus Torvalds
4c36fbb46f iommufd for 6.8 rc
- Fix dirty tracking bitmap collection when using reporting bitmaps that
   are not neatly aligned to u64's or match the IO page table radix tree
   layout.
 
 - Add self tests to cover the cases that were found to be broken.
 
 - Add missing enforcement of invalidation type in the uapi.
 
 - Fix selftest config generation
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZddJHAAKCRCFwuHvBreF
 YaRyAQCuywXMOQsPcTXZk+bepQ0EacRZKfyPrIcKMaHC1QLwKgEA9ApiVSbai0Q+
 5IFX+rWrLUB4jiH5D12kmfhxgydFDAk=
 =CCxs
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd

Pull iommufd fixes from Jason Gunthorpe:

 - Fix dirty tracking bitmap collection when using reporting bitmaps
   that are not neatly aligned to u64's or match the IO page table radix
   tree layout.

 - Add self tests to cover the cases that were found to be broken.

 - Add missing enforcement of invalidation type in the uapi.

 - Fix selftest config generation

* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd:
  selftests/iommu: fix the config fragment
  iommufd: Reject non-zero data_type if no data_len is provided
  iommufd/iova_bitmap: Consider page offset for the pages to be pinned
  iommufd/selftest: Add mock IO hugepages tests
  iommufd/selftest: Hugepage mock domain support
  iommufd/selftest: Refactor mock_domain_read_and_clear_dirty()
  iommufd/selftest: Refactor dirty bitmap tests
  iommufd/iova_bitmap: Handle recording beyond the mapped pages
  iommufd/selftest: Test u64 unaligned bitmaps
  iommufd/iova_bitmap: Switch iova_bitmap::bitmap to an u8 array
  iommufd/iova_bitmap: Bounds check mapped::pages access
2024-02-22 11:53:09 -08:00
Linus Torvalds
6714ebb922 Including fixes from bpf and netfilter.
Current release - regressions:
 
   - af_unix: fix another unix GC hangup
 
 Previous releases - regressions:
 
   - core: fix a possible AF_UNIX deadlock
 
   - bpf: fix NULL pointer dereference in sk_psock_verdict_data_ready()
 
   - netfilter: nft_flow_offload: release dst in case direct xmit path is used
 
   - bridge: switchdev: ensure MDB events are delivered exactly once
 
   - l2tp: pass correct message length to ip6_append_data
 
   - dccp/tcp: unhash sk from ehash for tb2 alloc failure after check_estalblished()
 
   - tls: fixes for record type handling with PEEK
 
   - devlink: fix possible use-after-free and memory leaks in devlink_init()
 
 Previous releases - always broken:
 
   - bpf: fix an oops when attempting to read the vsyscall
   	 page through bpf_probe_read_kernel
 
   - sched: act_mirred: use the backlog for mirred ingress
 
   - netfilter: nft_flow_offload: fix dst refcount underflow
 
   - ipv6: sr: fix possible use-after-free and null-ptr-deref
 
   - mptcp: fix several data races
 
   - phonet: take correct lock to peek at the RX queue
 
 Misc:
 
   - handful of fixes and reliability improvements for selftests
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmXXKMMSHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkmgAQAIV2NAVEvHVBtnm0Df9PuCcHQx6i9veS
 tGxOZMVwb5ePFI+dpiNyyn61koEiRuFLOm66pfJAuT5j5z6m4PEFfPZgtiVpCHVK
 4sz4UD4+jVLmYijv+YlWkPU3RWR0RejSkDbXwY5Y9Io/DWHhA2iq5IyMy2MncUPY
 dUc12ddEsYRH60Kmm2/96FcdbHw9Y64mDC8tIeIlCAQfng4U98EXJbCq9WXsPPlW
 vjwSKwRG76QGDugss9XkatQ7Bsva1qTobFGDOvBMQpMt+dr81pTGVi0c1h/drzvI
 EJaDO8jJU3Xy0pQ80beboCJ1KlVCYhWSmwlBMZUA1f0lA2m3U5UFEtHA5hHKs3Mi
 jNe/sgKXzThrro0fishAXbzrro2QDhCG3Vm4PRlOGexIyy+n0gIp1lHwEY1p2vX9
 RJPdt1e3xt/5NYRv6l2GVQYFi8Wd0endgzCdJeXk0OWQFLFtnxhG6ejpgxtgN0fp
 CzKU6orFpsddQtcEOdIzKMUA3CXYWAdQPXOE5Ptjoz3MXZsQqtMm3vN4and8jJ19
 8/VLsCNPp11bSRTmNY3Xt85e+gjIA2mRwgRo+ieL6b1x2AqNeVizlr6IZWYQ4TdG
 rUdlEX0IVmov80TSeQoWgtzTO7xMER+qN6FxAs3pQoUFjtol3pEURq9FQ2QZ8jW4
 5rKpNBrjKxdk
 =eUOc
 -----END PGP SIGNATURE-----

Merge tag 'net-6.8.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from bpf and netfilter.

  Current release - regressions:

   - af_unix: fix another unix GC hangup

  Previous releases - regressions:

   - core: fix a possible AF_UNIX deadlock

   - bpf: fix NULL pointer dereference in sk_psock_verdict_data_ready()

   - netfilter: nft_flow_offload: release dst in case direct xmit path
     is used

   - bridge: switchdev: ensure MDB events are delivered exactly once

   - l2tp: pass correct message length to ip6_append_data

   - dccp/tcp: unhash sk from ehash for tb2 alloc failure after
     check_estalblished()

   - tls: fixes for record type handling with PEEK

   - devlink: fix possible use-after-free and memory leaks in
     devlink_init()

  Previous releases - always broken:

   - bpf: fix an oops when attempting to read the vsyscall page through
     bpf_probe_read_kernel

   - sched: act_mirred: use the backlog for mirred ingress

   - netfilter: nft_flow_offload: fix dst refcount underflow

   - ipv6: sr: fix possible use-after-free and null-ptr-deref

   - mptcp: fix several data races

   - phonet: take correct lock to peek at the RX queue

  Misc:

   - handful of fixes and reliability improvements for selftests"

* tag 'net-6.8.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (72 commits)
  l2tp: pass correct message length to ip6_append_data
  net: phy: realtek: Fix rtl8211f_config_init() for RTL8211F(D)(I)-VD-CG PHY
  selftests: ioam: refactoring to align with the fix
  Fix write to cloned skb in ipv6_hop_ioam()
  phonet/pep: fix racy skb_queue_empty() use
  phonet: take correct lock to peek at the RX queue
  net: sparx5: Add spinlock for frame transmission from CPU
  net/sched: flower: Add lock protection when remove filter handle
  devlink: fix port dump cmd type
  net: stmmac: Fix EST offset for dwmac 5.10
  tools: ynl: don't leak mcast_groups on init error
  tools: ynl: make sure we always pass yarg to mnl_cb_run
  net: mctp: put sock on tag allocation failure
  netfilter: nf_tables: use kzalloc for hook allocation
  netfilter: nf_tables: register hooks last when adding new chain/flowtable
  netfilter: nft_flow_offload: release dst in case direct xmit path is used
  netfilter: nft_flow_offload: reset dst in route object after setting up flow
  netfilter: nf_tables: set dormant flag on hook register failure
  selftests: tls: add test for peeking past a record of a different type
  selftests: tls: add test for merging of same-type control messages
  ...
2024-02-22 09:57:58 -08:00
Eric Farman
559a146290 KVM: s390: selftests: memop: add a simple AR test
There is a selftest that checks for an (expected) error when an
invalid AR is specified, but not one that exercises the AR path.

Add a simple test that mirrors the vanilla write/read test while
providing an AR. An AR that contains zero will direct the CPU to
use the primary address space normally used anyway. AR[1] is
selected for this test because the host AR[1] is usually non-zero,
and KVM needs to correctly swap those values.

Reviewed-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Link: https://lore.kernel.org/r/20240220211211.3102609-3-farman@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-22 16:06:56 +01:00
Muhammad Usama Anjum
510325e5ac selftests/iommu: fix the config fragment
The config fragment doesn't follow the correct format to enable those
config options which make the config options getting missed while
merging with other configs.

➜ merge_config.sh -m .config tools/testing/selftests/iommu/config
Using .config as base
Merging tools/testing/selftests/iommu/config
➜ make olddefconfig
.config:5295:warning: unexpected data: CONFIG_IOMMUFD
.config:5296:warning: unexpected data: CONFIG_IOMMUFD_TEST

While at it, add CONFIG_FAULT_INJECTION as well which is needed for
CONFIG_IOMMUFD_TEST. If CONFIG_FAULT_INJECTION isn't present in base
config (such as x86 defconfig), CONFIG_IOMMUFD_TEST doesn't get enabled.

Fixes: 57f0988706fe ("iommufd: Add a selftest")
Link: https://lore.kernel.org/r/20240222074934.71380-1-usama.anjum@collabora.com
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-02-22 09:02:05 -04:00
Nikolay Borisov
07a5d4bcbf x86/insn: Directly assign x86_64 state in insn_init()
No point in checking again as this was already done by the caller.

Signed-off-by: Nikolay Borisov <nik.borisov@suse.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240222111636.2214523-3-nik.borisov@suse.com
2024-02-22 12:23:27 +01:00
Nikolay Borisov
427e1646f1 x86/insn: Remove superfluous checks from instruction decoding routines
It's pointless checking if a particular part of an instruction is
decoded before calling the routine responsible for decoding it as this
check is duplicated in the routines itself. Streamline the code by
removing the superfluous checks. No functional difference.

Signed-off-by: Nikolay Borisov <nik.borisov@suse.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240222111636.2214523-2-nik.borisov@suse.com
2024-02-22 12:23:04 +01:00
Paolo Abeni
fdcd4467ba bpf-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZdaBCwAKCRDbK58LschI
 g3EhAP0d+S18mNabiEGz8efnE2yz3XcFchJgjiRS8WjOv75GvQEA6/sWncFjbc8k
 EqxPHmeJa19rWhQlFrmlyNQfLYGe4gY=
 =VkOs
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2024-02-22

The following pull-request contains BPF updates for your *net* tree.

We've added 11 non-merge commits during the last 24 day(s) which contain
a total of 15 files changed, 217 insertions(+), 17 deletions(-).

The main changes are:

1) Fix a syzkaller-triggered oops when attempting to read the vsyscall
   page through bpf_probe_read_kernel and friends, from Hou Tao.

2) Fix a kernel panic due to uninitialized iter position pointer in
   bpf_iter_task, from Yafang Shao.

3) Fix a race between bpf_timer_cancel_and_free and bpf_timer_cancel,
   from Martin KaFai Lau.

4) Fix a xsk warning in skb_add_rx_frag() (under CONFIG_DEBUG_NET)
   due to incorrect truesize accounting, from Sebastian Andrzej Siewior.

5) Fix a NULL pointer dereference in sk_psock_verdict_data_ready,
   from Shigeru Yoshida.

6) Fix a resolve_btfids warning when bpf_cpumask symbol cannot be
   resolved, from Hari Bathini.

bpf-for-netdev

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf, sockmap: Fix NULL pointer dereference in sk_psock_verdict_data_ready()
  selftests/bpf: Add negtive test cases for task iter
  bpf: Fix an issue due to uninitialized bpf_iter_task
  selftests/bpf: Test racing between bpf_timer_cancel_and_free and bpf_timer_cancel
  bpf: Fix racing between bpf_timer_cancel_and_free and bpf_timer_cancel
  selftest/bpf: Test the read of vsyscall page under x86-64
  x86/mm: Disallow vsyscall page read for copy_from_kernel_nofault()
  x86/mm: Move is_vsyscall_vaddr() into asm/vsyscall.h
  bpf, scripts: Correct GPL license name
  xsk: Add truesize to skb_add_rx_frag().
  bpf: Fix warning for bpf_cpumask in verifier
====================

Link: https://lore.kernel.org/r/20240221231826.1404-1-daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22 10:04:47 +01:00