43943 Commits

Author SHA1 Message Date
Linus Torvalds
b066935bf8 ARM:
* Address some fallout of the locking rework, this time affecting
   the way the vgic is configured
 
 * Fix an issue where the page table walker frees a subtree and
   then proceeds with walking what it has just freed...
 
 * Check that a given PA donated to the guest is actually memory
   (only affecting pKVM)
 
 * Correctly handle MTE CMOs by Set/Way
 
 * Fix the reported address of a watchpoint forwarded to userspace
 
 * Fix the freeing of the root of stage-2 page tables
 
 * Stop creating spurious PMU events to perform detection of the
   default PMU and use the existing PMU list instead.
 
 x86:
 
 * Fix a memslot lookup bug in the NX recovery thread that could
    theoretically let userspace bypass the NX hugepage mitigation
 
 * Fix a s/BLOCKING/PENDING bug in SVM's vNMI support
 
 * Account exit stats for fastpath VM-Exits that never leave the super
   tight run-loop
 
 * Fix an out-of-bounds bug in the optimized APIC map code, and add a
   regression test for the race.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmR7k1QUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroNblwf/faUVOBMv7mQBGsGa7FNcmaNhYeIT
 U1k4pFNlo7dNNuNJrGdpo+sOGP5A8CRLNSVvlyjgCHF1Qc9gVtXNvZ9PnA6nAYmB
 qqvUz/TDw9/NLTlJEkbSs05B4am4yfd5pV6R/32jrPIbXOW++6ae2LpILS/NPBrB
 y0tGiVUJrO3zVXdBKa4PFmlO8jsXPmMEiicEJa5v2Boeo5SFyFfErw9zDNwSMsQc
 27bzbs3O2daXTNMFnwVCCpWUxt1EqWYUXGvBjsChAUI0K10F2/GW9f6YeFsGXqKI
 d8g1QuCukSt/CvN0pT+g/540mR6i0Azpek1myQfuCu2IhQ1jCJaSWOjoEw==
 =8VrO
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "ARM:

   - Address some fallout of the locking rework, this time affecting the
     way the vgic is configured

   - Fix an issue where the page table walker frees a subtree and then
     proceeds with walking what it has just freed...

   - Check that a given PA donated to the guest is actually memory (only
     affecting pKVM)

   - Correctly handle MTE CMOs by Set/Way

   - Fix the reported address of a watchpoint forwarded to userspace

   - Fix the freeing of the root of stage-2 page tables

   - Stop creating spurious PMU events to perform detection of the
     default PMU and use the existing PMU list instead

  x86:

   - Fix a memslot lookup bug in the NX recovery thread that could
     theoretically let userspace bypass the NX hugepage mitigation

   - Fix a s/BLOCKING/PENDING bug in SVM's vNMI support

   - Account exit stats for fastpath VM-Exits that never leave the super
     tight run-loop

   - Fix an out-of-bounds bug in the optimized APIC map code, and add a
     regression test for the race"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: selftests: Add test for race in kvm_recalculate_apic_map()
  KVM: x86: Bail from kvm_recalculate_phys_map() if x2APIC ID is out-of-bounds
  KVM: x86: Account fastpath-only VM-Exits in vCPU stats
  KVM: SVM: vNMI pending bit is V_NMI_PENDING_MASK not V_NMI_BLOCKING_MASK
  KVM: x86/mmu: Grab memslot for correct address space in NX recovery worker
  KVM: arm64: Document default vPMU behavior on heterogeneous systems
  KVM: arm64: Iterate arm_pmus list to probe for default PMU
  KVM: arm64: Drop last page ref in kvm_pgtable_stage2_free_removed()
  KVM: arm64: Populate fault info for watchpoint
  KVM: arm64: Reload PTE after invoking walker callback on preorder traversal
  KVM: arm64: Handle trap of tagged Set/Way CMOs
  arm64: Add missing Set/Way CMO encodings
  KVM: arm64: Prevent unconditional donation of unmapped regions from the host
  KVM: arm64: vgic: Fix a comment
  KVM: arm64: vgic: Fix locking comment
  KVM: arm64: vgic: Wrap vgic_its_create() with config_lock
  KVM: arm64: vgic: Fix a circular locking issue
2023-06-04 07:16:53 -04:00
Sean Christopherson
4364b28798 KVM: x86: Bail from kvm_recalculate_phys_map() if x2APIC ID is out-of-bounds
Bail from kvm_recalculate_phys_map() and disable the optimized map if the
target vCPU's x2APIC ID is out-of-bounds, i.e. if the vCPU was added
and/or enabled its local APIC after the map was allocated.  This fixes an
out-of-bounds access bug in the !x2apic_format path where KVM would write
beyond the end of phys_map.

Check the x2APIC ID regardless of whether or not x2APIC is enabled,
as KVM's hardcodes x2APIC ID to be the vCPU ID, i.e. it can't change, and
the map allocation in kvm_recalculate_apic_map() doesn't check for x2APIC
being enabled, i.e. the check won't get false postivies.

Note, this also affects the x2apic_format path, which previously just
ignored the "x2apic_id > new->max_apic_id" case.  That too is arguably a
bug fix, as ignoring the vCPU meant that KVM would not send interrupts to
the vCPU until the next map recalculation.  In practice, that "bug" is
likely benign as a newly present vCPU/APIC would immediately trigger a
recalc.  But, there's no functional downside to disabling the map, and
a future patch will gracefully handle the -E2BIG case by retrying instead
of simply disabling the optimized map.

Opportunistically add a sanity check on the xAPIC ID size, along with a
comment explaining why the xAPIC ID is guaranteed to be "good".

Reported-by: Michal Luczaj <mhal@rbox.co>
Fixes: 5b84b0291702 ("KVM: x86: Honor architectural behavior for aliased 8-bit APIC IDs")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230602233250.1014316-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-06-02 17:20:50 -07:00
Sean Christopherson
8b703a49c9 KVM: x86: Account fastpath-only VM-Exits in vCPU stats
Increment vcpu->stat.exits when handling a fastpath VM-Exit without
going through any part of the "slow" path.  Not bumping the exits stat
can result in wildly misleading exit counts, e.g. if the primary reason
the guest is exiting is to program the TSC deadline timer.

Fixes: 404d5d7bff0d ("KVM: X86: Introduce more exit_fastpath_completion enum values")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230602011920.787844-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-06-02 16:37:49 -07:00
Maciej S. Szmigiero
b2ce899788 KVM: SVM: vNMI pending bit is V_NMI_PENDING_MASK not V_NMI_BLOCKING_MASK
While testing Hyper-V enabled Windows Server 2019 guests on Zen4 hardware
I noticed that with vCPU count large enough (> 16) they sometimes froze at
boot.
With vCPU count of 64 they never booted successfully - suggesting some kind
of a race condition.

Since adding "vnmi=0" module parameter made these guests boot successfully
it was clear that the problem is most likely (v)NMI-related.

Running kvm-unit-tests quickly showed failing NMI-related tests cases, like
"multiple nmi" and "pending nmi" from apic-split, x2apic and xapic tests
and the NMI parts of eventinj test.

The issue was that once one NMI was being serviced no other NMI was allowed
to be set pending (NMI limit = 0), which was traced to
svm_is_vnmi_pending() wrongly testing for the "NMI blocked" flag rather
than for the "NMI pending" flag.

Fix this by testing for the right flag in svm_is_vnmi_pending().
Once this is done, the NMI-related kvm-unit-tests pass successfully and
the Windows guest no longer freezes at boot.

Fixes: fa4c027a7956 ("KVM: x86: Add support for SVM's Virtual NMI")
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/be4ca192eb0c1e69a210db3009ca984e6a54ae69.1684495380.git.maciej.szmigiero@oracle.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-06-02 16:34:20 -07:00
Sean Christopherson
817fa99836 KVM: x86/mmu: Grab memslot for correct address space in NX recovery worker
Factor in the address space (non-SMM vs. SMM) of the target shadow page
when recovering potential NX huge pages, otherwise KVM will retrieve the
wrong memslot when zapping shadow pages that were created for SMM.  The
bug most visibly manifests as a WARN on the memslot being non-NULL, but
the worst case scenario is that KVM could unaccount the shadow page
without ensuring KVM won't install a huge page, i.e. if the non-SMM slot
is being dirty logged, but the SMM slot is not.

 ------------[ cut here ]------------
 WARNING: CPU: 1 PID: 3911 at arch/x86/kvm/mmu/mmu.c:7015
 kvm_nx_huge_page_recovery_worker+0x38c/0x3d0 [kvm]
 CPU: 1 PID: 3911 Comm: kvm-nx-lpage-re
 RIP: 0010:kvm_nx_huge_page_recovery_worker+0x38c/0x3d0 [kvm]
 RSP: 0018:ffff99b284f0be68 EFLAGS: 00010246
 RAX: 0000000000000000 RBX: ffff99b284edd000 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
 RBP: ffff9271397024e0 R08: 0000000000000000 R09: ffff927139702450
 R10: 0000000000000000 R11: 0000000000000001 R12: ffff99b284f0be98
 R13: 0000000000000000 R14: ffff9270991fcd80 R15: 0000000000000003
 FS:  0000000000000000(0000) GS:ffff927f9f640000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007f0aacad3ae0 CR3: 000000088fc2c005 CR4: 00000000003726e0
 Call Trace:
  <TASK>
__pfx_kvm_nx_huge_page_recovery_worker+0x10/0x10 [kvm]
  kvm_vm_worker_thread+0x106/0x1c0 [kvm]
  kthread+0xd9/0x100
  ret_from_fork+0x2c/0x50
  </TASK>
 ---[ end trace 0000000000000000 ]---

This bug was exposed by commit edbdb43fc96b ("KVM: x86: Preserve TDP MMU
roots until they are explicitly invalidated"), which allowed KVM to retain
SMM TDP MMU roots effectively indefinitely.  Before commit edbdb43fc96b,
KVM would zap all SMM TDP MMU roots and thus all SMM TDP MMU shadow pages
once all vCPUs exited SMM, which made the window where this bug (recovering
an SMM NX huge page) could be encountered quite tiny.  To hit the bug, the
NX recovery thread would have to run while at least one vCPU was in SMM.
Most VMs typically only use SMM during boot, and so the problematic shadow
pages were gone by the time the NX recovery thread ran.

Now that KVM preserves TDP MMU roots until they are explicitly invalidated
(e.g. by a memslot deletion), the window to trigger the bug is effectively
never closed because most VMMs don't delete memslots after boot (except
for a handful of special scenarios).

Fixes: eb298605705a ("KVM: x86/mmu: Do not recover dirty-tracked NX Huge Pages")
Reported-by: Fabio Coatti <fabio.coatti@gmail.com>
Closes: https://lore.kernel.org/all/CADpTngX9LESCdHVu_2mQkNGena_Ng2CphWNwsRGSMxzDsTjU2A@mail.gmail.com
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230602010137.784664-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-06-02 16:34:10 -07:00
Mike Christie
f9010dbdce fork, vhost: Use CLONE_THREAD to fix freezer/ps regression
When switching from kthreads to vhost_tasks two bugs were added:
1. The vhost worker tasks's now show up as processes so scripts doing
ps or ps a would not incorrectly detect the vhost task as another
process.  2. kthreads disabled freeze by setting PF_NOFREEZE, but
vhost tasks's didn't disable or add support for them.

To fix both bugs, this switches the vhost task to be thread in the
process that does the VHOST_SET_OWNER ioctl, and has vhost_worker call
get_signal to support SIGKILL/SIGSTOP and freeze signals. Note that
SIGKILL/STOP support is required because CLONE_THREAD requires
CLONE_SIGHAND which requires those 2 signals to be supported.

This is a modified version of the patch written by Mike Christie
<michael.christie@oracle.com> which was a modified version of patch
originally written by Linus.

Much of what depended upon PF_IO_WORKER now depends on PF_USER_WORKER.
Including ignoring signals, setting up the register state, and having
get_signal return instead of calling do_group_exit.

Tidied up the vhost_task abstraction so that the definition of
vhost_task only needs to be visible inside of vhost_task.c.  Making
it easier to review the code and tell what needs to be done where.
As part of this the main loop has been moved from vhost_worker into
vhost_task_fn.  vhost_worker now returns true if work was done.

The main loop has been updated to call get_signal which handles
SIGSTOP, freezing, and collects the message that tells the thread to
exit as part of process exit.  This collection clears
__fatal_signal_pending.  This collection is not guaranteed to
clear signal_pending() so clear that explicitly so the schedule()
sleeps.

For now the vhost thread continues to exist and run work until the
last file descriptor is closed and the release function is called as
part of freeing struct file.  To avoid hangs in the coredump
rendezvous and when killing threads in a multi-threaded exec.  The
coredump code and de_thread have been modified to ignore vhost threads.

Remvoing the special case for exec appears to require teaching
vhost_dev_flush how to directly complete transactions in case
the vhost thread is no longer running.

Removing the special case for coredump rendezvous requires either the
above fix needed for exec or moving the coredump rendezvous into
get_signal.

Fixes: 6e890c5d5021 ("vhost: use vhost_tasks for worker threads")
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Co-developed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-06-01 17:15:33 -04:00
Linus Torvalds
7a6c8e512f This push fixes an alignment crash in x86/aria.
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEn51F/lCuNhUwmDeSxycdCkmxi6cFAmRt4tIACgkQxycdCkmx
 i6fKpw/6A52EyoKBOjfVE+FKc40Fs/1ecmMUumN0kqnpPFKMHvJehnFsb6U0/gCS
 l6BI3CsFVogeICpBYAo1tjvrt3B4FO+zhBWME+il3aOzYMpdryXf3oLMMORqWlCl
 a+TA0LZH+NmySfYKy6rg1fYTQwKICyv25dp0aFC2n/Nj5SQxVtSVN2TFZ8uXNwcT
 ubjMcVUdmGXVls0KO7/ZbhRf9t/2VELzrS/WovhxUJxxXXW4Qhu1tI7X4UJPbnqc
 m14/dj6aZuaElst8B82BT3OSqZdT2rEbomfcnKOo7ePTp5tkJcpXmtM8eDO40AXn
 lzwZ4d9TIPiXalZWZ4ZL0htvbaC6QAvioSlcFmPPZJudSXYm4n31eimMYBmYlgLq
 uAm5wG4Tl84USumoekTNusZJXinDi0oNDXR7RV3gACy1TQJ17+5vgXKr+1/kNH2u
 clHoBIuon9oFAvxZ44iNyBDc4770D+gU+dNsuFCreBUT/8fvAtFWeptpGY08iptY
 c1hu5Tv3kK68JfEgxTgVWs+ObX1pLWO0bEPeQummd5O9jNXRLLPrSB7wOZrkqrpq
 nl4AchdGE9u9OmB5h0VvNj0tUjJqxr8h04a+Hkhu7NqkREYajGWPJkTZQjNzrTSP
 lu52zuvflim1wyWMozMbFPg1O0J78JqI8EQxs/nuFAjF77u2xgs=
 =JBjy
 -----END PGP SIGNATURE-----

Merge tag 'v6.4-p3' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto fix from Herbert Xu:
 "Fix an alignment crash in x86/aria"

* tag 'v6.4-p3' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: x86/aria - Use 16 byte alignment for GFNI constant vectors
2023-05-29 07:05:49 -04:00
Linus Torvalds
f8b2507c26 A single fix for x86:
- Prevent a bogus setting for the number of HT siblings, which is caused
    by the CPUID evaluation trainwreck of X86. That recomputes the value
    for each CPU, so the last CPU "wins". That can cause completely bogus
    sibling values.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmRzBxMTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYodfbEACp+oRBD2zIcmf8kpakMxIbH2CkDAxl
 AgqGHYVKAvsqKPQZhYmOFEsk+0isMzssuVaub9gMeLRmpMQxUqv7K20pluvWmuQm
 h7MTYEAvCZpBU2BGB1mixp57tQ/gpLSB4uaJU2Q5hh9po6uImvalIdqThf7ArxgA
 mm3BhtbFObLGwRaV2981zdsEs8Cjs9RusnAOrX5LbBBb6ibcnrqWy7DAhySMA0zr
 7RfaGyG/T9z6/BgX54FBQ7aDNU/RkZ8YYQoMj0kAFXdM3V+sa3oPfMRQpACPHKm4
 kwskJfWVdMq06kgOD7N9+WrOfBAgIsmzasT6VhTuGAGNo4CiEOthLqXDHwf4NLhN
 hj0XReHVyBiQ1KuI2lGNJ71c30KRkh7SCdGNiEFtnlpteXnS4bRnzWtfMf/+2SCC
 WOziRRYRzuAvvoTwoQS1BYJrDAB9f5jOX8FTYbC12rEk/RPOB6t4iFNLUzid7u1H
 yPYphoftXlQlli7EajVc5pSZCftMHRCzWernptiPFU5tMvyagYppcg25JY54rquC
 CTJQqoa/HPAEZhBjgZElwO9QBgu+VyJUq3R/Z+LbkQuz5z+PVDfqeDuDWhaqbcz1
 WGtdGdNo4bqVhwqgwyHjgKCFe05mF6tPuotnYZE/VSkXUCvLkjGEVg8HQ7WiPhJd
 IYDqxkIhX5h1Mg==
 =l43p
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2023-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cpu fix from Thomas Gleixner:
 "A single fix for x86:

   - Prevent a bogus setting for the number of HT siblings, which is
     caused by the CPUID evaluation trainwreck of X86. That recomputes
     the value for each CPU, so the last CPU "wins". That can cause
     completely bogus sibling values"

* tag 'x86-urgent-2023-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/topology: Fix erroneous smp_num_siblings on Intel Hybrid platforms
2023-05-28 07:42:05 -04:00
Linus Torvalds
2d5438f4c6 A small set of perf fixes:
- Make the CHA discovery based on MSR readout to work around broken
    discovery tables in some SPR firmwares.
 
  - Prevent saving PEBS configuration which has software bits set that
    cause a crash when restored into the relevant MSR.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmRzBlETHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYofUsEAC6VMmscn6MDmCpO71+o80IhFUeN/Pa
 /Qu4kKCQmeT/gonMG+bKCYSNobgdi4a7KNDRZypTjzMORTDmS3lT4aTbzy3ry+eI
 7rUTolAo3HiY0ZwPhU/evrGDf0O2d/19nd9pXiLZFinH9WsMf7mKsXhAJvlq6SKu
 SliGCQlB7+0bicdYgBkJqWPUbKY7QvcCcXiPx1Ujxg7CbMTcuj7MLzu/TVYmwZYH
 Je/k+vBnhl54PF9nGiqNYZGBPh4cBgkpHEFGFrvwplBRBdwHrjapNxhIRcA2N66u
 mr+f7Cl6StOP16Bn4dzG9nHLVTDz5NVgsbtPStVTcOoIzfOhch7mOL+tu/pafXdt
 zRL7U6usfbQqt1rcBBtPoeFlEifQwHoz3ZGFSyIZn5IRfXtXlDJHyPxlRYMB5/uz
 WjR10GzE0tZFLetvW4PdlMHcwwrOZNU+crvADmCk4KzzVDip9QLbomiGfxHVHI2J
 Fjstd6ZaNkuDHRy/r76YfwRDDzQWPvtsaWRLzD4aFQQxYdHHZOabBittK9UvENvu
 MjLRO+1ymN52Ap6TdcK9ZMNpb8ol/Xn5aYivNRlHsUjJ8RxJj94sElIlNHLI2yoa
 hICdMcGSCd7H+FIkXdgT2O8OBxEsf139xSrG2oSTOZFCjkR1BSxtyqNY47MFVSiA
 KBtbnJMkde48pg==
 =Z3GU
 -----END PGP SIGNATURE-----

Merge tag 'perf-urgent-2023-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Thomas Gleixner:
 "A small set of perf fixes:

   - Make the MSR-readout based CHA discovery work around broken
     discovery tables in some SPR firmwares.

   - Prevent saving PEBS configuration which has software bits set that
     cause a crash when restored into the relevant MSR"

* tag 'perf-urgent-2023-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/uncore: Correct the number of CHAs on SPR
  perf/x86/intel: Save/restore cpuc->active_pebs_data_cfg when using guest PEBS
2023-05-28 07:37:23 -04:00
Linus Torvalds
abbf7fa15b A set of unwinder and tooling fixes:
- Ensure that the stack pointer on x86 is aligned again so that the
     unwinder does not read past the end of the stack
 
   - Discard .note.gnu.property section which has a pointlessly different
     alignment than the other note sections. That confuses tooling of all
     sorts including readelf, libbpf and pahole.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmRzBW0THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoczHD/4vCopMPBFsT5w5HhSTQSX5Kglx3UQL
 g+qa0Z/uJKLpMgGkBzWBQGXHH/UqyY4sk5T3DDOkVQjdSfD+zmJu4R44wAonEU+y
 SDW2MeBciyp1WtjwPNWNY9QgDUQj7Cr2dSLTVqBR9akladTcj7EKHApq0pgaqsbi
 MnmfXzPyH17PRfyW8FbsBjdQvhpkYZSZntQ0cU+W5VtsUtZvg2w4I8IU76ri9L0E
 OYBbk2zV8RabGVJBv79fnm4fXb78+Zkp/xvs5sTBRKRS+3T5FNNJjo8tNp1AtxdF
 eJMlOeY7PjSFyAL6+/+/uRqYNCnH4Rw2MlMqJJIhzKYCw7BFo0FAhi+mro63Z85b
 byGuriv1g7suOgOCK8IzTl7e1nTDP4YCZ0cMDDvcRceIiqR6Rrk3C/B5cBVz0Bng
 iFYFUJSlLT4G+tGZwupV2UUGsCwS5+vVCG/FVWes8Mz4g4zr2Dmrh6YMLzrcDshE
 eAY2yKMypD/JD9cJa4KOVJuFARaSDJDiBpsO1BjRC6MxLFaw1biB0mE2hkwn3LJe
 fLVvU/sWnIQ+dQoy7+GBzm7Pi5SmKBuLDsgOt/ocbCwfUwUQMGf+I7bNg+08UOOS
 dNzM6/agdd8rLZTS/Ma7EvnA367ZwVOb5COVMqwxAoKZbPkvgz1xvRGkFQCHYapx
 BwMILF9FIM0kvg==
 =TPwY
 -----END PGP SIGNATURE-----

Merge tag 'objtool-urgent-2023-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull unwinder fixes from Thomas Gleixner:
 "A set of unwinder and tooling fixes:

   - Ensure that the stack pointer on x86 is aligned again so that the
     unwinder does not read past the end of the stack

   - Discard .note.gnu.property section which has a pointlessly
     different alignment than the other note sections. That confuses
     tooling of all sorts including readelf, libbpf and pahole"

* tag 'objtool-urgent-2023-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/show_trace_log_lvl: Ensure stack pointer is aligned, again
  vmlinux.lds.h: Discard .note.gnu.property section
2023-05-28 07:33:29 -04:00
Linus Torvalds
4e893b5aa4 xen: branch for v6.4-rc4
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCZHGVRAAKCRCAXGG7T9hj
 vqtJAQDizKasLE7tSnfs/FrZ/4xPaDLe3bQifMx2C1dtYCjRcAD/ciZSa1L0WzZP
 dpEZnlYRzsR3bwLktQEMQFOvlbh1SwE=
 =K860
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-6.4-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:

 - a double free fix in the Xen pvcalls backend driver

 - a fix for a regression causing the MSI related sysfs entries to not
   being created in Xen PV guests

 - a fix in the Xen blkfront driver for handling insane input data
   better

* tag 'for-linus-6.4-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  x86/pci/xen: populate MSI sysfs entries
  xen/pvcalls-back: fix double frees with pvcalls_new_active_socket()
  xen/blkfront: Only check REQ_FUA for writes
2023-05-27 09:42:56 -07:00
Linus Torvalds
47ee3f1dd9 x86: re-introduce support for ERMS copies for user space accesses
I tried to streamline our user memory copy code fairly aggressively in
commit adfcf4231b8c ("x86: don't use REP_GOOD or ERMS for user memory
copies"), in order to then be able to clean up the code and inline the
modern FSRM case in commit 577e6a7fd50d ("x86: inline the 'rep movs' in
user copies for the FSRM case").

We had reports [1] of that causing regressions earlier with blogbench,
but that turned out to be a horrible benchmark for that case, and not a
sufficient reason for re-instating "rep movsb" on older machines.

However, now Eric Dumazet reported [2] a regression in performance that
seems to be a rather more real benchmark, where due to the removal of
"rep movs" a TCP stream over a 100Gbps network no longer reaches line
speed.

And it turns out that with the simplified the calling convention for the
non-FSRM case in commit 427fda2c8a49 ("x86: improve on the non-rep
'copy_user' function"), re-introducing the ERMS case is actually fairly
simple.

Of course, that "fairly simple" is glossing over several missteps due to
having to fight our assembler alternative code.  This code really wanted
to rewrite a conditional branch to have two different targets, but that
made objtool sufficiently unhappy that this instead just ended up doing
a choice between "jump to the unrolled loop, or use 'rep movsb'
directly".

Let's see if somebody finds a case where the kernel memory copies also
care (see commit 68674f94ffc9: "x86: don't use REP_GOOD or ERMS for
small memory copies").  But Eric does argue that the user copies are
special because networking tries to copy up to 32KB at a time, if
order-3 pages allocations are possible.

In-kernel memory copies are typically small, unless they are the special
"copy pages at a time" kind that still use "rep movs".

Link: https://lore.kernel.org/lkml/202305041446.71d46724-yujie.liu@intel.com/ [1]
Link: https://lore.kernel.org/lkml/CANn89iKUbyrJ=r2+_kK+sb2ZSSHifFZ7QkPLDpAtkJ8v4WUumA@mail.gmail.com/ [2]
Reported-and-tested-by: Eric Dumazet <edumazet@google.com>
Fixes: adfcf4231b8c ("x86: don't use REP_GOOD or ERMS for user memory copies")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-05-26 12:34:20 -07:00
Zhang Rui
edc0a2b595 x86/topology: Fix erroneous smp_num_siblings on Intel Hybrid platforms
Traditionally, all CPUs in a system have identical numbers of SMT
siblings.  That changes with hybrid processors where some logical CPUs
have a sibling and others have none.

Today, the CPU boot code sets the global variable smp_num_siblings when
every CPU thread is brought up. The last thread to boot will overwrite
it with the number of siblings of *that* thread. That last thread to
boot will "win". If the thread is a Pcore, smp_num_siblings == 2.  If it
is an Ecore, smp_num_siblings == 1.

smp_num_siblings describes if the *system* supports SMT.  It should
specify the maximum number of SMT threads among all cores.

Ensure that smp_num_siblings represents the system-wide maximum number
of siblings by always increasing its value. Never allow it to decrease.

On MeteorLake-P platform, this fixes a problem that the Ecore CPUs are
not updated in any cpu sibling map because the system is treated as an
UP system when probing Ecore CPUs.

Below shows part of the CPU topology information before and after the
fix, for both Pcore and Ecore CPU (cpu0 is Pcore, cpu 12 is Ecore).
...
-/sys/devices/system/cpu/cpu0/topology/package_cpus:000fff
-/sys/devices/system/cpu/cpu0/topology/package_cpus_list:0-11
+/sys/devices/system/cpu/cpu0/topology/package_cpus:3fffff
+/sys/devices/system/cpu/cpu0/topology/package_cpus_list:0-21
...
-/sys/devices/system/cpu/cpu12/topology/package_cpus:001000
-/sys/devices/system/cpu/cpu12/topology/package_cpus_list:12
+/sys/devices/system/cpu/cpu12/topology/package_cpus:3fffff
+/sys/devices/system/cpu/cpu12/topology/package_cpus_list:0-21

Notice that the "before" 'package_cpus_list' has only one CPU.  This
means that userspace tools like lscpu will see a little laptop like
an 11-socket system:

-Core(s) per socket:  1
-Socket(s):           11
+Core(s) per socket:  16
+Socket(s):           1

This is also expected to make the scheduler do rather wonky things
too.

[ dhansen: remove CPUID detail from changelog, add end user effects ]

CC: stable@kernel.org
Fixes: bbb65d2d365e ("x86: use cpuid vector 0xb when available for detecting cpu topology")
Fixes: 95f3d39ccf7a ("x86/cpu/topology: Provide detect_extended_topology_early()")
Suggested-by: Len Brown <len.brown@intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/20230323015640.27906-1-rui.zhang%40intel.com
2023-05-25 10:48:42 -07:00
Kan Liang
38776cc45e perf/x86/uncore: Correct the number of CHAs on SPR
The number of CHAs from the discovery table on some SPR variants is
incorrect, because of a firmware issue. An accurate number can be read
from the MSR UNC_CBO_CONFIG.

Fixes: 949b11381f81 ("perf/x86/intel/uncore: Add Sapphire Rapids server CHA support")
Reported-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Stephane Eranian <eranian@google.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230508140206.283708-1-kan.liang@linux.intel.com
2023-05-24 22:19:41 +02:00
Maximilian Heyne
335b422346 x86/pci/xen: populate MSI sysfs entries
Commit bf5e758f02fc ("genirq/msi: Simplify sysfs handling") reworked the
creation of sysfs entries for MSI IRQs. The creation used to be in
msi_domain_alloc_irqs_descs_locked after calling ops->domain_alloc_irqs.
Then it moved into __msi_domain_alloc_irqs which is an implementation of
domain_alloc_irqs. However, Xen comes with the only other implementation
of domain_alloc_irqs and hence doesn't run the sysfs population code
anymore.

Commit 6c796996ee70 ("x86/pci/xen: Fixup fallout from the PCI/MSI
overhaul") set the flag MSI_FLAG_DEV_SYSFS for the xen msi_domain_info
but that doesn't actually have an effect because Xen uses it's own
domain_alloc_irqs implementation.

Fix this by making use of the fallback functions for sysfs population.

Fixes: bf5e758f02fc ("genirq/msi: Simplify sysfs handling")
Signed-off-by: Maximilian Heyne <mheyne@amazon.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20230503131656.15928-1-mheyne@amazon.de
Signed-off-by: Juergen Gross <jgross@suse.com>
2023-05-24 18:08:49 +02:00
Ard Biesheuvel
6ab39f9992 crypto: x86/aria - Use 16 byte alignment for GFNI constant vectors
The GFNI routines in the AVX version of the ARIA implementation now use
explicit VMOVDQA instructions to load the constant input vectors, which
means they must be 16 byte aligned. So ensure that this is the case, by
dropping the section split and the incorrect .align 8 directive, and
emitting the constants into the 16-byte aligned section instead.

Note that the AVX2 version of this code deviates from this pattern, and
does not require a similar fix, given that it loads these contants as
8-byte memory operands, for which AVX2 permits any alignment.

Cc: Taehee Yoo <ap420073@gmail.com>
Fixes: 8b84475318641c2b ("crypto: x86/aria-avx - Do not use avx2 instructions")
Reported-by: syzbot+a6abcf08bad8b18fd198@syzkaller.appspotmail.com
Tested-by: syzbot+a6abcf08bad8b18fd198@syzkaller.appspotmail.com
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-05-24 18:10:27 +08:00
Like Xu
3c845304d2 perf/x86/intel: Save/restore cpuc->active_pebs_data_cfg when using guest PEBS
After commit b752ea0c28e3 ("perf/x86/intel/ds: Flush PEBS DS when changing
PEBS_DATA_CFG"), the cpuc->pebs_data_cfg may save some bits that are not
supported by real hardware, such as PEBS_UPDATE_DS_SW. This would cause
the VMX hardware MSR switching mechanism to save/restore invalid values
for PEBS_DATA_CFG MSR, thus crashing the host when PEBS is used for guest.
Fix it by using the active host value from cpuc->active_pebs_data_cfg.

Fixes: b752ea0c28e3 ("perf/x86/intel/ds: Flush PEBS DS when changing PEBS_DATA_CFG")
Signed-off-by: Like Xu <likexu@tencent.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20230517133808.67885-1-likexu@tencent.com
2023-05-23 10:01:13 +02:00
Linus Torvalds
ae8373a5ad Work around a TLB invalidation issue in recent hybrid CPUs
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmRr6MMACgkQaDWVMHDJ
 krAaIA/+IIV2rijvxzlnN41RksfiuI8lmDXuOeAqFCFVAukm6/faiDHmXQC/4ipw
 JTdgq4JbB6wTZmYsijSMdwgWDt3BGXZAAXK7sSOXLNRkaf0EuPk6h59P7QXBAkRD
 A6ibjWPipV6/k/Mczm8o+WnTSHUln6CVz8yB13gF3eB6+QMO7yYznL4EJ4iUBdZC
 OWCukkI8zT07p1u7nqpOmTixXZEqSauC3qB9tg856w8UJoS1evOocgExZr2d8GT3
 Ex9Q+Hj8ULkM6tDeke9mfoDDEdHtHDkQIMH5L63siTc0LzvRIs/EZEnZGdZWNy9F
 zw6AaIShzJQ9Z9KTP/SRxQ3nvUh/UuI2z7KnZLLxHYqLZZVMNQtinK+ZDGlzvvqg
 xr3ZI2PXUXoQT8Mg0BzzpeLwuuSXS8DFyj63SErValRuyIQQs07MhIRFiU3abQtq
 A4nfXMzfCwfRn5ggm9HIPLKOtiw1ZlsQtDxmHULpXsUqGgBxEfmCF5f210TmkOhN
 yrNlDmNBr39j1xlUeFlJRXWm4badlo39G8s/70oqFTushvatfet7Gyt7fZseGhJL
 KNQCfxyCu8bYYmtYfuW6WTc8nq5sJmnTdWEjcT0ftbgDeYNtBjfKfhJ+8+q04FWl
 wBxu5nqfKMT1bAKIOLse9eVbaMXNuHJJkStQZAZP5/wUMoVvfHI=
 =TZgh
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fix from Dave Hansen:
 "This works around and issue where the INVLPG instruction may miss
  invalidating kernel TLB entries in recent hybrid CPUs.

  I do expect an eventual microcode fix for this. When the microcode
  version numbers are known, we can circle back around and add them the
  model table to disable this workaround"

* tag 'x86_urgent_for_6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Avoid incomplete Global INVLPG flushes
2023-05-22 16:31:28 -07:00
Linus Torvalds
a35747c310 ARM:
* Plug a race in the stage-2 mapping code where the IPA and the PA
   would end up being out of sync
 
 * Make better use of the bitmap API (bitmap_zero, bitmap_zalloc...)
 
 * FP/SVE/SME documentation update, in the hope that this field
   becomes clearer...
 
 * Add workaround for Apple SEIS brokenness to a new SoC
 
 * Random comment fixes
 
 x86:
 
 * add MSR_IA32_TSX_CTRL into msrs_to_save
 
 * fixes for XCR0 handling in SGX enclaves
 
 Generic:
 
 * Fix vcpu_array[0] races
 
 * Fix race between starting a VM and "reboot -f"
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmRp0WIUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroPqVwf+OFayNPpURAFqfrOuISYW7hoCL24+
 sCtXyVv4Ei0np1vGekit2h/m8GmxO12xEBibcFeYj+YQItIqu9HvC08fRxAKaMeE
 N3p9iLuS1zcM3cEuZpg0r6QN+pKybttdadl70yho43CtagEM4FmB7dgyAo9AhyXk
 pZUaVfoO6beBQ/J6A6V/Q5xlue1LvHk1+K4rmNcYVTYn6ZOd+yYgvqng1nv5/h9b
 0HgW0aUWkEHAB67/sSnUUro707loMNTowsZlMCtgDk4Fzf8RwQ7qc8lClLk1UPjJ
 DHB6Hif9F0Q5mkrwn+c7xyVlKARaY6/FOshS2Q620q19+4fq5fUD9HgjrQ==
 =ARzH
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "ARM:

   - Plug a race in the stage-2 mapping code where the IPA and the PA
     would end up being out of sync

   - Make better use of the bitmap API (bitmap_zero, bitmap_zalloc...)

   - FP/SVE/SME documentation update, in the hope that this field
     becomes clearer...

   - Add workaround for Apple SEIS brokenness to a new SoC

   - Random comment fixes

  x86:

   - add MSR_IA32_TSX_CTRL into msrs_to_save

   - fixes for XCR0 handling in SGX enclaves

  Generic:

   - Fix vcpu_array[0] races

   - Fix race between starting a VM and 'reboot -f'"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: VMX: add MSR_IA32_TSX_CTRL into msrs_to_save
  KVM: x86: Don't adjust guest's CPUID.0x12.1 (allowed SGX enclave XFRM)
  KVM: VMX: Don't rely _only_ on CPUID to enforce XCR0 restrictions for ECREATE
  KVM: Fix vcpu_array[0] races
  KVM: VMX: Fix header file dependency of asm/vmx.h
  KVM: Don't enable hardware after a restart/shutdown is initiated
  KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown
  KVM: arm64: vgic: Add Apple M2 PRO/MAX cpus to the list of broken SEIS implementations
  KVM: arm64: Clarify host SME state management
  KVM: arm64: Restructure check for SVE support in FP trap handler
  KVM: arm64: Document check for TIF_FOREIGN_FPSTATE
  KVM: arm64: Fix repeated words in comments
  KVM: arm64: Constify start/end/phys fields of the pgtable walker data
  KVM: arm64: Infer PA offset from VA in hyp map walker
  KVM: arm64: Infer the PA offset from IPA in stage-2 map walker
  KVM: arm64: Use the bitmap API to allocate bitmaps
  KVM: arm64: Slightly optimize flush_context()
2023-05-21 13:58:37 -07:00
Mingwei Zhang
b9846a698c KVM: VMX: add MSR_IA32_TSX_CTRL into msrs_to_save
Add MSR_IA32_TSX_CTRL into msrs_to_save[] to explicitly tell userspace to
save/restore the register value during migration. Missing this may cause
userspace that relies on KVM ioctl(KVM_GET_MSR_INDEX_LIST) fail to port the
value to the target VM.

In addition, there is no need to add MSR_IA32_TSX_CTRL when
ARCH_CAP_TSX_CTRL_MSR is not supported in kvm_get_arch_capabilities(). So
add the checking in kvm_probe_msr_to_save().

Fixes: c11f83e0626b ("KVM: vmx: implement MSR_IA32_TSX_CTRL disable RTM functionality")
Reported-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Message-Id: <20230509032348.1153070-1-mizhang@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-05-21 04:05:51 -04:00
Sean Christopherson
275a87244e KVM: x86: Don't adjust guest's CPUID.0x12.1 (allowed SGX enclave XFRM)
Drop KVM's manipulation of guest's CPUID.0x12.1 ECX and EDX, i.e. the
allowed XFRM of SGX enclaves, now that KVM explicitly checks the guest's
allowed XCR0 when emulating ECREATE.

Note, this could theoretically break a setup where userspace advertises
a "bad" XFRM and relies on KVM to provide a sane CPUID model, but QEMU
is the only known user of KVM SGX, and QEMU explicitly sets the SGX CPUID
XFRM subleaf based on the guest's XCR0.

Reviewed-by: Kai Huang <kai.huang@intel.com>
Tested-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230503160838.3412617-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-05-21 04:05:51 -04:00
Sean Christopherson
ad45413d22 KVM: VMX: Don't rely _only_ on CPUID to enforce XCR0 restrictions for ECREATE
Explicitly check the vCPU's supported XCR0 when determining whether or not
the XFRM for ECREATE is valid.  Checking CPUID works because KVM updates
guest CPUID.0x12.1 to restrict the leaf to a subset of the guest's allowed
XCR0, but that is rather subtle and KVM should not modify guest CPUID
except for modeling true runtime behavior (allowed XFRM is most definitely
not "runtime" behavior).

Reviewed-by: Kai Huang <kai.huang@intel.com>
Tested-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230503160838.3412617-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-05-21 04:05:51 -04:00
Jacob Xu
3367eeab97 KVM: VMX: Fix header file dependency of asm/vmx.h
Include a definition of WARN_ON_ONCE() before using it.

Fixes: bb1fcc70d98f ("KVM: nVMX: Allow L1 to use 5-level page walks for nested EPT")
Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: Jacob Xu <jacobhxu@google.com>
[reworded commit message; changed <asm/bug.h> to <linux/bug.h>]
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220225012959.1554168-1-jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-05-19 13:56:25 -04:00
Linus Torvalds
2d1bcbc6cd Probes fixes for 6.4-rc1:
- Initialize 'ret' local variables on fprobe_handler() to fix the smatch
   warning. With this, fprobe function exit handler is not working
   randomly.
 
 - Fix to use preempt_enable/disable_notrace for rethook handler to
   prevent recursive call of fprobe exit handler (which is based on
   rethook)
 
 - Fix recursive call issue on fprobe_kprobe_handler().
 
 - Fix to detect recursive call on fprobe_exit_handler().
 
 - Fix to make all arch-dependent rethook code notrace.
   (the arch-independent code is already notrace)
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEh7BulGwFlgAOi5DV2/sHvwUrPxsFAmRmKgQACgkQ2/sHvwUr
 PxvlCgf+OJk5O9IJlTgqDV6JNPsTzFS7qqyAyQmZW9Bj8STfWAIRxa0zeGbZE58K
 5LwgzAj+SqzYRwIvzzZ3xsA5j7f1Wj7wG0TQgmpnIW+hprwDrLsUhoZ5s1D/Ojel
 A4rAnqCrgnh5m5SenU2QCUngGKn004j4RASaZvRELDyvyIkBSqNhswCH8ZWGPror
 KuCu5AmEnFagYl0lmNL3H2aCITAg3QEK+fE6iR+lYsqfR3xbs4YAcqiylHBdY0wX
 ssK7LVdRmv7O6TxSj4P2ohDvLJP3eL9bVirsJpg0OVbqWJCs65T2rJJjXiKojYXf
 vSVWFJFK5oV98ZHfXTG9R7x0DEwc+g==
 =jO68
 -----END PGP SIGNATURE-----

Merge tag 'probes-fixes-v6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull probes fixes from Masami Hiramatsu:

 - Initialize 'ret' local variables on fprobe_handler() to fix the
   smatch warning. With this, fprobe function exit handler is not
   working randomly.

 - Fix to use preempt_enable/disable_notrace for rethook handler to
   prevent recursive call of fprobe exit handler (which is based on
   rethook)

 - Fix recursive call issue on fprobe_kprobe_handler()

 - Fix to detect recursive call on fprobe_exit_handler()

 - Fix to make all arch-dependent rethook code notrace (the
   arch-independent code is already notrace)"

* tag 'probes-fixes-v6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  rethook, fprobe: do not trace rethook related functions
  fprobe: add recursion detection in fprobe_exit_handler
  fprobe: make fprobe_kprobe_handler recursion free
  rethook: use preempt_{disable, enable}_notrace in rethook_trampoline_handler
  tracing: fprobe: Initialize ret valiable to fix smatch error
2023-05-18 09:04:45 -07:00
Ze Gao
571a2a50a8 rethook, fprobe: do not trace rethook related functions
These functions are already marked as NOKPROBE to prevent recursion and
we have the same reason to blacklist them if rethook is used with fprobe,
since they are beyond the recursion-free region ftrace can guard.

Link: https://lore.kernel.org/all/20230517034510.15639-5-zegao@tencent.com/

Fixes: f3a112c0c40d ("x86,rethook,kprobes: Replace kretprobe with rethook on x86")
Signed-off-by: Ze Gao <zegao@tencent.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2023-05-18 07:08:01 +09:00
Dave Hansen
ce0b15d11a x86/mm: Avoid incomplete Global INVLPG flushes
The INVLPG instruction is used to invalidate TLB entries for a
specified virtual address.  When PCIDs are enabled, INVLPG is supposed
to invalidate TLB entries for the specified address for both the
current PCID *and* Global entries.  (Note: Only kernel mappings set
Global=1.)

Unfortunately, some INVLPG implementations can leave Global
translations unflushed when PCIDs are enabled.

As a workaround, never enable PCIDs on affected processors.

I expect there to eventually be microcode mitigations to replace this
software workaround.  However, the exact version numbers where that
will happen are not known today.  Once the version numbers are set in
stone, the processor list can be tweaked to only disable PCIDs on
affected processors with affected microcode.

Note: if anyone wants a quick fix that doesn't require patching, just
stick 'nopcid' on your kernel command-line.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
2023-05-17 08:55:02 -07:00
Vernon Lovejoy
2e4be0d011 x86/show_trace_log_lvl: Ensure stack pointer is aligned, again
The commit e335bb51cc15 ("x86/unwind: Ensure stack pointer is aligned")
tried to align the stack pointer in show_trace_log_lvl(), otherwise the
"stack < stack_info.end" check can't guarantee that the last read does
not go past the end of the stack.

However, we have the same problem with the initial value of the stack
pointer, it can also be unaligned. So without this patch this trivial
kernel module

	#include <linux/module.h>

	static int init(void)
	{
		asm volatile("sub    $0x4,%rsp");
		dump_stack();
		asm volatile("add    $0x4,%rsp");

		return -EAGAIN;
	}

	module_init(init);
	MODULE_LICENSE("GPL");

crashes the kernel.

Fixes: e335bb51cc15 ("x86/unwind: Ensure stack pointer is aligned")
Signed-off-by: Vernon Lovejoy <vlovejoy@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20230512104232.GA10227@redhat.com
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
2023-05-16 06:31:04 -07:00
Linus Torvalds
ef21831c2e - Make sure the PEBS buffer is flushed before reprogramming the hardware
so that the correct record sizes are used
 
 - Update the sample size for AMD BRS events
 
 - Fix a confusion with using the same on-stack struct with different
   events in the event processing path
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmRgzWIACgkQEsHwGGHe
 VUpdgw//a1toWyjwrIV1YMu8lEpsrPKpOqIFuDQcLSl1vsYrmTRJ47PI1j/ZTQeo
 HgNkEE6lxAa9h/lKAjlE/lACE6Hr59xnQmu0BdG/SS+hlhWkT+oKLEUWz5qD4MuE
 bWdpxwHOhMIFR1ASAMThy/mE9V4TKsI/tsd7lMXUo6/skDGCmCGIgRq//3NUB5fV
 0ivp5lv6NXFnUwS34Ot3fbWj/be7rr2vkYgN8WbwMAaEbpCIyseh6Tz+5ZRbENfP
 dMdh6ryuJ2BJ9BcDe9XlcEvPcaTvz7LVnzOVFz/AnBgtBTIOw/26xt17pgXBH7NK
 kpTKQTPp0mnt6ysnX5zYkeumKaxxqvVWaf18AQHkupj1HwggjiEFPnKK9KfslSy4
 1tcED/D3i5QLOx+A8lCtA4ACwGl0Cvwgvw98Gp9imLst/zmMKa4MK96BYCodirKJ
 iDKN5aFA6c3pKJ4KTE7N6KKFzwhslTrehTHAJIL7BiVw3aMGin6514OnMELZBzam
 /zud81OWAKywWWRSwg7wy+K8RGH0R6K5dhwFrrm2BMqAluMq+rX1pRY9pEsL6jDj
 bCl45L52IsXZBSz2JTwWHGTssPyeDIe157ICFDOBnIx08u4KzJ+Knxsbaq2Jjs3R
 9wm5H9yp/+q7//3XcEkdFjQwDVh2LJkY0QinH+6rPiAseBC9ukU=
 =OCba
 -----END PGP SIGNATURE-----

Merge tag 'perf_urgent_for_v6.4_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Borislav Petkov:

 - Make sure the PEBS buffer is flushed before reprogramming the
   hardware so that the correct record sizes are used

 - Update the sample size for AMD BRS events

 - Fix a confusion with using the same on-stack struct with different
   events in the event processing path

* tag 'perf_urgent_for_v6.4_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel/ds: Flush PEBS DS when changing PEBS_DATA_CFG
  perf/x86: Fix missing sample size update on AMD BRS
  perf/core: Fix perf_sample_data not properly initialized for different swevents in perf_tp_event()
2023-05-14 07:56:51 -07:00
Linus Torvalds
011e33ee48 - Add the required PCI IDs so that the generic SMN accesses provided by
amd_nb.c work for drivers which switch to them. Add a PCI device ID
   to k10temp's table so that latter is loaded on such systems too
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmRgx6IACgkQEsHwGGHe
 VUpzJQ//WL//vtzAyQyEQQ0HP5cUNwtts79rEWpNyHJD0nFymbOrR7ho97HTi7pu
 a+wc3p9gwTC626GJD9dFVZs9YPqXI/Q2BKndQ6vrct9eaVnxWiJ45B5yJYLCUppE
 AHAChrDZ8ErXQjV1jeHtkIZY1mSa5Q7slfn8KFLzY5gIWA/3ds2pmIDK4r2wqQUR
 xIn+kDaaprsu8vhffWtHKIj5n1zRkLwPNZZ/2rZbQ7NJHt0yqu4Ld/L6myC6cQVj
 cX1Z1Wg++T3rZHxChx69w/jYuCY2EeM5JHXEbQqUG1tnfkpInZqvyjz1FUs9iYOY
 NOv9hcod9jIRjGvMooKn7MDhVy3yCtyrR590tKVP4qj3Wb+wKKJtPECjMfCo6I0H
 nVIiQPMZAbbo9UVLVA9UickSxKUgNCvhANUcnYWEQp+DM8wYxbV35PT8lomNz6Cj
 3mKo/uqQPfGn3yUR1GIQagYSxHZJ1NTv50lPQOV0tZtfP2Gk/H4uT2w6ZFJKiJIV
 KTiE5vFW9VhbVye/VcHWyJ3+mB5SgClaA+t87zGpgJiPjfVWo2bnB6JkwZodEWi9
 V8c53Hq+NG78QnFYZR5WNDYQ96LYWfurhcuqol8Sr836Y9LaOCGEAIsPMvG7esB2
 y/GHRWdhkjOf5uNHduGoOVzlH95qW9mBtdFacVpaM/eocb1Zghk=
 =dn2p
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_v6.4_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fix from Borislav Petkov:

 - Add the required PCI IDs so that the generic SMN accesses provided by
   amd_nb.c work for drivers which switch to them. Add a PCI device ID
   to k10temp's table so that latter is loaded on such systems too

* tag 'x86_urgent_for_v6.4_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  hwmon: (k10temp) Add PCI ID for family 19, model 78h
  x86/amd_nb: Add PCI ID for family 19h model 78h
2023-05-14 07:44:48 -07:00
Borislav Petkov (AMD)
9a48d60467 x86/retbleed: Fix return thunk alignment
SYM_FUNC_START_LOCAL_NOALIGN() adds an endbr leading to this layout
(leaving only the last 2 bytes of the address):

  3bff <zen_untrain_ret>:
  3bff:       f3 0f 1e fa             endbr64
  3c03:       f6                      test   $0xcc,%bl

  3c04 <__x86_return_thunk>:
  3c04:       c3                      ret
  3c05:       cc                      int3
  3c06:       0f ae e8                lfence

However, "the RET at __x86_return_thunk must be on a 64 byte boundary,
for alignment within the BTB."

Use SYM_START instead.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-05-12 17:19:53 -05:00
Mario Limonciello
23a5b8bb02 x86/amd_nb: Add PCI ID for family 19h model 78h
Commit

  310e782a99c7 ("platform/x86/amd: pmc: Utilize SMN index 0 for driver probe")

switched to using amd_smn_read() which relies upon the misc PCI ID used
by DF function 3 being included in a table.  The ID for model 78h is
missing in that table, so amd_smn_read() doesn't work.

Add the missing ID into amd_nb, restoring s2idle on this system.

  [ bp: Simplify commit message. ]

Fixes: 310e782a99c7 ("platform/x86/amd: pmc: Utilize SMN index 0 for driver probe")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>  # pci_ids.h
Acked-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20230427053338.16653-2-mario.limonciello@amd.com
2023-05-08 11:25:19 +02:00
Kan Liang
b752ea0c28 perf/x86/intel/ds: Flush PEBS DS when changing PEBS_DATA_CFG
Several similar kernel warnings can be triggered,

  [56605.607840] CPU0 PEBS record size 0, expected 32, config 0 cpuc->record_size=208

when the below commands are running in parallel for a while on SPR.

  while true;
  do
	perf record --no-buildid -a --intr-regs=AX  \
		    -e cpu/event=0xd0,umask=0x81/pp \
		    -c 10003 -o /dev/null ./triad;
  done &

  while true;
  do
	perf record -o /tmp/out -W -d \
		    -e '{ld_blocks.store_forward:period=1000000, \
                         MEM_TRANS_RETIRED.LOAD_LATENCY:u:precise=2:ldlat=4}' \
		    -c 1037 ./triad;
  done

The triad program is just the generation of loads/stores.

The warnings are triggered when an unexpected PEBS record (with a
different config and size) is found.

A system-wide PEBS event with the large PEBS config may be enabled
during a context switch. Some PEBS records for the system-wide PEBS
may be generated while the old task is sched out but the new one
hasn't been sched in yet. When the new task is sched in, the
cpuc->pebs_record_size may be updated for the per-task PEBS events. So
the existing system-wide PEBS records have a different size from the
later PEBS records.

The PEBS buffer should be flushed right before the hardware is
reprogrammed. The new size and threshold should be updated after the
old buffer has been flushed.

Reported-by: Stephane Eranian <eranian@google.com>
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20230421184529.3320912-1-kan.liang@linux.intel.com
2023-05-08 10:58:27 +02:00
Namhyung Kim
90befef5a9 perf/x86: Fix missing sample size update on AMD BRS
It missed to convert a PERF_SAMPLE_BRANCH_STACK user to call the new
perf_sample_save_brstack() helper in order to update the dyn_size.
This affects AMD Zen3 machines with the branch-brs event.

Fixes: eb55b455ef9c ("perf/core: Add perf_sample_save_brstack() helper")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20230427030527.580841-1-namhyung@kernel.org
2023-05-08 10:58:26 +02:00
Linus Torvalds
b115d85a95 Locking changes in v6.4:
- Introduce local{,64}_try_cmpxchg() - a slightly more optimal
    primitive, which will be used in perf events ring-buffer code.
 
  - Simplify/modify rwsems on PREEMPT_RT, to address writer starvation.
 
  - Misc cleanups/fixes.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmRUvUoRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1hlIhAArP33rTKi+HAndQ3UHW3XtmHRxEEQTfiE
 wvIoN89h58QW4DGMeAV4ltafbIPQAkI233Aogwz903L0qbDV0Ro4OU3XJembRuWl
 LeOADKwYyypXdOa8XICuY9aIP7e1/h0DF3ySs7inLcwK9JCyAIxnsVHYej+hsRXA
 kZoXN98T3TR1C0V9UQy4SU3HI1lC3tsG3R9Ti9TnYUg3ygVXhRE9lOQ4kv9lFPVz
 BNuj2Blj7KNiVaY9kehrhO54THI7NmsCVZO44Rcl48I0KAcFulAmFcNlE7GnR8Nj
 thj38pU6XAFVHXG8MYjgE+Al+PnK48NtJxexCtHyGvGG4D2aLzRMnkolxAUCcVuK
 G+UBsQm3ybjYgHgt1zuN6ehcpT+5tULkDH8JA7vrgZYaVgxHzsUaHgYfCCWKnmUY
 mPR6aImEmYZwZVNLskhe0HT4mq244bp+VnWlnJ6LZK7t/itenvDhqnj7KTi4Bfej
 lTHplOTitV/8uCEW8V4pX+YTEenVsIQmTc/G3iIabXP/6HzLffA3q4vyW6vKIErE
 pqrpuFA0Z4GB+pU0mJXt7+I7zscDVthwI055jDyQBjA7IcdVGm2MjQ6xcNRW5FYN
 UynvaEMocue4ZO4WdFsd1ZBUd9VfoNzGQspBw46DhCL1MEQBYv36SKQNjej/9aRr
 ilVwqnOWI2s=
 =mM0A
 -----END PGP SIGNATURE-----

Merge tag 'locking-core-2023-05-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking updates from Ingo Molnar:

 - Introduce local{,64}_try_cmpxchg() - a slightly more optimal
   primitive, which will be used in perf events ring-buffer code

 - Simplify/modify rwsems on PREEMPT_RT, to address writer starvation

 - Misc cleanups/fixes

* tag 'locking-core-2023-05-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/atomic: Correct (cmp)xchg() instrumentation
  locking/x86: Define arch_try_cmpxchg_local()
  locking/arch: Wire up local_try_cmpxchg()
  locking/generic: Wire up local{,64}_try_cmpxchg()
  locking/atomic: Add generic try_cmpxchg{,64}_local() support
  locking/rwbase: Mitigate indefinite writer starvation
  locking/arch: Rename all internal __xchg() names to __arch_xchg()
2023-05-05 12:56:55 -07:00
Linus Torvalds
d5ed10bb80 Merge branch 'x86-uaccess-cleanup': x86 uaccess header cleanups
Merge my x86 uaccess updates branch.

The LAM ("Linear Address Masking") updates in this release made me
unhappy about how "access_ok()" was done, and it actually turned out to
have a couple of small bugs in it too.  This is my cleanup of the code:

 - use the sign bit of the __user pointer rather than masking the
   address and checking it against the TASK_SIZE range.

   We already did this part for the get/put_user() side, but
   'access_ok()' did the naïve "mask and range check" thing, which not
   only generates nasty code, but also ended up meaning that __access_ok
   itself didn't do a good job, and so copy_from_user_nmi() didn't get
   the check right.

 - move all the code that is 64-bit only into the 64-bit version of the
   header file, so that we don't unnecessarily pollute the shared x86
   code and make it look like LAM might work in 32-bit too.

 - fix a bug in the address masking (that doesn't end up mattering: in
   this case the fix was to just remove the buggy code entirely).

 - a couple of trivial cleanups and added commentary about the
   access_ok() rules.

* x86-uaccess-cleanup:
  x86-64: mm: clarify the 'positive addresses' user address rules
  x86: mm: remove 'sign' games from LAM untagged_addr*() macros
  x86: uaccess: move 32-bit and 64-bit parts into proper <asm/uaccess_N.h> header
  x86: mm: remove architecture-specific 'access_ok()' define
  x86-64: make access_ok() independent of LAM
2023-05-05 12:29:57 -07:00
Paolo Bonzini
29b38e7650 Fix a long-standing flaw in x86's TDP MMU where unloading roots on a vCPU can
result in the root being freed even though the root is completely valid and
 can be reused as-is (with a TLB flush).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCgAwFiEEMHr+pfEFOIzK+KY1YJEiAU0MEvkFAmRP/3ESHHNlYW5qY0Bn
 b29nbGUuY29tAAoJEGCRIgFNDBL5J7kQAIg6v9UzM/qp7/d6C4laZLTWC2YlGhiI
 1ZrfLU3/gQPYnnxv8GzLZ1CaXDhku2IIdyl2AQe8sUEmold45EapAW32rw2127j1
 z4jW8x8dKYXUd1HGe823O0Rm+Ls6bGcXmHj8LaBCBIV6loBINeNfLXNllsO/yIcR
 PmagzEqkNsMW3mvutdqb9mFP8p+mBzQu5qHlMEUb4WOXBmL06teHjR3qo7hi9Kl0
 nM0ZvuvCLGvufoI0RESiq7mXPKBz3yvhFbkjrUgBKQ/rij2PMO8iyULsLfGY1iAI
 m60aBfQPLJIH0NgvNHXkQOW59COYaY+I8udZqZZNr2uVb5A8J+/rQFSG/BP1Ccsw
 mtJgZRD5WdplcAjYlZCcEgBmwjznjSOFGYaOrAp02dJlbPw2/Tjaj1GHMvMjEIME
 KLvWTsN6xB9K0OhiXFvo1N4FCJbfi+PJPK0qVG7UttPnziCwYqAeIhGk4Kj6SHsX
 P23HnDO8U/rCwRG2tuyZmbllpUXsX0q08wyGlp1UcKAbtD8PPGPyz8+I7YakKI97
 RddIAh2qh5hwHON1xe35VSQ8X0OPOK1UnkiGTuBDdfldzxXK7OCfKKVQ6hsnpV6e
 0a6nQc2Ni7/f5jThPo2cTaKz389ZpVE2j1DaTT8QXq5JuBcTzrNI6HImcJwPFTWP
 +kUxewuRaaog
 =pzJT
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-mmu-6.4-2' of https://github.com/kvm-x86/linux into HEAD

Fix a long-standing flaw in x86's TDP MMU where unloading roots on a vCPU can
result in the root being freed even though the root is completely valid and
can be reused as-is (with a TLB flush).
2023-05-05 06:12:36 -04:00
Linus Torvalds
342528ff00 This pull request contains the following changes for UML:
- Make stub data pages configurable
 - Make it harder to mix user and kernel code by accident
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAmRSslcWHHJpY2hhcmRA
 c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wVbGEADWL/4rcVNqaD/b9V3bUXDkNhpG
 81Jc9fXcx3OXV87GsxfadiQI05UaOhTAMWJlsdKUF0/IW5tSQSZc0QcDNxTs6aS1
 NDkWvjk8s8OpbqdmV0cNKFmaeginrH4AuASajvNleU0BxX4ziqw6BPadih+wGVGz
 iVEd/M5YH2yet33pvD7Wzh3KZOPlo2OOKBqXzE6Snc3yoOiIfN95U8Cl23nxzMI4
 F235LS5m+4+GWIMkYs6+hlWEa8SxSnsH/beQJxAybtetSHRxsDXOHlEpDI9olxYI
 +OFNqsFLDyhyjib2Snwt8HMES9VmMfUUwT7ZK1cChXN8Quix3lXAyz6rpocmdnL/
 X1tzqWb1bz+yqrVRjtN3AGbcav+sQp4UWXsoExVsiz3lzaXfrbPvRZFucH2+SStK
 Wadyy2v2rHZKNq9sDLUBf5GJ0C4LrBAyT15ADim2L5/K7pZyI7AKSPcdU0Xc8m5y
 kMswml7FLaI+gsuKOBl7jySaoNDiVZFvQjsxd1+hn5cBNOcyBSt2rSErDw3geJgn
 pFzVkLtklfxMpHMm+bwXONBZMrXOBPNgo0gSkVrCVzO68x6j/MaBGLFazSqOKxbc
 UcVULhiomjIae4AcTomzOQx7oM1Xs3XuPo0x2RgJ3gMlCCZKoDNuqDgB3ZmPP/gq
 cbbHHveXEx7aMOPz0A==
 =Qb+I
 -----END PGP SIGNATURE-----

Merge tag 'uml-for-linus-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux

Pull uml updates from Richard Weinberger:

 - Make stub data pages configurable

 - Make it harder to mix user and kernel code by accident

* tag 'uml-for-linus-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux:
  um: make stub data pages size tweakable
  um: prevent user code in modules
  um: further clean up user_syms
  um: don't export printf()
  um: hostfs: define our own API boundary
  um: add __weak for exported functions
2023-05-03 19:02:03 -07:00
Linus Torvalds
798dec3304 x86-64: mm: clarify the 'positive addresses' user address rules
Dave Hansen found the "(long) addr >= 0" code in the x86-64 access_ok
checks somewhat confusing, and suggested using a helper to clarify what
the code is doing.

So this does exactly that: clarifying what the sign bit check is all
about, by adding a helper macro that makes it clear what it is testing.

This also adds some explicit comments talking about how even with LAM
enabled, any addresses with the sign bit will still GP-fault in the
non-canonical region just above the sign bit.

This is all what allows us to do the user address checks with just the
sign bit, and furthermore be a bit cavalier about accesses that might be
done with an additional offset even past that point.

(And yes, this talks about 'positive' even though zero is also a valid
user address and so technically we should call them 'non-negative'.  But
I don't think using 'non-negative' ends up being more understandable).

Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-05-03 10:37:22 -07:00
Linus Torvalds
1dbc0a9515 x86: mm: remove 'sign' games from LAM untagged_addr*() macros
The intent of the sign games was to not modify kernel addresses when
untagging them.  However, that had two issues:

 (a) it didn't actually work as intended, since the mask was calculated
     as 'addr >> 63' on an _unsigned_ address. So instead of getting a
     mask of all ones for kernel addresses, you just got '1'.

 (b) untagging a kernel address isn't actually a valid operation anyway.

Now, (a) had originally been true for both 'untagged_addr()' and the
remote version of it, but had accidentally been fixed for the regular
version of untagged_addr() by commit e0bddc19ba95 ("x86/mm: Reduce
untagged_addr() overhead for systems without LAM").  That one rewrote
the shift to be part of the alternative asm code, and in the process
changed the unsigned shift into a signed 'sar' instruction.

And while it is true that we don't want to turn what looks like a kernel
address into a user address by masking off the high bit, that doesn't
need these sign masking games - all it needs is that the mm context
'untag_mask' value has the high bit set.

Which it always does.

So simplify the code by just removing the superfluous (and in the case
of untagged_addr_remote(), still buggy) sign bit games in the address
masking.

Acked-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-05-03 10:37:22 -07:00
Linus Torvalds
b9bd9f605c x86: uaccess: move 32-bit and 64-bit parts into proper <asm/uaccess_N.h> header
The x86 <asm/uaccess.h> file has grown features that are specific to
x86-64 like LAM support and the related access_ok() changes.  They
really should be in the <asm/uaccess_64.h> file and not pollute the
generic x86 header.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-05-03 10:37:22 -07:00
Linus Torvalds
6ccdc91d6a x86: mm: remove architecture-specific 'access_ok()' define
There's already a generic definition of 'access_ok()' in the
asm-generic/access_ok.h header file, and the only difference bwteen that
and the x86-specific one is the added check for WARN_ON_IN_IRQ().

And it turns out that the reason for that check is long gone: it used to
use a "user_addr_max()" inline function that depended on the current
thread, and caused problems in non-thread contexts.

For details, see commits 7c4788950ba5 ("x86/uaccess, sched/preempt:
Verify access_ok() context") and in particular commit ae31fe51a3cc
("perf/x86: Restore TASK_SIZE check on frame pointer") about how and why
this came to be.

But that "current task" issue was removed in the big set_fs() removal by
Christoph Hellwig in commit 47058bb54b57 ("x86: remove address space
overrides using set_fs()").

So the reason for the test and the architecture-specific access_ok()
define no longer exists, and is actually harmful these days.  For
example, it led various 'copy_from_user_nmi()' games (eg using
__range_not_ok() instead, and then later converted to __access_ok() when
that became ok).

And that in turn meant that LAM was broken for the frame following
before this series, because __access_ok() used to not do the address
untagging.

Accessing user state still needs care in many contexts, but access_ok()
is not the place for this test.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Linus Torvalds torvalds@linux-foundation.org>
2023-05-03 10:37:22 -07:00
Linus Torvalds
6014bc2756 x86-64: make access_ok() independent of LAM
The linear address masking (LAM) code made access_ok() more complicated,
in that it now needs to untag the address in order to verify the access
range.  See commit 74c228d20a51 ("x86/uaccess: Provide untagged_addr()
and remove tags before address check").

We were able to avoid that overhead in the get_user/put_user code paths
by simply using the sign bit for the address check, and depending on the
GP fault if the address was non-canonical, which made it all independent
of LAM.

And we can do the same thing for access_ok(): simply check that the user
pointer range has the high bit clear.  No need to bother with any
address bit masking.

In fact, we can go a bit further, and just check the starting address
for known small accesses ranges: any accesses that overflow will still
be in the non-canonical area and will still GP fault.

To still make syzkaller catch any potentially unchecked user addresses,
we'll continue to warn about GP faults that are caused by accesses in
the non-canonical range.  But we'll limit that to purely "high bit set
and past the one-page 'slop' area".

We could probably just do that "check only starting address" for any
arbitrary range size: realistically all kernel accesses to user space
will be done starting at the low address.  But let's leave that kind of
optimization for later.  As it is, this already allows us to generate
simpler code and not worry about any tag bits in the address.

The one thing to look out for is the GUP address check: instead of
actually copying data in the virtual address range (and thus bad
addresses being caught by the GP fault), GUP will look up the page
tables manually.  As a result, the page table limits need to be checked,
and that was previously implicitly done by the access_ok().

With the relaxed access_ok() check, we need to just do an explicit check
for TASK_SIZE_MAX in the GUP code instead.  The GUP code already needs
to do the tag bit unmasking anyway, so there this is all very
straightforward, and there are no LAM issues.

Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-05-03 10:37:22 -07:00
Linus Torvalds
c8c655c34e s390:
* More phys_to_virt conversions
 
 * Improvement of AP management for VSIE (nested virtualization)
 
 ARM64:
 
 * Numerous fixes for the pathological lock inversion issue that
   plagued KVM/arm64 since... forever.
 
 * New framework allowing SMCCC-compliant hypercalls to be forwarded
   to userspace, hopefully paving the way for some more features
   being moved to VMMs rather than be implemented in the kernel.
 
 * Large rework of the timer code to allow a VM-wide offset to be
   applied to both virtual and physical counters as well as a
   per-timer, per-vcpu offset that complements the global one.
   This last part allows the NV timer code to be implemented on
   top.
 
 * A small set of fixes to make sure that we don't change anything
   affecting the EL1&0 translation regime just after having having
   taken an exception to EL2 until we have executed a DSB. This
   ensures that speculative walks started in EL1&0 have completed.
 
 * The usual selftest fixes and improvements.
 
 KVM x86 changes for 6.4:
 
 * Optimize CR0.WP toggling by avoiding an MMU reload when TDP is enabled,
   and by giving the guest control of CR0.WP when EPT is enabled on VMX
   (VMX-only because SVM doesn't support per-bit controls)
 
 * Add CR0/CR4 helpers to query single bits, and clean up related code
   where KVM was interpreting kvm_read_cr4_bits()'s "unsigned long" return
   as a bool
 
 * Move AMD_PSFD to cpufeatures.h and purge KVM's definition
 
 * Avoid unnecessary writes+flushes when the guest is only adding new PTEs
 
 * Overhaul .sync_page() and .invlpg() to utilize .sync_page()'s optimizations
   when emulating invalidations
 
 * Clean up the range-based flushing APIs
 
 * Revamp the TDP MMU's reaping of Accessed/Dirty bits to clear a single
   A/D bit using a LOCK AND instead of XCHG, and skip all of the "handle
   changed SPTE" overhead associated with writing the entire entry
 
 * Track the number of "tail" entries in a pte_list_desc to avoid having
   to walk (potentially) all descriptors during insertion and deletion,
   which gets quite expensive if the guest is spamming fork()
 
 * Disallow virtualizing legacy LBRs if architectural LBRs are available,
   the two are mutually exclusive in hardware
 
 * Disallow writes to immutable feature MSRs (notably PERF_CAPABILITIES)
   after KVM_RUN, similar to CPUID features
 
 * Overhaul the vmx_pmu_caps selftest to better validate PERF_CAPABILITIES
 
 * Apply PMU filters to emulated events and add test coverage to the
   pmu_event_filter selftest
 
 x86 AMD:
 
 * Add support for virtual NMIs
 
 * Fixes for edge cases related to virtual interrupts
 
 x86 Intel:
 
 * Don't advertise XTILE_CFG in KVM_GET_SUPPORTED_CPUID if XTILE_DATA is
   not being reported due to userspace not opting in via prctl()
 
 * Fix a bug in emulation of ENCLS in compatibility mode
 
 * Allow emulation of NOP and PAUSE for L2
 
 * AMX selftests improvements
 
 * Misc cleanups
 
 MIPS:
 
 * Constify MIPS's internal callbacks (a leftover from the hardware enabling
   rework that landed in 6.3)
 
 Generic:
 
 * Drop unnecessary casts from "void *" throughout kvm_main.c
 
 * Tweak the layout of "struct kvm_mmu_memory_cache" to shrink the struct
   size by 8 bytes on 64-bit kernels by utilizing a padding hole
 
 Documentation:
 
 * Fix goof introduced by the conversion to rST
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmRNExkUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroNyjwf+MkzDael9y9AsOZoqhEZ5OsfQYJ32
 Im5ZVYsPRU2K5TuoWql6meIihgclCj1iIU32qYHa2F1WYt2rZ72rJp+HoY8b+TaI
 WvF0pvNtqQyg3iEKUBKPA4xQ6mj7RpQBw86qqiCHmlfNt0zxluEGEPxH8xrWcfhC
 huDQ+NUOdU7fmJ3rqGitCvkUbCuZNkw3aNPR8dhU8RAWrwRzP2hBOmdxIeo81WWY
 XMEpJSijbGpXL9CvM0Jz9nOuMJwZwCCBGxg1vSQq0xTfLySNMxzvWZC2GFaBjucb
 j0UOQ7yE0drIZDVhd3sdNslubXXU6FcSEzacGQb9aigMUon3Tem9SHi7Kw==
 =S2Hq
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "s390:

   - More phys_to_virt conversions

   - Improvement of AP management for VSIE (nested virtualization)

  ARM64:

   - Numerous fixes for the pathological lock inversion issue that
     plagued KVM/arm64 since... forever.

   - New framework allowing SMCCC-compliant hypercalls to be forwarded
     to userspace, hopefully paving the way for some more features being
     moved to VMMs rather than be implemented in the kernel.

   - Large rework of the timer code to allow a VM-wide offset to be
     applied to both virtual and physical counters as well as a
     per-timer, per-vcpu offset that complements the global one. This
     last part allows the NV timer code to be implemented on top.

   - A small set of fixes to make sure that we don't change anything
     affecting the EL1&0 translation regime just after having having
     taken an exception to EL2 until we have executed a DSB. This
     ensures that speculative walks started in EL1&0 have completed.

   - The usual selftest fixes and improvements.

  x86:

   - Optimize CR0.WP toggling by avoiding an MMU reload when TDP is
     enabled, and by giving the guest control of CR0.WP when EPT is
     enabled on VMX (VMX-only because SVM doesn't support per-bit
     controls)

   - Add CR0/CR4 helpers to query single bits, and clean up related code
     where KVM was interpreting kvm_read_cr4_bits()'s "unsigned long"
     return as a bool

   - Move AMD_PSFD to cpufeatures.h and purge KVM's definition

   - Avoid unnecessary writes+flushes when the guest is only adding new
     PTEs

   - Overhaul .sync_page() and .invlpg() to utilize .sync_page()'s
     optimizations when emulating invalidations

   - Clean up the range-based flushing APIs

   - Revamp the TDP MMU's reaping of Accessed/Dirty bits to clear a
     single A/D bit using a LOCK AND instead of XCHG, and skip all of
     the "handle changed SPTE" overhead associated with writing the
     entire entry

   - Track the number of "tail" entries in a pte_list_desc to avoid
     having to walk (potentially) all descriptors during insertion and
     deletion, which gets quite expensive if the guest is spamming
     fork()

   - Disallow virtualizing legacy LBRs if architectural LBRs are
     available, the two are mutually exclusive in hardware

   - Disallow writes to immutable feature MSRs (notably
     PERF_CAPABILITIES) after KVM_RUN, similar to CPUID features

   - Overhaul the vmx_pmu_caps selftest to better validate
     PERF_CAPABILITIES

   - Apply PMU filters to emulated events and add test coverage to the
     pmu_event_filter selftest

   - AMD SVM:
       - Add support for virtual NMIs
       - Fixes for edge cases related to virtual interrupts

   - Intel AMX:
       - Don't advertise XTILE_CFG in KVM_GET_SUPPORTED_CPUID if
         XTILE_DATA is not being reported due to userspace not opting in
         via prctl()
       - Fix a bug in emulation of ENCLS in compatibility mode
       - Allow emulation of NOP and PAUSE for L2
       - AMX selftests improvements
       - Misc cleanups

  MIPS:

   - Constify MIPS's internal callbacks (a leftover from the hardware
     enabling rework that landed in 6.3)

  Generic:

   - Drop unnecessary casts from "void *" throughout kvm_main.c

   - Tweak the layout of "struct kvm_mmu_memory_cache" to shrink the
     struct size by 8 bytes on 64-bit kernels by utilizing a padding
     hole

  Documentation:

   - Fix goof introduced by the conversion to rST"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (211 commits)
  KVM: s390: pci: fix virtual-physical confusion on module unload/load
  KVM: s390: vsie: clarifications on setting the APCB
  KVM: s390: interrupt: fix virtual-physical confusion for next alert GISA
  KVM: arm64: Have kvm_psci_vcpu_on() use WRITE_ONCE() to update mp_state
  KVM: arm64: Acquire mp_state_lock in kvm_arch_vcpu_ioctl_vcpu_init()
  KVM: selftests: Test the PMU event "Instructions retired"
  KVM: selftests: Copy full counter values from guest in PMU event filter test
  KVM: selftests: Use error codes to signal errors in PMU event filter test
  KVM: selftests: Print detailed info in PMU event filter asserts
  KVM: selftests: Add helpers for PMC asserts in PMU event filter test
  KVM: selftests: Add a common helper for the PMU event filter guest code
  KVM: selftests: Fix spelling mistake "perrmited" -> "permitted"
  KVM: arm64: vhe: Drop extra isb() on guest exit
  KVM: arm64: vhe: Synchronise with page table walker on MMU update
  KVM: arm64: pkvm: Document the side effects of kvm_flush_dcache_to_poc()
  KVM: arm64: nvhe: Synchronise with page table walker on TLBI
  KVM: arm64: Handle 32bit CNTPCTSS traps
  KVM: arm64: nvhe: Synchronise with page table walker on vcpu run
  KVM: arm64: vgic: Don't acquire its_lock before config_lock
  KVM: selftests: Add test to verify KVM's supported XCR0
  ...
2023-05-01 12:06:20 -07:00
Linus Torvalds
58390c8ce1 IOMMU Updates for Linux 6.4
Including:
 
 	- Convert to platform remove callback returning void
 
 	- Extend changing default domain to normal group
 
 	- Intel VT-d updates:
 	    - Remove VT-d virtual command interface and IOASID
 	    - Allow the VT-d driver to support non-PRI IOPF
 	    - Remove PASID supervisor request support
 	    - Various small and misc cleanups
 
 	- ARM SMMU updates:
 	    - Device-tree binding updates:
 	        * Allow Qualcomm GPU SMMUs to accept relevant clock properties
 	        * Document Qualcomm 8550 SoC as implementing an MMU-500
 	        * Favour new "qcom,smmu-500" binding for Adreno SMMUs
 
 	    - Fix S2CR quirk detection on non-architectural Qualcomm SMMU
 	      implementations
 
 	    - Acknowledge SMMUv3 PRI queue overflow when consuming events
 
 	    - Document (in a comment) why ATS is disabled for bypass streams
 
 	- AMD IOMMU updates:
 	    - 5-level page-table support
 	    - NUMA awareness for memory allocations
 
 	- Unisoc driver: Support for reattaching an existing domain
 
 	- Rockchip driver: Add missing set_platform_dma_ops callback
 
 	- Mediatek driver: Adjust the dma-ranges
 
 	- Various other small fixes and cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmRONeAACgkQK/BELZcB
 GuPmpw/8C9ruxQ0JU5rcDBXQGvos4gMmxlbELMrBpbbiTtdb35xchpKfdhnECGIF
 k2SrrcF40R/S82SyzNU/eZtGKirtcXvGFraUFgu/QdCcnnqpRHs+IJMXX2NJP+it
 +0wO1uiInt3CN1ERcR4F31cDKiWjDG8bvQVE5LIyiy4KrIU5ld2G91Fkaa0R13Au
 6H+/wKkcUC6OyaGE6wPx474xBkapT20vj5AIQuAWisXJJR0wbBon1sUTo/IRKsU+
 IkNxH0W+1PNImJ+crAdf/nkOlyqoChY4ww6cm07LrOsBLIsX5bCqXfL4HvKthElD
 MEgk2SN5kfjfR5Vf29W4hZVM1CT8VbhO41I7OzaZ6X6RU2PXoldPKlgKtZGeSKn1
 9bcMpSgB0BtbttvBevSkxTo5KHFozXS2DG3DFoMB3yFMme8Th0LrhBZ9oB7NIPNw
 ntMo4K75vviC6Vvzjy4Anj/+y+Zm3W6wDDP7F12O6WZLkK5s4hrSsHUm/MQnnKQP
 muJlG870RnSl73xUQZe3cuBxktXuJ3EHqqYIPE0npzvauu8hhWcis3opf2Y+U2s8
 aBCCIgp5kTKqjHLh2e4lNCKZf1/b/dhxRcRBQhpAIb8YsjMlIJyM+G8Jz6K6gBga
 5Ld+68UQ3oHJwoLV1HCFN8jbpQ9KZn1s9+h3yrYjRAcLNiFb3nU=
 =OvTo
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu updates from Joerg Roedel:

 - Convert to platform remove callback returning void

 - Extend changing default domain to normal group

 - Intel VT-d updates:
     - Remove VT-d virtual command interface and IOASID
     - Allow the VT-d driver to support non-PRI IOPF
     - Remove PASID supervisor request support
     - Various small and misc cleanups

 - ARM SMMU updates:
     - Device-tree binding updates:
         * Allow Qualcomm GPU SMMUs to accept relevant clock properties
         * Document Qualcomm 8550 SoC as implementing an MMU-500
         * Favour new "qcom,smmu-500" binding for Adreno SMMUs

     - Fix S2CR quirk detection on non-architectural Qualcomm SMMU
       implementations

     - Acknowledge SMMUv3 PRI queue overflow when consuming events

     - Document (in a comment) why ATS is disabled for bypass streams

 - AMD IOMMU updates:
     - 5-level page-table support
     - NUMA awareness for memory allocations

 - Unisoc driver: Support for reattaching an existing domain

 - Rockchip driver: Add missing set_platform_dma_ops callback

 - Mediatek driver: Adjust the dma-ranges

 - Various other small fixes and cleanups

* tag 'iommu-updates-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (82 commits)
  iommu: Remove iommu_group_get_by_id()
  iommu: Make iommu_release_device() static
  iommu/vt-d: Remove BUG_ON in dmar_insert_dev_scope()
  iommu/vt-d: Remove a useless BUG_ON(dev->is_virtfn)
  iommu/vt-d: Remove BUG_ON in map/unmap()
  iommu/vt-d: Remove BUG_ON when domain->pgd is NULL
  iommu/vt-d: Remove BUG_ON in handling iotlb cache invalidation
  iommu/vt-d: Remove BUG_ON on checking valid pfn range
  iommu/vt-d: Make size of operands same in bitwise operations
  iommu/vt-d: Remove PASID supervisor request support
  iommu/vt-d: Use non-privileged mode for all PASIDs
  iommu/vt-d: Remove extern from function prototypes
  iommu/vt-d: Do not use GFP_ATOMIC when not needed
  iommu/vt-d: Remove unnecessary checks in iopf disabling path
  iommu/vt-d: Move PRI handling to IOPF feature path
  iommu/vt-d: Move pfsid and ats_qdep calculation to device probe path
  iommu/vt-d: Move iopf code from SVA to IOPF enabling path
  iommu/vt-d: Allow SVA with device-specific IOPF
  dmaengine: idxd: Add enable/disable device IOPF feature
  arm64: dts: mt8186: Add dma-ranges for the parent "soc" node
  ...
2023-04-30 13:00:38 -07:00
Uros Bizjak
5cd4c26841 locking/x86: Define arch_try_cmpxchg_local()
Define target specific arch_try_cmpxchg_local(). This
definition overrides the generic arch_try_cmpxchg_local()
fallback definition and enables target-specific
implementation of try_cmpxchg_local().

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230405141710.3551-5-ubizjak@gmail.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
2023-04-29 09:09:23 +02:00
Uros Bizjak
d994f2c8e2 locking/arch: Wire up local_try_cmpxchg()
Implement target specific support for local_try_cmpxchg()
and local_cmpxchg() using typed C wrappers that call their
_local counterpart and provide additional checking of
their input arguments.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230405141710.3551-4-ubizjak@gmail.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
2023-04-29 09:09:16 +02:00
Linus Torvalds
f20730efbd SMP cross-CPU function-call updates for v6.4:
- Remove diagnostics and adjust config for CSD lock diagnostics
 
  - Add a generic IPI-sending tracepoint, as currently there's no easy
    way to instrument IPI origins: it's arch dependent and for some
    major architectures it's not even consistently available.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmRK438RHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1jJ5Q/5AZ0HGpyqwdFK8GmGznyu5qjP5HwV9pPq
 gZQScqSy4tZEeza4TFMi83CoXSg9uJ7GlYJqqQMKm78LGEPomnZtXXC7oWvTA9M5
 M/jAvzytmvZloSCXV6kK7jzSejMHhag97J/BjTYhZYQpJ9T+hNC87XO6J6COsKr9
 lPIYqkFrIkQNr6B0U11AQfFejRYP1ics2fnbnZL86G/zZAc6x8EveM3KgSer2iHl
 KbrO+xcYyGY8Ef9P2F72HhEGFfM3WslpT1yzqR3sm4Y+fuMG0oW3qOQuMJx0ZhxT
 AloterY0uo6gJwI0P9k/K4klWgz81Tf/zLb0eBAtY2uJV9Fo3YhPHuZC7jGPGAy3
 JusW2yNYqc8erHVEMAKDUsl/1KN4TE2uKlkZy98wno+KOoMufK5MA2e2kPPqXvUi
 Jk9RvFolnWUsexaPmCftti0OCv3YFiviVAJ/t0pchfmvvJA2da0VC9hzmEXpLJVF
 25nBTV/1uAOrWvOpCyo3ElrC2CkQVkFmK5rXMDdvf6ib0Nid4vFcCkCSLVfu+ePB
 11mi7QYro+CcnOug1K+yKogUDmsZgV/u1kUwgQzTIpZ05Kkb49gUiXw9L2RGcBJh
 yoDoiI66KPR7PWQ2qBdQoXug4zfEEtWG0O9HNLB0FFRC3hu7I+HHyiUkBWs9jasK
 PA5+V7HcQRk=
 =Wp7f
 -----END PGP SIGNATURE-----

Merge tag 'smp-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull SMP cross-CPU function-call updates from Ingo Molnar:

 - Remove diagnostics and adjust config for CSD lock diagnostics

 - Add a generic IPI-sending tracepoint, as currently there's no easy
   way to instrument IPI origins: it's arch dependent and for some major
   architectures it's not even consistently available.

* tag 'smp-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  trace,smp: Trace all smp_function_call*() invocations
  trace: Add trace_ipi_send_cpu()
  sched, smp: Trace smp callback causing an IPI
  smp: reword smp call IPI comment
  treewide: Trace IPIs sent via smp_send_reschedule()
  irq_work: Trace self-IPIs sent via arch_irq_work_raise()
  smp: Trace IPIs sent via arch_send_call_function_ipi_mask()
  sched, smp: Trace IPIs sent via send_call_function_single_ipi()
  trace: Add trace_ipi_send_cpumask()
  kernel/smp: Make csdlock_debug= resettable
  locking/csd_lock: Remove per-CPU data indirection from CSD lock debugging
  locking/csd_lock: Remove added data from CSD lock debugging
  locking/csd_lock: Add Kconfig option for csd_debug default
2023-04-28 15:03:43 -07:00
Linus Torvalds
7c339778f9 Perf changes for v6.4:
- Add Intel Granite Rapids support
 
  - Add uncore events for Intel SPR IMC PMU
 
  - Fix perf IRQ throttling bug
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmRK3GoRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1hqghAAwZyNLY8oAu/izkp5DKML7okLa48nh1+K
 JWsM4GT16ldx6MBqxX4V7tiYfAeo6ydCkd/LbxPklAk4Fmcwt5HWotOEGHab7N7B
 iTOix481TIa+E7ERuDLU5vS0chtdE5CrVfmBwtkI4WEv0c8vBwQHvRzy6RaoGG+2
 XSqwFhkDG7l6N6rIz/V8zJKFo0hNPou0bllYJMM+A+BRdOthKhgXNjM6yuuKXgst
 TWxByDCoAG59uWvzq3WmKRLk/bJ4VC2lRDFpo01xtPvn1hA/Bs8MDF1940NG8uEh
 u7aTI1ZyaJJnnxaTk98r/aZIMYlHHWIIKnymtVFT52ZLeKARDt6MmX5ttec16Cd1
 a5IXuiWPkiSfGlmN7Q4sK89eg606WQ/i+eIKfNMG0ZruvKFiT0uD+gNUlQvE/4fk
 6OgJot/iF2B5+UWcBmEOMYV4jagSWLFkIQ3vGVT6E3Zp8gEa1/R3ek1aSbQeb8Gi
 kvMjcwthUNOgGIXrFKbdDTN/3seoNmrgIP2wetUzXhVaAOIYz6mpxIcXLc+vNiDh
 soTRtkLgNTkrVQ8AKp7snKNRLjazL6CqUd+RILH4tM+bshFnyeH6K4Eck+Fo//Dn
 XtEHry6nJ0ZPhsJPRwFHOoYCAuDQdy5DG7QQ07qXYW9DyJV7ZmefDJYOEjgcip6G
 f3/sUe7ekRA=
 =41Sa
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf updates from Ingo Molnar:

 - Add Intel Granite Rapids support

 - Add uncore events for Intel SPR IMC PMU

 - Fix perf IRQ throttling bug

* tag 'perf-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel/uncore: Add events for Intel SPR IMC PMU
  perf/core: Fix hardlockup failure caused by perf throttle
  perf/x86/cstate: Add Granite Rapids support
  perf/x86/msr: Add Granite Rapids
  perf/x86/intel: Add Granite Rapids
2023-04-28 14:41:53 -07:00
Linus Torvalds
2aff7c706c Objtool changes for v6.4:
- Mark arch_cpu_idle_dead() __noreturn, make all architectures & drivers that did
    this inconsistently follow this new, common convention, and fix all the fallout
    that objtool can now detect statically.
 
  - Fix/improve the ORC unwinder becoming unreliable due to UNWIND_HINT_EMPTY ambiguity,
    split it into UNWIND_HINT_END_OF_STACK and UNWIND_HINT_UNDEFINED to resolve it.
 
  - Fix noinstr violations in the KCSAN code and the lkdtm/stackleak code.
 
  - Generate ORC data for __pfx code
 
  - Add more __noreturn annotations to various kernel startup/shutdown/panic functions.
 
  - Misc improvements & fixes.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmRK1x0RHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1ghxQ/+IkCynMYtdF5OG9YwbcGJqsPSfOPMEcEM
 pUSFYg+gGPBDT/fJfcVSqvUtdnWbLC2kXt9yiswXz3X3J2nmNkBk5YKQftsNDcul
 TmKeqIIAK51XTncpegKH0EGnOX63oZ9Vxa8CTPdDlb+YF23Km2FoudGRI9F5qbUd
 LoraXqGYeiaeySkGyWmZVl6Uc8dIxnMkTN3H/oI9aB6TOrsi059hAtFcSaFfyemP
 c4LqXXCH7k2baiQt+qaLZ8cuZVG/+K5r2N2cmjO5kmJc6ynIaFnfMe4XxZLjp5LT
 /PulYI15bXkvSARKx5CRh/CDHMOx5Blw+ASO0RhWbdy0WH4ZhhcaVF5AeIpPW86a
 1LBcz97rMp72WmvKgrJeVO1r9+ll4SI6/YKGJRsxsCMdP3hgFpqntXyVjTFNdTM1
 0gH6H5v55x06vJHvhtTk8SR3PfMTEM2fRU5jXEOrGowoGifx+wNUwORiwj6LE3KQ
 SKUdT19RNzoW3VkFxhgk65ThK1S7YsJUKRoac3YdhttpqqqtFV//erenrZoR4k/p
 vzvKy68EQ7RCNyD5wNWNFe0YjeJl5G8gQ8bUm4Xmab7djjgz+pn4WpQB8yYKJLAo
 x9dqQ+6eUbw3Hcgk6qQ9E+r/svbulnAL0AeALAWK/91DwnZ2mCzKroFkLN7napKi
 fRho4CqzrtM=
 =NwEV
 -----END PGP SIGNATURE-----

Merge tag 'objtool-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull objtool updates from Ingo Molnar:

 - Mark arch_cpu_idle_dead() __noreturn, make all architectures &
   drivers that did this inconsistently follow this new, common
   convention, and fix all the fallout that objtool can now detect
   statically

 - Fix/improve the ORC unwinder becoming unreliable due to
   UNWIND_HINT_EMPTY ambiguity, split it into UNWIND_HINT_END_OF_STACK
   and UNWIND_HINT_UNDEFINED to resolve it

 - Fix noinstr violations in the KCSAN code and the lkdtm/stackleak code

 - Generate ORC data for __pfx code

 - Add more __noreturn annotations to various kernel startup/shutdown
   and panic functions

 - Misc improvements & fixes

* tag 'objtool-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits)
  x86/hyperv: Mark hv_ghcb_terminate() as noreturn
  scsi: message: fusion: Mark mpt_halt_firmware() __noreturn
  x86/cpu: Mark {hlt,resume}_play_dead() __noreturn
  btrfs: Mark btrfs_assertfail() __noreturn
  objtool: Include weak functions in global_noreturns check
  cpu: Mark nmi_panic_self_stop() __noreturn
  cpu: Mark panic_smp_self_stop() __noreturn
  arm64/cpu: Mark cpu_park_loop() and friends __noreturn
  x86/head: Mark *_start_kernel() __noreturn
  init: Mark start_kernel() __noreturn
  init: Mark [arch_call_]rest_init() __noreturn
  objtool: Generate ORC data for __pfx code
  x86/linkage: Fix padding for typed functions
  objtool: Separate prefix code from stack validation code
  objtool: Remove superfluous dead_end_function() check
  objtool: Add symbol iteration helpers
  objtool: Add WARN_INSN()
  scripts/objdump-func: Support multiple functions
  context_tracking: Fix KCSAN noinstr violation
  objtool: Add stackleak instrumentation to uaccess safe list
  ...
2023-04-28 14:02:54 -07:00
Linus Torvalds
22b8cc3e78 Add support for new Linear Address Masking CPU feature. This is similar
to ARM's Top Byte Ignore and allows userspace to store metadata in some
 bits of pointers without masking it out before use.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmRK/WIACgkQaDWVMHDJ
 krAL+RAAw33EhsWyYVkeAtYmYBKkGvlgeSDULtfJKe5bynJBTHkGKfM6RE9MSJIt
 5fHWaConGh8HNpy0Us1sDvd/aWcWRm5h7ZcCVD+R4qrgh/vc7ULzM+elXe5jzr4W
 cyuTckF2eW6SVrYg6fH5q+6Uy/moDtrdkLRvwRBf+AYeepB8gvSSH5XixKDNiVBE
 pjNy1xXVZQokqD4tjsFelmLttyacR5OabiE/aeVNoFYf9yTwfnN8N3T6kwuOoS4l
 Lp6NA+/0ux+oBlR+Is+JJG8Mxrjvz96yJGZYdR2YP5k3bMQtHAAjuq2w+GgqZm5i
 j3/E6KQepEGaCfC+bHl68xy/kKx8ik+jMCEcBalCC25J3uxbLz41g6K3aI890wJn
 +5ZtfcmoDUk9pnUyLxR8t+UjOSBFAcRSUE+FTjUH1qEGsMPK++9a4iLXz5vYVK1+
 +YCt1u5LNJbkDxE8xVX3F5jkXh0G01SJsuUVAOqHSNfqSNmohFK8/omqhVRrRqoK
 A7cYLtnOGiUXLnvjrwSxPNOzRrG+GAwqaw8gwOTaYogETWbTY8qsSCEVl204uYwd
 m8io9rk2ZXUdDuha56xpBbPE0JHL9hJ2eKCuPkfvRgJT9YFyTh+e0UdX20k+nDjc
 ang1S350o/Y0sus6rij1qS8AuxJIjHucG0GdgpZk3KUbcxoRLhI=
 =qitk
 -----END PGP SIGNATURE-----

Merge tag 'x86_mm_for_6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 LAM (Linear Address Masking) support from Dave Hansen:
 "Add support for the new Linear Address Masking CPU feature.

  This is similar to ARM's Top Byte Ignore and allows userspace to store
  metadata in some bits of pointers without masking it out before use"

* tag 'x86_mm_for_6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm/iommu/sva: Do not allow to set FORCE_TAGGED_SVA bit from outside
  x86/mm/iommu/sva: Fix error code for LAM enabling failure due to SVA
  selftests/x86/lam: Add test cases for LAM vs thread creation
  selftests/x86/lam: Add ARCH_FORCE_TAGGED_SVA test cases for linear-address masking
  selftests/x86/lam: Add inherit test cases for linear-address masking
  selftests/x86/lam: Add io_uring test cases for linear-address masking
  selftests/x86/lam: Add mmap and SYSCALL test cases for linear-address masking
  selftests/x86/lam: Add malloc and tag-bits test cases for linear-address masking
  x86/mm/iommu/sva: Make LAM and SVA mutually exclusive
  iommu/sva: Replace pasid_valid() helper with mm_valid_pasid()
  mm: Expose untagging mask in /proc/$PID/status
  x86/mm: Provide arch_prctl() interface for LAM
  x86/mm: Reduce untagged_addr() overhead for systems without LAM
  x86/uaccess: Provide untagged_addr() and remove tags before address check
  mm: Introduce untagged_addr_remote()
  x86/mm: Handle LAM on context switch
  x86: CPUID and CR3/CR4 flags for Linear Address Masking
  x86: Allow atomic MM_CONTEXT flags setting
  x86/mm: Rework address range check in get_user() and put_user()
2023-04-28 09:43:49 -07:00