Commit Graph

43138 Commits

Author SHA1 Message Date
Linus Torvalds
925cf0457d A single fix for x86:
Revert the recent change to the MTRR code which aimed to support
   SEV-SNP guests on Hyper-V. It caused a regression on XEN Dom0
   kernels.
 
   The underlying issue of MTRR (mis)handling in the x86 code needs some
   deeper investigation and is definitely not 6.2 material.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmPxYccTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoRESEACXnTGYtTw5JTXBcztuwXFsGuEvKcp4
 yj1TF2l/OVjWh4MurjJRch0CSzkVZYOn4ymqR4H9ivjuwQkKxzvsO03+AwuJr8Rf
 aDqbDzwf0i6E1NmbSU30GkQYGPFNY5fyEn46ptuzu8Humc4FfHt6IrJ5xgNt2nnr
 xzRB0kJrB+EITe07Oh+3mj2YeTHNF7Uk0wQFhhXeLK8Xr2W3w/mmTAEddU42fgTE
 GCboRec6a8eTOA+4l3gAINJcERxOBnwKEIWo2NWDWdMrrSfcq1Ig1vvHH10rIXxe
 cPIHZY19kIBEvALfKlkzZk13cl6XDhHUnyK5mc9KBi/jZWBK8DF/8reyelW2ymyJ
 GneU8HESPlVAv8UvyOc/1jFcolgASfScO3aG/S9twv0RGFUbRHVvk0yhyPJ61rPM
 ibva+7PQG70AQc+kmsRBtZnR7zHNtoOYz3ufkKfl+InknkAMByR0DmT7Iw4HouAZ
 oZ1WDAgrvg/ld+bIVjyjDnEmPyg50d3COjsGlzMTXeogJTERbIAUwTiC3iRmJksB
 evTqKEYYiBbeex9LI8FOHvmLVew1SEvCScXdXX2+SjXdtJoAKO9uoUScju9MWeq5
 /bdfudRekNG4z1fQ8Qhs/nBkVsIguT2xoMREqy6D3LRbEXgnCKiy5V1P1XEjlwwJ
 ecBpXlRM+lIaBQ==
 =nNAW
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2023-02-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fix from Thomas Gleixner:
 "A single fix for x86.

  Revert the recent change to the MTRR code which aimed to support
  SEV-SNP guests on Hyper-V. It caused a regression on XEN Dom0 kernels.

  The underlying issue of MTRR (mis)handling in the x86 code needs some
  deeper investigation and is definitely not 6.2 material"

* tag 'x86-urgent-2023-02-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mtrr: Revert 90b926e68f ("x86/pat: Fix pat_x_mtrr_type() for MTRR disabled case")
2023-02-18 17:57:16 -08:00
Linus Torvalds
5e725d112e x86:
* zero all padding for KVM_GET_DEBUGREGS
 
 * fix rST warning
 
 * disable vPMU support on hybrid CPUs
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmPw5PsUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroNqaAf6A0zjrdY1KjSyvrcHr0NV6lUfU+Ye
 lc5xrtAJDuO7kERgnqFGPeg3a72tA4a9tlFrTdqqQIrAxnvVn4JNP5gtD7UxfpOn
 PELO7JUbG3/CV2oErTugH02n3lKN/pLSISAClFkO7uAL5sJEM2pXH+ws1CZ7F7kN
 FbPdnmvzi7tnTpv3oJ+gVl2l0HZYTnH4DydFGo68O3lP+oFgRXznkF5rpMxAe6oK
 93fvSWGabVCft278sSVq5XpYfKQSJb5j8KjB8L4qqAlRh0ZJA5haDZWQyaaJvNY0
 oefFj9XYPpA08l8VqZ2ti5vE4b6e+o2/oTg3Nwf/5DrQxJiYOGTHKoA/NA==
 =H68L
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm/x86 fixes from Paolo Bonzini:

 - zero all padding for KVM_GET_DEBUGREGS

 - fix rST warning

 - disable vPMU support on hybrid CPUs

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  kvm: initialize all of the kvm_debugregs structure before sending it to userspace
  perf/x86: Refuse to export capabilities for hybrid PMUs
  KVM: x86/pmu: Disable vPMU support on hybrid CPUs (host PMUs)
  Documentation/hw-vuln: Fix rST warning
2023-02-18 11:07:32 -08:00
Greg Kroah-Hartman
2c10b61421 kvm: initialize all of the kvm_debugregs structure before sending it to userspace
When calling the KVM_GET_DEBUGREGS ioctl, on some configurations, there
might be some uninitialized portions of the kvm_debugregs structure that
could be copied to userspace.  Prevent this as is done in the other kvm
ioctls, by setting the whole structure to 0 before copying anything into
it.

Bonus is that this reduces the lines of code as the explicit flag
setting and reserved space zeroing out can be removed.
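
As a rough sketch of the pattern (illustrative only, not the verbatim KVM
ioctl handler; 'argp' stands for the userspace pointer):

    struct kvm_debugregs dbgregs;

    /* Zero the whole structure first so padding and reserved space
     * cannot leak stale kernel stack contents to userspace. */
    memset(&dbgregs, 0, sizeof(dbgregs));

    /* ... fill in the architectural fields (db[], dr6, dr7) ... */

    if (copy_to_user(argp, &dbgregs, sizeof(dbgregs)))
        return -EFAULT;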

Cc: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: <x86@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: stable <stable@kernel.org>
Reported-by: Xingyuan Mo <hdthky0@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Message-Id: <20230214103304.3689213-1-gregkh@linuxfoundation.org>
Tested-by: Xingyuan Mo <hdthky0@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-02-16 12:31:40 -05:00
Sean Christopherson
4b4191b8ae perf/x86: Refuse to export capabilities for hybrid PMUs
Now that KVM disables vPMU support on hybrid CPUs, WARN and return zeros
if perf_get_x86_pmu_capability() is invoked on a hybrid CPU.  The helper
doesn't provide an accurate accounting of the PMU capabilities for hybrid
CPUs and needs to be enhanced if KVM, or anything else outside of perf,
wants to act on the PMU capabilities.

Cc: stable@vger.kernel.org
Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Link: https://lore.kernel.org/all/20220818181530.2355034-1-kan.liang@linux.intel.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230208204230.1360502-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-02-15 08:25:44 -05:00
Sean Christopherson
4d7404e5ee KVM: x86/pmu: Disable vPMU support on hybrid CPUs (host PMUs)
Disable KVM support for virtualizing PMUs on hosts with hybrid PMUs until
KVM gains a sane way to enumerate the hybrid vPMU to userspace and/or
gains a mechanism to let userspace opt-in to the dangers of exposing a
hybrid vPMU to KVM guests.  Virtualizing a hybrid PMU, or at least part of
a hybrid PMU, is possible, but it requires careful, deliberate
configuration from userspace.

E.g. to expose full functionality, vCPUs need to be pinned to pCPUs to
prevent migrating a vCPU between a big core and a little core, userspace
must enumerate a reasonable topology to the guest, and guest CPUID must be
curated per vCPU to enumerate accurate vPMU capabilities.

The last point is especially problematic, as KVM doesn't control which
pCPU it runs on when enumerating KVM's vPMU capabilities to userspace,
i.e. userspace can't rely on KVM_GET_SUPPORTED_CPUID in its current form.

Alternatively, userspace could enable vPMU support by enumerating the
set of features that are common and coherent across all cores, e.g. by
filtering PMU events and restricting guest capabilities.  But again, that
requires userspace to take action far beyond reflecting KVM's supported
feature set into the guest.

For now, simply disable vPMU support on hybrid CPUs to avoid inducing
seemingly random #GPs in guests, and punt support for hybrid CPUs to a
future enabling effort.

Reported-by: Jianfeng Gao <jianfeng.gao@intel.com>
Cc: stable@vger.kernel.org
Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Link: https://lore.kernel.org/all/20220818181530.2355034-1-kan.liang@linux.intel.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230208204230.1360502-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-02-15 08:25:43 -05:00
Linus Torvalds
82eac0c830 Certain AMD processors are vulnerable to a cross-thread return address
predictions bug. When running in SMT mode and one of the sibling threads
 transitions out of C0 state, the other thread gets access to twice as many
 entries in the RSB, but unfortunately the predictions of the now-halted
 logical processor are not purged.  Therefore, the executing processor
 could speculatively execute from locations that the now-halted processor
 had trained the RSB on.
 
 The Spectre v2 mitigations cover the Linux kernel, as it fills the RSB
 when context switching to the idle thread. However, KVM allows a VMM to
 prevent exiting guest mode when transitioning out of C0 by using the
 KVM_CAP_X86_DISABLE_EXITS capability. To mitigate the cross-thread return
 address predictions bug, a VMM must not be allowed to override the default
 behavior of intercepting C0 transitions.
 
 These patches introduce a KVM module parameter that, if set, will prevent
 the user from disabling the HLT, MWAIT and CSTATE exits.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmPrvAAUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMQAgf8CirL+yrngzQ+39UTFQgXj3IS1UyR
 o2mF39w3bVlQhLNf5MBHF965FUeOV/8A16x73hJGjhiiuisphtQWS/6xKR6uYbq7
 0Qi821skqN6XpRsWTWqFHMsdY+n0skr8QeXG4k/GJu7Ghb3tqs4eTGgnf2WBfI8/
 K1UgTmjd9+ikM5gKZoVLpcqZnti0gx3lM+cvZGdfrIUaXB+i+hNd2NfRTiGsTOiK
 fX7vZtLvOeje2TPoKLhzekTbh8kTU07HRWID9aVXT8bLy6Zd6tg2CHlv11noKpwv
 DFVV+RsJ1SiAQYSwT+4IvWfIG4oq4onBQ972g2a27pP2cxF+38GXzt4NQw==
 =xscg
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "Certain AMD processors are vulnerable to a cross-thread return address
  predictions bug. When running in SMT mode and one of the sibling
  threads transitions out of C0 state, the other thread gets access to
  twice as many entries in the RSB, but unfortunately the predictions of
  the now-halted logical processor are not purged. Therefore, the
  executing processor could speculatively execute from locations that
  the now-halted processor had trained the RSB on.

  The Spectre v2 mitigations cover the Linux kernel, as it fills the RSB
  when context switching to the idle thread. However, KVM allows a VMM
  to prevent exiting guest mode when transitioning out of C0 by using the
  KVM_CAP_X86_DISABLE_EXITS capability. To mitigate the cross-thread return
  address predictions bug, a VMM must not be allowed to override the
  default behavior of intercepting C0 transitions.

  These patches introduce a KVM module parameter that, if set, will
  prevent the user from disabling the HLT, MWAIT and CSTATE exits"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  Documentation/hw-vuln: Add documentation for Cross-Thread Return Predictions
  KVM: x86: Mitigate the cross-thread return address predictions bug
  x86/speculation: Identify processors vulnerable to SMT RSB predictions
2023-02-14 09:17:01 -08:00
Juergen Gross
f9f57da2c2 x86/mtrr: Revert 90b926e68f ("x86/pat: Fix pat_x_mtrr_type() for MTRR disabled case")
Commit

  90b926e68f ("x86/pat: Fix pat_x_mtrr_type() for MTRR disabled case")

broke the use case of running Xen dom0 kernels on machines with an
external disk enclosure attached via USB, see Link tag.

What this commit was originally fixing - SEV-SNP guests on Hyper-V - is
a more specialized situation which has other issues at the moment anyway
so reverting this now and addressing the issue properly later is the
prudent thing to do.

So revert it in time for the 6.2 proper release.

  [ bp: Rewrite commit message. ]

Reported-by: Christian Kujau <lists@nerdbynature.de>
Tested-by: Christian Kujau <lists@nerdbynature.de>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/4fe9541e-4d4c-2b2a-f8c8-2d34a7284930@nerdbynature.de
2023-02-14 10:16:34 +01:00
Tom Lendacky
6f0f2d5ef8 KVM: x86: Mitigate the cross-thread return address predictions bug
By default, KVM/SVM will intercept attempts by the guest to transition
out of C0. However, the KVM_CAP_X86_DISABLE_EXITS capability can be used
by a VMM to change this behavior. To mitigate the cross-thread return
address predictions bug (X86_BUG_SMT_RSB), a VMM must not be allowed to
override the default behavior to intercept C0 transitions.

Use a module parameter to control the mitigation on processors that are
vulnerable to X86_BUG_SMT_RSB. If the processor is vulnerable to the
X86_BUG_SMT_RSB bug and the module parameter is set to mitigate the bug,
KVM will not allow the disabling of the HLT, MWAIT and CSTATE exits.
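
A minimal sketch of such a gate (the parameter name, default and helper are
illustrative assumptions, not the verbatim KVM code):

    static bool __read_mostly mitigate_smt_rsb = true;  /* default assumed */
    module_param(mitigate_smt_rsb, bool, 0444);

    static bool kvm_can_disable_power_exits(void)
    {
        /* Keep intercepting HLT/MWAIT/CSTATE transitions when the CPU is
         * affected and the mitigation is requested. */
        return !(mitigate_smt_rsb && boot_cpu_has_bug(X86_BUG_SMT_RSB));
    }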

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <4019348b5e07148eb4d593380a5f6713b93c9a16.1675956146.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-02-10 07:27:37 -05:00
Tom Lendacky
be8de49bea x86/speculation: Identify processors vulnerable to SMT RSB predictions
Certain AMD processors are vulnerable to a cross-thread return address
predictions bug. When running in SMT mode and one of the sibling threads
transitions out of C0 state, the other sibling thread could use return
target predictions from the sibling thread that transitioned out of C0.

The Spectre v2 mitigations cover the Linux kernel, as it fills the RSB
when context switching to the idle thread. However, KVM allows a VMM to
prevent exiting guest mode when transitioning out of C0. A guest could
act maliciously in this situation, so create a new x86 BUG that can be
used to detect if the processor is vulnerable.

Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <91cec885656ca1fcd4f0185ce403a53dd9edecb7.1675956146.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-02-10 06:43:03 -05:00
Kan Liang
f545e8831e x86/cpu: Add Lunar Lake M
Intel confirmed the existence of this CPU in its Q4'2022
earnings presentation.

Add the CPU model number.

[ dhansen: Merging these as soon as possible makes it easier
	   on all the folks developing model-specific features. ]

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20230208172340.158548-1-tony.luck%40intel.com
2023-02-08 12:04:35 -08:00
Nadav Amit
ae052e3ae0 x86/kprobes: Fix 1 byte conditional jump target
Commit 3bc753c06d ("kbuild: treat char as always unsigned") broke
kprobes.  Setting a probe-point on a 1-byte conditional jump can cause the
kernel to crash when the (signed) relative jump offset gets treated as
unsigned.

Fix by replacing the unsigned 'immediate.bytes' (plus a cast) with the
signed 'immediate.value' when assigning to the relative jump offset.
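
A tiny standalone model of the signedness issue (not the kprobes code itself):

    #include <stdio.h>

    int main(void)
    {
        unsigned char rel8 = 0xf0;  /* encoded 1-byte jump offset: -16 */

        printf("as unsigned:   %ld\n", (long)rel8);               /* 240 */
        printf("sign-extended: %ld\n", (long)(signed char)rel8);  /* -16 */
        return 0;
    }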

[ dhansen: clarified changelog ]

Fixes: 3bc753c06d ("kbuild: treat char as always unsigned")
Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/20230208071708.4048-1-namit%40vmware.com
2023-02-08 12:03:27 -08:00
Joerg Roedel
9d2c7203ff x86/debug: Fix stack recursion caused by wrongly ordered DR7 accesses
In kernels compiled with CONFIG_PARAVIRT=n, the compiler re-orders the
DR7 read in exc_nmi() to happen before the call to sev_es_ist_enter().

This is problematic when running as an SEV-ES guest because in this
environment the DR7 read might cause a #VC exception, and taking #VC
exceptions is not safe in exc_nmi() before sev_es_ist_enter() has run.

The result is stack recursion if the NMI was caused on the #VC IST
stack, because a subsequent #VC exception in the NMI handler will
overwrite the stack frame of the interrupted #VC handler.

As there are no compiler barriers affecting the ordering of DR7
reads/writes, make the accesses to this register volatile, forbidding
the compiler from re-ordering them.

  [ bp: Massage text, make them volatile too, to make sure some
  aggressive compiler optimization pass doesn't discard them. ]
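
A hedged sketch of the idea (not the exact kernel accessors): the "volatile"
qualifier on the inline assembly keeps the compiler from reordering or
eliding the DR7 accesses.

    static inline unsigned long dr7_read(void)
    {
        unsigned long val;

        asm volatile("mov %%dr7, %0" : "=r" (val));
        return val;
    }

    static inline void dr7_write(unsigned long val)
    {
        asm volatile("mov %0, %%dr7" : : "r" (val));
    }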

Fixes: 315562c9af ("x86/sev-es: Adjust #VC IST Stack on entering NMI handler")
Reported-by: Alexey Kardashevskiy <aik@amd.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230127035616.508966-1-aik@amd.com
2023-01-31 12:51:19 +01:00
Linus Torvalds
bc6bc34b10 - Start checking for -mindirect-branch-cs-prefix clang support too now that LLVM
16 will support it
 
 - Fix a NULL ptr deref when suspending with Xen PV
 
 - Have a SEV-SNP guest check explicitly for features enabled by the hypervisor
   and fail gracefully if some are unsupported by the guest instead of failing in
   a non-obvious and hard-to-debug way
 
 - Fix an MSI descriptor leak under Xen
 
 - Mark Xen's MSI domain as supporting MSI-X
 
 - Prevent legacy PIC interrupts from being resent in software by marking them
   level triggered, as they should be, which led to a NULL ptr deref
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmPWdaAACgkQEsHwGGHe
 VUrHUBAAvRh7cDVKvr2wPNJ+9RFjPFubcLq/+4yTe4Bu14wnYn8+gb2S7P2bPJWl
 TxbZhJGV2IeYyB+aJybjGwX96AzE5lWD6epQLRLI/BOVmqLA8Eyw9te0NEQDd9Wq
 hgY1/hhUmLdpXubq09YjIht5nlxVZ+JFfppouTR7LVtZl7MFvQbAOU8rWzpcbm7l
 UlA4GLakFeOYpbI5g+zzsZ1a1M0cSz8wQ43plD5aAtNy2wKbWiA4QDXJo9J7+ZQg
 1FmOHZ3ChPjEhf9k/N2uAZ6RUHVEDIJ7hDfsM3j/qsKGzCV4cho2Ify+0PFWo4pt
 FO2bRh1yDFqdz4m6ulhnmuMnoRnEuswwrTzrG+HYu1ntpUt72dULmmRBpJ0s6C7W
 BHpCThNWIlcfHbLNY+VAOJ2hiOqfU/8ld+8R0sn7xmomjgCRYHh9kDGOeuvh1hJl
 jfITDuL2gjcj3Ph6+xh6KvOga41ff3EcxkfvqolZ/emRllPiDWsKYXVgUIl22FHt
 23xH7gutPCp27MdoXM1EaWs5s/PQfctvm4LrW6XS8IjWURLo4RrkU6y7YD5VKVy8
 KCKcpF+JMdE1Ao1WpMFfjDG4ObyMYyqnD790nQVM1e5kIhOpeWaa7WzkGFRyIo6Q
 5BU3GGDyMT8SHy7NFhQL62YJe0P2ZctNIDjiTQlYDnWWhjmvk8I=
 =4QMN
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_v6.2_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - Start checking for -mindirect-branch-cs-prefix clang support too now
   that LLVM 16 will support it

 - Fix a NULL ptr deref when suspending with Xen PV

 - Have a SEV-SNP guest check explicitly for features enabled by the
   hypervisor and fail gracefully if some are unsupported by the guest
   instead of failing in a non-obvious and hard-to-debug way

 - Fix an MSI descriptor leak under Xen

 - Mark Xen's MSI domain as supporting MSI-X

 - Prevent legacy PIC interrupts from being resent in software by
   marking them level triggered, as they should be, which led to a NULL
   ptr deref

* tag 'x86_urgent_for_v6.2_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/build: Move '-mindirect-branch-cs-prefix' out of GCC-only block
  acpi: Fix suspend with Xen PV
  x86/sev: Add SEV-SNP guest feature negotiation support
  x86/pci/xen: Fixup fallout from the PCI/MSI overhaul
  x86/pci/xen: Set MSI_FLAG_PCI_MSIX support in Xen MSI domain
  x86/i8259: Mark legacy PIC interrupts with IRQ_LEVEL
2023-01-29 11:17:34 -08:00
Linus Torvalds
b2f317173e ARM64:
- Pass the correct address to mte_clear_page_tags() on initialising
   a tagged page
 
 - Plug a race against a GICv4.1 doorbell interrupt while saving
   the vgic-v3 pending state.
 
 x86:
 
 - A command line parsing fix and a clang compilation fix for selftests
 
 - A fix for a longstanding VMX issue, that surprisingly was only found
   now to affect real world guests
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmPM/foUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroM18Af/ZygTp0zd0+ZEqI8lu6hi9MmL7pKu
 CbzjuJUD7iw8fUGZDyYpL7CrcAdQX7JC6cRjBQMq+9Zzh+QBc1SkkBoEwpHy/EoR
 xPOSlNmZGM3kQssqHhwC5ciLNYQQ9yEMAw0kTIoOw3/Aznjk70PUzjwIFC5fRTAB
 +ScOQj+9hkr9bzNTnIxY50Ewt6kwiZ7BEbL3a6CHCvkFkLnUAjwp/Ci6dIsqXsae
 Stlq/ZJi9QYw5Od4C0e63pfSG3MniaVT3aqisB3dEi8I4Tcpbsh7MaJf43ImFm56
 jEymmu/FYWXyMpV2Dlt3703SstXO8V9lVztsnbOVgU7/TEjFD5ADUOi7Dg==
 =WKnF
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "ARM64:

   - Pass the correct address to mte_clear_page_tags() on initialising a
     tagged page

   - Plug a race against a GICv4.1 doorbell interrupt while saving the
     vgic-v3 pending state.

  x86:

   - A command line parsing fix and a clang compilation fix for
     selftests

   - A fix for a longstanding VMX issue, that surprisingly was only
     found now to affect real world guests"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: selftests: Make reclaim_period_ms input always be positive
  KVM: x86/vmx: Do not skip segment attributes if unusable bit is set
  selftests: kvm: move declaration at the beginning of main()
  KVM: arm64: GICv4.1: Fix race with doorbell on VPE activation/deactivation
  KVM: arm64: Pass the actual page address to mte_clear_page_tags()
2023-01-24 17:48:09 -08:00
Linus Torvalds
2475bf0250 - Make sure the scheduler doesn't use stale frequency scaling values when the latter
get disabled due to a value error
 
 - Fix a NULL pointer access on UP configs
 
 - Use the proper locking when updating CPU capacity
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmPNKRsACgkQEsHwGGHe
 VUr8NA/9GTqGUpOq0iRQkEOE1oAdT4ZxJ9dQpaWjwWSNv40HT7XJBzjtvzAyiOBF
 LTUVClBct5i1/j0tVcNq1zXEj3im2e24Ki6A1TWukejgGAMT/7siSkuChEDAMg2M
 b79nKCMpuIZMzkJND3qTkW/aMPpAyU82G8BeLCjw7vgPBsbVkjgbxGFKYdHgpLZa
 kdX/GhOufu40jcGeUxA4zWTpXfuXT3OG7JYLrlHeJ/HEdzy9kLCkWH4jHHllzPQw
 c4JwG7UnNkDKD6zkG0Guzi1zQy39egh/kaj7FQmVap9sneq+x69T4ta0BfXdBXnQ
 Vsqc/nnOybUzr9Gjg5W6KRZWk0k6hK9n3+cye88BfRvzMY0KAgxeCiZEn7cuKqZp
 15Agzz77vcwt32QsxjSp+hKQxJtePcBCurDhkfuAZyPELNIBeCW6inrXArprfTIg
 IfEF068GKsvDGu3I/z49VXFZ9YBoerQREHbN/xL1VNeB8VoAxAE227OMUvhMcIu1
 jxVvkESwc9BIybTcJbUjgg2i9t+Fv3gtZBQ6GFfDboHneia4U5a9aTyN6QGjX+5P
 SIXPhePbFCNrhW7JbSGqKBd96MczbFNQinRhgX1lBU0241cAnchY4nH9fUvv7Zrt
 b/QDg58tb2bJkm1Z08L256ZETELOU9nLJVxUbtBNSSO9dfkvoBs=
 =aNSK
 -----END PGP SIGNATURE-----

Merge tag 'sched_urgent_for_v6.2_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fixes from Borislav Petkov:

 - Make sure the scheduler doesn't use stale frequency scaling values
   when the latter get disabled due to a value error

 - Fix a NULL pointer access on UP configs

 - Use the proper locking when updating CPU capacity

* tag 'sched_urgent_for_v6.2_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/aperfmperf: Erase stale arch_freq_scale values when disabling frequency invariance readings
  sched/core: Fix NULL pointer access fault in sched_setaffinity() with non-SMP configs
  sched/fair: Fixes for capacity inversion detection
  sched/uclamp: Fix a uninitialized variable warnings
2023-01-22 12:14:58 -08:00
Linus Torvalds
2b299a1cd4 - Add Emerald Rapids model support to more perf machinery
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmPNJQMACgkQEsHwGGHe
 VUpplw//XSPjrP7Ikajhwv6qQVX0ZEfSnte5qA5B+t5qu59T1DQGhBllkV7pay+V
 PuEESYkOt/MfZ6lCyoG75w80dBMlVIGi3b0HsZ9bw0ejIxm33DMM/E1ZPsKBh8i5
 H/9u26Yo77LE2HMr8Nf8HyOlLrdvBqgQBV3VHgk/eLFZnJR3Yk/T1+8ZZzLpx+TA
 5BfKvXFHWqkpe8KF20A6OxC2FQfLx5hCiY0ozOk0ZHIKzTy9g+BVaDw3XV049nj/
 BttTvhV9bdimCxVgBWxUBt5n7QoR7zE5vfief4o9UkNHeut0LmjnwcfymFwL0j21
 8yVo+layNzhTYX/7wXQchz3LV+8ZvFvVZ1Iw3hhP0yQknbDBLtDHGLTy+p9t2cgS
 1nuMNX1bZHDAQ4vV3ehYqC+MmGIh55dTRDggLvxMOzOv+v1vXMR+mlVELL+Q8KJy
 7ynNfnP88EjCQfBpxe33D8VMFJ7LqsWAIjYu2upXrlVVR8mKbp2FnSjn9cK+Np75
 Y4132ESWSvn7mHSmmcRHOjdSUxmi1UgBsL689hF9Zkx8zpHJSLhUWwW4xKKwE/dd
 /jvwZ50KzjuJYWrn/ueJpubgyeoMSZY6Y+iaCl4aYZna0X7h+PF7qY4A8pB6YXpH
 A0uVCAlNL+lkGFKAq+TkOx1lCw/eR+zg2s2ClVUdB6e8qsYYX2o=
 =9JcB
 -----END PGP SIGNATURE-----

Merge tag 'perf_urgent_for_v6.2_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fix from Borislav Petkov:

 - Add Emerald Rapids model support to more perf machinery

* tag 'perf_urgent_for_v6.2_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel/cstate: Add Emerald Rapids
  perf/x86/intel: Add Emerald Rapids
2023-01-22 12:06:18 -08:00
Nathan Chancellor
27b5de622e x86/build: Move '-mindirect-branch-cs-prefix' out of GCC-only block
LLVM 16 will have support for this flag, so move it out of the GCC-only
block to allow LLVM builds to take advantage of it.

Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://github.com/ClangBuiltLinux/linux/issues/1665
Link: 6f867f9102
Link: https://lore.kernel.org/r/20230120165826.2469302-1-nathan@kernel.org
2023-01-22 11:36:45 +01:00
Hendrik Borghorst
a44b331614 KVM: x86/vmx: Do not skip segment attributes if unusable bit is set
When serializing and deserializing kvm_sregs, attributes of the segment
descriptors are stored by user space. For unusable segments,
vmx_segment_access_rights skips all attributes and sets them to 0.

This means we zero out the DPL (Descriptor Privilege Level) for unusable
entries.

Unusable segments are - contrary to their name - usable in 64-bit mode and
are used by guests, for example, to create a linear map through the
NULL selector.

VMENTER checks if SS.DPL is correct depending on the CS segment type.
For types 9 (Execute Only) and 11 (Execute Read), CS.DPL must be equal to
SS.DPL [1].

We have seen real world guests setting CS to a usable segment with DPL=3
and SS to an unusable segment with DPL=3. Once we go through an sregs
get/set cycle, SS.DPL turns to 0. This causes the virtual machine to crash
reproducibly.

This commit changes the attribute logic to always preserve attributes for
unusable segments. According to [2], SS.DPL is always saved on VM exits,
regardless of the unusable bit, so user space applications should have saved
the information on serialization correctly.

[3] specifies that besides SS.DPL, the rest of the attributes of the
descriptors are undefined after VM entry if the unusable bit is set. So, there
should be no harm in setting them all to the previous state.

[1] Intel SDM Vol 3C 26.3.1.2 Checks on Guest Segment Registers
[2] Intel SDM Vol 3C 27.3.2 Saving Segment Registers and Descriptor-Table
Registers
[3] Intel SDM Vol 3C 26.3.2.2 Loading Guest Segment Registers and
Descriptor-Table Registers

Cc: Alexander Graf <graf@amazon.de>
Cc: stable@vger.kernel.org
Signed-off-by: Hendrik Borghorst <hborghor@amazon.de>
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Message-Id: <20221114164823.69555-1-hborghor@amazon.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-22 04:09:28 -05:00
Juergen Gross
fe0ba8c23f acpi: Fix suspend with Xen PV
Commit f1e5250094 ("x86/boot: Skip realmode init code when running as
Xen PV guest") missed one code path accessing real_mode_header, leading
to dereferencing NULL when suspending the system under Xen:

    [  348.284004] PM: suspend entry (deep)
    [  348.289532] Filesystems sync: 0.005 seconds
    [  348.291545] Freezing user space processes ... (elapsed 0.000 seconds) done.
    [  348.292457] OOM killer disabled.
    [  348.292462] Freezing remaining freezable tasks ... (elapsed 0.104 seconds) done.
    [  348.396612] printk: Suspending console(s) (use no_console_suspend to debug)
    [  348.749228] PM: suspend devices took 0.352 seconds
    [  348.769713] ACPI: EC: interrupt blocked
    [  348.816077] BUG: kernel NULL pointer dereference, address: 000000000000001c
    [  348.816080] #PF: supervisor read access in kernel mode
    [  348.816081] #PF: error_code(0x0000) - not-present page
    [  348.816083] PGD 0 P4D 0
    [  348.816086] Oops: 0000 [#1] PREEMPT SMP NOPTI
    [  348.816089] CPU: 0 PID: 6764 Comm: systemd-sleep Not tainted 6.1.3-1.fc32.qubes.x86_64 #1
    [  348.816092] Hardware name: Star Labs StarBook/StarBook, BIOS 8.01 07/03/2022
    [  348.816093] RIP: e030:acpi_get_wakeup_address+0xc/0x20

Fix that by adding an optional ACPI callback that allows skipping the
setting of the wakeup address, as in the Xen PV case this will be handled
by the hypervisor anyway.

Fixes: f1e5250094 ("x86/boot: Skip realmode init code when running as Xen PV guest")
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/all/20230117155724.22940-1-jgross%40suse.com
2023-01-19 13:52:05 -08:00
Nikunj A Dadhania
8c29f01654 x86/sev: Add SEV-SNP guest feature negotiation support
The hypervisor can enable various new features (SEV_FEATURES[1:63]) and start an
SNP guest. Some of these features need a guest-side implementation. If any of
these features are enabled without one, the behavior of the SNP guest will be
undefined.  It may fail booting in a non-obvious way, making it difficult to
debug.

Instead of allowing the guest to continue and have it fail randomly later,
detect this early and fail gracefully.

The SEV_STATUS MSR indicates features which the hypervisor has enabled.  While
booting, SNP guests should ascertain that all the enabled features have a
guest-side implementation. In case a feature is not implemented in the guest, the
guest terminates booting with a GHCB protocol Non-Automatic Exit (NAE) termination
request event; see the "SEV-ES Guest-Hypervisor Communication Block Standardization"
document (currently at https://developer.amd.com/wp-content/resources/56421.pdf),
section "Termination Request".

Populate SW_EXITINFO2 with the mask of unsupported features that the hypervisor can
easily report to the user.

More details in the AMD64 APM Vol 2, Section "SEV_STATUS MSR".
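
Conceptually, the check boils down to the following sketch (names such as
SNP_FEATURES_IMPLEMENTED and snp_abort_with_mask() are placeholders, not the
real identifiers):

    u64 sev_status  = __rdmsr(MSR_AMD64_SEV);
    u64 unsupported = sev_status & ~SNP_FEATURES_IMPLEMENTED;

    if (unsupported) {
        /* Report the offending feature mask via SW_EXITINFO2 in the GHCB
         * termination request and stop booting. */
        snp_abort_with_mask(unsupported);
    }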

  [ bp:
    - Massage.
    - Move snp_check_features() call to C code.
    Note: the CC:stable@ aspect here is to be able to protect older, stable
    kernels when running on newer hypervisors. Or not "running" but fail
    reliably and in a well-defined manner instead of randomly. ]

Fixes: cbd3d4f7c4 ("x86/sev: Check SEV-SNP features support")
Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/20230118061943.534309-1-nikunj@amd.com
2023-01-19 17:29:58 +01:00
Kan Liang
5a8a05f165 perf/x86/intel/cstate: Add Emerald Rapids
From the perspective of Intel cstate residency counters,
Emerald Rapids is the same as Sapphire Rapids and Ice Lake.
Add Emerald Rapids model.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230106160449.3566477-2-kan.liang@linux.intel.com
2023-01-18 12:42:49 +01:00
Kan Liang
6795e558e9 perf/x86/intel: Add Emerald Rapids
From the core PMU's perspective, Emerald Rapids is the same as Sapphire
Rapids. The only difference is the event list, which will be
supported in the perf tool later.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230106160449.3566477-1-kan.liang@linux.intel.com
2023-01-18 12:42:49 +01:00
Thomas Gleixner
6c796996ee x86/pci/xen: Fixup fallout from the PCI/MSI overhaul
David reported that the recent PCI/MSI rework results in MSI descriptor
leakage under XEN.

This is caused by:

  1) The missing MSI_FLAG_FREE_MSI_DESCS flag in the XEN MSI domain info,
     which is required now that PCI/MSI delegates descriptor freeing to
     the core MSI code.

  2) Not disassociating the interrupts on teardown, by setting the
     msi_desc::irq to 0. This was not required before because the teardown
     was unconditional and did not check whether an MSI descriptor was still
     connected to a Linux interrupt.

On further inspection it came to light that the MSI_FLAG_DEV_SYSFS is
missing in the XEN MSI domain info as well to restore the pre 6.2 status
quo.

Add the missing MSI flags and disassociate the MSI descriptor from the
Linux interrupt in the XEN specific teardown function.
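
Roughly, the two pieces look like this (a sketch, not the verbatim patch;
'info', 'dev' and 'desc' are the usual domain info, PCI device and MSI
descriptor variables):

    /* Domain info: let the MSI core free descriptors and keep the
     * pre-6.2 sysfs behaviour. */
    info->flags |= MSI_FLAG_FREE_MSI_DESCS | MSI_FLAG_DEV_SYSFS;

    /* Teardown: disassociate each descriptor from its Linux interrupt. */
    msi_for_each_desc(desc, &dev->dev, MSI_DESC_ASSOCIATED) {
        xen_destroy_irq(desc->irq);
        desc->irq = 0;
    }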

Fixes: b2bdda205c ("PCI/MSI: Let the MSI core free descriptors")
Fixes: 2f2940d168 ("genirq/msi: Remove filter from msi_free_descs_free_range()")
Fixes: ffd84485e6 ("PCI/MSI: Let the irq code handle sysfs groups")
Reported-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lore.kernel.org/r/871qnunycr.ffs@tglx
2023-01-16 20:40:44 +01:00
David Woodhouse
0a3a58de31 x86/pci/xen: Set MSI_FLAG_PCI_MSIX support in Xen MSI domain
The Xen MSI → PIRQ magic does support MSI-X, so advertise it.

(In fact it's better off with MSI-X than MSI, because it's actually
broken by design for 32-bit MSI, since it puts the high bits of the
PIRQ# into the high 32 bits of the MSI message address, instead of the
Extended Destination ID field which is in bits 4-11.)

Strictly speaking, this really fixes a much older commit 2e4386eba0
("x86/xen: Wrap XEN MSI management into irqdomain") which failed to set
the flag. But that never really mattered until __pci_enable_msix_range()
started to check and bail out early. So in 6.2-rc we see failures e.g.
to bring up networking on an Amazon EC2 m4.16xlarge instance:

[   41.498694] ena 0000:00:03.0 (unnamed net_device) (uninitialized): Failed to enable MSI-X. irq_cnt -524
[   41.498705] ena 0000:00:03.0: Can not reserve msix vectors
[   41.498712] ena 0000:00:03.0: Failed to enable and set the admin interrupts

Side note: This is the first bug found, and first patch tested, by running
Xen guests under QEMU/KVM instead of running under actual Xen.

Fixes: 99f3d27976 ("PCI/MSI: Reject MSI-X early")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/4bffa69a949bfdc92c4a18e5a1c3cbb3b94a0d32.camel@infradead.org
2023-01-16 20:40:44 +01:00
Thomas Gleixner
5fa5595072 x86/i8259: Mark legacy PIC interrupts with IRQ_LEVEL
Baoquan reported that after triggering a crash the subsequent crash-kernel
fails to boot about half of the time. It triggers a NULL pointer
dereference in the periodic tick code.

This happens because the legacy timer interrupt (IRQ0) is resent in
software which happens in soft interrupt (tasklet) context. In this context
get_irq_regs() returns NULL which leads to the NULL pointer dereference.

The reason for the resend is a spurious APIC interrupt on the IRQ0 vector
which is captured and leads to a resend when the legacy timer interrupt is
enabled. This is wrong because the legacy PIC interrupts are level
triggered and therefore should never be resent in software, but nothing
ever sets the IRQ_LEVEL flag on those interrupts, so the core code does not
know about their trigger type.

Ensure that IRQ_LEVEL is set when the legacy PIC interrupts are set up.
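
In sketch form (illustrative, not the verbatim patch):

    int i;

    /* Legacy PIC interrupts are level triggered; tell the core so that it
     * never tries to resend them in software. */
    for (i = 0; i < nr_legacy_irqs(); i++)
        irq_set_status_flags(i, IRQ_LEVEL);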

Fixes: a4633adcdb ("[PATCH] genirq: add genirq sw IRQ-retrigger")
Reported-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Baoquan He <bhe@redhat.com>
Link: https://lore.kernel.org/r/87mt6rjrra.ffs@tglx
2023-01-16 17:24:56 +01:00
Yair Podemsky
5f5cc9ed99 x86/aperfmperf: Erase stale arch_freq_scale values when disabling frequency invariance readings
Once disable_freq_invariance_work is called, the scale_freq_tick function
will not compute or update the arch_freq_scale values.
However, the scheduler will still read these values and use them.
The result is that the scheduler might make unfair decisions based on stale
values.

This patch adds the step of setting the arch_freq_scale values for all
CPUs to the default (max) value SCHED_CAPACITY_SCALE. Once all CPUs
have the same arch_freq_scale value, the scaling is meaningless.
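
In sketch form (illustrative, not the exact patch):

    int cpu;

    /* With frequency invariance disabled, park every CPU at the neutral
     * (maximum) scale so the scheduler no longer sees stale ratios. */
    for_each_present_cpu(cpu)
        per_cpu(arch_freq_scale, cpu) = SCHED_CAPACITY_SCALE;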

Signed-off-by: Yair Podemsky <ypodemsk@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230110160206.75912-1-ypodemsk@redhat.com
2023-01-16 10:19:15 +01:00
Linus Torvalds
f0f70ddb8f - Make sure the poking PGD is pinned for Xen PV as it requires it this way
- Fixes for two resctrl races when moving a task or creating a new monitoring
   group
 
 - Fix SEV-SNP guests running under Hyper-V where MTRRs are disabled to not return
   a UC- mapping type on memremap() and thus cause a serious slowdown
 
 - Fix insn mnemonics in bioscall.S now that binutils is starting to fix
   confusing insn suffixes
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmPD5xsACgkQEsHwGGHe
 VUr65g/8CkfKQKIQ/kPn1B+M/PI4S8DBmz7CdufQTbB66GSfDwRpGnxIKJKZM1UG
 pyOmP1kHVXGGCsvFQimalxnBtFx6t3wFS+R+c/l5pRtgU63bwQpNjJ+vBcBpb4xs
 J7f06VgF6jB8qk0NqXBKNJt4kauZA50wiErYm8PU/yf0tmHratjrvDfKQmus9pI4
 AMcQEudRhDDo8xqn8fvIC/S7xm9/TgNzKxYP+HciPa74HZm5vrjS0hIyZkIsh5d7
 Q4MsP4VaBH12W6MbpF9TBw5VTDomwY8oS4xTNfkhyOTcHi8uLrMjA66VmaJwWXpe
 EZjOk4+KhtxYigaI5oQO9M5e9IINK6ZfbzU65P74wTvP3orszsBPG7QvodIzLc76
 YI1GEea/bXgxYgPuMR6spZoGMS58K7BXgViVMVRwZ9bZk17+5pfbLkzKqZsfw/s/
 nj3n8z4ayoc+ffDPllh0471Dn16ugKMf+EvW2Su1Q1QoaA7icNNkZrKaM6eSuoFr
 bClsrHeglQFadwy4kmY0fi7BUfiKTp+c5Ur7lM7VojrjH0XcoZreThVg0tNrk3ZR
 IAyBjtCdtg1rzrRfb7qBf6B+WNGdxYWGuDgOPiH1Xh+/plvyCqQ2/3AGWQAbu/qP
 to+IYmZg09mRpVBDYpCY4z6IzRbMJ9rRho4rNXCQUeAzSVMCrZE=
 =W2zR
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_v6.2_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - Make sure the poking PGD is pinned for Xen PV as it requires it this
   way

 - Fixes for two resctrl races when moving a task or creating a new
   monitoring group

 - Fix SEV-SNP guests running under Hyper-V where MTRRs are disabled to
   not return a UC- mapping type on memremap() and thus cause a
   serious slowdown

 - Fix insn mnemonics in bioscall.S now that binutils is starting to fix
   confusing insn suffixes

* tag 'x86_urgent_for_v6.2_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: fix poking_init() for Xen PV guests
  x86/resctrl: Fix event counts regression in reused RMIDs
  x86/resctrl: Fix task CLOSID/RMID update race
  x86/pat: Fix pat_x_mtrr_type() for MTRR disabled case
  x86/boot: Avoid using Intel mnemonics in AT&T syntax asm
2023-01-15 07:17:44 -06:00
Linus Torvalds
9e058c2952 pci-v6.2-fixes-1
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmPBniAUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vwOjRAAhjyRAgyiZV2rWS4pyvpQpqcpZWD9
 796ZSqnzLJjVYCymGvUTX23FEA48n59+bCM/WpfEGUPrBf8LZQxC9YOCm6ltuM8+
 FoSBykW/tHPq5IWaLzgrWpHeDOgEnZu/WFGGvrV3tl1mLpM1SJT8bGDsjHXlo+FM
 qkTEiA3nUEKQs5x9r2TTLCeUWGPNTIHNd2VfuxOqM3qC/nVCOfTTxU8nm6Lk7Eix
 nboAugAIADJIjs/+ZGekLBuzZYPkLYuDTyMYJ5hdo1p7wWCLc9gArEqvXKwVgmD3
 ptenZeOlQi9Ay45HmkfIgfgKeeQ7REJj3dx04vf67neAianyUrB0EZDqDjR7LmgM
 ozlNt0XjyoeEhu6AQS0s1LZtbDiED1R/00P6Gb+YEjUCVipW2lEYYwP0v9dsnNoh
 6wblgnkQoxLFM+5CAXRmCmpaoQn0Uam7okfVeohtsz8/kNQF2St0hjzr4Dmws+O3
 k9PUqnnUl4ByElzpEDesVGZMJ3pxFVH15ufu8VnRqN60pLTvNrsPyU4cVnG176Rc
 3RSDN3zMtPxnHJVy4r3bTNEZsX/7RUrOb4xScOXMmRDBMUc8QdscF8Oj1ucKlj5j
 mp7vB/7+VjU96uRarRyqUxGeQc77DCTcvOa1IGh/cuYom8ZJ6vpSCpKy6f6SFGuf
 i8iTTUcQKCdqVW4=
 =Fv2v
 -----END PGP SIGNATURE-----

Merge tag 'pci-v6.2-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull pci fixes from Bjorn Helgaas:

 - Work around apparent firmware issue that made Linux reject MMCONFIG
   space, which broke PCI extended config space (Bjorn Helgaas)

 - Fix CONFIG_PCIE_BT1 dependency due to mid-air collision between a
   PCI_MSI_IRQ_DOMAIN -> PCI_MSI change and addition of PCIE_BT1 (Lukas
   Bulwahn)

* tag 'pci-v6.2-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  x86/pci: Treat EfiMemoryMappedIO as reservation of ECAM space
  x86/pci: Simplify is_mmconf_reserved() messages
  PCI: dwc: Adjust to recent removal of PCI_MSI_IRQ_DOMAIN
2023-01-13 17:32:22 -06:00
Linus Torvalds
92783a90bc ARM:
* Fix the PMCR_EL0 reset value after the PMU rework
 
 * Correctly handle S2 fault triggered by a S1 page table walk
   by not always classifying it as a write, as this breaks on
   R/O memslots
 
 * Document why we cannot exit with KVM_EXIT_MMIO when taking
   a write fault from a S1 PTW on a R/O memslot
 
 * Put the Apple M2 on the naughty list for not being able to
   correctly implement the vgic SEIS feature, just like the M1
   before it
 
 * Reviewer updates: Alex is stepping down, replaced by Zenghui
 
 x86:
 
 * Fix various rare locking issues in Xen emulation and teach lockdep
   to detect them
 
 * Documentation improvements
 
 * Do not return host topology information from KVM_GET_SUPPORTED_CPUID
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmPAT3EUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroPmDAf+ICCVMwgm+PjAc6NuXzaUk6BFGWKF
 1lzMvnKb6ARnhMKwyjl/Sf5EgnTuucnSTBHuE1kjaLkPUDNJvi4oRXVdDwKjtXnZ
 Zxk4dpsNLWVfALHTk1KweIkR5KNif0kugUh9RNp6zOBnoTVRh8XdCHpeDv73tJaG
 R1gCAreVTDbp+wNrVpiImUfYAZ4GrGpwwWRH/xLAGDWoTL9Z9J5tQygf+0C429n/
 eJoTrToLjESbYadDgCNDD+TUkHbeDVg8aeio2JZga9SvH3RBhwriLqz26v9yvikL
 UoY96AySMaiox4pgCUYUl8nng8MR8AG4C4vpNnLalj7tfHxRfhtAwD0EYw==
 =gDOV
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "ARM:

   - Fix the PMCR_EL0 reset value after the PMU rework

   - Correctly handle S2 fault triggered by a S1 page table walk by not
     always classifying it as a write, as this breaks on R/O memslots

   - Document why we cannot exit with KVM_EXIT_MMIO when taking a write
     fault from a S1 PTW on a R/O memslot

   - Put the Apple M2 on the naughty list for not being able to
     correctly implement the vgic SEIS feature, just like the M1 before
     it

   - Reviewer updates: Alex is stepping down, replaced by Zenghui

  x86:

   - Fix various rare locking issues in Xen emulation and teach lockdep
     to detect them

   - Documentation improvements

   - Do not return host topology information from KVM_GET_SUPPORTED_CPUID"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86/xen: Avoid deadlock by adding kvm->arch.xen.xen_lock leaf node lock
  KVM: Ensure lockdep knows about kvm->lock vs. vcpu->mutex ordering rule
  KVM: x86/xen: Fix potential deadlock in kvm_xen_update_runstate_guest()
  KVM: x86/xen: Fix lockdep warning on "recursive" gpc locking
  Documentation: kvm: fix SRCU locking order docs
  KVM: x86: Do not return host topology information from KVM_GET_SUPPORTED_CPUID
  KVM: nSVM: clarify recalc_intercepts() wrt CR8
  MAINTAINERS: Remove myself as a KVM/arm64 reviewer
  MAINTAINERS: Add Zenghui Yu as a KVM/arm64 reviewer
  KVM: arm64: vgic: Add Apple M2 cpus to the list of broken SEIS implementations
  KVM: arm64: Convert FSC_* over to ESR_ELx_FSC_*
  KVM: arm64: Document the behaviour of S1PTW faults on RO memslots
  KVM: arm64: Fix S1PTW handling on RO memslots
  KVM: arm64: PMU: Fix PMCR_EL0 reset value
2023-01-13 14:41:50 -06:00
Bjorn Helgaas
fd3a8cff4d x86/pci: Treat EfiMemoryMappedIO as reservation of ECAM space
Normally we reject ECAM space unless it is reported as reserved in the E820
table or via a PNP0C02 _CRS method (PCI Firmware, r3.3, sec 4.1.2).

07eab0901e ("efi/x86: Remove EfiMemoryMappedIO from E820 map") removes
E820 entries that correspond to EfiMemoryMappedIO regions because some
other firmware uses EfiMemoryMappedIO for PCI host bridge windows, and the
E820 entries prevent Linux from allocating BAR space for hot-added devices.

Some firmware doesn't report ECAM space via PNP0C02 _CRS methods, but does
mention it as an EfiMemoryMappedIO region via EFI GetMemoryMap(), which is
normally converted to an E820 entry by a bootloader or EFI stub.  After
07eab0901e, that E820 entry is removed, so we reject this ECAM space,
which makes PCI extended config space (offsets 0x100-0xfff) inaccessible.

The lack of extended config space breaks anything that relies on it,
including perf, VSEC telemetry, EDAC, QAT, SR-IOV, etc.

Allow use of ECAM for extended config space when the region is covered by
an EfiMemoryMappedIO region, even if it's not included in E820 or PNP0C02
_CRS.

Link: https://lore.kernel.org/r/ac2693d8-8ba3-72e0-5b66-b3ae008d539d@linux.intel.com
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216891
Fixes: 07eab0901e ("efi/x86: Remove EfiMemoryMappedIO from E820 map")
Link: https://lore.kernel.org/r/20230110180243.1590045-3-helgaas@kernel.org
Reported-by: Kan Liang <kan.liang@linux.intel.com>
Reported-by: Tony Luck <tony.luck@intel.com>
Reported-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reported-by: Yunying Sun <yunying.sun@intel.com>
Reported-by: Baowen Zheng <baowen.zheng@corigine.com>
Reported-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reported-by: Yang Lixiao <lixiao.yang@intel.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Tested-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Yunying Sun <yunying.sun@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Rafael J. Wysocki <rafael@kernel.org>
2023-01-13 11:53:54 -06:00
Linus Torvalds
d45b832d6f arm64 fixes for -rc4
- Fix PAGE_TABLE_CHECK failures on hugepage splitting path
 
 - Fix PSCI encoding of MEM_PROTECT_RANGE function in UAPI header
 
 - Fix NULL deref when accessing debugfs node if PSCI is not present
 
 - Fix MTE core dumping when VMA list is being updated concurrently
 
 - Fix SME signal frame handling when SVE is not implemented by the CPU
 
 - Fix asm constraints for cmpxchg_double() to hazard both words
 
 - Fix build failure with stack tracer and older versions of Clang
 
 - Bring back workaround for Cortex-A715 erratum 2645198
 -----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmO9SzwQHHdpbGxAa2Vy
 bmVsLm9yZwAKCRC3rHDchMFjNLdYB/9pX4El38TX4Y4M6sR2yl+m1rkGRiU4nV3N
 MKJ3ZVjrx87QZ8CKVYmJbnHzolN0Art9WvqFnyxtPMBlZyWzHjtsrQnad3VwLDOu
 4qmqjDCXvPod1EncCxBiGu28FZ88HoLqhnwWB6O2Su6TlczD0kJTfzincdyzqvi2
 r0uUlBd9gtFt3sjV+sLPjE6NqMf9MfhoOLLafijz7ZMElQL+2/BjZxhpHLaWhUz1
 aHIp4w841TJOuSlCwstX20Nc6Q9+6ta07bw+TD/flyQ+IGUptgDEoIrpjdSO5b2t
 zFFHHN5IXovAJPDfhAdXGAbC2SDFyYJtURCpv6hVt/SSsilGEbYg
 =241k
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Will Deacon:
 "Here's a sizeable batch of Friday the 13th arm64 fixes for -rc4. What
  could possibly go wrong?

  The obvious reason we have so much here is because of the holiday
  season right after the merge window, but we've also brought back an
  erratum workaround that was previously dropped at the last minute and
  there's an MTE coredumping fix that strays outside of the arch/arm64
  directory.

  Summary:

   - Fix PAGE_TABLE_CHECK failures on hugepage splitting path

   - Fix PSCI encoding of MEM_PROTECT_RANGE function in UAPI header

   - Fix NULL deref when accessing debugfs node if PSCI is not present

   - Fix MTE core dumping when VMA list is being updated concurrently

   - Fix SME signal frame handling when SVE is not implemented by the
     CPU

   - Fix asm constraints for cmpxchg_double() to hazard both words

   - Fix build failure with stack tracer and older versions of Clang

   - Bring back workaround for Cortex-A715 erratum 2645198"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: Fix build with CC=clang, CONFIG_FTRACE=y and CONFIG_STACK_TRACER=y
  arm64/mm: Define dummy pud_user_exec() when using 2-level page-table
  arm64: errata: Workaround possible Cortex-A715 [ESR|FAR]_ELx corruption
  firmware/psci: Don't register with debugfs if PSCI isn't available
  firmware/psci: Fix MEM_PROTECT_RANGE function numbers
  arm64/signal: Always allocate SVE signal frames on SME only systems
  arm64/signal: Always accept SVE signal frames on SME only systems
  arm64/sme: Fix context switch for SME only systems
  arm64: cmpxchg_double*: hazard against entire exchange variable
  arm64/uprobes: change the uprobe_opcode_t typedef to fix the sparse warning
  arm64: mte: Avoid the racy walk of the vma list during core dump
  elfcore: Add a cprm parameter to elf_core_extra_{phdrs,data_size}
  arm64: mte: Fix double-freeing of the temporary tag storage during coredump
  arm64: ptrace: Use ARM64_SME to guard the SME register enumerations
  arm64/mm: add pud_user_exec() check in pud_user_accessible_page()
  arm64/mm: fix incorrect file_map_count for invalid pmd
2023-01-13 07:11:45 -06:00
Linus Torvalds
bad8c4a850 xen: branch for v6.2-rc4
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCY76ohgAKCRCAXGG7T9hj
 vo8fAP0XJ94B7asqcN4W3EyeyfqxUf1eZvmWRhrbKqpLnmHLaQEA/uJBkXL49Zj7
 TTcbxR1coJ/hPwhtmONU4TNtCZ+RXw0=
 =2Ib5
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-6.2-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:

 - two cleanup patches

 - a fix of a memory leak in the Xen pvfront driver

 - a fix of a locking issue in the Xen hypervisor console driver

* tag 'for-linus-6.2-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/pvcalls: free active map buffer on pvcalls_front_free_map
  hvc/xen: lock console list traversal
  x86/xen: Remove the unused function p2m_index()
  xen: make remove callback of xen driver void returned
2023-01-12 17:02:20 -06:00
Juergen Gross
26ce6ec364 x86/mm: fix poking_init() for Xen PV guests
Commit 3f4c8211d9 ("x86/mm: Use mm_alloc() in poking_init()") broke
the kernel when running as a Xen PV guest.

It seems as if the new address space is never activated before being
used, resulting in Xen refusing to accept the new CR3 value (the PGD
isn't pinned).

Fix that by adding the now missing call of paravirt_arch_dup_mmap() to
poking_init(). That call was previously done by dup_mm()->dup_mmap() and
it is a NOP for all cases but for Xen PV, where it is just doing the
pinning of the PGD.
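
In other words, the missing piece is roughly (a sketch, not the verbatim
patch):

    poking_mm = mm_alloc();
    BUG_ON(!poking_mm);

    /* On Xen PV this pins the new PGD; it is a NOP everywhere else. */
    paravirt_arch_dup_mmap(&init_mm, poking_mm);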

Fixes: 3f4c8211d9 ("x86/mm: Use mm_alloc() in poking_init()")
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20230109150922.10578-1-jgross@suse.com
2023-01-12 11:22:20 +01:00
David Woodhouse
310bc39546 KVM: x86/xen: Avoid deadlock by adding kvm->arch.xen.xen_lock leaf node lock
In commit 14243b3871 ("KVM: x86/xen: Add KVM_IRQ_ROUTING_XEN_EVTCHN
and event channel delivery") the clever version of me left some helpful
notes for those who would come after him:

       /*
        * For the irqfd workqueue, using the main kvm->lock mutex is
        * fine since this function is invoked from kvm_set_irq() with
        * no other lock held, no srcu. In future if it will be called
        * directly from a vCPU thread (e.g. on hypercall for an IPI)
        * then it may need to switch to using a leaf-node mutex for
        * serializing the shared_info mapping.
        */
       mutex_lock(&kvm->lock);

In commit 2fd6df2f2b ("KVM: x86/xen: intercept EVTCHNOP_send from guests")
the other version of me ran straight past that comment without reading it,
and introduced a potential deadlock by taking vcpu->mutex and kvm->lock
in the wrong order.

Solve this as originally suggested, by adding a leaf-node lock in the Xen
state rather than using kvm->lock for it.

Fixes: 2fd6df2f2b ("KVM: x86/xen: intercept EVTCHNOP_send from guests")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20230111180651.14394-4-dwmw2@infradead.org>
[Rebase, add docs. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-11 17:45:58 -05:00
Bjorn Helgaas
a48fe63769 x86/pci: Simplify is_mmconf_reserved() messages
is_mmconf_reserved() takes a "with_e820" parameter that only determines the
message logged if it finds the MMCONFIG region is reserved.  Pass the
message directly, which will simplify a future patch that adds a new way of
looking for that reservation.  No functional change intended.

Link: https://lore.kernel.org/r/20230110180243.1590045-2-helgaas@kernel.org
Tested-by: Tony Luck <tony.luck@intel.com>
Tested-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
2023-01-11 16:28:09 -06:00
David Woodhouse
bbe17c625d KVM: x86/xen: Fix potential deadlock in kvm_xen_update_runstate_guest()
The kvm_xen_update_runstate_guest() function can be called when the vCPU
is being scheduled out, from a preempt notifier. It *opportunistically*
updates the runstate area in the guest memory, if the gfn_to_pfn_cache
which caches the appropriate address is still valid.

If there is *contention* when it attempts to obtain gpc->lock, then
locking inside the priority inheritance checks may cause a deadlock.
Lockdep reports:

[13890.148997] Chain exists of:
                 &gpc->lock --> &p->pi_lock --> &rq->__lock

[13890.149002]  Possible unsafe locking scenario:

[13890.149003]        CPU0                    CPU1
[13890.149004]        ----                    ----
[13890.149005]   lock(&rq->__lock);
[13890.149007]                                lock(&p->pi_lock);
[13890.149009]                                lock(&rq->__lock);
[13890.149011]   lock(&gpc->lock);
[13890.149013]
                *** DEADLOCK ***

In the general case, if there's contention for a read lock on gpc->lock,
that's going to be because something else is either invalidating or
revalidating the cache. Either way, we've raced with seeing it in an
invalid state, in which case we would have aborted the opportunistic
update anyway.

So in the 'atomic' case when called from the preempt notifier, just
switch to using read_trylock() and avoid the PI handling altogether.
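
A hedged sketch of that pattern (simplified; 'atomic' is the flag indicating
the preempt-notifier path):

    if (atomic) {
        /* Never block here: if gpc->lock is contended, the cache is being
         * invalidated or revalidated anyway, so skip the opportunistic
         * update instead of entering the PI machinery. */
        if (!read_trylock(&gpc->lock))
            return;
    } else {
        read_lock(&gpc->lock);
    }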

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20230111180651.14394-2-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-11 13:32:21 -05:00
David Woodhouse
23e60258ae KVM: x86/xen: Fix lockdep warning on "recursive" gpc locking
In commit 5ec3289b31 ("KVM: x86/xen: Compatibility fixes for shared runstate
area") we declared it safe to obtain two gfn_to_pfn_cache locks at the same
time:
	/*
	 * The guest's runstate_info is split across two pages and we
	 * need to hold and validate both GPCs simultaneously. We can
	 * declare a lock ordering GPC1 > GPC2 because nothing else
	 * takes them more than one at a time.
	 */

However, we forgot to tell lockdep. Do so, by setting a subclass on the
first lock before taking the second.

Fixes: 5ec3289b31 ("KVM: x86/xen: Compatibility fixes for shared runstate area")
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20230111180651.14394-1-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-11 13:32:21 -05:00
Peter Newman
2a81160d29 x86/resctrl: Fix event counts regression in reused RMIDs
When creating a new monitoring group, the RMID allocated for it may have
been used by a group which was previously removed. In this case, the
hardware counters will have non-zero values which should be deducted
from what is reported in the new group's counts.

resctrl_arch_reset_rmid() initializes the prev_msr value for counters to
0, causing the initial count to be charged to the new group. Resurrect
__rmid_read() and use it to initialize prev_msr correctly.

Unlike before, __rmid_read() checks for error bits in the MSR read so
that callers don't need to.

Fixes: 1d81d15db3 ("x86/resctrl: Move mbm_overflow_count() into resctrl_arch_rmid_read()")
Signed-off-by: Peter Newman <peternewman@google.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20221220164132.443083-1-peternewman@google.com
2023-01-10 19:51:59 +01:00
Peter Newman
fe1f071438 x86/resctrl: Fix task CLOSID/RMID update race
When the user moves a running task to a new rdtgroup using the task's
file interface or by deleting its rdtgroup, the resulting change in
CLOSID/RMID must be immediately propagated to the PQR_ASSOC MSR on the
task(s) CPUs.

x86 allows reordering loads with prior stores, so if the task starts
running between a task_curr() check that the CPU hoisted before the
stores in the CLOSID/RMID update, then it can keep running with the old
CLOSID/RMID until it is switched again, because __rdtgroup_move_task()
failed to determine that it needs to be interrupted to obtain the new
CLOSID/RMID.

Refer to the diagram below:

CPU 0                                   CPU 1
-----                                   -----
__rdtgroup_move_task():
  curr <- t1->cpu->rq->curr
                                        __schedule():
                                          rq->curr <- t1
                                        resctrl_sched_in():
                                          t1->{closid,rmid} -> {1,1}
  t1->{closid,rmid} <- {2,2}
  if (curr == t1) // false
   IPI(t1->cpu)

A similar race impacts rdt_move_group_tasks(), which updates tasks in a
deleted rdtgroup.

In both cases, use smp_mb() to order the task_struct::{closid,rmid}
stores before the loads in task_curr().  In particular, in the
rdt_move_group_tasks() case, simply execute an smp_mb() on every
iteration with a matching task.
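
Conceptually, the required ordering looks like this (a sketch of the
idea, not the exact patch; the update helper and IPI call shown here are
illustrative):

  WRITE_ONCE(tsk->closid, rdtgrp->closid);
  WRITE_ONCE(tsk->rmid, rdtgrp->mon.rmid);

  /*
   * Order the closid/rmid stores above before the loads in task_curr()
   * below; otherwise the CPU may hoist the check and miss that the
   * task now needs an IPI to reload PQR_ASSOC.
   */
  smp_mb();

  if (task_curr(tsk))
          smp_call_function_single(task_cpu(tsk), update_cpu_closid_rmid,
                                   rdtgrp, 1);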

It is possible to use a single smp_mb() in rdt_move_group_tasks(), but
this would require two passes and a means of remembering which
task_structs were updated in the first loop. However, benchmarking
results below showed too little performance impact in the simple
approach to justify implementing the two-pass approach.

Times below were collected using `perf stat` to measure the time to
remove a group containing a 1600-task, parallel workload.

CPU: Intel(R) Xeon(R) Platinum P-8136 CPU @ 2.00GHz (112 threads)

  # mkdir /sys/fs/resctrl/test
  # echo $$ > /sys/fs/resctrl/test/tasks
  # perf bench sched messaging -g 40 -l 100000

task-clock time ranges collected using:

  # perf stat rmdir /sys/fs/resctrl/test

Baseline:                     1.54 - 1.60 ms
smp_mb() every matching task: 1.57 - 1.67 ms

  [ bp: Massage commit message. ]

Fixes: ae28d1aae4 ("x86/resctrl: Use an IPI instead of task_work_add() to update PQR_ASSOC MSR")
Fixes: 0efc89be94 ("x86/intel_rdt: Update task closid immediately on CPU in rmdir and unmount")
Signed-off-by: Peter Newman <peternewman@google.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/20221220161123.432120-1-peternewman@google.com
2023-01-10 19:47:30 +01:00
Juergen Gross
90b926e68f x86/pat: Fix pat_x_mtrr_type() for MTRR disabled case
Since

  72cbc8f04f ("x86/PAT: Have pat_enabled() properly reflect state when running on Xen")

PAT can be enabled without MTRR.

This has resulted in problems, e.g. for a SEV-SNP guest running under Hyper-V,
when trying to establish a new mapping via memremap() with WB caching mode:
pat_x_mtrr_type() calls mtrr_type_lookup(), which in turn returns
MTRR_TYPE_INVALID because MTRR is disabled in this configuration.

The result is a mapping with UC- caching, leading to severe performance
degradation.

Fix that by handling MTRR_TYPE_INVALID the same way as MTRR_TYPE_WRBACK
in pat_x_mtrr_type() because MTRR_TYPE_INVALID means MTRRs are disabled.
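
A sketch of the resulting logic (simplified from the description above,
not the verbatim diff):

  static unsigned long pat_x_mtrr_type(u64 start, u64 end,
                                       enum page_cache_mode req_type)
  {
          if (req_type == _PAGE_CACHE_MODE_WB) {
                  u8 mtrr_type, uniform;

                  mtrr_type = mtrr_type_lookup(start, end, &uniform);
                  /*
                   * MTRR_TYPE_INVALID means MTRRs are disabled, so keep
                   * the requested WB type instead of degrading to UC-.
                   */
                  if (mtrr_type != MTRR_TYPE_WRBACK &&
                      mtrr_type != MTRR_TYPE_INVALID)
                          return _PAGE_CACHE_MODE_UC_MINUS;

                  return _PAGE_CACHE_MODE_WB;
          }

          return req_type;
  }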

  [ bp: Massage commit message. ]

Fixes: 72cbc8f04f ("x86/PAT: Have pat_enabled() properly reflect state when running on Xen")
Reported-by: Michael Kelley (LINUX) <mikelley@microsoft.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/20230110065427.20767-1-jgross@suse.com
2023-01-10 17:21:53 +01:00
Peter Zijlstra
7c6dd961d0 x86/boot: Avoid using Intel mnemonics in AT&T syntax asm
With 'GNU assembler (GNU Binutils for Debian) 2.39.90.20221231' the
build now reports:

  arch/x86/realmode/rm/../../boot/bioscall.S: Assembler messages:
  arch/x86/realmode/rm/../../boot/bioscall.S:35: Warning: found `movsd'; assuming `movsl' was meant
  arch/x86/realmode/rm/../../boot/bioscall.S:70: Warning: found `movsd'; assuming `movsl' was meant

  arch/x86/boot/bioscall.S: Assembler messages:
  arch/x86/boot/bioscall.S:35: Warning: found `movsd'; assuming `movsl' was meant
  arch/x86/boot/bioscall.S:70: Warning: found `movsd'; assuming `movsl' was meant

Which is due to:

  PR gas/29525

  Note that with the dropped CMPSD and MOVSD Intel Syntax string insn
  templates taking operands, mixed IsString/non-IsString template groups
  (with memory operands) cannot occur anymore. With that
  maybe_adjust_templates() becomes unnecessary (and is hence being
  removed).

More details: https://sourceware.org/bugzilla/show_bug.cgi?id=29525

Borislav Petkov further explains:

  " the particular problem here is is that the 'd' suffix is
    "conflicting" in the sense that you can have SSE mnemonics like movsD %xmm...
    and the same thing also for string ops (which is the case here) so apparently
    the agreement in binutils land is to use the always accepted suffixes 'l' or 'q'
    and phase out 'd' slowly... "
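
For illustration only, a hypothetical user-space snippet (not kernel
code) showing the unambiguous 'l' suffix for the 4-byte string move in
AT&T syntax:

  /* Build with: gcc -O2 movsl_demo.c  (x86-64 only) */
  #include <stdio.h>

  int main(void)
  {
          unsigned int src[4] = { 1, 2, 3, 4 };
          unsigned int dst[4] = { 0 };
          void *s = src, *d = dst;
          unsigned long n = 4;

          /*
           * 'rep movsl' copies 4-byte words from (%rsi) to (%rdi), with
           * the count in %rcx.  Spelling it 'movsd' is ambiguous with
           * the SSE2 MOVSD instruction, which is why newer binutils
           * warns and why the boot code switches to the 'l' suffix.
           */
          asm volatile("rep movsl"
                       : "+S" (s), "+D" (d), "+c" (n)
                       : : "memory");

          printf("%u %u %u %u\n", dst[0], dst[1], dst[2], dst[3]);
          return 0;
  }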

Fixes: 7a734e7dd9 ("x86, setup: "glove box" BIOS calls -- infrastructure")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/Y71I3Ex2pvIxMpsP@hirez.programming.kicks-ass.net
2023-01-10 13:03:23 +01:00
Kan Liang
5268a28420 perf/x86/intel/uncore: Add Emerald Rapids
From the perspective of the uncore PMU, the new Emerald Rapids is the
same as the Sapphire Rapids. The only difference is the event list,
which will be supported in the perf tool later.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230106160449.3566477-4-kan.liang@linux.intel.com
2023-01-09 12:00:58 +01:00
Kan Liang
69ced41609 perf/x86/msr: Add Emerald Rapids
As on Sapphire Rapids, the SMI_COUNT MSR is also supported on
Emerald Rapids. Add the Emerald Rapids model.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230106160449.3566477-3-kan.liang@linux.intel.com
2023-01-09 12:00:52 +01:00
Kan Liang
6887a4d3ae perf/x86/msr: Add Meteor Lake support
Meteor Lake is Intel's successor to Raptor Lake. PPERF and SMI_COUNT MSRs
are also supported.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Link: https://lore.kernel.org/r/20230104201349.1451191-7-kan.liang@linux.intel.com
2023-01-09 12:00:52 +01:00
Kan Liang
01f2ea5bcf perf/x86/cstate: Add Meteor Lake support
Meteor Lake is Intel's successor to Raptor Lake. From the perspective of
the Intel cstate residency counters, nothing has changed compared with
Raptor Lake.

Share adl_cstates with Raptor Lake.
Update the comments for Meteor Lake.
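
Wiring up the new model typically amounts to more entries in the
driver's CPU match table; a sketch (macro and model-ID names assumed,
not the verbatim diff):

  static const struct x86_cpu_id intel_cstates_match[] __initconst = {
          /* ... existing entries ... */
          X86_CSTATES_MODEL(INTEL_FAM6_ALDERLAKE,    &adl_cstates),
          X86_CSTATES_MODEL(INTEL_FAM6_RAPTORLAKE,   &adl_cstates),
          /* Meteor Lake reuses the Alder/Raptor Lake cstate events. */
          X86_CSTATES_MODEL(INTEL_FAM6_METEORLAKE,   &adl_cstates),
          X86_CSTATES_MODEL(INTEL_FAM6_METEORLAKE_L, &adl_cstates),
          { },
  };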

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Link: https://lore.kernel.org/r/20230104201349.1451191-6-kan.liang@linux.intel.com
2023-01-09 12:00:51 +01:00
Paolo Bonzini
45e966fcca KVM: x86: Do not return host topology information from KVM_GET_SUPPORTED_CPUID
Passing the host topology to the guest is almost certainly wrong
and will confuse the scheduler.  In addition, several fields of
these CPUID leaves vary on each processor; it is simply impossible to
return the right values from KVM_GET_SUPPORTED_CPUID in such a way that
they can be passed to KVM_SET_CPUID2.

The values that will most likely prevent confusion are all zeroes.
Userspace will have to override them anyway if it wishes to present a
specific topology to the guest.
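
A hedged sketch of the enumeration-side change (illustrative only; the
exact leaves and registers touched may differ in the real patch):

  case 0xb:
  case 0x1f:
          /*
           * Report no topology: all zeroes.  Userspace must fill in
           * whatever topology it wants the guest to see before calling
           * KVM_SET_CPUID2.
           */
          entry->eax = entry->ebx = entry->ecx = 0;
          break;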

Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-09 05:35:21 -05:00
Paolo Bonzini
74905e3de8 KVM: nSVM: clarify recalc_intercepts() wrt CR8
The mysterious comment "We only want the cr8 intercept bits of L1"
dates back to basically the introduction of nested SVM, back when
the handling of "less typical" hypervisors was very haphazard.
With the development of kvm-unit-tests for interrupt handling,
the same code grew another vmcb_clr_intercept for the interrupt
window (VINTR) vmexit, this time with a comment that is at least
decent.

It turns out, however, that the same comment applies to the CR8 write
intercept, which is also a "recheck if an interrupt should be
injected" intercept.  The CR8 read intercept instead has not
been used by KVM for 14 years (commit 649d68643e, "KVM: SVM:
sync TPR value to V_TPR field in the VMCB"), so do not bother
clearing it and let one comment describe both CR8 write and VINTR
handling.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-09 05:35:21 -05:00
Jiapeng Chong
37c1785609 x86/xen: Remove the unused function p2m_index()
The function p2m_index() is defined in p2m.c but not called anywhere
else, so remove this unused function.

arch/x86/xen/p2m.c:137:24: warning: unused function 'p2m_index'.

Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=3557
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20230105090141.36248-1-jiapeng.chong@linux.alibaba.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2023-01-09 07:54:28 +01:00
Linus Torvalds
d7a0853d65 Intel RAPL updates for new model IDs.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmO4CGwRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1gCjg//YdjAjNrJ9HkLfhmEtYSA1fCXef0f9nD7
 ddEY7FIHa+yTE5QJ3o3Ag4ZPVaHUHkD2t936pRMcZ6EOsbDldlS6R5c5P1ISpzvl
 SAPv19iWoVKqFjjFv0ftocb6yO6bVizyUea0TV8yp2+3ih3C5pj1KhRJ/IHJaClX
 ohyK6APMdYlffeb8VfLegM8Kr8E0bdn5FZyX8LmBuUi1PyYe4x/Bo3ZW3QhgtgI2
 Rnm8bXC1dFkwqNHUFbw8dwkCKQe9jePc2VBjwah251X+M3RSIEoF7RHmPjUS1zpc
 rn9o3elifV45D6g6A74wXJ+7eH2DRkDsXXyfeEB5dtrt5HBzhROfrw2PIyywAC91
 B01HLMDaB/jRux+7i2uj2KZO1hwEWT3a3Y+GunB2ZrPqD2WvqASKDcs9wfe6kfuv
 awOrip50sKZb6KmQgWq/xIDutpzBeK/smX49+rnY9ot/jXxBqsKgk9L6U7Qt6RiH
 MpbT5EnT5149bVp6sUqZk4SmWjYa3El0f4X93Weqfxx6rxsuZYq70QVMdktnI9A+
 VTeQE4525wQxu+oENQhB86iv63sRsk4i53LmyIkaV/KJxoAK1X1G9kNQ6fIrT7Ad
 7vwrmeA1sLst8H76WpDGuEk9rngAg7SS3IorS0QPupc494eCgs6vazAvUS5pAOu2
 ZdA365eMKe8=
 =5mg2
 -----END PGP SIGNATURE-----

Merge tag 'perf-urgent-2023-01-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fix from Ingo Molnar:
 "Intel RAPL updates for new model IDs"

* tag 'perf-urgent-2023-01-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/rapl: Add support for Intel Emerald Rapids
  perf/x86/rapl: Add support for Intel Meteor Lake
  perf/x86/rapl: Treat Tigerlake like Icelake
2023-01-06 11:20:12 -08:00
Catalin Marinas
19e183b545 elfcore: Add a cprm parameter to elf_core_extra_{phdrs,data_size}
A subsequent fix for arm64 will use this parameter to parse the vma
information from the snapshot created by dump_vma_snapshot() rather than
traversing the vma list without the mmap_lock.
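
The interface change amounts to threading the coredump parameters
through the extra-note helpers; sketched declarations (assumed shape,
see the patch for the authoritative signatures):

  /* include/linux/elfcore.h, before */
  Elf_Half elf_core_extra_phdrs(void);
  size_t elf_core_extra_data_size(void);

  /* after: callers hand over the vma snapshot via cprm */
  Elf_Half elf_core_extra_phdrs(struct coredump_params *cprm);
  size_t elf_core_extra_data_size(struct coredump_params *cprm);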

Fixes: 6dd8b1a0b6 ("arm64: mte: Dump the MTE tags in the core file")
Cc: <stable@vger.kernel.org> # 5.18.x
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Seth Jenkins <sethjenkins@google.com>
Suggested-by: Seth Jenkins <sethjenkins@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221222181251.1345752-3-catalin.marinas@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-01-05 15:12:12 +00:00