Newer AMD processors support the new Bandwidth Monitoring Event Configuration
(BMEC) feature. Support for it is identified via CPUID Fn8000_0020_EBX_x0[3]:
EVT_CFG - Bandwidth Monitoring Event Configuration (BMEC).
The bandwidth monitoring events mbm_total_bytes and mbm_local_bytes are set to
count the total and local reads/writes, respectively. With the introduction
of slow memory, the two counters are no longer enough to count all the
different types of memory events. Therefore, BMEC provides the option to
configure mbm_total_bytes and mbm_local_bytes to count specific types of events.
Each BMEC event has a configuration MSR which contains one field for each
bandwidth type that can be used to configure the bandwidth event to track any
combination of supported bandwidth types. The event will count requests from
every bandwidth type bit that is set in the corresponding configuration
register.
Following are the types of events supported:
==== ========================================================
Bits Description
==== ========================================================
6 Dirty Victims from the QOS domain to all types of memory
5 Reads to slow memory in the non-local NUMA domain
4 Reads to slow memory in the local NUMA domain
3 Non-temporal writes to non-local NUMA domain
2 Non-temporal writes to local NUMA domain
1 Reads to memory in the non-local NUMA domain
0 Reads to memory in the local NUMA domain
==== ========================================================
By default, the mbm_total_bytes configuration is set to 0x7F to count
all the event types and the mbm_local_bytes configuration is set to 0x15 to
count all the local memory events.
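As a rough illustration of how the table above maps to those defaults (the
flag and macro names below are made up for this sketch, not the kernel's
actual identifiers):

#include <linux/bits.h>

/* Bandwidth type bits, per the table above (names illustrative). */
#define EVT_LOCAL_READS		BIT(0)
#define EVT_REMOTE_READS	BIT(1)
#define EVT_LOCAL_NT_WRITES	BIT(2)
#define EVT_REMOTE_NT_WRITES	BIT(3)
#define EVT_LOCAL_SLOW_READS	BIT(4)
#define EVT_REMOTE_SLOW_READS	BIT(5)
#define EVT_DIRTY_VICTIMS	BIT(6)

/* mbm_total_bytes default: all seven event types, i.e. 0x7F. */
#define MBM_TOTAL_CFG	(EVT_LOCAL_READS | EVT_REMOTE_READS |		\
			 EVT_LOCAL_NT_WRITES | EVT_REMOTE_NT_WRITES |	\
			 EVT_LOCAL_SLOW_READS | EVT_REMOTE_SLOW_READS |	\
			 EVT_DIRTY_VICTIMS)

/* mbm_local_bytes default: local events only (bits 0, 2, 4), i.e. 0x15. */
#define MBM_LOCAL_CFG	(EVT_LOCAL_READS | EVT_LOCAL_NT_WRITES |	\
			 EVT_LOCAL_SLOW_READS)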
The feature description is available in the specification "AMD64 Technology
Platform Quality of Service Extensions", Revision 1.03, at
https://bugzilla.kernel.org/attachment.cgi?id=301365
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/r/20230113152039.770054-5-babu.moger@amd.com
Add a new resource type RDT_RESOURCE_SMBA to handle the QoS enforcement
policies on the external slow memory.
Mostly initialization of the essentials. Setting fflags to RFTYPE_RES_MB
configures the SMBA resource to have the same resctrl files as the
existing MBA resource. The SMBA resource has identical properties to
the existing MBA resource. These properties will be enumerated in an
upcoming change and exposed via resctrl because of this flag.
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/r/20230113152039.770054-4-babu.moger@amd.com
Add the new AMD feature X86_FEATURE_SMBA. With it, the QOS enforcement policies
can be applied to external slow memory connected to the host. QOS enforcement is
accomplished by assigning a Class Of Service (COS) to a processor and specifying
allocations or limits for that COS for each resource to be allocated.
This feature is identified by the CPUID function 0x8000_0020_EBX_x0[2]:
L3SBE - L3 external slow memory bandwidth enforcement.
CXL.memory is the only supported "slow" memory device. With SMBA, the hardware
enables bandwidth allocation on the slow memory devices. If there are multiple
slow memory devices in the system, then the throttling logic groups all the slow
sources together and applies the limit on them as a whole.
The presence of the SMBA feature (with CXL.memory) is independent of whether a
slow memory device is actually present in the system. If there is no slow memory
in the system, then setting a SMBA limit will have no impact on the performance
of the system.
Presence of CXL memory can be identified by the numactl command:
$ numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
node 0 size: 63678 MB
node 0 free: 59542 MB
node 1 cpus:
node 1 size: 16122 MB
node 1 free: 15627 MB
node distances:
node   0   1
  0:  10  50
  1:  50  10
The CPU list for the CXL memory node is empty, and the CPU-to-CXL node distance
is greater than the CPU-to-CPU distances. Node 1 has the CXL memory in this
case. CXL memory can also be identified using the ACPI SRAT table and memory maps.
Feature description is available in the specification, "AMD64 Technology
Platform Quality of Service Extensions, Revision: 1.03 Publication # 56375
Revision: 1.03 Issue Date: February 2022" at
https://bugzilla.kernel.org/attachment.cgi?id=301365
See also https://www.amd.com/en/support/tech-docs/amd64-technology-platform-quality-service-extensions
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/r/20230113152039.770054-3-babu.moger@amd.com
on_each_cpu_mask() runs the function on each CPU specified by cpumask,
which may include the local processor.
Replace smp_call_function_many() with on_each_cpu_mask() to simplify
the code.
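A hedged before/after sketch of the conversion (the wrapper functions are
illustrative, not from the patch):

#include <linux/smp.h>

/* Before: smp_call_function_many() skips the local CPU, so the
 * caller had to invoke the function on it separately. */
static void run_on_cpus_old(const struct cpumask *mask, smp_call_func_t fn, void *arg)
{
	preempt_disable();
	smp_call_function_many(mask, fn, arg, true);
	if (cpumask_test_cpu(smp_processor_id(), mask))
		fn(arg);
	preempt_enable();
}

/* After: on_each_cpu_mask() handles the local CPU too. */
static void run_on_cpus_new(const struct cpumask *mask, smp_call_func_t fn, void *arg)
{
	on_each_cpu_mask(mask, fn, arg, true);
}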
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/r/20230113152039.770054-2-babu.moger@amd.com
Merge tag 'sched_urgent_for_v6.2_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Borislav Petkov:
- Make sure the scheduler doesn't use stale frequency scaling values
when the latter get disabled due to a value error
- Fix a NULL pointer access on UP configs
- Use the proper locking when updating CPU capacity
* tag 'sched_urgent_for_v6.2_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/aperfmperf: Erase stale arch_freq_scale values when disabling frequency invariance readings
sched/core: Fix NULL pointer access fault in sched_setaffinity() with non-SMP configs
sched/fair: Fixes for capacity inversion detection
sched/uclamp: Fix a uninitialized variable warnings
Commit a6de553da01c ("kbuild: Allow to combine multiple W= levels")
supported W=123 to enable all the extra warning groups.
I think a similar idea is applicable to the V= option.
V=1 echoes the whole command
V=2 prints the reason for rebuilding
These are orthogonal, and can be enabled at the same time.
This commit supports V=12 to enable both of them.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: Nicolas Schier <nicolas@fjasle.eu>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
LLVM 16 will have support for this flag so move it out of the GCC-only
block to allow LLVM builds to take advantage of it.
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://github.com/ClangBuiltLinux/linux/issues/1665
Link: 6f867f9102
Link: https://lore.kernel.org/r/20230120165826.2469302-1-nathan@kernel.org
When serializing and deserializing kvm_sregs, attributes of the segment
descriptors are stored by user space. For unusable segments,
vmx_segment_access_rights skips all attributes and sets them to 0.
This means we zero out the DPL (Descriptor Privilege Level) for unusable
entries.
Unusable segments are - contrary to their name - usable in 64-bit mode and
are used by guests, for example, to create a linear map through the
NULL selector.
VMENTER checks if SS.DPL is correct depending on the CS segment type.
For types 9 (Execute Only) and 11 (Execute Read), CS.DPL must be equal to
SS.DPL [1].
We have seen real world guests setting CS to a usable segment with DPL=3
and SS to an unusable segment with DPL=3. Once we go through an sregs
get/set cycle, SS.DPL turns to 0. This causes the virtual machine to crash
reproducibly.
This commit changes the attribute logic to always preserve attributes for
unusable segments. According to [2], SS.DPL is always saved on VM exits,
regardless of the unusable bit, so user space applications should have saved
the information correctly on serialization.
[3] specifies that, besides SS.DPL, the rest of the attributes of the
descriptors are undefined after VM entry if the unusable bit is set. So there
should be no harm in setting them all to the previous state.
[1] Intel SDM Vol 3C 26.3.1.2 Checks on Guest Segment Registers
[2] Intel SDM Vol 3C 27.3.2 Saving Segment Registers and Descriptor-Table
Registers
[3] Intel SDM Vol 3C 26.3.2.2 Loading Guest Segment Registers and
Descriptor-Table Registers
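A sketch of the resulting attribute encoding, based on the description above
and the VMCS access-rights bit layout (treat this as illustrative rather than
the exact upstream diff):

/* Always encode the attributes, including DPL, and only set the
 * "unusable" bit on top, instead of zeroing everything for
 * unusable segments. */
static u32 segment_access_rights(const struct kvm_segment *var)
{
	u32 ar;

	ar = var->type & 15;
	ar |= (var->s & 1) << 4;
	ar |= (var->dpl & 3) << 5;
	ar |= (var->present & 1) << 7;
	ar |= (var->avl & 1) << 12;
	ar |= (var->l & 1) << 13;
	ar |= (var->db & 1) << 14;
	ar |= (var->g & 1) << 15;
	ar |= (var->unusable || !var->present) << 16;

	return ar;
}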
Cc: Alexander Graf <graf@amazon.de>
Cc: stable@vger.kernel.org
Signed-off-by: Hendrik Borghorst <hborghor@amazon.de>
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Message-Id: <20221114164823.69555-1-hborghor@amazon.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Make early loading message match late loading message and print both old
and new revisions.
This is helpful to know what the BIOS loaded revision is before an early
update.
Cache the early BIOS revision before the microcode update and have
print_ucode_info() print both the old and new revision in the same
format as microcode_reload_late().
[ bp: Massage, remove useless comment. ]
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230120161923.118882-6-ashok.raj@intel.com
print_ucode_info() takes a struct ucode_cpu_info pointer as a parameter.
Its sole purpose is to print the microcode revision.
The only available ucode_cpu_info always describes the currently loaded
microcode revision. After a microcode update is successful, this is the
new revision, or on failure it is the original revision.
In preparation for future changes, replace the struct ucode_cpu_info
pointer parameter with a plain integer which contains the revision
number and adjust the call sites accordingly.
No functional change.
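A minimal sketch of the refactor; the print format is illustrative:

/* Before: the function dug the revision out of a ucode_cpu_info. */
static void print_ucode_info_old(struct ucode_cpu_info *uci, unsigned int date)
{
	pr_info("microcode updated early to revision 0x%x, date = %04x-%02x-%02x\n",
		uci->cpu_sig.rev, date & 0xffff, date >> 24, (date >> 16) & 0xff);
}

/* After: callers pass the revision number directly. */
static void print_ucode_info(unsigned int new_rev, unsigned int date)
{
	pr_info("microcode updated early to revision 0x%x, date = %04x-%02x-%02x\n",
		new_rev, date & 0xffff, date >> 24, (date >> 16) & 0xff);
}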
[ bp:
- Fix + cleanup commit message.
- Revert arbitrary, unrelated change.
]
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230120161923.118882-5-ashok.raj@intel.com
During late microcode loading, the "Reload completed" message is issued
unconditionally, regardless of success or failure.
Adjust the message to report the result of the update.
[ bp: Massage. ]
Fixes: 9bd681251b7c ("x86/microcode: Announce reload operation's completion")
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/lkml/874judpqqd.ffs@tglx/
The kernel caches each CPU's feature bits at boot in an x86_capability[]
structure. However, the capabilities in the BSP's copy can be turned off
as a result of certain command line parameters or configuration
restrictions, for example the SGX bit. This can cause a mismatch when
comparing the values before and after the microcode update.
Another example is X86_FEATURE_SRBDS_CTRL which gets added only after
microcode update:
--- cpuid.before 2023-01-21 14:54:15.652000747 +0100
+++ cpuid.after 2023-01-21 14:54:26.632001024 +0100
@@ -10,7 +10,7 @@ CPU:
0x00000004 0x04: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
0x00000005 0x00: eax=0x00000040 ebx=0x00000040 ecx=0x00000003 edx=0x11142120
0x00000006 0x00: eax=0x000027f7 ebx=0x00000002 ecx=0x00000001 edx=0x00000000
- 0x00000007 0x00: eax=0x00000000 ebx=0x029c6fbf ecx=0x40000000 edx=0xbc002400
+ 0x00000007 0x00: eax=0x00000000 ebx=0x029c6fbf ecx=0x40000000 edx=0xbc002e00
^^^
and which proves for a gazillionth time that late loading is a bad bad
idea.
microcode_check() is called after an update to report any previously
cached CPUID bits which might have changed due to the update.
Therefore, store the cached CPU caps before the update and compare them
with the CPU caps after the microcode update has succeeded.
Thus, the comparison is done between the CPUID *hardware* bits before
and after the upgrade instead of using the cached, possibly runtime
modified values in BSP's boot_cpu_data copy.
As a result, false warnings about CPUID bits changes are avoided.
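A sketch of the resulting flow, assuming the helper names described above
(the exact kernel signatures and message text may differ):

/* Snapshot the CPUID-derived capabilities from hardware, not the
 * possibly runtime-modified boot_cpu_data copy. */
static void store_cpu_caps(struct cpuinfo_x86 *curr_info)
{
	*curr_info = boot_cpu_data;
	get_cpu_cap(curr_info);
}

static void microcode_check(struct cpuinfo_x86 *prev_info)
{
	struct cpuinfo_x86 curr_info;

	store_cpu_caps(&curr_info);

	/* Compare hardware bits before vs. after the update. */
	if (memcmp(&prev_info->x86_capability, &curr_info.x86_capability,
		   sizeof(prev_info->x86_capability)))
		pr_warn("x86/CPU: CPU features have changed after loading microcode\n");
}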
[ bp:
- Massage.
- Add SRBDS_CTRL example.
- Add kernel-doc.
- Incorporate forgotten review feedback from dhansen.
]
Fixes: 1008c52c09dc ("x86/CPU: Add a microcode loader callback")
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230109153555.4986-3-ashok.raj@intel.com
The following kernel warning is triggered when SPR MCC is used:
[ 17.945331] ------------[ cut here ]------------
[ 17.946305] WARNING: CPU: 65 PID: 1 at arch/x86/events/intel/uncore_discovery.c:184 intel_uncore_has_discovery_tables+0x4c0/0x65c
[ 17.946305] Modules linked in:
[ 17.946305] CPU: 65 PID: 1 Comm: swapper/0 Not tainted 5.4.17-2136.313.1-X10-2c+ #4
It's caused by the broken discovery table of UPI.
The discovery tables come from hardware. Apart from dropping the broken
information, there is nothing Linux can do, so using WARN_ON_ONCE() is
overkill. Use pr_info() instead of WARN_ON_ONCE(), and specify which uncore
unit is dropped and why.
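A hedged sketch of the change; the helper name, variables and message text
are illustrative:

static bool uncore_unit_valid(int type_id, int box_id, u64 addr)
{
	if (!addr) {
		/* Was: WARN_ON_ONCE(1) - a full splat for firmware
		 * Linux cannot fix anyway. */
		pr_info("Invalid address for uncore type %d box %d, dropping the unit\n",
			type_id, box_id);
		return false;
	}
	return true;
}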
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Petlan <mpetlan@redhat.com>
Link: https://lore.kernel.org/r/20230112200105.733466-6-kan.liang@linux.intel.com
The discovery table of UPI on some SPR variants, e.g., MCC, is broken.
The third UPI table may include a wrong address which points to a
nonexistent device. The bug impacts both the UPI and M3UPI uncore PMON.
Use a pre-defined UPI and M3UPI table to replace the broken table.
A different BIOS may populate a device into a different domain or a
different bus. The accurate location can only be retrieved at load time.
Add spr_update_device_location() to update the location of the UPI and
M3UPI in the pre-defined table.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Petlan <mpetlan@redhat.com>
Link: https://lore.kernel.org/r/20230112200105.733466-5-kan.liang@linux.intel.com
Some units in a discovery table may be broken, e.g., UPI of SPR MCC.
A generic method is required to ignore the broken units.
Add uncore_units_ignore in the struct intel_uncore_init_fun, which
indicates the type ID of broken units. It will be assigned by the
platform-specific code later when the platform has a broken discovery
table.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Petlan <mpetlan@redhat.com>
Link: https://lore.kernel.org/r/20230112200105.733466-4-kan.liang@linux.intel.com
The current code assumes that the discovery table provides valid
box_ids for the normal units. It's not the case anymore since some units
in the discovery table are broken on some SPR variants.
Factor out uncore_get_box_id(). Check the existence of the type->box_ids
before using it. If it's not available, use pmu_idx.
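A sketch of the factored-out helper as described (field names per the uncore
code; treat the body as illustrative):

static int uncore_get_box_id(struct intel_uncore_type *type,
			     struct intel_uncore_pmu *pmu)
{
	/* Broken discovery tables may not provide box_ids. */
	return type->box_ids ? type->box_ids[pmu->pmu_idx] : pmu->pmu_idx;
}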
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Petlan <mpetlan@redhat.com>
Link: https://lore.kernel.org/r/20230112200105.733466-3-kan.liang@linux.intel.com
The same code is used to retrieve the logical die ID with a given PCI
device in both the discovery code and the code that supports a system
with > 8 nodes.
Factor out uncore_device_to_die() to replace the duplicate code.
No functional change.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Petlan <mpetlan@redhat.com>
Link: https://lore.kernel.org/r/20230112200105.733466-2-kan.liang@linux.intel.com
Add a parameter to store CPU capabilities before performing a microcode
update so that CPU capabilities can be compared before and after update.
[ bp: Massage. ]
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230109153555.4986-2-ashok.raj@intel.com
When a KVM guest has MWAIT, mwait_idle() is used as the default idle
function.
However, the cpuidle-haltpoll driver calls default_idle() from
default_enter_idle() directly and that one uses HLT instead of MWAIT,
which may affect performance adversely, because MWAIT is preferred to
HLT as explained by the changelog of commit aebef63cf7ff ("x86: Remove
vendor checks from prefer_mwait_c1_over_halt").
Make default_enter_idle() call arch_cpu_idle(), which can use MWAIT,
instead of default_idle() to address this issue.
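A simplified sketch of the fix (the real callback takes the usual cpuidle
arguments; this is illustrative):

static void default_enter_idle(void)
{
	if (current_clr_polling_and_test()) {
		local_irq_enable();
		return;
	}
	/* Was: default_idle(), which always uses HLT. */
	arch_cpu_idle();	/* may pick MWAIT where preferred */
}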
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
[ rjw: Changelog rewrite ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The minimum binutils version for a kernel build is currently 2.23, which
doesn't support GFNI, so building aria-avx512 fails with such an old binutils.
aria-avx512 requires GFNI, so it should not be allowed to build in that case.
Add AS_AVX512 and AS_GFNI to Kconfig to disable building aria-avx512 when an
old binutils is used.
Fixes: c970d42001f2 ("crypto: x86/aria - implement aria-avx512")
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The minimum binutils version for a kernel build is currently 2.23, which
doesn't support GFNI, so building aria-avx2 fails with such an old binutils.
The code using GFNI is an optional part of aria-avx2, so disable the GFNI
part when an old binutils is used.
Fixes: 37d8d3ae7a58 ("crypto: x86/aria - implement aria-avx2")
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The minimum binutils version for a kernel build is currently 2.23, which
doesn't support GFNI, so building aria-avx fails with such an old binutils.
The code using GFNI is an optional part of aria-avx, so disable the GFNI
part when an old binutils is used.
To check whether the binutils in use supports GFNI, add AS_GFNI.
Fixes: ba3579e6e45c ("crypto: aria-avx - add AES-NI/AVX/x86_64/GFNI assembler implementation of aria cipher")
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Instrument nmi_trigger_cpumask_backtrace() to dump out diagnostics based
on evidence accumulated by exc_nmi(). These diagnostics are dumped for
CPUs that ignored an NMI backtrace request for more than 10 seconds.
[ paulmck: Apply Ingo Molnar feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <x86@kernel.org>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
CPUs ignoring NMIs is often a sign of those CPUs going bad, but there
are quite a few other reasons why a CPU might ignore NMIs. Therefore,
accumulate evidence within exc_nmi() as to what might be preventing a
given CPU from responding to an NMI.
[ paulmck: Apply Peter Zijlstra feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <x86@kernel.org>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
KVM already has a 'GPA_INVALID' defined as (~(gpa_t)0) in kvm_types.h,
and it is used by ARM code. We do not need another definition of
'INVALID_GPA' for X86 specifically.
Instead of using the common 'GPA_INVALID' for X86, replace it with
'INVALID_GPA', and change the users of 'GPA_INVALID' so that the diff
can be smaller. The name 'INVALID_GPA' also reads better: it tells the user
we are using an invalid GPA, while the name 'GPA_INVALID' emphasizes that
the GPA is an invalid one.
No functional change intended.
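For reference, the surviving definition and a typical validity check; the
helper below is illustrative, not from the patch:

/* In kvm_types.h (per the text above): an all-ones GPA marks "no address". */
#define INVALID_GPA	(~(gpa_t)0)

/* Illustrative usage: */
static inline bool gpa_is_valid(gpa_t gpa)
{
	return gpa != INVALID_GPA;
}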
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20230105130127.866171-1-yu.c.zhang@linux.intel.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
The hypervisor can enable various new features (SEV_FEATURES[1:63]) and start an
SNP guest. Some of these features need a guest-side implementation. If any of
these features is enabled without it, the behavior of the SNP guest will be
undefined. It may fail booting in a non-obvious way, making it difficult to
debug.
Instead of allowing the guest to continue and have it fail randomly later,
detect this early and fail gracefully.
The SEV_STATUS MSR indicates features which the hypervisor has enabled. While
booting, SNP guests should ascertain that all the enabled features have a
guest-side implementation. In case a feature is not implemented in the guest,
the guest terminates booting with a GHCB protocol Non-Automatic Exit (NAE)
termination request event; see the "SEV-ES Guest-Hypervisor Communication
Block Standardization" document (currently at
https://developer.amd.com/wp-content/resources/56421.pdf), section
"Termination Request".
Populate SW_EXITINFO2 with the mask of unsupported features so that the
hypervisor can easily report it to the user.
More details in the AMD64 APM Vol 2, Section "SEV_STATUS MSR".
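A hedged sketch of the check; the mask constant and termination helper are
illustrative names, not the kernel's exact identifiers:

void snp_check_features(void)
{
	u64 unsupported;

	/* SEV_STATUS bits the hypervisor enabled but this guest
	 * does not implement. */
	unsupported = sev_status & ~SNP_FEATURES_IMPLEMENTED;
	if (unsupported) {
		/* Put the mask in SW_EXITINFO2 and raise a GHCB
		 * NAE termination request. */
		snp_terminate(unsupported);
	}
}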
[ bp:
- Massage.
- Move snp_check_features() call to C code.
Note: the CC:stable@ aspect here is to be able to protect older, stable
kernels when running on newer hypervisors. Or not "running" but fail
reliably and in a well-defined manner instead of randomly. ]
Fixes: cbd3d4f7c4e5 ("x86/sev: Check SEV-SNP features support")
Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/20230118061943.534309-1-nikunj@amd.com
zap_page_range was originally designed to unmap pages within an address
range that could span multiple vmas. While working on [1], it was
discovered that all callers of zap_page_range pass a range entirely within
a single vma. In addition, the mmu notification call within zap_page_range
does not correctly handle ranges that span multiple vmas. When
crossing a vma boundary, a new mmu_notifier_range_init/end call pair with
the new vma should be made.
Instead of fixing zap_page_range, do the following:
- Create a new routine zap_vma_pages() that will remove all pages within
the passed vma. Most users of zap_page_range pass the entire vma and
can use this new routine.
- For callers of zap_page_range not passing the entire vma, instead call
zap_page_range_single().
- Remove zap_page_range.
[1] https://lore.kernel.org/linux-mm/20221114235507.294320-2-mike.kravetz@oracle.com/
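A minimal sketch of the new helper, assuming it wraps the existing
single-vma primitive as described above:

static inline void zap_vma_pages(struct vm_area_struct *vma)
{
	zap_page_range_single(vma, vma->vm_start,
			      vma->vm_end - vma->vm_start, NULL);
}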
Link: https://lkml.kernel.org/r/20230104002732.232573-1-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Suggested-by: Peter Xu <peterx@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Peter Xu <peterx@redhat.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com> [s390]
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This patch is a cleanup to always wr-protect pte/pmd in mkuffd_wp paths.
The reasons I still think this patch is worthwhile are:
(1) It is a cleanup already; the diffstat tells.
(2) It just feels natural after I thought about this: if the pte is uffd
protected, let's remove the write bit no matter what it was.
(3) Since x86 is the only arch that supports uffd-wp, it also redefines
pte|pmd_mkuffd_wp() in that it should always contain removals of
write bits. It means any future arch that wants to implement uffd-wp
should naturally follow this rule too. It's good to make it a
default, even if with vm_page_prot changes on VM_UFFD_WP.
(4) It covers more than vm_page_prot. So no chance of any potential
future "accident" (like pte_mkdirty() on sparc64 or loongarch, even
though it just got its pte_mkdirty fixed <1 month ago). It'll be
fairly clear when reading the code too that we don't worry anything
before a pte_mkuffd_wp() on uncertainty of the write bit.
We may call pte_wrprotect() one more time in some paths (e.g. thp split),
but that should be a fully local bitop instruction, so the overhead should be
negligible.
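A sketch of what this rule looks like on x86, per point (3) above
(illustrative, mirroring the description rather than quoting the diff):

static inline pte_t pte_mkuffd_wp(pte_t pte)
{
	/* Setting uffd-wp now always removes the write bit too. */
	return pte_wrprotect(pte_set_flags(pte, _PAGE_UFFD_WP));
}

static inline pmd_t pmd_mkuffd_wp(pmd_t pmd)
{
	return pmd_wrprotect(pmd_set_flags(pmd, _PAGE_UFFD_WP));
}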
Although this patch should logically also fix the recently known issues with
uffd-wp on page migration (not for numa hint recovery - that may need another
explicit pte_wrprotect), that is not the plan for this fix. So no Fixes tag,
and stable doesn't need this.
Link: https://lkml.kernel.org/r/20221214201533.1774616-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ives van Hoorne <ives@codesandbox.io>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
From the perspective of Intel cstate residency counters,
Emerald Rapids is the same as Sapphire Rapids and Ice Lake.
Add Emerald Rapids model.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230106160449.3566477-2-kan.liang@linux.intel.com
From the core PMU's perspective, Emerald Rapids is the same as Sapphire
Rapids. The only difference is the event list, which will be
supported in the perf tool later.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230106160449.3566477-1-kan.liang@linux.intel.com
Use DEVICE_ATTR_RO() helper instead of open-coded DEVICE_ATTR(),
which makes the code a bit shorter and easier to read.
No change in functionality.
Signed-off-by: Guangju Wang[baidu] <wgj900@163.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230118023554.1898-1-wgj900@163.com
Factor out perf_prepare_header() so that callers can invoke
perf_prepare_sample() without a header when one is not needed.
Also check filtered_sample_type to avoid duplicate
work when perf_prepare_sample() is called twice (or more).
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Song Liu <song@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230118060559.615653-8-namhyung@kernel.org
When we save the branch stack to the perf sample data, we need to
update the sample flags and the dynamic size. To make sure this is
done consistently, add the perf_sample_save_brstack() helper and
convert all call sites.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230118060559.615653-5-namhyung@kernel.org
When we save the raw_data to the perf sample data, we need to update
the sample flags and the dynamic size. To make sure this is done
consistently, add the perf_sample_save_raw_data() helper and convert
all call sites.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230118060559.615653-4-namhyung@kernel.org
When we save the callchain to the perf sample data, we need to update
the sample flags and the dynamic size. To ensure this is done consistently,
add the perf_sample_save_callchain() helper and convert all call sites.
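All three helpers follow the same shape; a sketch of the callchain one
(field names per the perf core, the body is illustrative):

static inline void perf_sample_save_callchain(struct perf_sample_data *data,
					      struct perf_event *event,
					      struct perf_callchain_entry *callchain)
{
	int size = 1;

	data->callchain = callchain;
	size += callchain->nr;

	/* Keep the flags and dynamic size in sync in one place. */
	data->dyn_size += size * sizeof(u64);
	data->sample_flags |= PERF_SAMPLE_CALLCHAIN;
}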
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Song Liu <song@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230118060559.615653-3-namhyung@kernel.org
Merge tag 'v6.2-rc4' into perf/core, to pick up fixes
Move from the -rc1 base to the fresher -rc4 kernel that
has various fixes included, before applying a larger
patchset.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
According to the TLFS, in order to communicate with the L0 hypervisor, an
additional bit needs to be set in the control register. This communication
is required to perform privileged instructions which can only be
performed by the L0 hypervisor. An example of that could be setting up the
VMBus infrastructure.
Signed-off-by: Jinank Jain <jinankjain@linux.microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/24f9d46d5259a688113e6e5e69e21002647f4949.1672639707.git.jinankjain@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Child partitions are free to allocate the SynIC message and event pages, but
the root partition must use the pages allocated by the Microsoft
Hypervisor (MSHV). The base addresses for these pages can be found using
synthetic MSRs exposed by MSHV. There is a slight difference in those MSRs
between nested and non-nested root partitions.
Signed-off-by: Jinank Jain <jinankjain@linux.microsoft.com>
Reviewed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/cb951fb1ad6814996fc54f4a255c5841a20a151f.1672639707.git.jinankjain@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
David reported that the recent PCI/MSI rework results in MSI descriptor
leakage under XEN.
This is caused by:
1) The missing MSI_FLAG_FREE_MSI_DESCS flag in the XEN MSI domain info,
which is required now that PCI/MSI delegates descriptor freeing to
the core MSI code.
2) Not disassociating the interrupts on teardown, by setting
msi_desc::irq to 0. This was not required before because the teardown
was unconditional and did not check whether an MSI descriptor was still
connected to a Linux interrupt.
On further inspection it came to light that the MSI_FLAG_DEV_SYSFS is
missing in the XEN MSI domain info as well to restore the pre 6.2 status
quo.
Add the missing MSI flags and disassociate the MSI descriptor from the
Linux interrupt in the XEN specific teardown function.
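A sketch of the two-part fix described above (structure fields per the
generic MSI core; the teardown body is illustrative):

static struct msi_domain_info xen_pci_msi_domain_info = {
	/* Let the MSI core free descriptors and manage sysfs again. */
	.flags = MSI_FLAG_FREE_MSI_DESCS | MSI_FLAG_DEV_SYSFS,
};

static void xen_teardown_msi_irq(unsigned int irq, struct msi_desc *desc)
{
	xen_destroy_irq(irq);
	/* Disassociate the descriptor from the Linux interrupt. */
	desc->irq = 0;
}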
Fixes: b2bdda205c0c ("PCI/MSI: Let the MSI core free descriptors")
Fixes: 2f2940d16823 ("genirq/msi: Remove filter from msi_free_descs_free_range()")
Fixes: ffd84485e6be ("PCI/MSI: Let the irq code handle sysfs groups")
Reported-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lore.kernel.org/r/871qnunycr.ffs@tglx
The Xen MSI → PIRQ magic does support MSI-X, so advertise it.
(In fact it's better off with MSI-X than MSI, because it's actually
broken by design for 32-bit MSI, since it puts the high bits of the
PIRQ# into the high 32 bits of the MSI message address, instead of the
Extended Destination ID field which is in bits 4-11.)
Strictly speaking, this really fixes a much older commit 2e4386eba0c0
("x86/xen: Wrap XEN MSI management into irqdomain") which failed to set
the flag. But that never really mattered until __pci_enable_msix_range()
started to check and bail out early. So in 6.2-rc we see failures e.g.
to bring up networking on an Amazon EC2 m4.16xlarge instance:
[ 41.498694] ena 0000:00:03.0 (unnamed net_device) (uninitialized): Failed to enable MSI-X. irq_cnt -524
[ 41.498705] ena 0000:00:03.0: Can not reserve msix vectors
[ 41.498712] ena 0000:00:03.0: Failed to enable and set the admin interrupts
Side note: This is the first bug found, and first patch tested, by running
Xen guests under QEMU/KVM instead of running under actual Xen.
Fixes: 99f3d2797657 ("PCI/MSI: Reject MSI-X early")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/4bffa69a949bfdc92c4a18e5a1c3cbb3b94a0d32.camel@infradead.org
Baoquan reported that after triggering a crash the subsequent crash-kernel
fails to boot about half of the time. It triggers a NULL pointer
dereference in the periodic tick code.
This happens because the legacy timer interrupt (IRQ0) is resent in
software which happens in soft interrupt (tasklet) context. In this context
get_irq_regs() returns NULL which leads to the NULL pointer dereference.
The reason for the resend is a spurious APIC interrupt on the IRQ0 vector
which is captured and leads to a resend when the legacy timer interrupt is
enabled. This is wrong because the legacy PIC interrupts are level
triggered and therefore should never be resent in software, but nothing
ever sets the IRQ_LEVEL flag on those interrupts, so the core code does not
know about their trigger type.
Ensure that IRQ_LEVEL is set when the legacy PIC interrupts are set up.
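A sketch of the fix, marking the legacy interrupts during ISA irq setup
(the wrapper function and its placement are illustrative):

static void __init mark_legacy_pic_level(void)
{
	int i;

	for (i = 0; i < nr_legacy_irqs(); i++)
		irq_set_status_flags(i, IRQ_LEVEL);
}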
Fixes: a4633adcdbc1 ("[PATCH] genirq: add genirq sw IRQ-retrigger")
Reported-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Baoquan He <bhe@redhat.com>
Link: https://lore.kernel.org/r/87mt6rjrra.ffs@tglx
Once disable_freq_invariance_work is called, the scale_freq_tick function
will no longer compute or update the arch_freq_scale values.
However, the scheduler will still read these values and use them.
The result is that the scheduler might make unfair decisions based on stale
values.
This patch adds the step of setting the arch_freq_scale values for all
CPUs to the default (max) value SCHED_CAPACITY_SCALE. Once all CPUs
have the same arch_freq_scale value, the scaling is meaningless.
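A sketch of the added step (per-CPU variable name as used in the text; the
wrapper function is illustrative):

static void reset_freq_scale(void)
{
	int cpu;

	/* Park every CPU at the default so stale values cannot
	 * influence scheduler decisions. */
	for_each_possible_cpu(cpu)
		per_cpu(arch_freq_scale, cpu) = SCHED_CAPACITY_SCALE;
}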
Signed-off-by: Yair Podemsky <ypodemsk@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230110160206.75912-1-ypodemsk@redhat.com
Merge tag 'x86_urgent_for_v6.2_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Make sure the poking PGD is pinned for Xen PV as it requires it this
way
- Fixes for two resctrl races when moving a task or creating a new
monitoring group
- Fix SEV-SNP guests running under HyperV where MTRRs are disabled to
not return a UC- type mapping type on memremap() and thus cause a
serious slowdown
- Fix insn mnemonics in bioscall.S now that binutils is starting to fix
confusing insn suffixes
* tag 'x86_urgent_for_v6.2_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: fix poking_init() for Xen PV guests
x86/resctrl: Fix event counts regression in reused RMIDs
x86/resctrl: Fix task CLOSID/RMID update race
x86/pat: Fix pat_x_mtrr_type() for MTRR disabled case
x86/boot: Avoid using Intel mnemonics in AT&T syntax asm
Functions used with __setup() return 1 when the argument has been
successfully parsed.
Reverse the returned value so that 1 is returned when kstrtobool() is
successful (i.e. returns 0).
My understanding of these __setup() functions is that returning 1 or 0
does not change much anyway - so this is more of a cleanup than a
functional fix.
I spotted it and found it spurious while looking at something else.
Even if the output is not perfect, you'll get the idea with:
$ git grep -B2 -A10 retu.*kstrtobool | grep __setup -B10
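A sketch of the corrected handler, with names as in the x86/signal code
referenced by the Fixes: tag below:

static int __init strict_sas_size(char *arg)
{
	/* Was: return kstrtobool(arg, &strict_sigaltstack_size);
	 * which returns 0 - i.e. "not handled" - on success. */
	return kstrtobool(arg, &strict_sigaltstack_size) == 0;
}
__setup("strict_sas_size", strict_sas_size);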
Fixes: 3aac3ebea08f ("x86/signal: Implement sigaltstack size validation")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/73882d43ebe420c9d8fb82d0560021722b243000.1673717552.git.christophe.jaillet@wanadoo.fr
Merge tag 'pci-v6.2-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
- Work around apparent firmware issue that made Linux reject MMCONFIG
space, which broke PCI extended config space (Bjorn Helgaas)
- Fix CONFIG_PCIE_BT1 dependency due to mid-air collision between a
PCI_MSI_IRQ_DOMAIN -> PCI_MSI change and addition of PCIE_BT1 (Lukas
Bulwahn)
* tag 'pci-v6.2-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
x86/pci: Treat EfiMemoryMappedIO as reservation of ECAM space
x86/pci: Simplify is_mmconf_reserved() messages
PCI: dwc: Adjust to recent removal of PCI_MSI_IRQ_DOMAIN
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"ARM:
- Fix the PMCR_EL0 reset value after the PMU rework
- Correctly handle S2 fault triggered by a S1 page table walk by not
always classifying it as a write, as this breaks on R/O memslots
- Document why we cannot exit with KVM_EXIT_MMIO when taking a write
fault from a S1 PTW on a R/O memslot
- Put the Apple M2 on the naughty list for not being able to
correctly implement the vgic SEIS feature, just like the M1 before
it
- Reviewer updates: Alex is stepping down, replaced by Zenghui
x86:
- Fix various rare locking issues in Xen emulation and teach lockdep
to detect them
- Documentation improvements
- Do not return host topology information from KVM_GET_SUPPORTED_CPUID"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86/xen: Avoid deadlock by adding kvm->arch.xen.xen_lock leaf node lock
KVM: Ensure lockdep knows about kvm->lock vs. vcpu->mutex ordering rule
KVM: x86/xen: Fix potential deadlock in kvm_xen_update_runstate_guest()
KVM: x86/xen: Fix lockdep warning on "recursive" gpc locking
Documentation: kvm: fix SRCU locking order docs
KVM: x86: Do not return host topology information from KVM_GET_SUPPORTED_CPUID
KVM: nSVM: clarify recalc_intercepts() wrt CR8
MAINTAINERS: Remove myself as a KVM/arm64 reviewer
MAINTAINERS: Add Zenghui Yu as a KVM/arm64 reviewer
KVM: arm64: vgic: Add Apple M2 cpus to the list of broken SEIS implementations
KVM: arm64: Convert FSC_* over to ESR_ELx_FSC_*
KVM: arm64: Document the behaviour of S1PTW faults on RO memslots
KVM: arm64: Fix S1PTW handling on RO memslots
KVM: arm64: PMU: Fix PMCR_EL0 reset value