IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
When HWCR is set to 0, store 0 in vcpu->arch.msr_hwcr.
Fixes: 191c8137a939 ("x86/kvm: Implement HWCR support")
Signed-off-by: Jim Mattson <jmattson@google.com>
Link: https://lore.kernel.org/r/20230929230246.1954854-2-jmattson@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
When populating the guest's PV wall clock information, KVM currently does
a simple 'kvm_get_real_ns() - get_kvmclock_ns(kvm)'. This is an antipattern
which should be avoided; when working with the relationship between two
clocks, it's never correct to obtain one of them "now" and then the other
at a slightly different "now" after an unspecified period of preemption
(which might not even be under the control of the kernel, if this is an
L1 hosting an L2 guest under nested virtualization).
Add a kvm_get_wall_clock_epoch() function to return the guest wall clock
epoch in nanoseconds using the same method as __get_kvmclock() — by using
kvm_get_walltime_and_clockread() to calculate both the wall clock and KVM
clock time from a *single* TSC reading.
The condition using get_cpu_tsc_khz() is equivalent to the version in
__get_kvmclock() which separately checks for the CONSTANT_TSC feature or
the per-CPU cpu_tsc_khz. Which is what get_cpu_tsc_khz() does anyway.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lore.kernel.org/r/bfc6d3d7cfb88c47481eabbf5a30a264c58c7789.camel@infradead.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Add support for the AMD Selective Branch Predictor Barrier (SBPB) by
advertising the CPUID bit and handling PRED_CMD writes accordingly.
Note, like SRSO_NO and IBPB_BRTYPE before it, advertise support for SBPB
even if it's not enumerated by in the raw CPUID. Some CPUs that gained
support via a uCode patch don't report SBPB via CPUID (the kernel forces
the flag).
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Link: https://lore.kernel.org/r/a4ab1e7fe50096d50fde33e739ed2da40b41ea6a.1692919072.git.jpoimboe@kernel.org
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Add support for the IBPB_BRTYPE CPUID flag, which indicates that IBPB
includes branch type prediction flushing.
Note, like SRSO_NO, advertise support for IBPB_BRTYPE even if it's not
enumerated by in the raw CPUID, i.e. bypass the cpuid_count() in
__kvm_cpu_cap_mask(). Some CPUs that gained support via a uCode patch
don't report IBPB_BRTYPE via CPUID (the kernel forces the flag).
Opportunistically use kvm_cpu_cap_check_and_set() for SRSO_NO instead
of manually querying host support (cpu_feature_enabled() and
boot_cpu_has() yield the same end result in this case).
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Link: https://lore.kernel.org/r/79d5f5914fb42c2c62418ffbcd78f138645ded21.1692919072.git.jpoimboe@kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Add a Kconfig entry to set the maximum number of vCPUs per KVM guest and
set the default value to 4096 when MAXSMP is enabled, as there are use
cases that want to create more than the currently allowed 1024 vCPUs and
are more than happy to eat the memory overhead.
The Hyper-V TLFS doesn't allow more than 64 sparse banks, i.e. allows a
maximum of 4096 virtual CPUs. Cap KVM's maximum number of virtual CPUs
to 4096 to avoid exceeding Hyper-V's limit as KVM support for Hyper-V is
unconditional, and alternatives like dynamically disabling Hyper-V
enlightenments that rely on sparse banks would require non-trivial code
changes.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Kyle Meyer <kyle.meyer@hpe.com>
Link: https://lore.kernel.org/r/20230824215244.3897419-1-kyle.meyer@hpe.com
[sean: massage changelog with --verbose, document #ifdef mess]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Userspace can directly modify the content of vCPU's CR0, CR3, and CR4 via
KVM_SYNC_X86_SREGS and KVM_SET_SREGS{,2}. Make sure that KVM flushes guest
TLB entries and paging-structure caches if a (partial) guest TLB flush is
architecturally required based on the CRn changes. To keep things simple,
flush whenever KVM resets the MMU context, i.e. if any bits in CR0, CR3,
CR4, or EFER are modified. This is extreme overkill, but stuffing state
from userspace is not such a hot path that preserving guest TLB state is a
priority.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Link: https://lore.kernel.org/r/20230814222358.707877-3-mhal@rbox.co
[sean: call out that the flushing on MMU context resets is for simplicity]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Drop the vcpu->arch.cr0 assignment after static_call(kvm_x86_set_cr0).
CR0 was already set by {vmx,svm}_set_cr0().
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Link: https://lore.kernel.org/r/20230814222358.707877-2-mhal@rbox.co
Signed-off-by: Sean Christopherson <seanjc@google.com>
- Fix KVM_GET_REG_LIST API for ISA_EXT registers
- Fix reading ISA_EXT register of a missing extension
- Fix ISA_EXT register handling in get-reg-list test
- Fix filtering of AIA registers in get-reg-list test
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEZdn75s5e6LHDQ+f/rUjsVaLHLAcFAmUMDssACgkQrUjsVaLH
LAckSg/+IZ5DPvPs81rUpL3i1Z5SrK4jXWL2zyvMIksBEYmFD2NPNvinVZ4Sxv6u
IzWNKJcAp4nA/+qPGPLXCURDe1W6PCDvO4SShjYm2UkPtNIfiskmFr3MunXZysgm
I7USJgj9ev+46yfOnwlYrwpZ8sQk7Z6nLTI/6Osk4Q7Sm0Vjoobh6krub7LNjeKQ
y6p+vxrXj+Owc5H9bgl0wAi6GOmOJKAM+cZU5DygQxjOgiUgNbOzsVgbLDTvExNq
gISUU4PoAO7/U1NKEaaopbe7C0KNQHTnegedtXsDzg7WTBah77/MNBt4snbfiR27
6rODklZlG/kAGIHdVtYC+zf8AfPqvGTIT8SLGmzQlyVlHujFBGn0L41NmMzW+EeA
7UpfUk8vPiiGhefBE+Ml3yqiReogo+aRhL1mZoI39rPusd7DMnwx97KpBlAcYuTI
PTgqycIMRmq2dSCHya+nrOVpwwV3Qx4G8Alpq1jOa7XDMeGMj4h521NQHjWckIK2
IJ2a0RtzB10+Z91nLV+amdAno326AnxJC7dR26O6uqVSPJy/nHE2GAc49gFKeWq6
QmzgzY1sU2Y02/TM8miyKSl3i+bpZSIPzXCKlOm1TowBKO+sfJzn/yMon9sVaVhb
4Wjgg3vgE74y9FVsL4JXR/PfrZH5Aq77J1R+/pMtsNTtVYrt1Sk=
=ytFs
-----END PGP SIGNATURE-----
Merge tag 'kvm-riscv-fixes-6.6-1' of https://github.com/kvm-riscv/linux into HEAD
KVM/riscv fixes for 6.6, take #1
- Fix KVM_GET_REG_LIST API for ISA_EXT registers
- Fix reading ISA_EXT register of a missing extension
- Fix ISA_EXT register handling in get-reg-list test
- Fix filtering of AIA registers in get-reg-list test
When the TSC_AUX MSR is virtualized, the TSC_AUX value is swap type "B"
within the VMSA. This means that the guest value is loaded on VMRUN and
the host value is restored from the host save area on #VMEXIT.
Since the value is restored on #VMEXIT, the KVM user return MSR support
for TSC_AUX can be replaced by populating the host save area with the
current host value of TSC_AUX. And, since TSC_AUX is not changed by Linux
post-boot, the host save area can be set once in svm_hardware_enable().
This eliminates the two WRMSR instructions associated with the user return
MSR support.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <d381de38eb0ab6c9c93dda8503b72b72546053d7.1694811272.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The checks for virtualizing TSC_AUX occur during the vCPU reset processing
path. However, at the time of initial vCPU reset processing, when the vCPU
is first created, not all of the guest CPUID information has been set. In
this case the RDTSCP and RDPID feature support for the guest is not in
place and so TSC_AUX virtualization is not established.
This continues for each vCPU created for the guest. On the first boot of
an AP, vCPU reset processing is executed as a result of an APIC INIT
event, this time with all of the guest CPUID information set, resulting
in TSC_AUX virtualization being enabled, but only for the APs. The BSP
always sees a TSC_AUX value of 0 which probably went unnoticed because,
at least for Linux, the BSP TSC_AUX value is 0.
Move the TSC_AUX virtualization enablement out of the init_vmcb() path and
into the vcpu_after_set_cpuid() path to allow for proper initialization of
the support after the guest CPUID information has been set.
With the TSC_AUX virtualization support now in the vcpu_set_after_cpuid()
path, the intercepts must be either cleared or set based on the guest
CPUID input.
Fixes: 296d5a17e793 ("KVM: SEV-ES: Use V_TSC_AUX if available instead of RDTSC/MSR_TSC_AUX intercepts")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <4137fbcb9008951ab5f0befa74a0399d2cce809a.1694811272.git.thomas.lendacky@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
svm_recalc_instruction_intercepts() is always called at least once
before the vCPU is started, so the setting or clearing of the RDTSCP
intercept can be dropped from the TSC_AUX virtualization support.
Extracted from a patch by Tom Lendacky.
Cc: stable@vger.kernel.org
Fixes: 296d5a17e793 ("KVM: SEV-ES: Use V_TSC_AUX if available instead of RDTSC/MSR_TSC_AUX intercepts")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Stop zapping invalidate TDP MMU roots via work queue now that KVM
preserves TDP MMU roots until they are explicitly invalidated. Zapping
roots asynchronously was effectively a workaround to avoid stalling a vCPU
for an extended during if a vCPU unloaded a root, which at the time
happened whenever the guest toggled CR0.WP (a frequent operation for some
guest kernels).
While a clever hack, zapping roots via an unbound worker had subtle,
unintended consequences on host scheduling, especially when zapping
multiple roots, e.g. as part of a memslot. Because the work of zapping a
root is no longer bound to the task that initiated the zap, things like
the CPU affinity and priority of the original task get lost. Losing the
affinity and priority can be especially problematic if unbound workqueues
aren't affined to a small number of CPUs, as zapping multiple roots can
cause KVM to heavily utilize the majority of CPUs in the system, *beyond*
the CPUs KVM is already using to run vCPUs.
When deleting a memslot via KVM_SET_USER_MEMORY_REGION, the async root
zap can result in KVM occupying all logical CPUs for ~8ms, and result in
high priority tasks not being scheduled in in a timely manner. In v5.15,
which doesn't preserve unloaded roots, the issues were even more noticeable
as KVM would zap roots more frequently and could occupy all CPUs for 50ms+.
Consuming all CPUs for an extended duration can lead to significant jitter
throughout the system, e.g. on ChromeOS with virtio-gpu, deleting memslots
is a semi-frequent operation as memslots are deleted and recreated with
different host virtual addresses to react to host GPU drivers allocating
and freeing GPU blobs. On ChromeOS, the jitter manifests as audio blips
during games due to the audio server's tasks not getting scheduled in
promptly, despite the tasks having a high realtime priority.
Deleting memslots isn't exactly a fast path and should be avoided when
possible, and ChromeOS is working towards utilizing MAP_FIXED to avoid the
memslot shenanigans, but KVM is squarely in the wrong. Not to mention
that removing the async zapping eliminates a non-trivial amount of
complexity.
Note, one of the subtle behaviors hidden behind the async zapping is that
KVM would zap invalidated roots only once (ignoring partial zaps from
things like mmu_notifier events). Preserve this behavior by adding a flag
to identify roots that are scheduled to be zapped versus roots that have
already been zapped but not yet freed.
Add a comment calling out why kvm_tdp_mmu_invalidate_all_roots() can
encounter invalid roots, as it's not at all obvious why zapping
invalidated roots shouldn't simply zap all invalid roots.
Reported-by: Pattara Teerapong <pteerapong@google.com>
Cc: David Stevens <stevensd@google.com>
Cc: Yiwei Zhang<zzyiwei@google.com>
Cc: Paul Hsia <paulhsia@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230916003916.2545000-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
All callers except the MMU notifier want to process all address spaces.
Remove the address space ID argument of for_each_tdp_mmu_root_yield_safe()
and switch the MMU notifier to use __for_each_tdp_mmu_root_yield_safe().
Extracted out of a patch by Sean Christopherson <seanjc@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The mmu_notifier path is a bit of a special snowflake, e.g. it zaps only a
single address space (because it's per-slot), and can't always yield.
Because of this, it calls kvm_tdp_mmu_zap_leafs() in ways that no one
else does.
Iterate manually over the leafs in response to an mmu_notifier
invalidation, instead of invoking kvm_tdp_mmu_zap_leafs(). Drop the
@can_yield param from kvm_tdp_mmu_zap_leafs() as its sole remaining
caller unconditionally passes "true".
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230916003916.2545000-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
- Fix an UV boot crash,
- Skip spurious ENDBR generation on _THIS_IP_,
- Fix ENDBR use in putuser() asm methods,
- Fix corner case boot crashes on 5-level paging,
- and fix a false positive WARNING on LTO kernels.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmUHOrwRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1j6Jw/+PjUfh/4+KM/Z8VzcBy2UhY3NMF2ptGCN
47FPLy+8ADqOvIfgsPsBEO9VXwdvkHfH64YWRUlULjvPNOSs+37drBYMe9AI9xKE
u6NhoBHmsnOtoLkBFIQYlJys9GyAeoMlwSSHxzRwQS+3VztRjoH636jiBcg/h7DR
zhakfnJD1SSOZuEyyDPnO0uIUarrcqC2jdZwucSqZnvZFdA/pexUHQEe2RtMXLln
EIA5kuEo7UdADcSr/Lbty7MKO+6xpRTjxF0yNItPtwPWsnxrSAC7P+dQ37uB975U
Z0CJ/kw54XG5Sdoech7XCWYmJzDxSPhiziA1USKpZJMo5phAG/apTJK6NG4TG94r
U+3QhLopUoSd8N74/VtZn0FjOrMsk7YKD5o8twlTbpCd2VaiJk4X5oBIns6Tvq05
0Vmsx15XO3mEzg1wWbbdjhjHW0czRgBpikS9QyexZKVkukYcW8QT6bk1gK1ypg94
do4PHKB53QIt31dedy/ddpQj4u1sJ4+a9/+IG29kjUB33M4+uFedtw11vfe+CDN0
XLRc6HfPblogIIEO/figJgwSq/TPCOsNHTwKkejq+D1Ey6DsrnX9Gg4BWVz/3dDW
84SW7TaO2FGEDh4NkR2ijkZlbpofFnCvhCh/uohodPlqDrTVTuRKCZW9I5plmGVa
qeXUpNDFs1E=
=BMjT
-----END PGP SIGNATURE-----
Merge tag 'x86-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Misc fixes:
- Fix an UV boot crash
- Skip spurious ENDBR generation on _THIS_IP_
- Fix ENDBR use in putuser() asm methods
- Fix corner case boot crashes on 5-level paging
- and fix a false positive WARNING on LTO kernels"
* tag 'x86-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/purgatory: Remove LTO flags
x86/boot/compressed: Reserve more memory for page tables
x86/ibt: Avoid duplicate ENDBR in __put_user_nocheck*()
x86/ibt: Suppress spurious ENDBR
x86/platform/uv: Use alternate source for socket to node data
balancing bug, and a topology setup bug on (Intel) hybrid processors.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmUHOVQRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1iOahAAj3YsoNbT/k6m9yp622n1OopaNEQvsK+/
F2Q5g/hJrm3+W5764rF8CvjhDbmrv6owjp3yUyZLDIfSAFZYMvwoNody3a373Yr3
VFBMJ00jNIv/TAFCJZYeybg3yViwObKKfpu4JBj//QU+4uGWCoBMolkVekU2bBti
r50fMxBPgg2Yic57DCC8Y+JZzHI/ydQ3rvVXMzkrTZCO/zY4/YmERM9d+vp4wl4B
uG9cfXQ4Yf/1gZo0XDlTUkOJUXPnkMgi+N4eHYlGuyOCoIZOfATI24hRaPBoQcdx
PDwHcKmyNxH9iaRppNQMvi797g3KrKVEmZwlZg1IfsILhKC0F4GsQ85tw8qQWE8j
brFPkWVUxAUSl4LXoqVInaxDHmJWR2UC3RA7b+BxFF/GMLTow0z4a+JMC6eKlNyR
uBisZnuEuecqwF9TLhyD3KBHh7PihUPz8PuFHk+Um5sggaUE82I+VwX6uxEi5y8r
ke2kAkpuMxPWT5lwDmFPAXWfvpZz5vvTIRUxGGj2+s4d8b0YfLtZsx5+uOIacaub
Gw+wYFfSowph72tR/SUVq0An/UTSPPBxty8eFIVeE6sW9bw3ghTtkf8300xjV7Rj
sKVxXy/podAo8wG7R6aZfTfsCpohmeEjskiatYdThYamPPx7V0R5pq4twmTXTHLJ
bFvQ1GFCOu0=
=jIeN
-----END PGP SIGNATURE-----
Merge tag 'sched-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"Fix a performance regression on large SMT systems, an Intel SMT4
balancing bug, and a topology setup bug on (Intel) hybrid processors"
* tag 'sched-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sched: Restore the SD_ASYM_PACKING flag in the DIE domain
sched/fair: Fix SMT4 group_smt_balance handling
sched/fair: Optimize should_we_balance() for large SMT systems
-flto* implies -ffunction-sections. With LTO enabled, ld.lld generates
multiple .text sections for purgatory.ro:
$ readelf -S purgatory.ro | grep " .text"
[ 1] .text PROGBITS 0000000000000000 00000040
[ 7] .text.purgatory PROGBITS 0000000000000000 000020e0
[ 9] .text.warn PROGBITS 0000000000000000 000021c0
[13] .text.sha256_upda PROGBITS 0000000000000000 000022f0
[15] .text.sha224_upda PROGBITS 0000000000000000 00002be0
[17] .text.sha256_fina PROGBITS 0000000000000000 00002bf0
[19] .text.sha224_fina PROGBITS 0000000000000000 00002cc0
This causes WARNING from kexec_purgatory_setup_sechdrs():
WARNING: CPU: 26 PID: 110894 at kernel/kexec_file.c:919
kexec_load_purgatory+0x37f/0x390
Fix this by disabling LTO for purgatory.
[ AFAICT, x86 is the only arch that supports LTO and purgatory. ]
We could also fix this with an explicit linker script to rejoin .text.*
sections back into .text. However, given the benefit of LTOing purgatory
is small, simply disable the production of more .text.* sections for now.
Fixes: b33fff07e3e3 ("x86, build: allow LTO to be selected")
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/r/20230914170138.995606-1-song@kernel.org
The decompressor has a hard limit on the number of page tables it can
allocate. This limit is defined at compile-time and will cause boot
failure if it is reached.
The kernel is very strict and calculates the limit precisely for the
worst-case scenario based on the current configuration. However, it is
easy to forget to adjust the limit when a new use-case arises. The
worst-case scenario is rarely encountered during sanity checks.
In the case of enabling 5-level paging, a use-case was overlooked. The
limit needs to be increased by one to accommodate the additional level.
This oversight went unnoticed until Aaron attempted to run the kernel
via kexec with 5-level paging and unaccepted memory enabled.
Update wost-case calculations to include 5-level paging.
To address this issue, let's allocate some extra space for page tables.
128K should be sufficient for any use-case. The logic can be simplified
by using a single value for all kernel configurations.
[ Also add a warning, should this memory run low - by Dave Hansen. ]
Fixes: 34bbb0009f3b ("x86/boot/compressed: Enable 5-level paging during decompression stage")
Reported-by: Aaron Lu <aaron.lu@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230915070221.10266-1-kirill.shutemov@linux.intel.com
Commit 8f2d6c41e5a6 ("x86/sched: Rewrite topology setup") dropped the
SD_ASYM_PACKING flag in the DIE domain added in commit 044f0e27dec6
("x86/sched: Add the SD_ASYM_PACKING flag to the die domain of hybrid
processors"). Restore it on hybrid processors.
The die-level domain does not depend on any build configuration and now
x86_sched_itmt_flags() is always needed. Remove the build dependency on
CONFIG_SCHED_[SMT|CLUSTER|MC].
Fixes: 8f2d6c41e5a6 ("x86/sched: Rewrite topology setup")
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Chen Yu <yu.c.chen@intel.com>
Tested-by: Caleb Callaway <caleb.callaway@intel.com>
Link: https://lkml.kernel.org/r/20230815035747.11529-1-ricardo.neri-calderon@linux.intel.com
Commit cb855971d717 ("x86/putuser: Provide room for padding") changed
__put_user_nocheck_*() into proper functions but failed to note that
SYM_FUNC_START() already provides ENDBR, rendering the explicit ENDBR
superfluous.
Fixes: cb855971d717 ("x86/putuser: Provide room for padding")
Reported-by: David Kaplan <David.Kaplan@amd.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230802110323.086971726@infradead.org
It was reported that under certain circumstances GCC emits ENDBR
instructions for _THIS_IP_ usage. Specifically, when it appears at the
start of a basic block -- but not elsewhere.
Since _THIS_IP_ is never used for control flow, these ENDBR
instructions are completely superfluous. Override the _THIS_IP_
definition for x86_64 to avoid this.
Less ENDBR instructions is better.
Fixes: 156ff4a544ae ("x86/ibt: Base IBT bits")
Reported-by: David Kaplan <David.Kaplan@amd.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230802110323.016197440@infradead.org
The UV code attempts to build a set of tables to allow it to do
bidirectional socket<=>node lookups.
But when nr_cpus is set to a smaller number than actually present, the
cpu_to_node() mapping information for unused CPUs is not available to
build_socket_tables(). This results in skipping some nodes or sockets
when creating the tables and leaving some -1's for later code to trip.
over, causing oopses.
The problem is that the socket<=>node lookups are created by doing a
loop over all CPUs, then looking up the CPU's APICID and socket. But
if a CPU is not present, there is no way to start this lookup.
Instead of looping over all CPUs, take CPUs out of the equation
entirely. Loop over all APICIDs which are mapped to a valid NUMA node.
Then just extract the socket-id from the APICID.
This avoid tripping over disabled CPUs.
Fixes: 8a50c5851927 ("x86/platform/uv: UV support for sub-NUMA clustering")
Signed-off-by: Steve Wahl <steve.wahl@hpe.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20230807141730.1117278-1-steve.wahl%40hpe.com
CONFIG_EFI_RUNTIME_MAP needs to be enabled in order for kexec to be able
to provide the required information about the EFI runtime mappings to
the incoming kernel, regardless of whether kexec_load() or
kexec_file_load() is being used. Without this information, kexec boot in
EFI mode is not possible.
The CONFIG_EFI_RUNTIME_MAP option is currently directly configurable if
CONFIG_EXPERT is enabled, so that it can be turned on for debugging
purposes even if KEXEC is not enabled. However, the upshot of this is
that it can also be disabled even when it shouldn't.
So tweak the Kconfig declarations to avoid this situation.
Reported-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Only the arch_efi_call_virt() macro that some architectures override
needs to be a macro, given that it is variadic and encapsulates calls
via function pointers that have different prototypes.
The associated setup and teardown code are not special in this regard,
and don't need to be instantiated at each call site. So turn them into
ordinary C functions and move them out of line.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
fix a ld.lld linker (in)compatibility quirk and make the x86 SMP init code a bit
more conservative to fix kexec() lockups.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmT97boRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1jObA//X7nug+d+IMLIs+c4579z4ZhkltMRxJVI
Btf8sdHpwgTUtKaOLmJnGiJ7f0GK5NtoaNtGUJF28aQETVOhco0Fvg/R8k1FE2Tc
CJqw6oy2FjVqD9qzZPCXh6QCvTtGjN5GF+xmoUbf7eZ9U8IVvOxBG+7yDMorQw3P
zzjIccLLg/aDvNLN/yZD2oqw6UGHZuh/Qr/0Q4PkZ7zL+yWV8EC+HOx3rlQklq0x
hh6YMwa4LGr3przUObHsfNS11EDzLDhg2WtTQMr12vlnpUB2eXnXWLklr6rpWjcz
qJiMxkrEkygB7seXnuQ0b4KHN/17zdkJ+t6vZoznUTXs1ohIDLWdiNTSl03qCs9B
V98a1x3MPT6aro9O/5ywyAJwPb0hvsg2S0ODFWab0Z3oRUbIG/k6dTEYlP7qZw8v
EFMtLy6M2EILXetj8q2ZGcA0rKz7pj/z9SosWDzqNj76w7xGwDKrSWoKJckkCwG+
j+ycBuKfrpxVYOF4ywvONSf35QTIW8BR0sM9Lg1GZuwaeincFwLf0cmS4ljGRyZ1
Vsi0SfpIgVQkeY/17onTa1C5X6c2wIE9nq253M58Xnc9B2EWpYImr+4PVZk6s4GI
GExvdPC/rIIwYa0LmvYTTlpHEd7f5qIAhfcEtMAuGSjVDLvmdDGFkaU7TgJ6Jcw2
D12wKSAAgPU=
=S38E
-----END PGP SIGNATURE-----
Merge tag 'x86-urgent-2023-09-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Fix preemption delays in the SGX code, remove unnecessarily
UAPI-exported code, fix a ld.lld linker (in)compatibility quirk and
make the x86 SMP init code a bit more conservative to fix kexec()
lockups"
* tag 'x86-urgent-2023-09-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sgx: Break up long non-preemptible delays in sgx_vepc_release()
x86: Remove the arch_calc_vm_prot_bits() macro from the UAPI
x86/build: Fix linker fill bytes quirk/incompatibility for ld.lld
x86/smp: Don't send INIT to non-present and non-booted CPUs
affecting certain Intel systems.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmT97GIRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1iLjRAAjk67cOUubVYXYOwlI4/5c+XiMC1qcqv9
M0uK30h+EECqWdm5Waj4mjFJ3seWUPrNsmdY9L4a6Gd1+s6W10wIabCEJnDBcfj5
M4Nv7OBnbfKb2JGfUfgd0OSSoclxL/QJh5UXMI+vLPKS2hMC5wL3BdKsxwrmbT+y
ZwilRbkHU0kSZpkkKE9yZNOcHhBMZ6LuQGlCAQ5k99Q8Z2KbxPa8T/8zx/RjYfuw
5WrE2gJJPXE/J/IHIvg7auhu2IcpzOmWPtb9J8MPExrUZLSQIYNhdqqyY9lvagKu
OdYcotBEoZM2+MSXzVWSQPKVVKox03FDErwdGef/QLFmzzldFFGUjKSRBfpnMK9Q
esjSiIGXlxh8fFcyCS/6FH48zA5y4BnBSzmPn3xt63VK0+EHPtUOUt8Xiug8DEYs
O8hW+3SYmhr6seF9ioX0RCoqft8dQuNz07zZWfLu5Qc0Bl8aLBhwRoXsYKnKqems
uqQ7bOb1FFwrS6AgebDhqHJUiw6WaeldsEolDyZzbKGJqFHJuTTFaECHApceLD7M
V4ymgy6QoKeccctr8qnwzX/5xbwTx8fRfTIuL9ZR2KC0NJLF/UwIw+bEAq2vuv3A
ROFYH+z1h4IvlCE3o3mRhGGYTxkdFw+BRXrWP690qxyRHBoHE+JwccSu36CKwBYS
LiZn7CWuFtg=
=Zq1S
-----END PGP SIGNATURE-----
Merge tag 'perf-urgent-2023-09-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 perf event fix from Ingo Molnar:
"Work around a firmware bug in the uncore PMU driver, affecting certain
Intel systems"
* tag 'perf-urgent-2023-09-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/uncore: Correct the number of CHAs on EMR
* Clean up vCPU targets, always returning generic v8 as the preferred target
* Trap forwarding infrastructure for nested virtualization (used for traps
that are taken from an L2 guest and are needed by the L1 hypervisor)
* FEAT_TLBIRANGE support to only invalidate specific ranges of addresses
when collapsing a table PTE to a block PTE. This avoids that the guest
refills the TLBs again for addresses that aren't covered by the table PTE.
* Fix vPMU issues related to handling of PMUver.
* Don't unnecessary align non-stack allocations in the EL2 VA space
* Drop HCR_VIRT_EXCP_MASK, which was never used...
* Don't use smp_processor_id() in kvm_arch_vcpu_load(),
but the cpu parameter instead
* Drop redundant call to kvm_set_pfn_accessed() in user_mem_abort()
* Remove prototypes without implementations
RISC-V:
* Zba, Zbs, Zicntr, Zicsr, Zifencei, and Zihpm support for guest
* Added ONE_REG interface for SATP mode
* Added ONE_REG interface to enable/disable multiple ISA extensions
* Improved error codes returned by ONE_REG interfaces
* Added KVM_GET_REG_LIST ioctl() implementation for KVM RISC-V
* Added get-reg-list selftest for KVM RISC-V
s390:
* PV crypto passthrough enablement (Tony, Steffen, Viktor, Janosch)
Allows a PV guest to use crypto cards. Card access is governed by
the firmware and once a crypto queue is "bound" to a PV VM every
other entity (PV or not) looses access until it is not bound
anymore. Enablement is done via flags when creating the PV VM.
* Guest debug fixes (Ilya)
x86:
* Clean up KVM's handling of Intel architectural events
* Intel bugfixes
* Add support for SEV-ES DebugSwap, allowing SEV-ES guests to use debug
registers and generate/handle #DBs
* Clean up LBR virtualization code
* Fix a bug where KVM fails to set the target pCPU during an IRTE update
* Fix fatal bugs in SEV-ES intrahost migration
* Fix a bug where the recent (architecturally correct) change to reinject
#BP and skip INT3 broke SEV guests (can't decode INT3 to skip it)
* Retry APIC map recalculation if a vCPU is added/enabled
* Overhaul emergency reboot code to bring SVM up to par with VMX, tie the
"emergency disabling" behavior to KVM actually being loaded, and move all of
the logic within KVM
* Fix user triggerable WARNs in SVM where KVM incorrectly assumes the TSC
ratio MSR cannot diverge from the default when TSC scaling is disabled
up related code
* Add a framework to allow "caching" feature flags so that KVM can check if
the guest can use a feature without needing to search guest CPUID
* Rip out the ancient MMU_DEBUG crud and replace the useful bits with
CONFIG_KVM_PROVE_MMU
* Fix KVM's handling of !visible guest roots to avoid premature triple fault
injection
* Overhaul KVM's page-track APIs, and KVMGT's usage, to reduce the API surface
that is needed by external users (currently only KVMGT), and fix a variety
of issues in the process
This last item had a silly one-character bug in the topic branch that
was sent to me. Because it caused pretty bad selftest failures in
some configurations, I decided to squash in the fix. So, while the
exact commit ids haven't been in linux-next, the code has (from the
kvm-x86 tree).
Generic:
* Wrap kvm_{gfn,hva}_range.pte in a union to allow mmu_notifier events to pass
action specific data without needing to constantly update the main handlers.
* Drop unused function declarations
Selftests:
* Add testcases to x86's sync_regs_test for detecting KVM TOCTOU bugs
* Add support for printf() in guest code and covert all guest asserts to use
printf-based reporting
* Clean up the PMU event filter test and add new testcases
* Include x86 selftests in the KVM x86 MAINTAINERS entry
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmT1m0kUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroMNgggAiN7nz6UC423FznuI+yO3TLm8tkx1
CpKh5onqQogVtchH+vrngi97cfOzZb1/AtifY90OWQi31KEWhehkeofcx7G6ERhj
5a9NFADY1xGBsX4exca/VHDxhnzsbDWaWYPXw5vWFWI6erft9Mvy3tp1LwTvOzqM
v8X4aWz+g5bmo/DWJf4Wu32tEU6mnxzkrjKU14JmyqQTBawVmJ3RYvHVJ/Agpw+n
hRtPAy7FU6XTdkmq/uCT+KUHuJEIK0E/l1js47HFAqSzwdW70UDg14GGo1o4ETxu
RjZQmVNvL57yVgi6QU38/A0FWIsWQm5IlaX1Ug6x8pjZPnUKNbo9BY4T1g==
=W+4p
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"ARM:
- Clean up vCPU targets, always returning generic v8 as the preferred
target
- Trap forwarding infrastructure for nested virtualization (used for
traps that are taken from an L2 guest and are needed by the L1
hypervisor)
- FEAT_TLBIRANGE support to only invalidate specific ranges of
addresses when collapsing a table PTE to a block PTE. This avoids
that the guest refills the TLBs again for addresses that aren't
covered by the table PTE.
- Fix vPMU issues related to handling of PMUver.
- Don't unnecessary align non-stack allocations in the EL2 VA space
- Drop HCR_VIRT_EXCP_MASK, which was never used...
- Don't use smp_processor_id() in kvm_arch_vcpu_load(), but the cpu
parameter instead
- Drop redundant call to kvm_set_pfn_accessed() in user_mem_abort()
- Remove prototypes without implementations
RISC-V:
- Zba, Zbs, Zicntr, Zicsr, Zifencei, and Zihpm support for guest
- Added ONE_REG interface for SATP mode
- Added ONE_REG interface to enable/disable multiple ISA extensions
- Improved error codes returned by ONE_REG interfaces
- Added KVM_GET_REG_LIST ioctl() implementation for KVM RISC-V
- Added get-reg-list selftest for KVM RISC-V
s390:
- PV crypto passthrough enablement (Tony, Steffen, Viktor, Janosch)
Allows a PV guest to use crypto cards. Card access is governed by
the firmware and once a crypto queue is "bound" to a PV VM every
other entity (PV or not) looses access until it is not bound
anymore. Enablement is done via flags when creating the PV VM.
- Guest debug fixes (Ilya)
x86:
- Clean up KVM's handling of Intel architectural events
- Intel bugfixes
- Add support for SEV-ES DebugSwap, allowing SEV-ES guests to use
debug registers and generate/handle #DBs
- Clean up LBR virtualization code
- Fix a bug where KVM fails to set the target pCPU during an IRTE
update
- Fix fatal bugs in SEV-ES intrahost migration
- Fix a bug where the recent (architecturally correct) change to
reinject #BP and skip INT3 broke SEV guests (can't decode INT3 to
skip it)
- Retry APIC map recalculation if a vCPU is added/enabled
- Overhaul emergency reboot code to bring SVM up to par with VMX, tie
the "emergency disabling" behavior to KVM actually being loaded,
and move all of the logic within KVM
- Fix user triggerable WARNs in SVM where KVM incorrectly assumes the
TSC ratio MSR cannot diverge from the default when TSC scaling is
disabled up related code
- Add a framework to allow "caching" feature flags so that KVM can
check if the guest can use a feature without needing to search
guest CPUID
- Rip out the ancient MMU_DEBUG crud and replace the useful bits with
CONFIG_KVM_PROVE_MMU
- Fix KVM's handling of !visible guest roots to avoid premature
triple fault injection
- Overhaul KVM's page-track APIs, and KVMGT's usage, to reduce the
API surface that is needed by external users (currently only
KVMGT), and fix a variety of issues in the process
Generic:
- Wrap kvm_{gfn,hva}_range.pte in a union to allow mmu_notifier
events to pass action specific data without needing to constantly
update the main handlers.
- Drop unused function declarations
Selftests:
- Add testcases to x86's sync_regs_test for detecting KVM TOCTOU bugs
- Add support for printf() in guest code and covert all guest asserts
to use printf-based reporting
- Clean up the PMU event filter test and add new testcases
- Include x86 selftests in the KVM x86 MAINTAINERS entry"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (279 commits)
KVM: x86/mmu: Include mmu.h in spte.h
KVM: x86/mmu: Use dummy root, backed by zero page, for !visible guest roots
KVM: x86/mmu: Disallow guest from using !visible slots for page tables
KVM: x86/mmu: Harden TDP MMU iteration against root w/o shadow page
KVM: x86/mmu: Harden new PGD against roots without shadow pages
KVM: x86/mmu: Add helper to convert root hpa to shadow page
drm/i915/gvt: Drop final dependencies on KVM internal details
KVM: x86/mmu: Handle KVM bookkeeping in page-track APIs, not callers
KVM: x86/mmu: Drop @slot param from exported/external page-track APIs
KVM: x86/mmu: Bug the VM if write-tracking is used but not enabled
KVM: x86/mmu: Assert that correct locks are held for page write-tracking
KVM: x86/mmu: Rename page-track APIs to reflect the new reality
KVM: x86/mmu: Drop infrastructure for multiple page-track modes
KVM: x86/mmu: Use page-track notifiers iff there are external users
KVM: x86/mmu: Move KVM-only page-track declarations to internal header
KVM: x86: Remove the unused page-track hook track_flush_slot()
drm/i915/gvt: switch from ->track_flush_slot() to ->track_remove_region()
KVM: x86: Add a new page-track hook to handle memslot deletion
drm/i915/gvt: Don't bother removing write-protection on to-be-deleted slot
KVM: x86: Reject memslot MOVE operations if KVMGT is attached
...
On large enclaves we hit the softlockup warning with following call trace:
xa_erase()
sgx_vepc_release()
__fput()
task_work_run()
do_exit()
The latency issue is similar to the one fixed in:
8795359e35bc ("x86/sgx: Silence softlockup detection when releasing large enclaves")
The test system has 64GB of enclave memory, and all is assigned to a single VM.
Release of 'vepc' takes a longer time and causes long latencies, which triggers
the softlockup warning.
Add cond_resched() to give other tasks a chance to run and reduce
latencies, which also avoids the softlockup detector.
[ mingo: Rewrote the changelog. ]
Fixes: 540745ddbc70 ("x86/sgx: Introduce virtual EPC for use by KVM guests")
Reported-by: Yu Zhang <yu.zhang@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Yu Zhang <yu.zhang@ionos.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Acked-by: Haitao Huang <haitao.huang@linux.intel.com>
Cc: stable@vger.kernel.org
The arch_calc_vm_prot_bits() macro uses VM_PKEY_BIT0 etc. which are
not part of the UAPI, so the macro is completely useless for userspace.
It is also hidden behind the CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
config switch which we shouldn't expose to userspace. Thus let's move
this macro into a new internal header instead.
Fixes: 8f62c883222c ("x86/mm/pkeys: Add arch-specific VMA protection bits")
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Link: https://lore.kernel.org/r/20230906162658.142511-1-thuth@redhat.com
With ":text =0xcccc", ld.lld fills unused text area with 0xcccc0000.
Example objdump -D output:
ffffffff82b04203: 00 00 add %al,(%rax)
ffffffff82b04205: cc int3
ffffffff82b04206: cc int3
ffffffff82b04207: 00 00 add %al,(%rax)
ffffffff82b04209: cc int3
ffffffff82b0420a: cc int3
Replace it with ":text =0xcccccccc", so we get the following instead:
ffffffff82b04203: cc int3
ffffffff82b04204: cc int3
ffffffff82b04205: cc int3
ffffffff82b04206: cc int3
ffffffff82b04207: cc int3
ffffffff82b04208: cc int3
gcc/ld doesn't seem to have the same issue. The generated code stays the
same for gcc/ld.
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Fixes: 7705dc855797 ("x86/vmlinux: Use INT3 instead of NOP for linker fill bytes")
Link: https://lore.kernel.org/r/20230906175215.2236033-1-song@kernel.org
Starting from SPR, the basic uncore PMON information is retrieved from
the discovery table (resides in an MMIO space populated by BIOS). It is
called the discovery method. The existing value of the type->num_boxes
is from the discovery table.
On some SPR variants, there is a firmware bug that makes the value from the
discovery table incorrect. We use the value from the
SPR_MSR_UNC_CBO_CONFIG MSR to replace the one from the discovery table:
38776cc45eb7 ("perf/x86/uncore: Correct the number of CHAs on SPR")
Unfortunately, the SPR_MSR_UNC_CBO_CONFIG isn't available for the EMR
XCC (Always returns 0), but the above firmware bug doesn't impact the
EMR XCC.
Don't let the value from the MSR replace the existing value from the
discovery table.
Fixes: 38776cc45eb7 ("perf/x86/uncore: Correct the number of CHAs on SPR")
Reported-by: Stephane Eranian <eranian@google.com>
Reported-by: Yunying Sun <yunying.sun@intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Yunying Sun <yunying.sun@intel.com>
Link: https://lore.kernel.org/r/20230905134248.496114-1-kan.liang@linux.intel.com
- Enable -Wenum-conversion warning option
- Refactor the rpm-pkg target
- Fix scripts/setlocalversion to consider annotated tags for rt-kernel
- Add a jump key feature for the search menu of 'make nconfig'
- Support Qt6 for 'make xconfig'
- Enable -Wformat-overflow, -Wformat-truncation, -Wstringop-overflow, and
-Wrestrict warnings for W=1 builds
- Replace <asm/export.h> with <linux/export.h> for alpha, ia64, and sparc
- Support DEB_BUILD_OPTIONS=parallel=N for the debian source package
- Refactor scripts/Makefile.modinst and fix some modules_sign issues
- Add a new Kconfig env variable to warn symbols that are not defined anywhere
- Show help messages of config fragments in 'make help'
-----BEGIN PGP SIGNATURE-----
iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmT3X/oVHG1hc2FoaXJv
eUBrZXJuZWwub3JnAAoJED2LAQed4NsG58oQAIXDrka3r53Flky/uJjSl8ab620o
XL3u4PF/ekv6qsZoLlU24WQP8BzcJO6gPHFz88mE9/J1+wHpNKZLZehjpgj1cCY3
LatbEAa3DCZPC/c7P/nz+FT4mjTZpKOeQmvZVfA+xonBHmTyVUKgws0uDB/xuTjE
GARyOX7ymD0AAZv84SUUCiaBe5Y2Bkrki67HfteS4bxW8GHg0rZWzrFUUkEkoG54
elNOYR0WYROwyo8Iokd2MedVdK2SPZxvY8i67hXl2K+Qve6tLNk8dbRIENnYI0pk
7oQVmIfC20eu9CteywHlyjt8jpTOeIrRc2yhJKR0YrjjIzKhulRGMh+pFAAwoySd
Se60uWCS2AydcXWTrtb+iwFUyM2zRK4SaMlxleqnoE/bWYp6jhg9qbV9xpztWSYI
j39k9aX7B19stN1drzJeyXdILRVtaAQCcax3RR+mGgm4Z5fuTDntPepvIv8J3lBg
QZ4MCdOdtFw33eQaKa7O3LddD3q1X355xeaIITivEe3rAk5iIJYu3Ty1VY+/XTcH
ktSVl83zQ5Ge3tvx8D6kCR9J8jAQyTLIKHxvr/j969HgZKguS2i37eChnPyKcu23
ZMKJcmCJ1O7naQXVrb/TeiaMR0UEo/PSdrUjpEO3LlMpRthNXLVSLfgJGv8WLO7/
pb/HFXHgKaSORiRV
=lfUi
-----END PGP SIGNATURE-----
Merge tag 'kbuild-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- Enable -Wenum-conversion warning option
- Refactor the rpm-pkg target
- Fix scripts/setlocalversion to consider annotated tags for rt-kernel
- Add a jump key feature for the search menu of 'make nconfig'
- Support Qt6 for 'make xconfig'
- Enable -Wformat-overflow, -Wformat-truncation, -Wstringop-overflow,
and -Wrestrict warnings for W=1 builds
- Replace <asm/export.h> with <linux/export.h> for alpha, ia64, and
sparc
- Support DEB_BUILD_OPTIONS=parallel=N for the debian source package
- Refactor scripts/Makefile.modinst and fix some modules_sign issues
- Add a new Kconfig env variable to warn symbols that are not defined
anywhere
- Show help messages of config fragments in 'make help'
* tag 'kbuild-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (62 commits)
kconfig: fix possible buffer overflow
kbuild: Show marked Kconfig fragments in "help"
kconfig: add warn-unknown-symbols sanity check
kbuild: dummy-tools: make MPROFILE_KERNEL checks work on BE
Documentation/llvm: refresh docs
modpost: Skip .llvm.call-graph-profile section check
kbuild: support modules_sign for external modules as well
kbuild: support 'make modules_sign' with CONFIG_MODULE_SIG_ALL=n
kbuild: move more module installation code to scripts/Makefile.modinst
kbuild: reduce the number of mkdir calls during modules_install
kbuild: remove $(MODLIB)/source symlink
kbuild: move depmod rule to scripts/Makefile.modinst
kbuild: add modules_sign to no-{compiler,sync-config}-targets
kbuild: do not run depmod for 'make modules_sign'
kbuild: deb-pkg: support DEB_BUILD_OPTIONS=parallel=N in debian/rules
alpha: remove <asm/export.h>
alpha: replace #include <asm/export.h> with #include <linux/export.h>
ia64: remove <asm/export.h>
ia64: replace #include <asm/export.h> with #include <linux/export.h>
sparc: remove <asm/export.h>
...
- Drop 32-bit checksum implementation and re-use it from arch/x86
- String function cleanup
- Fixes for -Wmissing-variable-declarations and -Wmissing-prototypes builds
-----BEGIN PGP SIGNATURE-----
iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAmT0158WHHJpY2hhcmRA
c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wTUvD/4yx5F1tFljUrsBipHtg8NX81Wk
/yqidBnug71S76BhkoOHX1P+qxzl3WS+zp35zclL2/kPPew1vc5TpVXXFdZa5Xkh
yaEPGSNKY+yFnaxYWrg/fWYYJbBS+HLDB+Am3fyYdeiXmAEV9K/5HVptZOnDqXnM
0hJf8hKzkfmn0eaXBlvyfZaQSVofK3R9lKalDXMOwc0vp7i1gEHWuN3K1jjM6WCx
/lPkpHKErRCDGyCtr/dY9J0FgCROo5ytSQq7KIBat+F+JBGZh7pcruEbl11fg1Ed
w4RjbcwYITBNHOlrb2kYvxIu7/clxgzYWtcarS0GB3o7Iu8ZQXijWKXakeRV2DBx
by6NAxliZZDp+QmRpPMDgZPPbN5BQY3GDCYY8eAQEAllrTZXE0ZJyIgBPYX6j4bF
pOKAHd/hErpoPXi5Ec7r01+pE9aMOZUEK7ebXSM7Hqnj4btcX+1b5V3FMrAHPqEa
LZpQmeJRtIJQZNRhjjgQx0zd2oJ2OXz0gc+Dap2fj1aSIfGkwIIo0hXvNSoETH3J
Swsr4v+FynCGCWyNsYFEmWMEe9EWOrElvSNk+Hek4gdQc9lByo9JXxSfOxcoaYLG
vebV5kog6NDP3DgmnfmPzAnGjpGwMhjNrMQ9UR1Ul3BUOyvZ4nljdzr4tNYVnF0R
ydtiEOf5IjHW2Jq2Ww==
=UmrW
-----END PGP SIGNATURE-----
Merge tag 'uml-for-linus-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux
Pull UML updates from Richard Weinberger:
- Drop 32-bit checksum implementation and re-use it from arch/x86
- String function cleanup
- Fixes for -Wmissing-variable-declarations and -Wmissing-prototypes
builds
* tag 'uml-for-linus-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux:
um: virt-pci: fix missing declaration warning
um: Refactor deprecated strncpy to memcpy
um: fix 3 instances of -Wmissing-prototypes
um: port_kern: fix -Wmissing-variable-declarations
uml: audio: fix -Wmissing-variable-declarations
um: vector: refactor deprecated strncpy
um: use obj-y to descend into arch/um/*/
um: Hard-code the result of 'uname -s'
um: Use the x86 checksum implementation on 32-bit
asm-generic: current: Don't include thread-info.h if building asm
um: Remove unsued extern declaration ldt_host_info()
um: Fix hostaudio build errors
um: Remove strlcpy usage
-----BEGIN PGP SIGNATURE-----
iQFHBAABCgAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmT0EE8THHdlaS5saXVA
a2VybmVsLm9yZwAKCRB2FHBfkEGgXg5FCACGJ6n2ikhtRHAENHIVY/mTh+HbhO07
ERzjADfqKF43u1Nt9cslgT4MioqwLjQsAu/A0YcJgVxVSOtg7dnbDmurRAjrGT/3
iKqcVvnaiwSV44TkF8evpeMttZSOg29ImmpyQjoZJJvDMfpxleEK53nuKB9EsjKL
Mz/0gSPoNc79bWF+85cVhgPnGIh9nBarxHqVsuWjMhc+UFhzjf9mOtk34qqPfJ1Q
4RsKGEjkVkeXoG6nGd6Gl/+8WoTpenOZQLchhInocY+k9FlAzW1Kr+ICLDx+Topw
8OJ6fv2rMDOejT9aOaA3/imf7LMer0xSUKb6N0sqQAQX8KzwcOYyKtQJ
=rC/v
-----END PGP SIGNATURE-----
Merge tag 'hyperv-next-signed-20230902' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv updates from Wei Liu:
- Support for SEV-SNP guests on Hyper-V (Tianyu Lan)
- Support for TDX guests on Hyper-V (Dexuan Cui)
- Use SBRM API in Hyper-V balloon driver (Mitchell Levy)
- Avoid dereferencing ACPI root object handle in VMBus driver (Maciej
Szmigiero)
- A few misecllaneous fixes (Jiapeng Chong, Nathan Chancellor, Saurabh
Sengar)
* tag 'hyperv-next-signed-20230902' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (24 commits)
x86/hyperv: Remove duplicate include
x86/hyperv: Move the code in ivm.c around to avoid unnecessary ifdef's
x86/hyperv: Remove hv_isolation_type_en_snp
x86/hyperv: Use TDX GHCI to access some MSRs in a TDX VM with the paravisor
Drivers: hv: vmbus: Bring the post_msg_page back for TDX VMs with the paravisor
x86/hyperv: Introduce a global variable hyperv_paravisor_present
Drivers: hv: vmbus: Support >64 VPs for a fully enlightened TDX/SNP VM
x86/hyperv: Fix serial console interrupts for fully enlightened TDX guests
Drivers: hv: vmbus: Support fully enlightened TDX guests
x86/hyperv: Support hypercalls for fully enlightened TDX guests
x86/hyperv: Add hv_isolation_type_tdx() to detect TDX guests
x86/hyperv: Fix undefined reference to isolation_type_en_snp without CONFIG_HYPERV
x86/hyperv: Add missing 'inline' to hv_snp_boot_ap() stub
hv: hyperv.h: Replace one-element array with flexible-array member
Drivers: hv: vmbus: Don't dereference ACPI root object handle
x86/hyperv: Add hyperv-specific handling for VMMCALL under SEV-ES
x86/hyperv: Add smp support for SEV-SNP guest
clocksource: hyper-v: Mark hyperv tsc page unencrypted in sev-snp enlightened guest
x86/hyperv: Use vmmcall to implement Hyper-V hypercall in sev-snp enlightened guest
drivers: hv: Mark percpu hvcall input arg page unencrypted in SEV-SNP enlightened guest
...
Vasant reported that kexec() can hang or reset the machine when it tries to
park CPUs via INIT. This happens when the kernel is using extended APIC,
but the present mask has APIC IDs >= 0x100 enumerated.
As extended APIC can only handle 8 bit of APIC ID sending INIT to APIC ID
0x100 sends INIT to APIC ID 0x0. That's the boot CPU which is special on
x86 and INIT causes the system to hang or resets the machine.
Prevent this by sending INIT only to those CPUs which have been booted
once.
Fixes: 45e34c8af58f ("x86/smp: Put CPUs into INIT on shutdown if possible")
Reported-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Vasant Hegde <vasant.hegde@amd.com>
Link: https://lore.kernel.org/r/87cyzwjbff.ffs@tglx
Currently the Kconfig fragments in kernel/configs and arch/*/configs
that aren't used internally aren't discoverable through "make help",
which consists of hard-coded lists of config fragments. Instead, list
all the fragment targets that have a "# Help: " comment prefix so the
targets can be generated dynamically.
Add logic to the Makefile to search for and display the fragment and
comment. Add comments to fragments that are intended to be direct targets.
Signed-off-by: Kees Cook <keescook@chromium.org>
Co-developed-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
* Fix PKRU covert channel
* Fix -Wmissing-variable-declarations warning for ia32_xyz_class
* Fix kernel-doc annotation warning
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmTyK2sACgkQaDWVMHDJ
krCPyA//WbT65hhGZt5UHayoW3Cv8NmS4kRf6FTVMmt1yIvbOtVmWVZeOGMypprC
jYBPOdTf1SdoUxeD1cTTd/rtbYknczaBLC5VGilGb2/663yQl9ThT3ePc6OgUpPW
28ItLslfHGchHbrOgCgUK8Aiv8xRdfNUJoByMTOoce6eGUQsgYaNOq74z1YaLsZ4
afbitcd/vWO9eJfW5DNl26rIP+ksOgA/Iue8uRiEczoPltnbUxGGLCtuuR9B2Xxi
FqfttvcnUy9yFJBnELx4721VtTcmK4Dq5O/ZoJnmVQqRPY9aXpkbwZnH9co8r/uh
GueQj30l6VhEO6XnwPvjsOghXI2vOMrxlL3zGMw5y6pjfUWa4W6f33EV6mByYfml
dfPTXIz9K1QxznHtk6xVhn1X8CFw7L1L7CC/2CPQPETiuqyjWrHqvFBXcsWpKq4N
1GjnpoveNZmWcE7UY9qEHAXGh7ndI+EIux/0WfJJHOE/fzCr06GphQdxNwPGdi5O
T3JkZf6R4GBvfi6e0F6Qn2B02sDl18remmAMi9bAn/M31g+mvAoHqKLlI9EzAGGz
91PzJdHecdvalU6zsCKqi0ETzq11WG7zCfSXs+jeHBe1jsUITpcZCkpVi2pGA2aq
X7qLjJyW+Sx0hdE2IWFhCOfiwGFpXxfegCl1uax3HrZg6rrORbw=
=prt2
-----END PGP SIGNATURE-----
Merge tag 'x86-urgent-2023-09-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Dave Hansen:
"The most important fix here adds a missing CPU model to the recent
Gather Data Sampling (GDS) mitigation list to ensure that mitigations
are available on that CPU.
There are also a pair of warning fixes, and closure of a covert
channel that pops up when protection keys are disabled.
Summary:
- Mark all Skylake CPUs as vulnerable to GDS
- Fix PKRU covert channel
- Fix -Wmissing-variable-declarations warning for ia32_xyz_class
- Fix kernel-doc annotation warning"
* tag 'x86-urgent-2023-09-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/fpu/xstate: Fix PKRU covert channel
x86/irq/i8259: Fix kernel-doc annotation warning
x86/speculation: Mark all Skylake CPUs as vulnerable to GDS
x86/audit: Fix -Wmissing-variable-declarations warning for ia32_xyz_class
Here is the big set of char/misc and other small driver subsystem
changes for 6.6-rc1.
Stuff all over the place here, lots of driver updates and changes and
new additions. Short summary is:
- new IIO drivers and updates
- Interconnect driver updates
- fpga driver updates and additions
- fsi driver updates
- mei driver updates
- coresight driver updates
- nvmem driver updates
- counter driver updates
- lots of smaller misc and char driver updates and additions
All of these have been in linux-next for a long time with no reported
problems.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZPH64g8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ynr2QCfd3RKeR+WnGzyEOFhksl30UJJhiIAoNZtYT5+
t9KG0iMDXRuTsOqeEQbd
=tVnk
-----END PGP SIGNATURE-----
Merge tag 'char-misc-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver updates from Greg KH:
"Here is the big set of char/misc and other small driver subsystem
changes for 6.6-rc1.
Stuff all over the place here, lots of driver updates and changes and
new additions. Short summary is:
- new IIO drivers and updates
- Interconnect driver updates
- fpga driver updates and additions
- fsi driver updates
- mei driver updates
- coresight driver updates
- nvmem driver updates
- counter driver updates
- lots of smaller misc and char driver updates and additions
All of these have been in linux-next for a long time with no reported
problems"
* tag 'char-misc-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (267 commits)
nvmem: core: Notify when a new layout is registered
nvmem: core: Do not open-code existing functions
nvmem: core: Return NULL when no nvmem layout is found
nvmem: core: Create all cells before adding the nvmem device
nvmem: u-boot-env:: Replace zero-length array with DECLARE_FLEX_ARRAY() helper
nvmem: sec-qfprom: Add Qualcomm secure QFPROM support
dt-bindings: nvmem: sec-qfprom: Add bindings for secure qfprom
dt-bindings: nvmem: Add compatible for QCM2290
nvmem: Kconfig: Fix typo "drive" -> "driver"
nvmem: Explicitly include correct DT includes
nvmem: add new NXP QorIQ eFuse driver
dt-bindings: nvmem: Add t1023-sfp efuse support
dt-bindings: nvmem: qfprom: Add compatible for MSM8226
nvmem: uniphier: Use devm_platform_get_and_ioremap_resource()
nvmem: qfprom: do some cleanup
nvmem: stm32-romem: Use devm_platform_get_and_ioremap_resource()
nvmem: rockchip-efuse: Use devm_platform_get_and_ioremap_resource()
nvmem: meson-mx-efuse: Convert to devm_platform_ioremap_resource()
nvmem: lpc18xx_otp: Convert to devm_platform_ioremap_resource()
nvmem: brcm_nvram: Use devm_platform_get_and_ioremap_resource()
...
Here is a small set of driver core updates and additions for 6.6-rc1.
Included in here are:
- stable kernel documentation updates
- class structure const work from Ivan on various subsystems
- kernfs tweaks
- driver core tests!
- kobject sanity cleanups
- kobject structure reordering to save space
- driver core error code handling fixups
- other minor driver core cleanups
All of these have been in linux-next for a while with no reported
problems.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZPH77Q8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ylZMACePk8SitfaJc6FfFf5I7YK7Nq0V8MAn0nUjgsR
i8NcNpu/Yv4HGrDgTdh/
=PJbk
-----END PGP SIGNATURE-----
Merge tag 'driver-core-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is a small set of driver core updates and additions for 6.6-rc1.
Included in here are:
- stable kernel documentation updates
- class structure const work from Ivan on various subsystems
- kernfs tweaks
- driver core tests!
- kobject sanity cleanups
- kobject structure reordering to save space
- driver core error code handling fixups
- other minor driver core cleanups
All of these have been in linux-next for a while with no reported
problems"
* tag 'driver-core-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (32 commits)
driver core: Call in reversed order in device_platform_notify_remove()
driver core: Return proper error code when dev_set_name() fails
kobject: Remove redundant checks for whether ktype is NULL
kobject: Add sanity check for kset->kobj.ktype in kset_register()
drivers: base: test: Add missing MODULE_* macros to root device tests
drivers: base: test: Add missing MODULE_* macros for platform devices tests
drivers: base: Free devm resources when unregistering a device
drivers: base: Add basic devm tests for platform devices
drivers: base: Add basic devm tests for root devices
kernfs: fix missing kernfs_iattr_rwsem locking
docs: stable-kernel-rules: mention that regressions must be prevented
docs: stable-kernel-rules: fine-tune various details
docs: stable-kernel-rules: make the examples for option 1 a proper list
docs: stable-kernel-rules: move text around to improve flow
docs: stable-kernel-rules: improve structure by changing headlines
base/node: Remove duplicated include
kernfs: attach uuid for every kernfs and report it in fsid
kernfs: add stub helper for kernfs_generic_poll()
x86/resctrl: make pseudo_lock_class a static const structure
x86/MSR: make msr_class a static const structure
...
When XCR0[9] is set, PKRU can be read and written from userspace with
XSAVE and XRSTOR, even when CR4.PKE is clear.
Clear XCR0[9] when protection keys are disabled.
Reported-by: Tavis Ormandy <taviso@google.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/r/20230831043228.1194256-1-jmattson@google.com
Convert IBT selftest to asm to fix objtool warning
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmTv1QQACgkQaDWVMHDJ
krAUwhAAn6TOwHJK8BSkHeiQhON1nrlP3c5cv0AyZ2NP8RYDrZrSZvhpYBJ6wgKC
Cx5CGq5nn9twYsYS3KsktLKDfR3lRdsQ7K9qtyFtYiaeaVKo+7gEKl/K+klwai8/
gninQWHk0zmSCja8Vi77q52WOMkQKapT8+vaON9EVDO8dVEi+CvhAIfPwMafuiwO
Rk4X86SzoZu9FP79LcCg9XyGC/XbM2OG9eNUTSCKT40qTTKm5y4gix687NvAlaHR
ko5MTsdl0Wfp6Qk0ohT74LnoA2c1g/FluvZIM33ci/2rFpkf9Hw7ip3lUXqn6CPx
rKiZ+pVRc0xikVWkraMfIGMJfUd2rhelp8OyoozD7DB7UZw40Q4RW4N5tgq9Fhe9
MQs3p1v9N8xHdRKl365UcOczUxNAmv4u0nV5gY/4FMC6VjldCl2V9fmqYXyzFS4/
Ogg4FSd7c2JyGFKPs+5uXyi+RY2qOX4+nzHOoKD7SY616IYqtgKoz5usxETLwZ6s
VtJOmJL0h//z0A7tBliB0zd+SQ5UQQBDC2XouQH2fNX2isJMn0UDmWJGjaHgK6Hh
8jVp6LNqf+CEQS387UxckOyj7fu438hDky1Ggaw4YqowEOhQeqLVO4++x+HITrbp
AupXfbJw9h9cMN63Yc0gVxXQ9IMZ+M7UxLtZ3Cd8/PVztNy/clA=
=3UUm
-----END PGP SIGNATURE-----
Merge tag 'x86_shstk_for_6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 shadow stack support from Dave Hansen:
"This is the long awaited x86 shadow stack support, part of Intel's
Control-flow Enforcement Technology (CET).
CET consists of two related security features: shadow stacks and
indirect branch tracking. This series implements just the shadow stack
part of this feature, and just for userspace.
The main use case for shadow stack is providing protection against
return oriented programming attacks. It works by maintaining a
secondary (shadow) stack using a special memory type that has
protections against modification. When executing a CALL instruction,
the processor pushes the return address to both the normal stack and
to the special permission shadow stack. Upon RET, the processor pops
the shadow stack copy and compares it to the normal stack copy.
For more information, refer to the links below for the earlier
versions of this patch set"
Link: https://lore.kernel.org/lkml/20220130211838.8382-1-rick.p.edgecombe@intel.com/
Link: https://lore.kernel.org/lkml/20230613001108.3040476-1-rick.p.edgecombe@intel.com/
* tag 'x86_shstk_for_6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (47 commits)
x86/shstk: Change order of __user in type
x86/ibt: Convert IBT selftest to asm
x86/shstk: Don't retry vm_munmap() on -EINTR
x86/kbuild: Fix Documentation/ reference
x86/shstk: Move arch detail comment out of core mm
x86/shstk: Add ARCH_SHSTK_STATUS
x86/shstk: Add ARCH_SHSTK_UNLOCK
x86: Add PTRACE interface for shadow stack
selftests/x86: Add shadow stack test
x86/cpufeatures: Enable CET CR4 bit for shadow stack
x86/shstk: Wire in shadow stack interface
x86: Expose thread features in /proc/$PID/status
x86/shstk: Support WRSS for userspace
x86/shstk: Introduce map_shadow_stack syscall
x86/shstk: Check that signal frame is shadow stack mem
x86/shstk: Check that SSP is aligned on sigreturn
x86/shstk: Handle signals for shadow stack
x86/shstk: Introduce routines modifying shstk
x86/shstk: Handle thread shadow stack
x86/shstk: Add user-mode shadow stack support
...
Fix this warning:
arch/x86/kernel/i8259.c:235: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
* ELCR registers (0x4d0, 0x4d1) control edge/level of IRQ
CC arch/x86/kernel/irqinit.o
Signed-off-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230830131211.88226-1-vincenzopalazzodev@gmail.com
The Gather Data Sampling (GDS) vulnerability is common to all Skylake
processors. However, the "client" Skylakes* are now in this list:
https://www.intel.com/content/www/us/en/support/articles/000022396/processors.html
which means they are no longer included for new vulnerabilities here:
https://www.intel.com/content/www/us/en/developer/topic-technology/software-security-guidance/processors-affected-consolidated-product-cpu-model.html
or in other GDS documentation. Thus, they were not included in the
original GDS mitigation patches.
Mark SKYLAKE and SKYLAKE_L as vulnerable to GDS to match all the
other Skylake CPUs (which include Kaby Lake). Also group the CPUs
so that the ones that share the exact same vulnerabilities are next
to each other.
Last, move SRBDS to the end of each line. This makes it clear at a
glance that SKYLAKE_X is unique. Of the five Skylakes, it is the
only "server" CPU and has a different implementation from the
clients of the "special register" hardware, making it immune to SRBDS.
This makes the diff much harder to read, but the resulting table is
worth it.
I very much appreciate the report from Michael Zhivich about this
issue. Despite what level of support a hardware vendor is providing,
the kernel very much needs an accurate and up-to-date list of
vulnerable CPUs. More reports like this are very welcome.
* Client Skylakes are CPUID 406E3/506E3 which is family 6, models
0x4E and 0x5E, aka INTEL_FAM6_SKYLAKE and INTEL_FAM6_SKYLAKE_L.
Reported-by: Michael Zhivich <mzhivich@akamai.com>
Fixes: 8974eb588283 ("x86/speculation: Add Gather Data Sampling mitigation")
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Explicitly include mmu.h in spte.h instead of relying on the "parent" to
include mmu.h. spte.h references a variety of macros and variables that
are defined/declared in mmu.h, and so including spte.h before (or instead
of) mmu.h will result in build errors, e.g.
arch/x86/kvm/mmu/spte.h: In function ‘is_mmio_spte’:
arch/x86/kvm/mmu/spte.h:242:23: error: ‘enable_mmio_caching’ undeclared
242 | likely(enable_mmio_caching);
| ^~~~~~~~~~~~~~~~~~~
arch/x86/kvm/mmu/spte.h: In function ‘is_large_pte’:
arch/x86/kvm/mmu/spte.h:302:22: error: ‘PT_PAGE_SIZE_MASK’ undeclared
302 | return pte & PT_PAGE_SIZE_MASK;
| ^~~~~~~~~~~~~~~~~
arch/x86/kvm/mmu/spte.h: In function ‘is_dirty_spte’:
arch/x86/kvm/mmu/spte.h:332:56: error: ‘PT_WRITABLE_MASK’ undeclared
332 | return dirty_mask ? spte & dirty_mask : spte & PT_WRITABLE_MASK;
| ^~~~~~~~~~~~~~~~
Fixes: 5a9624affe7c ("KVM: mmu: extract spte.h and spte.c")
Link: https://lore.kernel.org/r/20230808224059.2492476-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When attempting to allocate a shadow root for a !visible guest root gfn,
e.g. that resides in MMIO space, load a dummy root that is backed by the
zero page instead of immediately synthesizing a triple fault shutdown
(using the zero page ensures any attempt to translate memory will generate
a !PRESENT fault and thus VM-Exit).
Unless the vCPU is racing with memslot activity, KVM will inject a page
fault due to not finding a visible slot in FNAME(walk_addr_generic), i.e.
the end result is mostly same, but critically KVM will inject a fault only
*after* KVM runs the vCPU with the bogus root.
Waiting to inject a fault until after running the vCPU fixes a bug where
KVM would bail from nested VM-Enter if L1 tried to run L2 with TDP enabled
and a !visible root. Even though a bad root will *probably* lead to
shutdown, (a) it's not guaranteed and (b) the CPU won't read the
underlying memory until after VM-Enter succeeds. E.g. if L1 runs L2 with
a VMX preemption timer value of '0', then architecturally the preemption
timer VM-Exit is guaranteed to occur before the CPU executes any
instruction, i.e. before the CPU needs to translate a GPA to a HPA (so
long as there are no injected events with higher priority than the
preemption timer).
If KVM manages to get to FNAME(fetch) with a dummy root, e.g. because
userspace created a memslot between installing the dummy root and handling
the page fault, simply unload the MMU to allocate a new root and retry the
instruction. Use KVM_REQ_MMU_FREE_OBSOLETE_ROOTS to drop the root, as
invoking kvm_mmu_free_roots() while holding mmu_lock would deadlock, and
conceptually the dummy root has indeeed become obsolete. The only
difference versus existing usage of KVM_REQ_MMU_FREE_OBSOLETE_ROOTS is
that the root has become obsolete due to memslot *creation*, not memslot
deletion or movement.
Reported-by: Reima Ishii <ishiir@g.ecc.u-tokyo.ac.jp>
Cc: Yu Zhang <yu.c.zhang@linux.intel.com>
Link: https://lore.kernel.org/r/20230729005200.1057358-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Explicitly inject a page fault if guest attempts to use a !visible gfn
as a page table. kvm_vcpu_gfn_to_hva_prot() will naturally handle the
case where there is no memslot, but doesn't catch the scenario where the
gfn points at a KVM-internal memslot.
Letting the guest backdoor its way into accessing KVM-internal memslots
isn't dangerous on its own, e.g. at worst the guest can crash itself, but
disallowing the behavior will simplify fixing how KVM handles !visible
guest root gfns (immediately synthesizing a triple fault when loading the
root is architecturally wrong).
Link: https://lore.kernel.org/r/20230729005200.1057358-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Explicitly check that tdp_iter_start() is handed a valid shadow page
to harden KVM against bugs, e.g. if KVM calls into the TDP MMU with an
invalid or shadow MMU root (which would be a fatal KVM bug), the shadow
page pointer will be NULL.
Opportunistically stop the TDP MMU iteration instead of continuing on
with garbage if the incoming root is bogus. Attempting to walk a garbage
root is more likely to caused major problems than doing nothing.
Cc: Yu Zhang <yu.c.zhang@linux.intel.com>
Link: https://lore.kernel.org/r/20230729005200.1057358-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Harden kvm_mmu_new_pgd() against NULL pointer dereference bugs by sanity
checking that the target root has an associated shadow page prior to
dereferencing said shadow page. The code in question is guaranteed to
only see roots with shadow pages as fast_pgd_switch() explicitly frees the
current root if it doesn't have a shadow page, i.e. is a PAE root, and
that in turn prevents valid roots from being cached, but that's all very
subtle.
Link: https://lore.kernel.org/r/20230729005200.1057358-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add a dedicated helper for converting a root hpa to a shadow page in
anticipation of using a "dummy" root to handle the scenario where KVM
needs to load a valid shadow root (from hardware's perspective), but
the guest doesn't have a visible root to shadow. Similar to PAE roots,
the dummy root won't have an associated kvm_mmu_page and will need special
handling when finding a shadow page given a root.
Opportunistically retrieve the root shadow page in kvm_mmu_sync_roots()
*after* verifying the root is unsync (the dummy root can never be unsync).
Link: https://lore.kernel.org/r/20230729005200.1057358-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Get/put references to KVM when a page-track notifier is (un)registered
instead of relying on the caller to do so. Forcing the caller to do the
bookkeeping is unnecessary and adds one more thing for users to get
wrong, e.g. see commit 9ed1fdee9ee3 ("drm/i915/gvt: Get reference to KVM
iff attachment to VM is successful").
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Link: https://lore.kernel.org/r/20230729013535.1070024-29-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>