Merge tag 'kvm-x86-fixes-6.2-1' of https://github.com/kvm-x86/linux into HEAD

Misc KVM x86 fixes and cleanups for 6.2:

- One-off fixes for various emulation flows (SGX, VMXON, NRIPS=0).

- Reinstate IBPB on emulated VM-Exit that was incorrectly dropped a few years
  back when eliminating unnecessary barriers when switching between vmcs01
  and vmcs02.

- Clean up the MSR filter docs.

- Clean up vmread_error_trampoline() to make it more obvious that params must
  be passed on the stack, even for x86-64.

- Let userspace set all supported bits in MSR_IA32_FEAT_CTL irrespective of
  the current guest CPUID.

- Fudge around a race with TSC refinement that results in KVM incorrectly
  thinking a guest needs TSC scaling when running on a CPU with a constant
  TSC, but no hardware-enumerated TSC frequency.
commit b376144595
@@ -4079,80 +4079,71 @@ flags values for ``struct kvm_msr_filter_range``:
``KVM_MSR_FILTER_READ``

Filter read accesses to MSRs using the given bitmap. A 0 in the bitmap
indicates that a read should immediately fail, while a 1 indicates that
a read for a particular MSR should be handled regardless of the default
indicates that read accesses should be denied, while a 1 indicates that
a read for a particular MSR should be allowed regardless of the default
filter action.

``KVM_MSR_FILTER_WRITE``

Filter write accesses to MSRs using the given bitmap. A 0 in the bitmap
indicates that a write should immediately fail, while a 1 indicates that
a write for a particular MSR should be handled regardless of the default
indicates that write accesses should be denied, while a 1 indicates that
a write for a particular MSR should be allowed regardless of the default
filter action.

``KVM_MSR_FILTER_READ | KVM_MSR_FILTER_WRITE``

Filter both read and write accesses to MSRs using the given bitmap. A 0
in the bitmap indicates that both reads and writes should immediately fail,
while a 1 indicates that reads and writes for a particular MSR are not
filtered by this range.

flags values for ``struct kvm_msr_filter``:

``KVM_MSR_FILTER_DEFAULT_ALLOW``

If no filter range matches an MSR index that is getting accessed, KVM will
fall back to allowing access to the MSR.
allow accesses to all MSRs by default.

``KVM_MSR_FILTER_DEFAULT_DENY``

If no filter range matches an MSR index that is getting accessed, KVM will
fall back to rejecting access to the MSR. In this mode, all MSRs that should
be processed by KVM need to explicitly be marked as allowed in the bitmaps.
deny accesses to all MSRs by default.

This ioctl allows user space to define up to 16 bitmaps of MSR ranges to
specify whether a certain MSR access should be explicitly filtered for or not.
This ioctl allows userspace to define up to 16 bitmaps of MSR ranges to deny
guest MSR accesses that would normally be allowed by KVM. If an MSR is not
covered by a specific range, the "default" filtering behavior applies. Each
bitmap range covers MSRs from [base .. base+nmsrs).

If this ioctl has never been invoked, MSR accesses are not guarded and the
default KVM in-kernel emulation behavior is fully preserved.
If an MSR access is denied by userspace, the resulting KVM behavior depends on
whether or not KVM_CAP_X86_USER_SPACE_MSR's KVM_MSR_EXIT_REASON_FILTER is
enabled. If KVM_MSR_EXIT_REASON_FILTER is enabled, KVM will exit to userspace
on denied accesses, i.e. userspace effectively intercepts the MSR access. If
KVM_MSR_EXIT_REASON_FILTER is not enabled, KVM will inject a #GP into the guest
on denied accesses.

If an MSR access is allowed by userspace, KVM will emulate and/or virtualize
the access in accordance with the vCPU model. Note, KVM may still ultimately
inject a #GP if an access is allowed by userspace, e.g. if KVM doesn't support
the MSR, or to follow architectural behavior for the MSR.

By default, KVM operates in KVM_MSR_FILTER_DEFAULT_ALLOW mode with no MSR range
filters.

Calling this ioctl with an empty set of ranges (all nmsrs == 0) disables MSR
filtering. In that mode, ``KVM_MSR_FILTER_DEFAULT_DENY`` is invalid and causes
an error.

As soon as the filtering is in place, every MSR access is processed through
the filtering except for accesses to the x2APIC MSRs (from 0x800 to 0x8ff);
x2APIC MSRs are always allowed, independent of the ``default_allow`` setting,
and their behavior depends on the ``X2APIC_ENABLE`` bit of the APIC base
register.

.. warning::
MSR accesses coming from nested vmentry/vmexit are not filtered.
MSR accesses as part of nested VM-Enter/VM-Exit are not filtered.
This includes both writes to individual VMCS fields and reads/writes
through the MSR lists pointed to by the VMCS.

If a bit is within one of the defined ranges, read and write accesses are
guarded by the bitmap's value for the MSR index if the kind of access
is included in the ``struct kvm_msr_filter_range`` flags. If no range
cover this particular access, the behavior is determined by the flags
field in the kvm_msr_filter struct: ``KVM_MSR_FILTER_DEFAULT_ALLOW``
and ``KVM_MSR_FILTER_DEFAULT_DENY``.

Each bitmap range specifies a range of MSRs to potentially allow access on.
The range goes from MSR index [base .. base+nmsrs]. The flags field
indicates whether reads, writes or both reads and writes are filtered
by setting a 1 bit in the bitmap for the corresponding MSR index.

If an MSR access is not permitted through the filtering, it generates a
#GP inside the guest. When combined with KVM_CAP_X86_USER_SPACE_MSR, that
allows user space to deflect and potentially handle various MSR accesses
into user space.
x2APIC MSR accesses cannot be filtered (KVM silently ignores filters that
cover any x2APIC MSRs).

Note, invoking this ioctl while a vCPU is running is inherently racy. However,
KVM does guarantee that vCPUs will see either the previous filter or the new
filter, e.g. MSRs with identical settings in both the old and new filter will
have deterministic behavior.

Similarly, if userspace wishes to intercept on denied accesses,
KVM_MSR_EXIT_REASON_FILTER must be enabled before activating any filters, and
left enabled until after all filters are deactivated. Failure to do so may
result in KVM injecting a #GP instead of exiting to userspace.

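Illustrative only and not part of this series: a minimal userspace sketch of
the filter ioctl described above, denying guest writes to a single MSR while
leaving everything else on the default-allow path. The helper name, the
``vm_fd`` descriptor, and the choice of MSR index are assumptions made for the
example::

  #include <stdint.h>
  #include <string.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  static int deny_msr_writes(int vm_fd, uint32_t msr_index)
  {
          static uint8_t bitmap[1];       /* 1 byte covers 8 MSRs, bit N => base + N */
          struct kvm_msr_filter filter;

          memset(&filter, 0, sizeof(filter));
          filter.flags = KVM_MSR_FILTER_DEFAULT_ALLOW;

          filter.ranges[0].flags  = KVM_MSR_FILTER_WRITE;
          filter.ranges[0].base   = msr_index & ~7u;   /* align base to the bitmap */
          filter.ranges[0].nmsrs  = 8;
          filter.ranges[0].bitmap = bitmap;

          /* Clear bit => write denied; set the other seven bits to keep them allowed. */
          bitmap[0] = (uint8_t)~(1u << (msr_index & 7));

          return ioctl(vm_fd, KVM_X86_SET_MSR_FILTER, &filter);
  }

Whether a denied write then exits to userspace or injects a #GP depends on
KVM_MSR_EXIT_REASON_FILTER, as explained above.
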
4.98 KVM_CREATE_SPAPR_TCE_64
----------------------------

@@ -6457,31 +6448,33 @@ if it decides to decode and emulate the instruction.

Used on x86 systems. When the VM capability KVM_CAP_X86_USER_SPACE_MSR is
enabled, MSR accesses to registers that would invoke a #GP by KVM kernel code
will instead trigger a KVM_EXIT_X86_RDMSR exit for reads and KVM_EXIT_X86_WRMSR
may instead trigger a KVM_EXIT_X86_RDMSR exit for reads and KVM_EXIT_X86_WRMSR
exit for writes.

The "reason" field specifies why the MSR trap occurred. User space will only
receive MSR exit traps when a particular reason was requested during through
The "reason" field specifies why the MSR interception occurred. Userspace will
only receive MSR exits when a particular reason was requested during through
ENABLE_CAP. Currently valid exit reasons are:

KVM_MSR_EXIT_REASON_UNKNOWN - access to MSR that is unknown to KVM
KVM_MSR_EXIT_REASON_INVAL - access to invalid MSRs or reserved bits
KVM_MSR_EXIT_REASON_FILTER - access blocked by KVM_X86_SET_MSR_FILTER

For KVM_EXIT_X86_RDMSR, the "index" field tells user space which MSR the guest
wants to read. To respond to this request with a successful read, user space
For KVM_EXIT_X86_RDMSR, the "index" field tells userspace which MSR the guest
wants to read. To respond to this request with a successful read, userspace
writes the respective data into the "data" field and must continue guest
execution to ensure the read data is transferred into guest register state.

If the RDMSR request was unsuccessful, user space indicates that with a "1" in
If the RDMSR request was unsuccessful, userspace indicates that with a "1" in
the "error" field. This will inject a #GP into the guest when the VCPU is
executed again.

For KVM_EXIT_X86_WRMSR, the "index" field tells user space which MSR the guest
wants to write. Once finished processing the event, user space must continue
vCPU execution. If the MSR write was unsuccessful, user space also sets the
For KVM_EXIT_X86_WRMSR, the "index" field tells userspace which MSR the guest
wants to write. Once finished processing the event, userspace must continue
vCPU execution. If the MSR write was unsuccessful, userspace also sets the
"error" field to "1".

See KVM_X86_SET_MSR_FILTER for details on the interaction with MSR filtering.

::

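Illustrative only and not part of this series: a rough run-loop fragment for
handling these exits. The MSR index and the value returned to the guest are
made up for the example, and ``run`` is assumed to be the vCPU's mmap'd
``struct kvm_run``::

  #include <linux/kvm.h>

  static void handle_msr_exit(struct kvm_run *run)
  {
          switch (run->exit_reason) {
          case KVM_EXIT_X86_RDMSR:
                  if (run->msr.index == 0x123) {   /* hypothetical MSR */
                          run->msr.data = 0;       /* value the guest will read */
                          run->msr.error = 0;      /* success */
                  } else {
                          run->msr.error = 1;      /* KVM injects #GP on next KVM_RUN */
                  }
                  break;
          case KVM_EXIT_X86_WRMSR:
                  /* run->msr.index and run->msr.data describe the guest's WRMSR. */
                  run->msr.error = 0;              /* accept and discard the write */
                  break;
          }
  }

Execution must then be continued with KVM_RUN so the result is transferred
back into guest register state.
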
@@ -7247,19 +7240,27 @@ the module parameter for the target VM.
:Parameters: args[0] contains the mask of KVM_MSR_EXIT_REASON_* events to report
:Returns: 0 on success; -1 on error

This capability enables trapping of #GP invoking RDMSR and WRMSR instructions
into user space.
This capability allows userspace to intercept RDMSR and WRMSR instructions if
access to an MSR is denied. By default, KVM injects #GP on denied accesses.

When a guest requests to read or write an MSR, KVM may not implement all MSRs
that are relevant to a respective system. It also does not differentiate by
CPU type.

To allow more fine grained control over MSR handling, user space may enable
To allow more fine grained control over MSR handling, userspace may enable
this capability. With it enabled, MSR accesses that match the mask specified in
args[0] and trigger a #GP event inside the guest by KVM will instead trigger
KVM_EXIT_X86_RDMSR and KVM_EXIT_X86_WRMSR exit notifications which user space
can then handle to implement model specific MSR handling and/or user notifications
to inform a user that an MSR was not handled.
args[0] and would trigger a #GP inside the guest will instead trigger
KVM_EXIT_X86_RDMSR and KVM_EXIT_X86_WRMSR exit notifications. Userspace
can then implement model specific MSR handling and/or user notifications
to inform a user that an MSR was not emulated/virtualized by KVM.

The valid mask flags are:

KVM_MSR_EXIT_REASON_UNKNOWN - intercept accesses to unknown (to KVM) MSRs
KVM_MSR_EXIT_REASON_INVAL - intercept accesses that are architecturally
invalid according to the vCPU model and/or mode
KVM_MSR_EXIT_REASON_FILTER - intercept accesses that are denied by userspace
via KVM_X86_SET_MSR_FILTER

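Illustrative only and not part of this series: a minimal sketch of enabling the
capability so that only filter-denied accesses exit to userspace. ``vm_fd`` is
an assumed open KVM VM file descriptor::

  #include <string.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  static int enable_msr_filter_exits(int vm_fd)
  {
          struct kvm_enable_cap cap;

          memset(&cap, 0, sizeof(cap));
          cap.cap = KVM_CAP_X86_USER_SPACE_MSR;
          cap.args[0] = KVM_MSR_EXIT_REASON_FILTER;   /* mask of reasons to intercept */

          return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
  }

Per the documentation above, this should be enabled before any MSR filters are
installed so that denied accesses exit to userspace instead of injecting #GP.
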
7.22 KVM_CAP_X86_BUS_LOCK_EXIT
-------------------------------

@@ -7919,7 +7920,7 @@ KVM_EXIT_X86_WRMSR exit notifications.
This capability indicates that KVM supports that accesses to user defined MSRs
may be rejected. With this capability exposed, KVM exports new VM ioctl
KVM_X86_SET_MSR_FILTER which user space can call to specify bitmaps of MSR
ranges that KVM should reject access to.
ranges that KVM should deny access to.

In combination with KVM_CAP_X86_USER_SPACE_MSR, this allows user space to
trap and emulate MSRs that are outside of the scope of KVM as well as

@@ -465,9 +465,9 @@ static void sev_clflush_pages(struct page *pages[], unsigned long npages)
		return;

	for (i = 0; i < npages; i++) {
		page_virtual = kmap_atomic(pages[i]);
		page_virtual = kmap_local_page(pages[i]);
		clflush_cache_range(page_virtual, PAGE_SIZE);
		kunmap_atomic(page_virtual);
		kunmap_local(page_virtual);
		cond_resched();
	}
}

@@ -3895,8 +3895,14 @@ static int svm_vcpu_pre_run(struct kvm_vcpu *vcpu)

static fastpath_t svm_exit_handlers_fastpath(struct kvm_vcpu *vcpu)
{
	if (to_svm(vcpu)->vmcb->control.exit_code == SVM_EXIT_MSR &&
	    to_svm(vcpu)->vmcb->control.exit_info_1)
	struct vmcb_control_area *control = &to_svm(vcpu)->vmcb->control;

	/*
	 * Note, the next RIP must be provided as SRCU isn't held, i.e. KVM
	 * can't read guest memory (dereference memslots) to decode the WRMSR.
	 */
	if (control->exit_code == SVM_EXIT_MSR && control->exit_info_1 &&
	    nrips && control->next_rip)
		return handle_fastpath_set_msr_irqoff(vcpu);

	return EXIT_FASTPATH_NONE;

@@ -2588,12 +2588,9 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
	nested_ept_init_mmu_context(vcpu);

	/*
	 * This sets GUEST_CR0 to vmcs12->guest_cr0, possibly modifying those
	 * bits which we consider mandatory enabled.
	 * The CR0_READ_SHADOW is what L2 should have expected to read given
	 * the specifications by L1; It's not enough to take
	 * vmcs12->cr0_read_shadow because on our cr0_guest_host_mask we
	 * have more bits than L1 expected.
	 * Override the CR0/CR4 read shadows after setting the effective guest
	 * CR0/CR4. The common helpers also set the shadows, but they don't
	 * account for vmcs12's cr0/4_guest_host_mask.
	 */
	vmx_set_cr0(vcpu, vmcs12->guest_cr0);
	vmcs_writel(CR0_READ_SHADOW, nested_read_cr0(vmcs12));

@@ -4798,6 +4795,17 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,

	vmx_switch_vmcs(vcpu, &vmx->vmcs01);

	/*
	 * If IBRS is advertised to the vCPU, KVM must flush the indirect
	 * branch predictors when transitioning from L2 to L1, as L1 expects
	 * hardware (KVM in this case) to provide separate predictor modes.
	 * Bare metal isolates VMX root (host) from VMX non-root (guest), but
	 * doesn't isolate different VMCSs, i.e. in this case, doesn't provide
	 * separate modes for L2 vs L1.
	 */
	if (guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
		indirect_branch_prediction_barrier();

	/* Update any VMCS fields that might have changed while L2 ran */
	vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, vmx->msr_autoload.host.nr);
	vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, vmx->msr_autoload.guest.nr);

@@ -5131,24 +5139,35 @@ static int handle_vmxon(struct kvm_vcpu *vcpu)
		| FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX;

	/*
	 * Note, KVM cannot rely on hardware to perform the CR0/CR4 #UD checks
	 * that have higher priority than VM-Exit (see Intel SDM's pseudocode
	 * for VMXON), as KVM must load valid CR0/CR4 values into hardware while
	 * running the guest, i.e. KVM needs to check the _guest_ values.
	 * Manually check CR4.VMXE checks, KVM must force CR4.VMXE=1 to enter
	 * the guest and so cannot rely on hardware to perform the check,
	 * which has higher priority than VM-Exit (see Intel SDM's pseudocode
	 * for VMXON).
	 *
	 * Rely on hardware for the other two pre-VM-Exit checks, !VM86 and
	 * !COMPATIBILITY modes. KVM may run the guest in VM86 to emulate Real
	 * Mode, but KVM will never take the guest out of those modes.
	 * Rely on hardware for the other pre-VM-Exit checks, CR0.PE=1, !VM86
	 * and !COMPATIBILITY modes. For an unrestricted guest, KVM doesn't
	 * force any of the relevant guest state. For a restricted guest, KVM
	 * does force CR0.PE=1, but only to also force VM86 in order to emulate
	 * Real Mode, and so there's no need to check CR0.PE manually.
	 */
	if (!nested_host_cr0_valid(vcpu, kvm_read_cr0(vcpu)) ||
	    !nested_host_cr4_valid(vcpu, kvm_read_cr4(vcpu))) {
	if (!kvm_read_cr4_bits(vcpu, X86_CR4_VMXE)) {
		kvm_queue_exception(vcpu, UD_VECTOR);
		return 1;
	}

	/*
	 * CPL=0 and all other checks that are lower priority than VM-Exit must
	 * be checked manually.
	 * The CPL is checked for "not in VMX operation" and for "in VMX root",
	 * and has higher priority than the VM-Fail due to being post-VMXON,
	 * i.e. VMXON #GPs outside of VMX non-root if CPL!=0. In VMX non-root,
	 * VMXON causes VM-Exit and KVM unconditionally forwards VMXON VM-Exits
	 * from L2 to L1, i.e. there's no need to check for the vCPU being in
	 * VMX non-root.
	 *
	 * Forwarding the VM-Exit unconditionally, i.e. without performing the
	 * #UD checks (see above), is functionally ok because KVM doesn't allow
	 * L1 to run L2 without CR4.VMXE=0, and because KVM never modifies L2's
	 * CR0 or CR4, i.e. it's L2's responsibility to emulate #UDs that are
	 * missed by hardware due to shadowing CR0 and/or CR4.
	 */
	if (vmx_get_cpl(vcpu)) {
		kvm_inject_gp(vcpu, 0);

@@ -5158,6 +5177,17 @@ static int handle_vmxon(struct kvm_vcpu *vcpu)
	if (vmx->nested.vmxon)
		return nested_vmx_fail(vcpu, VMXERR_VMXON_IN_VMX_ROOT_OPERATION);

	/*
	 * Invalid CR0/CR4 generates #GP. These checks are performed if and
	 * only if the vCPU isn't already in VMX operation, i.e. effectively
	 * have lower priority than the VM-Fail above.
	 */
	if (!nested_host_cr0_valid(vcpu, kvm_read_cr0(vcpu)) ||
	    !nested_host_cr4_valid(vcpu, kvm_read_cr4(vcpu))) {
		kvm_inject_gp(vcpu, 0);
		return 1;
	}

	if ((vmx->msr_ia32_feature_control & VMXON_NEEDED_FEATURES)
	    != VMXON_NEEDED_FEATURES) {
		kvm_inject_gp(vcpu, 0);

@@ -79,9 +79,10 @@ static inline bool nested_ept_ad_enabled(struct kvm_vcpu *vcpu)
}

/*
 * Return the cr0 value that a nested guest would read. This is a combination
 * of the real cr0 used to run the guest (guest_cr0), and the bits shadowed by
 * its hypervisor (cr0_read_shadow).
 * Return the cr0/4 value that a nested guest would read. This is a combination
 * of L1's "real" cr0 used to run the guest (guest_cr0), and the bits shadowed
 * by the L1 hypervisor (cr0_read_shadow). KVM must emulate CPU behavior as
 * the value+mask loaded into vmcs02 may not match the vmcs12 fields.
 */
static inline unsigned long nested_read_cr0(struct vmcs12 *fields)
{

@@ -182,8 +182,10 @@ static int __handle_encls_ecreate(struct kvm_vcpu *vcpu,
	/* Enforce CPUID restriction on max enclave size. */
	max_size_log2 = (attributes & SGX_ATTR_MODE64BIT) ? sgx_12_0->edx >> 8 :
							    sgx_12_0->edx;
	if (size >= BIT_ULL(max_size_log2))
	if (size >= BIT_ULL(max_size_log2)) {
		kvm_inject_gp(vcpu, 0);
		return 1;
	}

	/*
	 * sgx_virt_ecreate() returns:

@@ -269,6 +269,7 @@ SYM_FUNC_END(__vmx_vcpu_run)

.section .text, "ax"

#ifndef CONFIG_CC_HAS_ASM_GOTO_OUTPUT
/**
 * vmread_error_trampoline - Trampoline from inline asm to vmread_error()
 * @field: VMCS field encoding that failed

@@ -317,6 +318,7 @@ SYM_FUNC_START(vmread_error_trampoline)

	RET
SYM_FUNC_END(vmread_error_trampoline)
#endif

SYM_FUNC_START(vmx_do_interrupt_nmi_irqoff)
	/*

@@ -858,7 +858,7 @@ unsigned int __vmx_vcpu_run_flags(struct vcpu_vmx *vmx)
	 * to change it directly without causing a vmexit. In that case read
	 * it after vmexit and store it in vmx->spec_ctrl.
	 */
	if (unlikely(!msr_write_intercepted(vmx, MSR_IA32_SPEC_CTRL)))
	if (!msr_write_intercepted(vmx, MSR_IA32_SPEC_CTRL))
		flags |= VMX_RUN_SAVE_SPEC_CTRL;

	return flags;

@@ -1348,8 +1348,10 @@ void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu,

		/*
		 * No indirect branch prediction barrier needed when switching
		 * the active VMCS within a guest, e.g. on nested VM-Enter.
		 * The L1 VMM can protect itself with retpolines, IBPB or IBRS.
		 * the active VMCS within a vCPU, unless IBRS is advertised to
		 * the vCPU. To minimize the number of IBPBs executed, KVM
		 * performs IBPB on nested VM-Exit (a single nested transition
		 * may switch the active VMCS multiple times).
		 */
		if (!buddy || WARN_ON_ONCE(buddy->vmcs != prev))
			indirect_branch_prediction_barrier();

@@ -1834,12 +1836,42 @@ bool nested_vmx_allowed(struct kvm_vcpu *vcpu)

	return nested && guest_cpuid_has(vcpu, X86_FEATURE_VMX);
}

static inline bool vmx_feature_control_msr_valid(struct kvm_vcpu *vcpu,
						 uint64_t val)
{
	uint64_t valid_bits = to_vmx(vcpu)->msr_ia32_feature_control_valid_bits;
/*
 * Userspace is allowed to set any supported IA32_FEATURE_CONTROL regardless of
 * guest CPUID. Note, KVM allows userspace to set "VMX in SMX" to maintain
 * backwards compatibility even though KVM doesn't support emulating SMX. And
 * because userspace set "VMX in SMX", the guest must also be allowed to set it,
 * e.g. if the MSR is left unlocked and the guest does a RMW operation.
 */
#define KVM_SUPPORTED_FEATURE_CONTROL	(FEAT_CTL_LOCKED | \
					 FEAT_CTL_VMX_ENABLED_INSIDE_SMX | \
					 FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX | \
					 FEAT_CTL_SGX_LC_ENABLED | \
					 FEAT_CTL_SGX_ENABLED | \
					 FEAT_CTL_LMCE_ENABLED)

	return !(val & ~valid_bits);
static inline bool is_vmx_feature_control_msr_valid(struct vcpu_vmx *vmx,
						    struct msr_data *msr)
{
	uint64_t valid_bits;

	/*
	 * Ensure KVM_SUPPORTED_FEATURE_CONTROL is updated when new bits are
	 * exposed to the guest.
	 */
	WARN_ON_ONCE(vmx->msr_ia32_feature_control_valid_bits &
		     ~KVM_SUPPORTED_FEATURE_CONTROL);

	if (!msr->host_initiated &&
	    (vmx->msr_ia32_feature_control & FEAT_CTL_LOCKED))
		return false;

	if (msr->host_initiated)
		valid_bits = KVM_SUPPORTED_FEATURE_CONTROL;
	else
		valid_bits = vmx->msr_ia32_feature_control_valid_bits;

	return !(msr->data & ~valid_bits);
}

static int vmx_get_msr_feature(struct kvm_msr_entry *msr)

@@ -2238,10 +2270,9 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
		vcpu->arch.mcg_ext_ctl = data;
		break;
	case MSR_IA32_FEAT_CTL:
		if (!vmx_feature_control_msr_valid(vcpu, data) ||
		    (to_vmx(vcpu)->msr_ia32_feature_control &
		     FEAT_CTL_LOCKED && !msr_info->host_initiated))
		if (!is_vmx_feature_control_msr_valid(vmx, msr_info))
			return 1;

		vmx->msr_ia32_feature_control = data;
		if (msr_info->host_initiated && data == 0)
			vmx_leave_nested(vcpu);

@@ -11,14 +11,28 @@
#include "../x86.h"

void vmread_error(unsigned long field, bool fault);
__attribute__((regparm(0))) void vmread_error_trampoline(unsigned long field,
							  bool fault);
void vmwrite_error(unsigned long field, unsigned long value);
void vmclear_error(struct vmcs *vmcs, u64 phys_addr);
void vmptrld_error(struct vmcs *vmcs, u64 phys_addr);
void invvpid_error(unsigned long ext, u16 vpid, gva_t gva);
void invept_error(unsigned long ext, u64 eptp, gpa_t gpa);

#ifndef CONFIG_CC_HAS_ASM_GOTO_OUTPUT
/*
 * The VMREAD error trampoline _always_ uses the stack to pass parameters, even
 * for 64-bit targets. Preserving all registers allows the VMREAD inline asm
 * blob to avoid clobbering GPRs, which in turn allows the compiler to better
 * optimize sequences of VMREADs.
 *
 * Declare the trampoline as an opaque label as it's not safe to call from C
 * code; there is no way to tell the compiler to pass params on the stack for
 * 64-bit targets.
 *
 * void vmread_error_trampoline(unsigned long field, bool fault);
 */
extern unsigned long vmread_error_trampoline;
#endif

static __always_inline void vmcs_check16(unsigned long field)
{
	BUILD_BUG_ON_MSG(__builtin_constant_p(field) && ((field) & 0x6001) == 0x2000,

@@ -2974,6 +2974,22 @@ static void kvm_update_masterclock(struct kvm *kvm)
	kvm_end_pvclock_update(kvm);
}

/*
 * Use the kernel's tsc_khz directly if the TSC is constant, otherwise use KVM's
 * per-CPU value (which may be zero if a CPU is going offline). Note, tsc_khz
 * can change during boot even if the TSC is constant, as it's possible for KVM
 * to be loaded before TSC calibration completes. Ideally, KVM would get a
 * notification when calibration completes, but practically speaking calibration
 * will complete before userspace is alive enough to create VMs.
 */
static unsigned long get_cpu_tsc_khz(void)
{
	if (static_cpu_has(X86_FEATURE_CONSTANT_TSC))
		return tsc_khz;
	else
		return __this_cpu_read(cpu_tsc_khz);
}

/* Called within read_seqcount_begin/retry for kvm->pvclock_sc. */
static void __get_kvmclock(struct kvm *kvm, struct kvm_clock_data *data)
{

@@ -2984,7 +3000,8 @@ static void __get_kvmclock(struct kvm *kvm, struct kvm_clock_data *data)
	get_cpu();

	data->flags = 0;
	if (ka->use_master_clock && __this_cpu_read(cpu_tsc_khz)) {
	if (ka->use_master_clock &&
	    (static_cpu_has(X86_FEATURE_CONSTANT_TSC) || __this_cpu_read(cpu_tsc_khz))) {
#ifdef CONFIG_X86_64
		struct timespec64 ts;

@@ -2998,7 +3015,7 @@ static void __get_kvmclock(struct kvm *kvm, struct kvm_clock_data *data)
		data->flags |= KVM_CLOCK_TSC_STABLE;
		hv_clock.tsc_timestamp = ka->master_cycle_now;
		hv_clock.system_time = ka->master_kernel_ns + ka->kvmclock_offset;
		kvm_get_time_scale(NSEC_PER_SEC, __this_cpu_read(cpu_tsc_khz) * 1000LL,
		kvm_get_time_scale(NSEC_PER_SEC, get_cpu_tsc_khz() * 1000LL,
				   &hv_clock.tsc_shift,
				   &hv_clock.tsc_to_system_mul);
		data->clock = __pvclock_read_cycles(&hv_clock, data->host_tsc);

@@ -3108,7 +3125,7 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)

	/* Keep irq disabled to prevent changes to the clock */
	local_irq_save(flags);
	tgt_tsc_khz = __this_cpu_read(cpu_tsc_khz);
	tgt_tsc_khz = get_cpu_tsc_khz();
	if (unlikely(tgt_tsc_khz == 0)) {
		local_irq_restore(flags);
		kvm_make_request(KVM_REQ_CLOCK_UPDATE, v);

@@ -8772,7 +8789,9 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
					  write_fault_to_spt,
					  emulation_type))
			return 1;
		if (ctxt->have_exception) {

		if (ctxt->have_exception &&
		    !(emulation_type & EMULTYPE_SKIP)) {
			/*
			 * #UD should result in just EMULATION_FAILED, and trap-like
			 * exception should not be encountered during decode.

@@ -9036,9 +9055,11 @@ static void tsc_khz_changed(void *data)
	struct cpufreq_freqs *freq = data;
	unsigned long khz = 0;

	WARN_ON_ONCE(boot_cpu_has(X86_FEATURE_CONSTANT_TSC));

	if (data)
		khz = freq->new;
	else if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC))
	else
		khz = cpufreq_quick_get(raw_smp_processor_id());
	if (!khz)
		khz = tsc_khz;

@@ -9059,8 +9080,10 @@ static void kvm_hyperv_tsc_notifier(void)
	hyperv_stop_tsc_emulation();

	/* TSC frequency always matches when on Hyper-V */
	for_each_present_cpu(cpu)
		per_cpu(cpu_tsc_khz, cpu) = tsc_khz;
	if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC)) {
		for_each_present_cpu(cpu)
			per_cpu(cpu_tsc_khz, cpu) = tsc_khz;
	}
	kvm_caps.max_guest_tsc_khz = tsc_khz;

	list_for_each_entry(kvm, &vm_list, vm_list) {

@@ -9197,10 +9220,10 @@ static void kvm_timer_init(void)
		}
		cpufreq_register_notifier(&kvmclock_cpufreq_notifier_block,
					  CPUFREQ_TRANSITION_NOTIFIER);
	}

	cpuhp_setup_state(CPUHP_AP_X86_KVM_CLK_ONLINE, "x86/kvm/clk:online",
			  kvmclock_cpu_online, kvmclock_cpu_down_prep);
		cpuhp_setup_state(CPUHP_AP_X86_KVM_CLK_ONLINE, "x86/kvm/clk:online",
				  kvmclock_cpu_online, kvmclock_cpu_down_prep);
	}
}

#ifdef CONFIG_X86_64

@@ -9360,10 +9383,11 @@ void kvm_arch_exit(void)
#endif
	kvm_lapic_exit();

	if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC))
	if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC)) {
		cpufreq_unregister_notifier(&kvmclock_cpufreq_notifier_block,
					    CPUFREQ_TRANSITION_NOTIFIER);
		cpuhp_remove_state_nocalls(CPUHP_AP_X86_KVM_CLK_ONLINE);
		cpuhp_remove_state_nocalls(CPUHP_AP_X86_KVM_CLK_ONLINE);
	}
#ifdef CONFIG_X86_64
	pvclock_gtod_unregister_notifier(&pvclock_gtod_notifier);
	irq_work_sync(&pvclock_irq_work);

@@ -1165,8 +1165,8 @@ static bool wait_pending_event(struct kvm_vcpu *vcpu, int nr_ports,
	bool ret = true;
	int idx, i;

	read_lock_irqsave(&gpc->lock, flags);
	idx = srcu_read_lock(&kvm->srcu);
	read_lock_irqsave(&gpc->lock, flags);
	if (!kvm_gpc_check(kvm, gpc, gpc->gpa, PAGE_SIZE))
		goto out_rcu;

@@ -1187,8 +1187,8 @@ static bool wait_pending_event(struct kvm_vcpu *vcpu, int nr_ports,
	}

 out_rcu:
	srcu_read_unlock(&kvm->srcu, idx);
	read_unlock_irqrestore(&gpc->lock, flags);
	srcu_read_unlock(&kvm->srcu, idx);

	return ret;
}

@@ -103,6 +103,7 @@ struct kvm_x86_cpu_feature {
#define X86_FEATURE_XMM2	KVM_X86_CPU_FEATURE(0x1, 0, EDX, 26)
#define X86_FEATURE_FSGSBASE	KVM_X86_CPU_FEATURE(0x7, 0, EBX, 0)
#define X86_FEATURE_TSC_ADJUST	KVM_X86_CPU_FEATURE(0x7, 0, EBX, 1)
#define X86_FEATURE_SGX		KVM_X86_CPU_FEATURE(0x7, 0, EBX, 2)
#define X86_FEATURE_HLE		KVM_X86_CPU_FEATURE(0x7, 0, EBX, 4)
#define X86_FEATURE_SMEP	KVM_X86_CPU_FEATURE(0x7, 0, EBX, 7)
#define X86_FEATURE_INVPCID	KVM_X86_CPU_FEATURE(0x7, 0, EBX, 10)

@@ -116,6 +117,7 @@ struct kvm_x86_cpu_feature {
#define X86_FEATURE_PKU		KVM_X86_CPU_FEATURE(0x7, 0, ECX, 3)
#define X86_FEATURE_LA57	KVM_X86_CPU_FEATURE(0x7, 0, ECX, 16)
#define X86_FEATURE_RDPID	KVM_X86_CPU_FEATURE(0x7, 0, ECX, 22)
#define X86_FEATURE_SGX_LC	KVM_X86_CPU_FEATURE(0x7, 0, ECX, 30)
#define X86_FEATURE_SHSTK	KVM_X86_CPU_FEATURE(0x7, 0, ECX, 7)
#define X86_FEATURE_IBT		KVM_X86_CPU_FEATURE(0x7, 0, EDX, 20)
#define X86_FEATURE_AMX_TILE	KVM_X86_CPU_FEATURE(0x7, 0, EDX, 24)

@@ -67,6 +67,52 @@ static void vmx_save_restore_msrs_test(struct kvm_vcpu *vcpu)
	vmx_fixed1_msr_test(vcpu, MSR_IA32_VMX_VMFUNC, -1ull);
}

static void __ia32_feature_control_msr_test(struct kvm_vcpu *vcpu,
					    uint64_t msr_bit,
					    struct kvm_x86_cpu_feature feature)
{
	uint64_t val;

	vcpu_clear_cpuid_feature(vcpu, feature);

	val = vcpu_get_msr(vcpu, MSR_IA32_FEAT_CTL);
	vcpu_set_msr(vcpu, MSR_IA32_FEAT_CTL, val | msr_bit | FEAT_CTL_LOCKED);
	vcpu_set_msr(vcpu, MSR_IA32_FEAT_CTL, (val & ~msr_bit) | FEAT_CTL_LOCKED);
	vcpu_set_msr(vcpu, MSR_IA32_FEAT_CTL, val | msr_bit | FEAT_CTL_LOCKED);
	vcpu_set_msr(vcpu, MSR_IA32_FEAT_CTL, (val & ~msr_bit) | FEAT_CTL_LOCKED);
	vcpu_set_msr(vcpu, MSR_IA32_FEAT_CTL, val);

	if (!kvm_cpu_has(feature))
		return;

	vcpu_set_cpuid_feature(vcpu, feature);
}

static void ia32_feature_control_msr_test(struct kvm_vcpu *vcpu)
{
	uint64_t supported_bits = FEAT_CTL_LOCKED |
				  FEAT_CTL_VMX_ENABLED_INSIDE_SMX |
				  FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX |
				  FEAT_CTL_SGX_LC_ENABLED |
				  FEAT_CTL_SGX_ENABLED |
				  FEAT_CTL_LMCE_ENABLED;
	int bit, r;

	__ia32_feature_control_msr_test(vcpu, FEAT_CTL_VMX_ENABLED_INSIDE_SMX, X86_FEATURE_SMX);
	__ia32_feature_control_msr_test(vcpu, FEAT_CTL_VMX_ENABLED_INSIDE_SMX, X86_FEATURE_VMX);
	__ia32_feature_control_msr_test(vcpu, FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX, X86_FEATURE_VMX);
	__ia32_feature_control_msr_test(vcpu, FEAT_CTL_SGX_LC_ENABLED, X86_FEATURE_SGX_LC);
	__ia32_feature_control_msr_test(vcpu, FEAT_CTL_SGX_LC_ENABLED, X86_FEATURE_SGX);
	__ia32_feature_control_msr_test(vcpu, FEAT_CTL_SGX_ENABLED, X86_FEATURE_SGX);
	__ia32_feature_control_msr_test(vcpu, FEAT_CTL_LMCE_ENABLED, X86_FEATURE_MCE);

	for_each_clear_bit(bit, &supported_bits, 64) {
		r = _vcpu_set_msr(vcpu, MSR_IA32_FEAT_CTL, BIT(bit));
		TEST_ASSERT(r == 0,
			    "Setting reserved bit %d in IA32_FEATURE_CONTROL should fail", bit);
	}
}

int main(void)
{
	struct kvm_vcpu *vcpu;

@@ -79,6 +125,7 @@ int main(void)
	vm = vm_create_with_one_vcpu(&vcpu, NULL);

	vmx_save_restore_msrs_test(vcpu);
	ia32_feature_control_msr_test(vcpu);

	kvm_vm_free(vm);
}