commit c8c655c34e

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:

 "s390:
   - More phys_to_virt conversions
   - Improvement of AP management for VSIE (nested virtualization)

  ARM64:
   - Numerous fixes for the pathological lock inversion issue that has
     plagued KVM/arm64 since... forever.
   - New framework allowing SMCCC-compliant hypercalls to be forwarded
     to userspace, hopefully paving the way for some more features
     being moved to VMMs rather than being implemented in the kernel.
   - Large rework of the timer code to allow a VM-wide offset to be
     applied to both virtual and physical counters as well as a
     per-timer, per-vcpu offset that complements the global one. This
     last part allows the NV timer code to be implemented on top.
   - A small set of fixes to make sure that we don't change anything
     affecting the EL1&0 translation regime just after having taken an
     exception to EL2 until we have executed a DSB. This ensures that
     speculative walks started in EL1&0 have completed.
   - The usual selftest fixes and improvements.

  x86:
   - Optimize CR0.WP toggling by avoiding an MMU reload when TDP is
     enabled, and by giving the guest control of CR0.WP when EPT is
     enabled on VMX (VMX-only because SVM doesn't support per-bit
     controls)
   - Add CR0/CR4 helpers to query single bits, and clean up related
     code where KVM was interpreting kvm_read_cr4_bits()'s "unsigned
     long" return as a bool
   - Move AMD_PSFD to cpufeatures.h and purge KVM's definition
   - Avoid unnecessary writes+flushes when the guest is only adding
     new PTEs
   - Overhaul .sync_page() and .invlpg() to utilize .sync_page()'s
     optimizations when emulating invalidations
   - Clean up the range-based flushing APIs
   - Revamp the TDP MMU's reaping of Accessed/Dirty bits to clear a
     single A/D bit using a LOCK AND instead of XCHG, and skip all of
     the "handle changed SPTE" overhead associated with writing the
     entire entry
   - Track the number of "tail" entries in a pte_list_desc to avoid
     having to walk (potentially) all descriptors during insertion and
     deletion, which gets quite expensive if the guest is spamming
     fork()
   - Disallow virtualizing legacy LBRs if architectural LBRs are
     available; the two are mutually exclusive in hardware
   - Disallow writes to immutable feature MSRs (notably
     PERF_CAPABILITIES) after KVM_RUN, similar to CPUID features
   - Overhaul the vmx_pmu_caps selftest to better validate
     PERF_CAPABILITIES
   - Apply PMU filters to emulated events and add test coverage to the
     pmu_event_filter selftest

   AMD SVM:
     - Add support for virtual NMIs
     - Fixes for edge cases related to virtual interrupts

   Intel AMX:
     - Don't advertise XTILE_CFG in KVM_GET_SUPPORTED_CPUID if
       XTILE_DATA is not being reported due to userspace not opting in
       via prctl()
     - Fix a bug in emulation of ENCLS in compatibility mode
     - Allow emulation of NOP and PAUSE for L2
     - AMX selftests improvements
     - Misc cleanups

  MIPS:
   - Constify MIPS's internal callbacks (a leftover from the hardware
     enabling rework that landed in 6.3)

  Generic:
   - Drop unnecessary casts from "void *" throughout kvm_main.c
   - Tweak the layout of "struct kvm_mmu_memory_cache" to shrink the
     struct size by 8 bytes on 64-bit kernels by utilizing a padding
     hole

  Documentation:
   - Fix goof introduced by the conversion to rST"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (211 commits)
  KVM: s390: pci: fix virtual-physical confusion on module unload/load
  KVM: s390: vsie: clarifications on setting the APCB
  KVM: s390: interrupt: fix virtual-physical confusion for next alert GISA
  KVM: arm64: Have kvm_psci_vcpu_on() use WRITE_ONCE() to update mp_state
  KVM: arm64: Acquire mp_state_lock in kvm_arch_vcpu_ioctl_vcpu_init()
  KVM: selftests: Test the PMU event "Instructions retired"
  KVM: selftests: Copy full counter values from guest in PMU event filter test
  KVM: selftests: Use error codes to signal errors in PMU event filter test
  KVM: selftests: Print detailed info in PMU event filter asserts
  KVM: selftests: Add helpers for PMC asserts in PMU event filter test
  KVM: selftests: Add a common helper for the PMU event filter guest code
  KVM: selftests: Fix spelling mistake "perrmited" -> "permitted"
  KVM: arm64: vhe: Drop extra isb() on guest exit
  KVM: arm64: vhe: Synchronise with page table walker on MMU update
  KVM: arm64: pkvm: Document the side effects of kvm_flush_dcache_to_poc()
  KVM: arm64: nvhe: Synchronise with page table walker on TLBI
  KVM: arm64: Handle 32bit CNTPCTSS traps
  KVM: arm64: nvhe: Synchronise with page table walker on vcpu run
  KVM: arm64: vgic: Don't acquire its_lock before config_lock
  KVM: selftests: Add test to verify KVM's supported XCR0
  ...
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -5645,7 +5645,8 @@ with the KVM_XEN_VCPU_GET_ATTR ioctl.
 	};
 
 Copies Memory Tagging Extension (MTE) tags to/from guest tag memory. The
-``guest_ipa`` and ``length`` fields must be ``PAGE_SIZE`` aligned. The ``addr``
+``guest_ipa`` and ``length`` fields must be ``PAGE_SIZE`` aligned.
+``length`` must not be bigger than 2^31 - PAGE_SIZE bytes. The ``addr``
 field must point to a buffer which the tags will be copied to or from.
 
 ``flags`` specifies the direction of copy, either ``KVM_ARM_TAGS_TO_GUEST`` or
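[Editor's aside, not part of the commit: a minimal sketch of how a VMM might drive the MTE tag-copy ioctl documented in the hunk above. The helper name and the `vm_fd` plumbing are invented; `KVM_ARM_MTE_COPY_TAGS` and `struct kvm_arm_copy_mte_tags` come from the arm64 uapi headers of an MTE-capable kernel.]

    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    /* Hypothetical helper: copy tags out of the guest. Per the text
     * above, ipa and len must be PAGE_SIZE aligned and len must stay
     * below 2^31 - PAGE_SIZE. Returns the ioctl result. */
    static long copy_tags_from_guest(int vm_fd, uint64_t ipa,
                                     void *buf, uint64_t len)
    {
            struct kvm_arm_copy_mte_tags copy = {
                    .guest_ipa = ipa,
                    .length    = len,
                    .addr      = buf,
                    .flags     = KVM_ARM_TAGS_FROM_GUEST,
            };

            return ioctl(vm_fd, KVM_ARM_MTE_COPY_TAGS, &copy);
    }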
@@ -6029,6 +6030,44 @@ delivery must be provided via the "reg_aen" struct.
 The "pad" and "reserved" fields may be used for future extensions and should be
 set to 0s by userspace.
 
+4.138 KVM_ARM_SET_COUNTER_OFFSET
+--------------------------------
+
+:Capability: KVM_CAP_COUNTER_OFFSET
+:Architectures: arm64
+:Type: vm ioctl
+:Parameters: struct kvm_arm_counter_offset (in)
+:Returns: 0 on success, < 0 on error
+
+This capability indicates that userspace is able to apply a single VM-wide
+offset to both the virtual and physical counters as viewed by the guest
+using the KVM_ARM_SET_COUNTER_OFFSET ioctl and the following data structure:
+
+::
+
+	struct kvm_arm_counter_offset {
+		__u64 counter_offset;
+		__u64 reserved;
+	};
+
+The offset describes a number of counter cycles that are subtracted from
+both virtual and physical counter views (similar to the effects of the
+CNTVOFF_EL2 and CNTPOFF_EL2 system registers, but only global). The offset
+always applies to all vcpus (already created or created after this ioctl)
+for this VM.
+
+It is userspace's responsibility to compute the offset based, for example,
+on previous values of the guest counters.
+
+Any value other than 0 for the "reserved" field may result in an error
+(-EINVAL) being returned. This ioctl can also return -EBUSY if any vcpu
+ioctl is issued concurrently.
+
+Note that using this ioctl results in KVM ignoring subsequent userspace
+writes to the CNTVCT_EL0 and CNTPCT_EL0 registers using the SET_ONE_REG
+interface. No error will be returned, but the resulting offset will not be
+applied.
+
 5. The kvm_run structure
 ========================
 
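[Editor's aside, not part of the commit: a sketch of invoking the new vm ioctl documented above. `vm_fd` and the helper name are assumptions; `KVM_ARM_SET_COUNTER_OFFSET` and the structure require kernel headers from a tree containing this series.]

    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    /* Illustrative helper: apply a VM-wide counter offset. Best done
     * while no vcpu ioctl is in flight, or the call may fail -EBUSY. */
    static int set_counter_offset(int vm_fd, uint64_t cycles)
    {
            struct kvm_arm_counter_offset off = {
                    .counter_offset = cycles,
                    .reserved       = 0,    /* non-zero yields -EINVAL */
            };

            return ioctl(vm_fd, KVM_ARM_SET_COUNTER_OFFSET, &off);
    }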
@@ -6218,15 +6257,40 @@ to the byte array.
 		__u64 nr;
 		__u64 args[6];
 		__u64 ret;
-		__u32 longmode;
-		__u32 pad;
+		__u64 flags;
 	} hypercall;
 
-Unused. This was once used for 'hypercall to userspace'. To implement
-such functionality, use KVM_EXIT_IO (x86) or KVM_EXIT_MMIO (all except s390).
+It is strongly recommended that userspace use ``KVM_EXIT_IO`` (x86) or
+``KVM_EXIT_MMIO`` (all except s390) to implement functionality that
+requires a guest to interact with host userspace.
+
+.. note:: KVM_EXIT_IO is significantly faster than KVM_EXIT_MMIO.
+
+For arm64:
+----------
+
+SMCCC exits can be enabled depending on the configuration of the SMCCC
+filter. See the Documentation/virt/kvm/devices/vm.rst
+``KVM_ARM_SMCCC_FILTER`` for more details.
+
+``nr`` contains the function ID of the guest's SMCCC call. Userspace is
+expected to use the ``KVM_GET_ONE_REG`` ioctl to retrieve the call
+parameters from the vCPU's GPRs.
+
+Definition of ``flags``:
+
+ - ``KVM_HYPERCALL_EXIT_SMC``: Indicates that the guest used the SMC
+   conduit to initiate the SMCCC call. If this bit is 0 then the guest
+   used the HVC conduit for the SMCCC call.
+
+ - ``KVM_HYPERCALL_EXIT_16BIT``: Indicates that the guest used a 16bit
+   instruction to initiate the SMCCC call. If this bit is 0 then the
+   guest used a 32bit instruction. An AArch64 guest always has this
+   bit set to 0.
+
+At the point of exit, PC points to the instruction immediately following
+the trapping instruction.
 
 ::
 
 		/* KVM_EXIT_TPR_ACCESS */
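[Editor's aside, not part of the commit: a VMM-side decoding sketch for the arm64 exit described above. It assumes kernel headers new enough to define the ``flags`` field and ``KVM_HYPERCALL_EXIT_SMC``; the argument registers are not in ``struct kvm_run`` and must be fetched separately with ``KVM_GET_ONE_REG``.]

    #include <stdbool.h>
    #include <stdint.h>
    #include <linux/kvm.h>

    /* Decode the SMCCC exit fields. `run` is the vcpu's mmap'ed
     * struct kvm_run after a KVM_RUN returning KVM_EXIT_HYPERCALL. */
    static void decode_smccc_exit(const struct kvm_run *run,
                                  uint64_t *func_id, bool *used_smc)
    {
            *func_id  = run->hypercall.nr;
            *used_smc = run->hypercall.flags & KVM_HYPERCALL_EXIT_SMC;
    }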
@@ -7266,6 +7330,7 @@ and injected exceptions.
 will clear DR6.RTM.
 
 7.18 KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2
 --------------------------------------
 
 :Architectures: x86, arm64, mips
 :Parameters: args[0] whether feature should be enabled or not
--- a/Documentation/virt/kvm/devices/vm.rst
+++ b/Documentation/virt/kvm/devices/vm.rst
@@ -321,3 +321,82 @@ Allows userspace to query the status of migration mode.
 	     if it is enabled
 :Returns: -EFAULT if the given address is not accessible from kernel space;
 	  0 in case of success.
+
+6. GROUP: KVM_ARM_VM_SMCCC_CTRL
+===============================
+
+:Architectures: arm64
+
+6.1. ATTRIBUTE: KVM_ARM_VM_SMCCC_FILTER (w/o)
+---------------------------------------------
+
+:Parameters: Pointer to a ``struct kvm_smccc_filter``
+
+:Returns:
+
+   ======  ===========================================
+   EEXIST  Range intersects with a previously inserted
+           or reserved range
+   EBUSY   A vCPU in the VM has already run
+   EINVAL  Invalid filter configuration
+   ENOMEM  Failed to allocate memory for the in-kernel
+           representation of the SMCCC filter
+   ======  ===========================================
+
+Requests the installation of an SMCCC call filter described as follows::
+
+    enum kvm_smccc_filter_action {
+        KVM_SMCCC_FILTER_HANDLE = 0,
+        KVM_SMCCC_FILTER_DENY,
+        KVM_SMCCC_FILTER_FWD_TO_USER,
+    };
+
+    struct kvm_smccc_filter {
+        __u32 base;
+        __u32 nr_functions;
+        __u8 action;
+        __u8 pad[15];
+    };
+
+The filter is defined as a set of non-overlapping ranges. Each
+range defines an action to be applied to SMCCC calls within the range.
+Userspace can insert multiple ranges into the filter by using
+successive calls to this attribute.
+
+The default configuration of KVM is such that all implemented SMCCC
+calls are allowed. Thus, the SMCCC filter can be defined sparsely
+by userspace, only describing ranges that modify the default behavior.
+
+The range expressed by ``struct kvm_smccc_filter`` is
+[``base``, ``base + nr_functions``). The range is not allowed to wrap,
+i.e. userspace cannot rely on ``base + nr_functions`` overflowing.
+
+The SMCCC filter applies to both SMC and HVC calls initiated by the
+guest. The SMCCC filter gates the in-kernel emulation of SMCCC calls
+and as such takes effect before other interfaces that interact with
+SMCCC calls (e.g. hypercall bitmap registers).
+
+Actions:
+
+ - ``KVM_SMCCC_FILTER_HANDLE``: Allows the guest SMCCC call to be
+   handled in-kernel. It is strongly recommended that userspace *not*
+   explicitly describe the allowed SMCCC call ranges.
+
+ - ``KVM_SMCCC_FILTER_DENY``: Rejects the guest SMCCC call in-kernel
+   and returns to the guest.
+
+ - ``KVM_SMCCC_FILTER_FWD_TO_USER``: The guest SMCCC call is forwarded
+   to userspace with an exit reason of ``KVM_EXIT_HYPERCALL``.
+
+The ``pad`` field is reserved for future use and must be zero. KVM may
+return ``-EINVAL`` if the field is nonzero.
+
+KVM reserves the 'Arm Architecture Calls' range of function IDs and
+will reject attempts to define a filter for any portion of these ranges:
+
+        ===========    ===============
+        Start          End (inclusive)
+        ===========    ===============
+        0x8000_0000    0x8000_FFFF
+        0xC000_0000    0xC000_FFFF
+        ===========    ===============
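[Editor's aside, not part of the commit: a sketch of installing one filter range through the vm-fd device-attribute interface documented above. The function-ID range picked here is purely illustrative; the types and defines require headers from a tree containing this series.]

    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    /* Forward an example range of SMCCC calls to userspace. Must be
     * done before any vCPU has run, or the attribute returns -EBUSY. */
    static int forward_smccc_range(int vm_fd)
    {
            struct kvm_smccc_filter filter = {
                    .base         = 0x86000000,  /* illustrative base */
                    .nr_functions = 0x10000,
                    .action       = KVM_SMCCC_FILTER_FWD_TO_USER,
            };
            struct kvm_device_attr attr = {
                    .group = KVM_ARM_VM_SMCCC_CTRL,
                    .attr  = KVM_ARM_VM_SMCCC_FILTER,
                    .addr  = (uint64_t)(uintptr_t)&filter,
            };

            return ioctl(vm_fd, KVM_SET_DEVICE_ATTR, &attr);
    }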
--- a/Documentation/virt/kvm/locking.rst
+++ b/Documentation/virt/kvm/locking.rst
@@ -21,7 +21,7 @@ The acquisition orders for mutexes are as follows:
 - kvm->mn_active_invalidate_count ensures that pairs of
   invalidate_range_start() and invalidate_range_end() callbacks
   use the same memslots array.  kvm->slots_lock and kvm->slots_arch_lock
-  are taken on the waiting side in install_new_memslots, so MMU notifiers
+  are taken on the waiting side when modifying memslots, so MMU notifiers
   must not take either kvm->slots_lock or kvm->slots_arch_lock.
 
 For SRCU:
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -16,6 +16,7 @@
 #include <linux/types.h>
 #include <linux/jump_label.h>
 #include <linux/kvm_types.h>
+#include <linux/maple_tree.h>
 #include <linux/percpu.h>
 #include <linux/psci.h>
 #include <asm/arch_gicv3.h>
@@ -199,6 +200,9 @@ struct kvm_arch {
 	/* Mandated version of PSCI */
 	u32 psci_version;
 
+	/* Protects VM-scoped configuration data */
+	struct mutex config_lock;
+
 	/*
 	 * If we encounter a data abort without valid instruction syndrome
 	 * information, report this to user space. User space can (and
@@ -221,7 +225,12 @@ struct kvm_arch {
 #define KVM_ARCH_FLAG_EL1_32BIT			4
 	/* PSCI SYSTEM_SUSPEND enabled for the guest */
 #define KVM_ARCH_FLAG_SYSTEM_SUSPEND_ENABLED	5
-
+	/* VM counter offset */
+#define KVM_ARCH_FLAG_VM_COUNTER_OFFSET		6
+	/* Timer PPIs made immutable */
+#define KVM_ARCH_FLAG_TIMER_PPIS_IMMUTABLE	7
+	/* SMCCC filter initialized for the VM */
+#define KVM_ARCH_FLAG_SMCCC_FILTER_CONFIGURED	8
 	unsigned long flags;
 
 	/*
@@ -242,6 +251,7 @@ struct kvm_arch {
 
 	/* Hypercall features firmware registers' descriptor */
 	struct kvm_smccc_features smccc_feat;
+	struct maple_tree smccc_filter;
 
 	/*
 	 * For an untrusted host VM, 'pkvm.handle' is used to lookup
@@ -365,6 +375,10 @@ enum vcpu_sysreg {
 	TPIDR_EL2,	/* EL2 Software Thread ID Register */
 	CNTHCTL_EL2,	/* Counter-timer Hypervisor Control register */
 	SP_EL2,		/* EL2 Stack Pointer */
+	CNTHP_CTL_EL2,
+	CNTHP_CVAL_EL2,
+	CNTHV_CTL_EL2,
+	CNTHV_CVAL_EL2,
 
 	NR_SYS_REGS	/* Nothing after this line! */
 };
@@ -522,6 +536,7 @@ struct kvm_vcpu_arch {
 
 	/* vcpu power state */
 	struct kvm_mp_state mp_state;
+	spinlock_t mp_state_lock;
 
 	/* Cache some mmu pages needed inside spinlock regions */
 	struct kvm_mmu_memory_cache mmu_page_cache;
@@ -939,6 +954,9 @@ void kvm_reset_sys_regs(struct kvm_vcpu *vcpu);
 
 int __init kvm_sys_reg_table_init(void);
 
+bool lock_all_vcpus(struct kvm *kvm);
+void unlock_all_vcpus(struct kvm *kvm);
+
 /* MMIO helpers */
 void kvm_mmio_write_buf(void *buf, unsigned int len, unsigned long data);
 unsigned long kvm_mmio_read_buf(const void *buf, unsigned int len);
@@ -1022,8 +1040,10 @@ int kvm_arm_vcpu_arch_get_attr(struct kvm_vcpu *vcpu,
 int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
 			       struct kvm_device_attr *attr);
 
-long kvm_vm_ioctl_mte_copy_tags(struct kvm *kvm,
-				struct kvm_arm_copy_mte_tags *copy_tags);
+int kvm_vm_ioctl_mte_copy_tags(struct kvm *kvm,
+			       struct kvm_arm_copy_mte_tags *copy_tags);
+int kvm_vm_ioctl_set_counter_offset(struct kvm *kvm,
+				    struct kvm_arm_counter_offset *offset);
 
 /* Guest/host FPSIMD coordination helpers */
 int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu);
@@ -1078,6 +1098,9 @@ bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu);
 	(system_supports_32bit_el0() &&				\
 	 !static_branch_unlikely(&arm64_mismatched_32bit_el0))
 
+#define kvm_vm_has_ran_once(kvm)				\
+	(test_bit(KVM_ARCH_FLAG_HAS_RAN_ONCE, &(kvm)->arch.flags))
+
 int kvm_trng_call(struct kvm_vcpu *vcpu);
 #ifdef CONFIG_KVM
 extern phys_addr_t hyp_mem_base;
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -63,6 +63,7 @@
  * specific registers encoded in the instructions).
  */
 .macro kern_hyp_va	reg
+#ifndef __KVM_VHE_HYPERVISOR__
 alternative_cb ARM64_ALWAYS_SYSTEM, kvm_update_va_mask
 	and	\reg, \reg, #1		/* mask with va_mask */
 	ror	\reg, \reg, #1		/* rotate to the first tag bit */
@@ -70,6 +71,7 @@ alternative_cb ARM64_ALWAYS_SYSTEM, kvm_update_va_mask
 	add	\reg, \reg, #0, lsl 12	/* insert the top 12 bits of the tag */
 	ror	\reg, \reg, #63		/* rotate back */
 alternative_cb_end
+#endif
 .endm
 
 /*
@@ -127,6 +129,7 @@ void kvm_apply_hyp_relocations(void);
 
 static __always_inline unsigned long __kern_hyp_va(unsigned long v)
 {
+#ifndef __KVM_VHE_HYPERVISOR__
 	asm volatile(ALTERNATIVE_CB("and %0, %0, #1\n"
 				    "ror %0, %0, #1\n"
 				    "add %0, %0, #0\n"
@@ -135,6 +138,7 @@ static __always_inline unsigned long __kern_hyp_va(unsigned long v)
 				    ARM64_ALWAYS_SYSTEM,
 				    kvm_update_va_mask)
 		     : "+r" (v));
+#endif
 	return v;
 }
 
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -388,6 +388,7 @@
 
 #define SYS_CNTFRQ_EL0			sys_reg(3, 3, 14, 0, 0)
 
+#define SYS_CNTPCT_EL0			sys_reg(3, 3, 14, 0, 1)
 #define SYS_CNTPCTSS_EL0		sys_reg(3, 3, 14, 0, 5)
 #define SYS_CNTVCTSS_EL0		sys_reg(3, 3, 14, 0, 6)
 
@@ -400,7 +401,9 @@
 
 #define SYS_AARCH32_CNTP_TVAL		sys_reg(0, 0, 14, 2, 0)
 #define SYS_AARCH32_CNTP_CTL		sys_reg(0, 0, 14, 2, 1)
+#define SYS_AARCH32_CNTPCT		sys_reg(0, 0, 0, 14, 0)
 #define SYS_AARCH32_CNTP_CVAL		sys_reg(0, 2, 0, 14, 0)
+#define SYS_AARCH32_CNTPCTSS		sys_reg(0, 8, 0, 14, 0)
 
 #define __PMEV_op2(n)			((n) & 0x7)
 #define __CNTR_CRm(n)			(0x8 | (((n) >> 3) & 0x3))
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -198,6 +198,15 @@ struct kvm_arm_copy_mte_tags {
 	__u64 reserved[2];
 };
 
+/*
+ * Counter/Timer offset structure. Describe the virtual/physical offset.
+ * To be used with KVM_ARM_SET_COUNTER_OFFSET.
+ */
+struct kvm_arm_counter_offset {
+	__u64 counter_offset;
+	__u64 reserved;
+};
+
 #define KVM_ARM_TAGS_TO_GUEST		0
 #define KVM_ARM_TAGS_FROM_GUEST		1
 
@@ -372,6 +381,10 @@ enum {
 #endif
 };
 
+/* Device Control API on vm fd */
+#define KVM_ARM_VM_SMCCC_CTRL		0
+#define   KVM_ARM_VM_SMCCC_FILTER	0
+
 /* Device Control API: ARM VGIC */
 #define KVM_DEV_ARM_VGIC_GRP_ADDR	0
 #define KVM_DEV_ARM_VGIC_GRP_DIST_REGS	1
@@ -411,6 +424,8 @@ enum {
 #define KVM_ARM_VCPU_TIMER_CTRL		1
 #define   KVM_ARM_VCPU_TIMER_IRQ_VTIMER		0
 #define   KVM_ARM_VCPU_TIMER_IRQ_PTIMER		1
+#define   KVM_ARM_VCPU_TIMER_IRQ_HVTIMER	2
+#define   KVM_ARM_VCPU_TIMER_IRQ_HPTIMER	3
 #define KVM_ARM_VCPU_PVTIME_CTRL	2
 #define   KVM_ARM_VCPU_PVTIME_IPA	0
 
@@ -469,6 +484,27 @@ enum {
 /* run->fail_entry.hardware_entry_failure_reason codes. */
 #define KVM_EXIT_FAIL_ENTRY_CPU_UNSUPPORTED	(1ULL << 0)
 
+enum kvm_smccc_filter_action {
+	KVM_SMCCC_FILTER_HANDLE = 0,
+	KVM_SMCCC_FILTER_DENY,
+	KVM_SMCCC_FILTER_FWD_TO_USER,
+
+#ifdef __KERNEL__
+	NR_SMCCC_FILTER_ACTIONS
+#endif
+};
+
+struct kvm_smccc_filter {
+	__u32 base;
+	__u32 nr_functions;
+	__u8 action;
+	__u8 pad[15];
+};
+
+/* arm64-specific KVM_EXIT_HYPERCALL flags */
+#define KVM_HYPERCALL_EXIT_SMC		(1U << 0)
+#define KVM_HYPERCALL_EXIT_16BIT	(1U << 1)
+
 #endif
 
 #endif /* __ARM_KVM_H__ */
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -2230,6 +2230,17 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
 		.matches = has_cpuid_feature,
 		ARM64_CPUID_FIELDS(ID_AA64MMFR0_EL1, ECV, IMP)
 	},
+	{
+		.desc = "Enhanced Counter Virtualization (CNTPOFF)",
+		.capability = ARM64_HAS_ECV_CNTPOFF,
+		.type = ARM64_CPUCAP_SYSTEM_FEATURE,
+		.matches = has_cpuid_feature,
+		.sys_reg = SYS_ID_AA64MMFR0_EL1,
+		.field_pos = ID_AA64MMFR0_EL1_ECV_SHIFT,
+		.field_width = 4,
+		.sign = FTR_UNSIGNED,
+		.min_field_value = ID_AA64MMFR0_EL1_ECV_CNTPOFF,
+	},
 #ifdef CONFIG_ARM64_PAN
 	{
 		.desc = "Privileged Access Never",
--- a/arch/arm64/kvm/arch_timer.c
+++ b/arch/arm64/kvm/arch_timer.c
@@ -16,6 +16,7 @@
 #include <asm/arch_timer.h>
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_hyp.h>
+#include <asm/kvm_nested.h>
 
 #include <kvm/arm_vgic.h>
 #include <kvm/arm_arch_timer.h>
@@ -30,14 +31,11 @@ static u32 host_ptimer_irq_flags;
 
 static DEFINE_STATIC_KEY_FALSE(has_gic_active_state);
 
-static const struct kvm_irq_level default_ptimer_irq = {
-	.irq	= 30,
-	.level	= 1,
-};
-
-static const struct kvm_irq_level default_vtimer_irq = {
-	.irq	= 27,
-	.level	= 1,
+static const u8 default_ppi[] = {
+	[TIMER_PTIMER]  = 30,
+	[TIMER_VTIMER]  = 27,
+	[TIMER_HPTIMER] = 26,
+	[TIMER_HVTIMER] = 28,
 };
 
 static bool kvm_timer_irq_can_fire(struct arch_timer_context *timer_ctx);
@@ -51,6 +49,24 @@ static void kvm_arm_timer_write(struct kvm_vcpu *vcpu,
 static u64 kvm_arm_timer_read(struct kvm_vcpu *vcpu,
 			      struct arch_timer_context *timer,
 			      enum kvm_arch_timer_regs treg);
+static bool kvm_arch_timer_get_input_level(int vintid);
+
+static struct irq_ops arch_timer_irq_ops = {
+	.get_input_level = kvm_arch_timer_get_input_level,
+};
+
+static bool has_cntpoff(void)
+{
+	return (has_vhe() && cpus_have_final_cap(ARM64_HAS_ECV_CNTPOFF));
+}
+
+static int nr_timers(struct kvm_vcpu *vcpu)
+{
+	if (!vcpu_has_nv(vcpu))
+		return NR_KVM_EL0_TIMERS;
+
+	return NR_KVM_TIMERS;
+}
 
 u32 timer_get_ctl(struct arch_timer_context *ctxt)
 {
@@ -61,6 +77,10 @@ u32 timer_get_ctl(struct arch_timer_context *ctxt)
 		return __vcpu_sys_reg(vcpu, CNTV_CTL_EL0);
 	case TIMER_PTIMER:
 		return __vcpu_sys_reg(vcpu, CNTP_CTL_EL0);
+	case TIMER_HVTIMER:
+		return __vcpu_sys_reg(vcpu, CNTHV_CTL_EL2);
+	case TIMER_HPTIMER:
+		return __vcpu_sys_reg(vcpu, CNTHP_CTL_EL2);
 	default:
 		WARN_ON(1);
 		return 0;
@@ -76,6 +96,10 @@ u64 timer_get_cval(struct arch_timer_context *ctxt)
 		return __vcpu_sys_reg(vcpu, CNTV_CVAL_EL0);
 	case TIMER_PTIMER:
 		return __vcpu_sys_reg(vcpu, CNTP_CVAL_EL0);
+	case TIMER_HVTIMER:
+		return __vcpu_sys_reg(vcpu, CNTHV_CVAL_EL2);
+	case TIMER_HPTIMER:
+		return __vcpu_sys_reg(vcpu, CNTHP_CVAL_EL2);
 	default:
 		WARN_ON(1);
 		return 0;
@@ -84,10 +108,17 @@ u64 timer_get_cval(struct arch_timer_context *ctxt)
 
 static u64 timer_get_offset(struct arch_timer_context *ctxt)
 {
-	if (ctxt->offset.vm_offset)
-		return *ctxt->offset.vm_offset;
+	u64 offset = 0;
 
-	return 0;
+	if (!ctxt)
+		return 0;
+
+	if (ctxt->offset.vm_offset)
+		offset += *ctxt->offset.vm_offset;
+	if (ctxt->offset.vcpu_offset)
+		offset += *ctxt->offset.vcpu_offset;
+
+	return offset;
 }
 
 static void timer_set_ctl(struct arch_timer_context *ctxt, u32 ctl)
@@ -101,6 +132,12 @@ static void timer_set_ctl(struct arch_timer_context *ctxt, u32 ctl)
 	case TIMER_PTIMER:
 		__vcpu_sys_reg(vcpu, CNTP_CTL_EL0) = ctl;
 		break;
+	case TIMER_HVTIMER:
+		__vcpu_sys_reg(vcpu, CNTHV_CTL_EL2) = ctl;
+		break;
+	case TIMER_HPTIMER:
+		__vcpu_sys_reg(vcpu, CNTHP_CTL_EL2) = ctl;
+		break;
 	default:
 		WARN_ON(1);
 	}
@@ -117,6 +154,12 @@ static void timer_set_cval(struct arch_timer_context *ctxt, u64 cval)
 	case TIMER_PTIMER:
 		__vcpu_sys_reg(vcpu, CNTP_CVAL_EL0) = cval;
 		break;
+	case TIMER_HVTIMER:
+		__vcpu_sys_reg(vcpu, CNTHV_CVAL_EL2) = cval;
+		break;
+	case TIMER_HPTIMER:
+		__vcpu_sys_reg(vcpu, CNTHP_CVAL_EL2) = cval;
+		break;
 	default:
 		WARN_ON(1);
 	}
@@ -139,13 +182,27 @@ u64 kvm_phys_timer_read(void)
 
 static void get_timer_map(struct kvm_vcpu *vcpu, struct timer_map *map)
 {
-	if (has_vhe()) {
+	if (vcpu_has_nv(vcpu)) {
+		if (is_hyp_ctxt(vcpu)) {
+			map->direct_vtimer = vcpu_hvtimer(vcpu);
+			map->direct_ptimer = vcpu_hptimer(vcpu);
+			map->emul_vtimer = vcpu_vtimer(vcpu);
+			map->emul_ptimer = vcpu_ptimer(vcpu);
+		} else {
+			map->direct_vtimer = vcpu_vtimer(vcpu);
+			map->direct_ptimer = vcpu_ptimer(vcpu);
+			map->emul_vtimer = vcpu_hvtimer(vcpu);
+			map->emul_ptimer = vcpu_hptimer(vcpu);
+		}
+	} else if (has_vhe()) {
 		map->direct_vtimer = vcpu_vtimer(vcpu);
 		map->direct_ptimer = vcpu_ptimer(vcpu);
+		map->emul_vtimer = NULL;
 		map->emul_ptimer = NULL;
 	} else {
 		map->direct_vtimer = vcpu_vtimer(vcpu);
 		map->direct_ptimer = NULL;
+		map->emul_vtimer = NULL;
 		map->emul_ptimer = vcpu_ptimer(vcpu);
 	}
 
@@ -212,7 +269,7 @@ static u64 kvm_counter_compute_delta(struct arch_timer_context *timer_ctx,
 		ns = cyclecounter_cyc2ns(timecounter->cc,
 					 val - now,
 					 timecounter->mask,
-					 &timecounter->frac);
+					 &timer_ctx->ns_frac);
 		return ns;
 	}
 
@@ -240,8 +297,11 @@ static bool vcpu_has_wfit_active(struct kvm_vcpu *vcpu)
 
 static u64 wfit_delay_ns(struct kvm_vcpu *vcpu)
 {
-	struct arch_timer_context *ctx = vcpu_vtimer(vcpu);
 	u64 val = vcpu_get_reg(vcpu, kvm_vcpu_sys_get_rt(vcpu));
+	struct arch_timer_context *ctx;
+
+	ctx = (vcpu_has_nv(vcpu) && is_hyp_ctxt(vcpu)) ? vcpu_hvtimer(vcpu)
+						       : vcpu_vtimer(vcpu);
 
 	return kvm_counter_compute_delta(ctx, val);
 }
@@ -255,7 +315,7 @@ static u64 kvm_timer_earliest_exp(struct kvm_vcpu *vcpu)
 	u64 min_delta = ULLONG_MAX;
 	int i;
 
-	for (i = 0; i < NR_KVM_TIMERS; i++) {
+	for (i = 0; i < nr_timers(vcpu); i++) {
 		struct arch_timer_context *ctx = &vcpu->arch.timer_cpu.timers[i];
 
 		WARN(ctx->loaded, "timer %d loaded\n", i);
@@ -338,9 +398,11 @@ static bool kvm_timer_should_fire(struct arch_timer_context *timer_ctx)
 
 		switch (index) {
 		case TIMER_VTIMER:
+		case TIMER_HVTIMER:
 			cnt_ctl = read_sysreg_el0(SYS_CNTV_CTL);
 			break;
 		case TIMER_PTIMER:
+		case TIMER_HPTIMER:
 			cnt_ctl = read_sysreg_el0(SYS_CNTP_CTL);
 			break;
 		case NR_KVM_TIMERS:
@@ -392,12 +454,12 @@ static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_level,
 	int ret;
 
 	timer_ctx->irq.level = new_level;
-	trace_kvm_timer_update_irq(vcpu->vcpu_id, timer_ctx->irq.irq,
+	trace_kvm_timer_update_irq(vcpu->vcpu_id, timer_irq(timer_ctx),
 				   timer_ctx->irq.level);
 
 	if (!userspace_irqchip(vcpu->kvm)) {
 		ret = kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
-					  timer_ctx->irq.irq,
+					  timer_irq(timer_ctx),
 					  timer_ctx->irq.level,
 					  timer_ctx);
 		WARN_ON(ret);
@@ -432,6 +494,12 @@ static void set_cntvoff(u64 cntvoff)
 	kvm_call_hyp(__kvm_timer_set_cntvoff, cntvoff);
 }
 
+static void set_cntpoff(u64 cntpoff)
+{
+	if (has_cntpoff())
+		write_sysreg_s(cntpoff, SYS_CNTPOFF_EL2);
+}
+
 static void timer_save_state(struct arch_timer_context *ctx)
 {
 	struct arch_timer_cpu *timer = vcpu_timer(ctx->vcpu);
@@ -447,7 +515,10 @@ static void timer_save_state(struct arch_timer_context *ctx)
 		goto out;
 
 	switch (index) {
+		u64 cval;
+
 	case TIMER_VTIMER:
+	case TIMER_HVTIMER:
 		timer_set_ctl(ctx, read_sysreg_el0(SYS_CNTV_CTL));
 		timer_set_cval(ctx, read_sysreg_el0(SYS_CNTV_CVAL));
 
@@ -473,13 +544,20 @@ static void timer_save_state(struct arch_timer_context *ctx)
 		set_cntvoff(0);
 		break;
 	case TIMER_PTIMER:
+	case TIMER_HPTIMER:
 		timer_set_ctl(ctx, read_sysreg_el0(SYS_CNTP_CTL));
-		timer_set_cval(ctx, read_sysreg_el0(SYS_CNTP_CVAL));
+		cval = read_sysreg_el0(SYS_CNTP_CVAL);
 
+		if (!has_cntpoff())
+			cval -= timer_get_offset(ctx);
+
+		timer_set_cval(ctx, cval);
+
 		/* Disable the timer */
 		write_sysreg_el0(0, SYS_CNTP_CTL);
 		isb();
 
+		set_cntpoff(0);
 		break;
 	case NR_KVM_TIMERS:
 		BUG();
@@ -510,6 +588,7 @@ static void kvm_timer_blocking(struct kvm_vcpu *vcpu)
 	 */
 	if (!kvm_timer_irq_can_fire(map.direct_vtimer) &&
 	    !kvm_timer_irq_can_fire(map.direct_ptimer) &&
+	    !kvm_timer_irq_can_fire(map.emul_vtimer) &&
 	    !kvm_timer_irq_can_fire(map.emul_ptimer) &&
 	    !vcpu_has_wfit_active(vcpu))
 		return;
@@ -543,14 +622,23 @@ static void timer_restore_state(struct arch_timer_context *ctx)
 		goto out;
 
 	switch (index) {
+		u64 cval, offset;
+
 	case TIMER_VTIMER:
+	case TIMER_HVTIMER:
 		set_cntvoff(timer_get_offset(ctx));
 		write_sysreg_el0(timer_get_cval(ctx), SYS_CNTV_CVAL);
 		isb();
 		write_sysreg_el0(timer_get_ctl(ctx), SYS_CNTV_CTL);
 		break;
 	case TIMER_PTIMER:
-		write_sysreg_el0(timer_get_cval(ctx), SYS_CNTP_CVAL);
+	case TIMER_HPTIMER:
+		cval = timer_get_cval(ctx);
+		offset = timer_get_offset(ctx);
+		set_cntpoff(offset);
+		if (!has_cntpoff())
+			cval += offset;
+		write_sysreg_el0(cval, SYS_CNTP_CVAL);
 		isb();
 		write_sysreg_el0(timer_get_ctl(ctx), SYS_CNTP_CTL);
 		break;
@@ -586,7 +674,7 @@ static void kvm_timer_vcpu_load_gic(struct arch_timer_context *ctx)
 	kvm_timer_update_irq(ctx->vcpu, kvm_timer_should_fire(ctx), ctx);
 
 	if (irqchip_in_kernel(vcpu->kvm))
-		phys_active = kvm_vgic_map_is_active(vcpu, ctx->irq.irq);
+		phys_active = kvm_vgic_map_is_active(vcpu, timer_irq(ctx));
 
 	phys_active |= ctx->irq.level;
 
@@ -621,6 +709,128 @@ static void kvm_timer_vcpu_load_nogic(struct kvm_vcpu *vcpu)
 		enable_percpu_irq(host_vtimer_irq, host_vtimer_irq_flags);
 }
 
+/* If _pred is true, set bit in _set, otherwise set it in _clr */
+#define assign_clear_set_bit(_pred, _bit, _clr, _set)			\
+	do {								\
+		if (_pred)						\
+			(_set) |= (_bit);				\
+		else							\
+			(_clr) |= (_bit);				\
+	} while (0)
+
+static void kvm_timer_vcpu_load_nested_switch(struct kvm_vcpu *vcpu,
+					      struct timer_map *map)
+{
+	int hw, ret;
+
+	if (!irqchip_in_kernel(vcpu->kvm))
+		return;
+
+	/*
+	 * We only ever unmap the vtimer irq on a VHE system that runs nested
+	 * virtualization, in which case we have both a valid emul_vtimer,
+	 * emul_ptimer, direct_vtimer, and direct_ptimer.
+	 *
+	 * Since this is called from kvm_timer_vcpu_load(), a change between
+	 * vEL2 and vEL1/0 will have just happened, and the timer_map will
+	 * represent this, and therefore we switch the emul/direct mappings
+	 * below.
+	 */
+	hw = kvm_vgic_get_map(vcpu, timer_irq(map->direct_vtimer));
+	if (hw < 0) {
+		kvm_vgic_unmap_phys_irq(vcpu, timer_irq(map->emul_vtimer));
+		kvm_vgic_unmap_phys_irq(vcpu, timer_irq(map->emul_ptimer));
+
+		ret = kvm_vgic_map_phys_irq(vcpu,
+					    map->direct_vtimer->host_timer_irq,
+					    timer_irq(map->direct_vtimer),
+					    &arch_timer_irq_ops);
+		WARN_ON_ONCE(ret);
+		ret = kvm_vgic_map_phys_irq(vcpu,
+					    map->direct_ptimer->host_timer_irq,
+					    timer_irq(map->direct_ptimer),
+					    &arch_timer_irq_ops);
+		WARN_ON_ONCE(ret);
+
+		/*
+		 * The virtual offset behaviour is "interesting", as it
+		 * always applies when HCR_EL2.E2H==0, but only when
+		 * accessed from EL1 when HCR_EL2.E2H==1. So make sure we
+		 * track E2H when putting the HV timer in "direct" mode.
+		 */
+		if (map->direct_vtimer == vcpu_hvtimer(vcpu)) {
+			struct arch_timer_offset *offs = &map->direct_vtimer->offset;
+
+			if (vcpu_el2_e2h_is_set(vcpu))
+				offs->vcpu_offset = NULL;
+			else
+				offs->vcpu_offset = &__vcpu_sys_reg(vcpu, CNTVOFF_EL2);
+		}
+	}
+}
+
+static void timer_set_traps(struct kvm_vcpu *vcpu, struct timer_map *map)
+{
+	bool tpt, tpc;
+	u64 clr, set;
+
+	/*
+	 * No trapping gets configured here with nVHE. See
+	 * __timer_enable_traps(), which is where the stuff happens.
+	 */
+	if (!has_vhe())
+		return;
+
+	/*
+	 * Our default policy is not to trap anything. As we progress
+	 * within this function, reality kicks in and we start adding
+	 * traps based on emulation requirements.
+	 */
+	tpt = tpc = false;
+
+	/*
+	 * We have two possibilities to deal with a physical offset:
+	 *
+	 * - Either we have CNTPOFF (yay!) or the offset is 0:
+	 *   we let the guest freely access the HW
+	 *
+	 * - or neither of these conditions applies:
+	 *   we trap accesses to the HW, but still use it
+	 *   after correcting the physical offset
+	 */
+	if (!has_cntpoff() && timer_get_offset(map->direct_ptimer))
+		tpt = tpc = true;
+
+	/*
+	 * Apply the enable bits that the guest hypervisor has requested for
+	 * its own guest. We can only add traps that wouldn't have been set
+	 * above.
+	 */
+	if (vcpu_has_nv(vcpu) && !is_hyp_ctxt(vcpu)) {
+		u64 val = __vcpu_sys_reg(vcpu, CNTHCTL_EL2);
+
+		/* Use the VHE format for mental sanity */
+		if (!vcpu_el2_e2h_is_set(vcpu))
+			val = (val & (CNTHCTL_EL1PCEN | CNTHCTL_EL1PCTEN)) << 10;
+
+		tpt |= !(val & (CNTHCTL_EL1PCEN << 10));
+		tpc |= !(val & (CNTHCTL_EL1PCTEN << 10));
+	}
+
+	/*
+	 * Now that we have collected our requirements, compute the
+	 * trap and enable bits.
+	 */
+	set = 0;
+	clr = 0;
+
+	assign_clear_set_bit(tpt, CNTHCTL_EL1PCEN << 10, set, clr);
+	assign_clear_set_bit(tpc, CNTHCTL_EL1PCTEN << 10, set, clr);
+
+	/* This only happens on VHE, so use the CNTKCTL_EL1 accessor */
+	sysreg_clear_set(cntkctl_el1, clr, set);
+}
+
 void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu)
 {
 	struct arch_timer_cpu *timer = vcpu_timer(vcpu);
@@ -632,6 +842,9 @@ void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu)
 	get_timer_map(vcpu, &map);
 
 	if (static_branch_likely(&has_gic_active_state)) {
+		if (vcpu_has_nv(vcpu))
+			kvm_timer_vcpu_load_nested_switch(vcpu, &map);
+
 		kvm_timer_vcpu_load_gic(map.direct_vtimer);
 		if (map.direct_ptimer)
 			kvm_timer_vcpu_load_gic(map.direct_ptimer);
@@ -644,9 +857,12 @@ void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu)
 	timer_restore_state(map.direct_vtimer);
 	if (map.direct_ptimer)
 		timer_restore_state(map.direct_ptimer);
 
+	if (map.emul_vtimer)
+		timer_emulate(map.emul_vtimer);
 	if (map.emul_ptimer)
 		timer_emulate(map.emul_ptimer);
+
+	timer_set_traps(vcpu, &map);
 }
 
 bool kvm_timer_should_notify_user(struct kvm_vcpu *vcpu)
@@ -689,6 +905,8 @@ void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu)
 	 * In any case, we re-schedule the hrtimer for the physical timer when
 	 * coming back to the VCPU thread in kvm_timer_vcpu_load().
 	 */
+	if (map.emul_vtimer)
+		soft_timer_cancel(&map.emul_vtimer->hrtimer);
 	if (map.emul_ptimer)
 		soft_timer_cancel(&map.emul_ptimer->hrtimer);
 
@@ -738,56 +956,89 @@ int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu)
 	 * resets the timer to be disabled and unmasked and is compliant with
 	 * the ARMv7 architecture.
 	 */
-	timer_set_ctl(vcpu_vtimer(vcpu), 0);
-	timer_set_ctl(vcpu_ptimer(vcpu), 0);
+	for (int i = 0; i < nr_timers(vcpu); i++)
+		timer_set_ctl(vcpu_get_timer(vcpu, i), 0);
+
+	/*
+	 * A vcpu running at EL2 is in charge of the offset applied to
+	 * the virtual timer, so use the physical VM offset, and point
+	 * the vcpu offset to CNTVOFF_EL2.
+	 */
+	if (vcpu_has_nv(vcpu)) {
+		struct arch_timer_offset *offs = &vcpu_vtimer(vcpu)->offset;
+
+		offs->vcpu_offset = &__vcpu_sys_reg(vcpu, CNTVOFF_EL2);
+		offs->vm_offset = &vcpu->kvm->arch.timer_data.poffset;
+	}
 
 	if (timer->enabled) {
-		kvm_timer_update_irq(vcpu, false, vcpu_vtimer(vcpu));
-		kvm_timer_update_irq(vcpu, false, vcpu_ptimer(vcpu));
+		for (int i = 0; i < nr_timers(vcpu); i++)
+			kvm_timer_update_irq(vcpu, false,
+					     vcpu_get_timer(vcpu, i));
 
 		if (irqchip_in_kernel(vcpu->kvm)) {
-			kvm_vgic_reset_mapped_irq(vcpu, map.direct_vtimer->irq.irq);
+			kvm_vgic_reset_mapped_irq(vcpu, timer_irq(map.direct_vtimer));
 			if (map.direct_ptimer)
-				kvm_vgic_reset_mapped_irq(vcpu, map.direct_ptimer->irq.irq);
+				kvm_vgic_reset_mapped_irq(vcpu, timer_irq(map.direct_ptimer));
 		}
 	}
 
+	if (map.emul_vtimer)
+		soft_timer_cancel(&map.emul_vtimer->hrtimer);
 	if (map.emul_ptimer)
 		soft_timer_cancel(&map.emul_ptimer->hrtimer);
 
 	return 0;
 }
 
+static void timer_context_init(struct kvm_vcpu *vcpu, int timerid)
+{
+	struct arch_timer_context *ctxt = vcpu_get_timer(vcpu, timerid);
+	struct kvm *kvm = vcpu->kvm;
+
+	ctxt->vcpu = vcpu;
+
+	if (timerid == TIMER_VTIMER)
+		ctxt->offset.vm_offset = &kvm->arch.timer_data.voffset;
+	else
+		ctxt->offset.vm_offset = &kvm->arch.timer_data.poffset;
+
+	hrtimer_init(&ctxt->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD);
+	ctxt->hrtimer.function = kvm_hrtimer_expire;
+
+	switch (timerid) {
+	case TIMER_PTIMER:
+	case TIMER_HPTIMER:
+		ctxt->host_timer_irq = host_ptimer_irq;
+		break;
+	case TIMER_VTIMER:
+	case TIMER_HVTIMER:
+		ctxt->host_timer_irq = host_vtimer_irq;
+		break;
+	}
+}
+
 void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu)
 {
 	struct arch_timer_cpu *timer = vcpu_timer(vcpu);
-	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
-	struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
 
-	vtimer->vcpu = vcpu;
-	vtimer->offset.vm_offset = &vcpu->kvm->arch.timer_data.voffset;
-	ptimer->vcpu = vcpu;
+	for (int i = 0; i < NR_KVM_TIMERS; i++)
+		timer_context_init(vcpu, i);
 
-	/* Synchronize cntvoff across all vtimers of a VM. */
-	timer_set_offset(vtimer, kvm_phys_timer_read());
-	timer_set_offset(ptimer, 0);
+	/* Synchronize offsets across timers of a VM if not already provided */
+	if (!test_bit(KVM_ARCH_FLAG_VM_COUNTER_OFFSET, &vcpu->kvm->arch.flags)) {
+		timer_set_offset(vcpu_vtimer(vcpu), kvm_phys_timer_read());
+		timer_set_offset(vcpu_ptimer(vcpu), 0);
+	}
 
 	hrtimer_init(&timer->bg_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD);
 	timer->bg_timer.function = kvm_bg_timer_expire;
 }
 
-	hrtimer_init(&vtimer->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD);
-	hrtimer_init(&ptimer->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD);
-	vtimer->hrtimer.function = kvm_hrtimer_expire;
-	ptimer->hrtimer.function = kvm_hrtimer_expire;
-
-	vtimer->irq.irq = default_vtimer_irq.irq;
-	ptimer->irq.irq = default_ptimer_irq.irq;
-
-	vtimer->host_timer_irq = host_vtimer_irq;
-	ptimer->host_timer_irq = host_ptimer_irq;
-
-	vtimer->host_timer_irq_flags = host_vtimer_irq_flags;
-	ptimer->host_timer_irq_flags = host_ptimer_irq_flags;
+void kvm_timer_init_vm(struct kvm *kvm)
+{
+	for (int i = 0; i < NR_KVM_TIMERS; i++)
+		kvm->arch.timer_data.ppi[i] = default_ppi[i];
 }
 
 void kvm_timer_cpu_up(void)
@@ -814,8 +1065,11 @@ int kvm_arm_timer_set_reg(struct kvm_vcpu *vcpu, u64 regid, u64 value)
 		kvm_arm_timer_write(vcpu, timer, TIMER_REG_CTL, value);
 		break;
 	case KVM_REG_ARM_TIMER_CNT:
-		timer = vcpu_vtimer(vcpu);
-		timer_set_offset(timer, kvm_phys_timer_read() - value);
+		if (!test_bit(KVM_ARCH_FLAG_VM_COUNTER_OFFSET,
+			      &vcpu->kvm->arch.flags)) {
+			timer = vcpu_vtimer(vcpu);
+			timer_set_offset(timer, kvm_phys_timer_read() - value);
+		}
 		break;
 	case KVM_REG_ARM_TIMER_CVAL:
 		timer = vcpu_vtimer(vcpu);
@@ -825,6 +1079,13 @@ int kvm_arm_timer_set_reg(struct kvm_vcpu *vcpu, u64 regid, u64 value)
 		timer = vcpu_ptimer(vcpu);
 		kvm_arm_timer_write(vcpu, timer, TIMER_REG_CTL, value);
 		break;
+	case KVM_REG_ARM_PTIMER_CNT:
+		if (!test_bit(KVM_ARCH_FLAG_VM_COUNTER_OFFSET,
+			      &vcpu->kvm->arch.flags)) {
+			timer = vcpu_ptimer(vcpu);
+			timer_set_offset(timer, kvm_phys_timer_read() - value);
+		}
+		break;
 	case KVM_REG_ARM_PTIMER_CVAL:
 		timer = vcpu_ptimer(vcpu);
 		kvm_arm_timer_write(vcpu, timer, TIMER_REG_CVAL, value);
@@ -902,6 +1163,10 @@ static u64 kvm_arm_timer_read(struct kvm_vcpu *vcpu,
 		val = kvm_phys_timer_read() - timer_get_offset(timer);
 		break;
 
+	case TIMER_REG_VOFF:
+		val = *timer->offset.vcpu_offset;
+		break;
+
 	default:
 		BUG();
 	}
@@ -920,7 +1185,7 @@ u64 kvm_arm_timer_read_sysreg(struct kvm_vcpu *vcpu,
 	get_timer_map(vcpu, &map);
 	timer = vcpu_get_timer(vcpu, tmr);
 
-	if (timer == map.emul_ptimer)
+	if (timer == map.emul_vtimer || timer == map.emul_ptimer)
 		return kvm_arm_timer_read(vcpu, timer, treg);
 
 	preempt_disable();
@@ -952,6 +1217,10 @@ static void kvm_arm_timer_write(struct kvm_vcpu *vcpu,
 		timer_set_cval(timer, val);
 		break;
 
+	case TIMER_REG_VOFF:
+		*timer->offset.vcpu_offset = val;
+		break;
+
 	default:
 		BUG();
 	}
@@ -967,7 +1236,7 @@ void kvm_arm_timer_write_sysreg(struct kvm_vcpu *vcpu,
 
 	get_timer_map(vcpu, &map);
 	timer = vcpu_get_timer(vcpu, tmr);
-	if (timer == map.emul_ptimer) {
+	if (timer == map.emul_vtimer || timer == map.emul_ptimer) {
 		soft_timer_cancel(&timer->hrtimer);
 		kvm_arm_timer_write(vcpu, timer, treg, val);
 		timer_emulate(timer);
@@ -1047,10 +1316,6 @@ static const struct irq_domain_ops timer_domain_ops = {
 	.free	= timer_irq_domain_free,
 };
 
-static struct irq_ops arch_timer_irq_ops = {
-	.get_input_level = kvm_arch_timer_get_input_level,
-};
-
 static void kvm_irq_fixup_flags(unsigned int virq, u32 *flags)
 {
 	*flags = irq_get_trigger_type(virq);
@@ -1192,44 +1457,56 @@ void kvm_timer_vcpu_terminate(struct kvm_vcpu *vcpu)
 
 static bool timer_irqs_are_valid(struct kvm_vcpu *vcpu)
 {
-	int vtimer_irq, ptimer_irq, ret;
-	unsigned long i;
+	u32 ppis = 0;
+	bool valid;
 
-	vtimer_irq = vcpu_vtimer(vcpu)->irq.irq;
-	ret = kvm_vgic_set_owner(vcpu, vtimer_irq, vcpu_vtimer(vcpu));
-	if (ret)
-		return false;
+	mutex_lock(&vcpu->kvm->arch.config_lock);
 
-	ptimer_irq = vcpu_ptimer(vcpu)->irq.irq;
-	ret = kvm_vgic_set_owner(vcpu, ptimer_irq, vcpu_ptimer(vcpu));
-	if (ret)
-		return false;
+	for (int i = 0; i < nr_timers(vcpu); i++) {
+		struct arch_timer_context *ctx;
+		int irq;
 
-	kvm_for_each_vcpu(i, vcpu, vcpu->kvm) {
-		if (vcpu_vtimer(vcpu)->irq.irq != vtimer_irq ||
-		    vcpu_ptimer(vcpu)->irq.irq != ptimer_irq)
-			return false;
+		ctx = vcpu_get_timer(vcpu, i);
+		irq = timer_irq(ctx);
+		if (kvm_vgic_set_owner(vcpu, irq, ctx))
+			break;
+
+		/*
+		 * We know by construction that we only have PPIs, so
+		 * all values are less than 32.
+		 */
+		ppis |= BIT(irq);
 	}
 
-	return true;
+	valid = hweight32(ppis) == nr_timers(vcpu);
+
+	if (valid)
+		set_bit(KVM_ARCH_FLAG_TIMER_PPIS_IMMUTABLE, &vcpu->kvm->arch.flags);
+
+	mutex_unlock(&vcpu->kvm->arch.config_lock);
+
+	return valid;
 }
 
-bool kvm_arch_timer_get_input_level(int vintid)
+static bool kvm_arch_timer_get_input_level(int vintid)
 {
 	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
-	struct arch_timer_context *timer;
 
 	if (WARN(!vcpu, "No vcpu context!\n"))
 		return false;
 
-	if (vintid == vcpu_vtimer(vcpu)->irq.irq)
-		timer = vcpu_vtimer(vcpu);
-	else if (vintid == vcpu_ptimer(vcpu)->irq.irq)
-		timer = vcpu_ptimer(vcpu);
-	else
-		BUG();
+	for (int i = 0; i < nr_timers(vcpu); i++) {
+		struct arch_timer_context *ctx;
 
-	return kvm_timer_should_fire(timer);
+		ctx = vcpu_get_timer(vcpu, i);
+		if (timer_irq(ctx) == vintid)
+			return kvm_timer_should_fire(ctx);
+	}
+
+	/* A timer IRQ has fired, but no matching timer was found? */
+	WARN_RATELIMIT(1, "timer INTID%d unknown\n", vintid);
+
+	return false;
 }
 
 int kvm_timer_enable(struct kvm_vcpu *vcpu)
@@ -1258,7 +1535,7 @@ int kvm_timer_enable(struct kvm_vcpu *vcpu)
 
 	ret = kvm_vgic_map_phys_irq(vcpu,
 				    map.direct_vtimer->host_timer_irq,
-				    map.direct_vtimer->irq.irq,
+				    timer_irq(map.direct_vtimer),
 				    &arch_timer_irq_ops);
 	if (ret)
 		return ret;
@@ -1266,7 +1543,7 @@ int kvm_timer_enable(struct kvm_vcpu *vcpu)
 	if (map.direct_ptimer) {
 		ret = kvm_vgic_map_phys_irq(vcpu,
 					    map.direct_ptimer->host_timer_irq,
-					    map.direct_ptimer->irq.irq,
+					    timer_irq(map.direct_ptimer),
 					    &arch_timer_irq_ops);
 	}
 
@@ -1278,45 +1555,17 @@ no_vgic:
 	return 0;
 }
 
-/*
- * On VHE system, we only need to configure the EL2 timer trap register once,
- * not for every world switch.
- * The host kernel runs at EL2 with HCR_EL2.TGE == 1,
- * and this makes those bits have no effect for the host kernel execution.
- */
+/* If we have CNTPOFF, permanently set ECV to enable it */
 void kvm_timer_init_vhe(void)
 {
-	/* When HCR_EL2.E2H ==1, EL1PCEN and EL1PCTEN are shifted by 10 */
-	u32 cnthctl_shift = 10;
-	u64 val;
-
-	/*
-	 * VHE systems allow the guest direct access to the EL1 physical
-	 * timer/counter.
-	 */
-	val = read_sysreg(cnthctl_el2);
-	val |= (CNTHCTL_EL1PCEN << cnthctl_shift);
-	val |= (CNTHCTL_EL1PCTEN << cnthctl_shift);
-	write_sysreg(val, cnthctl_el2);
-}
-
-static void set_timer_irqs(struct kvm *kvm, int vtimer_irq, int ptimer_irq)
-{
-	struct kvm_vcpu *vcpu;
-	unsigned long i;
-
-	kvm_for_each_vcpu(i, vcpu, kvm) {
-		vcpu_vtimer(vcpu)->irq.irq = vtimer_irq;
-		vcpu_ptimer(vcpu)->irq.irq = ptimer_irq;
-	}
+	if (cpus_have_final_cap(ARM64_HAS_ECV_CNTPOFF))
+		sysreg_clear_set(cntkctl_el1, 0, CNTHCTL_ECV);
 }
 
 int kvm_arm_timer_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 {
 	int __user *uaddr = (int __user *)(long)attr->addr;
-	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
-	struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
-	int irq;
+	int irq, idx, ret = 0;
 
 	if (!irqchip_in_kernel(vcpu->kvm))
 		return -EINVAL;
@@ -1327,21 +1576,42 @@ int kvm_arm_timer_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 	if (!(irq_is_ppi(irq)))
 		return -EINVAL;
 
-	if (vcpu->arch.timer_cpu.enabled)
-		return -EBUSY;
+	mutex_lock(&vcpu->kvm->arch.config_lock);
+
+	if (test_bit(KVM_ARCH_FLAG_TIMER_PPIS_IMMUTABLE,
+		     &vcpu->kvm->arch.flags)) {
+		ret = -EBUSY;
+		goto out;
+	}
 
 	switch (attr->attr) {
 	case KVM_ARM_VCPU_TIMER_IRQ_VTIMER:
-		set_timer_irqs(vcpu->kvm, irq, ptimer->irq.irq);
+		idx = TIMER_VTIMER;
 		break;
 	case KVM_ARM_VCPU_TIMER_IRQ_PTIMER:
-		set_timer_irqs(vcpu->kvm, vtimer->irq.irq, irq);
+		idx = TIMER_PTIMER;
 		break;
+	case KVM_ARM_VCPU_TIMER_IRQ_HVTIMER:
+		idx = TIMER_HVTIMER;
+		break;
+	case KVM_ARM_VCPU_TIMER_IRQ_HPTIMER:
+		idx = TIMER_HPTIMER;
+		break;
 	default:
-		return -ENXIO;
+		ret = -ENXIO;
+		goto out;
 	}
 
-	return 0;
+	/*
+	 * We cannot validate the IRQ unicity before we run, so take it at
+	 * face value. The verdict will be given on first vcpu run, for each
+	 * vcpu. Yes this is late. Blame it on the stupid API.
+	 */
+	vcpu->kvm->arch.timer_data.ppi[idx] = irq;
+
+out:
+	mutex_unlock(&vcpu->kvm->arch.config_lock);
+	return ret;
 }
 
 int kvm_arm_timer_get_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
@@ -1357,11 +1627,17 @@ int kvm_arm_timer_get_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 	case KVM_ARM_VCPU_TIMER_IRQ_PTIMER:
 		timer = vcpu_ptimer(vcpu);
 		break;
+	case KVM_ARM_VCPU_TIMER_IRQ_HVTIMER:
+		timer = vcpu_hvtimer(vcpu);
+		break;
+	case KVM_ARM_VCPU_TIMER_IRQ_HPTIMER:
+		timer = vcpu_hptimer(vcpu);
+		break;
 	default:
 		return -ENXIO;
 	}
 
-	irq = timer->irq.irq;
+	irq = timer_irq(timer);
 	return put_user(irq, uaddr);
 }
 
@@ -1370,8 +1646,42 @@ int kvm_arm_timer_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 	switch (attr->attr) {
 	case KVM_ARM_VCPU_TIMER_IRQ_VTIMER:
 	case KVM_ARM_VCPU_TIMER_IRQ_PTIMER:
+	case KVM_ARM_VCPU_TIMER_IRQ_HVTIMER:
+	case KVM_ARM_VCPU_TIMER_IRQ_HPTIMER:
 		return 0;
 	}
 
 	return -ENXIO;
 }
+
+int kvm_vm_ioctl_set_counter_offset(struct kvm *kvm,
+				    struct kvm_arm_counter_offset *offset)
+{
+	int ret = 0;
+
+	if (offset->reserved)
+		return -EINVAL;
+
+	mutex_lock(&kvm->lock);
+
+	if (lock_all_vcpus(kvm)) {
+		set_bit(KVM_ARCH_FLAG_VM_COUNTER_OFFSET, &kvm->arch.flags);
+
+		/*
+		 * If userspace decides to set the offset using this
+		 * API rather than merely restoring the counter
+		 * values, the offset applies to both the virtual and
+		 * physical views.
+		 */
+		kvm->arch.timer_data.voffset = offset->counter_offset;
+		kvm->arch.timer_data.poffset = offset->counter_offset;
+
+		unlock_all_vcpus(kvm);
+	} else {
+		ret = -EBUSY;
+	}
+
+	mutex_unlock(&kvm->lock);
+
+	return ret;
+}
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -126,6 +126,16 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 {
 	int ret;
 
+	mutex_init(&kvm->arch.config_lock);
+
+#ifdef CONFIG_LOCKDEP
+	/* Clue in lockdep that the config_lock must be taken inside kvm->lock */
+	mutex_lock(&kvm->lock);
+	mutex_lock(&kvm->arch.config_lock);
+	mutex_unlock(&kvm->arch.config_lock);
+	mutex_unlock(&kvm->lock);
+#endif
+
 	ret = kvm_share_hyp(kvm, kvm + 1);
 	if (ret)
 		return ret;
@@ -146,6 +156,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 
 	kvm_vgic_early_init(kvm);
 
+	kvm_timer_init_vm(kvm);
+
 	/* The maximum number of VCPUs is limited by the host's GIC model */
 	kvm->max_vcpus = kvm_arm_default_max_vcpus();
 
@@ -190,6 +202,8 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
 	kvm_destroy_vcpus(kvm);
 
 	kvm_unshare_hyp(kvm, kvm + 1);
+
+	kvm_arm_teardown_hypercalls(kvm);
 }
 
 int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
@@ -219,6 +233,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_PTP_KVM:
 	case KVM_CAP_ARM_SYSTEM_SUSPEND:
 	case KVM_CAP_IRQFD_RESAMPLE:
+	case KVM_CAP_COUNTER_OFFSET:
 		r = 1;
 		break;
 	case KVM_CAP_SET_GUEST_DEBUG2:
@ -325,6 +340,16 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
int err;
|
||||
|
||||
spin_lock_init(&vcpu->arch.mp_state_lock);
|
||||
|
||||
#ifdef CONFIG_LOCKDEP
|
||||
/* Inform lockdep that the config_lock is acquired after vcpu->mutex */
|
||||
mutex_lock(&vcpu->mutex);
|
||||
mutex_lock(&vcpu->kvm->arch.config_lock);
|
||||
mutex_unlock(&vcpu->kvm->arch.config_lock);
|
||||
mutex_unlock(&vcpu->mutex);
|
||||
#endif
|
||||
|
||||
/* Force users to call KVM_ARM_VCPU_INIT */
|
||||
vcpu->arch.target = -1;
|
||||
bitmap_zero(vcpu->arch.features, KVM_VCPU_MAX_FEATURES);
|
||||
@ -442,34 +467,41 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
|
||||
vcpu->cpu = -1;
|
||||
}
|
||||
|
||||
void kvm_arm_vcpu_power_off(struct kvm_vcpu *vcpu)
|
||||
static void __kvm_arm_vcpu_power_off(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
vcpu->arch.mp_state.mp_state = KVM_MP_STATE_STOPPED;
|
||||
WRITE_ONCE(vcpu->arch.mp_state.mp_state, KVM_MP_STATE_STOPPED);
|
||||
kvm_make_request(KVM_REQ_SLEEP, vcpu);
|
||||
kvm_vcpu_kick(vcpu);
|
||||
}
|
||||
|
||||
void kvm_arm_vcpu_power_off(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
spin_lock(&vcpu->arch.mp_state_lock);
|
||||
__kvm_arm_vcpu_power_off(vcpu);
|
||||
spin_unlock(&vcpu->arch.mp_state_lock);
|
||||
}
|
||||
|
||||
bool kvm_arm_vcpu_stopped(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
return vcpu->arch.mp_state.mp_state == KVM_MP_STATE_STOPPED;
|
||||
return READ_ONCE(vcpu->arch.mp_state.mp_state) == KVM_MP_STATE_STOPPED;
|
||||
}
|
||||
|
||||
static void kvm_arm_vcpu_suspend(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
vcpu->arch.mp_state.mp_state = KVM_MP_STATE_SUSPENDED;
|
||||
WRITE_ONCE(vcpu->arch.mp_state.mp_state, KVM_MP_STATE_SUSPENDED);
|
||||
kvm_make_request(KVM_REQ_SUSPEND, vcpu);
|
||||
kvm_vcpu_kick(vcpu);
|
||||
}
|
||||
|
||||
static bool kvm_arm_vcpu_suspended(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
return vcpu->arch.mp_state.mp_state == KVM_MP_STATE_SUSPENDED;
|
||||
return READ_ONCE(vcpu->arch.mp_state.mp_state) == KVM_MP_STATE_SUSPENDED;
|
||||
}
|
||||
|
||||
int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
|
||||
struct kvm_mp_state *mp_state)
|
||||
{
|
||||
*mp_state = vcpu->arch.mp_state;
|
||||
*mp_state = READ_ONCE(vcpu->arch.mp_state);
|
||||
|
||||
return 0;
|
||||
}
|
||||
@ -479,12 +511,14 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
|
||||
{
|
||||
int ret = 0;
|
||||
|
||||
spin_lock(&vcpu->arch.mp_state_lock);
|
||||
|
||||
switch (mp_state->mp_state) {
|
||||
case KVM_MP_STATE_RUNNABLE:
|
||||
vcpu->arch.mp_state = *mp_state;
|
||||
WRITE_ONCE(vcpu->arch.mp_state, *mp_state);
|
||||
break;
|
||||
case KVM_MP_STATE_STOPPED:
|
||||
kvm_arm_vcpu_power_off(vcpu);
|
||||
__kvm_arm_vcpu_power_off(vcpu);
|
||||
break;
|
||||
case KVM_MP_STATE_SUSPENDED:
|
||||
kvm_arm_vcpu_suspend(vcpu);
|
||||
@ -493,6 +527,8 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
|
||||
ret = -EINVAL;
|
||||
}
|
||||
|
||||
spin_unlock(&vcpu->arch.mp_state_lock);
|
||||
|
||||
return ret;
|
||||
}
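
The mp_state accessors above follow a lock-for-writers, marked-access-for-readers discipline: writers hold mp_state_lock and publish with WRITE_ONCE(), while lockless readers use READ_ONCE() so the compiler can neither tear nor cache the access. A self-contained sketch of the pattern, with the two macros defined the way the kernel essentially defines them:

    #include <pthread.h>
    #include <stdint.h>

    #define WRITE_ONCE(x, val) (*(volatile typeof(x) *)&(x) = (val))
    #define READ_ONCE(x)       (*(volatile typeof(x) *)&(x))

    static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
    static uint32_t mp_state;

    static void set_state(uint32_t new_state)  /* writer: lock + marked store */
    {
            pthread_mutex_lock(&state_lock);
            WRITE_ONCE(mp_state, new_state);
            pthread_mutex_unlock(&state_lock);
    }

    static uint32_t peek_state(void)           /* lockless reader: marked load */
    {
            return READ_ONCE(mp_state);
    }
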

@@ -592,9 +628,9 @@ int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
	if (kvm_vm_is_protected(kvm))
		kvm_call_hyp_nvhe(__pkvm_vcpu_init_traps, vcpu);

	mutex_lock(&kvm->lock);
	mutex_lock(&kvm->arch.config_lock);
	set_bit(KVM_ARCH_FLAG_HAS_RAN_ONCE, &kvm->arch.flags);
	mutex_unlock(&kvm->lock);
	mutex_unlock(&kvm->arch.config_lock);

	return ret;
}
@@ -1209,10 +1245,14 @@ static int kvm_arch_vcpu_ioctl_vcpu_init(struct kvm_vcpu *vcpu,
	/*
	 * Handle the "start in power-off" case.
	 */
	spin_lock(&vcpu->arch.mp_state_lock);

	if (test_bit(KVM_ARM_VCPU_POWER_OFF, vcpu->arch.features))
		kvm_arm_vcpu_power_off(vcpu);
		__kvm_arm_vcpu_power_off(vcpu);
	else
		vcpu->arch.mp_state.mp_state = KVM_MP_STATE_RUNNABLE;
		WRITE_ONCE(vcpu->arch.mp_state.mp_state, KVM_MP_STATE_RUNNABLE);

	spin_unlock(&vcpu->arch.mp_state_lock);

	return 0;
}
@@ -1438,11 +1478,31 @@ static int kvm_vm_ioctl_set_device_addr(struct kvm *kvm,
	}
}

long kvm_arch_vm_ioctl(struct file *filp,
		       unsigned int ioctl, unsigned long arg)
static int kvm_vm_has_attr(struct kvm *kvm, struct kvm_device_attr *attr)
{
	switch (attr->group) {
	case KVM_ARM_VM_SMCCC_CTRL:
		return kvm_vm_smccc_has_attr(kvm, attr);
	default:
		return -ENXIO;
	}
}

static int kvm_vm_set_attr(struct kvm *kvm, struct kvm_device_attr *attr)
{
	switch (attr->group) {
	case KVM_ARM_VM_SMCCC_CTRL:
		return kvm_vm_smccc_set_attr(kvm, attr);
	default:
		return -ENXIO;
	}
}

int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
{
	struct kvm *kvm = filp->private_data;
	void __user *argp = (void __user *)arg;
	struct kvm_device_attr attr;

	switch (ioctl) {
	case KVM_CREATE_IRQCHIP: {
@@ -1478,11 +1538,73 @@ long kvm_arch_vm_ioctl(struct file *filp,
			return -EFAULT;
		return kvm_vm_ioctl_mte_copy_tags(kvm, &copy_tags);
	}
	case KVM_ARM_SET_COUNTER_OFFSET: {
		struct kvm_arm_counter_offset offset;

		if (copy_from_user(&offset, argp, sizeof(offset)))
			return -EFAULT;
		return kvm_vm_ioctl_set_counter_offset(kvm, &offset);
	}
	case KVM_HAS_DEVICE_ATTR: {
		if (copy_from_user(&attr, argp, sizeof(attr)))
			return -EFAULT;

		return kvm_vm_has_attr(kvm, &attr);
	}
	case KVM_SET_DEVICE_ATTR: {
		if (copy_from_user(&attr, argp, sizeof(attr)))
			return -EFAULT;

		return kvm_vm_set_attr(kvm, &attr);
	}
	default:
		return -EINVAL;
	}
}

/* unlocks vcpus from @vcpu_lock_idx and smaller */
static void unlock_vcpus(struct kvm *kvm, int vcpu_lock_idx)
{
	struct kvm_vcpu *tmp_vcpu;

	for (; vcpu_lock_idx >= 0; vcpu_lock_idx--) {
		tmp_vcpu = kvm_get_vcpu(kvm, vcpu_lock_idx);
		mutex_unlock(&tmp_vcpu->mutex);
	}
}

void unlock_all_vcpus(struct kvm *kvm)
{
	lockdep_assert_held(&kvm->lock);

	unlock_vcpus(kvm, atomic_read(&kvm->online_vcpus) - 1);
}

/* Returns true if all vcpus were locked, false otherwise */
bool lock_all_vcpus(struct kvm *kvm)
{
	struct kvm_vcpu *tmp_vcpu;
	unsigned long c;

	lockdep_assert_held(&kvm->lock);

	/*
	 * Any time a vcpu is in an ioctl (including running), the
	 * core KVM code tries to grab the vcpu->mutex.
	 *
	 * By grabbing the vcpu->mutex of all VCPUs we ensure that no
	 * other VCPUs can fiddle with the state while we access it.
	 */
	kvm_for_each_vcpu(c, tmp_vcpu, kvm) {
		if (!mutex_trylock(&tmp_vcpu->mutex)) {
			unlock_vcpus(kvm, c - 1);
			return false;
		}
	}

	return true;
}
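
lock_all_vcpus() is an all-or-nothing trylock sweep: every vcpu->mutex is tried in turn, and the first failure rolls back whatever was taken. A minimal pthreads rendition of the same shape, with an array of mutexes standing in for the vcpus:

    #include <pthread.h>
    #include <stdbool.h>

    #define NR_VCPUS 8
    static pthread_mutex_t vcpu_mutex[NR_VCPUS] = {
            [0 ... NR_VCPUS - 1] = PTHREAD_MUTEX_INITIALIZER,
    };

    static void unlock_from(int idx)  /* unlocks idx and smaller */
    {
            for (; idx >= 0; idx--)
                    pthread_mutex_unlock(&vcpu_mutex[idx]);
    }

    static bool lock_all(void)
    {
            for (int c = 0; c < NR_VCPUS; c++) {
                    if (pthread_mutex_trylock(&vcpu_mutex[c])) {
                            unlock_from(c - 1);  /* roll back and fail */
                            return false;
                    }
            }
            return true;
    }
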

static unsigned long nvhe_percpu_size(void)
{
	return (unsigned long)CHOOSE_NVHE_SYM(__per_cpu_end) -

@@ -590,11 +590,16 @@ static unsigned long num_core_regs(const struct kvm_vcpu *vcpu)
	return copy_core_reg_indices(vcpu, NULL);
}

/**
 * ARM64 versions of the TIMER registers, always available on arm64
 */
static const u64 timer_reg_list[] = {
	KVM_REG_ARM_TIMER_CTL,
	KVM_REG_ARM_TIMER_CNT,
	KVM_REG_ARM_TIMER_CVAL,
	KVM_REG_ARM_PTIMER_CTL,
	KVM_REG_ARM_PTIMER_CNT,
	KVM_REG_ARM_PTIMER_CVAL,
};

#define NUM_TIMER_REGS 3
#define NUM_TIMER_REGS ARRAY_SIZE(timer_reg_list)

static bool is_timer_reg(u64 index)
{
@@ -602,6 +607,9 @@ static bool is_timer_reg(u64 index)
	case KVM_REG_ARM_TIMER_CTL:
	case KVM_REG_ARM_TIMER_CNT:
	case KVM_REG_ARM_TIMER_CVAL:
	case KVM_REG_ARM_PTIMER_CTL:
	case KVM_REG_ARM_PTIMER_CNT:
	case KVM_REG_ARM_PTIMER_CVAL:
		return true;
	}
	return false;
@@ -609,14 +617,11 @@ static bool is_timer_reg(u64 index)

static int copy_timer_indices(struct kvm_vcpu *vcpu, u64 __user *uindices)
{
	if (put_user(KVM_REG_ARM_TIMER_CTL, uindices))
		return -EFAULT;
	uindices++;
	if (put_user(KVM_REG_ARM_TIMER_CNT, uindices))
		return -EFAULT;
	uindices++;
	if (put_user(KVM_REG_ARM_TIMER_CVAL, uindices))
		return -EFAULT;
	for (int i = 0; i < NUM_TIMER_REGS; i++) {
		if (put_user(timer_reg_list[i], uindices))
			return -EFAULT;
		uindices++;
	}

	return 0;
}
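
The rewrite makes the register list the single source of truth: NUM_TIMER_REGS is derived from timer_reg_list[] with ARRAY_SIZE(), so adding a register can no longer leave the count, is_timer_reg() and copy_timer_indices() out of sync. The load-bearing idiom in isolation (register values are placeholders):

    #include <stdio.h>

    #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

    static const unsigned long timer_regs[] = { 0x10, 0x11, 0x12 };

    int main(void)
    {
            /* Adding an entry above automatically grows the loop bound. */
            for (unsigned int i = 0; i < ARRAY_SIZE(timer_regs); i++)
                    printf("reg %u: %#lx\n", i, timer_regs[i]);
            return 0;
    }
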
@@ -957,7 +962,9 @@ int kvm_arm_vcpu_arch_set_attr(struct kvm_vcpu *vcpu,

	switch (attr->group) {
	case KVM_ARM_VCPU_PMU_V3_CTRL:
		mutex_lock(&vcpu->kvm->arch.config_lock);
		ret = kvm_arm_pmu_v3_set_attr(vcpu, attr);
		mutex_unlock(&vcpu->kvm->arch.config_lock);
		break;
	case KVM_ARM_VCPU_TIMER_CTRL:
		ret = kvm_arm_timer_set_attr(vcpu, attr);
@@ -1019,8 +1026,8 @@ int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
	return ret;
}

long kvm_vm_ioctl_mte_copy_tags(struct kvm *kvm,
				struct kvm_arm_copy_mte_tags *copy_tags)
int kvm_vm_ioctl_mte_copy_tags(struct kvm *kvm,
			       struct kvm_arm_copy_mte_tags *copy_tags)
{
	gpa_t guest_ipa = copy_tags->guest_ipa;
	size_t length = copy_tags->length;
@@ -1041,6 +1048,10 @@ long kvm_vm_ioctl_mte_copy_tags(struct kvm *kvm,
	if (length & ~PAGE_MASK || guest_ipa & ~PAGE_MASK)
		return -EINVAL;

	/* Lengths above INT_MAX cannot be represented in the return value */
	if (length > INT_MAX)
		return -EINVAL;

	gfn = gpa_to_gfn(guest_ipa);

	mutex_lock(&kvm->slots_lock);

@@ -36,8 +36,6 @@ static void kvm_handle_guest_serror(struct kvm_vcpu *vcpu, u64 esr)

static int handle_hvc(struct kvm_vcpu *vcpu)
{
	int ret;

	trace_kvm_hvc_arm64(*vcpu_pc(vcpu), vcpu_get_reg(vcpu, 0),
			    kvm_vcpu_hvc_get_imm(vcpu));
	vcpu->stat.hvc_exit_stat++;
@@ -52,33 +50,29 @@ static int handle_hvc(struct kvm_vcpu *vcpu)
		return 1;
	}

	ret = kvm_hvc_call_handler(vcpu);
	if (ret < 0) {
		vcpu_set_reg(vcpu, 0, ~0UL);
		return 1;
	}

	return ret;
	return kvm_smccc_call_handler(vcpu);
}

static int handle_smc(struct kvm_vcpu *vcpu)
{
	int ret;

	/*
	 * "If an SMC instruction executed at Non-secure EL1 is
	 * trapped to EL2 because HCR_EL2.TSC is 1, the exception is a
	 * Trap exception, not a Secure Monitor Call exception [...]"
	 *
	 * We need to advance the PC after the trap, as it would
	 * otherwise return to the same address...
	 *
	 * Only handle SMCs from the virtual EL2 with an immediate of zero and
	 * skip it otherwise.
	 * otherwise return to the same address. Furthermore, pre-incrementing
	 * the PC before potentially exiting to userspace maintains the same
	 * abstraction for both SMCs and HVCs.
	 */
	if (!vcpu_is_el2(vcpu) || kvm_vcpu_hvc_get_imm(vcpu)) {
	kvm_incr_pc(vcpu);

	/*
	 * SMCs with a nonzero immediate are reserved according to DEN0028E 2.9
	 * "SMC and HVC immediate value".
	 */
	if (kvm_vcpu_hvc_get_imm(vcpu)) {
		vcpu_set_reg(vcpu, 0, ~0UL);
		kvm_incr_pc(vcpu);
		return 1;
	}

@@ -89,13 +83,7 @@ static int handle_smc(struct kvm_vcpu *vcpu)
	 * at Non-secure EL1 is trapped to EL2 if HCR_EL2.TSC==1, rather than
	 * being treated as UNDEFINED.
	 */
	ret = kvm_hvc_call_handler(vcpu);
	if (ret < 0)
		vcpu_set_reg(vcpu, 0, ~0UL);

	kvm_incr_pc(vcpu);

	return ret;
	return kvm_smccc_call_handler(vcpu);
}

/*

@@ -26,6 +26,7 @@
#include <asm/kvm_emulate.h>
#include <asm/kvm_hyp.h>
#include <asm/kvm_mmu.h>
#include <asm/kvm_nested.h>
#include <asm/fpsimd.h>
#include <asm/debug-monitors.h>
#include <asm/processor.h>
@@ -326,6 +327,55 @@ static bool kvm_hyp_handle_ptrauth(struct kvm_vcpu *vcpu, u64 *exit_code)
	return true;
}

static bool kvm_hyp_handle_cntpct(struct kvm_vcpu *vcpu)
{
	struct arch_timer_context *ctxt;
	u32 sysreg;
	u64 val;

	/*
	 * We only get here for 64bit guests, 32bit guests will hit
	 * the long and winding road all the way to the standard
	 * handling. Yes, it sucks to be irrelevant.
	 */
	sysreg = esr_sys64_to_sysreg(kvm_vcpu_get_esr(vcpu));

	switch (sysreg) {
	case SYS_CNTPCT_EL0:
	case SYS_CNTPCTSS_EL0:
		if (vcpu_has_nv(vcpu)) {
			if (is_hyp_ctxt(vcpu)) {
				ctxt = vcpu_hptimer(vcpu);
				break;
			}

			/* Check for guest hypervisor trapping */
			val = __vcpu_sys_reg(vcpu, CNTHCTL_EL2);
			if (!vcpu_el2_e2h_is_set(vcpu))
				val = (val & CNTHCTL_EL1PCTEN) << 10;

			if (!(val & (CNTHCTL_EL1PCTEN << 10)))
				return false;
		}

		ctxt = vcpu_ptimer(vcpu);
		break;
	default:
		return false;
	}

	val = arch_timer_read_cntpct_el0();

	if (ctxt->offset.vm_offset)
		val -= *kern_hyp_va(ctxt->offset.vm_offset);
	if (ctxt->offset.vcpu_offset)
		val -= *kern_hyp_va(ctxt->offset.vcpu_offset);

	vcpu_set_reg(vcpu, kvm_vcpu_sys_get_rt(vcpu), val);
	__kvm_skip_instr(vcpu);
	return true;
}
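
The handler composes two subtractive offsets: the VM-wide one and, under NV, a per-vcpu one on top. In essence the guest-visible physical count is computed as below (a sketch of the arithmetic only; NULL pointers mirror the "no offset in force" case):

    #include <stdint.h>
    #include <stddef.h>

    static uint64_t guest_cntpct(uint64_t hw_count,
                                 const uint64_t *vm_offset,
                                 const uint64_t *vcpu_offset)
    {
            if (vm_offset)
                    hw_count -= *vm_offset;
            if (vcpu_offset)
                    hw_count -= *vcpu_offset;
            return hw_count;
    }
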

static bool kvm_hyp_handle_sysreg(struct kvm_vcpu *vcpu, u64 *exit_code)
{
	if (cpus_have_final_cap(ARM64_WORKAROUND_CAVIUM_TX2_219_TVM) &&
@@ -339,6 +389,9 @@ static bool kvm_hyp_handle_sysreg(struct kvm_vcpu *vcpu, u64 *exit_code)
	if (esr_is_ptrauth_trap(kvm_vcpu_get_esr(vcpu)))
		return kvm_hyp_handle_ptrauth(vcpu, exit_code);

	if (kvm_hyp_handle_cntpct(vcpu))
		return true;

	return false;
}

@@ -37,7 +37,6 @@ static void __debug_save_spe(u64 *pmscr_el1)

	/* Now drain all buffered data to memory */
	psb_csync();
	dsb(nsh);
}

static void __debug_restore_spe(u64 pmscr_el1)
@@ -69,7 +68,6 @@ static void __debug_save_trace(u64 *trfcr_el1)
	isb();
	/* Drain the trace buffer to memory */
	tsb_csync();
	dsb(nsh);
}

static void __debug_restore_trace(u64 trfcr_el1)

@@ -297,6 +297,13 @@ int __pkvm_prot_finalize(void)
	params->vttbr = kvm_get_vttbr(mmu);
	params->vtcr = host_mmu.arch.vtcr;
	params->hcr_el2 |= HCR_VM;

	/*
	 * The CMO below not only cleans the updated params to the
	 * PoC, but also provides the DSB that ensures ongoing
	 * page-table walks that have started before we trapped to EL2
	 * have completed.
	 */
	kvm_flush_dcache_to_poc(params, sizeof(*params));

	write_sysreg(params->hcr_el2, hcr_el2);

@@ -272,6 +272,17 @@ int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
	 */
	__debug_save_host_buffers_nvhe(vcpu);

	/*
	 * We're about to restore some new MMU state. Make sure
	 * ongoing page-table walks that have started before we
	 * trapped to EL2 have completed. This also synchronises the
	 * above disabling of SPE and TRBE.
	 *
	 * See DDI0487I.a D8.1.5 "Out-of-context translation regimes",
	 * rule R_LFHQG and subsequent information statements.
	 */
	dsb(nsh);

	__kvm_adjust_pc(vcpu);

	/*
@@ -306,6 +317,13 @@ int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
	__timer_disable_traps(vcpu);
	__hyp_vgic_save_state(vcpu);

	/*
	 * Same thing as before the guest run: we're about to switch
	 * the MMU context, so let's make sure we don't have any
	 * ongoing EL1&0 translations.
	 */
	dsb(nsh);

	__deactivate_traps(vcpu);
	__load_host_stage2();

@@ -9,6 +9,7 @@
#include <linux/kvm_host.h>

#include <asm/kvm_hyp.h>
#include <asm/kvm_mmu.h>

void __kvm_timer_set_cntvoff(u64 cntvoff)
{
@@ -35,14 +36,19 @@ void __timer_disable_traps(struct kvm_vcpu *vcpu)
 */
void __timer_enable_traps(struct kvm_vcpu *vcpu)
{
	u64 val;
	u64 clr = 0, set = 0;

	/*
	 * Disallow physical timer access for the guest
	 * Physical counter access is allowed
	 * Physical counter access is allowed if no offset is enforced
	 * or running protected (we don't offset anything in this case).
	 */
	val = read_sysreg(cnthctl_el2);
	val &= ~CNTHCTL_EL1PCEN;
	val |= CNTHCTL_EL1PCTEN;
	write_sysreg(val, cnthctl_el2);
	clr = CNTHCTL_EL1PCEN;
	if (is_protected_kvm_enabled() ||
	    !kern_hyp_va(vcpu->kvm)->arch.timer_data.poffset)
		set |= CNTHCTL_EL1PCTEN;
	else
		clr |= CNTHCTL_EL1PCTEN;

	sysreg_clear_set(cnthctl_el2, clr, set);
}
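
sysreg_clear_set() is a read-modify-write on the register: new = (old & ~clr) | set, and the rework computes clr/set up front so a single update suffices. The same semantics in plain C:

    #include <stdint.h>

    /* What sysreg_clear_set(reg, clr, set) applies to the old value. */
    static uint64_t clear_set(uint64_t old, uint64_t clr, uint64_t set)
    {
            return (old & ~clr) | set;
    }
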

@@ -15,8 +15,31 @@ struct tlb_inv_context {
};

static void __tlb_switch_to_guest(struct kvm_s2_mmu *mmu,
				  struct tlb_inv_context *cxt)
				  struct tlb_inv_context *cxt,
				  bool nsh)
{
	/*
	 * We have two requirements:
	 *
	 * - ensure that the page table updates are visible to all
	 *   CPUs, for which a dsb(DOMAIN-st) is what we need, DOMAIN
	 *   being either ish or nsh, depending on the invalidation
	 *   type.
	 *
	 * - complete any speculative page table walk started before
	 *   we trapped to EL2 so that we can mess with the MM
	 *   registers out of context, for which dsb(nsh) is enough
	 *
	 * The composition of these two barriers is a dsb(DOMAIN), and
	 * the 'nsh' parameter tracks the distinction between
	 * Inner-Shareable and Non-Shareable, as specified by the
	 * callers.
	 */
	if (nsh)
		dsb(nsh);
	else
		dsb(ish);

	if (cpus_have_final_cap(ARM64_WORKAROUND_SPECULATIVE_AT)) {
		u64 val;

@@ -60,10 +83,8 @@ void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu,
{
	struct tlb_inv_context cxt;

	dsb(ishst);

	/* Switch to requested VMID */
	__tlb_switch_to_guest(mmu, &cxt);
	__tlb_switch_to_guest(mmu, &cxt, false);

	/*
	 * We could do so much better if we had the VA as well.
@@ -113,10 +134,8 @@ void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu)
{
	struct tlb_inv_context cxt;

	dsb(ishst);

	/* Switch to requested VMID */
	__tlb_switch_to_guest(mmu, &cxt);
	__tlb_switch_to_guest(mmu, &cxt, false);

	__tlbi(vmalls12e1is);
	dsb(ish);
@@ -130,7 +149,7 @@ void __kvm_flush_cpu_context(struct kvm_s2_mmu *mmu)
	struct tlb_inv_context cxt;

	/* Switch to requested VMID */
	__tlb_switch_to_guest(mmu, &cxt);
	__tlb_switch_to_guest(mmu, &cxt, false);

	__tlbi(vmalle1);
	asm volatile("ic iallu");
@@ -142,7 +161,8 @@ void __kvm_flush_cpu_context(struct kvm_s2_mmu *mmu)

void __kvm_flush_vm_context(void)
{
	dsb(ishst);
	/* Same remark as in __tlb_switch_to_guest() */
	dsb(ish);
	__tlbi(alle1is);

	/*

@@ -227,11 +227,10 @@ int __kvm_vcpu_run(struct kvm_vcpu *vcpu)

	/*
	 * When we exit from the guest we change a number of CPU configuration
	 * parameters, such as traps. Make sure these changes take effect
	 * before running the host or additional guests.
	 * parameters, such as traps. We rely on the isb() in kvm_call_hyp*()
	 * to make sure these changes take effect before running the host or
	 * additional guests.
	 */
	isb();

	return ret;
}

@@ -13,6 +13,7 @@
#include <asm/kvm_asm.h>
#include <asm/kvm_emulate.h>
#include <asm/kvm_hyp.h>
#include <asm/kvm_nested.h>

/*
 * VHE: Host and guest must save mdscr_el1 and sp_el0 (and the PC and
@@ -69,6 +70,17 @@ void kvm_vcpu_load_sysregs_vhe(struct kvm_vcpu *vcpu)
	host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
	__sysreg_save_user_state(host_ctxt);

	/*
	 * When running a normal EL1 guest, we only load a new vcpu
	 * after a context switch, which involves a DSB, so all
	 * speculative EL1&0 walks will have already completed.
	 * If running NV, the vcpu may transition between vEL1 and
	 * vEL2 without a context switch, so make sure we complete
	 * those walks before loading a new context.
	 */
	if (vcpu_has_nv(vcpu))
		dsb(nsh);

	/*
	 * Load guest EL1 and user state
	 *

@@ -47,7 +47,7 @@ static void kvm_ptp_get_time(struct kvm_vcpu *vcpu, u64 *val)
		cycles = systime_snapshot.cycles - vcpu->kvm->arch.timer_data.voffset;
		break;
	case KVM_PTP_PHYS_COUNTER:
		cycles = systime_snapshot.cycles;
		cycles = systime_snapshot.cycles - vcpu->kvm->arch.timer_data.poffset;
		break;
	default:
		return;
@@ -65,7 +65,7 @@ static void kvm_ptp_get_time(struct kvm_vcpu *vcpu, u64 *val)
	val[3] = lower_32_bits(cycles);
}

static bool kvm_hvc_call_default_allowed(u32 func_id)
static bool kvm_smccc_default_allowed(u32 func_id)
{
	switch (func_id) {
	/*
@@ -93,7 +93,7 @@ static bool kvm_hvc_call_default_allowed(u32 func_id)
	}
}

static bool kvm_hvc_call_allowed(struct kvm_vcpu *vcpu, u32 func_id)
static bool kvm_smccc_test_fw_bmap(struct kvm_vcpu *vcpu, u32 func_id)
{
	struct kvm_smccc_features *smccc_feat = &vcpu->kvm->arch.smccc_feat;

@@ -117,20 +117,161 @@ static bool kvm_hvc_call_allowed(struct kvm_vcpu *vcpu, u32 func_id)
		return test_bit(KVM_REG_ARM_VENDOR_HYP_BIT_PTP,
				&smccc_feat->vendor_hyp_bmap);
	default:
		return kvm_hvc_call_default_allowed(func_id);
		return false;
	}
}

int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
#define SMC32_ARCH_RANGE_BEGIN	ARM_SMCCC_VERSION_FUNC_ID
#define SMC32_ARCH_RANGE_END	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
						   ARM_SMCCC_SMC_32, \
						   0, ARM_SMCCC_FUNC_MASK)

#define SMC64_ARCH_RANGE_BEGIN	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
						   ARM_SMCCC_SMC_64, \
						   0, 0)
#define SMC64_ARCH_RANGE_END	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
						   ARM_SMCCC_SMC_64, \
						   0, ARM_SMCCC_FUNC_MASK)

static void init_smccc_filter(struct kvm *kvm)
{
	int r;

	mt_init(&kvm->arch.smccc_filter);

	/*
	 * Prevent userspace from handling any SMCCC calls in the architecture
	 * range, avoiding the risk of misrepresenting Spectre mitigation status
	 * to the guest.
	 */
	r = mtree_insert_range(&kvm->arch.smccc_filter,
			       SMC32_ARCH_RANGE_BEGIN, SMC32_ARCH_RANGE_END,
			       xa_mk_value(KVM_SMCCC_FILTER_HANDLE),
			       GFP_KERNEL_ACCOUNT);
	WARN_ON_ONCE(r);

	r = mtree_insert_range(&kvm->arch.smccc_filter,
			       SMC64_ARCH_RANGE_BEGIN, SMC64_ARCH_RANGE_END,
			       xa_mk_value(KVM_SMCCC_FILTER_HANDLE),
			       GFP_KERNEL_ACCOUNT);
	WARN_ON_ONCE(r);
}

static int kvm_smccc_set_filter(struct kvm *kvm, struct kvm_smccc_filter __user *uaddr)
{
	const void *zero_page = page_to_virt(ZERO_PAGE(0));
	struct kvm_smccc_filter filter;
	u32 start, end;
	int r;

	if (copy_from_user(&filter, uaddr, sizeof(filter)))
		return -EFAULT;

	if (memcmp(filter.pad, zero_page, sizeof(filter.pad)))
		return -EINVAL;

	start = filter.base;
	end = start + filter.nr_functions - 1;

	if (end < start || filter.action >= NR_SMCCC_FILTER_ACTIONS)
		return -EINVAL;

	mutex_lock(&kvm->arch.config_lock);

	if (kvm_vm_has_ran_once(kvm)) {
		r = -EBUSY;
		goto out_unlock;
	}

	r = mtree_insert_range(&kvm->arch.smccc_filter, start, end,
			       xa_mk_value(filter.action), GFP_KERNEL_ACCOUNT);
	if (r)
		goto out_unlock;

	set_bit(KVM_ARCH_FLAG_SMCCC_FILTER_CONFIGURED, &kvm->arch.flags);

out_unlock:
	mutex_unlock(&kvm->arch.config_lock);
	return r;
}
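
From the VMM side, a filter range is installed with KVM_SET_DEVICE_ATTR on the VM fd before any vcpu has run. A hedged sketch; the kvm_smccc_filter field names follow the 6.4 uapi but the exact layout should be checked against the headers, and the function range is purely illustrative:

    #include <linux/kvm.h>
    #include <stdint.h>
    #include <string.h>
    #include <sys/ioctl.h>

    static int forward_smccc_range(int vm_fd, uint32_t base, uint32_t n)
    {
            struct kvm_smccc_filter filter;
            struct kvm_device_attr attr = {
                    .group = KVM_ARM_VM_SMCCC_CTRL,
                    .attr  = KVM_ARM_VM_SMCCC_FILTER,
                    .addr  = (uint64_t)(unsigned long)&filter,
            };

            memset(&filter, 0, sizeof(filter));  /* pad[] must be zero */
            filter.base = base;
            filter.nr_functions = n;
            filter.action = KVM_SMCCC_FILTER_FWD_TO_USER;

            return ioctl(vm_fd, KVM_SET_DEVICE_ATTR, &attr);
    }
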
static u8 kvm_smccc_filter_get_action(struct kvm *kvm, u32 func_id)
{
	unsigned long idx = func_id;
	void *val;

	if (!test_bit(KVM_ARCH_FLAG_SMCCC_FILTER_CONFIGURED, &kvm->arch.flags))
		return KVM_SMCCC_FILTER_HANDLE;

	/*
	 * But where's the error handling, you say?
	 *
	 * mt_find() returns NULL if no entry was found, which just so happens
	 * to match KVM_SMCCC_FILTER_HANDLE.
	 */
	val = mt_find(&kvm->arch.smccc_filter, &idx, idx);
	return xa_to_value(val);
}

static u8 kvm_smccc_get_action(struct kvm_vcpu *vcpu, u32 func_id)
{
	/*
	 * Intervening actions in the SMCCC filter take precedence over the
	 * pseudo-firmware register bitmaps.
	 */
	u8 action = kvm_smccc_filter_get_action(vcpu->kvm, func_id);
	if (action != KVM_SMCCC_FILTER_HANDLE)
		return action;

	if (kvm_smccc_test_fw_bmap(vcpu, func_id) ||
	    kvm_smccc_default_allowed(func_id))
		return KVM_SMCCC_FILTER_HANDLE;

	return KVM_SMCCC_FILTER_DENY;
}

static void kvm_prepare_hypercall_exit(struct kvm_vcpu *vcpu, u32 func_id)
{
	u8 ec = ESR_ELx_EC(kvm_vcpu_get_esr(vcpu));
	struct kvm_run *run = vcpu->run;
	u64 flags = 0;

	if (ec == ESR_ELx_EC_SMC32 || ec == ESR_ELx_EC_SMC64)
		flags |= KVM_HYPERCALL_EXIT_SMC;

	if (!kvm_vcpu_trap_il_is32bit(vcpu))
		flags |= KVM_HYPERCALL_EXIT_16BIT;

	run->exit_reason = KVM_EXIT_HYPERCALL;
	run->hypercall = (typeof(run->hypercall)) {
		.nr = func_id,
		.flags = flags,
	};
}
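
On the userspace side, such a forwarded call shows up as a KVM_EXIT_HYPERCALL in the shared kvm_run page. A sketch of the corresponding run-loop branch (assuming 'run' is the mmap'ed struct kvm_run; whether the conduit flag lives in a 'flags' member should be checked against the 6.4 headers):

    #include <linux/kvm.h>
    #include <stdio.h>

    static void handle_hypercall_exit(struct kvm_run *run)
    {
            if (run->exit_reason != KVM_EXIT_HYPERCALL)
                    return;

            /* nr carries the SMCCC function ID; flags encodes the conduit. */
            printf("guest call %#llx via %s\n",
                   (unsigned long long)run->hypercall.nr,
                   (run->hypercall.flags & KVM_HYPERCALL_EXIT_SMC) ? "SMC"
                                                                   : "HVC");
    }
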
int kvm_smccc_call_handler(struct kvm_vcpu *vcpu)
{
	struct kvm_smccc_features *smccc_feat = &vcpu->kvm->arch.smccc_feat;
	u32 func_id = smccc_get_function(vcpu);
	u64 val[4] = {SMCCC_RET_NOT_SUPPORTED};
	u32 feature;
	u8 action;
	gpa_t gpa;

	if (!kvm_hvc_call_allowed(vcpu, func_id))
	action = kvm_smccc_get_action(vcpu, func_id);
	switch (action) {
	case KVM_SMCCC_FILTER_HANDLE:
		break;
	case KVM_SMCCC_FILTER_DENY:
		goto out;
	case KVM_SMCCC_FILTER_FWD_TO_USER:
		kvm_prepare_hypercall_exit(vcpu, func_id);
		return 0;
	default:
		WARN_RATELIMIT(1, "Unhandled SMCCC filter action: %d\n", action);
		goto out;
	}

	switch (func_id) {
	case ARM_SMCCC_VERSION_FUNC_ID:
@@ -245,6 +386,13 @@ void kvm_arm_init_hypercalls(struct kvm *kvm)
	smccc_feat->std_bmap = KVM_ARM_SMCCC_STD_FEATURES;
	smccc_feat->std_hyp_bmap = KVM_ARM_SMCCC_STD_HYP_FEATURES;
	smccc_feat->vendor_hyp_bmap = KVM_ARM_SMCCC_VENDOR_HYP_FEATURES;

	init_smccc_filter(kvm);
}

void kvm_arm_teardown_hypercalls(struct kvm *kvm)
{
	mtree_destroy(&kvm->arch.smccc_filter);
}

int kvm_arm_get_fw_num_regs(struct kvm_vcpu *vcpu)
@@ -377,17 +525,16 @@ static int kvm_arm_set_fw_reg_bmap(struct kvm_vcpu *vcpu, u64 reg_id, u64 val)
	if (val & ~fw_reg_features)
		return -EINVAL;

	mutex_lock(&kvm->lock);
	mutex_lock(&kvm->arch.config_lock);

	if (test_bit(KVM_ARCH_FLAG_HAS_RAN_ONCE, &kvm->arch.flags) &&
	    val != *fw_reg_bmap) {
	if (kvm_vm_has_ran_once(kvm) && val != *fw_reg_bmap) {
		ret = -EBUSY;
		goto out;
	}

	WRITE_ONCE(*fw_reg_bmap, val);
out:
	mutex_unlock(&kvm->lock);
	mutex_unlock(&kvm->arch.config_lock);
	return ret;
}

@@ -481,3 +628,25 @@ int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)

	return -EINVAL;
}

int kvm_vm_smccc_has_attr(struct kvm *kvm, struct kvm_device_attr *attr)
{
	switch (attr->attr) {
	case KVM_ARM_VM_SMCCC_FILTER:
		return 0;
	default:
		return -ENXIO;
	}
}

int kvm_vm_smccc_set_attr(struct kvm *kvm, struct kvm_device_attr *attr)
{
	void __user *uaddr = (void __user *)attr->addr;

	switch (attr->attr) {
	case KVM_ARM_VM_SMCCC_FILTER:
		return kvm_smccc_set_filter(kvm, uaddr);
	default:
		return -ENXIO;
	}
}

@@ -876,13 +876,13 @@ static int kvm_arm_pmu_v3_set_pmu(struct kvm_vcpu *vcpu, int pmu_id)
	struct arm_pmu *arm_pmu;
	int ret = -ENXIO;

	mutex_lock(&kvm->lock);
	lockdep_assert_held(&kvm->arch.config_lock);
	mutex_lock(&arm_pmus_lock);

	list_for_each_entry(entry, &arm_pmus, entry) {
		arm_pmu = entry->arm_pmu;
		if (arm_pmu->pmu.type == pmu_id) {
			if (test_bit(KVM_ARCH_FLAG_HAS_RAN_ONCE, &kvm->arch.flags) ||
			if (kvm_vm_has_ran_once(kvm) ||
			    (kvm->arch.pmu_filter && kvm->arch.arm_pmu != arm_pmu)) {
				ret = -EBUSY;
				break;
@@ -896,7 +896,6 @@ static int kvm_arm_pmu_v3_set_pmu(struct kvm_vcpu *vcpu, int pmu_id)
	}

	mutex_unlock(&arm_pmus_lock);
	mutex_unlock(&kvm->lock);
	return ret;
}

@@ -904,22 +903,20 @@ int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
{
	struct kvm *kvm = vcpu->kvm;

	lockdep_assert_held(&kvm->arch.config_lock);

	if (!kvm_vcpu_has_pmu(vcpu))
		return -ENODEV;

	if (vcpu->arch.pmu.created)
		return -EBUSY;

	mutex_lock(&kvm->lock);
	if (!kvm->arch.arm_pmu) {
		/* No PMU set, get the default one */
		kvm->arch.arm_pmu = kvm_pmu_probe_armpmu();
		if (!kvm->arch.arm_pmu) {
			mutex_unlock(&kvm->lock);
		if (!kvm->arch.arm_pmu)
			return -ENODEV;
		}
	}
	mutex_unlock(&kvm->lock);

	switch (attr->attr) {
	case KVM_ARM_VCPU_PMU_V3_IRQ: {
@@ -963,19 +960,13 @@ int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
		    filter.action != KVM_PMU_EVENT_DENY))
			return -EINVAL;

		mutex_lock(&kvm->lock);

		if (test_bit(KVM_ARCH_FLAG_HAS_RAN_ONCE, &kvm->arch.flags)) {
			mutex_unlock(&kvm->lock);
		if (kvm_vm_has_ran_once(kvm))
			return -EBUSY;
		}

		if (!kvm->arch.pmu_filter) {
			kvm->arch.pmu_filter = bitmap_alloc(nr_events, GFP_KERNEL_ACCOUNT);
			if (!kvm->arch.pmu_filter) {
				mutex_unlock(&kvm->lock);
			if (!kvm->arch.pmu_filter)
				return -ENOMEM;
			}
		}

		/*
		 * The default depends on the first applied filter.
@@ -994,8 +985,6 @@ int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
		else
			bitmap_clear(kvm->arch.pmu_filter, filter.base_event, filter.nevents);

		mutex_unlock(&kvm->lock);

		return 0;
	}
	case KVM_ARM_VCPU_PMU_V3_SET_PMU: {

@@ -62,6 +62,7 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
	struct vcpu_reset_state *reset_state;
	struct kvm *kvm = source_vcpu->kvm;
	struct kvm_vcpu *vcpu = NULL;
	int ret = PSCI_RET_SUCCESS;
	unsigned long cpu_id;

	cpu_id = smccc_get_arg1(source_vcpu);
@@ -76,11 +77,15 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
	 */
	if (!vcpu)
		return PSCI_RET_INVALID_PARAMS;

	spin_lock(&vcpu->arch.mp_state_lock);
	if (!kvm_arm_vcpu_stopped(vcpu)) {
		if (kvm_psci_version(source_vcpu) != KVM_ARM_PSCI_0_1)
			return PSCI_RET_ALREADY_ON;
			ret = PSCI_RET_ALREADY_ON;
		else
			return PSCI_RET_INVALID_PARAMS;
			ret = PSCI_RET_INVALID_PARAMS;

		goto out_unlock;
	}

	reset_state = &vcpu->arch.reset_state;
@@ -96,7 +101,7 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
	 */
	reset_state->r0 = smccc_get_arg3(source_vcpu);

	WRITE_ONCE(reset_state->reset, true);
	reset_state->reset = true;
	kvm_make_request(KVM_REQ_VCPU_RESET, vcpu);

	/*
@@ -105,10 +110,12 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
	 */
	smp_wmb();

	vcpu->arch.mp_state.mp_state = KVM_MP_STATE_RUNNABLE;
	WRITE_ONCE(vcpu->arch.mp_state.mp_state, KVM_MP_STATE_RUNNABLE);
	kvm_vcpu_wake_up(vcpu);

	return PSCI_RET_SUCCESS;
out_unlock:
	spin_unlock(&vcpu->arch.mp_state_lock);
	return ret;
}

static unsigned long kvm_psci_vcpu_affinity_info(struct kvm_vcpu *vcpu)
@@ -168,8 +175,11 @@ static void kvm_prepare_system_event(struct kvm_vcpu *vcpu, u32 type, u64 flags)
	 * after this call is handled and before the VCPUs have been
	 * re-initialized.
	 */
	kvm_for_each_vcpu(i, tmp, vcpu->kvm)
		tmp->arch.mp_state.mp_state = KVM_MP_STATE_STOPPED;
	kvm_for_each_vcpu(i, tmp, vcpu->kvm) {
		spin_lock(&tmp->arch.mp_state_lock);
		WRITE_ONCE(tmp->arch.mp_state.mp_state, KVM_MP_STATE_STOPPED);
		spin_unlock(&tmp->arch.mp_state_lock);
	}
	kvm_make_all_cpus_request(vcpu->kvm, KVM_REQ_SLEEP);

	memset(&vcpu->run->system_event, 0, sizeof(vcpu->run->system_event));
@@ -229,7 +239,6 @@ static unsigned long kvm_psci_check_allowed_function(struct kvm_vcpu *vcpu, u32

static int kvm_psci_0_2_call(struct kvm_vcpu *vcpu)
{
	struct kvm *kvm = vcpu->kvm;
	u32 psci_fn = smccc_get_function(vcpu);
	unsigned long val;
	int ret = 1;
@@ -254,9 +263,7 @@ static int kvm_psci_0_2_call(struct kvm_vcpu *vcpu)
		kvm_psci_narrow_to_32bit(vcpu);
		fallthrough;
	case PSCI_0_2_FN64_CPU_ON:
		mutex_lock(&kvm->lock);
		val = kvm_psci_vcpu_on(vcpu);
		mutex_unlock(&kvm->lock);
		break;
	case PSCI_0_2_FN_AFFINITY_INFO:
		kvm_psci_narrow_to_32bit(vcpu);
@@ -395,7 +402,6 @@ static int kvm_psci_1_x_call(struct kvm_vcpu *vcpu, u32 minor)

static int kvm_psci_0_1_call(struct kvm_vcpu *vcpu)
{
	struct kvm *kvm = vcpu->kvm;
	u32 psci_fn = smccc_get_function(vcpu);
	unsigned long val;

@@ -405,9 +411,7 @@ static int kvm_psci_0_1_call(struct kvm_vcpu *vcpu)
		val = PSCI_RET_SUCCESS;
		break;
	case KVM_PSCI_FN_CPU_ON:
		mutex_lock(&kvm->lock);
		val = kvm_psci_vcpu_on(vcpu);
		mutex_unlock(&kvm->lock);
		break;
	default:
		val = PSCI_RET_NOT_SUPPORTED;
@@ -435,6 +439,7 @@ static int kvm_psci_0_1_call(struct kvm_vcpu *vcpu)
int kvm_psci_call(struct kvm_vcpu *vcpu)
{
	u32 psci_fn = smccc_get_function(vcpu);
	int version = kvm_psci_version(vcpu);
	unsigned long val;

	val = kvm_psci_check_allowed_function(vcpu, psci_fn);
@@ -443,7 +448,7 @@ int kvm_psci_call(struct kvm_vcpu *vcpu)
		return 1;
	}

	switch (kvm_psci_version(vcpu)) {
	switch (version) {
	case KVM_ARM_PSCI_1_1:
		return kvm_psci_1_x_call(vcpu, 1);
	case KVM_ARM_PSCI_1_0:
@@ -453,6 +458,8 @@ int kvm_psci_call(struct kvm_vcpu *vcpu)
	case KVM_ARM_PSCI_0_1:
		return kvm_psci_0_1_call(vcpu);
	default:
		return -EINVAL;
		WARN_ONCE(1, "Unknown PSCI version %d", version);
		smccc_set_retval(vcpu, SMCCC_RET_NOT_SUPPORTED, 0, 0, 0);
		return 1;
	}
}

@@ -205,7 +205,7 @@ static int kvm_set_vm_width(struct kvm_vcpu *vcpu)

	is32bit = vcpu_has_feature(vcpu, KVM_ARM_VCPU_EL1_32BIT);

	lockdep_assert_held(&kvm->lock);
	lockdep_assert_held(&kvm->arch.config_lock);

	if (test_bit(KVM_ARCH_FLAG_REG_WIDTH_CONFIGURED, &kvm->arch.flags)) {
		/*
@@ -262,17 +262,18 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
	bool loaded;
	u32 pstate;

	mutex_lock(&vcpu->kvm->lock);
	mutex_lock(&vcpu->kvm->arch.config_lock);
	ret = kvm_set_vm_width(vcpu);
	if (!ret) {
		reset_state = vcpu->arch.reset_state;
		WRITE_ONCE(vcpu->arch.reset_state.reset, false);
	}
	mutex_unlock(&vcpu->kvm->lock);
	mutex_unlock(&vcpu->kvm->arch.config_lock);

	if (ret)
		return ret;

	spin_lock(&vcpu->arch.mp_state_lock);
	reset_state = vcpu->arch.reset_state;
	vcpu->arch.reset_state.reset = false;
	spin_unlock(&vcpu->arch.mp_state_lock);

	/* Reset PMU outside of the non-preemptible section */
	kvm_pmu_vcpu_reset(vcpu);

@@ -1154,6 +1154,12 @@ static bool access_arch_timer(struct kvm_vcpu *vcpu,
		tmr = TIMER_PTIMER;
		treg = TIMER_REG_CVAL;
		break;
	case SYS_CNTPCT_EL0:
	case SYS_CNTPCTSS_EL0:
	case SYS_AARCH32_CNTPCT:
		tmr = TIMER_PTIMER;
		treg = TIMER_REG_CNT;
		break;
	default:
		print_sys_reg_msg(p, "%s", "Unhandled trapped timer register");
		kvm_inject_undefined(vcpu);
@@ -2091,6 +2097,8 @@ static const struct sys_reg_desc sys_reg_descs[] = {
	AMU_AMEVTYPER1_EL0(14),
	AMU_AMEVTYPER1_EL0(15),

	{ SYS_DESC(SYS_CNTPCT_EL0), access_arch_timer },
	{ SYS_DESC(SYS_CNTPCTSS_EL0), access_arch_timer },
	{ SYS_DESC(SYS_CNTP_TVAL_EL0), access_arch_timer },
	{ SYS_DESC(SYS_CNTP_CTL_EL0), access_arch_timer },
	{ SYS_DESC(SYS_CNTP_CVAL_EL0), access_arch_timer },
@@ -2541,10 +2549,12 @@ static const struct sys_reg_desc cp15_64_regs[] = {
	{ Op1( 0), CRn( 0), CRm( 2), Op2( 0), access_vm_reg, NULL, TTBR0_EL1 },
	{ CP15_PMU_SYS_REG(DIRECT, 0, 0, 9, 0), .access = access_pmu_evcntr },
	{ Op1( 0), CRn( 0), CRm(12), Op2( 0), access_gic_sgi }, /* ICC_SGI1R */
	{ SYS_DESC(SYS_AARCH32_CNTPCT), access_arch_timer },
	{ Op1( 1), CRn( 0), CRm( 2), Op2( 0), access_vm_reg, NULL, TTBR1_EL1 },
	{ Op1( 1), CRn( 0), CRm(12), Op2( 0), access_gic_sgi }, /* ICC_ASGI1R */
	{ Op1( 2), CRn( 0), CRm(12), Op2( 0), access_gic_sgi }, /* ICC_SGI0R */
	{ SYS_DESC(SYS_AARCH32_CNTP_CVAL), access_arch_timer },
	{ SYS_DESC(SYS_AARCH32_CNTPCTSS), access_arch_timer },
};

static bool check_sysreg_table(const struct sys_reg_desc *table, unsigned int n,

@@ -206,6 +206,7 @@ TRACE_EVENT(kvm_get_timer_map,
		__field( unsigned long, vcpu_id )
		__field( int, direct_vtimer )
		__field( int, direct_ptimer )
		__field( int, emul_vtimer )
		__field( int, emul_ptimer )
	),

@@ -214,14 +215,17 @@ TRACE_EVENT(kvm_get_timer_map,
		__entry->direct_vtimer = arch_timer_ctx_index(map->direct_vtimer);
		__entry->direct_ptimer =
			(map->direct_ptimer) ? arch_timer_ctx_index(map->direct_ptimer) : -1;
		__entry->emul_vtimer =
			(map->emul_vtimer) ? arch_timer_ctx_index(map->emul_vtimer) : -1;
		__entry->emul_ptimer =
			(map->emul_ptimer) ? arch_timer_ctx_index(map->emul_ptimer) : -1;
	),

	TP_printk("VCPU: %ld, dv: %d, dp: %d, ep: %d",
	TP_printk("VCPU: %ld, dv: %d, dp: %d, ev: %d, ep: %d",
		  __entry->vcpu_id,
		  __entry->direct_vtimer,
		  __entry->direct_ptimer,
		  __entry->emul_vtimer,
		  __entry->emul_ptimer)
);

@@ -85,7 +85,7 @@ static void *vgic_debug_start(struct seq_file *s, loff_t *pos)
	struct kvm *kvm = s->private;
	struct vgic_state_iter *iter;

	mutex_lock(&kvm->lock);
	mutex_lock(&kvm->arch.config_lock);
	iter = kvm->arch.vgic.iter;
	if (iter) {
		iter = ERR_PTR(-EBUSY);
@@ -104,7 +104,7 @@ static void *vgic_debug_start(struct seq_file *s, loff_t *pos)
	if (end_of_vgic(iter))
		iter = NULL;
out:
	mutex_unlock(&kvm->lock);
	mutex_unlock(&kvm->arch.config_lock);
	return iter;
}

@@ -132,12 +132,12 @@ static void vgic_debug_stop(struct seq_file *s, void *v)
	if (IS_ERR(v))
		return;

	mutex_lock(&kvm->lock);
	mutex_lock(&kvm->arch.config_lock);
	iter = kvm->arch.vgic.iter;
	kfree(iter->lpi_array);
	kfree(iter);
	kvm->arch.vgic.iter = NULL;
	mutex_unlock(&kvm->lock);
	mutex_unlock(&kvm->arch.config_lock);
}

static void print_dist_state(struct seq_file *s, struct vgic_dist *dist)

@@ -74,9 +74,6 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
	unsigned long i;
	int ret;

	if (irqchip_in_kernel(kvm))
		return -EEXIST;

	/*
	 * This function is also called by the KVM_CREATE_IRQCHIP handler,
	 * which had no chance yet to check the availability of the GICv2
@@ -87,10 +84,20 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
	    !kvm_vgic_global_state.can_emulate_gicv2)
		return -ENODEV;

	/* Must be held to avoid race with vCPU creation */
	lockdep_assert_held(&kvm->lock);

	ret = -EBUSY;
	if (!lock_all_vcpus(kvm))
		return ret;

	mutex_lock(&kvm->arch.config_lock);

	if (irqchip_in_kernel(kvm)) {
		ret = -EEXIST;
		goto out_unlock;
	}

	kvm_for_each_vcpu(i, vcpu, kvm) {
		if (vcpu_has_run_once(vcpu))
			goto out_unlock;
@@ -118,6 +125,7 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
	INIT_LIST_HEAD(&kvm->arch.vgic.rd_regions);

out_unlock:
	mutex_unlock(&kvm->arch.config_lock);
	unlock_all_vcpus(kvm);
	return ret;
}
@@ -227,9 +235,9 @@ int kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu)
	 * KVM io device for the redistributor that belongs to this VCPU.
	 */
	if (dist->vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3) {
		mutex_lock(&vcpu->kvm->lock);
		mutex_lock(&vcpu->kvm->arch.config_lock);
		ret = vgic_register_redist_iodev(vcpu);
		mutex_unlock(&vcpu->kvm->lock);
		mutex_unlock(&vcpu->kvm->arch.config_lock);
	}
	return ret;
}
@@ -250,7 +258,6 @@ static void kvm_vgic_vcpu_enable(struct kvm_vcpu *vcpu)
 * The function is generally called when nr_spis has been explicitly set
 * by the guest through the KVM DEVICE API. If not nr_spis is set to 256.
 * vgic_initialized() returns true when this function has succeeded.
 * Must be called with kvm->lock held!
 */
int vgic_init(struct kvm *kvm)
{
@@ -259,6 +266,8 @@ int vgic_init(struct kvm *kvm)
	int ret = 0, i;
	unsigned long idx;

	lockdep_assert_held(&kvm->arch.config_lock);

	if (vgic_initialized(kvm))
		return 0;

@@ -373,12 +382,13 @@ void kvm_vgic_vcpu_destroy(struct kvm_vcpu *vcpu)
	vgic_cpu->rd_iodev.base_addr = VGIC_ADDR_UNDEF;
}

/* To be called with kvm->lock held */
static void __kvm_vgic_destroy(struct kvm *kvm)
{
	struct kvm_vcpu *vcpu;
	unsigned long i;

	lockdep_assert_held(&kvm->arch.config_lock);

	vgic_debug_destroy(kvm);

	kvm_for_each_vcpu(i, vcpu, kvm)
@@ -389,9 +399,9 @@ static void __kvm_vgic_destroy(struct kvm *kvm)

void kvm_vgic_destroy(struct kvm *kvm)
{
	mutex_lock(&kvm->lock);
	mutex_lock(&kvm->arch.config_lock);
	__kvm_vgic_destroy(kvm);
	mutex_unlock(&kvm->lock);
	mutex_unlock(&kvm->arch.config_lock);
}

/**
@@ -414,9 +424,9 @@ int vgic_lazy_init(struct kvm *kvm)
		if (kvm->arch.vgic.vgic_model != KVM_DEV_TYPE_ARM_VGIC_V2)
			return -EBUSY;

		mutex_lock(&kvm->lock);
		mutex_lock(&kvm->arch.config_lock);
		ret = vgic_init(kvm);
		mutex_unlock(&kvm->lock);
		mutex_unlock(&kvm->arch.config_lock);
	}

	return ret;
@@ -441,7 +451,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
	if (likely(vgic_ready(kvm)))
		return 0;

	mutex_lock(&kvm->lock);
	mutex_lock(&kvm->arch.config_lock);
	if (vgic_ready(kvm))
		goto out;

@@ -459,7 +469,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
	dist->ready = true;

out:
	mutex_unlock(&kvm->lock);
	mutex_unlock(&kvm->arch.config_lock);
	return ret;
}

@@ -1958,6 +1958,16 @@ static int vgic_its_create(struct kvm_device *dev, u32 type)
	mutex_init(&its->its_lock);
	mutex_init(&its->cmd_lock);

	/* Yep, even more trickery for lock ordering... */
#ifdef CONFIG_LOCKDEP
	mutex_lock(&dev->kvm->arch.config_lock);
	mutex_lock(&its->cmd_lock);
	mutex_lock(&its->its_lock);
	mutex_unlock(&its->its_lock);
	mutex_unlock(&its->cmd_lock);
	mutex_unlock(&dev->kvm->arch.config_lock);
#endif

	its->vgic_its_base = VGIC_ADDR_UNDEF;

	INIT_LIST_HEAD(&its->device_list);
@@ -2045,6 +2055,13 @@ static int vgic_its_attr_regs_access(struct kvm_device *dev,

	mutex_lock(&dev->kvm->lock);

	if (!lock_all_vcpus(dev->kvm)) {
		mutex_unlock(&dev->kvm->lock);
		return -EBUSY;
	}

	mutex_lock(&dev->kvm->arch.config_lock);

	if (IS_VGIC_ADDR_UNDEF(its->vgic_its_base)) {
		ret = -ENXIO;
		goto out;
@@ -2058,11 +2075,6 @@ static int vgic_its_attr_regs_access(struct kvm_device *dev,
		goto out;
	}

	if (!lock_all_vcpus(dev->kvm)) {
		ret = -EBUSY;
		goto out;
	}

	addr = its->vgic_its_base + offset;

	len = region->access_flags & VGIC_ACCESS_64bit ? 8 : 4;
@@ -2076,8 +2088,9 @@ static int vgic_its_attr_regs_access(struct kvm_device *dev,
	} else {
		*reg = region->its_read(dev->kvm, its, addr, len);
	}
	unlock_all_vcpus(dev->kvm);
out:
	mutex_unlock(&dev->kvm->arch.config_lock);
	unlock_all_vcpus(dev->kvm);
	mutex_unlock(&dev->kvm->lock);
	return ret;
}
@@ -2749,14 +2762,15 @@ static int vgic_its_ctrl(struct kvm *kvm, struct vgic_its *its, u64 attr)
		return 0;

	mutex_lock(&kvm->lock);
	mutex_lock(&its->its_lock);

	if (!lock_all_vcpus(kvm)) {
		mutex_unlock(&its->its_lock);
		mutex_unlock(&kvm->lock);
		return -EBUSY;
	}

	mutex_lock(&kvm->arch.config_lock);
	mutex_lock(&its->its_lock);

	switch (attr) {
	case KVM_DEV_ARM_ITS_CTRL_RESET:
		vgic_its_reset(kvm, its);
@@ -2769,8 +2783,9 @@ static int vgic_its_ctrl(struct kvm *kvm, struct vgic_its *its, u64 attr)
		break;
	}

	unlock_all_vcpus(kvm);
	mutex_unlock(&its->its_lock);
	mutex_unlock(&kvm->arch.config_lock);
	unlock_all_vcpus(kvm);
	mutex_unlock(&kvm->lock);
	return ret;
}

@@ -46,7 +46,7 @@ int kvm_set_legacy_vgic_v2_addr(struct kvm *kvm, struct kvm_arm_device_addr *dev
	struct vgic_dist *vgic = &kvm->arch.vgic;
	int r;

	mutex_lock(&kvm->lock);
	mutex_lock(&kvm->arch.config_lock);
	switch (FIELD_GET(KVM_ARM_DEVICE_TYPE_MASK, dev_addr->id)) {
	case KVM_VGIC_V2_ADDR_TYPE_DIST:
		r = vgic_check_type(kvm, KVM_DEV_TYPE_ARM_VGIC_V2);
@@ -68,7 +68,7 @@ int kvm_set_legacy_vgic_v2_addr(struct kvm *kvm, struct kvm_arm_device_addr *dev
		r = -ENODEV;
	}

	mutex_unlock(&kvm->lock);
	mutex_unlock(&kvm->arch.config_lock);

	return r;
}
@@ -102,7 +102,7 @@ static int kvm_vgic_addr(struct kvm *kvm, struct kvm_device_attr *attr, bool wri
		if (get_user(addr, uaddr))
			return -EFAULT;

	mutex_lock(&kvm->lock);
	mutex_lock(&kvm->arch.config_lock);
	switch (attr->attr) {
	case KVM_VGIC_V2_ADDR_TYPE_DIST:
		r = vgic_check_type(kvm, KVM_DEV_TYPE_ARM_VGIC_V2);
@@ -191,7 +191,7 @@ static int kvm_vgic_addr(struct kvm *kvm, struct kvm_device_attr *attr, bool wri
	}

out:
	mutex_unlock(&kvm->lock);
	mutex_unlock(&kvm->arch.config_lock);

	if (!r && !write)
		r = put_user(addr, uaddr);
@@ -227,7 +227,7 @@ static int vgic_set_common_attr(struct kvm_device *dev,
		    (val & 31))
			return -EINVAL;

		mutex_lock(&dev->kvm->lock);
		mutex_lock(&dev->kvm->arch.config_lock);

		if (vgic_ready(dev->kvm) || dev->kvm->arch.vgic.nr_spis)
			ret = -EBUSY;
@@ -235,16 +235,16 @@ static int vgic_set_common_attr(struct kvm_device *dev,
			dev->kvm->arch.vgic.nr_spis =
				val - VGIC_NR_PRIVATE_IRQS;

		mutex_unlock(&dev->kvm->lock);
		mutex_unlock(&dev->kvm->arch.config_lock);

		return ret;
	}
	case KVM_DEV_ARM_VGIC_GRP_CTRL: {
		switch (attr->attr) {
		case KVM_DEV_ARM_VGIC_CTRL_INIT:
			mutex_lock(&dev->kvm->lock);
			mutex_lock(&dev->kvm->arch.config_lock);
			r = vgic_init(dev->kvm);
			mutex_unlock(&dev->kvm->lock);
			mutex_unlock(&dev->kvm->arch.config_lock);
			return r;
		case KVM_DEV_ARM_VGIC_SAVE_PENDING_TABLES:
			/*
@@ -260,7 +260,10 @@ static int vgic_set_common_attr(struct kvm_device *dev,
				mutex_unlock(&dev->kvm->lock);
				return -EBUSY;
			}

			mutex_lock(&dev->kvm->arch.config_lock);
			r = vgic_v3_save_pending_tables(dev->kvm);
			mutex_unlock(&dev->kvm->arch.config_lock);
			unlock_all_vcpus(dev->kvm);
			mutex_unlock(&dev->kvm->lock);
			return r;
@@ -342,44 +345,6 @@ int vgic_v2_parse_attr(struct kvm_device *dev, struct kvm_device_attr *attr,
	return 0;
}

/* unlocks vcpus from @vcpu_lock_idx and smaller */
static void unlock_vcpus(struct kvm *kvm, int vcpu_lock_idx)
{
	struct kvm_vcpu *tmp_vcpu;

	for (; vcpu_lock_idx >= 0; vcpu_lock_idx--) {
		tmp_vcpu = kvm_get_vcpu(kvm, vcpu_lock_idx);
		mutex_unlock(&tmp_vcpu->mutex);
	}
}

void unlock_all_vcpus(struct kvm *kvm)
{
	unlock_vcpus(kvm, atomic_read(&kvm->online_vcpus) - 1);
}

/* Returns true if all vcpus were locked, false otherwise */
bool lock_all_vcpus(struct kvm *kvm)
{
	struct kvm_vcpu *tmp_vcpu;
	unsigned long c;

	/*
	 * Any time a vcpu is run, vcpu_load is called which tries to grab the
	 * vcpu->mutex. By grabbing the vcpu->mutex of all VCPUs we ensure
	 * that no other VCPUs are run and fiddle with the vgic state while we
	 * access it.
	 */
	kvm_for_each_vcpu(c, tmp_vcpu, kvm) {
		if (!mutex_trylock(&tmp_vcpu->mutex)) {
			unlock_vcpus(kvm, c - 1);
			return false;
		}
	}

	return true;
}

/**
 * vgic_v2_attr_regs_access - allows user space to access VGIC v2 state
 *
@@ -411,15 +376,17 @@ static int vgic_v2_attr_regs_access(struct kvm_device *dev,

	mutex_lock(&dev->kvm->lock);

	if (!lock_all_vcpus(dev->kvm)) {
		mutex_unlock(&dev->kvm->lock);
		return -EBUSY;
	}

	mutex_lock(&dev->kvm->arch.config_lock);

	ret = vgic_init(dev->kvm);
	if (ret)
		goto out;

	if (!lock_all_vcpus(dev->kvm)) {
		ret = -EBUSY;
		goto out;
	}

	switch (attr->group) {
	case KVM_DEV_ARM_VGIC_GRP_CPU_REGS:
		ret = vgic_v2_cpuif_uaccess(vcpu, is_write, addr, &val);
@@ -432,8 +399,9 @@ static int vgic_v2_attr_regs_access(struct kvm_device *dev,
		break;
	}

	unlock_all_vcpus(dev->kvm);
out:
	mutex_unlock(&dev->kvm->arch.config_lock);
	unlock_all_vcpus(dev->kvm);
	mutex_unlock(&dev->kvm->lock);

	if (!ret && !is_write)
@@ -569,12 +537,14 @@ static int vgic_v3_attr_regs_access(struct kvm_device *dev,

	mutex_lock(&dev->kvm->lock);

	if (unlikely(!vgic_initialized(dev->kvm))) {
		ret = -EBUSY;
		goto out;
	if (!lock_all_vcpus(dev->kvm)) {
		mutex_unlock(&dev->kvm->lock);
		return -EBUSY;
	}

	if (!lock_all_vcpus(dev->kvm)) {
	mutex_lock(&dev->kvm->arch.config_lock);

	if (unlikely(!vgic_initialized(dev->kvm))) {
		ret = -EBUSY;
		goto out;
	}
@@ -609,8 +579,9 @@ static int vgic_v3_attr_regs_access(struct kvm_device *dev,
		break;
	}

	unlock_all_vcpus(dev->kvm);
out:
	mutex_unlock(&dev->kvm->arch.config_lock);
	unlock_all_vcpus(dev->kvm);
	mutex_unlock(&dev->kvm->lock);

	if (!ret && uaccess && !is_write) {

@@ -111,7 +111,7 @@ static void vgic_mmio_write_v3_misc(struct kvm_vcpu *vcpu,
	case GICD_CTLR: {
		bool was_enabled, is_hwsgi;

		mutex_lock(&vcpu->kvm->lock);
		mutex_lock(&vcpu->kvm->arch.config_lock);

		was_enabled = dist->enabled;
		is_hwsgi = dist->nassgireq;
@@ -139,7 +139,7 @@ static void vgic_mmio_write_v3_misc(struct kvm_vcpu *vcpu,
		else if (!was_enabled && dist->enabled)
			vgic_kick_vcpus(vcpu->kvm);

		mutex_unlock(&vcpu->kvm->lock);
		mutex_unlock(&vcpu->kvm->arch.config_lock);
		break;
	}
	case GICD_TYPER:

@@ -530,13 +530,13 @@ unsigned long vgic_mmio_read_active(struct kvm_vcpu *vcpu,
	u32 intid = VGIC_ADDR_TO_INTID(addr, 1);
	u32 val;

	mutex_lock(&vcpu->kvm->lock);
	mutex_lock(&vcpu->kvm->arch.config_lock);
	vgic_access_active_prepare(vcpu, intid);

	val = __vgic_mmio_read_active(vcpu, addr, len);

	vgic_access_active_finish(vcpu, intid);
	mutex_unlock(&vcpu->kvm->lock);
	mutex_unlock(&vcpu->kvm->arch.config_lock);

	return val;
}
@@ -625,13 +625,13 @@ void vgic_mmio_write_cactive(struct kvm_vcpu *vcpu,
{
	u32 intid = VGIC_ADDR_TO_INTID(addr, 1);

	mutex_lock(&vcpu->kvm->lock);
	mutex_lock(&vcpu->kvm->arch.config_lock);
	vgic_access_active_prepare(vcpu, intid);

	__vgic_mmio_write_cactive(vcpu, addr, len, val);

	vgic_access_active_finish(vcpu, intid);
	mutex_unlock(&vcpu->kvm->lock);
	mutex_unlock(&vcpu->kvm->arch.config_lock);
}

int vgic_mmio_uaccess_write_cactive(struct kvm_vcpu *vcpu,
@@ -662,13 +662,13 @@ void vgic_mmio_write_sactive(struct kvm_vcpu *vcpu,
{
	u32 intid = VGIC_ADDR_TO_INTID(addr, 1);

	mutex_lock(&vcpu->kvm->lock);
	mutex_lock(&vcpu->kvm->arch.config_lock);
	vgic_access_active_prepare(vcpu, intid);

	__vgic_mmio_write_sactive(vcpu, addr, len, val);

	vgic_access_active_finish(vcpu, intid);
	mutex_unlock(&vcpu->kvm->lock);
	mutex_unlock(&vcpu->kvm->arch.config_lock);
}

int vgic_mmio_uaccess_write_sactive(struct kvm_vcpu *vcpu,

@@ -232,9 +232,8 @@ int vgic_v4_request_vpe_irq(struct kvm_vcpu *vcpu, int irq)
 * @kvm: Pointer to the VM being initialized
 *
 * We may be called each time a vITS is created, or when the
 * vgic is initialized. This relies on kvm->lock to be
 * held. In both cases, the number of vcpus should now be
 * fixed.
 * vgic is initialized. In both cases, the number of vcpus
 * should now be fixed.
 */
int vgic_v4_init(struct kvm *kvm)
{
@@ -243,6 +242,8 @@ int vgic_v4_init(struct kvm *kvm)
	int nr_vcpus, ret;
	unsigned long i;

	lockdep_assert_held(&kvm->arch.config_lock);

	if (!kvm_vgic_global_state.has_gicv4)
		return 0; /* Nothing to see here... move along. */

@@ -309,14 +310,14 @@ int vgic_v4_init(struct kvm *kvm)
/**
 * vgic_v4_teardown - Free the GICv4 data structures
 * @kvm: Pointer to the VM being destroyed
 *
 * Relies on kvm->lock to be held.
 */
void vgic_v4_teardown(struct kvm *kvm)
{
	struct its_vm *its_vm = &kvm->arch.vgic.its_vm;
	int i;

	lockdep_assert_held(&kvm->arch.config_lock);

	if (!its_vm->vpes)
		return;
@@ -24,11 +24,13 @@ struct vgic_global kvm_vgic_global_state __ro_after_init = {
/*
* Locking order is always:
* kvm->lock (mutex)
* its->cmd_lock (mutex)
* its->its_lock (mutex)
* vgic_cpu->ap_list_lock must be taken with IRQs disabled
* kvm->lpi_list_lock must be taken with IRQs disabled
* vgic_irq->irq_lock must be taken with IRQs disabled
* vcpu->mutex (mutex)
* kvm->arch.config_lock (mutex)
* its->cmd_lock (mutex)
* its->its_lock (mutex)
* vgic_cpu->ap_list_lock must be taken with IRQs disabled
* kvm->lpi_list_lock must be taken with IRQs disabled
* vgic_irq->irq_lock must be taken with IRQs disabled
*
* As the ap_list_lock might be taken from the timer interrupt handler,
* we have to disable IRQs before taking this lock and everything lower
@@ -573,6 +575,21 @@ int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, unsigned int vintid)
return 0;
}

int kvm_vgic_get_map(struct kvm_vcpu *vcpu, unsigned int vintid)
{
struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, vintid);
unsigned long flags;
int ret = -1;

raw_spin_lock_irqsave(&irq->irq_lock, flags);
if (irq->hw)
ret = irq->hwintid;
raw_spin_unlock_irqrestore(&irq->irq_lock, flags);

vgic_put_irq(vcpu->kvm, irq);
return ret;
}

/**
* kvm_vgic_set_owner - Set the owner of an interrupt for a VM
*

@@ -273,9 +273,6 @@ int vgic_init(struct kvm *kvm);
void vgic_debug_init(struct kvm *kvm);
void vgic_debug_destroy(struct kvm *kvm);

bool lock_all_vcpus(struct kvm *kvm);
void unlock_all_vcpus(struct kvm *kvm);

static inline int vgic_v3_max_apr_idx(struct kvm_vcpu *vcpu)
{
struct vgic_cpu *cpu_if = &vcpu->arch.vgic_cpu;

@@ -23,6 +23,7 @@ HAS_DCPOP
HAS_DIT
HAS_E0PD
HAS_ECV
HAS_ECV_CNTPOFF
HAS_EPAN
HAS_GENERIC_AUTH
HAS_GENERIC_AUTH_ARCH_QARMA3

@@ -2115,6 +2115,10 @@ Sysreg CONTEXTIDR_EL2 3 4 13 0 1
Fields CONTEXTIDR_ELx
EndSysreg

Sysreg CNTPOFF_EL2 3 4 14 0 6
Field 63:0 PhysicalOffset
EndSysreg

Sysreg CPACR_EL12 3 5 1 0 2
Fields CPACR_ELx
EndSysreg

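For orientation (standard AArch64 system-register encoding, not stated in the patch): "Sysreg CNTPOFF_EL2 3 4 14 0 6" decodes as op0=3, op1=4, CRn=14, CRm=0, op2=6, i.e. S3_4_C14_C0_6 -- the EL2 physical counter offset gated by the new HAS_ECV_CNTPOFF capability and used for the VM-wide physical counter offset mentioned in the changelog:

        /* Equivalent spelling with the existing sys_reg() helper (sketch): */
        #define SYS_CNTPOFF_EL2 sys_reg(3, 4, 14, 0, 6)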
@@ -757,7 +757,7 @@ struct kvm_mips_callbacks {
int (*vcpu_run)(struct kvm_vcpu *vcpu);
void (*vcpu_reenter)(struct kvm_vcpu *vcpu);
};
extern struct kvm_mips_callbacks *kvm_mips_callbacks;
extern const struct kvm_mips_callbacks * const kvm_mips_callbacks;
int kvm_mips_emulation_init(void);

/* Debug: dump vcpu state */

@@ -993,9 +993,9 @@ void kvm_arch_flush_remote_tlbs_memslot(struct kvm *kvm,
kvm_flush_remote_tlbs(kvm);
}

long kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
{
long r;
int r;

switch (ioctl) {
default:

@@ -3305,7 +3305,7 @@ static struct kvm_mips_callbacks kvm_vz_callbacks = {
};

/* FIXME: Get rid of the callbacks now that trap-and-emulate is gone. */
struct kvm_mips_callbacks *kvm_mips_callbacks = &kvm_vz_callbacks;
const struct kvm_mips_callbacks * const kvm_mips_callbacks = &kvm_vz_callbacks;

int kvm_mips_emulation_init(void)
{

@@ -167,7 +167,7 @@ extern void kvmppc_map_magic(struct kvm_vcpu *vcpu);

extern int kvmppc_allocate_hpt(struct kvm_hpt_info *info, u32 order);
extern void kvmppc_set_hpt(struct kvm *kvm, struct kvm_hpt_info *info);
extern long kvmppc_alloc_reset_hpt(struct kvm *kvm, int order);
extern int kvmppc_alloc_reset_hpt(struct kvm *kvm, int order);
extern void kvmppc_free_hpt(struct kvm_hpt_info *info);
extern void kvmppc_rmap_reset(struct kvm *kvm);
extern void kvmppc_map_vrma(struct kvm_vcpu *vcpu,
@@ -181,7 +181,7 @@ extern int kvmppc_switch_mmu_to_hpt(struct kvm *kvm);
extern int kvmppc_switch_mmu_to_radix(struct kvm *kvm);
extern void kvmppc_setup_partition_table(struct kvm *kvm);

extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
extern int kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
struct kvm_create_spapr_tce_64 *args);
#define kvmppc_ioba_validate(stt, ioba, npages) \
(iommu_tce_check_ioba((stt)->page_shift, (stt)->offset, \
@@ -222,10 +222,10 @@ extern void kvmppc_bookehv_exit(void);
extern int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu);

extern int kvm_vm_ioctl_get_htab_fd(struct kvm *kvm, struct kvm_get_htab_fd *);
extern long kvm_vm_ioctl_resize_hpt_prepare(struct kvm *kvm,
struct kvm_ppc_resize_hpt *rhpt);
extern long kvm_vm_ioctl_resize_hpt_commit(struct kvm *kvm,
extern int kvm_vm_ioctl_resize_hpt_prepare(struct kvm *kvm,
struct kvm_ppc_resize_hpt *rhpt);
extern int kvm_vm_ioctl_resize_hpt_commit(struct kvm *kvm,
struct kvm_ppc_resize_hpt *rhpt);

int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, struct kvm_interrupt *irq);

@@ -297,8 +297,8 @@ struct kvmppc_ops {
int (*emulate_mtspr)(struct kvm_vcpu *vcpu, int sprn, ulong spr_val);
int (*emulate_mfspr)(struct kvm_vcpu *vcpu, int sprn, ulong *spr_val);
void (*fast_vcpu_kick)(struct kvm_vcpu *vcpu);
long (*arch_vm_ioctl)(struct file *filp, unsigned int ioctl,
unsigned long arg);
int (*arch_vm_ioctl)(struct file *filp, unsigned int ioctl,
unsigned long arg);
int (*hcall_implemented)(unsigned long hcall);
int (*irq_bypass_add_producer)(struct irq_bypass_consumer *,
struct irq_bypass_producer *);

@@ -124,9 +124,9 @@ void kvmppc_set_hpt(struct kvm *kvm, struct kvm_hpt_info *info)
info->virt, (long)info->order, kvm->arch.lpid);
}

long kvmppc_alloc_reset_hpt(struct kvm *kvm, int order)
int kvmppc_alloc_reset_hpt(struct kvm *kvm, int order)
{
long err = -EBUSY;
int err = -EBUSY;
struct kvm_hpt_info info;

mutex_lock(&kvm->arch.mmu_setup_lock);
@@ -1482,8 +1482,8 @@ static void resize_hpt_prepare_work(struct work_struct *work)
mutex_unlock(&kvm->arch.mmu_setup_lock);
}

long kvm_vm_ioctl_resize_hpt_prepare(struct kvm *kvm,
struct kvm_ppc_resize_hpt *rhpt)
int kvm_vm_ioctl_resize_hpt_prepare(struct kvm *kvm,
struct kvm_ppc_resize_hpt *rhpt)
{
unsigned long flags = rhpt->flags;
unsigned long shift = rhpt->shift;
@@ -1548,13 +1548,13 @@ static void resize_hpt_boot_vcpu(void *opaque)
/* Nothing to do, just force a KVM exit */
}

long kvm_vm_ioctl_resize_hpt_commit(struct kvm *kvm,
struct kvm_ppc_resize_hpt *rhpt)
int kvm_vm_ioctl_resize_hpt_commit(struct kvm *kvm,
struct kvm_ppc_resize_hpt *rhpt)
{
unsigned long flags = rhpt->flags;
unsigned long shift = rhpt->shift;
struct kvm_resize_hpt *resize;
long ret;
int ret;

if (flags != 0 || kvm_is_radix(kvm))
return -EINVAL;

@@ -288,8 +288,8 @@ static const struct file_operations kvm_spapr_tce_fops = {
.release = kvm_spapr_tce_release,
};

long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
struct kvm_create_spapr_tce_64 *args)
int kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
struct kvm_create_spapr_tce_64 *args)
{
struct kvmppc_spapr_tce_table *stt = NULL;
struct kvmppc_spapr_tce_table *siter;

@@ -5799,12 +5799,12 @@ static void kvmppc_irq_bypass_del_producer_hv(struct irq_bypass_consumer *cons,
}
#endif

static long kvm_arch_vm_ioctl_hv(struct file *filp,
unsigned int ioctl, unsigned long arg)
static int kvm_arch_vm_ioctl_hv(struct file *filp,
unsigned int ioctl, unsigned long arg)
{
struct kvm *kvm __maybe_unused = filp->private_data;
void __user *argp = (void __user *)arg;
long r;
int r;

switch (ioctl) {


@@ -2044,8 +2044,8 @@ static int kvmppc_core_check_processor_compat_pr(void)
return 0;
}

static long kvm_arch_vm_ioctl_pr(struct file *filp,
unsigned int ioctl, unsigned long arg)
static int kvm_arch_vm_ioctl_pr(struct file *filp,
unsigned int ioctl, unsigned long arg)
{
return -ENOTTY;
}

@@ -2379,12 +2379,11 @@ static int kvmppc_get_cpu_char(struct kvm_ppc_cpu_char *cp)
}
#endif

long kvm_arch_vm_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg)
int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
{
struct kvm *kvm __maybe_unused = filp->private_data;
void __user *argp = (void __user *)arg;
long r;
int r;

switch (ioctl) {
case KVM_PPC_GET_PVINFO: {

@@ -87,8 +87,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
return r;
}

long kvm_arch_vm_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg)
int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
{
return -EINVAL;
}

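The recurring long-to-int conversions above are one series: kvm_arch_vm_ioctl() and the helpers behind it only ever return 0 or a small negative errno, so the generic prototype was narrowed and each architecture followed. A minimal sketch of the resulting shape (illustrative, not lifted from any one file):

        int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
        {
                int r;  /* was "long r" before the conversion */

                switch (ioctl) {
                default:
                        r = -ENOTTY;    /* errnos always fit in an int */
                }
                return r;
        }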
@@ -305,7 +305,7 @@ static inline u8 gisa_get_ipm_or_restore_iam(struct kvm_s390_gisa_interrupt *gi)

static inline int gisa_in_alert_list(struct kvm_s390_gisa *gisa)
{
return READ_ONCE(gisa->next_alert) != (u32)(u64)gisa;
return READ_ONCE(gisa->next_alert) != (u32)virt_to_phys(gisa);
}

static inline void gisa_set_ipm_gisc(struct kvm_s390_gisa *gisa, u32 gisc)
@@ -3168,7 +3168,7 @@ void kvm_s390_gisa_init(struct kvm *kvm)
hrtimer_init(&gi->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
gi->timer.function = gisa_vcpu_kicker;
memset(gi->origin, 0, sizeof(struct kvm_s390_gisa));
gi->origin->next_alert = (u32)(u64)gi->origin;
gi->origin->next_alert = (u32)virt_to_phys(gi->origin);
VM_EVENT(kvm, 3, "gisa 0x%pK initialized", gi->origin);
}


@@ -1990,7 +1990,7 @@ static int kvm_s390_vm_has_attr(struct kvm *kvm, struct kvm_device_attr *attr)
return ret;
}

static long kvm_s390_get_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)
static int kvm_s390_get_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)
{
uint8_t *keys;
uint64_t hva;
@@ -2038,7 +2038,7 @@ static long kvm_s390_get_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)
return r;
}

static long kvm_s390_set_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)
static int kvm_s390_set_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)
{
uint8_t *keys;
uint64_t hva;
@@ -2899,8 +2899,7 @@ static int kvm_s390_vm_mem_op(struct kvm *kvm, struct kvm_s390_mem_op *mop)
}
}

long kvm_arch_vm_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg)
int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
{
struct kvm *kvm = filp->private_data;
void __user *argp = (void __user *)arg;

@@ -112,7 +112,7 @@ static int zpci_reset_aipb(u8 nisc)
return -EINVAL;

aift->sbv = zpci_aif_sbv;
aift->gait = (struct zpci_gaite *)zpci_aipb->aipb.gait;
aift->gait = phys_to_virt(zpci_aipb->aipb.gait);

return 0;
}

@@ -138,11 +138,15 @@ static int prepare_cpuflags(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
}
/* Copy to APCB FORMAT1 from APCB FORMAT0 */
static int setup_apcb10(struct kvm_vcpu *vcpu, struct kvm_s390_apcb1 *apcb_s,
unsigned long apcb_o, struct kvm_s390_apcb1 *apcb_h)
unsigned long crycb_gpa, struct kvm_s390_apcb1 *apcb_h)
{
struct kvm_s390_apcb0 tmp;
unsigned long apcb_gpa;

if (read_guest_real(vcpu, apcb_o, &tmp, sizeof(struct kvm_s390_apcb0)))
apcb_gpa = crycb_gpa + offsetof(struct kvm_s390_crypto_cb, apcb0);

if (read_guest_real(vcpu, apcb_gpa, &tmp,
sizeof(struct kvm_s390_apcb0)))
return -EFAULT;

apcb_s->apm[0] = apcb_h->apm[0] & tmp.apm[0];
@@ -157,15 +161,19 @@ static int setup_apcb10(struct kvm_vcpu *vcpu, struct kvm_s390_apcb1 *apcb_s,
* setup_apcb00 - Copy to APCB FORMAT0 from APCB FORMAT0
* @vcpu: pointer to the virtual CPU
* @apcb_s: pointer to start of apcb in the shadow crycb
* @apcb_o: pointer to start of original apcb in the guest2
* @crycb_gpa: guest physical address to start of original guest crycb
* @apcb_h: pointer to start of apcb in the guest1
*
* Returns 0 on success, -EFAULT on error reading guest apcb
*/
static int setup_apcb00(struct kvm_vcpu *vcpu, unsigned long *apcb_s,
unsigned long apcb_o, unsigned long *apcb_h)
unsigned long crycb_gpa, unsigned long *apcb_h)
{
if (read_guest_real(vcpu, apcb_o, apcb_s,
unsigned long apcb_gpa;

apcb_gpa = crycb_gpa + offsetof(struct kvm_s390_crypto_cb, apcb0);

if (read_guest_real(vcpu, apcb_gpa, apcb_s,
sizeof(struct kvm_s390_apcb0)))
return -EFAULT;

@@ -178,16 +186,20 @@ static int setup_apcb00(struct kvm_vcpu *vcpu, unsigned long *apcb_s,
* setup_apcb11 - Copy the FORMAT1 APCB from the guest to the shadow CRYCB
* @vcpu: pointer to the virtual CPU
* @apcb_s: pointer to start of apcb in the shadow crycb
* @apcb_o: pointer to start of original guest apcb
* @crycb_gpa: guest physical address to start of original guest crycb
* @apcb_h: pointer to start of apcb in the host
*
* Returns 0 on success, -EFAULT on error reading guest apcb
*/
static int setup_apcb11(struct kvm_vcpu *vcpu, unsigned long *apcb_s,
unsigned long apcb_o,
unsigned long crycb_gpa,
unsigned long *apcb_h)
{
if (read_guest_real(vcpu, apcb_o, apcb_s,
unsigned long apcb_gpa;

apcb_gpa = crycb_gpa + offsetof(struct kvm_s390_crypto_cb, apcb1);

if (read_guest_real(vcpu, apcb_gpa, apcb_s,
sizeof(struct kvm_s390_apcb1)))
return -EFAULT;

@@ -200,7 +212,7 @@ static int setup_apcb11(struct kvm_vcpu *vcpu, unsigned long *apcb_s,
* setup_apcb - Create a shadow copy of the apcb.
* @vcpu: pointer to the virtual CPU
* @crycb_s: pointer to shadow crycb
* @crycb_o: pointer to original guest crycb
* @crycb_gpa: guest physical address of original guest crycb
* @crycb_h: pointer to the host crycb
* @fmt_o: format of the original guest crycb.
* @fmt_h: format of the host crycb.
@@ -211,50 +223,46 @@ static int setup_apcb11(struct kvm_vcpu *vcpu, unsigned long *apcb_s,
* Return 0 or an error number if the guest and host crycb are incompatible.
*/
static int setup_apcb(struct kvm_vcpu *vcpu, struct kvm_s390_crypto_cb *crycb_s,
const u32 crycb_o,
const u32 crycb_gpa,
struct kvm_s390_crypto_cb *crycb_h,
int fmt_o, int fmt_h)
{
struct kvm_s390_crypto_cb *crycb;

crycb = (struct kvm_s390_crypto_cb *) (unsigned long)crycb_o;

switch (fmt_o) {
case CRYCB_FORMAT2:
if ((crycb_o & PAGE_MASK) != ((crycb_o + 256) & PAGE_MASK))
if ((crycb_gpa & PAGE_MASK) != ((crycb_gpa + 256) & PAGE_MASK))
return -EACCES;
if (fmt_h != CRYCB_FORMAT2)
return -EINVAL;
return setup_apcb11(vcpu, (unsigned long *)&crycb_s->apcb1,
(unsigned long) &crycb->apcb1,
crycb_gpa,
(unsigned long *)&crycb_h->apcb1);
case CRYCB_FORMAT1:
switch (fmt_h) {
case CRYCB_FORMAT2:
return setup_apcb10(vcpu, &crycb_s->apcb1,
(unsigned long) &crycb->apcb0,
crycb_gpa,
&crycb_h->apcb1);
case CRYCB_FORMAT1:
return setup_apcb00(vcpu,
(unsigned long *) &crycb_s->apcb0,
(unsigned long) &crycb->apcb0,
crycb_gpa,
(unsigned long *) &crycb_h->apcb0);
}
break;
case CRYCB_FORMAT0:
if ((crycb_o & PAGE_MASK) != ((crycb_o + 32) & PAGE_MASK))
if ((crycb_gpa & PAGE_MASK) != ((crycb_gpa + 32) & PAGE_MASK))
return -EACCES;

switch (fmt_h) {
case CRYCB_FORMAT2:
return setup_apcb10(vcpu, &crycb_s->apcb1,
(unsigned long) &crycb->apcb0,
crycb_gpa,
&crycb_h->apcb1);
case CRYCB_FORMAT1:
case CRYCB_FORMAT0:
return setup_apcb00(vcpu,
(unsigned long *) &crycb_s->apcb0,
(unsigned long) &crycb->apcb0,
crycb_gpa,
(unsigned long *) &crycb_h->apcb0);
}
}

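The thread running through the vsie.c hunks above: the old code cast a guest-supplied value to a host pointer (the removed "crycb = (struct kvm_s390_crypto_cb *) (unsigned long)crycb_o") and took addresses inside it, which only worked by accident of identity mapping. The parameters are renamed to crycb_gpa to make explicit that only a guest physical address is in hand, and each helper now derives the APCB's address arithmetically, along these lines (sketch of the pattern, mirroring the diff):

        unsigned long apcb_gpa = crycb_gpa +
                offsetof(struct kvm_s390_crypto_cb, apcb0);

        if (read_guest_real(vcpu, apcb_gpa, apcb_s, sizeof(struct kvm_s390_apcb0)))
                return -EFAULT; /* never dereference a guest address directly */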
@@ -226,10 +226,9 @@

/* Virtualization flags: Linux defined, word 8 */
#define X86_FEATURE_TPR_SHADOW ( 8*32+ 0) /* Intel TPR Shadow */
#define X86_FEATURE_VNMI ( 8*32+ 1) /* Intel Virtual NMI */
#define X86_FEATURE_FLEXPRIORITY ( 8*32+ 2) /* Intel FlexPriority */
#define X86_FEATURE_EPT ( 8*32+ 3) /* Intel Extended Page Table */
#define X86_FEATURE_VPID ( 8*32+ 4) /* Intel Virtual Processor ID */
#define X86_FEATURE_FLEXPRIORITY ( 8*32+ 1) /* Intel FlexPriority */
#define X86_FEATURE_EPT ( 8*32+ 2) /* Intel Extended Page Table */
#define X86_FEATURE_VPID ( 8*32+ 3) /* Intel Virtual Processor ID */

#define X86_FEATURE_VMMCALL ( 8*32+15) /* Prefer VMMCALL to VMCALL */
#define X86_FEATURE_XENPV ( 8*32+16) /* "" Xen paravirtual guest */
@@ -338,6 +337,7 @@
#define X86_FEATURE_VIRT_SSBD (13*32+25) /* Virtualized Speculative Store Bypass Disable */
#define X86_FEATURE_AMD_SSB_NO (13*32+26) /* "" Speculative Store Bypass is fixed in hardware. */
#define X86_FEATURE_CPPC (13*32+27) /* Collaborative Processor Performance Control */
#define X86_FEATURE_AMD_PSFD (13*32+28) /* "" Predictive Store Forwarding Disable */
#define X86_FEATURE_BTC_NO (13*32+29) /* "" Not vulnerable to Branch Type Confusion */
#define X86_FEATURE_BRS (13*32+31) /* Branch Sampling available */

@@ -370,6 +370,7 @@
#define X86_FEATURE_VGIF (15*32+16) /* Virtual GIF */
#define X86_FEATURE_X2AVIC (15*32+18) /* Virtual x2apic */
#define X86_FEATURE_V_SPEC_CTRL (15*32+20) /* Virtual SPEC_CTRL */
#define X86_FEATURE_VNMI (15*32+25) /* Virtual NMI */
#define X86_FEATURE_SVME_ADDR_CHK (15*32+28) /* "" SVME addr check */

/* Intel-defined CPU features, CPUID level 0x00000007:0 (ECX), word 16 */

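Context for the VNMI move (not spelled out in the patch): word 15 of cpufeatures.h is the AMD SVM feature leaf, CPUID 0x8000000A EDX, so the new X86_FEATURE_VNMI at (15*32+25) is the real hardware vNMI capability bit; the synthetic Intel "Virtual NMI" flag it displaces from word 8 was unused, which is why the remaining word-8 bits simply shift down by one.

        /* Worked check (standard cpufeature helper, hypothetical caller): */
        bool vnmi = boot_cpu_has(X86_FEATURE_VNMI); /* CPUID 0x8000000A:EDX bit 25 */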
@@ -54,8 +54,8 @@ KVM_X86_OP(set_rflags)
KVM_X86_OP(get_if_flag)
KVM_X86_OP(flush_tlb_all)
KVM_X86_OP(flush_tlb_current)
KVM_X86_OP_OPTIONAL(tlb_remote_flush)
KVM_X86_OP_OPTIONAL(tlb_remote_flush_with_range)
KVM_X86_OP_OPTIONAL(flush_remote_tlbs)
KVM_X86_OP_OPTIONAL(flush_remote_tlbs_range)
KVM_X86_OP(flush_tlb_gva)
KVM_X86_OP(flush_tlb_guest)
KVM_X86_OP(vcpu_pre_run)
@@ -68,6 +68,8 @@ KVM_X86_OP(get_interrupt_shadow)
KVM_X86_OP(patch_hypercall)
KVM_X86_OP(inject_irq)
KVM_X86_OP(inject_nmi)
KVM_X86_OP_OPTIONAL_RET0(is_vnmi_pending)
KVM_X86_OP_OPTIONAL_RET0(set_vnmi_pending)
KVM_X86_OP(inject_exception)
KVM_X86_OP(cancel_injection)
KVM_X86_OP(interrupt_allowed)

@@ -420,6 +420,10 @@ struct kvm_mmu_root_info {

#define KVM_MMU_NUM_PREV_ROOTS 3

#define KVM_MMU_ROOT_CURRENT BIT(0)
#define KVM_MMU_ROOT_PREVIOUS(i) BIT(1+i)
#define KVM_MMU_ROOTS_ALL (BIT(1 + KVM_MMU_NUM_PREV_ROOTS) - 1)

#define KVM_HAVE_MMU_RWLOCK

struct kvm_mmu_page;
@@ -439,9 +443,8 @@ struct kvm_mmu {
gpa_t (*gva_to_gpa)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
gpa_t gva_or_gpa, u64 access,
struct x86_exception *exception);
int (*sync_page)(struct kvm_vcpu *vcpu,
struct kvm_mmu_page *sp);
void (*invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa);
int (*sync_spte)(struct kvm_vcpu *vcpu,
struct kvm_mmu_page *sp, int i);
struct kvm_mmu_root_info root;
union kvm_cpu_role cpu_role;
union kvm_mmu_page_role root_role;
@@ -479,11 +482,6 @@ struct kvm_mmu {
u64 pdptrs[4]; /* pae */
};

struct kvm_tlb_range {
u64 start_gfn;
u64 pages;
};

enum pmc_type {
KVM_PMC_GP = 0,
KVM_PMC_FIXED,
@@ -515,6 +513,7 @@ struct kvm_pmc {
#define MSR_ARCH_PERFMON_FIXED_CTR_MAX (MSR_ARCH_PERFMON_FIXED_CTR0 + KVM_PMC_MAX_FIXED - 1)
#define KVM_AMD_PMC_MAX_GENERIC 6
struct kvm_pmu {
u8 version;
unsigned nr_arch_gp_counters;
unsigned nr_arch_fixed_counters;
unsigned available_event_types;
@@ -527,7 +526,6 @@ struct kvm_pmu {
u64 global_ovf_ctrl_mask;
u64 reserved_bits;
u64 raw_event_mask;
u8 version;
struct kvm_pmc gp_counters[KVM_INTEL_PMC_MAX_GENERIC];
struct kvm_pmc fixed_counters[KVM_PMC_MAX_FIXED];
struct irq_work irq_work;
@@ -876,7 +874,8 @@ struct kvm_vcpu_arch {
u64 tsc_scaling_ratio; /* current scaling ratio */

atomic_t nmi_queued; /* unprocessed asynchronous NMIs */
unsigned nmi_pending; /* NMI queued after currently running handler */
/* Number of NMIs pending injection, not including hardware vNMIs. */
unsigned int nmi_pending;
bool nmi_injected; /* Trying to inject an NMI this entry */
bool smi_pending; /* SMI queued after currently running handler */
u8 handling_intr_from_guest;
@@ -947,23 +946,6 @@ struct kvm_vcpu_arch {

u64 msr_kvm_poll_control;

/*
* Indicates the guest is trying to write a gfn that contains one or
* more of the PTEs used to translate the write itself, i.e. the access
* is changing its own translation in the guest page tables. KVM exits
* to userspace if emulation of the faulting instruction fails and this
* flag is set, as KVM cannot make forward progress.
*
* If emulation fails for a write to guest page tables, KVM unprotects
* (zaps) the shadow page for the target gfn and resumes the guest to
* retry the non-emulatable instruction (on hardware). Unprotecting the
* gfn doesn't allow forward progress for a self-changing access because
* doing so also zaps the translation for the gfn, i.e. retrying the
* instruction will hit a !PRESENT fault, which results in a new shadow
* page and sends KVM back to square one.
*/
bool write_fault_to_shadow_pgtable;

/* set at EPT violation at this point */
unsigned long exit_qualification;

@@ -1602,9 +1584,9 @@ struct kvm_x86_ops {

void (*flush_tlb_all)(struct kvm_vcpu *vcpu);
void (*flush_tlb_current)(struct kvm_vcpu *vcpu);
int (*tlb_remote_flush)(struct kvm *kvm);
int (*tlb_remote_flush_with_range)(struct kvm *kvm,
struct kvm_tlb_range *range);
int (*flush_remote_tlbs)(struct kvm *kvm);
int (*flush_remote_tlbs_range)(struct kvm *kvm, gfn_t gfn,
gfn_t nr_pages);

/*
* Flush any TLB entries associated with the given GVA.
@@ -1638,6 +1620,13 @@ struct kvm_x86_ops {
int (*nmi_allowed)(struct kvm_vcpu *vcpu, bool for_injection);
bool (*get_nmi_mask)(struct kvm_vcpu *vcpu);
void (*set_nmi_mask)(struct kvm_vcpu *vcpu, bool masked);
/* Whether or not a virtual NMI is pending in hardware. */
bool (*is_vnmi_pending)(struct kvm_vcpu *vcpu);
/*
* Attempt to pend a virtual NMI in hardware. Returns %true on success
* to allow using static_call_ret0 as the fallback.
*/
bool (*set_vnmi_pending)(struct kvm_vcpu *vcpu);
void (*enable_nmi_window)(struct kvm_vcpu *vcpu);
void (*enable_irq_window)(struct kvm_vcpu *vcpu);
void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr);
@@ -1808,8 +1797,8 @@ void kvm_arch_free_vm(struct kvm *kvm);
#define __KVM_HAVE_ARCH_FLUSH_REMOTE_TLB
static inline int kvm_arch_flush_remote_tlb(struct kvm *kvm)
{
if (kvm_x86_ops.tlb_remote_flush &&
!static_call(kvm_x86_tlb_remote_flush)(kvm))
if (kvm_x86_ops.flush_remote_tlbs &&
!static_call(kvm_x86_flush_remote_tlbs)(kvm))
return 0;
else
return -ENOTSUPP;
@@ -1907,6 +1896,25 @@ u64 vcpu_tsc_khz(struct kvm_vcpu *vcpu);
* EMULTYPE_COMPLETE_USER_EXIT - Set when the emulator should update interruptibility
* state and inject single-step #DBs after skipping
* an instruction (after completing userspace I/O).
*
* EMULTYPE_WRITE_PF_TO_SP - Set when emulating an intercepted page fault that
* is attempting to write a gfn that contains one or
* more of the PTEs used to translate the write itself,
* and the owning page table is being shadowed by KVM.
* If emulation of the faulting instruction fails and
* this flag is set, KVM will exit to userspace instead
* of retrying emulation as KVM cannot make forward
* progress.
*
* If emulation fails for a write to guest page tables,
* KVM unprotects (zaps) the shadow page for the target
* gfn and resumes the guest to retry the non-emulatable
* instruction (on hardware). Unprotecting the gfn
* doesn't allow forward progress for a self-changing
* access because doing so also zaps the translation for
* the gfn, i.e. retrying the instruction will hit a
* !PRESENT fault, which results in a new shadow page
* and sends KVM back to square one.
*/
#define EMULTYPE_NO_DECODE (1 << 0)
#define EMULTYPE_TRAP_UD (1 << 1)
@@ -1916,6 +1924,7 @@ u64 vcpu_tsc_khz(struct kvm_vcpu *vcpu);
#define EMULTYPE_VMWARE_GP (1 << 5)
#define EMULTYPE_PF (1 << 6)
#define EMULTYPE_COMPLETE_USER_EXIT (1 << 7)
#define EMULTYPE_WRITE_PF_TO_SP (1 << 8)

int kvm_emulate_instruction(struct kvm_vcpu *vcpu, int emulation_type);
int kvm_emulate_instruction_from_buffer(struct kvm_vcpu *vcpu,
@@ -1994,14 +2003,11 @@ static inline int __kvm_irq_line_state(unsigned long *irq_state,
return !!(*irq_state);
}

#define KVM_MMU_ROOT_CURRENT BIT(0)
#define KVM_MMU_ROOT_PREVIOUS(i) BIT(1+i)
#define KVM_MMU_ROOTS_ALL (~0UL)

int kvm_pic_set_irq(struct kvm_pic *pic, int irq, int irq_source_id, int level);
void kvm_pic_clear_all(struct kvm_pic *pic, int irq_source_id);

void kvm_inject_nmi(struct kvm_vcpu *vcpu);
int kvm_get_nr_pending_nmis(struct kvm_vcpu *vcpu);

void kvm_update_dr7(struct kvm_vcpu *vcpu);

@@ -2041,8 +2047,8 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu);
int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 error_code,
void *insn, int insn_len);
void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva);
void kvm_mmu_invalidate_gva(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
gva_t gva, hpa_t root_hpa);
void kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
u64 addr, unsigned long roots);
void kvm_mmu_invpcid_gva(struct kvm_vcpu *vcpu, gva_t gva, unsigned long pcid);
void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd);

@@ -2204,4 +2210,11 @@ int memslot_rmap_alloc(struct kvm_memory_slot *slot, unsigned long npages);
KVM_X86_QUIRK_FIX_HYPERCALL_INSN | \
KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS)

/*
* KVM previously used a u32 field in kvm_run to indicate the hypercall was
* initiated from long mode. KVM now sets bit 0 to indicate long mode, but the
* remaining 31 lower bits must be 0 to preserve ABI.
*/
#define KVM_EXIT_HYPERCALL_MBZ GENMASK_ULL(31, 1)

#endif /* _ASM_X86_KVM_HOST_H */

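A worked expansion of the new mask (standard GENMASK_ULL semantics, not spelled out above): KVM_EXIT_HYPERCALL_MBZ is GENMASK_ULL(31, 1) == 0xfffffffe, so of the old u32 only bit 0 -- KVM_EXIT_HYPERCALL_LONG_MODE, defined in the uapi hunk further below -- may ever be set:

        /* Sketch of a userspace-side sanity check (hypothetical VMM code,
         * assuming the kvm_run hypercall exit carries a "flags" field): */
        if (run->hypercall.flags & KVM_EXIT_HYPERCALL_MBZ)
                abort();        /* would violate the documented ABI */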
@@ -183,6 +183,12 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
#define V_GIF_SHIFT 9
#define V_GIF_MASK (1 << V_GIF_SHIFT)

#define V_NMI_PENDING_SHIFT 11
#define V_NMI_PENDING_MASK (1 << V_NMI_PENDING_SHIFT)

#define V_NMI_BLOCKING_SHIFT 12
#define V_NMI_BLOCKING_MASK (1 << V_NMI_BLOCKING_SHIFT)

#define V_INTR_PRIO_SHIFT 16
#define V_INTR_PRIO_MASK (0x0f << V_INTR_PRIO_SHIFT)

@@ -197,6 +203,9 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
#define V_GIF_ENABLE_SHIFT 25
#define V_GIF_ENABLE_MASK (1 << V_GIF_ENABLE_SHIFT)

#define V_NMI_ENABLE_SHIFT 26
#define V_NMI_ENABLE_MASK (1 << V_NMI_ENABLE_SHIFT)

#define AVIC_ENABLE_SHIFT 31
#define AVIC_ENABLE_MASK (1 << AVIC_ENABLE_SHIFT)

@@ -278,7 +287,6 @@ static_assert((AVIC_MAX_PHYSICAL_ID & AVIC_PHYSICAL_MAX_INDEX_MASK) == AVIC_MAX_
static_assert((X2AVIC_MAX_PHYSICAL_ID & AVIC_PHYSICAL_MAX_INDEX_MASK) == X2AVIC_MAX_PHYSICAL_ID);

#define AVIC_HPA_MASK ~((0xFFFULL << 52) | 0xFFF)
#define VMCB_AVIC_APIC_BAR_MASK 0xFFFFFFFFFF000ULL


struct vmcb_seg {

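An illustrative use of the new int_ctl bits (field names from the svm.h hunk above; the flow itself is a hypothetical sketch, not the patch's svm.c code):

        struct vmcb_control_area *ctl = &vmcb->control;         /* vmcb assumed */

        if (!(ctl->int_ctl & V_NMI_ENABLE_MASK))
                return false;   /* vNMI not enabled for this guest */
        if (ctl->int_ctl & V_NMI_PENDING_MASK)
                return false;   /* hardware already has an NMI queued */
        ctl->int_ctl |= V_NMI_PENDING_MASK;     /* hardware injects it once
                                                 * V_NMI_BLOCKING_MASK clears */
        return true;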
@@ -559,4 +559,7 @@ struct kvm_pmu_event_filter {
#define KVM_VCPU_TSC_CTRL 0 /* control group for the timestamp counter (TSC) */
#define KVM_VCPU_TSC_OFFSET 0 /* attribute for the TSC offset */

/* x86-specific KVM_EXIT_HYPERCALL flags. */
#define KVM_EXIT_HYPERCALL_LONG_MODE BIT(0)

#endif /* _ASM_X86_KVM_H */

@@ -60,12 +60,6 @@ u32 xstate_required_size(u64 xstate_bv, bool compacted)
return ret;
}

/*
* This one is tied to SSB in the user API, and not
* visible in /proc/cpuinfo.
*/
#define KVM_X86_FEATURE_AMD_PSFD (13*32+28) /* Predictive Store Forwarding Disable */

#define F feature_bit

/* Scattered Flag - For features that are scattered by cpufeatures.h. */
@@ -266,7 +260,7 @@ static void __kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu, struct kvm_cpuid_e
/* Update OSXSAVE bit */
if (boot_cpu_has(X86_FEATURE_XSAVE))
cpuid_entry_change(best, X86_FEATURE_OSXSAVE,
kvm_read_cr4_bits(vcpu, X86_CR4_OSXSAVE));
kvm_is_cr4_bit_set(vcpu, X86_CR4_OSXSAVE));

cpuid_entry_change(best, X86_FEATURE_APIC,
vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE);
@@ -275,7 +269,7 @@ static void __kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu, struct kvm_cpuid_e
best = cpuid_entry2_find(entries, nent, 7, 0);
if (best && boot_cpu_has(X86_FEATURE_PKU) && best->function == 0x7)
cpuid_entry_change(best, X86_FEATURE_OSPKE,
kvm_read_cr4_bits(vcpu, X86_CR4_PKE));
kvm_is_cr4_bit_set(vcpu, X86_CR4_PKE));

best = cpuid_entry2_find(entries, nent, 0xD, 0);
if (best)
@@ -420,7 +414,7 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
* KVM_SET_CPUID{,2} again. To support this legacy behavior, check
* whether the supplied CPUID data is equal to what's already set.
*/
if (vcpu->arch.last_vmentry_cpu != -1) {
if (kvm_vcpu_has_run(vcpu)) {
r = kvm_cpuid_check_equal(vcpu, e2, nent);
if (r)
return r;
@@ -653,7 +647,7 @@ void kvm_set_cpu_caps(void)
F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(INTEL_STIBP) |
F(MD_CLEAR) | F(AVX512_VP2INTERSECT) | F(FSRM) |
F(SERIALIZE) | F(TSXLDTRK) | F(AVX512_FP16) |
F(AMX_TILE) | F(AMX_INT8) | F(AMX_BF16)
F(AMX_TILE) | F(AMX_INT8) | F(AMX_BF16) | F(FLUSH_L1D)
);

/* TSC_ADJUST and ARCH_CAPABILITIES are emulated in software. */
@@ -715,7 +709,7 @@ void kvm_set_cpu_caps(void)
F(CLZERO) | F(XSAVEERPTR) |
F(WBNOINVD) | F(AMD_IBPB) | F(AMD_IBRS) | F(AMD_SSBD) | F(VIRT_SSBD) |
F(AMD_SSB_NO) | F(AMD_STIBP) | F(AMD_STIBP_ALWAYS_ON) |
__feature_bit(KVM_X86_FEATURE_AMD_PSFD)
F(AMD_PSFD)
);

/*
@@ -1002,7 +996,7 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
entry->eax = entry->ebx = entry->ecx = 0;
break;
case 0xd: {
u64 permitted_xcr0 = kvm_caps.supported_xcr0 & xstate_get_guest_group_perm();
u64 permitted_xcr0 = kvm_get_filtered_xcr0();
u64 permitted_xss = kvm_caps.supported_xss;

entry->eax &= permitted_xcr0;

@@ -1640,6 +1640,14 @@ static int __load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
goto exception;
break;
case VCPU_SREG_CS:
/*
* KVM uses "none" when loading CS as part of emulating Real
* Mode exceptions and IRET (handled above). In all other
* cases, loading CS without a control transfer is a KVM bug.
*/
if (WARN_ON_ONCE(transfer == X86_TRANSFER_NONE))
goto exception;

if (!(seg_desc.type & 8))
goto exception;


@@ -4,7 +4,7 @@

#include <linux/kvm_host.h>

#define KVM_POSSIBLE_CR0_GUEST_BITS X86_CR0_TS
#define KVM_POSSIBLE_CR0_GUEST_BITS (X86_CR0_TS | X86_CR0_WP)
#define KVM_POSSIBLE_CR4_GUEST_BITS \
(X86_CR4_PVI | X86_CR4_DE | X86_CR4_PCE | X86_CR4_OSFXSR \
| X86_CR4_OSXMMEXCPT | X86_CR4_PGE | X86_CR4_TSD | X86_CR4_FSGSBASE)
@@ -157,6 +157,14 @@ static inline ulong kvm_read_cr0_bits(struct kvm_vcpu *vcpu, ulong mask)
return vcpu->arch.cr0 & mask;
}

static __always_inline bool kvm_is_cr0_bit_set(struct kvm_vcpu *vcpu,
unsigned long cr0_bit)
{
BUILD_BUG_ON(!is_power_of_2(cr0_bit));

return !!kvm_read_cr0_bits(vcpu, cr0_bit);
}

static inline ulong kvm_read_cr0(struct kvm_vcpu *vcpu)
{
return kvm_read_cr0_bits(vcpu, ~0UL);
@@ -171,6 +179,14 @@ static inline ulong kvm_read_cr4_bits(struct kvm_vcpu *vcpu, ulong mask)
return vcpu->arch.cr4 & mask;
}

static __always_inline bool kvm_is_cr4_bit_set(struct kvm_vcpu *vcpu,
unsigned long cr4_bit)
{
BUILD_BUG_ON(!is_power_of_2(cr4_bit));

return !!kvm_read_cr4_bits(vcpu, cr4_bit);
}

static inline ulong kvm_read_cr3(struct kvm_vcpu *vcpu)
{
if (!kvm_register_is_available(vcpu, VCPU_EXREG_CR3))

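Usage sketch for the single-bit helpers above (the caller is hypothetical). The BUILD_BUG_ON rejects multi-bit masks at compile time, and the bool return removes the old footgun of interpreting kvm_read_cr4_bits()'s unsigned long result as a bool:

        if (kvm_is_cr4_bit_set(vcpu, X86_CR4_PKE))      /* single bit: OK */
                setup_pkeys(vcpu);                      /* setup_pkeys() is made up */

        /* kvm_is_cr4_bit_set(vcpu, X86_CR4_PKE | X86_CR4_PCIDE) would fail
         * to build: the mask is not a power of two. */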
@@ -10,17 +10,22 @@
#include "hyperv.h"
#include "kvm_onhyperv.h"

struct kvm_hv_tlb_range {
u64 start_gfn;
u64 pages;
};

static int kvm_fill_hv_flush_list_func(struct hv_guest_mapping_flush_list *flush,
void *data)
{
struct kvm_tlb_range *range = data;
struct kvm_hv_tlb_range *range = data;

return hyperv_fill_flush_guest_mapping_list(flush, range->start_gfn,
range->pages);
}

static inline int hv_remote_flush_root_tdp(hpa_t root_tdp,
struct kvm_tlb_range *range)
struct kvm_hv_tlb_range *range)
{
if (range)
return hyperv_flush_guest_mapping_range(root_tdp,
@@ -29,8 +34,8 @@ static inline int hv_remote_flush_root_tdp(hpa_t root_tdp,
return hyperv_flush_guest_mapping(root_tdp);
}

int hv_remote_flush_tlb_with_range(struct kvm *kvm,
struct kvm_tlb_range *range)
static int __hv_flush_remote_tlbs_range(struct kvm *kvm,
struct kvm_hv_tlb_range *range)
{
struct kvm_arch *kvm_arch = &kvm->arch;
struct kvm_vcpu *vcpu;
@@ -86,19 +91,29 @@ int hv_remote_flush_tlb_with_range(struct kvm *kvm,
spin_unlock(&kvm_arch->hv_root_tdp_lock);
return ret;
}
EXPORT_SYMBOL_GPL(hv_remote_flush_tlb_with_range);

int hv_remote_flush_tlb(struct kvm *kvm)
int hv_flush_remote_tlbs_range(struct kvm *kvm, gfn_t start_gfn, gfn_t nr_pages)
{
return hv_remote_flush_tlb_with_range(kvm, NULL);
struct kvm_hv_tlb_range range = {
.start_gfn = start_gfn,
.pages = nr_pages,
};

return __hv_flush_remote_tlbs_range(kvm, &range);
}
EXPORT_SYMBOL_GPL(hv_remote_flush_tlb);
EXPORT_SYMBOL_GPL(hv_flush_remote_tlbs_range);

int hv_flush_remote_tlbs(struct kvm *kvm)
{
return __hv_flush_remote_tlbs_range(kvm, NULL);
}
EXPORT_SYMBOL_GPL(hv_flush_remote_tlbs);

void hv_track_root_tdp(struct kvm_vcpu *vcpu, hpa_t root_tdp)
{
struct kvm_arch *kvm_arch = &vcpu->kvm->arch;

if (kvm_x86_ops.tlb_remote_flush == hv_remote_flush_tlb) {
if (kvm_x86_ops.flush_remote_tlbs == hv_flush_remote_tlbs) {
spin_lock(&kvm_arch->hv_root_tdp_lock);
vcpu->arch.hv_root_tdp = root_tdp;
if (root_tdp != kvm_arch->hv_root_tdp)

@@ -7,12 +7,11 @@
#define __ARCH_X86_KVM_KVM_ONHYPERV_H__

#if IS_ENABLED(CONFIG_HYPERV)
int hv_remote_flush_tlb_with_range(struct kvm *kvm,
struct kvm_tlb_range *range);
int hv_remote_flush_tlb(struct kvm *kvm);
int hv_flush_remote_tlbs_range(struct kvm *kvm, gfn_t gfn, gfn_t nr_pages);
int hv_flush_remote_tlbs(struct kvm *kvm);
void hv_track_root_tdp(struct kvm_vcpu *vcpu, hpa_t root_tdp);
#else /* !CONFIG_HYPERV */
static inline int hv_remote_flush_tlb(struct kvm *kvm)
static inline int hv_flush_remote_tlbs(struct kvm *kvm)
{
return -EOPNOTSUPP;
}

@@ -113,6 +113,8 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
bool kvm_can_do_async_pf(struct kvm_vcpu *vcpu);
int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
u64 fault_address, char *insn, int insn_len);
void __kvm_mmu_refresh_passthrough_bits(struct kvm_vcpu *vcpu,
struct kvm_mmu *mmu);

int kvm_mmu_load(struct kvm_vcpu *vcpu);
void kvm_mmu_unload(struct kvm_vcpu *vcpu);
@@ -132,7 +134,7 @@ static inline unsigned long kvm_get_pcid(struct kvm_vcpu *vcpu, gpa_t cr3)
{
BUILD_BUG_ON((X86_CR3_PCID_MASK & PAGE_MASK) != 0);

return kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE)
return kvm_is_cr4_bit_set(vcpu, X86_CR4_PCIDE)
? cr3 & X86_CR3_PCID_MASK
: 0;
}
@@ -153,6 +155,24 @@ static inline void kvm_mmu_load_pgd(struct kvm_vcpu *vcpu)
vcpu->arch.mmu->root_role.level);
}

static inline void kvm_mmu_refresh_passthrough_bits(struct kvm_vcpu *vcpu,
struct kvm_mmu *mmu)
{
/*
* When EPT is enabled, KVM may passthrough CR0.WP to the guest, i.e.
* @mmu's snapshot of CR0.WP and thus all related paging metadata may
* be stale. Refresh CR0.WP and the metadata on-demand when checking
* for permission faults. Exempt nested MMUs, i.e. MMUs for shadowing
* nEPT and nNPT, as CR0.WP is ignored in both cases. Note, KVM does
* need to refresh nested_mmu, a.k.a. the walker used to translate L2
* GVAs to GPAs, as that "MMU" needs to honor L2's CR0.WP.
*/
if (!tdp_enabled || mmu == &vcpu->arch.guest_mmu)
return;

__kvm_mmu_refresh_passthrough_bits(vcpu, mmu);
}

/*
* Check if a given access (described through the I/D, W/R and U/S bits of a
* page fault error code pfec) causes a permission fault with the given PTE
@@ -184,8 +204,12 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
u64 implicit_access = access & PFERR_IMPLICIT_ACCESS;
bool not_smap = ((rflags & X86_EFLAGS_AC) | implicit_access) == X86_EFLAGS_AC;
int index = (pfec + (not_smap << PFERR_RSVD_BIT)) >> 1;
bool fault = (mmu->permissions[index] >> pte_access) & 1;
u32 errcode = PFERR_PRESENT_MASK;
bool fault;

kvm_mmu_refresh_passthrough_bits(vcpu, mmu);

fault = (mmu->permissions[index] >> pte_access) & 1;

WARN_ON(pfec & (PFERR_PK_MASK | PFERR_RSVD_MASK));
if (unlikely(mmu->pkru_mask)) {

@@ -125,17 +125,31 @@ module_param(dbg, bool, 0644);
#define PTE_LIST_EXT 14

/*
* Slight optimization of cacheline layout, by putting `more' and `spte_count'
* at the start; then accessing it will only use one single cacheline for
* either full (entries==PTE_LIST_EXT) case or entries<=6.
* struct pte_list_desc is the core data structure used to implement a custom
* list for tracking a set of related SPTEs, e.g. all the SPTEs that map a
* given GFN when used in the context of rmaps. Using a custom list allows KVM
* to optimize for the common case where many GFNs will have at most a handful
* of SPTEs pointing at them, i.e. allows packing multiple SPTEs into a small
* memory footprint, which in turn improves runtime performance by exploiting
* cache locality.
*
* A list is comprised of one or more pte_list_desc objects (descriptors).
* Each individual descriptor stores up to PTE_LIST_EXT SPTEs. If a descriptor
* is full and a new SPTE needs to be added, a new descriptor is allocated and
* becomes the head of the list. This means that by definition, all tail
* descriptors are full.
*
* Note, the metadata fields are deliberately placed at the start of the
* structure to optimize the cacheline layout; accessing the descriptor will
* touch only a single cacheline so long as @spte_count<=6 (or if only the
* descriptor's metadata is accessed).
*/
struct pte_list_desc {
struct pte_list_desc *more;
/*
* Stores number of entries stored in the pte_list_desc. No need to be
* u64 but just for easier alignment. When PTE_LIST_EXT, means full.
*/
u64 spte_count;
/* The number of PTEs stored in _this_ descriptor. */
u32 spte_count;
/* The number of PTEs stored in all tails of this descriptor. */
u32 tail_count;
u64 *sptes[PTE_LIST_EXT];
};
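A worked example of the new bookkeeping (numbers invented for illustration): with PTE_LIST_EXT == 14, a three-descriptor list might have a head holding 5 SPTEs and two always-full tails of 14 each; the head's tail_count is then 28, so the total is available in O(1):

        count = head->tail_count + head->spte_count;    /* 28 + 5 == 33 */

which is exactly what the reworked pte_list_count() below computes instead of walking every descriptor.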

@@ -242,34 +256,37 @@ static struct kvm_mmu_role_regs vcpu_to_role_regs(struct kvm_vcpu *vcpu)
return regs;
}

static inline bool kvm_available_flush_tlb_with_range(void)
static unsigned long get_guest_cr3(struct kvm_vcpu *vcpu)
{
return kvm_x86_ops.tlb_remote_flush_with_range;
return kvm_read_cr3(vcpu);
}

static void kvm_flush_remote_tlbs_with_range(struct kvm *kvm,
struct kvm_tlb_range *range)
static inline unsigned long kvm_mmu_get_guest_pgd(struct kvm_vcpu *vcpu,
struct kvm_mmu *mmu)
{
int ret = -ENOTSUPP;
if (IS_ENABLED(CONFIG_RETPOLINE) && mmu->get_guest_pgd == get_guest_cr3)
return kvm_read_cr3(vcpu);

if (range && kvm_x86_ops.tlb_remote_flush_with_range)
ret = static_call(kvm_x86_tlb_remote_flush_with_range)(kvm, range);
return mmu->get_guest_pgd(vcpu);
}

static inline bool kvm_available_flush_remote_tlbs_range(void)
{
return kvm_x86_ops.flush_remote_tlbs_range;
}

void kvm_flush_remote_tlbs_range(struct kvm *kvm, gfn_t start_gfn,
gfn_t nr_pages)
{
int ret = -EOPNOTSUPP;

if (kvm_x86_ops.flush_remote_tlbs_range)
ret = static_call(kvm_x86_flush_remote_tlbs_range)(kvm, start_gfn,
nr_pages);
if (ret)
kvm_flush_remote_tlbs(kvm);
}

void kvm_flush_remote_tlbs_with_address(struct kvm *kvm,
u64 start_gfn, u64 pages)
{
struct kvm_tlb_range range;

range.start_gfn = start_gfn;
range.pages = pages;

kvm_flush_remote_tlbs_with_range(kvm, &range);
}

static gfn_t kvm_mmu_page_get_gfn(struct kvm_mmu_page *sp, int index);

/* Flush the range of guest memory mapped by the given SPTE. */
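Two design points from the hunk above, restated: kvm_mmu_get_guest_pgd() compares against the common get_guest_cr3 hook so retpoline builds can skip the indirect call, and kvm_flush_remote_tlbs_range() now owns the fallback to a full flush, so callers no longer need an availability check (sketch of a hypothetical caller):

        kvm_flush_remote_tlbs_range(kvm, gfn, nr_pages);
        /* falls back to kvm_flush_remote_tlbs(kvm) internally when the
         * vendor hook is absent or returns an error */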
@@ -888,9 +905,9 @@ static void unaccount_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp)
untrack_possible_nx_huge_page(kvm, sp);
}

static struct kvm_memory_slot *
gfn_to_memslot_dirty_bitmap(struct kvm_vcpu *vcpu, gfn_t gfn,
bool no_dirty_log)
static struct kvm_memory_slot *gfn_to_memslot_dirty_bitmap(struct kvm_vcpu *vcpu,
gfn_t gfn,
bool no_dirty_log)
{
struct kvm_memory_slot *slot;

@@ -929,53 +946,69 @@ static int pte_list_add(struct kvm_mmu_memory_cache *cache, u64 *spte,
desc->sptes[0] = (u64 *)rmap_head->val;
desc->sptes[1] = spte;
desc->spte_count = 2;
desc->tail_count = 0;
rmap_head->val = (unsigned long)desc | 1;
++count;
} else {
rmap_printk("%p %llx many->many\n", spte, *spte);
desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
while (desc->spte_count == PTE_LIST_EXT) {
count += PTE_LIST_EXT;
if (!desc->more) {
desc->more = kvm_mmu_memory_cache_alloc(cache);
desc = desc->more;
desc->spte_count = 0;
break;
}
desc = desc->more;
count = desc->tail_count + desc->spte_count;

/*
* If the previous head is full, allocate a new head descriptor
* as tail descriptors are always kept full.
*/
if (desc->spte_count == PTE_LIST_EXT) {
desc = kvm_mmu_memory_cache_alloc(cache);
desc->more = (struct pte_list_desc *)(rmap_head->val & ~1ul);
desc->spte_count = 0;
desc->tail_count = count;
rmap_head->val = (unsigned long)desc | 1;
}
count += desc->spte_count;
desc->sptes[desc->spte_count++] = spte;
}
return count;
}

static void
pte_list_desc_remove_entry(struct kvm_rmap_head *rmap_head,
struct pte_list_desc *desc, int i,
struct pte_list_desc *prev_desc)
static void pte_list_desc_remove_entry(struct kvm_rmap_head *rmap_head,
struct pte_list_desc *desc, int i)
{
int j = desc->spte_count - 1;
struct pte_list_desc *head_desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
int j = head_desc->spte_count - 1;

desc->sptes[i] = desc->sptes[j];
desc->sptes[j] = NULL;
desc->spte_count--;
if (desc->spte_count)
/*
* The head descriptor should never be empty. A new head is added only
* when adding an entry and the previous head is full, and heads are
* removed (this flow) when they become empty.
*/
BUG_ON(j < 0);

/*
* Replace the to-be-freed SPTE with the last valid entry from the head
* descriptor to ensure that tail descriptors are full at all times.
* Note, this also means that tail_count is stable for each descriptor.
*/
desc->sptes[i] = head_desc->sptes[j];
head_desc->sptes[j] = NULL;
head_desc->spte_count--;
if (head_desc->spte_count)
return;
if (!prev_desc && !desc->more)

/*
* The head descriptor is empty. If there are no tail descriptors,
* nullify the rmap head to mark the list as empty, else point the rmap
* head at the next descriptor, i.e. the new head.
*/
if (!head_desc->more)
rmap_head->val = 0;
else
if (prev_desc)
prev_desc->more = desc->more;
else
rmap_head->val = (unsigned long)desc->more | 1;
mmu_free_pte_list_desc(desc);
rmap_head->val = (unsigned long)head_desc->more | 1;
mmu_free_pte_list_desc(head_desc);
}

static void pte_list_remove(u64 *spte, struct kvm_rmap_head *rmap_head)
{
struct pte_list_desc *desc;
struct pte_list_desc *prev_desc;
int i;

if (!rmap_head->val) {
@@ -991,16 +1024,13 @@ static void pte_list_remove(u64 *spte, struct kvm_rmap_head *rmap_head)
} else {
rmap_printk("%p many->many\n", spte);
desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
prev_desc = NULL;
while (desc) {
for (i = 0; i < desc->spte_count; ++i) {
if (desc->sptes[i] == spte) {
pte_list_desc_remove_entry(rmap_head,
desc, i, prev_desc);
pte_list_desc_remove_entry(rmap_head, desc, i);
return;
}
}
prev_desc = desc;
desc = desc->more;
}
pr_err("%s: %p many->many\n", __func__, spte);
@@ -1047,7 +1077,6 @@ out:
unsigned int pte_list_count(struct kvm_rmap_head *rmap_head)
{
struct pte_list_desc *desc;
unsigned int count = 0;

if (!rmap_head->val)
return 0;
@@ -1055,13 +1084,7 @@ unsigned int pte_list_count(struct kvm_rmap_head *rmap_head)
return 1;

desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);

while (desc) {
count += desc->spte_count;
desc = desc->more;
}

return count;
return desc->tail_count + desc->spte_count;
}

static struct kvm_rmap_head *gfn_to_rmap(gfn_t gfn, int level,
@@ -1073,14 +1096,6 @@ static struct kvm_rmap_head *gfn_to_rmap(gfn_t gfn, int level,
return &slot->arch.rmap[level - PG_LEVEL_4K][idx];
}

static bool rmap_can_add(struct kvm_vcpu *vcpu)
{
struct kvm_mmu_memory_cache *mc;

mc = &vcpu->arch.mmu_pte_list_desc_cache;
return kvm_mmu_memory_cache_nr_free_objects(mc);
}

static void rmap_remove(struct kvm *kvm, u64 *spte)
{
struct kvm_memslots *slots;
@@ -1479,7 +1494,7 @@ restart:
}
}

if (need_flush && kvm_available_flush_tlb_with_range()) {
if (need_flush && kvm_available_flush_remote_tlbs_range()) {
kvm_flush_remote_tlbs_gfn(kvm, gfn, level);
return false;
}
@@ -1504,8 +1519,8 @@ struct slot_rmap_walk_iterator {
struct kvm_rmap_head *end_rmap;
};

static void
rmap_walk_init_level(struct slot_rmap_walk_iterator *iterator, int level)
static void rmap_walk_init_level(struct slot_rmap_walk_iterator *iterator,
int level)
{
iterator->level = level;
iterator->gfn = iterator->start_gfn;
@@ -1513,10 +1528,10 @@ rmap_walk_init_level(struct slot_rmap_walk_iterator *iterator, int level)
iterator->end_rmap = gfn_to_rmap(iterator->end_gfn, level, iterator->slot);
}

static void
slot_rmap_walk_init(struct slot_rmap_walk_iterator *iterator,
const struct kvm_memory_slot *slot, int start_level,
int end_level, gfn_t start_gfn, gfn_t end_gfn)
static void slot_rmap_walk_init(struct slot_rmap_walk_iterator *iterator,
const struct kvm_memory_slot *slot,
int start_level, int end_level,
gfn_t start_gfn, gfn_t end_gfn)
{
iterator->slot = slot;
iterator->start_level = start_level;
@@ -1789,12 +1804,6 @@ static void mark_unsync(u64 *spte)
kvm_mmu_mark_parents_unsync(sp);
}

static int nonpaging_sync_page(struct kvm_vcpu *vcpu,
struct kvm_mmu_page *sp)
{
return -1;
}

#define KVM_PAGE_ARRAY_NR 16

struct kvm_mmu_pages {
@@ -1914,10 +1923,79 @@ static bool sp_has_gptes(struct kvm_mmu_page *sp)
&(_kvm)->arch.mmu_page_hash[kvm_page_table_hashfn(_gfn)]) \
if ((_sp)->gfn != (_gfn) || !sp_has_gptes(_sp)) {} else

static bool kvm_sync_page_check(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
{
union kvm_mmu_page_role root_role = vcpu->arch.mmu->root_role;

/*
* Ignore various flags when verifying that it's safe to sync a shadow
* page using the current MMU context.
*
* - level: not part of the overall MMU role and will never match as the MMU's
* level tracks the root level
* - access: updated based on the new guest PTE
* - quadrant: not part of the overall MMU role (similar to level)
*/
const union kvm_mmu_page_role sync_role_ign = {
.level = 0xf,
.access = 0x7,
.quadrant = 0x3,
.passthrough = 0x1,
};

/*
* Direct pages can never be unsync, and KVM should never attempt to
* sync a shadow page for a different MMU context, e.g. if the role
* differs then the memslot lookup (SMM vs. non-SMM) will be bogus, the
* reserved bits checks will be wrong, etc...
*/
if (WARN_ON_ONCE(sp->role.direct || !vcpu->arch.mmu->sync_spte ||
(sp->role.word ^ root_role.word) & ~sync_role_ign.word))
return false;

return true;
}

static int kvm_sync_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, int i)
{
if (!sp->spt[i])
return 0;

return vcpu->arch.mmu->sync_spte(vcpu, sp, i);
}

static int __kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
{
int flush = 0;
int i;

if (!kvm_sync_page_check(vcpu, sp))
return -1;

for (i = 0; i < SPTE_ENT_PER_PAGE; i++) {
int ret = kvm_sync_spte(vcpu, sp, i);

if (ret < -1)
return -1;
flush |= ret;
}

/*
* Note, any flush is purely for KVM's correctness, e.g. when dropping
* an existing SPTE or clearing W/A/D bits to ensure an mmu_notifier
* unmap or dirty logging event doesn't fail to flush. The guest is
* responsible for flushing the TLB to ensure any changes in protection
* bits are recognized, i.e. until the guest flushes or page faults on
* a relevant address, KVM is architecturally allowed to let vCPUs use
* cached translations with the old protection bits.
*/
return flush;
}

static int kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
struct list_head *invalid_list)
{
int ret = vcpu->arch.mmu->sync_page(vcpu, sp);
int ret = __kvm_sync_page(vcpu, sp);

if (ret < 0)
kvm_mmu_prepare_zap_page(vcpu->kvm, sp, invalid_list);
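The return convention of the new per-SPTE hook, as the loop above implies (a reading of the code, not patch commentary):

        ret = mmu->sync_spte(vcpu, sp, i);  /* negative: zap the whole page,
                                             * 0: already in sync,
                                             * positive: changed, flush needed */

__kvm_sync_page() ORs the per-SPTE results into "flush" and propagates the negative case upward so that kvm_sync_page() can kvm_mmu_prepare_zap_page() the whole shadow page.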
@@ -3304,9 +3382,9 @@ static bool page_fault_can_be_fast(struct kvm_page_fault *fault)
* Returns true if the SPTE was fixed successfully. Otherwise,
* someone else modified the SPTE from its original value.
*/
static bool
fast_pf_fix_direct_spte(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
u64 *sptep, u64 old_spte, u64 new_spte)
static bool fast_pf_fix_direct_spte(struct kvm_vcpu *vcpu,
struct kvm_page_fault *fault,
u64 *sptep, u64 old_spte, u64 new_spte)
{
/*
* Theoretically we could also set dirty bit (and flush TLB) here in
@@ -3513,6 +3591,8 @@ void kvm_mmu_free_roots(struct kvm *kvm, struct kvm_mmu *mmu,
LIST_HEAD(invalid_list);
bool free_active_root;

WARN_ON_ONCE(roots_to_free & ~KVM_MMU_ROOTS_ALL);

BUILD_BUG_ON(KVM_MMU_NUM_PREV_ROOTS >= BITS_PER_LONG);

/* Before acquiring the MMU lock, see if we need to do any real work. */
@@ -3731,7 +3811,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
int quadrant, i, r;
hpa_t root;

root_pgd = mmu->get_guest_pgd(vcpu);
root_pgd = kvm_mmu_get_guest_pgd(vcpu, mmu);
root_gfn = root_pgd >> PAGE_SHIFT;

if (mmu_check_root(vcpu, root_gfn))
@@ -4181,7 +4261,7 @@ static bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
arch.token = alloc_apf_token(vcpu);
arch.gfn = gfn;
arch.direct_map = vcpu->arch.mmu->root_role.direct;
arch.cr3 = vcpu->arch.mmu->get_guest_pgd(vcpu);
arch.cr3 = kvm_mmu_get_guest_pgd(vcpu, vcpu->arch.mmu);

return kvm_setup_async_pf(vcpu, cr2_or_gpa,
kvm_vcpu_gfn_to_hva(vcpu, gfn), &arch);
@@ -4200,10 +4280,10 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
return;

if (!vcpu->arch.mmu->root_role.direct &&
work->arch.cr3 != vcpu->arch.mmu->get_guest_pgd(vcpu))
work->arch.cr3 != kvm_mmu_get_guest_pgd(vcpu, vcpu->arch.mmu))
return;

kvm_mmu_do_page_fault(vcpu, work->cr2_or_gpa, 0, true);
kvm_mmu_do_page_fault(vcpu, work->cr2_or_gpa, 0, true, NULL);
}

static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
@@ -4469,8 +4549,7 @@ static void nonpaging_init_context(struct kvm_mmu *context)
{
context->page_fault = nonpaging_page_fault;
context->gva_to_gpa = nonpaging_gva_to_gpa;
context->sync_page = nonpaging_sync_page;
context->invlpg = NULL;
context->sync_spte = NULL;
}

static inline bool is_root_usable(struct kvm_mmu_root_info *root, gpa_t pgd,
@@ -4604,11 +4683,6 @@ void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd)
}
EXPORT_SYMBOL_GPL(kvm_mmu_new_pgd);

static unsigned long get_cr3(struct kvm_vcpu *vcpu)
{
return kvm_read_cr3(vcpu);
}

static bool sync_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, gfn_t gfn,
unsigned int access)
{
@@ -4638,10 +4712,9 @@ static bool sync_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, gfn_t gfn,
#include "paging_tmpl.h"
#undef PTTYPE

static void
__reset_rsvds_bits_mask(struct rsvd_bits_validate *rsvd_check,
u64 pa_bits_rsvd, int level, bool nx, bool gbpages,
bool pse, bool amd)
static void __reset_rsvds_bits_mask(struct rsvd_bits_validate *rsvd_check,
u64 pa_bits_rsvd, int level, bool nx,
bool gbpages, bool pse, bool amd)
{
u64 gbpages_bit_rsvd = 0;
u64 nonleaf_bit8_rsvd = 0;
@@ -4754,9 +4827,9 @@ static void reset_guest_rsvds_bits_mask(struct kvm_vcpu *vcpu,
guest_cpuid_is_amd_or_hygon(vcpu));
}

static void
__reset_rsvds_bits_mask_ept(struct rsvd_bits_validate *rsvd_check,
u64 pa_bits_rsvd, bool execonly, int huge_page_level)
static void __reset_rsvds_bits_mask_ept(struct rsvd_bits_validate *rsvd_check,
|
||||
u64 pa_bits_rsvd, bool execonly,
|
||||
int huge_page_level)
|
||||
{
|
||||
u64 high_bits_rsvd = pa_bits_rsvd & rsvd_bits(0, 51);
|
||||
u64 large_1g_rsvd = 0, large_2m_rsvd = 0;
|
||||
@ -4856,8 +4929,7 @@ static inline bool boot_cpu_is_amd(void)
|
||||
* the direct page table on host, use as much mmu features as
|
||||
* possible, however, kvm currently does not do execution-protection.
|
||||
*/
|
||||
static void
|
||||
reset_tdp_shadow_zero_bits_mask(struct kvm_mmu *context)
|
||||
static void reset_tdp_shadow_zero_bits_mask(struct kvm_mmu *context)
|
||||
{
|
||||
struct rsvd_bits_validate *shadow_zero_check;
|
||||
int i;
|
||||
@ -5060,20 +5132,18 @@ static void paging64_init_context(struct kvm_mmu *context)
|
||||
{
|
||||
context->page_fault = paging64_page_fault;
|
||||
context->gva_to_gpa = paging64_gva_to_gpa;
|
||||
context->sync_page = paging64_sync_page;
|
||||
context->invlpg = paging64_invlpg;
|
||||
context->sync_spte = paging64_sync_spte;
|
||||
}
|
||||
|
||||
static void paging32_init_context(struct kvm_mmu *context)
|
||||
{
|
||||
context->page_fault = paging32_page_fault;
|
||||
context->gva_to_gpa = paging32_gva_to_gpa;
|
||||
context->sync_page = paging32_sync_page;
|
||||
context->invlpg = paging32_invlpg;
|
||||
context->sync_spte = paging32_sync_spte;
|
||||
}
|
||||
|
||||
static union kvm_cpu_role
|
||||
kvm_calc_cpu_role(struct kvm_vcpu *vcpu, const struct kvm_mmu_role_regs *regs)
|
||||
static union kvm_cpu_role kvm_calc_cpu_role(struct kvm_vcpu *vcpu,
|
||||
const struct kvm_mmu_role_regs *regs)
|
||||
{
|
||||
union kvm_cpu_role role = {0};
|
||||
|
||||
@ -5112,6 +5182,21 @@ kvm_calc_cpu_role(struct kvm_vcpu *vcpu, const struct kvm_mmu_role_regs *regs)
|
||||
return role;
|
||||
}
|
||||
|
||||
void __kvm_mmu_refresh_passthrough_bits(struct kvm_vcpu *vcpu,
|
||||
struct kvm_mmu *mmu)
|
||||
{
|
||||
const bool cr0_wp = kvm_is_cr0_bit_set(vcpu, X86_CR0_WP);
|
||||
|
||||
BUILD_BUG_ON((KVM_MMU_CR0_ROLE_BITS & KVM_POSSIBLE_CR0_GUEST_BITS) != X86_CR0_WP);
|
||||
BUILD_BUG_ON((KVM_MMU_CR4_ROLE_BITS & KVM_POSSIBLE_CR4_GUEST_BITS));
|
||||
|
||||
if (is_cr0_wp(mmu) == cr0_wp)
|
||||
return;
|
||||
|
||||
mmu->cpu_role.base.cr0_wp = cr0_wp;
|
||||
reset_guest_paging_metadata(vcpu, mmu);
|
||||
}
static inline int kvm_mmu_get_tdp_level(struct kvm_vcpu *vcpu)
{
	/* tdp_root_level is architecture forced level, use it if nonzero */

@@ -5157,9 +5242,8 @@ static void init_kvm_tdp_mmu(struct kvm_vcpu *vcpu,
	context->cpu_role.as_u64 = cpu_role.as_u64;
	context->root_role.word = root_role.word;
	context->page_fault = kvm_tdp_page_fault;
	context->sync_page = nonpaging_sync_page;
	context->invlpg = NULL;
	context->get_guest_pgd = get_cr3;
	context->sync_spte = NULL;
	context->get_guest_pgd = get_guest_cr3;
	context->get_pdptr = kvm_pdptr_read;
	context->inject_page_fault = kvm_inject_page_fault;

@@ -5289,8 +5373,7 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,

		context->page_fault = ept_page_fault;
		context->gva_to_gpa = ept_gva_to_gpa;
		context->sync_page = ept_sync_page;
		context->invlpg = ept_invlpg;
		context->sync_spte = ept_sync_spte;

		update_permission_bitmask(context, true);
		context->pkru_mask = 0;

@@ -5309,7 +5392,7 @@ static void init_kvm_softmmu(struct kvm_vcpu *vcpu,

	kvm_init_shadow_mmu(vcpu, cpu_role);

	context->get_guest_pgd = get_cr3;
	context->get_guest_pgd = get_guest_cr3;
	context->get_pdptr = kvm_pdptr_read;
	context->inject_page_fault = kvm_inject_page_fault;
}

@@ -5323,7 +5406,7 @@ static void init_kvm_nested_mmu(struct kvm_vcpu *vcpu,
		return;

	g_context->cpu_role.as_u64 = new_mode.as_u64;
	g_context->get_guest_pgd = get_cr3;
	g_context->get_guest_pgd = get_guest_cr3;
	g_context->get_pdptr = kvm_pdptr_read;
	g_context->inject_page_fault = kvm_inject_page_fault;

@@ -5331,7 +5414,7 @@ static void init_kvm_nested_mmu(struct kvm_vcpu *vcpu,
	 * L2 page tables are never shadowed, so there is no need to sync
	 * SPTEs.
	 */
	g_context->invlpg = NULL;
	g_context->sync_spte = NULL;

	/*
	 * Note that arch.mmu->gva_to_gpa translates l2_gpa to l1_gpa using

@@ -5393,7 +5476,7 @@ void kvm_mmu_after_set_cpuid(struct kvm_vcpu *vcpu)
	 * Changing guest CPUID after KVM_RUN is forbidden, see the comment in
	 * kvm_arch_vcpu_ioctl().
	 */
	KVM_BUG_ON(vcpu->arch.last_vmentry_cpu != -1, vcpu->kvm);
	KVM_BUG_ON(kvm_vcpu_has_run(vcpu), vcpu->kvm);
}

void kvm_mmu_reset_context(struct kvm_vcpu *vcpu)

@@ -5664,7 +5747,8 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err

	if (r == RET_PF_INVALID) {
		r = kvm_mmu_do_page_fault(vcpu, cr2_or_gpa,
					  lower_32_bits(error_code), false);
					  lower_32_bits(error_code), false,
					  &emulation_type);
		if (KVM_BUG_ON(r == RET_PF_INVALID, vcpu->kvm))
			return -EIO;
	}

@@ -5706,48 +5790,77 @@ emulate:
}
EXPORT_SYMBOL_GPL(kvm_mmu_page_fault);

void kvm_mmu_invalidate_gva(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
			    gva_t gva, hpa_t root_hpa)
static void __kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
				      u64 addr, hpa_t root_hpa)
{
	struct kvm_shadow_walk_iterator iterator;

	vcpu_clear_mmio_info(vcpu, addr);

	if (!VALID_PAGE(root_hpa))
		return;

	write_lock(&vcpu->kvm->mmu_lock);
	for_each_shadow_entry_using_root(vcpu, root_hpa, addr, iterator) {
		struct kvm_mmu_page *sp = sptep_to_sp(iterator.sptep);

		if (sp->unsync) {
			int ret = kvm_sync_spte(vcpu, sp, iterator.index);

			if (ret < 0)
				mmu_page_zap_pte(vcpu->kvm, sp, iterator.sptep, NULL);
			if (ret)
				kvm_flush_remote_tlbs_sptep(vcpu->kvm, iterator.sptep);
		}

		if (!sp->unsync_children)
			break;
	}
	write_unlock(&vcpu->kvm->mmu_lock);
}

void kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
			     u64 addr, unsigned long roots)
{
	int i;

	WARN_ON_ONCE(roots & ~KVM_MMU_ROOTS_ALL);

	/* It's actually a GPA for vcpu->arch.guest_mmu. */
	if (mmu != &vcpu->arch.guest_mmu) {
		/* INVLPG on a non-canonical address is a NOP according to the SDM. */
		if (is_noncanonical_address(gva, vcpu))
		if (is_noncanonical_address(addr, vcpu))
			return;

		static_call(kvm_x86_flush_tlb_gva)(vcpu, gva);
		static_call(kvm_x86_flush_tlb_gva)(vcpu, addr);
	}

	if (!mmu->invlpg)
	if (!mmu->sync_spte)
		return;

	if (root_hpa == INVALID_PAGE) {
		mmu->invlpg(vcpu, gva, mmu->root.hpa);
	if (roots & KVM_MMU_ROOT_CURRENT)
		__kvm_mmu_invalidate_addr(vcpu, mmu, addr, mmu->root.hpa);

		/*
		 * INVLPG is required to invalidate any global mappings for the VA,
		 * irrespective of PCID. Since it would take us roughly similar amount
		 * of work to determine whether any of the prev_root mappings of the VA
		 * is marked global, or to just sync it blindly, so we might as well
		 * just always sync it.
		 *
		 * Mappings not reachable via the current cr3 or the prev_roots will be
		 * synced when switching to that cr3, so nothing needs to be done here
		 * for them.
		 */
		for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
			if (VALID_PAGE(mmu->prev_roots[i].hpa))
				mmu->invlpg(vcpu, gva, mmu->prev_roots[i].hpa);
	} else {
		mmu->invlpg(vcpu, gva, root_hpa);
	for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) {
		if (roots & KVM_MMU_ROOT_PREVIOUS(i))
			__kvm_mmu_invalidate_addr(vcpu, mmu, addr, mmu->prev_roots[i].hpa);
	}
}
EXPORT_SYMBOL_GPL(kvm_mmu_invalidate_addr);
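kvm_mmu_invalidate_addr() now takes a bitmask selecting which cached roots to sync instead of a single root HPA. A standalone sketch of that bitmask convention (constants and names here are illustrative, not KVM's actual values):

	#include <stdio.h>

	/* Assumed analog of KVM's root-selection bitmask layout. */
	#define MMU_NUM_PREV_ROOTS	3
	#define MMU_ROOT_CURRENT	(1UL << 0)
	#define MMU_ROOT_PREVIOUS(i)	(1UL << (1 + (i)))
	#define MMU_ROOTS_ALL		((1UL << (1 + MMU_NUM_PREV_ROOTS)) - 1)

	static void invalidate_one(unsigned long addr, int which)
	{
		printf("sync addr %#lx against root %d\n", addr, which);
	}

	/* Walk only the roots the caller selected, current root first. */
	static void invalidate_addr(unsigned long addr, unsigned long roots)
	{
		int i;

		if (roots & MMU_ROOT_CURRENT)
			invalidate_one(addr, -1);

		for (i = 0; i < MMU_NUM_PREV_ROOTS; i++)
			if (roots & MMU_ROOT_PREVIOUS(i))
				invalidate_one(addr, i);
	}

	int main(void)
	{
		/* INVLPG-style call: hit every root, as global mappings may exist. */
		invalidate_addr(0x1000UL, MMU_ROOTS_ALL);
		/* INVPCID-style call: only the roots whose PCID matched. */
		invalidate_addr(0x1000UL, MMU_ROOT_CURRENT | MMU_ROOT_PREVIOUS(1));
		return 0;
	}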
void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva)
{
	kvm_mmu_invalidate_gva(vcpu, vcpu->arch.walk_mmu, gva, INVALID_PAGE);
	/*
	 * INVLPG is required to invalidate any global mappings for the VA,
	 * irrespective of PCID. Blindly sync all roots as it would take
	 * roughly the same amount of work/time to determine whether any of the
	 * previous roots have a global mapping.
	 *
	 * Mappings not reachable via the current or previous cached roots will
	 * be synced when switching to that new cr3, so nothing needs to be
	 * done here for them.
	 */
	kvm_mmu_invalidate_addr(vcpu, vcpu->arch.walk_mmu, gva, KVM_MMU_ROOTS_ALL);
	++vcpu->stat.invlpg;
}
EXPORT_SYMBOL_GPL(kvm_mmu_invlpg);

@@ -5756,27 +5869,20 @@ EXPORT_SYMBOL_GPL(kvm_mmu_invlpg);
void kvm_mmu_invpcid_gva(struct kvm_vcpu *vcpu, gva_t gva, unsigned long pcid)
{
	struct kvm_mmu *mmu = vcpu->arch.mmu;
	bool tlb_flush = false;
	unsigned long roots = 0;
	uint i;

	if (pcid == kvm_get_active_pcid(vcpu)) {
		if (mmu->invlpg)
			mmu->invlpg(vcpu, gva, mmu->root.hpa);
		tlb_flush = true;
	}
	if (pcid == kvm_get_active_pcid(vcpu))
		roots |= KVM_MMU_ROOT_CURRENT;

	for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) {
		if (VALID_PAGE(mmu->prev_roots[i].hpa) &&
		    pcid == kvm_get_pcid(vcpu, mmu->prev_roots[i].pgd)) {
			if (mmu->invlpg)
				mmu->invlpg(vcpu, gva, mmu->prev_roots[i].hpa);
			tlb_flush = true;
		}
		    pcid == kvm_get_pcid(vcpu, mmu->prev_roots[i].pgd))
			roots |= KVM_MMU_ROOT_PREVIOUS(i);
	}

	if (tlb_flush)
		static_call(kvm_x86_flush_tlb_gva)(vcpu, gva);

	if (roots)
		kvm_mmu_invalidate_addr(vcpu, mmu, gva, roots);
	++vcpu->stat.invlpg;

	/*

@@ -5813,29 +5919,30 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
EXPORT_SYMBOL_GPL(kvm_configure_mmu);

/* The return value indicates if tlb flush on all vcpus is needed. */
typedef bool (*slot_level_handler) (struct kvm *kvm,
typedef bool (*slot_rmaps_handler) (struct kvm *kvm,
				    struct kvm_rmap_head *rmap_head,
				    const struct kvm_memory_slot *slot);

/* The caller should hold mmu-lock before calling this function. */
static __always_inline bool
slot_handle_level_range(struct kvm *kvm, const struct kvm_memory_slot *memslot,
			slot_level_handler fn, int start_level, int end_level,
			gfn_t start_gfn, gfn_t end_gfn, bool flush_on_yield,
			bool flush)
static __always_inline bool __walk_slot_rmaps(struct kvm *kvm,
					      const struct kvm_memory_slot *slot,
					      slot_rmaps_handler fn,
					      int start_level, int end_level,
					      gfn_t start_gfn, gfn_t end_gfn,
					      bool flush_on_yield, bool flush)
{
	struct slot_rmap_walk_iterator iterator;

	for_each_slot_rmap_range(memslot, start_level, end_level, start_gfn,
	lockdep_assert_held_write(&kvm->mmu_lock);

	for_each_slot_rmap_range(slot, start_level, end_level, start_gfn,
			end_gfn, &iterator) {
		if (iterator.rmap)
			flush |= fn(kvm, iterator.rmap, memslot);
			flush |= fn(kvm, iterator.rmap, slot);

		if (need_resched() || rwlock_needbreak(&kvm->mmu_lock)) {
			if (flush && flush_on_yield) {
				kvm_flush_remote_tlbs_with_address(kvm,
						start_gfn,
						iterator.gfn - start_gfn + 1);
				kvm_flush_remote_tlbs_range(kvm, start_gfn,
							    iterator.gfn - start_gfn + 1);
				flush = false;
			}
			cond_resched_rwlock_write(&kvm->mmu_lock);

@@ -5845,23 +5952,23 @@ slot_handle_level_range(struct kvm *kvm, const struct kvm_memory_slot *memslot,
	return flush;
}

static __always_inline bool
slot_handle_level(struct kvm *kvm, const struct kvm_memory_slot *memslot,
		  slot_level_handler fn, int start_level, int end_level,
		  bool flush_on_yield)
static __always_inline bool walk_slot_rmaps(struct kvm *kvm,
					    const struct kvm_memory_slot *slot,
					    slot_rmaps_handler fn,
					    int start_level, int end_level,
					    bool flush_on_yield)
{
	return slot_handle_level_range(kvm, memslot, fn, start_level,
				       end_level, memslot->base_gfn,
				       memslot->base_gfn + memslot->npages - 1,
				       flush_on_yield, false);
	return __walk_slot_rmaps(kvm, slot, fn, start_level, end_level,
				 slot->base_gfn, slot->base_gfn + slot->npages - 1,
				 flush_on_yield, false);
}

static __always_inline bool
slot_handle_level_4k(struct kvm *kvm, const struct kvm_memory_slot *memslot,
		     slot_level_handler fn, bool flush_on_yield)
static __always_inline bool walk_slot_rmaps_4k(struct kvm *kvm,
					       const struct kvm_memory_slot *slot,
					       slot_rmaps_handler fn,
					       bool flush_on_yield)
{
	return slot_handle_level(kvm, memslot, fn, PG_LEVEL_4K,
				 PG_LEVEL_4K, flush_on_yield);
	return walk_slot_rmaps(kvm, slot, fn, PG_LEVEL_4K, PG_LEVEL_4K, flush_on_yield);
}
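__walk_slot_rmaps() accumulates a pending-flush flag and, when flush_on_yield is set, flushes just the walked range before yielding the mmu_lock, so waiters never run on stale TLB entries. A toy sketch of that accumulate-then-flush-before-yield shape, with stub primitives standing in for the KVM and locking calls:

	#include <stdbool.h>
	#include <stdio.h>

	/* Stubs for kvm_flush_remote_tlbs_range() / cond_resched_rwlock_write(). */
	static void flush_range(long start, long len) { printf("flush [%ld,+%ld)\n", start, len); }
	static bool should_yield(long i) { return (i % 3) == 2; }
	static void yield_lock(void) { printf("yield\n"); }

	static bool walk(long start, long end, bool flush_on_yield)
	{
		bool flush = false;
		long i;

		for (i = start; i < end; i++) {
			flush |= (i % 2) == 0;	/* pretend some entries were zapped */

			if (should_yield(i)) {
				/*
				 * Flush only the range walked so far, and do it
				 * before dropping the lock so nothing observes
				 * a stale translation while we are yielded.
				 */
				if (flush && flush_on_yield) {
					flush_range(start, i - start + 1);
					flush = false;
				}
				yield_lock();
			}
		}
		return flush;	/* caller decides on the final flush */
	}

	int main(void)
	{
		if (walk(0, 10, true))
			flush_range(0, 10);
		return 0;
	}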
static void free_mmu_pages(struct kvm_mmu *mmu)

@@ -6156,9 +6263,9 @@ static bool kvm_rmap_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_e
			if (WARN_ON_ONCE(start >= end))
				continue;

			flush = slot_handle_level_range(kvm, memslot, __kvm_zap_rmap,
							PG_LEVEL_4K, KVM_MAX_HUGEPAGE_LEVEL,
							start, end - 1, true, flush);
			flush = __walk_slot_rmaps(kvm, memslot, __kvm_zap_rmap,
						  PG_LEVEL_4K, KVM_MAX_HUGEPAGE_LEVEL,
						  start, end - 1, true, flush);
		}
	}

@@ -6190,8 +6297,7 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
	}

	if (flush)
		kvm_flush_remote_tlbs_with_address(kvm, gfn_start,
						   gfn_end - gfn_start);
		kvm_flush_remote_tlbs_range(kvm, gfn_start, gfn_end - gfn_start);

	kvm_mmu_invalidate_end(kvm, 0, -1ul);

@@ -6211,8 +6317,8 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
{
	if (kvm_memslots_have_rmaps(kvm)) {
		write_lock(&kvm->mmu_lock);
		slot_handle_level(kvm, memslot, slot_rmap_write_protect,
				  start_level, KVM_MAX_HUGEPAGE_LEVEL, false);
		walk_slot_rmaps(kvm, memslot, slot_rmap_write_protect,
				start_level, KVM_MAX_HUGEPAGE_LEVEL, false);
		write_unlock(&kvm->mmu_lock);
	}

@@ -6447,10 +6553,9 @@ static void kvm_shadow_mmu_try_split_huge_pages(struct kvm *kvm,
	 * all the way to the target level. There's no need to split pages
	 * already at the target level.
	 */
	for (level = KVM_MAX_HUGEPAGE_LEVEL; level > target_level; level--) {
		slot_handle_level_range(kvm, slot, shadow_mmu_try_split_huge_pages,
					level, level, start, end - 1, true, false);
	}
	for (level = KVM_MAX_HUGEPAGE_LEVEL; level > target_level; level--)
		__walk_slot_rmaps(kvm, slot, shadow_mmu_try_split_huge_pages,
				  level, level, start, end - 1, true, false);
}

/* Must be called with the mmu_lock held in write-mode. */

@@ -6529,7 +6634,7 @@ restart:
						       PG_LEVEL_NUM)) {
			kvm_zap_one_rmap_spte(kvm, rmap_head, sptep);

			if (kvm_available_flush_tlb_with_range())
			if (kvm_available_flush_remote_tlbs_range())
				kvm_flush_remote_tlbs_sptep(kvm, sptep);
			else
				need_tlb_flush = 1;

@@ -6548,8 +6653,8 @@ static void kvm_rmap_zap_collapsible_sptes(struct kvm *kvm,
	 * Note, use KVM_MAX_HUGEPAGE_LEVEL - 1 since there's no need to zap
	 * pages that are already mapped at the maximum hugepage level.
	 */
	if (slot_handle_level(kvm, slot, kvm_mmu_zap_collapsible_spte,
			      PG_LEVEL_4K, KVM_MAX_HUGEPAGE_LEVEL - 1, true))
	if (walk_slot_rmaps(kvm, slot, kvm_mmu_zap_collapsible_spte,
			    PG_LEVEL_4K, KVM_MAX_HUGEPAGE_LEVEL - 1, true))
		kvm_arch_flush_remote_tlbs_memslot(kvm, slot);
}

@@ -6580,8 +6685,7 @@ void kvm_arch_flush_remote_tlbs_memslot(struct kvm *kvm,
	 * is observed by any other operation on the same memslot.
	 */
	lockdep_assert_held(&kvm->slots_lock);
	kvm_flush_remote_tlbs_with_address(kvm, memslot->base_gfn,
					   memslot->npages);
	kvm_flush_remote_tlbs_range(kvm, memslot->base_gfn, memslot->npages);
}

void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,

@@ -6593,7 +6697,7 @@ void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
		 * Clear dirty bits only on 4k SPTEs since the legacy MMU only
		 * support dirty logging at a 4k granularity.
		 */
		slot_handle_level_4k(kvm, memslot, __rmap_clear_dirty, false);
		walk_slot_rmaps_4k(kvm, memslot, __rmap_clear_dirty, false);
		write_unlock(&kvm->mmu_lock);
	}

@@ -6663,8 +6767,8 @@ void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen)
	}
}

static unsigned long
mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
static unsigned long mmu_shrink_scan(struct shrinker *shrink,
				     struct shrink_control *sc)
{
	struct kvm *kvm;
	int nr_to_scan = sc->nr_to_scan;

@@ -6722,8 +6826,8 @@ unlock:
	return freed;
}

static unsigned long
mmu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
static unsigned long mmu_shrink_count(struct shrinker *shrink,
				      struct shrink_control *sc)
{
	return percpu_counter_read_positive(&kvm_total_used_mmu_pages);
}

@@ -170,14 +170,14 @@ bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
				    struct kvm_memory_slot *slot, u64 gfn,
				    int min_level);

void kvm_flush_remote_tlbs_with_address(struct kvm *kvm,
					u64 start_gfn, u64 pages);
void kvm_flush_remote_tlbs_range(struct kvm *kvm, gfn_t start_gfn,
				 gfn_t nr_pages);

/* Flush the given page (huge or not) of guest memory. */
static inline void kvm_flush_remote_tlbs_gfn(struct kvm *kvm, gfn_t gfn, int level)
{
	kvm_flush_remote_tlbs_with_address(kvm, gfn_round_for_level(gfn, level),
					   KVM_PAGES_PER_HPAGE(level));
	kvm_flush_remote_tlbs_range(kvm, gfn_round_for_level(gfn, level),
				    KVM_PAGES_PER_HPAGE(level));
}
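kvm_flush_remote_tlbs_gfn() rounds the gfn down to the hugepage boundary for the given level and flushes KVM_PAGES_PER_HPAGE(level) pages. The arithmetic, sketched for 4-KiB pages and 9-bit table indices (the x86 case; macro names are stand-ins):

	#include <stdio.h>

	/* 512 entries per level: 1 page at level 1, 512 at level 2, etc. */
	#define PAGES_PER_HPAGE(level)	(1ULL << (9 * ((level) - 1)))

	/* Round a gfn down to the first gfn covered by its level's huge page. */
	static unsigned long long gfn_round_for_level(unsigned long long gfn, int level)
	{
		return gfn & ~(PAGES_PER_HPAGE(level) - 1);
	}

	int main(void)
	{
		unsigned long long gfn = 0x12345;
		int level;

		for (level = 1; level <= 3; level++)	/* 4K, 2M, 1G */
			printf("level %d: flush [%#llx, +%llu pages)\n", level,
			       gfn_round_for_level(gfn, level), PAGES_PER_HPAGE(level));
		return 0;
	}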
unsigned int pte_list_count(struct kvm_rmap_head *rmap_head);

@@ -240,6 +240,13 @@ struct kvm_page_fault {
	kvm_pfn_t pfn;
	hva_t hva;
	bool map_writable;

	/*
	 * Indicates the guest is trying to write a gfn that contains one or
	 * more of the PTEs used to translate the write itself, i.e. the access
	 * is changing its own translation in the guest page tables.
	 */
	bool write_fault_to_shadow_pgtable;
};
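The FNAME(fetch) hunk later in this diff sets write_fault_to_shadow_pgtable when the faulting write lands on one of the table gfns walked to translate that same write. A toy illustration of the detection (the walker layout here is hypothetical):

	#include <stdbool.h>
	#include <stdio.h>

	#define MAX_LEVEL 4

	/* Toy guest walker: records the gfn of the page table used at each level. */
	struct walker {
		int levels;
		unsigned long table_gfn[MAX_LEVEL];
	};

	/* A write is "self changing" if it lands on any table used to map it. */
	static bool write_hits_own_tables(const struct walker *w, unsigned long fault_gfn)
	{
		for (int i = 0; i < w->levels; i++)
			if (w->table_gfn[i] == fault_gfn)
				return true;
		return false;
	}

	int main(void)
	{
		struct walker w = { .levels = 3, .table_gfn = { 0x100, 0x200, 0x300 } };

		printf("%d\n", write_hits_own_tables(&w, 0x200));	/* 1: force 4K / emulate */
		printf("%d\n", write_hits_own_tables(&w, 0x400));	/* 0: normal fault */
		return 0;
	}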
int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);

@@ -273,7 +280,7 @@ enum {
};

static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
					u32 err, bool prefetch)
					u32 err, bool prefetch, int *emulation_type)
{
	struct kvm_page_fault fault = {
		.addr = cr2_or_gpa,

@@ -312,6 +319,9 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
	else
		r = vcpu->arch.mmu->page_fault(vcpu, &fault);

	if (fault.write_fault_to_shadow_pgtable && emulation_type)
		*emulation_type |= EMULTYPE_WRITE_PF_TO_SP;

	/*
	 * Similar to above, prefetch faults aren't truly spurious, and the
	 * async #PF path doesn't do emulation. Do count faults that are fixed

@@ -324,7 +324,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
	trace_kvm_mmu_pagetable_walk(addr, access);
retry_walk:
	walker->level = mmu->cpu_role.base.level;
	pte = mmu->get_guest_pgd(vcpu);
	pte = kvm_mmu_get_guest_pgd(vcpu, mmu);
	have_ad = PT_HAVE_ACCESSED_DIRTY(mmu);

#if PTTYPE == 64

@@ -519,7 +519,7 @@ static int FNAME(walk_addr)(struct guest_walker *walker,

static bool
FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
		     u64 *spte, pt_element_t gpte, bool no_dirty_log)
		     u64 *spte, pt_element_t gpte)
{
	struct kvm_memory_slot *slot;
	unsigned pte_access;

@@ -535,8 +535,7 @@ FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
	pte_access = sp->role.access & FNAME(gpte_access)(gpte);
	FNAME(protect_clean_gpte)(vcpu->arch.mmu, &pte_access, gpte);

	slot = gfn_to_memslot_dirty_bitmap(vcpu, gfn,
			no_dirty_log && (pte_access & ACC_WRITE_MASK));
	slot = gfn_to_memslot_dirty_bitmap(vcpu, gfn, pte_access & ACC_WRITE_MASK);
	if (!slot)
		return false;

@@ -605,7 +604,7 @@ static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, struct guest_walker *gw,
		if (is_shadow_present_pte(*spte))
			continue;

		if (!FNAME(prefetch_gpte)(vcpu, sp, spte, gptep[i], true))
		if (!FNAME(prefetch_gpte)(vcpu, sp, spte, gptep[i]))
			break;
	}
}

@@ -685,8 +684,17 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,

		if (sp != ERR_PTR(-EEXIST))
			link_shadow_page(vcpu, it.sptep, sp);

		if (fault->write && table_gfn == fault->gfn)
			fault->write_fault_to_shadow_pgtable = true;
	}

	/*
	 * Adjust the hugepage size _after_ resolving indirect shadow pages.
	 * KVM doesn't support mapping hugepages into the guest for gfns that
	 * are being shadowed by KVM, i.e. allocating a new shadow page may
	 * affect the allowed hugepage size.
	 */
	kvm_mmu_hugepage_adjust(vcpu, fault);

	trace_kvm_mmu_spte_requested(fault);

@@ -731,46 +739,6 @@ out_gpte_changed:
	return RET_PF_RETRY;
}

/*
 * To see whether the mapped gfn can write its page table in the current
 * mapping.
 *
 * It is the helper function of FNAME(page_fault). When guest uses large page
 * size to map the writable gfn which is used as current page table, we should
 * force kvm to use small page size to map it because new shadow page will be
 * created when kvm establishes shadow page table that stop kvm using large
 * page size. Do it early can avoid unnecessary #PF and emulation.
 *
 * @write_fault_to_shadow_pgtable will return true if the fault gfn is
 * currently used as its page table.
 *
 * Note: the PDPT page table is not checked for PAE-32 bit guest. It is ok
 * since the PDPT is always shadowed, that means, we can not use large page
 * size to map the gfn which is used as PDPT.
 */
static bool
FNAME(is_self_change_mapping)(struct kvm_vcpu *vcpu,
			      struct guest_walker *walker, bool user_fault,
			      bool *write_fault_to_shadow_pgtable)
{
	int level;
	gfn_t mask = ~(KVM_PAGES_PER_HPAGE(walker->level) - 1);
	bool self_changed = false;

	if (!(walker->pte_access & ACC_WRITE_MASK ||
	      (!is_cr0_wp(vcpu->arch.mmu) && !user_fault)))
		return false;

	for (level = walker->level; level <= walker->max_level; level++) {
		gfn_t gfn = walker->gfn ^ walker->table_gfn[level - 1];

		self_changed |= !(gfn & mask);
		*write_fault_to_shadow_pgtable |= !gfn;
	}

	return self_changed;
}
/*
 * Page fault handler. There are several causes for a page fault:
 * - there is no shadow pte for the guest pte

@@ -789,7 +757,6 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
{
	struct guest_walker walker;
	int r;
	bool is_self_change_mapping;

	pgprintk("%s: addr %lx err %x\n", __func__, fault->addr, fault->error_code);
	WARN_ON_ONCE(fault->is_tdp);

@@ -814,6 +781,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
	}

	fault->gfn = walker.gfn;
	fault->max_level = walker.level;
	fault->slot = kvm_vcpu_gfn_to_memslot(vcpu, fault->gfn);

	if (page_fault_handle_page_track(vcpu, fault)) {

@@ -825,16 +793,6 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
	if (r)
		return r;

	vcpu->arch.write_fault_to_shadow_pgtable = false;

	is_self_change_mapping = FNAME(is_self_change_mapping)(vcpu,
	      &walker, fault->user, &vcpu->arch.write_fault_to_shadow_pgtable);

	if (is_self_change_mapping)
		fault->max_level = PG_LEVEL_4K;
	else
		fault->max_level = walker.level;

	r = kvm_faultin_pfn(vcpu, fault, walker.pte_access);
	if (r != RET_PF_CONTINUE)
		return r;

@@ -887,64 +845,6 @@ static gpa_t FNAME(get_level1_sp_gpa)(struct kvm_mmu_page *sp)
	return gfn_to_gpa(sp->gfn) + offset * sizeof(pt_element_t);
}

static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa)
{
	struct kvm_shadow_walk_iterator iterator;
	struct kvm_mmu_page *sp;
	u64 old_spte;
	int level;
	u64 *sptep;

	vcpu_clear_mmio_info(vcpu, gva);

	/*
	 * No need to check return value here, rmap_can_add() can
	 * help us to skip pte prefetch later.
	 */
	mmu_topup_memory_caches(vcpu, true);

	if (!VALID_PAGE(root_hpa)) {
		WARN_ON(1);
		return;
	}

	write_lock(&vcpu->kvm->mmu_lock);
	for_each_shadow_entry_using_root(vcpu, root_hpa, gva, iterator) {
		level = iterator.level;
		sptep = iterator.sptep;

		sp = sptep_to_sp(sptep);
		old_spte = *sptep;
		if (is_last_spte(old_spte, level)) {
			pt_element_t gpte;
			gpa_t pte_gpa;

			if (!sp->unsync)
				break;

			pte_gpa = FNAME(get_level1_sp_gpa)(sp);
			pte_gpa += spte_index(sptep) * sizeof(pt_element_t);

			mmu_page_zap_pte(vcpu->kvm, sp, sptep, NULL);
			if (is_shadow_present_pte(old_spte))
				kvm_flush_remote_tlbs_sptep(vcpu->kvm, sptep);

			if (!rmap_can_add(vcpu))
				break;

			if (kvm_vcpu_read_guest_atomic(vcpu, pte_gpa, &gpte,
						       sizeof(pt_element_t)))
				break;

			FNAME(prefetch_gpte)(vcpu, sp, sptep, gpte, false);
		}

		if (!sp->unsync_children)
			break;
	}
	write_unlock(&vcpu->kvm->mmu_lock);
}
/* Note, @addr is a GPA when gva_to_gpa() translates an L2 GPA to an L1 GPA. */
static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
			       gpa_t addr, u64 access,

@@ -977,114 +877,75 @@ static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 * can't change unless all sptes pointing to it are nuked first.
 *
 * Returns
 * < 0: the sp should be zapped
 *   0: the sp is synced and no tlb flushing is required
 * > 0: the sp is synced and tlb flushing is required
 * < 0: failed to sync spte
 *   0: the spte is synced and no tlb flushing is required
 * > 0: the spte is synced and tlb flushing is required
 */
static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
static int FNAME(sync_spte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, int i)
{
	union kvm_mmu_page_role root_role = vcpu->arch.mmu->root_role;
	int i;
	bool host_writable;
	gpa_t first_pte_gpa;
	bool flush = false;
	u64 *sptep, spte;
	struct kvm_memory_slot *slot;
	unsigned pte_access;
	pt_element_t gpte;
	gpa_t pte_gpa;
	gfn_t gfn;

	/*
	 * Ignore various flags when verifying that it's safe to sync a shadow
	 * page using the current MMU context.
	 *
	 * - level: not part of the overall MMU role and will never match as the MMU's
	 *   level tracks the root level
	 * - access: updated based on the new guest PTE
	 * - quadrant: not part of the overall MMU role (similar to level)
	 */
	const union kvm_mmu_page_role sync_role_ign = {
		.level = 0xf,
		.access = 0x7,
		.quadrant = 0x3,
		.passthrough = 0x1,
	};

	/*
	 * Direct pages can never be unsync, and KVM should never attempt to
	 * sync a shadow page for a different MMU context, e.g. if the role
	 * differs then the memslot lookup (SMM vs. non-SMM) will be bogus, the
	 * reserved bits checks will be wrong, etc...
	 */
	if (WARN_ON_ONCE(sp->role.direct ||
			 (sp->role.word ^ root_role.word) & ~sync_role_ign.word))
		return -1;
	if (WARN_ON_ONCE(!sp->spt[i]))
		return 0;

	first_pte_gpa = FNAME(get_level1_sp_gpa)(sp);
	pte_gpa = first_pte_gpa + i * sizeof(pt_element_t);

	for (i = 0; i < SPTE_ENT_PER_PAGE; i++) {
		u64 *sptep, spte;
		struct kvm_memory_slot *slot;
		unsigned pte_access;
		pt_element_t gpte;
		gpa_t pte_gpa;
		gfn_t gfn;
	if (kvm_vcpu_read_guest_atomic(vcpu, pte_gpa, &gpte,
				       sizeof(pt_element_t)))
		return -1;

		if (!sp->spt[i])
			continue;
	if (FNAME(prefetch_invalid_gpte)(vcpu, sp, &sp->spt[i], gpte))
		return 1;

		pte_gpa = first_pte_gpa + i * sizeof(pt_element_t);
	gfn = gpte_to_gfn(gpte);
	pte_access = sp->role.access;
	pte_access &= FNAME(gpte_access)(gpte);
	FNAME(protect_clean_gpte)(vcpu->arch.mmu, &pte_access, gpte);

		if (kvm_vcpu_read_guest_atomic(vcpu, pte_gpa, &gpte,
					       sizeof(pt_element_t)))
			return -1;

		if (FNAME(prefetch_invalid_gpte)(vcpu, sp, &sp->spt[i], gpte)) {
			flush = true;
			continue;
		}

		gfn = gpte_to_gfn(gpte);
		pte_access = sp->role.access;
		pte_access &= FNAME(gpte_access)(gpte);
		FNAME(protect_clean_gpte)(vcpu->arch.mmu, &pte_access, gpte);

		if (sync_mmio_spte(vcpu, &sp->spt[i], gfn, pte_access))
			continue;

		/*
		 * Drop the SPTE if the new protections would result in a RWX=0
		 * SPTE or if the gfn is changing. The RWX=0 case only affects
		 * EPT with execute-only support, i.e. EPT without an effective
		 * "present" bit, as all other paging modes will create a
		 * read-only SPTE if pte_access is zero.
		 */
		if ((!pte_access && !shadow_present_mask) ||
		    gfn != kvm_mmu_page_get_gfn(sp, i)) {
			drop_spte(vcpu->kvm, &sp->spt[i]);
			flush = true;
			continue;
		}

		/* Update the shadowed access bits in case they changed. */
		kvm_mmu_page_set_access(sp, i, pte_access);

		sptep = &sp->spt[i];
		spte = *sptep;
		host_writable = spte & shadow_host_writable_mask;
		slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
		make_spte(vcpu, sp, slot, pte_access, gfn,
			  spte_to_pfn(spte), spte, true, false,
			  host_writable, &spte);

		flush |= mmu_spte_update(sptep, spte);
	}
	if (sync_mmio_spte(vcpu, &sp->spt[i], gfn, pte_access))
		return 0;

	/*
	 * Note, any flush is purely for KVM's correctness, e.g. when dropping
	 * an existing SPTE or clearing W/A/D bits to ensure an mmu_notifier
	 * unmap or dirty logging event doesn't fail to flush. The guest is
	 * responsible for flushing the TLB to ensure any changes in protection
	 * bits are recognized, i.e. until the guest flushes or page faults on
	 * a relevant address, KVM is architecturally allowed to let vCPUs use
	 * cached translations with the old protection bits.
	 * Drop the SPTE if the new protections would result in a RWX=0
	 * SPTE or if the gfn is changing. The RWX=0 case only affects
	 * EPT with execute-only support, i.e. EPT without an effective
	 * "present" bit, as all other paging modes will create a
	 * read-only SPTE if pte_access is zero.
	 */
	return flush;
	if ((!pte_access && !shadow_present_mask) ||
	    gfn != kvm_mmu_page_get_gfn(sp, i)) {
		drop_spte(vcpu->kvm, &sp->spt[i]);
		return 1;
	}

	/*
	 * Do nothing if the permissions are unchanged. The existing SPTE is
	 * still valid, and prefetch_invalid_gpte() has verified that the A/D
	 * bits are set in the "new" gPTE, i.e. there is no danger of missing
	 * an A/D update due to A/D bits being set in the SPTE but not the gPTE.
	 */
	if (kvm_mmu_page_get_access(sp, i) == pte_access)
		return 0;

	/* Update the shadowed access bits in case they changed. */
	kvm_mmu_page_set_access(sp, i, pte_access);

	sptep = &sp->spt[i];
	spte = *sptep;
	host_writable = spte & shadow_host_writable_mask;
	slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
	make_spte(vcpu, sp, slot, pte_access, gfn,
		  spte_to_pfn(spte), spte, true, false,
		  host_writable, &spte);

	return mmu_spte_update(sptep, spte);
}
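Both FNAME(sync_spte) and its caller __kvm_sync_page() rely on the tri-state contract documented above: a negative return aborts the sync, zero means clean, and a positive value accumulates into a single deferred TLB flush. A self-contained sketch of the contract (the per-entry results are fabricated):

	#include <stdio.h>

	#define NR_ENTRIES 4

	/* Toy sync: <0 on failure, 0 if clean, >0 if a TLB flush is owed. */
	static int sync_one(int i)
	{
		if (i < 0)
			return -1;	/* e.g. the guest PTE couldn't be read */
		return i % 2;		/* pretend odd entries changed protections */
	}

	/* Mirrors __kvm_sync_page(): any failure aborts, flushes accumulate. */
	static int sync_all(void)
	{
		int flush = 0;

		for (int i = 0; i < NR_ENTRIES; i++) {
			int ret = sync_one(i);

			if (ret < 0)
				return -1;
			flush |= ret;
		}
		return flush;
	}

	int main(void)
	{
		printf("sync_all() = %d (1 => flush TLB once, after the loop)\n",
		       sync_all());
		return 0;
	}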
#undef pt_element_t

@@ -164,7 +164,7 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
	/*
	 * For simplicity, enforce the NX huge page mitigation even if not
	 * strictly necessary. KVM could ignore the mitigation if paging is
	 * disabled in the guest, as the guest doesn't have an page tables to
	 * disabled in the guest, as the guest doesn't have any page tables to
	 * abuse. But to safely ignore the mitigation, KVM would have to
	 * ensure a new MMU is loaded (or all shadow pages zapped) when CR0.PG
	 * is toggled on, and that's a net negative for performance when TDP is

@@ -29,29 +29,49 @@ static inline void __kvm_tdp_mmu_write_spte(tdp_ptep_t sptep, u64 new_spte)
	WRITE_ONCE(*rcu_dereference(sptep), new_spte);
}

/*
 * SPTEs must be modified atomically if they are shadow-present, leaf
 * SPTEs, and have volatile bits, i.e. has bits that can be set outside
 * of mmu_lock. The Writable bit can be set by KVM's fast page fault
 * handler, and Accessed and Dirty bits can be set by the CPU.
 *
 * Note, non-leaf SPTEs do have Accessed bits and those bits are
 * technically volatile, but KVM doesn't consume the Accessed bit of
 * non-leaf SPTEs, i.e. KVM doesn't care if it clobbers the bit. This
 * logic needs to be reassessed if KVM were to use non-leaf Accessed
 * bits, e.g. to skip stepping down into child SPTEs when aging SPTEs.
 */
static inline bool kvm_tdp_mmu_spte_need_atomic_write(u64 old_spte, int level)
{
	return is_shadow_present_pte(old_spte) &&
	       is_last_spte(old_spte, level) &&
	       spte_has_volatile_bits(old_spte);
}

static inline u64 kvm_tdp_mmu_write_spte(tdp_ptep_t sptep, u64 old_spte,
					 u64 new_spte, int level)
{
	/*
	 * Atomically write the SPTE if it is a shadow-present, leaf SPTE with
	 * volatile bits, i.e. has bits that can be set outside of mmu_lock.
	 * The Writable bit can be set by KVM's fast page fault handler, and
	 * Accessed and Dirty bits can be set by the CPU.
	 *
	 * Note, non-leaf SPTEs do have Accessed bits and those bits are
	 * technically volatile, but KVM doesn't consume the Accessed bit of
	 * non-leaf SPTEs, i.e. KVM doesn't care if it clobbers the bit. This
	 * logic needs to be reassessed if KVM were to use non-leaf Accessed
	 * bits, e.g. to skip stepping down into child SPTEs when aging SPTEs.
	 */
	if (is_shadow_present_pte(old_spte) && is_last_spte(old_spte, level) &&
	    spte_has_volatile_bits(old_spte))
	if (kvm_tdp_mmu_spte_need_atomic_write(old_spte, level))
		return kvm_tdp_mmu_write_spte_atomic(sptep, new_spte);

	__kvm_tdp_mmu_write_spte(sptep, new_spte);
	return old_spte;
}

static inline u64 tdp_mmu_clear_spte_bits(tdp_ptep_t sptep, u64 old_spte,
					  u64 mask, int level)
{
	atomic64_t *sptep_atomic;

	if (kvm_tdp_mmu_spte_need_atomic_write(old_spte, level)) {
		sptep_atomic = (atomic64_t *)rcu_dereference(sptep);
		return (u64)atomic64_fetch_and(~mask, sptep_atomic);
	}

	__kvm_tdp_mmu_write_spte(sptep, old_spte & ~mask);
	return old_spte;
}
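tdp_mmu_clear_spte_bits() clears A/D-style bits with atomic64_fetch_and(), i.e. a LOCK AND that returns the old value, so a bit set concurrently by hardware is never lost and no full-value exchange of the entry is needed. The same idea in portable C11 atomics (the bit positions are illustrative, not the real SPTE layout):

	#include <stdatomic.h>
	#include <stdint.h>
	#include <stdio.h>

	#define ACCESSED	(1ULL << 5)
	#define DIRTY		(1ULL << 6)
	#define WRITABLE	(1ULL << 1)

	int main(void)
	{
		_Atomic uint64_t spte = ACCESSED | DIRTY | WRITABLE;

		/*
		 * LOCK AND-style clear: returns the old value and cannot lose a
		 * bit another CPU sets concurrently, unlike a separate load and
		 * store, and unlike an exchange it never clobbers other bits.
		 */
		uint64_t old = atomic_fetch_and(&spte, ~(uint64_t)ACCESSED);

		printf("old=%#llx new=%#llx accessed_was_set=%d\n",
		       (unsigned long long)old,
		       (unsigned long long)atomic_load(&spte),
		       !!(old & ACCESSED));
		return 0;
	}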
/*
 * A TDP iterator performs a pre-order walk over a TDP paging structure.
 */

@@ -334,35 +334,6 @@ static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
				u64 old_spte, u64 new_spte, int level,
				bool shared);

static void handle_changed_spte_acc_track(u64 old_spte, u64 new_spte, int level)
{
	if (!is_shadow_present_pte(old_spte) || !is_last_spte(old_spte, level))
		return;

	if (is_accessed_spte(old_spte) &&
	    (!is_shadow_present_pte(new_spte) || !is_accessed_spte(new_spte) ||
	     spte_to_pfn(old_spte) != spte_to_pfn(new_spte)))
		kvm_set_pfn_accessed(spte_to_pfn(old_spte));
}

static void handle_changed_spte_dirty_log(struct kvm *kvm, int as_id, gfn_t gfn,
					  u64 old_spte, u64 new_spte, int level)
{
	bool pfn_changed;
	struct kvm_memory_slot *slot;

	if (level > PG_LEVEL_4K)
		return;

	pfn_changed = spte_to_pfn(old_spte) != spte_to_pfn(new_spte);

	if ((!is_writable_pte(old_spte) || pfn_changed) &&
	    is_writable_pte(new_spte)) {
		slot = __gfn_to_memslot(__kvm_memslots(kvm, as_id), gfn);
		mark_page_dirty_in_slot(kvm, slot, gfn);
	}
}

static void tdp_account_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
{
	kvm_account_pgtable_pages((void *)sp->spt, +1);

@@ -505,7 +476,7 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
}

/**
 * __handle_changed_spte - handle bookkeeping associated with an SPTE change
 * handle_changed_spte - handle bookkeeping associated with an SPTE change
 * @kvm: kvm instance
 * @as_id: the address space of the paging structure the SPTE was a part of
 * @gfn: the base GFN that was mapped by the SPTE

@@ -516,12 +487,13 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
 *	    the MMU lock and the operation must synchronize with other
 *	    threads that might be modifying SPTEs.
 *
 * Handle bookkeeping that might result from the modification of a SPTE.
 * This function must be called for all TDP SPTE modifications.
 * Handle bookkeeping that might result from the modification of a SPTE. Note,
 * dirty logging updates are handled in common code, not here (see make_spte()
 * and fast_pf_fix_direct_spte()).
 */
static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
				  u64 old_spte, u64 new_spte, int level,
				  bool shared)
static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
				u64 old_spte, u64 new_spte, int level,
				bool shared)
{
	bool was_present = is_shadow_present_pte(old_spte);
	bool is_present = is_shadow_present_pte(new_spte);

@@ -605,17 +577,10 @@ static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
	if (was_present && !was_leaf &&
	    (is_leaf || !is_present || WARN_ON_ONCE(pfn_changed)))
		handle_removed_pt(kvm, spte_to_child_pt(old_spte, level), shared);
}

static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
				u64 old_spte, u64 new_spte, int level,
				bool shared)
{
	__handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, level,
			      shared);
	handle_changed_spte_acc_track(old_spte, new_spte, level);
	handle_changed_spte_dirty_log(kvm, as_id, gfn, old_spte,
				      new_spte, level);
	if (was_leaf && is_accessed_spte(old_spte) &&
	    (!is_present || !is_accessed_spte(new_spte) || pfn_changed))
		kvm_set_pfn_accessed(spte_to_pfn(old_spte));
}
/*

@@ -658,9 +623,8 @@ static inline int tdp_mmu_set_spte_atomic(struct kvm *kvm,
	if (!try_cmpxchg64(sptep, &iter->old_spte, new_spte))
		return -EBUSY;

	__handle_changed_spte(kvm, iter->as_id, iter->gfn, iter->old_spte,
			      new_spte, iter->level, true);
	handle_changed_spte_acc_track(iter->old_spte, new_spte, iter->level);
	handle_changed_spte(kvm, iter->as_id, iter->gfn, iter->old_spte,
			    new_spte, iter->level, true);

	return 0;
}

@@ -696,7 +660,7 @@ static inline int tdp_mmu_zap_spte_atomic(struct kvm *kvm,

/*
 * __tdp_mmu_set_spte - Set a TDP MMU SPTE and handle the associated bookkeeping
 * tdp_mmu_set_spte - Set a TDP MMU SPTE and handle the associated bookkeeping
 * @kvm: KVM instance
 * @as_id: Address space ID, i.e. regular vs. SMM
 * @sptep: Pointer to the SPTE

@@ -704,23 +668,12 @@ static inline int tdp_mmu_zap_spte_atomic(struct kvm *kvm,
 * @new_spte: The new value that will be set for the SPTE
 * @gfn: The base GFN that was (or will be) mapped by the SPTE
 * @level: The level _containing_ the SPTE (its parent PT's level)
 * @record_acc_track: Notify the MM subsystem of changes to the accessed state
 *		      of the page. Should be set unless handling an MMU
 *		      notifier for access tracking. Leaving record_acc_track
 *		      unset in that case prevents page accesses from being
 *		      double counted.
 * @record_dirty_log: Record the page as dirty in the dirty bitmap if
 *		      appropriate for the change being made. Should be set
 *		      unless performing certain dirty logging operations.
 *		      Leaving record_dirty_log unset in that case prevents page
 *		      writes from being double counted.
 *
 * Returns the old SPTE value, which _may_ be different than @old_spte if the
 * SPTE had volatile bits.
 */
static u64 __tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,
			      u64 old_spte, u64 new_spte, gfn_t gfn, int level,
			      bool record_acc_track, bool record_dirty_log)
static u64 tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,
			    u64 old_spte, u64 new_spte, gfn_t gfn, int level)
{
	lockdep_assert_held_write(&kvm->mmu_lock);

@@ -735,46 +688,17 @@ static u64 __tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,

	old_spte = kvm_tdp_mmu_write_spte(sptep, old_spte, new_spte, level);

	__handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, level, false);

	if (record_acc_track)
		handle_changed_spte_acc_track(old_spte, new_spte, level);
	if (record_dirty_log)
		handle_changed_spte_dirty_log(kvm, as_id, gfn, old_spte,
					      new_spte, level);
	handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, level, false);
	return old_spte;
}

static inline void _tdp_mmu_set_spte(struct kvm *kvm, struct tdp_iter *iter,
				     u64 new_spte, bool record_acc_track,
				     bool record_dirty_log)
static inline void tdp_mmu_iter_set_spte(struct kvm *kvm, struct tdp_iter *iter,
					 u64 new_spte)
{
	WARN_ON_ONCE(iter->yielded);

	iter->old_spte = __tdp_mmu_set_spte(kvm, iter->as_id, iter->sptep,
					    iter->old_spte, new_spte,
					    iter->gfn, iter->level,
					    record_acc_track, record_dirty_log);
}

static inline void tdp_mmu_set_spte(struct kvm *kvm, struct tdp_iter *iter,
				    u64 new_spte)
{
	_tdp_mmu_set_spte(kvm, iter, new_spte, true, true);
}

static inline void tdp_mmu_set_spte_no_acc_track(struct kvm *kvm,
						 struct tdp_iter *iter,
						 u64 new_spte)
{
	_tdp_mmu_set_spte(kvm, iter, new_spte, false, true);
}

static inline void tdp_mmu_set_spte_no_dirty_log(struct kvm *kvm,
						 struct tdp_iter *iter,
						 u64 new_spte)
{
	_tdp_mmu_set_spte(kvm, iter, new_spte, true, false);
	iter->old_spte = tdp_mmu_set_spte(kvm, iter->as_id, iter->sptep,
					  iter->old_spte, new_spte,
					  iter->gfn, iter->level);
}

#define tdp_root_for_each_pte(_iter, _root, _start, _end) \

@@ -866,7 +790,7 @@ retry:
			continue;

		if (!shared)
			tdp_mmu_set_spte(kvm, &iter, 0);
			tdp_mmu_iter_set_spte(kvm, &iter, 0);
		else if (tdp_mmu_set_spte_atomic(kvm, &iter, 0))
			goto retry;
	}

@@ -923,8 +847,8 @@ bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
	if (WARN_ON_ONCE(!is_shadow_present_pte(old_spte)))
		return false;

	__tdp_mmu_set_spte(kvm, kvm_mmu_page_as_id(sp), sp->ptep, old_spte, 0,
			   sp->gfn, sp->role.level + 1, true, true);
	tdp_mmu_set_spte(kvm, kvm_mmu_page_as_id(sp), sp->ptep, old_spte, 0,
			 sp->gfn, sp->role.level + 1);

	return true;
}

@@ -958,7 +882,7 @@ static bool tdp_mmu_zap_leafs(struct kvm *kvm, struct kvm_mmu_page *root,
		    !is_last_spte(iter.old_spte, iter.level))
			continue;

		tdp_mmu_set_spte(kvm, &iter, 0);
		tdp_mmu_iter_set_spte(kvm, &iter, 0);
		flush = true;
	}

@@ -1128,7 +1052,7 @@ static int tdp_mmu_link_sp(struct kvm *kvm, struct tdp_iter *iter,
		if (ret)
			return ret;
	} else {
		tdp_mmu_set_spte(kvm, iter, spte);
		tdp_mmu_iter_set_spte(kvm, iter, spte);
	}

	tdp_account_mmu_page(kvm, sp);

@@ -1262,33 +1186,42 @@ static __always_inline bool kvm_tdp_mmu_handle_gfn(struct kvm *kvm,
/*
 * Mark the SPTEs range of GFNs [start, end) unaccessed and return non-zero
 * if any of the GFNs in the range have been accessed.
 *
 * No need to mark the corresponding PFN as accessed as this call is coming
 * from the clear_young() or clear_flush_young() notifier, which uses the
 * return value to determine if the page has been accessed.
 */
static bool age_gfn_range(struct kvm *kvm, struct tdp_iter *iter,
			  struct kvm_gfn_range *range)
{
	u64 new_spte = 0;
	u64 new_spte;

	/* If we have a non-accessed entry we don't need to change the pte. */
	if (!is_accessed_spte(iter->old_spte))
		return false;

	new_spte = iter->old_spte;

	if (spte_ad_enabled(new_spte)) {
		new_spte &= ~shadow_accessed_mask;
	if (spte_ad_enabled(iter->old_spte)) {
		iter->old_spte = tdp_mmu_clear_spte_bits(iter->sptep,
							 iter->old_spte,
							 shadow_accessed_mask,
							 iter->level);
		new_spte = iter->old_spte & ~shadow_accessed_mask;
	} else {
		/*
		 * Capture the dirty status of the page, so that it doesn't get
		 * lost when the SPTE is marked for access tracking.
		 */
		if (is_writable_pte(new_spte))
			kvm_set_pfn_dirty(spte_to_pfn(new_spte));
		if (is_writable_pte(iter->old_spte))
			kvm_set_pfn_dirty(spte_to_pfn(iter->old_spte));

		new_spte = mark_spte_for_access_track(new_spte);
		new_spte = mark_spte_for_access_track(iter->old_spte);
		iter->old_spte = kvm_tdp_mmu_write_spte(iter->sptep,
							iter->old_spte, new_spte,
							iter->level);
	}

	tdp_mmu_set_spte_no_acc_track(kvm, iter, new_spte);

	trace_kvm_tdp_mmu_spte_changed(iter->as_id, iter->gfn, iter->level,
				       iter->old_spte, new_spte);
	return true;
}
@@ -1324,15 +1257,15 @@ static bool set_spte_gfn(struct kvm *kvm, struct tdp_iter *iter,
	 * Note, when changing a read-only SPTE, it's not strictly necessary to
	 * zero the SPTE before setting the new PFN, but doing so preserves the
	 * invariant that the PFN of a present leaf SPTE can never change.
	 * See __handle_changed_spte().
	 * See handle_changed_spte().
	 */
	tdp_mmu_set_spte(kvm, iter, 0);
	tdp_mmu_iter_set_spte(kvm, iter, 0);

	if (!pte_write(range->pte)) {
		new_spte = kvm_mmu_changed_pte_notifier_make_spte(iter->old_spte,
								  pte_pfn(range->pte));

		tdp_mmu_set_spte(kvm, iter, new_spte);
		tdp_mmu_iter_set_spte(kvm, iter, new_spte);
	}

	return true;

@@ -1349,7 +1282,7 @@ bool kvm_tdp_mmu_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
	/*
	 * No need to handle the remote TLB flush under RCU protection, the
	 * target SPTE _must_ be a leaf SPTE, i.e. cannot result in freeing a
	 * shadow page. See the WARN on pfn_changed in __handle_changed_spte().
	 * shadow page. See the WARN on pfn_changed in handle_changed_spte().
	 */
	return kvm_tdp_mmu_handle_gfn(kvm, range, set_spte_gfn);
}

@@ -1607,8 +1540,8 @@ void kvm_tdp_mmu_try_split_huge_pages(struct kvm *kvm,
static bool clear_dirty_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
				  gfn_t start, gfn_t end)
{
	u64 dbit = kvm_ad_enabled() ? shadow_dirty_mask : PT_WRITABLE_MASK;
	struct tdp_iter iter;
	u64 new_spte;
	bool spte_set = false;

	rcu_read_lock();

@@ -1621,19 +1554,13 @@ retry:
		if (!is_shadow_present_pte(iter.old_spte))
			continue;

		if (spte_ad_need_write_protect(iter.old_spte)) {
			if (is_writable_pte(iter.old_spte))
				new_spte = iter.old_spte & ~PT_WRITABLE_MASK;
			else
				continue;
		} else {
			if (iter.old_spte & shadow_dirty_mask)
				new_spte = iter.old_spte & ~shadow_dirty_mask;
			else
				continue;
		}
		MMU_WARN_ON(kvm_ad_enabled() &&
			    spte_ad_need_write_protect(iter.old_spte));

		if (tdp_mmu_set_spte_atomic(kvm, &iter, new_spte))
		if (!(iter.old_spte & dbit))
			continue;

		if (tdp_mmu_set_spte_atomic(kvm, &iter, iter.old_spte & ~dbit))
			goto retry;

		spte_set = true;

@@ -1675,8 +1602,9 @@ bool kvm_tdp_mmu_clear_dirty_slot(struct kvm *kvm,
static void clear_dirty_pt_masked(struct kvm *kvm, struct kvm_mmu_page *root,
				  gfn_t gfn, unsigned long mask, bool wrprot)
{
	u64 dbit = (wrprot || !kvm_ad_enabled()) ? PT_WRITABLE_MASK :
						   shadow_dirty_mask;
	struct tdp_iter iter;
	u64 new_spte;

	rcu_read_lock();

@@ -1685,25 +1613,26 @@ static void clear_dirty_pt_masked(struct kvm *kvm, struct kvm_mmu_page *root,
		if (!mask)
			break;

		MMU_WARN_ON(kvm_ad_enabled() &&
			    spte_ad_need_write_protect(iter.old_spte));

		if (iter.level > PG_LEVEL_4K ||
		    !(mask & (1UL << (iter.gfn - gfn))))
			continue;

		mask &= ~(1UL << (iter.gfn - gfn));

		if (wrprot || spte_ad_need_write_protect(iter.old_spte)) {
			if (is_writable_pte(iter.old_spte))
				new_spte = iter.old_spte & ~PT_WRITABLE_MASK;
			else
				continue;
		} else {
			if (iter.old_spte & shadow_dirty_mask)
				new_spte = iter.old_spte & ~shadow_dirty_mask;
			else
				continue;
		}
		if (!(iter.old_spte & dbit))
			continue;

		tdp_mmu_set_spte_no_dirty_log(kvm, &iter, new_spte);
		iter.old_spte = tdp_mmu_clear_spte_bits(iter.sptep,
							iter.old_spte, dbit,
							iter.level);

		trace_kvm_tdp_mmu_spte_changed(iter.as_id, iter.gfn, iter.level,
					       iter.old_spte,
					       iter.old_spte & ~dbit);
		kvm_set_pfn_dirty(spte_to_pfn(iter.old_spte));
	}

	rcu_read_unlock();

@@ -1821,7 +1750,7 @@ static bool write_protect_gfn(struct kvm *kvm, struct kvm_mmu_page *root,
		if (new_spte == iter.old_spte)
			break;

		tdp_mmu_set_spte(kvm, &iter, new_spte);
		tdp_mmu_iter_set_spte(kvm, &iter, new_spte);
		spte_set = true;
	}
||||
|
||||
|
@@ -93,7 +93,7 @@ void kvm_pmu_ops_update(const struct kvm_pmu_ops *pmu_ops)
#undef __KVM_X86_PMU_OP
}

static inline bool pmc_is_enabled(struct kvm_pmc *pmc)
static inline bool pmc_is_globally_enabled(struct kvm_pmc *pmc)
{
return static_call(kvm_x86_pmu_pmc_is_enabled)(pmc);
}
@@ -400,6 +400,12 @@ static bool check_pmu_event_filter(struct kvm_pmc *pmc)
return is_fixed_event_allowed(filter, pmc->idx);
}

static bool pmc_event_is_allowed(struct kvm_pmc *pmc)
{
return pmc_is_globally_enabled(pmc) && pmc_speculative_in_use(pmc) &&
check_pmu_event_filter(pmc);
}

static void reprogram_counter(struct kvm_pmc *pmc)
{
struct kvm_pmu *pmu = pmc_to_pmu(pmc);
@@ -409,10 +415,7 @@ static void reprogram_counter(struct kvm_pmc *pmc)

pmc_pause_counter(pmc);

if (!pmc_speculative_in_use(pmc) || !pmc_is_enabled(pmc))
goto reprogram_complete;

if (!check_pmu_event_filter(pmc))
if (!pmc_event_is_allowed(pmc))
goto reprogram_complete;

if (pmc->counter < pmc->prev_counter)
@@ -540,9 +543,9 @@ int kvm_pmu_rdpmc(struct kvm_vcpu *vcpu, unsigned idx, u64 *data)
if (!pmc)
return 1;

if (!(kvm_read_cr4(vcpu) & X86_CR4_PCE) &&
if (!kvm_is_cr4_bit_set(vcpu, X86_CR4_PCE) &&
(static_call(kvm_x86_get_cpl)(vcpu) != 0) &&
(kvm_read_cr0(vcpu) & X86_CR0_PE))
kvm_is_cr0_bit_set(vcpu, X86_CR0_PE))
return 1;

*data = pmc_read_counter(pmc) & mask;
@@ -589,6 +592,10 @@ int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
*/
void kvm_pmu_refresh(struct kvm_vcpu *vcpu)
{
if (KVM_BUG_ON(kvm_vcpu_has_run(vcpu), vcpu->kvm))
return;

bitmap_zero(vcpu_to_pmu(vcpu)->all_valid_pmc_idx, X86_PMC_IDX_MAX);
static_call(kvm_x86_pmu_refresh)(vcpu);
}

@@ -646,7 +653,7 @@ static void kvm_pmu_incr_counter(struct kvm_pmc *pmc)
{
pmc->prev_counter = pmc->counter;
pmc->counter = (pmc->counter + 1) & pmc_bitmask(pmc);
kvm_pmu_request_counter_reprogam(pmc);
kvm_pmu_request_counter_reprogram(pmc);
}

static inline bool eventsel_match_perf_hw_id(struct kvm_pmc *pmc,
@@ -684,7 +691,7 @@ void kvm_pmu_trigger_event(struct kvm_vcpu *vcpu, u64 perf_hw_id)
for_each_set_bit(i, pmu->all_valid_pmc_idx, X86_PMC_IDX_MAX) {
pmc = static_call(kvm_x86_pmu_pmc_idx_to_pmc)(pmu, i);

if (!pmc || !pmc_is_enabled(pmc) || !pmc_speculative_in_use(pmc))
if (!pmc || !pmc_event_is_allowed(pmc))
continue;

/* Ignore checks for edge detect, pin control, invert and CMASK bits */

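kvm_pmu_rdpmc() above switches to the new single-bit CR helpers. The point of the helpers is that they return an explicit bool instead of an "unsigned long" that callers were silently treating as one. A hedged sketch of the pattern (stand-in signatures, not KVM's real vcpu-based accessors):

    #include <stdbool.h>
    #include <stdio.h>

    #define X86_CR4_PCE (1UL << 8)  /* architectural bit, for illustration */

    /* Stand-in for KVM's cached-register read; illustrative only. */
    static unsigned long kvm_read_cr4_bits(unsigned long cr4, unsigned long mask)
    {
        return cr4 & mask;
    }

    /* Collapsing to bool makes the conversion explicit: a bit above
     * bit 31 can't be lost by an implicit narrowing to int. */
    static bool kvm_is_cr4_bit_set(unsigned long cr4, unsigned long mask)
    {
        return !!kvm_read_cr4_bits(cr4, mask);
    }

    int main(void)
    {
        unsigned long cr4 = X86_CR4_PCE;

        printf("PCE set: %d\n", kvm_is_cr4_bit_set(cr4, X86_CR4_PCE));
        return 0;
    }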
@@ -195,7 +195,7 @@ static inline void kvm_init_pmu_capability(const struct kvm_pmu_ops *pmu_ops)
KVM_PMC_MAX_FIXED);
}

static inline void kvm_pmu_request_counter_reprogam(struct kvm_pmc *pmc)
static inline void kvm_pmu_request_counter_reprogram(struct kvm_pmc *pmc)
{
set_bit(pmc->idx, pmc_to_pmu(pmc)->reprogram_pmi);
kvm_make_request(KVM_REQ_PMU, pmc->vcpu);

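kvm_pmu_request_counter_reprogram() defers the actual work: it marks the counter in a bitmap and raises a request that is serviced once, later, no matter how many counters were touched. A toy model of that coalescing pattern (names and the request flag are stand-ins):

    #include <stdio.h>

    #define NR_COUNTERS 8

    static unsigned long reprogram_bitmap;  /* one bit per counter */
    static int have_request;                /* stands in for KVM_REQ_PMU */

    static void request_counter_reprogram(int idx)
    {
        reprogram_bitmap |= 1UL << idx;  /* kernel: set_bit() */
        have_request = 1;                /* kernel: kvm_make_request() */
    }

    static void handle_pmu_request(void)  /* runs once before vcpu entry */
    {
        for (int i = 0; i < NR_COUNTERS; i++)
            if (reprogram_bitmap & (1UL << i))
                printf("reprogram counter %d\n", i);
        reprogram_bitmap = 0;
        have_request = 0;
    }

    int main(void)
    {
        request_counter_reprogram(1);
        request_counter_reprogram(5);
        request_counter_reprogram(5);  /* coalesced: still one pass */
        if (have_request)
            handle_pmu_request();
        return 0;
    }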
@@ -139,13 +139,18 @@ void recalc_intercepts(struct vcpu_svm *svm)

if (g->int_ctl & V_INTR_MASKING_MASK) {
/*
* Once running L2 with HF_VINTR_MASK, EFLAGS.IF and CR8
* does not affect any interrupt we may want to inject;
* therefore, writes to CR8 are irrelevant to L0, as are
* interrupt window vmexits.
* If L2 is active and V_INTR_MASKING is enabled in vmcb12,
* disable intercept of CR8 writes as L2's CR8 does not affect
* any interrupt KVM may want to inject.
*
* Similarly, disable intercept of virtual interrupts (used to
* detect interrupt windows) if the saved RFLAGS.IF is '0', as
* the effective RFLAGS.IF for L1 interrupts will never be set
* while L2 is running (L2's RFLAGS.IF doesn't affect L1 IRQs).
*/
vmcb_clr_intercept(c, INTERCEPT_CR8_WRITE);
vmcb_clr_intercept(c, INTERCEPT_VINTR);
if (!(svm->vmcb01.ptr->save.rflags & X86_EFLAGS_IF))
vmcb_clr_intercept(c, INTERCEPT_VINTR);
}

/*
@@ -276,6 +281,11 @@ static bool __nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
if (CC(!nested_svm_check_tlb_ctl(vcpu, control->tlb_ctl)))
return false;

if (CC((control->int_ctl & V_NMI_ENABLE_MASK) &&
!vmcb12_is_intercept(control, INTERCEPT_NMI))) {
return false;
}

return true;
}

@@ -416,22 +426,24 @@ void nested_sync_control_from_vmcb02(struct vcpu_svm *svm)

/* Only a few fields of int_ctl are written by the processor. */
mask = V_IRQ_MASK | V_TPR_MASK;
if (!(svm->nested.ctl.int_ctl & V_INTR_MASKING_MASK) &&
svm_is_intercept(svm, INTERCEPT_VINTR)) {
/*
* In order to request an interrupt window, L0 is usurping
* svm->vmcb->control.int_ctl and possibly setting V_IRQ
* even if it was clear in L1's VMCB. Restoring it would be
* wrong. However, in this case V_IRQ will remain true until
* interrupt_window_interception calls svm_clear_vintr and
* restores int_ctl. We can just leave it aside.
*/
/*
* Don't sync vmcb02 V_IRQ back to vmcb12 if KVM (L0) is intercepting
* virtual interrupts in order to request an interrupt window, as KVM
* has usurped vmcb02's int_ctl. If an interrupt window opens before
* the next VM-Exit, svm_clear_vintr() will restore vmcb12's int_ctl.
* If no window opens, V_IRQ will be correctly preserved in vmcb12's
* int_ctl (because it was never recognized while L2 was running).
*/
if (svm_is_intercept(svm, INTERCEPT_VINTR) &&
!test_bit(INTERCEPT_VINTR, (unsigned long *)svm->nested.ctl.intercepts))
mask &= ~V_IRQ_MASK;
}

if (nested_vgif_enabled(svm))
mask |= V_GIF_MASK;

if (nested_vnmi_enabled(svm))
mask |= V_NMI_BLOCKING_MASK | V_NMI_PENDING_MASK;

svm->nested.ctl.int_ctl &= ~mask;
svm->nested.ctl.int_ctl |= svm->vmcb->control.int_ctl & mask;
}
@@ -651,6 +663,17 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
else
int_ctl_vmcb01_bits |= (V_GIF_MASK | V_GIF_ENABLE_MASK);

if (vnmi) {
if (vmcb01->control.int_ctl & V_NMI_PENDING_MASK) {
svm->vcpu.arch.nmi_pending++;
kvm_make_request(KVM_REQ_EVENT, &svm->vcpu);
}
if (nested_vnmi_enabled(svm))
int_ctl_vmcb12_bits |= (V_NMI_PENDING_MASK |
V_NMI_ENABLE_MASK |
V_NMI_BLOCKING_MASK);
}

/* Copied from vmcb01. msrpm_base can be overwritten later. */
vmcb02->control.nested_ctl = vmcb01->control.nested_ctl;
vmcb02->control.iopm_base_pa = vmcb01->control.iopm_base_pa;
@@ -1021,6 +1044,28 @@ int nested_svm_vmexit(struct vcpu_svm *svm)

svm_switch_vmcb(svm, &svm->vmcb01);

/*
* Rules for synchronizing int_ctl bits from vmcb02 to vmcb01:
*
* V_IRQ, V_IRQ_VECTOR, V_INTR_PRIO_MASK, V_IGN_TPR: If L1 doesn't
* intercept interrupts, then KVM will use vmcb02's V_IRQ (and related
* flags) to detect interrupt windows for L1 IRQs (even if L1 uses
* virtual interrupt masking). Raise KVM_REQ_EVENT to ensure that
* KVM re-requests an interrupt window if necessary, which implicitly
* copies this bits from vmcb02 to vmcb01.
*
* V_TPR: If L1 doesn't use virtual interrupt masking, then L1's vTPR
* is stored in vmcb02, but its value doesn't need to be copied from/to
* vmcb01 because it is copied from/to the virtual APIC's TPR register
* on each VM entry/exit.
*
* V_GIF: If nested vGIF is not used, KVM uses vmcb02's V_GIF for L1's
* V_GIF. However, GIF is architecturally clear on each VM exit, thus
* there is no need to copy V_GIF from vmcb02 to vmcb01.
*/
if (!nested_exit_on_intr(svm))
kvm_make_request(KVM_REQ_EVENT, &svm->vcpu);

if (unlikely(svm->lbrv_enabled && (svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK))) {
svm_copy_lbrs(vmcb12, vmcb02);
svm_update_lbrv(vcpu);
@@ -1029,6 +1074,20 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
svm_update_lbrv(vcpu);
}

if (vnmi) {
if (vmcb02->control.int_ctl & V_NMI_BLOCKING_MASK)
vmcb01->control.int_ctl |= V_NMI_BLOCKING_MASK;
else
vmcb01->control.int_ctl &= ~V_NMI_BLOCKING_MASK;

if (vcpu->arch.nmi_pending) {
vcpu->arch.nmi_pending--;
vmcb01->control.int_ctl |= V_NMI_PENDING_MASK;
} else {
vmcb01->control.int_ctl &= ~V_NMI_PENDING_MASK;
}
}

/*
* On vmexit the GIF is set to false and
* no event can be injected in L1.

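nested_sync_control_from_vmcb02() is at bottom a masked merge: only the int_ctl bits the CPU actually writes are copied back to vmcb12, and V_IRQ is dropped from the mask when KVM itself has usurped it. The underlying idiom as a standalone sketch (the constants are illustrative, not the architectural masks):

    #include <stdint.h>
    #include <stdio.h>

    /* Copy only the bits selected by @mask from @src into @dst. */
    static uint32_t masked_merge(uint32_t dst, uint32_t src, uint32_t mask)
    {
        return (dst & ~mask) | (src & mask);
    }

    int main(void)
    {
        uint32_t vmcb12_int_ctl = 0x00000010;  /* illustrative values */
        uint32_t vmcb02_int_ctl = 0x01000101;
        uint32_t mask = 0x00000100;            /* e.g. V_IRQ | V_TPR */

        printf("%#x\n", masked_merge(vmcb12_int_ctl, vmcb02_int_ctl, mask));
        return 0;
    }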
@@ -161,7 +161,7 @@ static int amd_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
data &= ~pmu->reserved_bits;
if (data != pmc->eventsel) {
pmc->eventsel = data;
kvm_pmu_request_counter_reprogam(pmc);
kvm_pmu_request_counter_reprogram(pmc);
}
return 0;
}

@@ -99,6 +99,7 @@ static const struct svm_direct_access_msrs {
#endif
{ .index = MSR_IA32_SPEC_CTRL, .always = false },
{ .index = MSR_IA32_PRED_CMD, .always = false },
{ .index = MSR_IA32_FLUSH_CMD, .always = false },
{ .index = MSR_IA32_LASTBRANCHFROMIP, .always = false },
{ .index = MSR_IA32_LASTBRANCHTOIP, .always = false },
{ .index = MSR_IA32_LASTINTFROMIP, .always = false },
@@ -234,6 +235,8 @@ module_param(dump_invalid_vmcb, bool, 0644);
bool intercept_smi = true;
module_param(intercept_smi, bool, 0444);

bool vnmi = true;
module_param(vnmi, bool, 0444);

static bool svm_gp_erratum_intercept = true;

@@ -1315,6 +1318,9 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
if (kvm_vcpu_apicv_active(vcpu))
avic_init_vmcb(svm, vmcb);

if (vnmi)
svm->vmcb->control.int_ctl |= V_NMI_ENABLE_MASK;

if (vgif) {
svm_clr_intercept(svm, INTERCEPT_STGI);
svm_clr_intercept(svm, INTERCEPT_CLGI);
@@ -1587,6 +1593,16 @@ static void svm_set_vintr(struct vcpu_svm *svm)

svm_set_intercept(svm, INTERCEPT_VINTR);

/*
* Recalculating intercepts may have cleared the VINTR intercept. If
* V_INTR_MASKING is enabled in vmcb12, then the effective RFLAGS.IF
* for L1 physical interrupts is L1's RFLAGS.IF at the time of VMRUN.
* Requesting an interrupt window if save.RFLAGS.IF=0 is pointless as
* interrupts will never be unblocked while L2 is running.
*/
if (!svm_is_intercept(svm, INTERCEPT_VINTR))
return;

/*
* This is just a dummy VINTR to actually cause a vmexit to happen.
* Actual injection of virtual interrupts happens through EVENTINJ.
@@ -2484,16 +2500,29 @@ static int task_switch_interception(struct kvm_vcpu *vcpu)
has_error_code, error_code);
}

static void svm_clr_iret_intercept(struct vcpu_svm *svm)
{
if (!sev_es_guest(svm->vcpu.kvm))
svm_clr_intercept(svm, INTERCEPT_IRET);
}

static void svm_set_iret_intercept(struct vcpu_svm *svm)
{
if (!sev_es_guest(svm->vcpu.kvm))
svm_set_intercept(svm, INTERCEPT_IRET);
}

static int iret_interception(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);

++vcpu->stat.nmi_window_exits;
svm->awaiting_iret_completion = true;
if (!sev_es_guest(vcpu->kvm)) {
svm_clr_intercept(svm, INTERCEPT_IRET);

svm_clr_iret_intercept(svm);
if (!sev_es_guest(vcpu->kvm))
svm->nmi_iret_rip = kvm_rip_read(vcpu);
}

kvm_make_request(KVM_REQ_EVENT, vcpu);
return 1;
}
@@ -2876,7 +2905,7 @@ static int svm_set_vm_cr(struct kvm_vcpu *vcpu, u64 data)
static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
{
struct vcpu_svm *svm = to_svm(vcpu);
int r;
int ret = 0;

u32 ecx = msr->index;
u64 data = msr->data;
@@ -2946,21 +2975,6 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
*/
set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SPEC_CTRL, 1, 1);
break;
case MSR_IA32_PRED_CMD:
if (!msr->host_initiated &&
!guest_has_pred_cmd_msr(vcpu))
return 1;

if (data & ~PRED_CMD_IBPB)
return 1;
if (!boot_cpu_has(X86_FEATURE_IBPB))
return 1;
if (!data)
break;

wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
set_msr_interception(vcpu, svm->msrpm, MSR_IA32_PRED_CMD, 0, 1);
break;
case MSR_AMD64_VIRT_SPEC_CTRL:
if (!msr->host_initiated &&
!guest_cpuid_has(vcpu, X86_FEATURE_VIRT_SSBD))
@@ -3013,10 +3027,10 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
* guest via direct_access_msrs, and switch it via user return.
*/
preempt_disable();
r = kvm_set_user_return_msr(tsc_aux_uret_slot, data, -1ull);
ret = kvm_set_user_return_msr(tsc_aux_uret_slot, data, -1ull);
preempt_enable();
if (r)
return 1;
if (ret)
break;

svm->tsc_aux = data;
break;
@@ -3074,7 +3088,7 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
default:
return kvm_set_msr_common(vcpu, msr);
}
return 0;
return ret;
}

static int msr_interception(struct kvm_vcpu *vcpu)
@@ -3485,11 +3499,43 @@ static void svm_inject_nmi(struct kvm_vcpu *vcpu)
return;

svm->nmi_masked = true;
if (!sev_es_guest(vcpu->kvm))
svm_set_intercept(svm, INTERCEPT_IRET);
svm_set_iret_intercept(svm);
++vcpu->stat.nmi_injections;
}

static bool svm_is_vnmi_pending(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);

if (!is_vnmi_enabled(svm))
return false;

return !!(svm->vmcb->control.int_ctl & V_NMI_BLOCKING_MASK);
}

static bool svm_set_vnmi_pending(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);

if (!is_vnmi_enabled(svm))
return false;

if (svm->vmcb->control.int_ctl & V_NMI_PENDING_MASK)
return false;

svm->vmcb->control.int_ctl |= V_NMI_PENDING_MASK;
vmcb_mark_dirty(svm->vmcb, VMCB_INTR);

/*
* Because the pending NMI is serviced by hardware, KVM can't know when
* the NMI is "injected", but for all intents and purposes, passing the
* NMI off to hardware counts as injection.
*/
++vcpu->stat.nmi_injections;

return true;
}

static void svm_inject_irq(struct kvm_vcpu *vcpu, bool reinjected)
{
struct vcpu_svm *svm = to_svm(vcpu);
@@ -3585,6 +3631,35 @@ static void svm_update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
svm_set_intercept(svm, INTERCEPT_CR8_WRITE);
}

static bool svm_get_nmi_mask(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);

if (is_vnmi_enabled(svm))
return svm->vmcb->control.int_ctl & V_NMI_BLOCKING_MASK;
else
return svm->nmi_masked;
}

static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
{
struct vcpu_svm *svm = to_svm(vcpu);

if (is_vnmi_enabled(svm)) {
if (masked)
svm->vmcb->control.int_ctl |= V_NMI_BLOCKING_MASK;
else
svm->vmcb->control.int_ctl &= ~V_NMI_BLOCKING_MASK;

} else {
svm->nmi_masked = masked;
if (masked)
svm_set_iret_intercept(svm);
else
svm_clr_iret_intercept(svm);
}
}

bool svm_nmi_blocked(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
@@ -3596,8 +3671,10 @@ bool svm_nmi_blocked(struct kvm_vcpu *vcpu)
if (is_guest_mode(vcpu) && nested_exit_on_nmi(svm))
return false;

return (vmcb->control.int_state & SVM_INTERRUPT_SHADOW_MASK) ||
svm->nmi_masked;
if (svm_get_nmi_mask(vcpu))
return true;

return vmcb->control.int_state & SVM_INTERRUPT_SHADOW_MASK;
}

static int svm_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
@@ -3615,26 +3692,6 @@ static int svm_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
return 1;
}

static bool svm_get_nmi_mask(struct kvm_vcpu *vcpu)
{
return to_svm(vcpu)->nmi_masked;
}

static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
{
struct vcpu_svm *svm = to_svm(vcpu);

if (masked) {
svm->nmi_masked = true;
if (!sev_es_guest(vcpu->kvm))
svm_set_intercept(svm, INTERCEPT_IRET);
} else {
svm->nmi_masked = false;
if (!sev_es_guest(vcpu->kvm))
svm_clr_intercept(svm, INTERCEPT_IRET);
}
}

bool svm_interrupt_blocked(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
@@ -3715,7 +3772,16 @@ static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);

if (svm->nmi_masked && !svm->awaiting_iret_completion)
/*
* KVM should never request an NMI window when vNMI is enabled, as KVM
* allows at most one to-be-injected NMI and one pending NMI, i.e. if
* two NMIs arrive simultaneously, KVM will inject one and set
* V_NMI_PENDING for the other. WARN, but continue with the standard
* single-step approach to try and salvage the pending NMI.
*/
WARN_ON_ONCE(is_vnmi_enabled(svm));

if (svm_get_nmi_mask(vcpu) && !svm->awaiting_iret_completion)
return; /* IRET will cause a vm exit */

if (!gif_set(svm)) {
@@ -3777,13 +3843,13 @@ static void svm_flush_tlb_all(struct kvm_vcpu *vcpu)
{
/*
* When running on Hyper-V with EnlightenedNptTlb enabled, remote TLB
* flushes should be routed to hv_remote_flush_tlb() without requesting
* flushes should be routed to hv_flush_remote_tlbs() without requesting
* a "regular" remote flush. Reaching this point means either there's
* a KVM bug or a prior hv_remote_flush_tlb() call failed, both of
* a KVM bug or a prior hv_flush_remote_tlbs() call failed, both of
* which might be fatal to the guest. Yell, but try to recover.
*/
if (WARN_ON_ONCE(svm_hv_is_enlightened_tlb_enabled(vcpu)))
hv_remote_flush_tlb(vcpu->kvm);
hv_flush_remote_tlbs(vcpu->kvm);

svm_flush_tlb_asid(vcpu);
}
@@ -4142,7 +4208,7 @@ static bool svm_has_emulated_msr(struct kvm *kvm, u32 index)
{
switch (index) {
case MSR_IA32_MCG_EXT_CTL:
case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC:
case KVM_FIRST_EMULATED_VMX_MSR ... KVM_LAST_EMULATED_VMX_MSR:
return false;
case MSR_IA32_SMBASE:
if (!IS_ENABLED(CONFIG_KVM_SMM))
@@ -4184,8 +4250,18 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)

svm->vgif_enabled = vgif && guest_cpuid_has(vcpu, X86_FEATURE_VGIF);

svm->vnmi_enabled = vnmi && guest_cpuid_has(vcpu, X86_FEATURE_VNMI);

svm_recalc_instruction_intercepts(vcpu, svm);

if (boot_cpu_has(X86_FEATURE_IBPB))
set_msr_interception(vcpu, svm->msrpm, MSR_IA32_PRED_CMD, 0,
!!guest_has_pred_cmd_msr(vcpu));

if (boot_cpu_has(X86_FEATURE_FLUSH_L1D))
set_msr_interception(vcpu, svm->msrpm, MSR_IA32_FLUSH_CMD, 0,
!!guest_cpuid_has(vcpu, X86_FEATURE_FLUSH_L1D));

/* For sev guests, the memory encryption bit is not reserved in CR3. */
if (sev_guest(vcpu->kvm)) {
best = kvm_find_cpuid_entry(vcpu, 0x8000001F);
@@ -4563,7 +4639,6 @@ static bool svm_can_emulate_instruction(struct kvm_vcpu *vcpu, int emul_type,
void *insn, int insn_len)
{
bool smep, smap, is_user;
unsigned long cr4;
u64 error_code;

/* Emulation is always possible when KVM has access to all guest state. */
@@ -4655,9 +4730,8 @@ static bool svm_can_emulate_instruction(struct kvm_vcpu *vcpu, int emul_type,
if (error_code & (PFERR_GUEST_PAGE_MASK | PFERR_FETCH_MASK))
goto resume_guest;

cr4 = kvm_read_cr4(vcpu);
smep = cr4 & X86_CR4_SMEP;
smap = cr4 & X86_CR4_SMAP;
smep = kvm_is_cr4_bit_set(vcpu, X86_CR4_SMEP);
smap = kvm_is_cr4_bit_set(vcpu, X86_CR4_SMAP);
is_user = svm_get_cpl(vcpu) == 3;
if (smap && (!smep || is_user)) {
pr_err_ratelimited("SEV Guest triggered AMD Erratum 1096\n");
@@ -4795,6 +4869,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
.patch_hypercall = svm_patch_hypercall,
.inject_irq = svm_inject_irq,
.inject_nmi = svm_inject_nmi,
.is_vnmi_pending = svm_is_vnmi_pending,
.set_vnmi_pending = svm_set_vnmi_pending,
.inject_exception = svm_inject_exception,
.cancel_injection = svm_cancel_injection,
.interrupt_allowed = svm_interrupt_allowed,
@@ -4937,6 +5013,9 @@ static __init void svm_set_cpu_caps(void)
if (vgif)
kvm_cpu_cap_set(X86_FEATURE_VGIF);

if (vnmi)
kvm_cpu_cap_set(X86_FEATURE_VNMI);

/* Nested VM can receive #VMEXIT instead of triggering #GP */
kvm_cpu_cap_set(X86_FEATURE_SVME_ADDR_CHK);
}
@@ -5088,6 +5167,16 @@ static __init int svm_hardware_setup(void)
pr_info("Virtual GIF supported\n");
}

vnmi = vgif && vnmi && boot_cpu_has(X86_FEATURE_VNMI);
if (vnmi)
pr_info("Virtual NMI enabled\n");

if (!vnmi) {
svm_x86_ops.is_vnmi_pending = NULL;
svm_x86_ops.set_vnmi_pending = NULL;
}

if (lbrv) {
if (!boot_cpu_has(X86_FEATURE_LBRV))
lbrv = false;

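With vNMI, NMI blocking and the one pending NMI live in two int_ctl bits, so svm_is_vnmi_pending() and svm_set_vnmi_pending() reduce to bit tests on the VMCB, and a second simultaneous NMI must stay in KVM's software nmi_pending count. A userspace-only model of that state machine (bit positions are placeholders, not the architectural encoding):

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define V_NMI_PENDING  (1U << 11)  /* placeholder bit positions */
    #define V_NMI_BLOCKING (1U << 12)

    static bool set_vnmi_pending(uint32_t *int_ctl)
    {
        if (*int_ctl & V_NMI_PENDING)
            return false;          /* hardware already owes one NMI */
        *int_ctl |= V_NMI_PENDING; /* hardware will inject it itself */
        return true;
    }

    int main(void)
    {
        uint32_t int_ctl = 0;

        /* First NMI goes straight to hardware... */
        printf("first:  %d\n", set_vnmi_pending(&int_ctl));
        /* ...a second one must stay pending in software. */
        printf("second: %d\n", set_vnmi_pending(&int_ctl));
        return 0;
    }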
@@ -36,6 +36,7 @@ extern bool npt_enabled;
extern int vgif;
extern bool intercept_smi;
extern bool x2avic_enabled;
extern bool vnmi;

/*
* Clean bits in VMCB.
@@ -265,6 +266,7 @@ struct vcpu_svm {
bool pause_filter_enabled : 1;
bool pause_threshold_enabled : 1;
bool vgif_enabled : 1;
bool vnmi_enabled : 1;

u32 ldr_reg;
u32 dfr_reg;
@@ -539,6 +541,12 @@ static inline bool nested_npt_enabled(struct vcpu_svm *svm)
return svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_NP_ENABLE;
}

static inline bool nested_vnmi_enabled(struct vcpu_svm *svm)
{
return svm->vnmi_enabled &&
(svm->nested.ctl.int_ctl & V_NMI_ENABLE_MASK);
}

static inline bool is_x2apic_msrpm_offset(u32 offset)
{
/* 4 msrs per u8, and 4 u8 in u32 */
@@ -548,6 +556,27 @@ static inline bool is_x2apic_msrpm_offset(u32 offset)
(msr < (APIC_BASE_MSR + 0x100));
}

static inline struct vmcb *get_vnmi_vmcb_l1(struct vcpu_svm *svm)
{
if (!vnmi)
return NULL;

if (is_guest_mode(&svm->vcpu))
return NULL;
else
return svm->vmcb01.ptr;
}

static inline bool is_vnmi_enabled(struct vcpu_svm *svm)
{
struct vmcb *vmcb = get_vnmi_vmcb_l1(svm);

if (vmcb)
return !!(vmcb->control.int_ctl & V_NMI_ENABLE_MASK);
else
return false;
}

/* svm.c */
#define MSR_INVALID 0xffffffffU

@@ -45,9 +45,8 @@ static inline __init void svm_hv_hardware_setup(void)
if (npt_enabled &&
ms_hyperv.nested_features & HV_X64_NESTED_ENLIGHTENED_TLB) {
pr_info(KBUILD_MODNAME ": Hyper-V enlightened NPT TLB flush enabled\n");
svm_x86_ops.tlb_remote_flush = hv_remote_flush_tlb;
svm_x86_ops.tlb_remote_flush_with_range =
hv_remote_flush_tlb_with_range;
svm_x86_ops.flush_remote_tlbs = hv_flush_remote_tlbs;
svm_x86_ops.flush_remote_tlbs_range = hv_flush_remote_tlbs_range;
}

if (ms_hyperv.nested_features & HV_X64_NESTED_DIRECT_FLUSH) {

@@ -13,7 +13,110 @@

#define CC KVM_NESTED_VMENTER_CONSISTENCY_CHECK

DEFINE_STATIC_KEY_FALSE(enable_evmcs);
/*
* Enlightened VMCSv1 doesn't support these:
*
* POSTED_INTR_NV = 0x00000002,
* GUEST_INTR_STATUS = 0x00000810,
* APIC_ACCESS_ADDR = 0x00002014,
* POSTED_INTR_DESC_ADDR = 0x00002016,
* EOI_EXIT_BITMAP0 = 0x0000201c,
* EOI_EXIT_BITMAP1 = 0x0000201e,
* EOI_EXIT_BITMAP2 = 0x00002020,
* EOI_EXIT_BITMAP3 = 0x00002022,
* GUEST_PML_INDEX = 0x00000812,
* PML_ADDRESS = 0x0000200e,
* VM_FUNCTION_CONTROL = 0x00002018,
* EPTP_LIST_ADDRESS = 0x00002024,
* VMREAD_BITMAP = 0x00002026,
* VMWRITE_BITMAP = 0x00002028,
*
* TSC_MULTIPLIER = 0x00002032,
* PLE_GAP = 0x00004020,
* PLE_WINDOW = 0x00004022,
* VMX_PREEMPTION_TIMER_VALUE = 0x0000482E,
*
* Currently unsupported in KVM:
* GUEST_IA32_RTIT_CTL = 0x00002814,
*/
#define EVMCS1_SUPPORTED_PINCTRL \
(PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR | \
PIN_BASED_EXT_INTR_MASK | \
PIN_BASED_NMI_EXITING | \
PIN_BASED_VIRTUAL_NMIS)

#define EVMCS1_SUPPORTED_EXEC_CTRL \
(CPU_BASED_ALWAYSON_WITHOUT_TRUE_MSR | \
CPU_BASED_HLT_EXITING | \
CPU_BASED_CR3_LOAD_EXITING | \
CPU_BASED_CR3_STORE_EXITING | \
CPU_BASED_UNCOND_IO_EXITING | \
CPU_BASED_MOV_DR_EXITING | \
CPU_BASED_USE_TSC_OFFSETTING | \
CPU_BASED_MWAIT_EXITING | \
CPU_BASED_MONITOR_EXITING | \
CPU_BASED_INVLPG_EXITING | \
CPU_BASED_RDPMC_EXITING | \
CPU_BASED_INTR_WINDOW_EXITING | \
CPU_BASED_CR8_LOAD_EXITING | \
CPU_BASED_CR8_STORE_EXITING | \
CPU_BASED_RDTSC_EXITING | \
CPU_BASED_TPR_SHADOW | \
CPU_BASED_USE_IO_BITMAPS | \
CPU_BASED_MONITOR_TRAP_FLAG | \
CPU_BASED_USE_MSR_BITMAPS | \
CPU_BASED_NMI_WINDOW_EXITING | \
CPU_BASED_PAUSE_EXITING | \
CPU_BASED_ACTIVATE_SECONDARY_CONTROLS)

#define EVMCS1_SUPPORTED_2NDEXEC \
(SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE | \
SECONDARY_EXEC_WBINVD_EXITING | \
SECONDARY_EXEC_ENABLE_VPID | \
SECONDARY_EXEC_ENABLE_EPT | \
SECONDARY_EXEC_UNRESTRICTED_GUEST | \
SECONDARY_EXEC_DESC | \
SECONDARY_EXEC_ENABLE_RDTSCP | \
SECONDARY_EXEC_ENABLE_INVPCID | \
SECONDARY_EXEC_XSAVES | \
SECONDARY_EXEC_RDSEED_EXITING | \
SECONDARY_EXEC_RDRAND_EXITING | \
SECONDARY_EXEC_TSC_SCALING | \
SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE | \
SECONDARY_EXEC_PT_USE_GPA | \
SECONDARY_EXEC_PT_CONCEAL_VMX | \
SECONDARY_EXEC_BUS_LOCK_DETECTION | \
SECONDARY_EXEC_NOTIFY_VM_EXITING | \
SECONDARY_EXEC_ENCLS_EXITING)

#define EVMCS1_SUPPORTED_3RDEXEC (0ULL)

#define EVMCS1_SUPPORTED_VMEXIT_CTRL \
(VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR | \
VM_EXIT_SAVE_DEBUG_CONTROLS | \
VM_EXIT_ACK_INTR_ON_EXIT | \
VM_EXIT_HOST_ADDR_SPACE_SIZE | \
VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL | \
VM_EXIT_SAVE_IA32_PAT | \
VM_EXIT_LOAD_IA32_PAT | \
VM_EXIT_SAVE_IA32_EFER | \
VM_EXIT_LOAD_IA32_EFER | \
VM_EXIT_CLEAR_BNDCFGS | \
VM_EXIT_PT_CONCEAL_PIP | \
VM_EXIT_CLEAR_IA32_RTIT_CTL)

#define EVMCS1_SUPPORTED_VMENTRY_CTRL \
(VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR | \
VM_ENTRY_LOAD_DEBUG_CONTROLS | \
VM_ENTRY_IA32E_MODE | \
VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL | \
VM_ENTRY_LOAD_IA32_PAT | \
VM_ENTRY_LOAD_IA32_EFER | \
VM_ENTRY_LOAD_BNDCFGS | \
VM_ENTRY_PT_CONCEAL_PIP | \
VM_ENTRY_LOAD_IA32_RTIT_CTL)

#define EVMCS1_SUPPORTED_VMFUNC (0)

#define EVMCS1_OFFSET(x) offsetof(struct hv_enlightened_vmcs, x)
#define EVMCS1_FIELD(number, name, clean_field)[ROL16(number, 6)] = \
@@ -506,6 +609,8 @@ int nested_evmcs_check_controls(struct vmcs12 *vmcs12)
}

#if IS_ENABLED(CONFIG_HYPERV)
DEFINE_STATIC_KEY_FALSE(__kvm_is_using_evmcs);

/*
* KVM on Hyper-V always uses the latest known eVMCSv1 revision, the assumption
* is: in case a feature has corresponding fields in eVMCS described and it was

@@ -16,117 +16,10 @@

struct vmcs_config;

DECLARE_STATIC_KEY_FALSE(enable_evmcs);

#define current_evmcs ((struct hv_enlightened_vmcs *)this_cpu_read(current_vmcs))

#define KVM_EVMCS_VERSION 1

/*
* Enlightened VMCSv1 doesn't support these:
*
* POSTED_INTR_NV = 0x00000002,
* GUEST_INTR_STATUS = 0x00000810,
* APIC_ACCESS_ADDR = 0x00002014,
* POSTED_INTR_DESC_ADDR = 0x00002016,
* EOI_EXIT_BITMAP0 = 0x0000201c,
* EOI_EXIT_BITMAP1 = 0x0000201e,
* EOI_EXIT_BITMAP2 = 0x00002020,
* EOI_EXIT_BITMAP3 = 0x00002022,
* GUEST_PML_INDEX = 0x00000812,
* PML_ADDRESS = 0x0000200e,
* VM_FUNCTION_CONTROL = 0x00002018,
* EPTP_LIST_ADDRESS = 0x00002024,
* VMREAD_BITMAP = 0x00002026,
* VMWRITE_BITMAP = 0x00002028,
*
* TSC_MULTIPLIER = 0x00002032,
* PLE_GAP = 0x00004020,
* PLE_WINDOW = 0x00004022,
* VMX_PREEMPTION_TIMER_VALUE = 0x0000482E,
*
* Currently unsupported in KVM:
* GUEST_IA32_RTIT_CTL = 0x00002814,
*/
#define EVMCS1_SUPPORTED_PINCTRL \
(PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR | \
PIN_BASED_EXT_INTR_MASK | \
PIN_BASED_NMI_EXITING | \
PIN_BASED_VIRTUAL_NMIS)

#define EVMCS1_SUPPORTED_EXEC_CTRL \
(CPU_BASED_ALWAYSON_WITHOUT_TRUE_MSR | \
CPU_BASED_HLT_EXITING | \
CPU_BASED_CR3_LOAD_EXITING | \
CPU_BASED_CR3_STORE_EXITING | \
CPU_BASED_UNCOND_IO_EXITING | \
CPU_BASED_MOV_DR_EXITING | \
CPU_BASED_USE_TSC_OFFSETTING | \
CPU_BASED_MWAIT_EXITING | \
CPU_BASED_MONITOR_EXITING | \
CPU_BASED_INVLPG_EXITING | \
CPU_BASED_RDPMC_EXITING | \
CPU_BASED_INTR_WINDOW_EXITING | \
CPU_BASED_CR8_LOAD_EXITING | \
CPU_BASED_CR8_STORE_EXITING | \
CPU_BASED_RDTSC_EXITING | \
CPU_BASED_TPR_SHADOW | \
CPU_BASED_USE_IO_BITMAPS | \
CPU_BASED_MONITOR_TRAP_FLAG | \
CPU_BASED_USE_MSR_BITMAPS | \
CPU_BASED_NMI_WINDOW_EXITING | \
CPU_BASED_PAUSE_EXITING | \
CPU_BASED_ACTIVATE_SECONDARY_CONTROLS)

#define EVMCS1_SUPPORTED_2NDEXEC \
(SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE | \
SECONDARY_EXEC_WBINVD_EXITING | \
SECONDARY_EXEC_ENABLE_VPID | \
SECONDARY_EXEC_ENABLE_EPT | \
SECONDARY_EXEC_UNRESTRICTED_GUEST | \
SECONDARY_EXEC_DESC | \
SECONDARY_EXEC_ENABLE_RDTSCP | \
SECONDARY_EXEC_ENABLE_INVPCID | \
SECONDARY_EXEC_XSAVES | \
SECONDARY_EXEC_RDSEED_EXITING | \
SECONDARY_EXEC_RDRAND_EXITING | \
SECONDARY_EXEC_TSC_SCALING | \
SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE | \
SECONDARY_EXEC_PT_USE_GPA | \
SECONDARY_EXEC_PT_CONCEAL_VMX | \
SECONDARY_EXEC_BUS_LOCK_DETECTION | \
SECONDARY_EXEC_NOTIFY_VM_EXITING | \
SECONDARY_EXEC_ENCLS_EXITING)

#define EVMCS1_SUPPORTED_3RDEXEC (0ULL)

#define EVMCS1_SUPPORTED_VMEXIT_CTRL \
(VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR | \
VM_EXIT_SAVE_DEBUG_CONTROLS | \
VM_EXIT_ACK_INTR_ON_EXIT | \
VM_EXIT_HOST_ADDR_SPACE_SIZE | \
VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL | \
VM_EXIT_SAVE_IA32_PAT | \
VM_EXIT_LOAD_IA32_PAT | \
VM_EXIT_SAVE_IA32_EFER | \
VM_EXIT_LOAD_IA32_EFER | \
VM_EXIT_CLEAR_BNDCFGS | \
VM_EXIT_PT_CONCEAL_PIP | \
VM_EXIT_CLEAR_IA32_RTIT_CTL)

#define EVMCS1_SUPPORTED_VMENTRY_CTRL \
(VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR | \
VM_ENTRY_LOAD_DEBUG_CONTROLS | \
VM_ENTRY_IA32E_MODE | \
VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL | \
VM_ENTRY_LOAD_IA32_PAT | \
VM_ENTRY_LOAD_IA32_EFER | \
VM_ENTRY_LOAD_BNDCFGS | \
VM_ENTRY_PT_CONCEAL_PIP | \
VM_ENTRY_LOAD_IA32_RTIT_CTL)

#define EVMCS1_SUPPORTED_VMFUNC (0)

struct evmcs_field {
u16 offset;
u16 clean_field;
@@ -174,6 +67,13 @@ static inline u64 evmcs_read_any(struct hv_enlightened_vmcs *evmcs,

#if IS_ENABLED(CONFIG_HYPERV)

DECLARE_STATIC_KEY_FALSE(__kvm_is_using_evmcs);

static __always_inline bool kvm_is_using_evmcs(void)
{
return static_branch_unlikely(&__kvm_is_using_evmcs);
}

static __always_inline int get_evmcs_offset(unsigned long field,
u16 *clean_field)
{
@@ -263,6 +163,7 @@ static inline void evmcs_load(u64 phys_addr)

void evmcs_sanitize_exec_ctrls(struct vmcs_config *vmcs_conf);
#else /* !IS_ENABLED(CONFIG_HYPERV) */
static __always_inline bool kvm_is_using_evmcs(void) { return false; }
static __always_inline void evmcs_write64(unsigned long field, u64 value) {}
static __always_inline void evmcs_write32(unsigned long field, u32 value) {}
static __always_inline void evmcs_write16(unsigned long field, u16 value) {}

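The enable_evmcs to __kvm_is_using_evmcs conversion hides the static key behind a kvm_is_using_evmcs() wrapper that is a compile-time false when CONFIG_HYPERV is off, which is what lets the callers drop their IS_ENABLED() clutter. The shape of that pattern, sketched without the real static-branch machinery:

    #include <stdbool.h>
    #include <stdio.h>

    #define CONFIG_HYPERV 1  /* flip to 0: callers dead-code-eliminate */

    #if CONFIG_HYPERV
    static bool using_evmcs;  /* kernel: a static branch, runtime-patched */

    static inline bool kvm_is_using_evmcs(void)
    {
        return using_evmcs;
    }
    #else
    static inline bool kvm_is_using_evmcs(void) { return false; }
    #endif

    int main(void)
    {
    #if CONFIG_HYPERV
        using_evmcs = true;
    #endif
        if (kvm_is_using_evmcs())
            printf("eVMCS path\n");
        return 0;
    }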
@@ -358,6 +358,7 @@ static bool nested_ept_root_matches(hpa_t root_hpa, u64 root_eptp, u64 eptp)
static void nested_ept_invalidate_addr(struct kvm_vcpu *vcpu, gpa_t eptp,
gpa_t addr)
{
unsigned long roots = 0;
uint i;
struct kvm_mmu_root_info *cached_root;

@@ -368,8 +369,10 @@ static void nested_ept_invalidate_addr(struct kvm_vcpu *vcpu, gpa_t eptp,

if (nested_ept_root_matches(cached_root->hpa, cached_root->pgd,
eptp))
vcpu->arch.mmu->invlpg(vcpu, addr, cached_root->hpa);
roots |= KVM_MMU_ROOT_PREVIOUS(i);
}
if (roots)
kvm_mmu_invalidate_addr(vcpu, vcpu->arch.mmu, addr, roots);
}

static void nested_ept_inject_page_fault(struct kvm_vcpu *vcpu,
@@ -654,6 +657,9 @@ static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu,
nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
MSR_IA32_PRED_CMD, MSR_TYPE_W);

nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
MSR_IA32_FLUSH_CMD, MSR_TYPE_W);

kvm_vcpu_unmap(vcpu, &vmx->nested.msr_bitmap_map, false);

vmx->nested.force_msr_bitmap_recalc = false;
@@ -4483,7 +4489,7 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
* CR0_GUEST_HOST_MASK is already set in the original vmcs01
* (KVM doesn't change it);
*/
vcpu->arch.cr0_guest_owned_bits = KVM_POSSIBLE_CR0_GUEST_BITS;
vcpu->arch.cr0_guest_owned_bits = vmx_l1_guest_owned_cr0_bits();
vmx_set_cr0(vcpu, vmcs12->host_cr0);

/* Same as above - no reason to call set_cr4_guest_host_mask(). */
@@ -4634,7 +4640,7 @@ static void nested_vmx_restore_host_state(struct kvm_vcpu *vcpu)
*/
vmx_set_efer(vcpu, nested_vmx_get_vmcs01_guest_efer(vmx));

vcpu->arch.cr0_guest_owned_bits = KVM_POSSIBLE_CR0_GUEST_BITS;
vcpu->arch.cr0_guest_owned_bits = vmx_l1_guest_owned_cr0_bits();
vmx_set_cr0(vcpu, vmcs_readl(CR0_READ_SHADOW));

vcpu->arch.cr4_guest_owned_bits = ~vmcs_readl(CR4_GUEST_HOST_MASK);
@@ -5156,7 +5162,7 @@ static int handle_vmxon(struct kvm_vcpu *vcpu)
* does force CR0.PE=1, but only to also force VM86 in order to emulate
* Real Mode, and so there's no need to check CR0.PE manually.
*/
if (!kvm_read_cr4_bits(vcpu, X86_CR4_VMXE)) {
if (!kvm_is_cr4_bit_set(vcpu, X86_CR4_VMXE)) {
kvm_queue_exception(vcpu, UD_VECTOR);
return 1;
}
@@ -6755,36 +6761,9 @@ static u64 nested_vmx_calc_vmcs_enum_msr(void)
return (u64)max_idx << VMCS_FIELD_INDEX_SHIFT;
}

/*
* nested_vmx_setup_ctls_msrs() sets up variables containing the values to be
* returned for the various VMX controls MSRs when nested VMX is enabled.
* The same values should also be used to verify that vmcs12 control fields are
* valid during nested entry from L1 to L2.
* Each of these control msrs has a low and high 32-bit half: A low bit is on
* if the corresponding bit in the (32-bit) control field *must* be on, and a
* bit in the high half is on if the corresponding bit in the control field
* may be on. See also vmx_control_verify().
*/
void nested_vmx_setup_ctls_msrs(struct vmcs_config *vmcs_conf, u32 ept_caps)
static void nested_vmx_setup_pinbased_ctls(struct vmcs_config *vmcs_conf,
struct nested_vmx_msrs *msrs)
{
struct nested_vmx_msrs *msrs = &vmcs_conf->nested;

/*
* Note that as a general rule, the high half of the MSRs (bits in
* the control fields which may be 1) should be initialized by the
* intersection of the underlying hardware's MSR (i.e., features which
* can be supported) and the list of features we want to expose -
* because they are known to be properly supported in our code.
* Also, usually, the low half of the MSRs (bits which must be 1) can
* be set to 0, meaning that L1 may turn off any of these bits. The
* reason is that if one of these bits is necessary, it will appear
* in vmcs01 and prepare_vmcs02, when it bitwise-or's the control
* fields of vmcs01 and vmcs02, will turn these bits off - and
* nested_vmx_l1_wants_exit() will not pass related exits to L1.
* These rules have exceptions below.
*/

/* pin-based controls */
msrs->pinbased_ctls_low =
PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR;

@@ -6797,8 +6776,11 @@ void nested_vmx_setup_ctls_msrs(struct vmcs_config *vmcs_conf, u32 ept_caps)
msrs->pinbased_ctls_high |=
PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR |
PIN_BASED_VMX_PREEMPTION_TIMER;
}

/* exit controls */
static void nested_vmx_setup_exit_ctls(struct vmcs_config *vmcs_conf,
struct nested_vmx_msrs *msrs)
{
msrs->exit_ctls_low =
VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR;

@@ -6817,8 +6799,11 @@ void nested_vmx_setup_ctls_msrs(struct vmcs_config *vmcs_conf, u32 ept_caps)

/* We support free control of debug control saving. */
msrs->exit_ctls_low &= ~VM_EXIT_SAVE_DEBUG_CONTROLS;
}

/* entry controls */
static void nested_vmx_setup_entry_ctls(struct vmcs_config *vmcs_conf,
struct nested_vmx_msrs *msrs)
{
msrs->entry_ctls_low =
VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR;

@@ -6834,8 +6819,11 @@ void nested_vmx_setup_ctls_msrs(struct vmcs_config *vmcs_conf, u32 ept_caps)

/* We support free control of debug control loading. */
msrs->entry_ctls_low &= ~VM_ENTRY_LOAD_DEBUG_CONTROLS;
}

/* cpu-based controls */
static void nested_vmx_setup_cpubased_ctls(struct vmcs_config *vmcs_conf,
struct nested_vmx_msrs *msrs)
{
msrs->procbased_ctls_low =
CPU_BASED_ALWAYSON_WITHOUT_TRUE_MSR;

@@ -6867,12 +6855,12 @@ void nested_vmx_setup_ctls_msrs(struct vmcs_config *vmcs_conf, u32 ept_caps)
/* We support free control of CR3 access interception. */
msrs->procbased_ctls_low &=
~(CPU_BASED_CR3_LOAD_EXITING | CPU_BASED_CR3_STORE_EXITING);
}

/*
* secondary cpu-based controls. Do not include those that
* depend on CPUID bits, they are added later by
* vmx_vcpu_after_set_cpuid.
*/
static void nested_vmx_setup_secondary_ctls(u32 ept_caps,
struct vmcs_config *vmcs_conf,
struct nested_vmx_msrs *msrs)
{
msrs->secondary_ctls_low = 0;

msrs->secondary_ctls_high = vmcs_conf->cpu_based_2nd_exec_ctrl;
@@ -6950,8 +6938,11 @@ void nested_vmx_setup_ctls_msrs(struct vmcs_config *vmcs_conf, u32 ept_caps)

if (enable_sgx)
msrs->secondary_ctls_high |= SECONDARY_EXEC_ENCLS_EXITING;
}

/* miscellaneous data */
static void nested_vmx_setup_misc_data(struct vmcs_config *vmcs_conf,
struct nested_vmx_msrs *msrs)
{
msrs->misc_low = (u32)vmcs_conf->misc & VMX_MISC_SAVE_EFER_LMA;
msrs->misc_low |=
MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS |
@@ -6959,7 +6950,10 @@ void nested_vmx_setup_ctls_msrs(struct vmcs_config *vmcs_conf, u32 ept_caps)
VMX_MISC_ACTIVITY_HLT |
VMX_MISC_ACTIVITY_WAIT_SIPI;
msrs->misc_high = 0;
}

static void nested_vmx_setup_basic(struct nested_vmx_msrs *msrs)
{
/*
* This MSR reports some information about VMX support. We
* should return information about the VMX we emulate for the
@@ -6974,7 +6968,10 @@ void nested_vmx_setup_ctls_msrs(struct vmcs_config *vmcs_conf, u32 ept_caps)

if (cpu_has_vmx_basic_inout())
msrs->basic |= VMX_BASIC_INOUT;
}

static void nested_vmx_setup_cr_fixed(struct nested_vmx_msrs *msrs)
{
/*
* These MSRs specify bits which the guest must keep fixed on
* while L1 is in VMXON mode (in L1's root mode, or running an L2).
@@ -6991,6 +6988,51 @@ void nested_vmx_setup_ctls_msrs(struct vmcs_config *vmcs_conf, u32 ept_caps)

if (vmx_umip_emulated())
msrs->cr4_fixed1 |= X86_CR4_UMIP;
}

/*
* nested_vmx_setup_ctls_msrs() sets up variables containing the values to be
* returned for the various VMX controls MSRs when nested VMX is enabled.
* The same values should also be used to verify that vmcs12 control fields are
* valid during nested entry from L1 to L2.
* Each of these control msrs has a low and high 32-bit half: A low bit is on
* if the corresponding bit in the (32-bit) control field *must* be on, and a
* bit in the high half is on if the corresponding bit in the control field
* may be on. See also vmx_control_verify().
*/
void nested_vmx_setup_ctls_msrs(struct vmcs_config *vmcs_conf, u32 ept_caps)
{
struct nested_vmx_msrs *msrs = &vmcs_conf->nested;

/*
* Note that as a general rule, the high half of the MSRs (bits in
* the control fields which may be 1) should be initialized by the
* intersection of the underlying hardware's MSR (i.e., features which
* can be supported) and the list of features we want to expose -
* because they are known to be properly supported in our code.
* Also, usually, the low half of the MSRs (bits which must be 1) can
* be set to 0, meaning that L1 may turn off any of these bits. The
* reason is that if one of these bits is necessary, it will appear
* in vmcs01 and prepare_vmcs02, when it bitwise-or's the control
* fields of vmcs01 and vmcs02, will turn these bits off - and
* nested_vmx_l1_wants_exit() will not pass related exits to L1.
* These rules have exceptions below.
*/
nested_vmx_setup_pinbased_ctls(vmcs_conf, msrs);

nested_vmx_setup_exit_ctls(vmcs_conf, msrs);

nested_vmx_setup_entry_ctls(vmcs_conf, msrs);

nested_vmx_setup_cpubased_ctls(vmcs_conf, msrs);

nested_vmx_setup_secondary_ctls(ept_caps, vmcs_conf, msrs);

nested_vmx_setup_misc_data(vmcs_conf, msrs);

nested_vmx_setup_basic(msrs);

nested_vmx_setup_cr_fixed(msrs);

msrs->vmcs_enum = nested_vmx_calc_vmcs_enum_msr();
}

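The retained comment above defines the contract for each nested control MSR: the low half holds must-be-1 bits, the high half may-be-1 bits. Verifying a vmcs12 field against that contract is a two-mask check; KVM's vmx_control_verify() has roughly this shape, but treat the sketch as illustrative:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* @low: bits that must be 1; @high: bits that may be 1. */
    static bool control_verify(uint32_t control, uint32_t low, uint32_t high)
    {
        /* Setting every must-be-1 bit and clearing every disallowed
         * bit must leave @control unchanged. */
        return ((control & high) | low) == control;
    }

    int main(void)
    {
        printf("%d\n", control_verify(0x16, 0x06, 0x1e)); /* 1: valid */
        printf("%d\n", control_verify(0x14, 0x06, 0x1e)); /* 0: must-be-1 bit clear */
        printf("%d\n", control_verify(0x36, 0x06, 0x1e)); /* 0: disallowed bit set */
        return 0;
    }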
@@ -57,7 +57,7 @@ static void reprogram_fixed_counters(struct kvm_pmu *pmu, u64 data)
pmc = get_fixed_pmc(pmu, MSR_CORE_PERF_FIXED_CTR0 + i);

__set_bit(INTEL_PMC_IDX_FIXED + i, pmu->pmc_in_use);
kvm_pmu_request_counter_reprogam(pmc);
kvm_pmu_request_counter_reprogram(pmc);
}
}

@@ -76,13 +76,13 @@ static struct kvm_pmc *intel_pmc_idx_to_pmc(struct kvm_pmu *pmu, int pmc_idx)
static void reprogram_counters(struct kvm_pmu *pmu, u64 diff)
{
int bit;
struct kvm_pmc *pmc;

for_each_set_bit(bit, (unsigned long *)&diff, X86_PMC_IDX_MAX) {
pmc = intel_pmc_idx_to_pmc(pmu, bit);
if (pmc)
kvm_pmu_request_counter_reprogam(pmc);
}
if (!diff)
return;

for_each_set_bit(bit, (unsigned long *)&diff, X86_PMC_IDX_MAX)
set_bit(bit, pmu->reprogram_pmi);
kvm_make_request(KVM_REQ_PMU, pmu_to_vcpu(pmu));
}

static bool intel_hw_event_available(struct kvm_pmc *pmc)
@@ -351,45 +351,47 @@ static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
switch (msr) {
case MSR_CORE_PERF_FIXED_CTR_CTRL:
msr_info->data = pmu->fixed_ctr_ctrl;
return 0;
break;
case MSR_CORE_PERF_GLOBAL_STATUS:
msr_info->data = pmu->global_status;
return 0;
break;
case MSR_CORE_PERF_GLOBAL_CTRL:
msr_info->data = pmu->global_ctrl;
return 0;
break;
case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
msr_info->data = 0;
return 0;
break;
case MSR_IA32_PEBS_ENABLE:
msr_info->data = pmu->pebs_enable;
return 0;
break;
case MSR_IA32_DS_AREA:
msr_info->data = pmu->ds_area;
return 0;
break;
case MSR_PEBS_DATA_CFG:
msr_info->data = pmu->pebs_data_cfg;
return 0;
break;
default:
if ((pmc = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0)) ||
(pmc = get_gp_pmc(pmu, msr, MSR_IA32_PMC0))) {
u64 val = pmc_read_counter(pmc);
msr_info->data =
val & pmu->counter_bitmask[KVM_PMC_GP];
return 0;
break;
} else if ((pmc = get_fixed_pmc(pmu, msr))) {
u64 val = pmc_read_counter(pmc);
msr_info->data =
val & pmu->counter_bitmask[KVM_PMC_FIXED];
return 0;
break;
} else if ((pmc = get_gp_pmc(pmu, msr, MSR_P6_EVNTSEL0))) {
msr_info->data = pmc->eventsel;
return 0;
} else if (intel_pmu_handle_lbr_msrs_access(vcpu, msr_info, true))
return 0;
break;
} else if (intel_pmu_handle_lbr_msrs_access(vcpu, msr_info, true)) {
break;
}
return 1;
}

return 1;
return 0;
}

static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
@@ -402,44 +404,43 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)

switch (msr) {
case MSR_CORE_PERF_FIXED_CTR_CTRL:
if (pmu->fixed_ctr_ctrl == data)
return 0;
if (!(data & pmu->fixed_ctr_ctrl_mask)) {
if (data & pmu->fixed_ctr_ctrl_mask)
return 1;

if (pmu->fixed_ctr_ctrl != data)
reprogram_fixed_counters(pmu, data);
return 0;
}
break;
case MSR_CORE_PERF_GLOBAL_STATUS:
if (msr_info->host_initiated) {
pmu->global_status = data;
return 0;
}
break; /* RO MSR */
if (!msr_info->host_initiated)
return 1; /* RO MSR */

pmu->global_status = data;
break;
case MSR_CORE_PERF_GLOBAL_CTRL:
if (pmu->global_ctrl == data)
return 0;
if (kvm_valid_perf_global_ctrl(pmu, data)) {
if (!kvm_valid_perf_global_ctrl(pmu, data))
return 1;

if (pmu->global_ctrl != data) {
diff = pmu->global_ctrl ^ data;
pmu->global_ctrl = data;
reprogram_counters(pmu, diff);
return 0;
}
break;
case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
if (!(data & pmu->global_ovf_ctrl_mask)) {
if (!msr_info->host_initiated)
pmu->global_status &= ~data;
return 0;
}
if (data & pmu->global_ovf_ctrl_mask)
return 1;

if (!msr_info->host_initiated)
pmu->global_status &= ~data;
break;
case MSR_IA32_PEBS_ENABLE:
if (pmu->pebs_enable == data)
return 0;
if (!(data & pmu->pebs_enable_mask)) {
if (data & pmu->pebs_enable_mask)
return 1;

if (pmu->pebs_enable != data) {
diff = pmu->pebs_enable ^ data;
pmu->pebs_enable = data;
reprogram_counters(pmu, diff);
return 0;
}
break;
case MSR_IA32_DS_AREA:
@@ -447,15 +448,14 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return 1;
if (is_noncanonical_address(data, vcpu))
return 1;

pmu->ds_area = data;
return 0;
break;
case MSR_PEBS_DATA_CFG:
if (pmu->pebs_data_cfg == data)
return 0;
if (!(data & pmu->pebs_data_cfg_mask)) {
pmu->pebs_data_cfg = data;
return 0;
}
if (data & pmu->pebs_data_cfg_mask)
return 1;

pmu->pebs_data_cfg = data;
break;
default:
if ((pmc = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0)) ||
@@ -463,33 +463,38 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
if ((msr & MSR_PMC_FULL_WIDTH_BIT) &&
(data & ~pmu->counter_bitmask[KVM_PMC_GP]))
return 1;

if (!msr_info->host_initiated &&
!(msr & MSR_PMC_FULL_WIDTH_BIT))
data = (s64)(s32)data;
pmc->counter += data - pmc_read_counter(pmc);
pmc_update_sample_period(pmc);
return 0;
break;
} else if ((pmc = get_fixed_pmc(pmu, msr))) {
pmc->counter += data - pmc_read_counter(pmc);
pmc_update_sample_period(pmc);
return 0;
break;
} else if ((pmc = get_gp_pmc(pmu, msr, MSR_P6_EVNTSEL0))) {
if (data == pmc->eventsel)
return 0;
reserved_bits = pmu->reserved_bits;
if ((pmc->idx == 2) &&
(pmu->raw_event_mask & HSW_IN_TX_CHECKPOINTED))
reserved_bits ^= HSW_IN_TX_CHECKPOINTED;
if (!(data & reserved_bits)) {
if (data & reserved_bits)
return 1;

if (data != pmc->eventsel) {
pmc->eventsel = data;
kvm_pmu_request_counter_reprogam(pmc);
return 0;
kvm_pmu_request_counter_reprogram(pmc);
}
} else if (intel_pmu_handle_lbr_msrs_access(vcpu, msr_info, false))
return 0;
break;
} else if (intel_pmu_handle_lbr_msrs_access(vcpu, msr_info, false)) {
break;
}
/* Not a known PMU MSR. */
return 1;
}

return 1;
return 0;
}

static void setup_fixed_pmc_eventsel(struct kvm_pmu *pmu)
@@ -531,6 +536,16 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
pmu->pebs_enable_mask = ~0ull;
pmu->pebs_data_cfg_mask = ~0ull;

memset(&lbr_desc->records, 0, sizeof(lbr_desc->records));

/*
* Setting passthrough of LBR MSRs is done only in the VM-Entry loop,
* and PMU refresh is disallowed after the vCPU has run, i.e. this code
* should never be reached while KVM is passing through MSRs.
*/
if (KVM_BUG_ON(lbr_desc->msr_passthrough, vcpu->kvm))
return;

entry = kvm_find_cpuid_entry(vcpu, 0xa);
if (!entry || !vcpu->kvm->arch.enable_pmu)
return;

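The intel_pmu_set_msr() rewrite converts every case to "validate, fail early with 1, mutate, break", funneling all successes through a single return 0 at the bottom. The control-flow skeleton, reduced to a toy handler with made-up MSR numbers:

    #include <stdint.h>
    #include <stdio.h>

    static uint64_t ds_area;
    static uint64_t pebs_cfg, pebs_cfg_rsvd = ~0xfULL; /* toy reserved mask */

    static int toy_set_msr(uint32_t msr, uint64_t data)
    {
        switch (msr) {
        case 0x600:                 /* stand-in for MSR_IA32_DS_AREA */
            ds_area = data;
            break;
        case 0x3f2:                 /* stand-in for MSR_PEBS_DATA_CFG */
            if (data & pebs_cfg_rsvd)
                return 1;           /* reject before mutating */
            pebs_cfg = data;
            break;
        default:
            return 1;               /* not a known MSR */
        }
        return 0;                   /* single success exit */
    }

    int main(void)
    {
        printf("%d %d\n", toy_set_msr(0x600, 42), toy_set_msr(0x3f2, 0x100));
        return 0;
    }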
@@ -29,14 +29,14 @@ static int sgx_get_encls_gva(struct kvm_vcpu *vcpu, unsigned long offset,

/* Skip vmcs.GUEST_DS retrieval for 64-bit mode to avoid VMREADs. */
*gva = offset;
if (!is_long_mode(vcpu)) {
if (!is_64_bit_mode(vcpu)) {
vmx_get_segment(vcpu, &s, VCPU_SREG_DS);
*gva += s.base;
}

if (!IS_ALIGNED(*gva, alignment)) {
fault = true;
} else if (likely(is_long_mode(vcpu))) {
} else if (likely(is_64_bit_mode(vcpu))) {
fault = is_noncanonical_address(*gva, vcpu);
} else {
*gva &= 0xffffffff;

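The sgx_get_encls_gva() fix matters because long mode alone does not imply 64-bit addressing: compatibility-mode code runs with EFER.LMA=1 but still uses DS.base and 32-bit truncation. The distinction, sketched with a toy vcpu (these are not KVM's actual accessors):

    #include <stdbool.h>
    #include <stdio.h>

    struct toy_vcpu {
        bool efer_lma;  /* long mode active */
        bool cs_l;      /* CS.L: 64-bit code segment */
    };

    static bool is_long_mode(const struct toy_vcpu *v)
    {
        return v->efer_lma;
    }

    static bool is_64_bit_mode(const struct toy_vcpu *v)
    {
        return v->efer_lma && v->cs_l;  /* excludes compatibility mode */
    }

    int main(void)
    {
        struct toy_vcpu compat = { .efer_lma = true, .cs_l = false };

        printf("long=%d 64bit=%d\n", is_long_mode(&compat),
               is_64_bit_mode(&compat));
        return 0;
    }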
@@ -164,6 +164,7 @@ module_param(allow_smaller_maxphyaddr, bool, S_IRUGO);
static u32 vmx_possible_passthrough_msrs[MAX_POSSIBLE_PASSTHROUGH_MSRS] = {
MSR_IA32_SPEC_CTRL,
MSR_IA32_PRED_CMD,
MSR_IA32_FLUSH_CMD,
MSR_IA32_TSC,
#ifdef CONFIG_X86_64
MSR_FS_BASE,
@@ -579,7 +580,7 @@ static __init void hv_init_evmcs(void)

if (enlightened_vmcs) {
pr_info("Using Hyper-V Enlightened VMCS\n");
static_branch_enable(&enable_evmcs);
static_branch_enable(&__kvm_is_using_evmcs);
}

if (ms_hyperv.nested_features & HV_X64_NESTED_DIRECT_FLUSH)
@@ -595,7 +596,7 @@ static void hv_reset_evmcs(void)
{
struct hv_vp_assist_page *vp_ap;

if (!static_branch_unlikely(&enable_evmcs))
if (!kvm_is_using_evmcs())
return;

/*
@@ -1945,7 +1946,7 @@ static inline bool is_vmx_feature_control_msr_valid(struct vcpu_vmx *vmx,
static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
{
switch (msr->index) {
case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC:
case KVM_FIRST_EMULATED_VMX_MSR ... KVM_LAST_EMULATED_VMX_MSR:
if (!nested)
return 1;
return vmx_get_vmx_msr(&vmcs_config.nested, msr->index, &msr->data);
@@ -2030,7 +2031,7 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
msr_info->data = to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash
[msr_info->index - MSR_IA32_SGXLEPUBKEYHASH0];
break;
case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC:
case KVM_FIRST_EMULATED_VMX_MSR ... KVM_LAST_EMULATED_VMX_MSR:
if (!nested_vmx_allowed(vcpu))
return 1;
if (vmx_get_vmx_msr(&vmx->nested.msrs, msr_info->index,
@@ -2285,33 +2286,6 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
if (data & ~(TSX_CTRL_RTM_DISABLE | TSX_CTRL_CPUID_CLEAR))
return 1;
goto find_uret_msr;
case MSR_IA32_PRED_CMD:
if (!msr_info->host_initiated &&
!guest_has_pred_cmd_msr(vcpu))
return 1;

if (data & ~PRED_CMD_IBPB)
return 1;
if (!boot_cpu_has(X86_FEATURE_IBPB))
return 1;
if (!data)
break;

wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);

/*
* For non-nested:
* When it's written (to non-zero) for the first time, pass
* it through.
*
* For nested:
* The handling of the MSR bitmap for L2 guests is done in
* nested_vmx_prepare_msr_bitmap. We should not touch the
* vmcs02.msr_bitmap here since it gets completely overwritten
* in the merging.
*/
vmx_disable_intercept_for_msr(vcpu, MSR_IA32_PRED_CMD, MSR_TYPE_W);
break;
case MSR_IA32_CR_PAT:
if (!kvm_pat_valid(data))
return 1;
@@ -2366,7 +2340,7 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
vmx->msr_ia32_sgxlepubkeyhash
[msr_index - MSR_IA32_SGXLEPUBKEYHASH0] = data;
break;
case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC:
case KVM_FIRST_EMULATED_VMX_MSR ... KVM_LAST_EMULATED_VMX_MSR:
if (!msr_info->host_initiated)
return 1; /* they are read-only */
if (!nested_vmx_allowed(vcpu))
@@ -2816,8 +2790,7 @@ static int vmx_hardware_enable(void)
* This can happen if we hot-added a CPU but failed to allocate
* VP assist page for it.
*/
if (static_branch_unlikely(&enable_evmcs) &&
!hv_get_vp_assist_page(cpu))
if (kvm_is_using_evmcs() && !hv_get_vp_assist_page(cpu))
return -EFAULT;

intel_pt_handle_vmx(1);
@@ -2869,7 +2842,7 @@ struct vmcs *alloc_vmcs_cpu(bool shadow, int cpu, gfp_t flags)
memset(vmcs, 0, vmcs_config.size);

/* KVM supports Enlightened VMCS v1 only */
if (static_branch_unlikely(&enable_evmcs))
if (kvm_is_using_evmcs())
vmcs->hdr.revision_id = KVM_EVMCS_VERSION;
else
vmcs->hdr.revision_id = vmcs_config.revision_id;
@@ -2964,7 +2937,7 @@ static __init int alloc_kvm_area(void)
* still be marked with revision_id reported by
* physical CPU.
*/
if (static_branch_unlikely(&enable_evmcs))
if (kvm_is_using_evmcs())
vmcs->hdr.revision_id = vmcs_config.revision_id;

per_cpu(vmxarea, cpu) = vmcs;
@@ -3931,7 +3904,7 @@ static void vmx_msr_bitmap_l01_changed(struct vcpu_vmx *vmx)
* 'Enlightened MSR Bitmap' feature L0 needs to know that MSR
* bitmap has changed.
*/
if (IS_ENABLED(CONFIG_HYPERV) && static_branch_unlikely(&enable_evmcs)) {
if (kvm_is_using_evmcs()) {
struct hv_enlightened_vmcs *evmcs = (void *)vmx->vmcs01.vmcs;

if (evmcs->hv_enlightenments_control.msr_bitmap)
@@ -4773,7 +4746,7 @@ static void init_vmcs(struct vcpu_vmx *vmx)
/* 22.2.1, 20.8.1 */
vm_entry_controls_set(vmx, vmx_vmentry_ctrl());

vmx->vcpu.arch.cr0_guest_owned_bits = KVM_POSSIBLE_CR0_GUEST_BITS;
vmx->vcpu.arch.cr0_guest_owned_bits = vmx_l1_guest_owned_cr0_bits();
vmcs_writel(CR0_GUEST_HOST_MASK, ~vmx->vcpu.arch.cr0_guest_owned_bits);

set_cr4_guest_host_mask(vmx);
@@ -5163,7 +5136,7 @@ bool vmx_guest_inject_ac(struct kvm_vcpu *vcpu)
if (!boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT))
return true;

return vmx_get_cpl(vcpu) == 3 && kvm_read_cr0_bits(vcpu, X86_CR0_AM) &&
return vmx_get_cpl(vcpu) == 3 && kvm_is_cr0_bit_set(vcpu, X86_CR0_AM) &&
(kvm_get_rflags(vcpu) & X86_EFLAGS_AC);
}

@@ -5500,7 +5473,7 @@ static int handle_cr(struct kvm_vcpu *vcpu)
break;
case 3: /* lmsw */
val = (exit_qualification >> LMSW_SOURCE_DATA_SHIFT) & 0x0f;
trace_kvm_cr_write(0, (kvm_read_cr0(vcpu) & ~0xful) | val);
trace_kvm_cr_write(0, (kvm_read_cr0_bits(vcpu, ~0xful) | val));
kvm_lmsw(vcpu, val);

return kvm_skip_emulated_instruction(vcpu);
@@ -6957,7 +6930,7 @@ static bool vmx_has_emulated_msr(struct kvm *kvm, u32 index)
* real mode.
*/
return enable_unrestricted_guest || emulate_invalid_guest_state;
case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC:
case KVM_FIRST_EMULATED_VMX_MSR ... KVM_LAST_EMULATED_VMX_MSR:
return nested;
case MSR_AMD64_VIRT_SPEC_CTRL:
case MSR_AMD64_TSC_RATIO:
@@ -7310,7 +7283,7 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
vmx_vcpu_enter_exit(vcpu, __vmx_vcpu_run_flags(vmx));

/* All fields are clean at this point */
if (static_branch_unlikely(&enable_evmcs)) {
if (kvm_is_using_evmcs()) {
current_evmcs->hv_clean_fields |=
HV_VMX_ENLIGHTENED_CLEAN_FIELD_ALL;

@@ -7440,7 +7413,7 @@ static int vmx_vcpu_create(struct kvm_vcpu *vcpu)
* feature only for vmcs01, KVM currently isn't equipped to realize any
* performance benefits from enabling it for vmcs02.
*/
if (IS_ENABLED(CONFIG_HYPERV) && static_branch_unlikely(&enable_evmcs) &&
if (kvm_is_using_evmcs() &&
(ms_hyperv.nested_features & HV_X64_NESTED_MSR_BITMAP)) {
struct hv_enlightened_vmcs *evmcs = (void *)vmx->vmcs01.vmcs;

@@ -7558,7 +7531,7 @@ static u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
|
||||
if (!kvm_arch_has_noncoherent_dma(vcpu->kvm))
|
||||
return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT;
|
||||
|
||||
if (kvm_read_cr0(vcpu) & X86_CR0_CD) {
|
||||
if (kvm_read_cr0_bits(vcpu, X86_CR0_CD)) {
|
||||
if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED))
|
||||
cache = MTRR_TYPE_WRBACK;
|
||||
else
|
||||
@ -7744,6 +7717,13 @@ static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
|
||||
vmx_set_intercept_for_msr(vcpu, MSR_IA32_XFD_ERR, MSR_TYPE_R,
|
||||
!guest_cpuid_has(vcpu, X86_FEATURE_XFD));
|
||||
|
||||
if (boot_cpu_has(X86_FEATURE_IBPB))
|
||||
vmx_set_intercept_for_msr(vcpu, MSR_IA32_PRED_CMD, MSR_TYPE_W,
|
||||
!guest_has_pred_cmd_msr(vcpu));
|
||||
|
||||
if (boot_cpu_has(X86_FEATURE_FLUSH_L1D))
|
||||
vmx_set_intercept_for_msr(vcpu, MSR_IA32_FLUSH_CMD, MSR_TYPE_W,
|
||||
!guest_cpuid_has(vcpu, X86_FEATURE_FLUSH_L1D));
|
||||
|
||||
set_cr4_guest_host_mask(vmx);
|
||||
|
||||
@ -7776,9 +7756,11 @@ static u64 vmx_get_perf_capabilities(void)
|
||||
if (boot_cpu_has(X86_FEATURE_PDCM))
|
||||
rdmsrl(MSR_IA32_PERF_CAPABILITIES, host_perf_cap);
|
||||
|
||||
x86_perf_get_lbr(&lbr);
|
||||
if (lbr.nr)
|
||||
perf_cap |= host_perf_cap & PMU_CAP_LBR_FMT;
|
||||
if (!cpu_feature_enabled(X86_FEATURE_ARCH_LBR)) {
|
||||
x86_perf_get_lbr(&lbr);
|
||||
if (lbr.nr)
|
||||
perf_cap |= host_perf_cap & PMU_CAP_LBR_FMT;
|
||||
}
|
||||
|
||||
if (vmx_pebs_supported()) {
|
||||
perf_cap |= host_perf_cap & PERF_CAP_PEBS_MASK;
|
||||
@ -7918,6 +7900,21 @@ static int vmx_check_intercept(struct kvm_vcpu *vcpu,
|
||||
/* FIXME: produce nested vmexit and return X86EMUL_INTERCEPTED. */
|
||||
break;
|
||||
|
||||
case x86_intercept_pause:
|
||||
/*
|
||||
* PAUSE is a single-byte NOP with a REPE prefix, i.e. collides
|
||||
* with vanilla NOPs in the emulator. Apply the interception
|
||||
* check only to actual PAUSE instructions. Don't check
|
||||
* PAUSE-loop-exiting, software can't expect a given PAUSE to
|
||||
* exit, i.e. KVM is within its rights to allow L2 to execute
|
||||
* the PAUSE.
|
||||
*/
|
||||
if ((info->rep_prefix != REPE_PREFIX) ||
|
||||
!nested_cpu_has2(vmcs12, CPU_BASED_PAUSE_EXITING))
|
||||
return X86EMUL_CONTINUE;
|
||||
|
||||
break;
|
||||
|
||||
/* TODO: check more intercepts... */
|
||||
default:
|
||||
break;
|
||||
@ -8415,9 +8412,8 @@ static __init int hardware_setup(void)
|
||||
#if IS_ENABLED(CONFIG_HYPERV)
|
||||
if (ms_hyperv.nested_features & HV_X64_NESTED_GUEST_MAPPING_FLUSH
|
||||
&& enable_ept) {
|
||||
vmx_x86_ops.tlb_remote_flush = hv_remote_flush_tlb;
|
||||
vmx_x86_ops.tlb_remote_flush_with_range =
|
||||
hv_remote_flush_tlb_with_range;
|
||||
vmx_x86_ops.flush_remote_tlbs = hv_flush_remote_tlbs;
|
||||
vmx_x86_ops.flush_remote_tlbs_range = hv_flush_remote_tlbs_range;
|
||||
}
|
||||
#endif
|
||||
|
||||
|
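A note on the eVMCS conversion running through the hunks above: the repeated open-coded static-branch checks are folded behind a named predicate. A minimal sketch of what such a wrapper looks like (the exact definition lives in KVM's Hyper-V support code; treat the shape below as an assumption, not a quote):

	/* Sketch: a static-key-backed predicate.  The key is patched into the
	 * instruction stream at runtime, so the "not using eVMCS" fast path
	 * costs essentially nothing on non-Hyper-V hosts. */
	DECLARE_STATIC_KEY_FALSE(__kvm_is_using_evmcs);

	static __always_inline bool kvm_is_using_evmcs(void)
	{
		return static_branch_unlikely(&__kvm_is_using_evmcs);
	}

Besides readability, a helper like this lets the !IS_ENABLED(CONFIG_HYPERV) build compile the checks away entirely, which is why the explicit IS_ENABLED() tests disappear from the call sites above.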
@@ -369,7 +369,7 @@ struct vcpu_vmx {
 	struct lbr_desc lbr_desc;
 
 	/* Save desired MSR intercept (read: pass-through) state */
-#define MAX_POSSIBLE_PASSTHROUGH_MSRS	15
+#define MAX_POSSIBLE_PASSTHROUGH_MSRS	16
 	struct {
 		DECLARE_BITMAP(read, MAX_POSSIBLE_PASSTHROUGH_MSRS);
 		DECLARE_BITMAP(write, MAX_POSSIBLE_PASSTHROUGH_MSRS);
@@ -640,6 +640,24 @@ BUILD_CONTROLS_SHADOW(tertiary_exec, TERTIARY_VM_EXEC_CONTROL, 64)
 				(1 << VCPU_EXREG_EXIT_INFO_1) | \
 				(1 << VCPU_EXREG_EXIT_INFO_2))
 
+static inline unsigned long vmx_l1_guest_owned_cr0_bits(void)
+{
+	unsigned long bits = KVM_POSSIBLE_CR0_GUEST_BITS;
+
+	/*
+	 * CR0.WP needs to be intercepted when KVM is shadowing legacy paging
+	 * in order to construct shadow PTEs with the correct protections.
+	 * Note!  CR0.WP technically can be passed through to the guest if
+	 * paging is disabled, but checking CR0.PG would generate a cyclical
+	 * dependency of sorts due to forcing the caller to ensure CR0 holds
+	 * the correct value prior to determining which CR0 bits can be owned
+	 * by L1.  Keep it simple and limit the optimization to EPT.
+	 */
+	if (!enable_ept)
+		bits &= ~X86_CR0_WP;
+	return bits;
+}
+
 static __always_inline struct kvm_vmx *to_kvm_vmx(struct kvm *kvm)
 {
 	return container_of(kvm, struct kvm_vmx, kvm);
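A quick model of what vmx_l1_guest_owned_cr0_bits() buys. Under VMX, bits set in CR0_GUEST_HOST_MASK are host-owned: guest writes to them trap, and guest reads of them are served from the read shadow. A toy illustration of those semantics (illustrative only, not kernel code):

	/* Guest-visible CR0: guest-owned bits come from the live register,
	 * host-owned bits (mask bit = 1) come from the read shadow. */
	static unsigned long guest_visible_cr0(unsigned long hw_cr0,
					       unsigned long read_shadow,
					       unsigned long guest_host_mask)
	{
		return (hw_cr0 & ~guest_host_mask) |
		       (read_shadow & guest_host_mask);
	}

With EPT enabled, CR0.WP stays guest-owned per the helper above, so a guest that toggles WP frequently (e.g. for kernel text patching) no longer takes a VM-exit on every toggle.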
@@ -147,7 +147,7 @@ do_exception:
 static __always_inline u16 vmcs_read16(unsigned long field)
 {
 	vmcs_check16(field);
-	if (static_branch_unlikely(&enable_evmcs))
+	if (kvm_is_using_evmcs())
 		return evmcs_read16(field);
 	return __vmcs_readl(field);
 }
@@ -155,7 +155,7 @@ static __always_inline u16 vmcs_read16(unsigned long field)
 static __always_inline u32 vmcs_read32(unsigned long field)
 {
 	vmcs_check32(field);
-	if (static_branch_unlikely(&enable_evmcs))
+	if (kvm_is_using_evmcs())
 		return evmcs_read32(field);
 	return __vmcs_readl(field);
 }
@@ -163,7 +163,7 @@ static __always_inline u32 vmcs_read32(unsigned long field)
 static __always_inline u64 vmcs_read64(unsigned long field)
 {
 	vmcs_check64(field);
-	if (static_branch_unlikely(&enable_evmcs))
+	if (kvm_is_using_evmcs())
 		return evmcs_read64(field);
 #ifdef CONFIG_X86_64
 	return __vmcs_readl(field);
@@ -175,7 +175,7 @@ static __always_inline u64 vmcs_read64(unsigned long field)
 static __always_inline unsigned long vmcs_readl(unsigned long field)
 {
 	vmcs_checkl(field);
-	if (static_branch_unlikely(&enable_evmcs))
+	if (kvm_is_using_evmcs())
 		return evmcs_read64(field);
 	return __vmcs_readl(field);
 }
@@ -222,7 +222,7 @@ static __always_inline void __vmcs_writel(unsigned long field, unsigned long val
 static __always_inline void vmcs_write16(unsigned long field, u16 value)
 {
 	vmcs_check16(field);
-	if (static_branch_unlikely(&enable_evmcs))
+	if (kvm_is_using_evmcs())
 		return evmcs_write16(field, value);
 
 	__vmcs_writel(field, value);
@@ -231,7 +231,7 @@ static __always_inline void vmcs_write16(unsigned long field, u16 value)
 static __always_inline void vmcs_write32(unsigned long field, u32 value)
 {
 	vmcs_check32(field);
-	if (static_branch_unlikely(&enable_evmcs))
+	if (kvm_is_using_evmcs())
 		return evmcs_write32(field, value);
 
 	__vmcs_writel(field, value);
@@ -240,7 +240,7 @@ static __always_inline void vmcs_write32(unsigned long field, u32 value)
 static __always_inline void vmcs_write64(unsigned long field, u64 value)
 {
 	vmcs_check64(field);
-	if (static_branch_unlikely(&enable_evmcs))
+	if (kvm_is_using_evmcs())
 		return evmcs_write64(field, value);
 
 	__vmcs_writel(field, value);
@@ -252,7 +252,7 @@ static __always_inline void vmcs_write64(unsigned long field, u64 value)
 static __always_inline void vmcs_writel(unsigned long field, unsigned long value)
 {
 	vmcs_checkl(field);
-	if (static_branch_unlikely(&enable_evmcs))
+	if (kvm_is_using_evmcs())
 		return evmcs_write64(field, value);
 
 	__vmcs_writel(field, value);
@@ -262,7 +262,7 @@ static __always_inline void vmcs_clear_bits(unsigned long field, u32 mask)
 {
 	BUILD_BUG_ON_MSG(__builtin_constant_p(field) && ((field) & 0x6000) == 0x2000,
			 "vmcs_clear_bits does not support 64-bit fields");
-	if (static_branch_unlikely(&enable_evmcs))
+	if (kvm_is_using_evmcs())
 		return evmcs_write32(field, evmcs_read32(field) & ~mask);
 
 	__vmcs_writel(field, __vmcs_readl(field) & ~mask);
@@ -272,7 +272,7 @@ static __always_inline void vmcs_set_bits(unsigned long field, u32 mask)
 {
 	BUILD_BUG_ON_MSG(__builtin_constant_p(field) && ((field) & 0x6000) == 0x2000,
			 "vmcs_set_bits does not support 64-bit fields");
-	if (static_branch_unlikely(&enable_evmcs))
+	if (kvm_is_using_evmcs())
 		return evmcs_write32(field, evmcs_read32(field) | mask);
 
 	__vmcs_writel(field, __vmcs_readl(field) | mask);
@@ -289,7 +289,7 @@ static inline void vmcs_load(struct vmcs *vmcs)
 {
 	u64 phys_addr = __pa(vmcs);
 
-	if (static_branch_unlikely(&enable_evmcs))
+	if (kvm_is_using_evmcs())
 		return evmcs_load(phys_addr);
 
 	vmx_asm1(vmptrld, "m"(phys_addr), vmcs, phys_addr);
@@ -196,7 +196,7 @@ bool __read_mostly eager_page_split = true;
 module_param(eager_page_split, bool, 0644);
 
 /* Enable/disable SMT_RSB bug mitigation */
-bool __read_mostly mitigate_smt_rsb;
+static bool __read_mostly mitigate_smt_rsb;
 module_param(mitigate_smt_rsb, bool, 0444);
 
 /*
@@ -804,8 +804,8 @@ void kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
 	 */
 	if ((fault->error_code & PFERR_PRESENT_MASK) &&
 	    !(fault->error_code & PFERR_RSVD_MASK))
-		kvm_mmu_invalidate_gva(vcpu, fault_mmu, fault->address,
-				       fault_mmu->root.hpa);
+		kvm_mmu_invalidate_addr(vcpu, fault_mmu, fault->address,
+					KVM_MMU_ROOT_CURRENT);
 
 	fault_mmu->inject_page_fault(vcpu, fault);
 }
@@ -843,7 +843,7 @@ bool kvm_require_cpl(struct kvm_vcpu *vcpu, int required_cpl)
 
 bool kvm_require_dr(struct kvm_vcpu *vcpu, int dr)
 {
-	if ((dr != 4 && dr != 5) || !kvm_read_cr4_bits(vcpu, X86_CR4_DE))
+	if ((dr != 4 && dr != 5) || !kvm_is_cr4_bit_set(vcpu, X86_CR4_DE))
 		return true;
 
 	kvm_queue_exception(vcpu, UD_VECTOR);
@@ -908,6 +908,24 @@ EXPORT_SYMBOL_GPL(load_pdptrs);
 
 void kvm_post_set_cr0(struct kvm_vcpu *vcpu, unsigned long old_cr0, unsigned long cr0)
 {
+	/*
+	 * CR0.WP is incorporated into the MMU role, but only for non-nested,
+	 * indirect shadow MMUs.  If paging is disabled, no updates are needed
+	 * as there are no permission bits to emulate.  If TDP is enabled, the
+	 * MMU's metadata needs to be updated, e.g. so that emulating guest
+	 * translations does the right thing, but there's no need to unload the
+	 * root as CR0.WP doesn't affect SPTEs.
+	 */
+	if ((cr0 ^ old_cr0) == X86_CR0_WP) {
+		if (!(cr0 & X86_CR0_PG))
+			return;
+
+		if (tdp_enabled) {
+			kvm_init_mmu(vcpu);
+			return;
+		}
+	}
+
 	if ((cr0 ^ old_cr0) & X86_CR0_PG) {
 		kvm_clear_async_pf_completion_queue(vcpu);
 		kvm_async_pf_hash_reset(vcpu);
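The `(cr0 ^ old_cr0) == X86_CR0_WP` test above reads as "WP is the only bit that changed". As a standalone illustration of the idiom (not KVM code):

	unsigned long diff = cr0 ^ old_cr0;	/* set only where old/new disagree */
	bool only_wp_changed = (diff == X86_CR0_WP);

Comparing the XOR against a single-bit mask, rather than testing `diff & X86_CR0_WP`, is what makes it safe to take the cheap path: any other simultaneous CR0 change falls through to the full handling below.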
@@ -967,7 +985,7 @@ int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
 		return 1;
 
 	if (!(cr0 & X86_CR0_PG) &&
-	    (is_64_bit_mode(vcpu) || kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE)))
+	    (is_64_bit_mode(vcpu) || kvm_is_cr4_bit_set(vcpu, X86_CR4_PCIDE)))
 		return 1;
 
 	static_call(kvm_x86_set_cr0)(vcpu, cr0);
@@ -989,7 +1007,7 @@ void kvm_load_guest_xsave_state(struct kvm_vcpu *vcpu)
 	if (vcpu->arch.guest_state_protected)
 		return;
 
-	if (kvm_read_cr4_bits(vcpu, X86_CR4_OSXSAVE)) {
+	if (kvm_is_cr4_bit_set(vcpu, X86_CR4_OSXSAVE)) {
 
 		if (vcpu->arch.xcr0 != host_xcr0)
 			xsetbv(XCR_XFEATURE_ENABLED_MASK, vcpu->arch.xcr0);
@@ -1003,7 +1021,7 @@ void kvm_load_guest_xsave_state(struct kvm_vcpu *vcpu)
 	if (static_cpu_has(X86_FEATURE_PKU) &&
 	    vcpu->arch.pkru != vcpu->arch.host_pkru &&
 	    ((vcpu->arch.xcr0 & XFEATURE_MASK_PKRU) ||
-	     kvm_read_cr4_bits(vcpu, X86_CR4_PKE)))
+	     kvm_is_cr4_bit_set(vcpu, X86_CR4_PKE)))
 		write_pkru(vcpu->arch.pkru);
 #endif /* CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS */
 }
@@ -1017,14 +1035,14 @@ void kvm_load_host_xsave_state(struct kvm_vcpu *vcpu)
 #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
 	if (static_cpu_has(X86_FEATURE_PKU) &&
 	    ((vcpu->arch.xcr0 & XFEATURE_MASK_PKRU) ||
-	     kvm_read_cr4_bits(vcpu, X86_CR4_PKE))) {
+	     kvm_is_cr4_bit_set(vcpu, X86_CR4_PKE))) {
 		vcpu->arch.pkru = rdpkru();
 		if (vcpu->arch.pkru != vcpu->arch.host_pkru)
 			write_pkru(vcpu->arch.host_pkru);
 	}
 #endif /* CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS */
 
-	if (kvm_read_cr4_bits(vcpu, X86_CR4_OSXSAVE)) {
+	if (kvm_is_cr4_bit_set(vcpu, X86_CR4_OSXSAVE)) {
 
 		if (vcpu->arch.xcr0 != host_xcr0)
 			xsetbv(XCR_XFEATURE_ENABLED_MASK, host_xcr0);
@@ -1180,9 +1198,6 @@ int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 		return 1;
 
 	if ((cr4 & X86_CR4_PCIDE) && !(old_cr4 & X86_CR4_PCIDE)) {
-		if (!guest_cpuid_has(vcpu, X86_FEATURE_PCID))
-			return 1;
-
 		/* PCID can not be enabled when cr3[11:0]!=000H or EFER.LMA=0 */
 		if ((kvm_read_cr3(vcpu) & X86_CR3_PCID_MASK) || !is_long_mode(vcpu))
 			return 1;
@@ -1229,7 +1244,7 @@ static void kvm_invalidate_pcid(struct kvm_vcpu *vcpu, unsigned long pcid)
 	 * PCIDs for them are also 0, because MOV to CR3 always flushes the TLB
 	 * with PCIDE=0.
 	 */
-	if (!kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE))
+	if (!kvm_is_cr4_bit_set(vcpu, X86_CR4_PCIDE))
 		return;
 
 	for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
@@ -1244,9 +1259,7 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 	bool skip_tlb_flush = false;
 	unsigned long pcid = 0;
 #ifdef CONFIG_X86_64
-	bool pcid_enabled = kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);
-
-	if (pcid_enabled) {
+	if (kvm_is_cr4_bit_set(vcpu, X86_CR4_PCIDE)) {
 		skip_tlb_flush = cr3 & X86_CR3_PCID_NOFLUSH;
 		cr3 &= ~X86_CR3_PCID_NOFLUSH;
 		pcid = cr3 & X86_CR3_PCID_MASK;
@@ -1545,38 +1558,40 @@ static u32 emulated_msrs[ARRAY_SIZE(emulated_msrs_all)];
 static unsigned num_emulated_msrs;
 
 /*
- * List of msr numbers which are used to expose MSR-based features that
- * can be used by a hypervisor to validate requested CPU features.
+ * List of MSRs that control the existence of MSR-based features, i.e. MSRs
+ * that are effectively CPUID leafs.  VMX MSRs are also included in the set of
+ * feature MSRs, but are handled separately to allow expedited lookups.
 */
-static const u32 msr_based_features_all[] = {
-	MSR_IA32_VMX_BASIC,
-	MSR_IA32_VMX_TRUE_PINBASED_CTLS,
-	MSR_IA32_VMX_PINBASED_CTLS,
-	MSR_IA32_VMX_TRUE_PROCBASED_CTLS,
-	MSR_IA32_VMX_PROCBASED_CTLS,
-	MSR_IA32_VMX_TRUE_EXIT_CTLS,
-	MSR_IA32_VMX_EXIT_CTLS,
-	MSR_IA32_VMX_TRUE_ENTRY_CTLS,
-	MSR_IA32_VMX_ENTRY_CTLS,
-	MSR_IA32_VMX_MISC,
-	MSR_IA32_VMX_CR0_FIXED0,
-	MSR_IA32_VMX_CR0_FIXED1,
-	MSR_IA32_VMX_CR4_FIXED0,
-	MSR_IA32_VMX_CR4_FIXED1,
-	MSR_IA32_VMX_VMCS_ENUM,
-	MSR_IA32_VMX_PROCBASED_CTLS2,
-	MSR_IA32_VMX_EPT_VPID_CAP,
-	MSR_IA32_VMX_VMFUNC,
-
+static const u32 msr_based_features_all_except_vmx[] = {
 	MSR_AMD64_DE_CFG,
 	MSR_IA32_UCODE_REV,
 	MSR_IA32_ARCH_CAPABILITIES,
 	MSR_IA32_PERF_CAPABILITIES,
 };
 
-static u32 msr_based_features[ARRAY_SIZE(msr_based_features_all)];
+static u32 msr_based_features[ARRAY_SIZE(msr_based_features_all_except_vmx) +
+			      (KVM_LAST_EMULATED_VMX_MSR - KVM_FIRST_EMULATED_VMX_MSR + 1)];
 static unsigned int num_msr_based_features;
 
+/*
+ * All feature MSRs except uCode revID, which tracks the currently loaded uCode
+ * patch, are immutable once the vCPU model is defined.
+ */
+static bool kvm_is_immutable_feature_msr(u32 msr)
+{
+	int i;
+
+	if (msr >= KVM_FIRST_EMULATED_VMX_MSR && msr <= KVM_LAST_EMULATED_VMX_MSR)
+		return true;
+
+	for (i = 0; i < ARRAY_SIZE(msr_based_features_all_except_vmx); i++) {
+		if (msr == msr_based_features_all_except_vmx[i])
+			return msr != MSR_IA32_UCODE_REV;
+	}
+
+	return false;
+}
+
 /*
  * Some IA32_ARCH_CAPABILITIES bits have dependencies on MSRs that KVM
  * does not yet virtualize. These include:
@@ -2194,6 +2209,22 @@ static int do_get_msr(struct kvm_vcpu *vcpu, unsigned index, u64 *data)
 
 static int do_set_msr(struct kvm_vcpu *vcpu, unsigned index, u64 *data)
 {
+	u64 val;
+
+	/*
+	 * Disallow writes to immutable feature MSRs after KVM_RUN.  KVM does
+	 * not support modifying the guest vCPU model on the fly, e.g. changing
+	 * the nVMX capabilities while L2 is running is nonsensical.  Ignore
+	 * writes of the same value, e.g. to allow userspace to blindly stuff
+	 * all MSRs when emulating RESET.
+	 */
+	if (kvm_vcpu_has_run(vcpu) && kvm_is_immutable_feature_msr(index)) {
+		if (do_get_msr(vcpu, index, &val) || *data != val)
+			return -EINVAL;
+
+		return 0;
+	}
+
 	return kvm_set_msr_ignored_check(vcpu, index, *data, true);
 }
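One consequence for VMMs, sketched below under the assumption of a plain KVM_SET_MSRS caller (the fd names and the one-entry struct layout are illustrative): after KVM_RUN, rewriting an immutable feature MSR with its current value still succeeds, so blind save/restore flows keep working; only a write that actually changes the value now fails.

	#include <linux/kvm.h>
	#include <sys/ioctl.h>

	struct {
		struct kvm_msrs hdr;
		struct kvm_msr_entry entry;
	} msrs = {
		.hdr.nmsrs   = 1,
		.entry.index = 0x345,		/* MSR_IA32_PERF_CAPABILITIES */
		.entry.data  = saved_value,	/* same value as before KVM_RUN */
	};

	/* Returns 1 (one MSR processed) for an idempotent rewrite; changing
	 * the value after the vCPU has run is rejected instead. */
	ioctl(vcpu_fd, KVM_SET_MSRS, &msrs);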
@@ -3616,9 +3647,40 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		if (data & ~kvm_caps.supported_perf_cap)
 			return 1;
 
+		/*
+		 * Note, this is not just a performance optimization!  KVM
+		 * disallows changing feature MSRs after the vCPU has run; PMU
+		 * refresh will bug the VM if called after the vCPU has run.
+		 */
+		if (vcpu->arch.perf_capabilities == data)
+			break;
+
 		vcpu->arch.perf_capabilities = data;
 		kvm_pmu_refresh(vcpu);
-		return 0;
+		break;
+	case MSR_IA32_PRED_CMD:
+		if (!msr_info->host_initiated && !guest_has_pred_cmd_msr(vcpu))
+			return 1;
+
+		if (!boot_cpu_has(X86_FEATURE_IBPB) || (data & ~PRED_CMD_IBPB))
+			return 1;
+		if (!data)
+			break;
+
+		wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
+		break;
+	case MSR_IA32_FLUSH_CMD:
+		if (!msr_info->host_initiated &&
+		    !guest_cpuid_has(vcpu, X86_FEATURE_FLUSH_L1D))
+			return 1;
+
+		if (!boot_cpu_has(X86_FEATURE_FLUSH_L1D) || (data & ~L1D_FLUSH))
+			return 1;
+		if (!data)
+			break;
+
+		wrmsrl(MSR_IA32_FLUSH_CMD, L1D_FLUSH);
+		break;
 	case MSR_EFER:
 		return set_efer(vcpu, msr_info);
 	case MSR_K7_HWCR:
@@ -4534,9 +4596,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		r = 0;
 		break;
 	case KVM_CAP_XSAVE2: {
-		u64 guest_perm = xstate_get_guest_group_perm();
-
-		r = xstate_required_size(kvm_caps.supported_xcr0 & guest_perm, false);
+		r = xstate_required_size(kvm_get_filtered_xcr0(), false);
 		if (r < sizeof(struct kvm_xsave))
 			r = sizeof(struct kvm_xsave);
 		break;
@@ -5036,7 +5096,7 @@ static int kvm_vcpu_ioctl_x86_set_mce(struct kvm_vcpu *vcpu,
 		return 0;
 	if (mce->status & MCI_STATUS_UC) {
 		if ((vcpu->arch.mcg_status & MCG_STATUS_MCIP) ||
-		    !kvm_read_cr4_bits(vcpu, X86_CR4_MCE)) {
+		    !kvm_is_cr4_bit_set(vcpu, X86_CR4_MCE)) {
 			kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
 			return 0;
 		}
@@ -5128,7 +5188,7 @@ static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
 	events->interrupt.shadow = static_call(kvm_x86_get_interrupt_shadow)(vcpu);
 
 	events->nmi.injected = vcpu->arch.nmi_injected;
-	events->nmi.pending = vcpu->arch.nmi_pending != 0;
+	events->nmi.pending = kvm_get_nr_pending_nmis(vcpu);
 	events->nmi.masked = static_call(kvm_x86_get_nmi_mask)(vcpu);
 
 	/* events->sipi_vector is never valid when reporting to user space */
@@ -5215,8 +5275,11 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
					     events->interrupt.shadow);
 
 	vcpu->arch.nmi_injected = events->nmi.injected;
-	if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING)
-		vcpu->arch.nmi_pending = events->nmi.pending;
+	if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING) {
+		vcpu->arch.nmi_pending = 0;
+		atomic_set(&vcpu->arch.nmi_queued, events->nmi.pending);
+		kvm_make_request(KVM_REQ_NMI, vcpu);
+	}
 	static_call(kvm_x86_set_nmi_mask)(vcpu, events->nmi.masked);
 
 	if (events->flags & KVM_VCPUEVENT_VALID_SIPI_VECTOR &&
@@ -6024,11 +6087,6 @@ static int kvm_vm_ioctl_set_nr_mmu_pages(struct kvm *kvm,
 	return 0;
 }
 
-static unsigned long kvm_vm_ioctl_get_nr_mmu_pages(struct kvm *kvm)
-{
-	return kvm->arch.n_max_mmu_pages;
-}
-
 static int kvm_vm_ioctl_get_irqchip(struct kvm *kvm, struct kvm_irqchip *chip)
 {
 	struct kvm_pic *pic = kvm->arch.vpic;
@@ -6675,8 +6733,7 @@ static int kvm_vm_ioctl_set_clock(struct kvm *kvm, void __user *argp)
 	return 0;
 }
 
-long kvm_arch_vm_ioctl(struct file *filp,
-		       unsigned int ioctl, unsigned long arg)
+int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
 {
 	struct kvm *kvm = filp->private_data;
 	void __user *argp = (void __user *)arg;
@@ -6714,9 +6771,6 @@ set_identity_unlock:
 	case KVM_SET_NR_MMU_PAGES:
 		r = kvm_vm_ioctl_set_nr_mmu_pages(kvm, arg);
 		break;
-	case KVM_GET_NR_MMU_PAGES:
-		r = kvm_vm_ioctl_get_nr_mmu_pages(kvm);
-		break;
 	case KVM_CREATE_IRQCHIP: {
 		mutex_lock(&kvm->lock);
 
@@ -7021,6 +7075,18 @@ out:
 	return r;
 }
 
+static void kvm_probe_feature_msr(u32 msr_index)
+{
+	struct kvm_msr_entry msr = {
+		.index = msr_index,
+	};
+
+	if (kvm_get_msr_feature(&msr))
+		return;
+
+	msr_based_features[num_msr_based_features++] = msr_index;
+}
+
 static void kvm_probe_msr_to_save(u32 msr_index)
 {
 	u32 dummy[2];
@@ -7096,7 +7162,7 @@ static void kvm_probe_msr_to_save(u32 msr_index)
 	msrs_to_save[num_msrs_to_save++] = msr_index;
 }
 
-static void kvm_init_msr_list(void)
+static void kvm_init_msr_lists(void)
 {
 	unsigned i;
 
@@ -7122,15 +7188,11 @@ static void kvm_init_msr_list(void)
 		emulated_msrs[num_emulated_msrs++] = emulated_msrs_all[i];
 	}
 
-	for (i = 0; i < ARRAY_SIZE(msr_based_features_all); i++) {
-		struct kvm_msr_entry msr;
-
-		msr.index = msr_based_features_all[i];
-		if (kvm_get_msr_feature(&msr))
-			continue;
-
-		msr_based_features[num_msr_based_features++] = msr_based_features_all[i];
-	}
+	for (i = KVM_FIRST_EMULATED_VMX_MSR; i <= KVM_LAST_EMULATED_VMX_MSR; i++)
+		kvm_probe_feature_msr(i);
+
+	for (i = 0; i < ARRAY_SIZE(msr_based_features_all_except_vmx); i++)
+		kvm_probe_feature_msr(msr_based_features_all_except_vmx[i]);
 }
 
 static int vcpu_mmio_write(struct kvm_vcpu *vcpu, gpa_t addr, int len,
@@ -8466,7 +8528,6 @@ static int handle_emulation_failure(struct kvm_vcpu *vcpu, int emulation_type)
 }
 
 static bool reexecute_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
-				  bool write_fault_to_shadow_pgtable,
				  int emulation_type)
 {
 	gpa_t gpa = cr2_or_gpa;
@@ -8537,7 +8598,7 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	 * be fixed by unprotecting shadow page and it should
 	 * be reported to userspace.
 	 */
-	return !write_fault_to_shadow_pgtable;
+	return !(emulation_type & EMULTYPE_WRITE_PF_TO_SP);
 }
 
 static bool retry_instruction(struct x86_emulate_ctxt *ctxt,
@@ -8785,20 +8846,12 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	int r;
 	struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt;
 	bool writeback = true;
-	bool write_fault_to_spt;
 
 	if (unlikely(!kvm_can_emulate_insn(vcpu, emulation_type, insn, insn_len)))
 		return 1;
 
 	vcpu->arch.l1tf_flush_l1d = true;
 
-	/*
-	 * Clear write_fault_to_shadow_pgtable here to ensure it is
-	 * never reused.
-	 */
-	write_fault_to_spt = vcpu->arch.write_fault_to_shadow_pgtable;
-	vcpu->arch.write_fault_to_shadow_pgtable = false;
-
 	if (!(emulation_type & EMULTYPE_NO_DECODE)) {
 		kvm_clear_exception_queue(vcpu);
 
@@ -8819,7 +8872,6 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 			return 1;
 		}
 		if (reexecute_instruction(vcpu, cr2_or_gpa,
-					  write_fault_to_spt,
					  emulation_type))
 			return 1;
 
@@ -8898,8 +8950,7 @@ restart:
 		return 1;
 
 	if (r == EMULATION_FAILED) {
-		if (reexecute_instruction(vcpu, cr2_or_gpa, write_fault_to_spt,
-					  emulation_type))
+		if (reexecute_instruction(vcpu, cr2_or_gpa, emulation_type))
 			return 1;
 
 		return handle_emulation_failure(vcpu, emulation_type);
@@ -9477,7 +9528,7 @@ static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 		kvm_caps.max_guest_tsc_khz = max;
 	}
 	kvm_caps.default_tsc_scaling_ratio = 1ULL << kvm_caps.tsc_scaling_ratio_frac_bits;
-	kvm_init_msr_list();
+	kvm_init_msr_lists();
 	return 0;
 
 out_unwind_ops:
@@ -9808,7 +9859,11 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 		vcpu->run->hypercall.args[0]  = gpa;
 		vcpu->run->hypercall.args[1]  = npages;
 		vcpu->run->hypercall.args[2]  = attrs;
-		vcpu->run->hypercall.longmode = op_64_bit;
+		vcpu->run->hypercall.flags    = 0;
+		if (op_64_bit)
+			vcpu->run->hypercall.flags |= KVM_EXIT_HYPERCALL_LONG_MODE;
+
+		WARN_ON_ONCE(vcpu->run->hypercall.flags & KVM_EXIT_HYPERCALL_MBZ);
 		vcpu->arch.complete_userspace_io = complete_hypercall_exit;
 		return 0;
 	}
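On the userspace side, the old `longmode` field is superseded by a flags word (the uapi change appears further down in this series). A sketch of a VMM consuming the exit, assuming the 6.4 uapi names shown in these hunks:

	if (run->exit_reason == KVM_EXIT_HYPERCALL) {
		/* The former __u32 longmode is now bit 0 of a __u64 flags. */
		bool longmode = run->hypercall.flags &
				KVM_EXIT_HYPERCALL_LONG_MODE;

		/* handle_guest_hypercall() is a hypothetical VMM helper. */
		handle_guest_hypercall(run->hypercall.nr,
				       run->hypercall.args, longmode);
	}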
@@ -10170,19 +10225,46 @@ out:
 
 static void process_nmi(struct kvm_vcpu *vcpu)
 {
-	unsigned limit = 2;
+	unsigned int limit;
 
 	/*
-	 * x86 is limited to one NMI running, and one NMI pending after it.
-	 * If an NMI is already in progress, limit further NMIs to just one.
-	 * Otherwise, allow two (and we'll inject the first one immediately).
+	 * x86 is limited to one NMI pending, but because KVM can't react to
+	 * incoming NMIs as quickly as bare metal, e.g. if the vCPU is
+	 * scheduled out, KVM needs to play nice with two queued NMIs showing
+	 * up at the same time.  To handle this scenario, allow two NMIs to be
+	 * (temporarily) pending so long as NMIs are not blocked and KVM is not
+	 * waiting for a previous NMI injection to complete (which effectively
+	 * blocks NMIs).  KVM will immediately inject one of the two NMIs, and
+	 * will request an NMI window to handle the second NMI.
 	 */
 	if (static_call(kvm_x86_get_nmi_mask)(vcpu) || vcpu->arch.nmi_injected)
 		limit = 1;
+	else
+		limit = 2;
+
+	/*
+	 * Adjust the limit to account for pending virtual NMIs, which aren't
+	 * tracked in vcpu->arch.nmi_pending.
+	 */
+	if (static_call(kvm_x86_is_vnmi_pending)(vcpu))
+		limit--;
 
 	vcpu->arch.nmi_pending += atomic_xchg(&vcpu->arch.nmi_queued, 0);
 	vcpu->arch.nmi_pending = min(vcpu->arch.nmi_pending, limit);
-	kvm_make_request(KVM_REQ_EVENT, vcpu);
+
+	if (vcpu->arch.nmi_pending &&
+	    (static_call(kvm_x86_set_vnmi_pending)(vcpu)))
+		vcpu->arch.nmi_pending--;
+
+	if (vcpu->arch.nmi_pending)
+		kvm_make_request(KVM_REQ_EVENT, vcpu);
 }
 
+/* Return total number of NMIs pending injection to the VM */
+int kvm_get_nr_pending_nmis(struct kvm_vcpu *vcpu)
+{
+	return vcpu->arch.nmi_pending +
+	       static_call(kvm_x86_is_vnmi_pending)(vcpu);
+}
+
 void kvm_make_scan_ioapic_request_mask(struct kvm *kvm,
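To make the new limit logic easier to follow, here is a toy restatement (not KVM code) of how the pending cap is derived once virtual NMIs enter the picture:

	static unsigned int nmi_slot_limit(bool nmis_blocked,
					   bool injection_in_flight,
					   bool hw_vnmi_pending)
	{
		/* Two slots only when nothing is blocking or in flight... */
		unsigned int limit =
			(nmis_blocked || injection_in_flight) ? 1 : 2;

		/* ...and a hardware-tracked virtual NMI occupies one slot. */
		return hw_vnmi_pending ? limit - 1 : limit;
	}

This is also why the events ioctl above now reports kvm_get_nr_pending_nmis(): the hardware-tracked vNMI is invisible to nmi_pending alone, and save/restore would otherwise drop it.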
@@ -13268,7 +13350,7 @@ int kvm_handle_invpcid(struct kvm_vcpu *vcpu, unsigned long type, gva_t gva)
 		return 1;
 	}
 
-	pcid_enabled = kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);
+	pcid_enabled = kvm_is_cr4_bit_set(vcpu, X86_CR4_PCIDE);
 
 	switch (type) {
 	case INVPCID_TYPE_INDIV_ADDR:
@@ -3,6 +3,7 @@
 #define ARCH_X86_KVM_X86_H
 
 #include <linux/kvm_host.h>
+#include <asm/fpu/xstate.h>
 #include <asm/mce.h>
 #include <asm/pvclock.h>
 #include "kvm_cache_regs.h"
@@ -40,6 +41,14 @@ void kvm_spurious_fault(void);
 	failed;								\
 })
 
+/*
+ * The first...last VMX feature MSRs that are emulated by KVM.  This may or may
+ * not cover all known VMX MSRs, as KVM doesn't emulate an MSR until there's an
+ * associated feature that KVM supports for nested virtualization.
+ */
+#define KVM_FIRST_EMULATED_VMX_MSR	MSR_IA32_VMX_BASIC
+#define KVM_LAST_EMULATED_VMX_MSR	MSR_IA32_VMX_VMFUNC
+
 #define KVM_DEFAULT_PLE_GAP		128
 #define KVM_VMX_DEFAULT_PLE_WINDOW	4096
 #define KVM_DEFAULT_PLE_WINDOW_GROW	2
@@ -83,6 +92,11 @@ static inline unsigned int __shrink_ple_window(unsigned int val,
 void kvm_service_local_tlb_flush_requests(struct kvm_vcpu *vcpu);
 int kvm_check_nested_events(struct kvm_vcpu *vcpu);
 
+static inline bool kvm_vcpu_has_run(struct kvm_vcpu *vcpu)
+{
+	return vcpu->arch.last_vmentry_cpu != -1;
+}
+
 static inline bool kvm_is_exception_pending(struct kvm_vcpu *vcpu)
 {
 	return vcpu->arch.exception.pending ||
@@ -123,15 +137,15 @@ static inline bool kvm_exception_is_soft(unsigned int nr)
 
 static inline bool is_protmode(struct kvm_vcpu *vcpu)
 {
-	return kvm_read_cr0_bits(vcpu, X86_CR0_PE);
+	return kvm_is_cr0_bit_set(vcpu, X86_CR0_PE);
 }
 
-static inline int is_long_mode(struct kvm_vcpu *vcpu)
+static inline bool is_long_mode(struct kvm_vcpu *vcpu)
 {
 #ifdef CONFIG_X86_64
-	return vcpu->arch.efer & EFER_LMA;
+	return !!(vcpu->arch.efer & EFER_LMA);
 #else
-	return 0;
+	return false;
 #endif
 }
 
@@ -171,19 +185,19 @@ static inline bool mmu_is_nested(struct kvm_vcpu *vcpu)
 	return vcpu->arch.walk_mmu == &vcpu->arch.nested_mmu;
 }
 
-static inline int is_pae(struct kvm_vcpu *vcpu)
+static inline bool is_pae(struct kvm_vcpu *vcpu)
 {
-	return kvm_read_cr4_bits(vcpu, X86_CR4_PAE);
+	return kvm_is_cr4_bit_set(vcpu, X86_CR4_PAE);
 }
 
-static inline int is_pse(struct kvm_vcpu *vcpu)
+static inline bool is_pse(struct kvm_vcpu *vcpu)
 {
-	return kvm_read_cr4_bits(vcpu, X86_CR4_PSE);
+	return kvm_is_cr4_bit_set(vcpu, X86_CR4_PSE);
 }
 
-static inline int is_paging(struct kvm_vcpu *vcpu)
+static inline bool is_paging(struct kvm_vcpu *vcpu)
 {
-	return likely(kvm_read_cr0_bits(vcpu, X86_CR0_PG));
+	return likely(kvm_is_cr0_bit_set(vcpu, X86_CR0_PG));
 }
 
@@ -193,7 +207,7 @@ static inline bool is_pae_paging(struct kvm_vcpu *vcpu)
 
 static inline u8 vcpu_virt_addr_bits(struct kvm_vcpu *vcpu)
 {
-	return kvm_read_cr4_bits(vcpu, X86_CR4_LA57) ? 57 : 48;
+	return kvm_is_cr4_bit_set(vcpu, X86_CR4_LA57) ? 57 : 48;
 }
 
 static inline bool is_noncanonical_address(u64 la, struct kvm_vcpu *vcpu)
@@ -315,6 +329,34 @@ extern struct kvm_caps kvm_caps;
 
 extern bool enable_pmu;
 
+/*
+ * Get a filtered version of KVM's supported XCR0 that strips out dynamic
+ * features for which the current process doesn't (yet) have permission to use.
+ * This is intended to be used only when enumerating support to userspace,
+ * e.g. in KVM_GET_SUPPORTED_CPUID and KVM_CAP_XSAVE2, it does NOT need to be
+ * used to check/restrict guest behavior as KVM rejects KVM_SET_CPUID{2} if
+ * userspace attempts to enable unpermitted features.
+ */
+static inline u64 kvm_get_filtered_xcr0(void)
+{
+	u64 permitted_xcr0 = kvm_caps.supported_xcr0;
+
+	BUILD_BUG_ON(XFEATURE_MASK_USER_DYNAMIC != XFEATURE_MASK_XTILE_DATA);
+
+	if (permitted_xcr0 & XFEATURE_MASK_USER_DYNAMIC) {
+		permitted_xcr0 &= xstate_get_guest_group_perm();
+
+		/*
+		 * Treat XTILE_CFG as unsupported if the current process isn't
+		 * allowed to use XTILE_DATA, as attempting to set XTILE_CFG in
+		 * XCR0 without setting XTILE_DATA is architecturally illegal.
+		 */
+		if (!(permitted_xcr0 & XFEATURE_MASK_XTILE_DATA))
+			permitted_xcr0 &= ~XFEATURE_MASK_XTILE_CFG;
+	}
+	return permitted_xcr0;
+}
+
 static inline bool kvm_mpx_supported(void)
 {
 	return (kvm_caps.supported_xcr0 & (XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR))
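For completeness, the userspace half of the dynamic-feature permission dance that kvm_get_filtered_xcr0() keys off of, sketched under the assumption of an x86-64 Linux VMM (the arch_prctl command value is from the kernel's uapi; verify it against your headers):

	#include <stdio.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	#ifndef ARCH_REQ_XCOMP_GUEST_PERM
	#define ARCH_REQ_XCOMP_GUEST_PERM	0x1025
	#endif
	#define XFEATURE_XTILE_DATA		18

	/* Request AMX tile-data permission for guests *before* creating vCPUs;
	 * afterwards KVM_GET_SUPPORTED_CPUID and KVM_CAP_XSAVE2 enumerate
	 * XTILE_CFG and XTILE_DATA together, per the filtering rule above. */
	if (syscall(SYS_arch_prctl, ARCH_REQ_XCOMP_GUEST_PERM,
		    XFEATURE_XTILE_DATA))
		perror("arch_prctl(ARCH_REQ_XCOMP_GUEST_PERM)");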
@@ -21,6 +21,7 @@
 #define CNTHCTL_EVNTEN			(1 << 2)
 #define CNTHCTL_EVNTDIR			(1 << 3)
 #define CNTHCTL_EVNTI			(0xF << 4)
+#define CNTHCTL_ECV			(1 << 12)
 
 enum arch_timer_reg {
 	ARCH_TIMER_REG_CTRL,
@@ -13,6 +13,9 @@
 enum kvm_arch_timers {
 	TIMER_PTIMER,
 	TIMER_VTIMER,
+	NR_KVM_EL0_TIMERS,
+	TIMER_HVTIMER = NR_KVM_EL0_TIMERS,
+	TIMER_HPTIMER,
 	NR_KVM_TIMERS
 };
 
@@ -21,6 +24,7 @@ enum kvm_arch_timer_regs {
 	TIMER_REG_CVAL,
 	TIMER_REG_TVAL,
 	TIMER_REG_CTL,
+	TIMER_REG_VOFF,
 };
 
 struct arch_timer_offset {
@@ -29,21 +33,29 @@ struct arch_timer_offset {
	 * structure. If NULL, assume a zero offset.
	 */
 	u64	*vm_offset;
+	/*
+	 * If set, pointer to one of the offsets in the vcpu's sysreg
+	 * array. If NULL, assume a zero offset.
+	 */
+	u64	*vcpu_offset;
 };
 
 struct arch_timer_vm_data {
 	/* Offset applied to the virtual timer/counter */
 	u64	voffset;
+	/* Offset applied to the physical timer/counter */
+	u64	poffset;
+
+	/* The PPI for each timer, global to the VM */
+	u8	ppi[NR_KVM_TIMERS];
 };
 
 struct arch_timer_context {
 	struct kvm_vcpu			*vcpu;
 
-	/* Timer IRQ */
-	struct kvm_irq_level		irq;
-
 	/* Emulated Timer (may be unused) */
 	struct hrtimer			hrtimer;
+	u64				ns_frac;
 
 	/* Offset for this counter/timer */
 	struct arch_timer_offset	offset;
@@ -54,14 +66,19 @@ struct arch_timer_context {
	 */
 	bool				loaded;
 
+	/* Output level of the timer IRQ */
+	struct {
+		bool			level;
+	} irq;
+
 	/* Duplicated state from arch_timer.c for convenience */
 	u32				host_timer_irq;
-	u32				host_timer_irq_flags;
 };
 
 struct timer_map {
 	struct arch_timer_context *direct_vtimer;
 	struct arch_timer_context *direct_ptimer;
+	struct arch_timer_context *emul_vtimer;
 	struct arch_timer_context *emul_ptimer;
 };
 
@@ -84,6 +101,8 @@ bool kvm_timer_should_notify_user(struct kvm_vcpu *vcpu);
 void kvm_timer_update_run(struct kvm_vcpu *vcpu);
 void kvm_timer_vcpu_terminate(struct kvm_vcpu *vcpu);
 
+void kvm_timer_init_vm(struct kvm *kvm);
+
 u64 kvm_arm_timer_get_reg(struct kvm_vcpu *, u64 regid);
 int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);
 
@@ -98,15 +117,18 @@ void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu);
 
 void kvm_timer_init_vhe(void);
 
-bool kvm_arch_timer_get_input_level(int vintid);
-
 #define vcpu_timer(v)	(&(v)->arch.timer_cpu)
 #define vcpu_get_timer(v,t)	(&vcpu_timer(v)->timers[(t)])
 #define vcpu_vtimer(v)	(&(v)->arch.timer_cpu.timers[TIMER_VTIMER])
 #define vcpu_ptimer(v)	(&(v)->arch.timer_cpu.timers[TIMER_PTIMER])
+#define vcpu_hvtimer(v)	(&(v)->arch.timer_cpu.timers[TIMER_HVTIMER])
+#define vcpu_hptimer(v)	(&(v)->arch.timer_cpu.timers[TIMER_HPTIMER])
 
 #define arch_timer_ctx_index(ctx)	((ctx) - vcpu_timer((ctx)->vcpu)->timers)
 
+#define timer_vm_data(ctx)		(&(ctx)->vcpu->kvm->arch.timer_data)
+#define timer_irq(ctx)			(timer_vm_data(ctx)->ppi[arch_timer_ctx_index(ctx)])
+
 u64 kvm_arm_timer_read_sysreg(struct kvm_vcpu *vcpu,
			      enum kvm_arch_timers tmr,
			      enum kvm_arch_timer_regs treg);
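The two offset levels above compose additively. A toy model (not kernel code) of what a guest read of CNTVCT/CNTPCT effectively observes once both are wired up:

	/* Offsets are subtracted from the host's physical count; a NULL
	 * pointer at either level simply contributes zero. */
	static u64 guest_counter_read(u64 host_cnt,
				      const struct arch_timer_offset *off)
	{
		u64 offset = 0;

		if (off->vm_offset)
			offset += *off->vm_offset;
		if (off->vcpu_offset)
			offset += *off->vcpu_offset;

		return host_cnt - offset;
	}

The VM-wide voffset/poffset pair serves migration and time-jump hiding, while the per-vCPU, per-timer offset is what the nested-virt (NV) timer code layers on top.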
@@ -6,7 +6,7 @@
 
 #include <asm/kvm_emulate.h>
 
-int kvm_hvc_call_handler(struct kvm_vcpu *vcpu);
+int kvm_smccc_call_handler(struct kvm_vcpu *vcpu);
 
 static inline u32 smccc_get_function(struct kvm_vcpu *vcpu)
 {
@@ -43,9 +43,13 @@ static inline void smccc_set_retval(struct kvm_vcpu *vcpu,
 struct kvm_one_reg;
 
 void kvm_arm_init_hypercalls(struct kvm *kvm);
+void kvm_arm_teardown_hypercalls(struct kvm *kvm);
 int kvm_arm_get_fw_num_regs(struct kvm_vcpu *vcpu);
 int kvm_arm_copy_fw_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices);
 int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
 int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
 
+int kvm_vm_smccc_has_attr(struct kvm *kvm, struct kvm_device_attr *attr);
+int kvm_vm_smccc_set_attr(struct kvm *kvm, struct kvm_device_attr *attr);
+
 #endif
@@ -380,6 +380,7 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int intid,
 int kvm_vgic_map_phys_irq(struct kvm_vcpu *vcpu, unsigned int host_irq,
			  u32 vintid, struct irq_ops *ops);
 int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, unsigned int vintid);
+int kvm_vgic_get_map(struct kvm_vcpu *vcpu, unsigned int vintid);
 bool kvm_vgic_map_is_active(struct kvm_vcpu *vcpu, unsigned int vintid);
 
 int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
@@ -58,7 +58,7 @@
 
 /*
  * Bit 63 of the memslot generation number is an "update in-progress flag",
- * e.g. is temporarily set for the duration of install_new_memslots().
+ * e.g. is temporarily set for the duration of kvm_swap_active_memslots().
 * This flag effectively creates a unique generation number that is used to
 * mark cached memslot data, e.g. MMIO accesses, as potentially being stale,
 * i.e. may (or may not) have come from the previous memslots generation.
@@ -713,7 +713,7 @@ struct kvm {
	 * use by the VM. To be used under the slots_lock (above) or in a
	 * kvm->srcu critical section where acquiring the slots_lock would
	 * lead to deadlock with the synchronize_srcu in
-	 * install_new_memslots.
+	 * kvm_swap_active_memslots().
	 */
	struct mutex slots_arch_lock;
	struct mm_struct *mm; /* userspace tied to this vm */
@@ -1398,8 +1398,7 @@ int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_level,
			  bool line_status);
 int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
			    struct kvm_enable_cap *cap);
-long kvm_arch_vm_ioctl(struct file *filp,
-		       unsigned int ioctl, unsigned long arg);
+int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg);
 long kvm_arch_vm_compat_ioctl(struct file *filp, unsigned int ioctl,
			      unsigned long arg);
 
|
@ -91,11 +91,11 @@ struct gfn_to_pfn_cache {
|
||||
* is topped up (__kvm_mmu_topup_memory_cache()).
|
||||
*/
|
||||
struct kvm_mmu_memory_cache {
|
||||
int nobjs;
|
||||
gfp_t gfp_zero;
|
||||
gfp_t gfp_custom;
|
||||
struct kmem_cache *kmem_cache;
|
||||
int capacity;
|
||||
int nobjs;
|
||||
void **objects;
|
||||
};
|
||||
#endif
|
||||
|
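The 8-byte saving comes from pairing the two 4-byte ints. An illustrative LP64 layout count (gfp_t and int are 4 bytes, pointers 8, struct alignment 8):

	/* old: nobjs+gfp_zero (8) | gfp_custom+pad (8) | kmem_cache (8)
	 *      | capacity+pad (8) | objects (8)                = 40 bytes
	 * new: gfp_zero+gfp_custom (8) | kmem_cache (8)
	 *      | capacity+nobjs (8)    | objects (8)           = 32 bytes */

Moving nobjs next to capacity fills the hole that previously sat after gfp_custom, so the struct drops a full 8-byte slot without any behavioral change.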
@@ -341,8 +341,13 @@ struct kvm_run {
			__u64 nr;
			__u64 args[6];
			__u64 ret;
-			__u32 longmode;
-			__u32 pad;
+
+			union {
+#ifndef __KERNEL__
+				__u32 longmode;
+#endif
+				__u64 flags;
+			};
		} hypercall;
		/* KVM_EXIT_TPR_ACCESS */
		struct {
@@ -1184,6 +1189,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_S390_PROTECTED_ASYNC_DISABLE 224
 #define KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP 225
 #define KVM_CAP_PMU_EVENT_MASKED_EVENTS 226
+#define KVM_CAP_COUNTER_OFFSET 227
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1451,7 +1457,7 @@ struct kvm_vfio_spapr_tce {
 #define KVM_CREATE_VCPU           _IO(KVMIO,   0x41)
 #define KVM_GET_DIRTY_LOG         _IOW(KVMIO,  0x42, struct kvm_dirty_log)
 #define KVM_SET_NR_MMU_PAGES      _IO(KVMIO,   0x44)
-#define KVM_GET_NR_MMU_PAGES      _IO(KVMIO,   0x45)
+#define KVM_GET_NR_MMU_PAGES      _IO(KVMIO,   0x45)  /* deprecated */
 #define KVM_SET_USER_MEMORY_REGION _IOW(KVMIO, 0x46, \
					struct kvm_userspace_memory_region)
 #define KVM_SET_TSS_ADDR          _IO(KVMIO,   0x47)
@@ -1543,6 +1549,8 @@ struct kvm_s390_ucas_mapping {
 #define KVM_SET_PMU_EVENT_FILTER  _IOW(KVMIO,  0xb2, struct kvm_pmu_event_filter)
 #define KVM_PPC_SVM_OFF		  _IO(KVMIO,  0xb3)
 #define KVM_ARM_MTE_COPY_TAGS	  _IOR(KVMIO,  0xb4, struct kvm_arm_copy_mte_tags)
+/* Available with KVM_CAP_COUNTER_OFFSET */
+#define KVM_ARM_SET_COUNTER_OFFSET _IOW(KVMIO,  0xb5, struct kvm_arm_counter_offset)
 
 /* ioctl for vm fd */
 #define KVM_CREATE_DEVICE	  _IOWR(KVMIO,  0xe0, struct kvm_create_device)
@@ -1451,7 +1451,7 @@ struct kvm_vfio_spapr_tce {
 #define KVM_CREATE_VCPU           _IO(KVMIO,   0x41)
 #define KVM_GET_DIRTY_LOG         _IOW(KVMIO,  0x42, struct kvm_dirty_log)
 #define KVM_SET_NR_MMU_PAGES      _IO(KVMIO,   0x44)
-#define KVM_GET_NR_MMU_PAGES      _IO(KVMIO,   0x45)
+#define KVM_GET_NR_MMU_PAGES      _IO(KVMIO,   0x45)  /* deprecated */
 #define KVM_SET_USER_MEMORY_REGION _IOW(KVMIO, 0x46, \
					struct kvm_userspace_memory_region)
 #define KVM_SET_TSS_ADDR          _IO(KVMIO,   0x47)
@@ -105,6 +105,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/vmx_tsc_adjust_test
 TEST_GEN_PROGS_x86_64 += x86_64/vmx_nested_tsc_scaling_test
 TEST_GEN_PROGS_x86_64 += x86_64/xapic_ipi_test
 TEST_GEN_PROGS_x86_64 += x86_64/xapic_state_test
+TEST_GEN_PROGS_x86_64 += x86_64/xcr0_cpuid_test
 TEST_GEN_PROGS_x86_64 += x86_64/xss_msr_test
 TEST_GEN_PROGS_x86_64 += x86_64/debug_regs
 TEST_GEN_PROGS_x86_64 += x86_64/tsc_msrs_test
@@ -141,6 +142,7 @@ TEST_GEN_PROGS_aarch64 += aarch64/get-reg-list
 TEST_GEN_PROGS_aarch64 += aarch64/hypercalls
 TEST_GEN_PROGS_aarch64 += aarch64/page_fault_test
 TEST_GEN_PROGS_aarch64 += aarch64/psci_test
+TEST_GEN_PROGS_aarch64 += aarch64/smccc_filter
 TEST_GEN_PROGS_aarch64 += aarch64/vcpu_width_config
 TEST_GEN_PROGS_aarch64 += aarch64/vgic_init
 TEST_GEN_PROGS_aarch64 += aarch64/vgic_irq
@@ -47,6 +47,7 @@ struct test_args {
	int nr_iter;
	int timer_period_ms;
	int migration_freq_ms;
+	struct kvm_arm_counter_offset offset;
 };
 
 static struct test_args test_args = {
@@ -54,6 +55,7 @@ static struct test_args test_args = {
	.nr_iter = NR_TEST_ITERS_DEF,
	.timer_period_ms = TIMER_TEST_PERIOD_MS_DEF,
	.migration_freq_ms = TIMER_TEST_MIGRATION_FREQ_MS,
+	.offset = { .reserved = 1 },
 };
 
 #define msecs_to_usecs(msec)		((msec) * 1000LL)
@@ -121,25 +123,35 @@ static void guest_validate_irq(unsigned int intid,
	uint64_t xcnt = 0, xcnt_diff_us, cval = 0;
	unsigned long xctl = 0;
	unsigned int timer_irq = 0;
+	unsigned int accessor;
 
-	if (stage == GUEST_STAGE_VTIMER_CVAL ||
-	    stage == GUEST_STAGE_VTIMER_TVAL) {
-		xctl = timer_get_ctl(VIRTUAL);
-		timer_set_ctl(VIRTUAL, CTL_IMASK);
-		xcnt = timer_get_cntct(VIRTUAL);
-		cval = timer_get_cval(VIRTUAL);
+	if (intid == IAR_SPURIOUS)
+		return;
+
+	switch (stage) {
+	case GUEST_STAGE_VTIMER_CVAL:
+	case GUEST_STAGE_VTIMER_TVAL:
+		accessor = VIRTUAL;
		timer_irq = vtimer_irq;
-	} else if (stage == GUEST_STAGE_PTIMER_CVAL ||
-		   stage == GUEST_STAGE_PTIMER_TVAL) {
-		xctl = timer_get_ctl(PHYSICAL);
-		timer_set_ctl(PHYSICAL, CTL_IMASK);
-		xcnt = timer_get_cntct(PHYSICAL);
-		cval = timer_get_cval(PHYSICAL);
+		break;
+	case GUEST_STAGE_PTIMER_CVAL:
+	case GUEST_STAGE_PTIMER_TVAL:
+		accessor = PHYSICAL;
		timer_irq = ptimer_irq;
-	} else {
+		break;
+	default:
		GUEST_ASSERT(0);
		return;
	}
 
+	xctl = timer_get_ctl(accessor);
+	if ((xctl & CTL_IMASK) || !(xctl & CTL_ENABLE))
+		return;
+
+	timer_set_ctl(accessor, CTL_IMASK);
+	xcnt = timer_get_cntct(accessor);
+	cval = timer_get_cval(accessor);
+
	xcnt_diff_us = cycles_to_usec(xcnt - shared_data->xcnt);
 
	/* Make sure we are dealing with the correct timer IRQ */
@@ -148,6 +160,8 @@ static void guest_validate_irq(unsigned int intid,
	/* Basic 'timer condition met' check */
	GUEST_ASSERT_3(xcnt >= cval, xcnt, cval, xcnt_diff_us);
	GUEST_ASSERT_1(xctl & CTL_ISTATUS, xctl);
+
+	WRITE_ONCE(shared_data->nr_iter, shared_data->nr_iter + 1);
 }
 
 static void guest_irq_handler(struct ex_regs *regs)
@@ -158,8 +172,6 @@ static void guest_irq_handler(struct ex_regs *regs)
 
	guest_validate_irq(intid, shared_data);
 
-	WRITE_ONCE(shared_data->nr_iter, shared_data->nr_iter + 1);
-
	gic_set_eoi(intid);
 }
 
@@ -372,6 +384,13 @@ static struct kvm_vm *test_vm_create(void)
	vm_init_descriptor_tables(vm);
	vm_install_exception_handler(vm, VECTOR_IRQ_CURRENT, guest_irq_handler);
 
+	if (!test_args.offset.reserved) {
+		if (kvm_has_cap(KVM_CAP_COUNTER_OFFSET))
+			vm_ioctl(vm, KVM_ARM_SET_COUNTER_OFFSET, &test_args.offset);
+		else
+			TEST_FAIL("no support for global offset\n");
+	}
+
	for (i = 0; i < nr_vcpus; i++)
		vcpu_init_descriptor_tables(vcpus[i]);
 
@@ -403,6 +422,7 @@ static void test_print_help(char *name)
		TIMER_TEST_PERIOD_MS_DEF);
	pr_info("\t-m: Frequency (in ms) of vCPUs to migrate to different pCPU. 0 to turn off (default: %u)\n",
		TIMER_TEST_MIGRATION_FREQ_MS);
+	pr_info("\t-o: Counter offset (in counter cycles, default: 0)\n");
	pr_info("\t-h: print this help screen\n");
 }
 
@@ -410,7 +430,7 @@ static bool parse_args(int argc, char *argv[])
 {
	int opt;
 
-	while ((opt = getopt(argc, argv, "hn:i:p:m:")) != -1) {
+	while ((opt = getopt(argc, argv, "hn:i:p:m:o:")) != -1) {
		switch (opt) {
		case 'n':
			test_args.nr_vcpus = atoi_positive("Number of vCPUs", optarg);
@@ -429,6 +449,10 @@ static bool parse_args(int argc, char *argv[])
		case 'm':
			test_args.migration_freq_ms = atoi_non_negative("Frequency", optarg);
			break;
+		case 'o':
+			test_args.offset.counter_offset = strtol(optarg, NULL, 0);
+			test_args.offset.reserved = 0;
+			break;
		case 'h':
		default:
			goto err;
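A brief usage note on the new option: because the test parses -o with strtol() and base 0, the offset may be given in decimal, octal, or hex, so an invocation along the lines of `./arch_timer -o 0x10000000` (value chosen purely for illustration) programs a large VM-wide counter offset via KVM_ARM_SET_COUNTER_OFFSET before any vCPU runs; leaving -o off keeps `.reserved = 1` and skips the ioctl entirely.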
@@ -651,7 +651,7 @@ int main(int ac, char **av)
 * The current blessed list was primed with the output of kernel version
 * v4.15 with --core-reg-fixup and then later updated with new registers.
 *
- * The blessed list is up to date with kernel version v5.13-rc3
+ * The blessed list is up to date with kernel version v6.4 (or so we hope)
 */
 static __u64 base_regs[] = {
	KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(regs.regs[0]),
@@ -807,10 +807,10 @@ static __u64 base_regs[] = {
	ARM64_SYS_REG(3, 0, 0, 3, 7),
	ARM64_SYS_REG(3, 0, 0, 4, 0),	/* ID_AA64PFR0_EL1 */
	ARM64_SYS_REG(3, 0, 0, 4, 1),	/* ID_AA64PFR1_EL1 */
-	ARM64_SYS_REG(3, 0, 0, 4, 2),
+	ARM64_SYS_REG(3, 0, 0, 4, 2),	/* ID_AA64PFR2_EL1 */
	ARM64_SYS_REG(3, 0, 0, 4, 3),
	ARM64_SYS_REG(3, 0, 0, 4, 4),	/* ID_AA64ZFR0_EL1 */
-	ARM64_SYS_REG(3, 0, 0, 4, 5),
+	ARM64_SYS_REG(3, 0, 0, 4, 5),	/* ID_AA64SMFR0_EL1 */
	ARM64_SYS_REG(3, 0, 0, 4, 6),
	ARM64_SYS_REG(3, 0, 0, 4, 7),
	ARM64_SYS_REG(3, 0, 0, 5, 0),	/* ID_AA64DFR0_EL1 */
@@ -823,7 +823,7 @@ static __u64 base_regs[] = {
	ARM64_SYS_REG(3, 0, 0, 5, 7),
	ARM64_SYS_REG(3, 0, 0, 6, 0),	/* ID_AA64ISAR0_EL1 */
	ARM64_SYS_REG(3, 0, 0, 6, 1),	/* ID_AA64ISAR1_EL1 */
-	ARM64_SYS_REG(3, 0, 0, 6, 2),
+	ARM64_SYS_REG(3, 0, 0, 6, 2),	/* ID_AA64ISAR2_EL1 */
	ARM64_SYS_REG(3, 0, 0, 6, 3),
	ARM64_SYS_REG(3, 0, 0, 6, 4),
	ARM64_SYS_REG(3, 0, 0, 6, 5),
@@ -832,8 +832,8 @@ static __u64 base_regs[] = {
	ARM64_SYS_REG(3, 0, 0, 7, 0),	/* ID_AA64MMFR0_EL1 */
	ARM64_SYS_REG(3, 0, 0, 7, 1),	/* ID_AA64MMFR1_EL1 */
	ARM64_SYS_REG(3, 0, 0, 7, 2),	/* ID_AA64MMFR2_EL1 */
-	ARM64_SYS_REG(3, 0, 0, 7, 3),
-	ARM64_SYS_REG(3, 0, 0, 7, 4),
+	ARM64_SYS_REG(3, 0, 0, 7, 3),	/* ID_AA64MMFR3_EL1 */
+	ARM64_SYS_REG(3, 0, 0, 7, 4),	/* ID_AA64MMFR4_EL1 */
	ARM64_SYS_REG(3, 0, 0, 7, 5),
	ARM64_SYS_REG(3, 0, 0, 7, 6),
	ARM64_SYS_REG(3, 0, 0, 7, 7),
@@ -858,6 +858,9 @@ static __u64 base_regs[] = {
	ARM64_SYS_REG(3, 2, 0, 0, 0),	/* CSSELR_EL1 */
	ARM64_SYS_REG(3, 3, 13, 0, 2),	/* TPIDR_EL0 */
	ARM64_SYS_REG(3, 3, 13, 0, 3),	/* TPIDRRO_EL0 */
+	ARM64_SYS_REG(3, 3, 14, 0, 1),	/* CNTPCT_EL0 */
+	ARM64_SYS_REG(3, 3, 14, 2, 1),	/* CNTP_CTL_EL0 */
+	ARM64_SYS_REG(3, 3, 14, 2, 2),	/* CNTP_CVAL_EL0 */
	ARM64_SYS_REG(3, 4, 3, 0, 0),	/* DACR32_EL2 */
	ARM64_SYS_REG(3, 4, 5, 0, 1),	/* IFSR32_EL2 */
	ARM64_SYS_REG(3, 4, 5, 3, 0),	/* FPEXC32_EL2 */
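A decoding aid for the comments added above (illustrative, matching the standard AArch64 system-register encoding):

	/* ARM64_SYS_REG(op0, op1, crn, crm, op2) packs the sysreg encoding,
	 * so (3, 0, 0, 4, 2) means op0=3, op1=0, CRn=0, CRm=4, op2=2, which
	 * the Arm ARM's ID-register layout assigns to ID_AA64PFR2_EL1. */

The previously anonymous entries gained names because those encodings are now allocated ID registers, so the blessed list documents them rather than treating them as reserved slots.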
tools/testing/selftests/kvm/aarch64/smccc_filter.c (new file, 268 lines)
@ -0,0 +1,268 @@
// SPDX-License-Identifier: GPL-2.0-only
/*
 * smccc_filter - Tests for the SMCCC filter UAPI.
 *
 * Copyright (c) 2023 Google LLC
 *
 * This test includes:
 *  - Tests that the UAPI constraints are upheld by KVM. For example, userspace
 *    is prevented from filtering the architecture range of SMCCC calls.
 *  - Test that the filter actions (DENIED, FWD_TO_USER) work as intended.
 */

#include <linux/arm-smccc.h>
#include <linux/psci.h>
#include <stdint.h>

#include "processor.h"
#include "test_util.h"

enum smccc_conduit {
	HVC_INSN,
	SMC_INSN,
};

#define for_each_conduit(conduit)					\
	for (conduit = HVC_INSN; conduit <= SMC_INSN; conduit++)

static void guest_main(uint32_t func_id, enum smccc_conduit conduit)
{
	struct arm_smccc_res res;

	if (conduit == SMC_INSN)
		smccc_smc(func_id, 0, 0, 0, 0, 0, 0, 0, &res);
	else
		smccc_hvc(func_id, 0, 0, 0, 0, 0, 0, 0, &res);

	GUEST_SYNC(res.a0);
}

static int __set_smccc_filter(struct kvm_vm *vm, uint32_t start, uint32_t nr_functions,
			      enum kvm_smccc_filter_action action)
{
	struct kvm_smccc_filter filter = {
		.base = start,
		.nr_functions = nr_functions,
		.action = action,
	};

	return __kvm_device_attr_set(vm->fd, KVM_ARM_VM_SMCCC_CTRL,
				     KVM_ARM_VM_SMCCC_FILTER, &filter);
}

static void set_smccc_filter(struct kvm_vm *vm, uint32_t start, uint32_t nr_functions,
			     enum kvm_smccc_filter_action action)
{
	int ret = __set_smccc_filter(vm, start, nr_functions, action);

	TEST_ASSERT(!ret, "failed to configure SMCCC filter: %d", ret);
}

static struct kvm_vm *setup_vm(struct kvm_vcpu **vcpu)
{
	struct kvm_vcpu_init init;
	struct kvm_vm *vm;

	vm = vm_create(1);
	vm_ioctl(vm, KVM_ARM_PREFERRED_TARGET, &init);

	/*
	 * Enable in-kernel emulation of PSCI to ensure that calls are denied
	 * due to the SMCCC filter, not because of KVM.
	 */
	init.features[0] |= (1 << KVM_ARM_VCPU_PSCI_0_2);

	*vcpu = aarch64_vcpu_add(vm, 0, &init, guest_main);
	return vm;
}

static void test_pad_must_be_zero(void)
{
	struct kvm_vcpu *vcpu;
	struct kvm_vm *vm = setup_vm(&vcpu);
	struct kvm_smccc_filter filter = {
		.base = PSCI_0_2_FN_PSCI_VERSION,
		.nr_functions = 1,
		.action = KVM_SMCCC_FILTER_DENY,
		.pad = { -1 },
	};
	int r;

	r = __kvm_device_attr_set(vm->fd, KVM_ARM_VM_SMCCC_CTRL,
				  KVM_ARM_VM_SMCCC_FILTER, &filter);
	TEST_ASSERT(r < 0 && errno == EINVAL,
		    "Setting filter with nonzero padding should return EINVAL");
}

/* Ensure that userspace cannot filter the Arm Architecture SMCCC range */
static void test_filter_reserved_range(void)
{
	struct kvm_vcpu *vcpu;
	struct kvm_vm *vm = setup_vm(&vcpu);
	uint32_t smc64_fn;
	int r;

	r = __set_smccc_filter(vm, ARM_SMCCC_ARCH_WORKAROUND_1,
			       1, KVM_SMCCC_FILTER_DENY);
	TEST_ASSERT(r < 0 && errno == EEXIST,
		    "Attempt to filter reserved range should return EEXIST");

	smc64_fn = ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, ARM_SMCCC_SMC_64,
				      0, 0);

	r = __set_smccc_filter(vm, smc64_fn, 1, KVM_SMCCC_FILTER_DENY);
	TEST_ASSERT(r < 0 && errno == EEXIST,
		    "Attempt to filter reserved range should return EEXIST");

	kvm_vm_free(vm);
}

static void test_invalid_nr_functions(void)
{
	struct kvm_vcpu *vcpu;
	struct kvm_vm *vm = setup_vm(&vcpu);
	int r;

	r = __set_smccc_filter(vm, PSCI_0_2_FN64_CPU_ON, 0, KVM_SMCCC_FILTER_DENY);
	TEST_ASSERT(r < 0 && errno == EINVAL,
		    "Attempt to filter 0 functions should return EINVAL");

	kvm_vm_free(vm);
}

static void test_overflow_nr_functions(void)
{
	struct kvm_vcpu *vcpu;
	struct kvm_vm *vm = setup_vm(&vcpu);
	int r;

	r = __set_smccc_filter(vm, ~0, ~0, KVM_SMCCC_FILTER_DENY);
	TEST_ASSERT(r < 0 && errno == EINVAL,
		    "Attempt to overflow filter range should return EINVAL");

	kvm_vm_free(vm);
}

static void test_reserved_action(void)
{
	struct kvm_vcpu *vcpu;
	struct kvm_vm *vm = setup_vm(&vcpu);
	int r;

	r = __set_smccc_filter(vm, PSCI_0_2_FN64_CPU_ON, 1, -1);
	TEST_ASSERT(r < 0 && errno == EINVAL,
		    "Attempt to use reserved filter action should return EINVAL");

	kvm_vm_free(vm);
}

/* Test that overlapping configurations of the SMCCC filter are rejected */
static void test_filter_overlap(void)
{
	struct kvm_vcpu *vcpu;
	struct kvm_vm *vm = setup_vm(&vcpu);
	int r;

	set_smccc_filter(vm, PSCI_0_2_FN64_CPU_ON, 1, KVM_SMCCC_FILTER_DENY);

	r = __set_smccc_filter(vm, PSCI_0_2_FN64_CPU_ON, 1, KVM_SMCCC_FILTER_DENY);
	TEST_ASSERT(r < 0 && errno == EEXIST,
		    "Attempt to filter already configured range should return EEXIST");

	kvm_vm_free(vm);
}

static void expect_call_denied(struct kvm_vcpu *vcpu)
{
	struct ucall uc;

	if (get_ucall(vcpu, &uc) != UCALL_SYNC)
		TEST_FAIL("Unexpected ucall: %lu\n", uc.cmd);

	TEST_ASSERT(uc.args[1] == SMCCC_RET_NOT_SUPPORTED,
		    "Unexpected SMCCC return code: %lu", uc.args[1]);
}

/* Denied SMCCC calls have a return code of SMCCC_RET_NOT_SUPPORTED */
static void test_filter_denied(void)
{
	enum smccc_conduit conduit;
	struct kvm_vcpu *vcpu;
	struct kvm_vm *vm;

	for_each_conduit(conduit) {
		vm = setup_vm(&vcpu);

		set_smccc_filter(vm, PSCI_0_2_FN_PSCI_VERSION, 1, KVM_SMCCC_FILTER_DENY);
		vcpu_args_set(vcpu, 2, PSCI_0_2_FN_PSCI_VERSION, conduit);

		vcpu_run(vcpu);
		expect_call_denied(vcpu);

		kvm_vm_free(vm);
	}
}

static void expect_call_fwd_to_user(struct kvm_vcpu *vcpu, uint32_t func_id,
				    enum smccc_conduit conduit)
{
	struct kvm_run *run = vcpu->run;

	TEST_ASSERT(run->exit_reason == KVM_EXIT_HYPERCALL,
		    "Unexpected exit reason: %u", run->exit_reason);
	TEST_ASSERT(run->hypercall.nr == func_id,
		    "Unexpected SMCCC function: %llu", run->hypercall.nr);

	if (conduit == SMC_INSN)
		TEST_ASSERT(run->hypercall.flags & KVM_HYPERCALL_EXIT_SMC,
			    "KVM_HYPERCALL_EXIT_SMC is not set");
	else
		TEST_ASSERT(!(run->hypercall.flags & KVM_HYPERCALL_EXIT_SMC),
			    "KVM_HYPERCALL_EXIT_SMC is set");
}

/* SMCCC calls forwarded to userspace cause KVM_EXIT_HYPERCALL exits */
static void test_filter_fwd_to_user(void)
{
	enum smccc_conduit conduit;
	struct kvm_vcpu *vcpu;
	struct kvm_vm *vm;

	for_each_conduit(conduit) {
		vm = setup_vm(&vcpu);

		set_smccc_filter(vm, PSCI_0_2_FN_PSCI_VERSION, 1, KVM_SMCCC_FILTER_FWD_TO_USER);
		vcpu_args_set(vcpu, 2, PSCI_0_2_FN_PSCI_VERSION, conduit);

		vcpu_run(vcpu);
		expect_call_fwd_to_user(vcpu, PSCI_0_2_FN_PSCI_VERSION, conduit);

		kvm_vm_free(vm);
	}
}

static bool kvm_supports_smccc_filter(void)
{
	struct kvm_vm *vm = vm_create_barebones();
	int r;

	r = __kvm_has_device_attr(vm->fd, KVM_ARM_VM_SMCCC_CTRL, KVM_ARM_VM_SMCCC_FILTER);

	kvm_vm_free(vm);
	return !r;
}

int main(void)
{
	TEST_REQUIRE(kvm_supports_smccc_filter());

	test_pad_must_be_zero();
	test_invalid_nr_functions();
	test_overflow_nr_functions();
	test_reserved_action();
	test_filter_reserved_range();
	test_filter_overlap();
	test_filter_denied();
	test_filter_fwd_to_user();
}
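For context, the UAPI exercised by this new test is a vm-scoped device attribute: __kvm_device_attr_set() above boils down to KVM_SET_DEVICE_ATTR on the VM file descriptor with group KVM_ARM_VM_SMCCC_CTRL and attribute KVM_ARM_VM_SMCCC_FILTER. A minimal userspace sketch outside the selftest harness (the helper name and the already-open vm_fd are assumptions for illustration; error handling trimmed):

#include <linux/arm-smccc.h>
#include <linux/kvm.h>
#include <linux/psci.h>
#include <sys/ioctl.h>

/* Ask KVM to forward PSCI_VERSION to the VMM instead of handling it in-kernel. */
static int fwd_psci_version_to_user(int vm_fd)
{
	struct kvm_smccc_filter filter = {
		.base		= PSCI_0_2_FN_PSCI_VERSION,
		.nr_functions	= 1,
		.action		= KVM_SMCCC_FILTER_FWD_TO_USER,
	};
	struct kvm_device_attr attr = {
		.group	= KVM_ARM_VM_SMCCC_CTRL,
		.attr	= KVM_ARM_VM_SMCCC_FILTER,
		.addr	= (__u64)(unsigned long)&filter,
	};

	/* Matching guest calls then exit with KVM_EXIT_HYPERCALL, as tested above. */
	return ioctl(vm_fd, KVM_SET_DEVICE_ATTR, &attr);
}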
@@ -2,3 +2,4 @@ CONFIG_KVM=y
 CONFIG_KVM_INTEL=y
 CONFIG_KVM_AMD=y
 CONFIG_USERFAULTFD=y
+CONFIG_IDLE_PAGE_TRACKING=y
@@ -194,7 +194,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 		ts_diff.tv_sec, ts_diff.tv_nsec);
 	pr_info("Overall demand paging rate: %f pgs/sec\n",
 		memstress_args.vcpu_args[0].pages * nr_vcpus /
-		((double)ts_diff.tv_sec + (double)ts_diff.tv_nsec / 100000000.0));
+		((double)ts_diff.tv_sec + (double)ts_diff.tv_nsec / NSEC_PER_SEC));
 
 	memstress_destroy_vm(vm);
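A note on the hunk above: the hand-typed divisor 100000000.0 is 1e8, but tv_nsec counts nanoseconds, so converting it to seconds needs 1e9 (NSEC_PER_SEC). The bug inflated the nanosecond term tenfold, skewing the reported paging rate. Restated standalone (the helper is illustrative; the selftests take NSEC_PER_SEC from tools/include):

#include <time.h>

#define NSEC_PER_SEC	1000000000.0	/* 1e9; the bug divided by 1e8 */

/* Convert an elapsed struct timespec to seconds as a double. */
static double timespec_to_double(struct timespec ts)
{
	return (double)ts.tv_sec + (double)ts.tv_nsec / NSEC_PER_SEC;
}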
@@ -214,6 +214,19 @@ void smccc_hvc(uint32_t function_id, uint64_t arg0, uint64_t arg1,
 	       uint64_t arg2, uint64_t arg3, uint64_t arg4, uint64_t arg5,
 	       uint64_t arg6, struct arm_smccc_res *res);
 
+/**
+ * smccc_smc - Invoke a SMCCC function using the smc conduit
+ * @function_id: the SMCCC function to be called
+ * @arg0-arg6: SMCCC function arguments, corresponding to registers x1-x7
+ * @res: pointer to write the return values from registers x0-x3
+ *
+ */
+void smccc_smc(uint32_t function_id, uint64_t arg0, uint64_t arg1,
+	       uint64_t arg2, uint64_t arg3, uint64_t arg4, uint64_t arg5,
+	       uint64_t arg6, struct arm_smccc_res *res);
+
 uint32_t guest_get_vcpuid(void);
 
 #endif /* SELFTEST_KVM_PROCESSOR_H */
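Since the header only declares the new conduit, a short illustrative example of a guest using it (the function name is an assumption; ARM_SMCCC_VERSION_FUNC_ID comes from <linux/arm-smccc.h> and GUEST_ASSERT from the selftest ucall helpers):

#include <linux/arm-smccc.h>

/* Illustrative guest snippet: query the SMCCC version over the smc conduit. */
static void guest_query_smccc_version(void)
{
	struct arm_smccc_res res;

	smccc_smc(ARM_SMCCC_VERSION_FUNC_ID, 0, 0, 0, 0, 0, 0, 0, &res);
	GUEST_ASSERT(res.a0 != SMCCC_RET_NOT_SUPPORTED);
}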
Some files were not shown because too many files have changed in this diff.