49d5759268
- Provide a virtual cache topology to the guest to avoid inconsistencies with migration on heterogenous systems. Non secure software has no practical need to traverse the caches by set/way in the first place. - Add support for taking stage-2 access faults in parallel. This was an accidental omission in the original parallel faults implementation, but should provide a marginal improvement to machines w/o FEAT_HAFDBS (such as hardware from the fruit company). - A preamble to adding support for nested virtualization to KVM, including vEL2 register state, rudimentary nested exception handling and masking unsupported features for nested guests. - Fixes to the PSCI relay that avoid an unexpected host SVE trap when resuming a CPU when running pKVM. - VGIC maintenance interrupt support for the AIC - Improvements to the arch timer emulation, primarily aimed at reducing the trap overhead of running nested. - Add CONFIG_USERFAULTFD to the KVM selftests config fragment in the interest of CI systems. - Avoid VM-wide stop-the-world operations when a vCPU accesses its own redistributor. - Serialize when toggling CPACR_EL1.SMEN to avoid unexpected exceptions in the host. - Aesthetic and comment/kerneldoc fixes - Drop the vestiges of the old Columbia mailing list and add [Oliver] as co-maintainer This also drags in arm64's 'for-next/sme2' branch, because both it and the PSCI relay changes touch the EL2 initialization code. RISC-V: - Fix wrong usage of PGDIR_SIZE instead of PUD_SIZE - Correctly place the guest in S-mode after redirecting a trap to the guest - Redirect illegal instruction traps to guest - SBI PMU support for guest s390: - Two patches sorting out confusion between virtual and physical addresses, which currently are the same on s390. - A new ioctl that performs cmpxchg on guest memory - A few fixes x86: - Change tdp_mmu to a read-only parameter - Separate TDP and shadow MMU page fault paths - Enable Hyper-V invariant TSC control - Fix a variety of APICv and AVIC bugs, some of them real-world, some of them affecting architecurally legal but unlikely to happen in practice - Mark APIC timer as expired if its in one-shot mode and the count underflows while the vCPU task was being migrated - Advertise support for Intel's new fast REP string features - Fix a double-shootdown issue in the emergency reboot code - Ensure GIF=1 and disable SVM during an emergency reboot, i.e. give SVM similar treatment to VMX - Update Xen's TSC info CPUID sub-leaves as appropriate - Add support for Hyper-V's extended hypercalls, where "support" at this point is just forwarding the hypercalls to userspace - Clean up the kvm->lock vs. kvm->srcu sequences when updating the PMU and MSR filters - One-off fixes and cleanups - Fix and cleanup the range-based TLB flushing code, used when KVM is running on Hyper-V - Add support for filtering PMU events using a mask. If userspace wants to restrict heavily what events the guest can use, it can now do so without needing an absurd number of filter entries - Clean up KVM's handling of "PMU MSRs to save", especially when vPMU support is disabled - Add PEBS support for Intel Sapphire Rapids - Fix a mostly benign overflow bug in SEV's send|receive_update_data() - Move several SVM-specific flags into vcpu_svm x86 Intel: - Handle NMI VM-Exits before leaving the noinstr region - A few trivial cleanups in the VM-Enter flows - Stop enabling VMFUNC for L1 purely to document that KVM doesn't support EPTP switching (or any other VM function) for L1 - Fix a crash when using eVMCS's enlighted MSR bitmaps Generic: - Clean up the hardware enable and initialization flow, which was scattered around multiple arch-specific hooks. Instead, just let the arch code call into generic code. Both x86 and ARM should benefit from not having to fight common KVM code's notion of how to do initialization. - Account allocations in generic kvm_arch_alloc_vm() - Fix a memory leak if coalesced MMIO unregistration fails selftests: - On x86, cache the CPU vendor (AMD vs. Intel) and use the info to emit the correct hypercall instruction instead of relying on KVM to patch in VMMCALL - Use TAP interface for kvm_binary_stats_test and tsc_msrs_test -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmP2YA0UHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroPg/Qf+J6nT+TkIa+8Ei+fN1oMTDp4YuIOx mXvJ9mRK9sQ+tAUVwvDz3qN/fK5mjsYbRHIDlVc5p2Q3bCrVGDDqXPFfCcLx1u+O 9U9xjkO4JxD2LS9pc70FYOyzVNeJ8VMGOBbC2b0lkdYZ4KnUc6e/WWFKJs96bK+H duo+RIVyaMthnvbTwSv1K3qQb61n6lSJXplywS8KWFK6NZAmBiEFDAWGRYQE9lLs VcVcG0iDJNL/BQJ5InKCcvXVGskcCm9erDszPo7w4Bypa4S9AMS42DHUaRZrBJwV /WqdH7ckIz7+OSV0W1j+bKTHAFVTCjXYOM7wQykgjawjICzMSnnG9Gpskw== =goe1 -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm updates from Paolo Bonzini: "ARM: - Provide a virtual cache topology to the guest to avoid inconsistencies with migration on heterogenous systems. Non secure software has no practical need to traverse the caches by set/way in the first place - Add support for taking stage-2 access faults in parallel. This was an accidental omission in the original parallel faults implementation, but should provide a marginal improvement to machines w/o FEAT_HAFDBS (such as hardware from the fruit company) - A preamble to adding support for nested virtualization to KVM, including vEL2 register state, rudimentary nested exception handling and masking unsupported features for nested guests - Fixes to the PSCI relay that avoid an unexpected host SVE trap when resuming a CPU when running pKVM - VGIC maintenance interrupt support for the AIC - Improvements to the arch timer emulation, primarily aimed at reducing the trap overhead of running nested - Add CONFIG_USERFAULTFD to the KVM selftests config fragment in the interest of CI systems - Avoid VM-wide stop-the-world operations when a vCPU accesses its own redistributor - Serialize when toggling CPACR_EL1.SMEN to avoid unexpected exceptions in the host - Aesthetic and comment/kerneldoc fixes - Drop the vestiges of the old Columbia mailing list and add [Oliver] as co-maintainer RISC-V: - Fix wrong usage of PGDIR_SIZE instead of PUD_SIZE - Correctly place the guest in S-mode after redirecting a trap to the guest - Redirect illegal instruction traps to guest - SBI PMU support for guest s390: - Sort out confusion between virtual and physical addresses, which currently are the same on s390 - A new ioctl that performs cmpxchg on guest memory - A few fixes x86: - Change tdp_mmu to a read-only parameter - Separate TDP and shadow MMU page fault paths - Enable Hyper-V invariant TSC control - Fix a variety of APICv and AVIC bugs, some of them real-world, some of them affecting architecurally legal but unlikely to happen in practice - Mark APIC timer as expired if its in one-shot mode and the count underflows while the vCPU task was being migrated - Advertise support for Intel's new fast REP string features - Fix a double-shootdown issue in the emergency reboot code - Ensure GIF=1 and disable SVM during an emergency reboot, i.e. give SVM similar treatment to VMX - Update Xen's TSC info CPUID sub-leaves as appropriate - Add support for Hyper-V's extended hypercalls, where "support" at this point is just forwarding the hypercalls to userspace - Clean up the kvm->lock vs. kvm->srcu sequences when updating the PMU and MSR filters - One-off fixes and cleanups - Fix and cleanup the range-based TLB flushing code, used when KVM is running on Hyper-V - Add support for filtering PMU events using a mask. If userspace wants to restrict heavily what events the guest can use, it can now do so without needing an absurd number of filter entries - Clean up KVM's handling of "PMU MSRs to save", especially when vPMU support is disabled - Add PEBS support for Intel Sapphire Rapids - Fix a mostly benign overflow bug in SEV's send|receive_update_data() - Move several SVM-specific flags into vcpu_svm x86 Intel: - Handle NMI VM-Exits before leaving the noinstr region - A few trivial cleanups in the VM-Enter flows - Stop enabling VMFUNC for L1 purely to document that KVM doesn't support EPTP switching (or any other VM function) for L1 - Fix a crash when using eVMCS's enlighted MSR bitmaps Generic: - Clean up the hardware enable and initialization flow, which was scattered around multiple arch-specific hooks. Instead, just let the arch code call into generic code. Both x86 and ARM should benefit from not having to fight common KVM code's notion of how to do initialization - Account allocations in generic kvm_arch_alloc_vm() - Fix a memory leak if coalesced MMIO unregistration fails selftests: - On x86, cache the CPU vendor (AMD vs. Intel) and use the info to emit the correct hypercall instruction instead of relying on KVM to patch in VMMCALL - Use TAP interface for kvm_binary_stats_test and tsc_msrs_test" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (325 commits) KVM: SVM: hyper-v: placate modpost section mismatch error KVM: x86/mmu: Make tdp_mmu_allowed static KVM: arm64: nv: Use reg_to_encoding() to get sysreg ID KVM: arm64: nv: Only toggle cache for virtual EL2 when SCTLR_EL2 changes KVM: arm64: nv: Filter out unsupported features from ID regs KVM: arm64: nv: Emulate EL12 register accesses from the virtual EL2 KVM: arm64: nv: Allow a sysreg to be hidden from userspace only KVM: arm64: nv: Emulate PSTATE.M for a guest hypervisor KVM: arm64: nv: Add accessors for SPSR_EL1, ELR_EL1 and VBAR_EL1 from virtual EL2 KVM: arm64: nv: Handle SMCs taken from virtual EL2 KVM: arm64: nv: Handle trapped ERET from virtual EL2 KVM: arm64: nv: Inject HVC exceptions to the virtual EL2 KVM: arm64: nv: Support virtual EL2 exceptions KVM: arm64: nv: Handle HCR_EL2.NV system register traps KVM: arm64: nv: Add nested virt VCPU primitives for vEL2 VCPU state KVM: arm64: nv: Add EL2 system registers to vcpu context KVM: arm64: nv: Allow userspace to set PSR_MODE_EL2x KVM: arm64: nv: Reset VCPU to EL2 registers if VCPU nested virt is set KVM: arm64: nv: Introduce nested virtualization VCPU feature KVM: arm64: Use the S2 MMU context to iterate over S2 table ...
224 lines
6.5 KiB
C
224 lines
6.5 KiB
C
/* SPDX-License-Identifier: GPL-2.0 */
|
|
#ifndef __KVM_X86_PMU_H
|
|
#define __KVM_X86_PMU_H
|
|
|
|
#include <linux/nospec.h>
|
|
|
|
#define vcpu_to_pmu(vcpu) (&(vcpu)->arch.pmu)
|
|
#define pmu_to_vcpu(pmu) (container_of((pmu), struct kvm_vcpu, arch.pmu))
|
|
#define pmc_to_pmu(pmc) (&(pmc)->vcpu->arch.pmu)
|
|
|
|
#define MSR_IA32_MISC_ENABLE_PMU_RO_MASK (MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL | \
|
|
MSR_IA32_MISC_ENABLE_BTS_UNAVAIL)
|
|
|
|
/* retrieve the 4 bits for EN and PMI out of IA32_FIXED_CTR_CTRL */
|
|
#define fixed_ctrl_field(ctrl_reg, idx) (((ctrl_reg) >> ((idx)*4)) & 0xf)
|
|
|
|
#define VMWARE_BACKDOOR_PMC_HOST_TSC 0x10000
|
|
#define VMWARE_BACKDOOR_PMC_REAL_TIME 0x10001
|
|
#define VMWARE_BACKDOOR_PMC_APPARENT_TIME 0x10002
|
|
|
|
struct kvm_pmu_ops {
|
|
bool (*hw_event_available)(struct kvm_pmc *pmc);
|
|
bool (*pmc_is_enabled)(struct kvm_pmc *pmc);
|
|
struct kvm_pmc *(*pmc_idx_to_pmc)(struct kvm_pmu *pmu, int pmc_idx);
|
|
struct kvm_pmc *(*rdpmc_ecx_to_pmc)(struct kvm_vcpu *vcpu,
|
|
unsigned int idx, u64 *mask);
|
|
struct kvm_pmc *(*msr_idx_to_pmc)(struct kvm_vcpu *vcpu, u32 msr);
|
|
bool (*is_valid_rdpmc_ecx)(struct kvm_vcpu *vcpu, unsigned int idx);
|
|
bool (*is_valid_msr)(struct kvm_vcpu *vcpu, u32 msr);
|
|
int (*get_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr_info);
|
|
int (*set_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr_info);
|
|
void (*refresh)(struct kvm_vcpu *vcpu);
|
|
void (*init)(struct kvm_vcpu *vcpu);
|
|
void (*reset)(struct kvm_vcpu *vcpu);
|
|
void (*deliver_pmi)(struct kvm_vcpu *vcpu);
|
|
void (*cleanup)(struct kvm_vcpu *vcpu);
|
|
|
|
const u64 EVENTSEL_EVENT;
|
|
const int MAX_NR_GP_COUNTERS;
|
|
};
|
|
|
|
void kvm_pmu_ops_update(const struct kvm_pmu_ops *pmu_ops);
|
|
|
|
static inline u64 pmc_bitmask(struct kvm_pmc *pmc)
|
|
{
|
|
struct kvm_pmu *pmu = pmc_to_pmu(pmc);
|
|
|
|
return pmu->counter_bitmask[pmc->type];
|
|
}
|
|
|
|
static inline u64 pmc_read_counter(struct kvm_pmc *pmc)
|
|
{
|
|
u64 counter, enabled, running;
|
|
|
|
counter = pmc->counter;
|
|
if (pmc->perf_event && !pmc->is_paused)
|
|
counter += perf_event_read_value(pmc->perf_event,
|
|
&enabled, &running);
|
|
/* FIXME: Scaling needed? */
|
|
return counter & pmc_bitmask(pmc);
|
|
}
|
|
|
|
static inline void pmc_release_perf_event(struct kvm_pmc *pmc)
|
|
{
|
|
if (pmc->perf_event) {
|
|
perf_event_release_kernel(pmc->perf_event);
|
|
pmc->perf_event = NULL;
|
|
pmc->current_config = 0;
|
|
pmc_to_pmu(pmc)->event_count--;
|
|
}
|
|
}
|
|
|
|
static inline void pmc_stop_counter(struct kvm_pmc *pmc)
|
|
{
|
|
if (pmc->perf_event) {
|
|
pmc->counter = pmc_read_counter(pmc);
|
|
pmc_release_perf_event(pmc);
|
|
}
|
|
}
|
|
|
|
static inline bool pmc_is_gp(struct kvm_pmc *pmc)
|
|
{
|
|
return pmc->type == KVM_PMC_GP;
|
|
}
|
|
|
|
static inline bool pmc_is_fixed(struct kvm_pmc *pmc)
|
|
{
|
|
return pmc->type == KVM_PMC_FIXED;
|
|
}
|
|
|
|
static inline bool kvm_valid_perf_global_ctrl(struct kvm_pmu *pmu,
|
|
u64 data)
|
|
{
|
|
return !(pmu->global_ctrl_mask & data);
|
|
}
|
|
|
|
/* returns general purpose PMC with the specified MSR. Note that it can be
|
|
* used for both PERFCTRn and EVNTSELn; that is why it accepts base as a
|
|
* parameter to tell them apart.
|
|
*/
|
|
static inline struct kvm_pmc *get_gp_pmc(struct kvm_pmu *pmu, u32 msr,
|
|
u32 base)
|
|
{
|
|
if (msr >= base && msr < base + pmu->nr_arch_gp_counters) {
|
|
u32 index = array_index_nospec(msr - base,
|
|
pmu->nr_arch_gp_counters);
|
|
|
|
return &pmu->gp_counters[index];
|
|
}
|
|
|
|
return NULL;
|
|
}
|
|
|
|
/* returns fixed PMC with the specified MSR */
|
|
static inline struct kvm_pmc *get_fixed_pmc(struct kvm_pmu *pmu, u32 msr)
|
|
{
|
|
int base = MSR_CORE_PERF_FIXED_CTR0;
|
|
|
|
if (msr >= base && msr < base + pmu->nr_arch_fixed_counters) {
|
|
u32 index = array_index_nospec(msr - base,
|
|
pmu->nr_arch_fixed_counters);
|
|
|
|
return &pmu->fixed_counters[index];
|
|
}
|
|
|
|
return NULL;
|
|
}
|
|
|
|
static inline u64 get_sample_period(struct kvm_pmc *pmc, u64 counter_value)
|
|
{
|
|
u64 sample_period = (-counter_value) & pmc_bitmask(pmc);
|
|
|
|
if (!sample_period)
|
|
sample_period = pmc_bitmask(pmc) + 1;
|
|
return sample_period;
|
|
}
|
|
|
|
static inline void pmc_update_sample_period(struct kvm_pmc *pmc)
|
|
{
|
|
if (!pmc->perf_event || pmc->is_paused ||
|
|
!is_sampling_event(pmc->perf_event))
|
|
return;
|
|
|
|
perf_event_period(pmc->perf_event,
|
|
get_sample_period(pmc, pmc->counter));
|
|
}
|
|
|
|
static inline bool pmc_speculative_in_use(struct kvm_pmc *pmc)
|
|
{
|
|
struct kvm_pmu *pmu = pmc_to_pmu(pmc);
|
|
|
|
if (pmc_is_fixed(pmc))
|
|
return fixed_ctrl_field(pmu->fixed_ctr_ctrl,
|
|
pmc->idx - INTEL_PMC_IDX_FIXED) & 0x3;
|
|
|
|
return pmc->eventsel & ARCH_PERFMON_EVENTSEL_ENABLE;
|
|
}
|
|
|
|
extern struct x86_pmu_capability kvm_pmu_cap;
|
|
|
|
static inline void kvm_init_pmu_capability(const struct kvm_pmu_ops *pmu_ops)
|
|
{
|
|
bool is_intel = boot_cpu_data.x86_vendor == X86_VENDOR_INTEL;
|
|
|
|
/*
|
|
* Hybrid PMUs don't play nice with virtualization without careful
|
|
* configuration by userspace, and KVM's APIs for reporting supported
|
|
* vPMU features do not account for hybrid PMUs. Disable vPMU support
|
|
* for hybrid PMUs until KVM gains a way to let userspace opt-in.
|
|
*/
|
|
if (cpu_feature_enabled(X86_FEATURE_HYBRID_CPU))
|
|
enable_pmu = false;
|
|
|
|
if (enable_pmu) {
|
|
perf_get_x86_pmu_capability(&kvm_pmu_cap);
|
|
|
|
/*
|
|
* For Intel, only support guest architectural pmu
|
|
* on a host with architectural pmu.
|
|
*/
|
|
if ((is_intel && !kvm_pmu_cap.version) ||
|
|
!kvm_pmu_cap.num_counters_gp)
|
|
enable_pmu = false;
|
|
}
|
|
|
|
if (!enable_pmu) {
|
|
memset(&kvm_pmu_cap, 0, sizeof(kvm_pmu_cap));
|
|
return;
|
|
}
|
|
|
|
kvm_pmu_cap.version = min(kvm_pmu_cap.version, 2);
|
|
kvm_pmu_cap.num_counters_gp = min(kvm_pmu_cap.num_counters_gp,
|
|
pmu_ops->MAX_NR_GP_COUNTERS);
|
|
kvm_pmu_cap.num_counters_fixed = min(kvm_pmu_cap.num_counters_fixed,
|
|
KVM_PMC_MAX_FIXED);
|
|
}
|
|
|
|
static inline void kvm_pmu_request_counter_reprogam(struct kvm_pmc *pmc)
|
|
{
|
|
set_bit(pmc->idx, pmc_to_pmu(pmc)->reprogram_pmi);
|
|
kvm_make_request(KVM_REQ_PMU, pmc->vcpu);
|
|
}
|
|
|
|
void kvm_pmu_deliver_pmi(struct kvm_vcpu *vcpu);
|
|
void kvm_pmu_handle_event(struct kvm_vcpu *vcpu);
|
|
int kvm_pmu_rdpmc(struct kvm_vcpu *vcpu, unsigned pmc, u64 *data);
|
|
bool kvm_pmu_is_valid_rdpmc_ecx(struct kvm_vcpu *vcpu, unsigned int idx);
|
|
bool kvm_pmu_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr);
|
|
int kvm_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info);
|
|
int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info);
|
|
void kvm_pmu_refresh(struct kvm_vcpu *vcpu);
|
|
void kvm_pmu_reset(struct kvm_vcpu *vcpu);
|
|
void kvm_pmu_init(struct kvm_vcpu *vcpu);
|
|
void kvm_pmu_cleanup(struct kvm_vcpu *vcpu);
|
|
void kvm_pmu_destroy(struct kvm_vcpu *vcpu);
|
|
int kvm_vm_ioctl_set_pmu_event_filter(struct kvm *kvm, void __user *argp);
|
|
void kvm_pmu_trigger_event(struct kvm_vcpu *vcpu, u64 perf_hw_id);
|
|
|
|
bool is_vmware_backdoor_pmc(u32 pmc_idx);
|
|
|
|
extern struct kvm_pmu_ops intel_pmu_ops;
|
|
extern struct kvm_pmu_ops amd_pmu_ops;
|
|
#endif /* __KVM_X86_PMU_H */
|