'vector' and 'trig_mode' fields of 'struct kvm_lapic_irq' are left
uninitialized in kvm_pv_kick_cpu_op(). While these fields are normally
not needed for APIC_DM_REMRD, they're still referenced by
__apic_accept_irq() for trace_kvm_apic_accept_irq(). Fully initialize
the structure to avoid consuming random stack memory.
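The fix, roughly (a designated initializer zeroes every field that is
not named explicitly, so 'vector' and 'trig_mode' become 0 instead of
stack garbage):

  struct kvm_lapic_irq lapic_irq = {
          .delivery_mode = APIC_DM_REMRD,
          .dest_mode     = APIC_DEST_PHYSICAL,
          .shorthand     = APIC_DEST_NOSHORT,
          .dest_id       = apicid,
  };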
Fixes: a183b638b61c ("KVM: x86: make apic_accept_irq tracepoint more generic")
Reported-by: syzbot+d6caa905917d353f0d07@syzkaller.appspotmail.com
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20220708125147.593975-1-vkuznets@redhat.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Add a helper to update KVM's in-kernel local APIC in response to MCG_CAP
being changed by userspace to fix multiple bugs. First and foremost,
KVM needs to check that there's an in-kernel APIC prior to dereferencing
vcpu->arch.apic. Beyond that, any "new" LVT entries need to be masked,
and the APIC version register needs to be updated as it reports out the
number of LVT entries.
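A sketch of such a helper; the name and the exact LVT bookkeeping are
illustrative, not necessarily the final upstream code:

  static void kvm_apic_after_set_mcg_cap(struct kvm_vcpu *vcpu)
  {
          struct kvm_lapic *apic = vcpu->arch.apic;

          if (!lapic_in_kernel(vcpu))
                  return; /* nothing to update without an in-kernel APIC */

          /* Mask any LVT entry that just came into existence. */
          if (vcpu->arch.mcg_cap & MCG_CMCI_P)
                  kvm_lapic_set_reg(apic, APIC_LVTCMCI, APIC_LVT_MASKED);

          /* The APIC version register reports the number of LVT entries. */
          kvm_apic_set_version(vcpu);
  }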
Fixes: 4b903561ec49 ("KVM: x86: Add Corrected Machine Check Interrupt (CMCI) emulation to lapic.")
Reported-by: syzbot+8cdad6430c24f396f158@syzkaller.appspotmail.com
Cc: Siddh Raman Pant <code@siddh.me>
Cc: Jue Wang <juew@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Merge a bug fix and cleanups for {g,s}et_msr_mce() using a base that
predates commit 281b52780b57 ("KVM: x86: Add emulation for
MSR_IA32_MCx_CTL2 MSRs."), which was written with the intention that it
be applied _after_ the bug fix and cleanups. The bug fix in particular
needs to be sent to stable trees; give them a stable hash to use.
Add helpers to identify CTL (control) and STATUS MCi MSR types instead of
open coding the checks using the offset. Using the offset is perfectly
safe, but unintuitive, as understanding what the code does requires
knowing that the offset calculation will not affect the lower three bits.
Opportunistically comment the STATUS logic to save readers a trip to
Intel's SDM or AMD's APM to understand the "data != 0" check.
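A sketch of the helpers; each MCi bank is a group of four consecutive
MSRs (CTL, STATUS, ADDR, MISC), and MSR_IA32_MC0_CTL (0x400) is 8-byte
aligned, so the type can be read straight from the low bits:

  static bool is_mci_control_msr(u32 msr)
  {
          return (msr & 3) == 0;
  }

  static bool is_mci_status_msr(u32 msr)
  {
          return (msr & 3) == 1;
  }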
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Link: https://lore.kernel.org/r/20220512222716.4112548-4-seanjc@google.com
Use an explicit case statement to grab the full range of MCx bank MSRs
in {g,s}et_msr_mce(), and manually check only the "end" (the number of
banks configured by userspace may be less than the max). The "default"
trick works, but is a bit odd now, and will be quite odd if/when support
for accessing MCx_CTL2 MSRs is added, which has near identical logic.
Hoist "offset" to function scope so as to avoid curly braces for the case
statement, and because MCx_CTL2 support will need the same variables.
Opportunistically clean up the comment about allowing bit 10 to be cleared
from bank 4.
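A hedged sketch of the read side, using the kernel's case-range
extension (the exact bounds checking in the final code may differ):

  u32 offset;

  switch (msr) {
  case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
          /* Userspace may configure fewer than the max banks. */
          if (msr >= MSR_IA32_MCx_CTL(vcpu->arch.mcg_cap & 0xff))
                  return 1;
          offset = msr - MSR_IA32_MC0_CTL;
          data = vcpu->arch.mce_banks[offset];
          break;
  }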
No functional change intended.
Cc: Jue Wang <juew@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Link: https://lore.kernel.org/r/20220512222716.4112548-3-seanjc@google.com
Return '1', not '-1', when handling an illegal WRMSR to a MCi_CTL or
MCi_STATUS MSR. The behavior of "all zeros" or "all ones" for CTL MSRs
is architectural, as is the "only zeros" behavior for STATUS MSRs. I.e.
the intent is to inject a #GP, not exit to userspace due to an unhandled
emulation case. Returning '-1' gets interpreted as -EPERM up the stack
and effectively kills the guest.
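The architectural check for CTL MSRs, sketched (surrounding context
elided):

  /* CTL MSRs may only be written with all-zeros or all-ones. */
  if (data != 0 && data != ~(u64)0)
          return 1;       /* inject #GP; '-1' would kill the guest */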
Fixes: 890ca9aefa78 ("KVM: Add MCE support")
Fixes: 9ffd986c6e4e ("KVM: X86: #GP when guest attempts to write MCi_STATUS register w/o 0")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Link: https://lore.kernel.org/r/20220512222716.4112548-2-seanjc@google.com
complete_emulator_pio_in() is now called only by
complete_sev_es_emulated_ins(); all the function does is adjust
sev_pio_count and sev_pio_data, which is the same for both IN and OUT.
No functional change intended.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Now all callers except emulator_pio_in_emulated are using
__emulator_pio_in/complete_emulator_pio_in explicitly.
Move the "either copy the result or attempt PIO" logic in
emulator_pio_in_emulated, and rename __emulator_pio_in to
just emulator_pio_in.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Use __emulator_pio_in() directly for fast PIO instead of bouncing through
emulator_pio_in() now that __emulator_pio_in() fills "val" when handling
in-kernel PIO. vcpu->arch.pio.count is guaranteed to be '0', so this is
a pure nop.
emulator_pio_in_emulated is now the last caller of emulator_pio_in.
No functional change intended.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Make emulator_pio_in_out operate directly on the provided buffer
as long as PIO is handled inside KVM.
For input operations, this means that, in the case of in-kernel
PIO, __emulator_pio_in() does not have to be always followed
by complete_emulator_pio_in(). This affects emulator_pio_in() and
kvm_sev_es_ins(); for the latter, that is why the call moves from
advance_sev_es_emulated_ins() to complete_sev_es_emulated_ins().
For output, it means that vcpu->pio.count is never set unnecessarily
and there is no need to clear it; but also vcpu->pio.size must not
be used in kvm_sev_es_outs(), because it will not be updated for
in-kernel OUT.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
For now, this is basically an excuse to add back the void* argument to
the function, while removing some knowledge of vcpu->arch.pio* from
its callers. The WARN that vcpu->arch.pio.count is zero is also
extended to OUT operations.
The vcpu->arch.pio* fields still need to be filled even when the PIO is
handled in-kernel as __emulator_pio_in() is always followed by
complete_emulator_pio_in(). But after fixing that, it will be possible
to only populate the vcpu->arch.pio* fields on userspace exits.
No functional change intended.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM protects the device list with SRCU, and therefore different calls
to kvm_io_bus_read()/kvm_io_bus_write() can very well see different
incarnations of kvm->buses. If userspace unregisters a device while
vCPUs are running there is no well-defined result. This patch applies
a safe fallback by returning early from emulator_pio_in_out(). This
corresponds to returning zeroes from IN, and dropping the writes on
the floor for OUT.
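Roughly, with the vanished-device case handled by bailing on the first
failed access (for IN, the caller's buffer is assumed pre-zeroed):

  u8 *buf = data;
  unsigned int i;
  int r;

  for (i = 0; i < count; i++) {
          if (in)
                  r = kvm_io_bus_read(vcpu, KVM_PIO_BUS, port, size, buf);
          else
                  r = kvm_io_bus_write(vcpu, KVM_PIO_BUS, port, size, buf);
          if (r)
                  break;  /* IN: remaining bytes stay zero; OUT: dropped */
          buf += size;
  }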
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The caller of kernel_pio already has arguments for most of what kernel_pio
fishes out of vcpu->arch.pio. This is the first step towards ensuring that
vcpu->arch.pio.* is only used when exiting to userspace.
No functional change intended.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Use complete_emulator_pio_in() directly when completing fast PIO, there's
no need to bounce through emulator_pio_in(): the comment about ECX
changing doesn't apply to fast PIO, which isn't used for string I/O.
No functional change intended.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add a tracepoint to track the number of doorbells sent to signal a
running vCPU to process an IRQ after injection.
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Message-Id: <20220519102709.24125-17-suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When a VM is launched with x2APIC and more than 255 vCPUs, the guest
kernel can still disable x2APIC (e.g. via the nox2apic kernel option).
The VM then falls back to xAPIC mode and disables vCPUs with IDs 255
and greater. In this case, APICv is deactivated for the disabled vCPUs.
However, the current APICv consistency warning does not account for
this case, which results in a spurious warning.
Therefore, modify the warning logic to report only when the vCPU's APIC
mode is valid.
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Message-Id: <20220519102709.24125-15-suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
APICv should be deactivated on a vCPU whose APIC is disabled.
Therefore, call kvm_vcpu_update_apicv() when changing the APIC mode,
and add an additional check for the APIC-disabled case when determining
APICv activation.
Suggested-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Message-Id: <20220519102709.24125-9-suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This patch enables MCG_CMCI_P by default in kvm_mce_cap_supported. It
reuses the KVM_X86_SET_MCE ioctl to implement injection of UnCorrectable
No Action required (UCNA) errors, signaled via Corrected Machine
Check Interrupt (CMCI).
Neither the CMCI nor the UCNA emulation depends on hardware.
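From userspace, injecting a UCNA then looks roughly like this (the
status encoding follows the SDM: VAL=bit 63 and UC=bit 61 set, with
EN/PCC/S/AR clear; the bank number is illustrative):

  struct kvm_x86_mce mce = {
          .status = (1ULL << 63) | (1ULL << 61),  /* VAL | UC => UCNA */
          .bank   = 1,
  };

  if (ioctl(vcpu_fd, KVM_X86_SET_MCE, &mce) < 0)
          perror("KVM_X86_SET_MCE");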
Signed-off-by: Jue Wang <juew@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220610171134.772566-8-juew@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This patch adds the emulation of IA32_MCi_CTL2 registers to KVM. A
separate mci_ctl2_banks array is used to keep the existing mce_banks
register layout intact.
In Machine Check Architecture, in addition to MCG_CMCI_P, bit 30 of
the per-bank register IA32_MCi_CTL2 controls whether Corrected Machine
Check error reporting is enabled.
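A sketch of the delivery-time check; MCI_CTL2_CMCI_EN is bit 30 of
IA32_MCi_CTL2, and mci_ctl2_banks plus the exact delivery call are per
this series:

  if (vcpu->arch.mci_ctl2_banks[mce->bank] & MCI_CTL2_CMCI_EN)
          kvm_apic_local_deliver(vcpu->arch.apic, APIC_LVTCMCI);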
Signed-off-by: Jue Wang <juew@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220610171134.772566-7-juew@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This patch updates the allocation of mce_banks with the array allocation
API (kcalloc) as a precedent for the later mci_ctl2_banks to implement
per-bank control of Corrected Machine Check Interrupt (CMCI).
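The allocation, roughly (each bank occupies four u64 MSRs: CTL, STATUS,
ADDR, MISC):

  vcpu->arch.mce_banks = kcalloc(KVM_MAX_MCE_BANKS * 4, sizeof(u64),
                                 GFP_KERNEL_ACCOUNT);
  if (!vcpu->arch.mce_banks)
          return -ENOMEM; /* error path simplified for illustration */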
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Jue Wang <juew@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220610171134.772566-6-juew@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This patch calculates the number of LVT entries as part of
KVM_X86_MCE_SETUP, conditioned on the presence of the MCG_CMCI_P bit in
MCG_CAP, and stores the result in kvm_lapic. It translates from an
APIC_LVTx register to an index in the lapic_lvt_entry enum, and extends
the APIC_LVTx macro as well as other lapic write/reset handling to
support the Corrected Machine Check Interrupt.
Signed-off-by: Jue Wang <juew@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220610171134.772566-5-juew@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In some cases, the NX hugepage mitigation for iTLB multihit is not
needed for all guests on a host. Allow disabling the mitigation on a
per-VM basis to avoid the performance hit of NX hugepages on trusted
workloads.
In order to disable NX hugepages on a VM, ensure that the userspace
actor has permission to reboot the system. Since disabling NX hugepages
would allow a guest to crash the system, it is similar to reboot
permissions.
Ideally, KVM would require userspace to prove it has access to KVM's
nx_huge_pages module param, e.g. so that userspace can opt out without
needing full reboot permissions. But getting access to the module param
file info is difficult because it is buried in layers of sysfs and module
glue. Requiring CAP_SYS_BOOT is sufficient for all known use cases.
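The permission check then amounts to (sketch; the exact capability
plumbing may differ):

  case KVM_CAP_VM_DISABLE_NX_HUGE_PAGES:
          r = -EPERM;
          if (!capable(CAP_SYS_BOOT))
                  break;  /* same privilege a reboot requires */
          kvm->arch.disable_nx_huge_pages = true;
          r = 0;
          break;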
Suggested-by: Jim Mattson <jmattson@google.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20220613212523.3436117-9-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The braces around the KVM_CAP_XSAVE2 block also surround the
KVM_CAP_PMU_CAPABILITY block, likely the result of a merge issue. Simply
move the curly brace back to where it belongs.
Fixes: ba7bb663f5547 ("KVM: x86: Provide per VM capability for disabling PMU virtualization")
Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20220613212523.3436117-8-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add a quirk for KVM's behavior of emulating intercepted MONITOR/MWAIT
instructions as NOPs regardless of whether or not they are supported in
guest CPUID. KVM's current behavior was likely motivated by a certain
fruity operating system that expects MONITOR/MWAIT to be supported
unconditionally and blindly executes MONITOR/MWAIT without first checking
CPUID. And because KVM does NOT advertise MONITOR/MWAIT to userspace,
that's effectively the default setup for any VMM that regurgitates
KVM_GET_SUPPORTED_CPUID to KVM_SET_CPUID2.
Note, this quirk interacts with KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT. The
behavior is actually desirable, as userspace VMMs that want to
unconditionally hide MONITOR/MWAIT from the guest can leave the
MISC_ENABLE quirk enabled.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220608224516.3788274-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Ignore host userspace writes of '0' to the F15H_PERF_CTL MSRs that KVM
reports in the MSR-to-save list but ultimately does not support. All
MSRs in said list must be writable by userspace, e.g. if userspace sends
the list back at KVM without filtering out the MSRs it doesn't need.
Note, reads of said MSRs already have the desired behavior.
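A hedged sketch of the WRMSR side (whether a real vPMU claims the MSR
first, and the exact range, follow the existing code):

  case MSR_F15H_PERF_CTL0 ... MSR_F15H_PERF_CTR5:
          if (kvm_pmu_is_valid_msr(vcpu, msr))
                  return kvm_pmu_set_msr(vcpu, msr_info);

          if (!msr_info->host_initiated)
                  return 1;       /* guest writes still fault */
          if (data)
                  return 1;       /* only writes of '0' are ignored */
          break;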
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220611005755.753273-8-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Ignore host userspace reads and writes of '0' to the PEBS and BTS MSRs
that KVM reports in the MSR-to-save list but ultimately does not
support. All MSRs in said list must be writable by userspace, e.g.
if userspace sends the list back at KVM without filtering out the MSRs it
doesn't need.
Fixes: 8183a538cd95 ("KVM: x86/pmu: Add IA32_DS_AREA MSR emulation to support guest DS")
Fixes: 902caeb6841a ("KVM: x86/pmu: Add PEBS_DATA_CFG MSR emulation to support adaptive PEBS")
Fixes: c59a1f106f5c ("KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220611005755.753273-7-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Revert the hack to allow host-initiated accesses to all "PMU" MSRs,
as intel_is_valid_msr() returns true for _all_ MSRs, regardless of whether
or not it has a snowball's chance in hell of actually being a PMU MSR.
That mostly gets papered over by the actual get/set helpers only handling
MSRs that they know about, except there's the minor detail that
kvm_pmu_{g,s}et_msr() eat reads and writes when the PMU is disabled.
I.e. KVM will happily allow reads and writes to _any_ MSR if the PMU is
disabled, either via module param or capability.
This reverts commit d1c88a4020567ba4da52f778bcd9619d87e4ea75.
Fixes: d1c88a402056 ("KVM: x86: always allow host-initiated writes to PMU MSRs")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220611005755.753273-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Give userspace full control of the read-only bits in MISC_ENABLES, i.e.
do not modify bits on PMU refresh and do not preserve existing bits when
userspace writes MISC_ENABLES. With a few exceptions where KVM doesn't
expose the necessary controls to userspace _and_ there is a clear cut
association with CPUID, e.g. reserved CR4 bits, KVM does not own the vCPU
and should not manipulate the vCPU model on behalf of "dummy user space".
The argument that KVM is doing userspace a favor because "the order of
setting vPMU capabilities and MSR_IA32_MISC_ENABLE is not strictly
guaranteed" is specious, as attempting to configure MSRs on behalf of
userspace inevitably leads to edge cases precisely because KVM does not
prescribe a specific order of initialization.
Example #1: intel_pmu_refresh() consumes and modifies the vCPU's
MSR_IA32_PERF_CAPABILITIES, and so assumes userspace initializes config
MSRs before setting the guest CPUID model. If userspace sets CPUID
first, then KVM will mark PEBS as available when arch.perf_capabilities
is initialized with a non-zero PEBS format, thus creating a bad vCPU
model if userspace later disables PEBS by writing PERF_CAPABILITIES.
Example #2: intel_pmu_refresh() does not clear PERF_CAP_PEBS_MASK in
MSR_IA32_PERF_CAPABILITIES if there is no vPMU, making KVM inconsistent
in its desire to be consistent.
Example #3: intel_pmu_refresh() does not clear MSR_IA32_MISC_ENABLE_EMON
if KVM_SET_CPUID2 is called multiple times, first with a vPMU, then
without a vPMU. While slightly contrived, it's plausible a VMM could
reflect KVM's default vCPU and then operate on KVM's copy of CPUID to
later clear the vPMU settings, e.g. see KVM's selftests.
Example #4: Enumerating an Intel vCPU on an AMD host will not call into
intel_pmu_refresh() at any point, and so the BTS and PEBS "unavailable"
bits will be left clear, without any way for userspace to set them.
Keep the "R" behavior of the bit 7, "EMON available", for the guest.
Unlike the BTS and PEBS bits, which are fully "RO", the EMON bit can be
written with a different value, but that new value is ignored.
Cc: Like Xu <likexu@tencent.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Message-Id: <20220611005755.753273-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Move the per-vCPU apicv_active flag into KVM's local APIC instance.
APICv is fully dependent on an in-kernel local APIC, but that's not at
all clear when reading the current code due to the flag being stored in
the generic kvm_vcpu_arch struct.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220614230548.3852141-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Use kvm_vcpu_apicv_active() to check if APICv is active when seeing if a
vCPU is a candidate for directed yield due to a pending APICv interrupt.
This will allow moving apicv_active into kvm_lapic without introducing a
potential NULL pointer deref (kvm_vcpu_apicv_active() effectively adds a
pre-check on the vCPU having an in-kernel APIC).
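Once the flag lives in the local APIC (the change above), the helper
reads roughly:

  static inline bool kvm_vcpu_apicv_active(struct kvm_vcpu *vcpu)
  {
          /* APICv is predicated on an in-kernel local APIC. */
          return lapic_in_kernel(vcpu) && vcpu->arch.apic->apicv_active;
  }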
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220614230548.3852141-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"While last week's pull request contained miscellaneous fixes for x86,
this one covers other architectures, selftests changes, and a bigger
series for APIC virtualization bugs that were discovered during 5.20
development. The idea is to base 5.20 development for KVM on top of
this tag.
ARM64:
- Properly reset the SVE/SME flags on vcpu load
- Fix a vgic-v2 regression regarding accessing the pending state of a
HW interrupt from userspace (and make the code common with vgic-v3)
- Fix access to the idreg range for protected guests
- Ignore 'kvm-arm.mode=protected' when using VHE
- Return an error from kvm_arch_init_vm() on allocation failure
- A bunch of small cleanups (comments, annotations, indentation)
RISC-V:
- Typo fix in arch/riscv/kvm/vmid.c
- Remove broken reference pattern from MAINTAINERS entry
x86-64:
- Fix error in page tables with MKTME enabled
- Dirty page tracking performance test extended to running a nested
guest
- Disable APICv/AVIC in cases that it cannot implement correctly"
[ This merge also fixes a misplaced end parenthesis bug introduced in
commit 3743c2f02517 ("KVM: x86: inhibit APICv/AVIC on changes to APIC
ID or APIC base") pointed out by Sean Christopherson ]
Link: https://lore.kernel.org/all/20220610191813.371682-1-seanjc@google.com/
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (34 commits)
KVM: selftests: Restrict test region to 48-bit physical addresses when using nested
KVM: selftests: Add option to run dirty_log_perf_test vCPUs in L2
KVM: selftests: Clean up LIBKVM files in Makefile
KVM: selftests: Link selftests directly with lib object files
KVM: selftests: Drop unnecessary rule for STATIC_LIBS
KVM: selftests: Add a helper to check EPT/VPID capabilities
KVM: selftests: Move VMX_EPT_VPID_CAP_AD_BITS to vmx.h
KVM: selftests: Refactor nested_map() to specify target level
KVM: selftests: Drop stale function parameter comment for nested_map()
KVM: selftests: Add option to create 2M and 1G EPT mappings
KVM: selftests: Replace x86_page_size with PG_LEVEL_XX
KVM: x86: SVM: fix nested PAUSE filtering when L0 intercepts PAUSE
KVM: x86: SVM: drop preempt-safe wrappers for avic_vcpu_load/put
KVM: x86: disable preemption around the call to kvm_arch_vcpu_{un|}blocking
KVM: x86: disable preemption while updating apicv inhibition
KVM: x86: SVM: fix avic_kick_target_vcpus_fast
KVM: x86: SVM: remove avic's broken code that updated APIC ID
KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC base
KVM: x86: document AVIC/APICv inhibit reasons
KVM: x86/mmu: Set memory encryption "value", not "mask", in shadow PDPTRs
...
Merge tag 'x86-bugs-2022-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 MMIO stale data fixes from Thomas Gleixner:
"Yet another hw vulnerability with a software mitigation: Processor
MMIO Stale Data.
They are a class of MMIO-related weaknesses which can expose stale
data by propagating it into core fill buffers. Data which can then be
leaked using the usual speculative execution methods.
Mitigations include this set along with microcode updates and are
similar to MDS and TAA vulnerabilities: VERW now clears those buffers
too"
* tag 'x86-bugs-2022-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/speculation/mmio: Print SMT warning
KVM: x86/speculation: Disable Fill buffer clear within guests
x86/speculation/mmio: Reuse SRBDS mitigation for SBDS
x86/speculation/srbds: Update SRBDS mitigation selection
x86/speculation/mmio: Add sysfs reporting for Processor MMIO Stale Data
x86/speculation/mmio: Enable CPU Fill buffer clearing on idle
x86/bugs: Group MDS, TAA & Processor MMIO Stale Data mitigations
x86/speculation/mmio: Add mitigation for Processor MMIO Stale Data
x86/speculation: Add a common function for MD_CLEAR mitigation update
x86/speculation/mmio: Enumerate Processor MMIO Stale Data bug
Documentation: Add documentation for Processor MMIO Stale Data
Bug the VM, i.e. kill it, if the emulator accesses a non-existent GPR,
i.e. generates an out-of-bounds GPR index. Continuing on all but
guarantees some form of data corruption in the guest, e.g. even if KVM
were to redirect to a dummy register, KVM would incorrectly read zeros
and drop writes.
Note, bugging the VM doesn't completely prevent data corruption, e.g. the
current round of emulation will complete before the vCPU bails out to
userspace. But, the very act of killing the guest can also cause data
corruption, e.g. due to lack of file writeback before termination, so
taking on additional complexity to cleanly bail out of the emulator isn't
justified, the goal is purely to stem the bleeding and alert userspace
that something has gone horribly wrong, i.e. to avoid _silent_ data
corruption.
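A sketch of one of the register accessors after the change
(NR_EMULATOR_GPRS and the vm_bugged callback are per this series):

  static ulong *reg_rmw(struct x86_emulate_ctxt *ctxt, unsigned nr)
  {
          if (WARN_ON_ONCE(nr >= NR_EMULATOR_GPRS)) {
                  /* Terminate the VM rather than corrupt guest state. */
                  ctxt->ops->vm_bugged(ctxt);
                  nr &= NR_EMULATOR_GPRS - 1;  /* keep the index in bounds */
          }

          ctxt->regs_dirty |= BIT(nr);
          return &ctxt->_regs[nr];
  }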
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220526210817.3428868-7-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
s390:
* add an interface to provide a hypervisor dump for secure guests
* improve selftests to show tests
x86:
* Intel IPI virtualization
* Allow getting/setting pending triple fault with KVM_GET/SET_VCPU_EVENTS
* PEBS virtualization
* Simplify PMU emulation by just using PERF_TYPE_RAW events
* More accurate event reinjection on SVM (avoid retrying instructions)
* Allow getting/setting the state of the speaker port data bit
* Rewrite gfn-pfn cache refresh
* Refuse starting the module if VM-Entry/VM-Exit controls are inconsistent
* "Notify" VM exit
Currently nothing prevents preemption in kvm_vcpu_update_apicv.
On SVM, if the preemption happens after we update
vcpu->arch.apicv_active, the preemption itself will 'update' the
inhibition, since the AVIC will first be disabled on vCPU unload and
then enabled when the current task is loaded again.
Then we will try to update it again, which will lead to a warning
in __avic_vcpu_load, that the AVIC is already enabled.
Fix this by disabling preemption in this code.
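The fix, roughly:

  preempt_disable();

  /* Update the flag and (de)activate hardware acceleration without
   * allowing a vCPU unload/load to race the update.
   */
  vcpu->arch.apicv_active = activate;
  kvm_apic_update_apicv(vcpu);
  static_call(kvm_x86_refresh_apicv_exec_ctrl)(vcpu);

  preempt_enable();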
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220606180829.102503-6-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The BTS feature (including the ability to set the BTS and BTINT
bits in the DEBUGCTL MSR) is currently unsupported in KVM.
But we may try using the BTS facility on a PEBS enabled guest like this:
perf record -e branches:u -c 1 -d ls
and then we would encounter the following call trace:
[] unchecked MSR access error: WRMSR to 0x1d9 (tried to write 0x00000000000003c0)
at rIP: 0xffffffff810745e4 (native_write_msr+0x4/0x20)
[] Call Trace:
[] intel_pmu_enable_bts+0x5d/0x70
[] bts_event_add+0x54/0x70
[] event_sched_in+0xee/0x290
As there is no CPUID indicator or perf_capabilities valid bit field
to enumerate this information, hint that the Intel BTS feature is
unavailable to the guest by setting the BTS_UNAVAIL bit in
IA32_MISC_ENABLE.
Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20220601031925.59693-3-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
- syzkaller NULL pointer dereference
- TDP MMU performance issue with disabling dirty logging
- 5.14 regression with SVM TSC scaling
- indefinite stall on applying live patches
- unstable selftest
- memory leak from wrong copy-and-paste
- missed PV TLB flush when racing with emulation
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: do not report a vCPU as preempted outside instruction boundaries
KVM: x86: do not set st->preempted when going back to user space
KVM: SVM: fix tsc scaling cache logic
KVM: selftests: Make hyperv_clock selftest more stable
KVM: x86/MMU: Zap non-leaf SPTEs when disabling dirty logging
x86: drop bogus "cc" clobber from __try_cmpxchg_user_asm()
KVM: x86/mmu: Check every prev_roots in __kvm_mmu_free_obsolete_roots()
entry/kvm: Exit to user mode when TIF_NOTIFY_SIGNAL is set
KVM: Don't null dereference ops->destroy
There are cases where a malicious virtual machine can get the CPU stuck
because event windows never open up, e.g. an infinite loop in microcode
when a nested #AC is triggered (CVE-2015-5307). No event window means no
event (NMI, SMI, or IRQ) can be delivered, which leaves the CPU
unavailable to the host or to other VMs.
The VMM can enable the notify VM exit, which generates a VM exit if no
event window occurs in VM non-root mode for a specified amount of time
(the notify window).
Feature enabling:
- The new vmcs field SECONDARY_EXEC_NOTIFY_VM_EXITING is introduced to
enable this feature. VMM can set NOTIFY_WINDOW vmcs field to adjust
the expected notify window.
- Add a new KVM capability KVM_CAP_X86_NOTIFY_VMEXIT so that user space
  can query and enable this feature per VM. The argument is a 64-bit
  value: bits 63:32 are used for the notify window, and bits 31:0 are
  for flags (see the sketch after this list). Currently supported flags:
- KVM_X86_NOTIFY_VMEXIT_ENABLED: enable the feature with the notify
window provided.
- KVM_X86_NOTIFY_VMEXIT_USER: exit to userspace once the exits happen.
- It's safe to even set notify window to zero since an internal hardware
threshold is added to vmcs.notify_window.
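Userspace enablement then looks roughly like this (the window value is
illustrative):

  struct kvm_enable_cap cap = {
          .cap = KVM_CAP_X86_NOTIFY_VMEXIT,
          /* bits 63:32 = notify window, bits 31:0 = flags */
          .args = { ((__u64)128000 << 32) | KVM_X86_NOTIFY_VMEXIT_ENABLED },
  };

  if (ioctl(vm_fd, KVM_ENABLE_CAP, &cap) < 0)
          perror("KVM_ENABLE_CAP");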
VM exit handling:
- Introduce a vcpu stat, notify_window_exits, to record the count of
  notify VM exits, and expose it through debugfs.
- Notify VM exit can happen incident to delivery of a vector event.
Allow it in KVM.
- Exit to userspace unconditionally for handling when VM_CONTEXT_INVALID
bit is set.
Nested handling:
- Nested notify VM exits are not supported yet. Keep the same notify
window control in vmcs02 as vmcs01, so that L1 can't escape the
restriction of notify VM exits through launching L2 VM.
Notify VM exit is defined in the latest Intel Architecture Instruction
Set Extensions Programming Reference, chapter 9.2.
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
Co-developed-by: Chenyi Qiang <chenyi.qiang@intel.com>
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20220524135624.22988-5-chenyi.qiang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add kvm_caps to hold a variety of capabilities and defaults that aren't
handled by kvm_cpu_caps because they aren't CPUID bits in order to reduce
the amount of boilerplate code required to add a new feature. The vast
majority (all?) of the caps interact with vendor code and are written
only during initialization, i.e. should be tagged __read_mostly, declared
extern in x86.h, and exported.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220524135624.22988-4-chenyi.qiang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
For the triple fault synthesized by KVM, e.g. the RSM path or
nested_vmx_abort(), if KVM exits to userspace before the request is
serviced, userspace could migrate the VM and lose the triple fault.
Extend KVM_{G,S}ET_VCPU_EVENTS to support a pending triple fault with a
new event flag KVM_VCPUEVENT_VALID_TRIPLE_FAULT so that userspace can
save and restore the triple fault event. This extension is guarded by a
new KVM
capability KVM_CAP_TRIPLE_FAULT_EVENT.
Note that in the set_vcpu_events path, userspace is able to set or clear
the pending triple fault request through the triple_fault.pending field.
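For example, a migration flow can preserve the event roughly as follows
(vcpu_fd and the serialization step are illustrative):

  struct kvm_vcpu_events events;

  if (ioctl(vcpu_fd, KVM_GET_VCPU_EVENTS, &events) < 0)
          perror("KVM_GET_VCPU_EVENTS");

  if ((events.flags & KVM_VCPUEVENT_VALID_TRIPLE_FAULT) &&
      events.triple_fault.pending) {
          /* Serialize, then replay on the destination with
           * KVM_SET_VCPU_EVENTS and the same flag set.
           */
  }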
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20220524135624.22988-2-chenyi.qiang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Whenever an MSR is part of KVM_GET_MSR_INDEX_LIST, it must always be
retrievable and settable with KVM_GET_MSR and KVM_SET_MSR. Accept
the PMU MSRs unconditionally in intel_is_valid_msr, if the access was
host-initiated.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The information obtained from the interface perf_get_x86_pmu_capability()
doesn't change, so an exported "struct x86_pmu_capability" is introduced
for all guests in KVM, and it's initialized before hardware_setup().
Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20220411101946.20262-16-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Bit 12 represents "Processor Event Based Sampling Unavailable (RO)":
1 = PEBS is not supported.
0 = PEBS is supported.
A write to this PEBS_UNAVL bit raises #GP(0) when guest PEBS is enabled.
Some PEBS drivers in the guest may care about this bit.
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Message-Id: <20220411101946.20262-13-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If IA32_PERF_CAPABILITIES.PEBS_BASELINE [bit 14] is set, the adaptive
PEBS is supported. The PEBS_DATA_CFG MSR and adaptive record enable
bits (IA32_PERFEVTSELx.Adaptive_Record and IA32_FIXED_CTR_CTRL.
FCx_Adaptive_Record) are also supported.
Adaptive PEBS provides software the capability to configure the PEBS
records to capture only the data of interest, keeping the record size
compact. An overflow of PMCx results in generation of an adaptive PEBS
record with state information based on the selections specified in
MSR_PEBS_DATA_CFG. By default, the record contains only the Basic group.
When guest adaptive PEBS is enabled, the IA32_PEBS_ENABLE MSR will
be added to the perf_guest_switch_msr() and switched during the VMX
transitions just like CORE_PERF_GLOBAL_CTRL MSR.
According to the Intel SDM, software is recommended to use PEBS Baseline
when the following is true: IA32_PERF_CAPABILITIES.PEBS_BASELINE[14]
&& IA32_PERF_CAPABILITIES.PEBS_FMT[11:8] ≥ 4.
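In KVM terms, the check sketches out as (PERF_CAP_PEBS_FORMAT covers
bits 11:8):

  u64 caps = vcpu->arch.perf_capabilities;
  bool baseline = (caps & PERF_CAP_PEBS_BASELINE) &&
                  ((caps & PERF_CAP_PEBS_FORMAT) >> 8) >= 4;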
Co-developed-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20220411101946.20262-12-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When CPUID.01H:EDX.DS[21] is set, the IA32_DS_AREA MSR exists and points
to the linear address of the first byte of the DS buffer management area,
which is used to manage the PEBS records.
When guest PEBS is enabled, the MSR_IA32_DS_AREA MSR will be added to the
perf_guest_switch_msr() and switched during the VMX transitions just like
CORE_PERF_GLOBAL_CTRL MSR. The WRMSR to IA32_DS_AREA MSR brings a #GP(0)
if the source register contains a non-canonical address.
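The WRMSR handling, roughly:

  case MSR_IA32_DS_AREA:
          /* #GP on non-canonical addresses, matching bare metal. */
          if (is_noncanonical_address(data, vcpu))
                  return 1;
          pmu->ds_area = data;
          return 0;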
Originally-by: Andi Kleen <ak@linux.intel.com>
Co-developed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Message-Id: <20220411101946.20262-11-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If IA32_PERF_CAPABILITIES.PEBS_BASELINE [bit 14] is set, the
IA32_PEBS_ENABLE MSR exists and all architecturally enumerated fixed
and general-purpose counters have corresponding bits in IA32_PEBS_ENABLE
that enable generation of PEBS records. The general-purpose counter bits
start at bit IA32_PEBS_ENABLE[0], and the fixed counter bits start at
bit IA32_PEBS_ENABLE[32].
When guest PEBS is enabled, the IA32_PEBS_ENABLE MSR will be
added to the perf_guest_switch_msr() and atomically switched during
the VMX transitions just like CORE_PERF_GLOBAL_CTRL MSR.
Based on whether the platform supports x86_pmu.pebs_ept, this also
refactors how MSRs are added to arr[] in intel_guest_get_msrs() for
extensibility.
Originally-by: Andi Kleen <ak@linux.intel.com>
Co-developed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Co-developed-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Message-Id: <20220411101946.20262-8-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
On Intel platforms, software can use the IA32_MISC_ENABLE[7] bit to
detect whether the processor supports the performance monitoring
facility. For a guest, the bit's value depends on whether the vPMU is
enabled, and a software write to this bit is ignored. Ignoring the
toggle in KVM is the way to go, as that behavior matches bare metal.
Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20220411101946.20262-5-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
With IPI virtualization enabled, the processor emulates writes to
APIC registers that would send IPIs. The processor sets the bit
corresponding to the vector in target vCPU's PIR and may send a
notification (IPI) specified by NDST and NV fields in target vCPU's
Posted-Interrupt Descriptor (PID). This is similar to what the IOMMU
engine does when dealing with posted interrupts from devices.
A PID-pointer table is used by the processor to locate the PID of a
vCPU with the vCPU's APIC ID. The table size depends on maximum APIC
ID assigned for current VM session from userspace. Allocating memory
for PID-pointer table is deferred to vCPU creation, because irqchip
mode and VM-scope maximum APIC ID is settled at that point. KVM can
skip PID-pointer table allocation if !irqchip_in_kernel().
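A sketch of the deferred allocation (field and module-param names per
this series; error handling simplified):

  if (!irqchip_in_kernel(kvm) || !enable_ipiv)
          return 0;       /* no in-kernel APIC: skip the table */

  /* One 8-byte PID pointer per possible APIC ID. */
  pages = alloc_pages(GFP_KERNEL_ACCOUNT | __GFP_ZERO,
                      get_order(kvm->arch.max_vcpu_ids * sizeof(u64)));
  if (!pages)
          return -ENOMEM;

  kvm_vmx->pid_table = page_address(pages);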
Like VT-d PI, if a vCPU goes to blocked state, VMM needs to switch its
notification vector to wakeup vector. This can ensure that when an IPI
for blocked vCPUs arrives, VMM can get control and wake up blocked
vCPUs. And if a vCPU is preempted, its posted interrupt notification
is suppressed.
Note that IPI virtualization can only virtualize physical-addressing,
flat mode, unicast IPIs. Sending other IPIs would still cause a
trap-like APIC-write VM-exit and need to be handled by VMM.
Signed-off-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Zeng Guang <guang.zeng@intel.com>
Message-Id: <20220419154510.11938-1-guang.zeng@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>