linux

iv/linux

Author	SHA1	Message	Date
Hou Wenlong	ca85f00225	KVM: x86/emulator: Defer not-present segment check in __load_segment_descriptor() Per Intel's SDM on the "Instruction Set Reference", when loading segment descriptor, not-present segment check should be after all type and privilege checks. But the emulator checks it first, then #NP is triggered instead of #GP if privilege fails and segment is not present. Put not-present segment check after type and privilege checks in __load_segment_descriptor(). Fixes: 38ba30ba51a00 (KVM: x86 emulator: Emulate task switch in emulator.c) Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com> Message-Id: <52573c01d369f506cadcf7233812427cf7db81a7.1644292363.git.houwenlong.hwl@antgroup.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-03-01 08:50:49 -05:00
Sean Christopherson	85c68eb429	KVM: selftests: Add test to verify KVM handling of ICR The main thing that the selftest verifies is that KVM copies x2APIC's ICR[63:32] to/from ICR2 when userspace accesses the vAPIC page via KVM_{G,S}ET_LAPIC. KVM previously split x2APIC ICR to ICR+ICR2 at the time of write (from the guest), and so KVM must preserve that behavior for backwards compatibility between different versions of KVM. It will also test other invariants, e.g. that KVM clears the BUSY flag on ICR writes, that the reserved bits in ICR2 are dropped on writes from the guest, etc... Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220204214205.3306634-12-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-03-01 08:50:48 -05:00
Sean Christopherson	b9964ee36b	KVM: x86: Make kvm_lapic_set_reg() a "private" xAPIC helper Hide the lapic's "raw" write helper inside lapic.c to force non-APIC code to go through proper helpers when modification the vAPIC state. Keep the read helper visible to outsiders for now, refactoring KVM to hide it too is possible, it will just take more work to do so. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220204214205.3306634-11-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-03-01 08:50:48 -05:00
Sean Christopherson	a57a31684d	KVM: x86: Treat x2APIC's ICR as a 64-bit register, not two 32-bit regs Emulate the x2APIC ICR as a single 64-bit register, as opposed to forking it across ICR and ICR2 as two 32-bit registers. This mirrors hardware behavior for Intel's upcoming IPI virtualization support, which does not split the access. Previous versions of Intel's SDM and AMD's APM don't explicitly state exactly how ICR is reflected in the vAPIC page for x2APIC, KVM just happened to speculate incorrectly. Handling the upcoming behavior is necessary in order to maintain backwards compatibility with KVM_{G,S}ET_LAPIC, e.g. failure to shuffle the 64-bit ICR to ICR+ICR2 and vice versa would break live migration if IPI virtualization support isn't symmetrical across the source and dest. Cc: Zeng Guang <guang.zeng@intel.com> Cc: Chao Gao <chao.gao@intel.com> Cc: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220204214205.3306634-10-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-03-01 08:50:48 -05:00
Sean Christopherson	5429478d03	KVM: x86: Add helpers to handle 64-bit APIC MSR read/writes Add helpers to handle 64-bit APIC read/writes via MSRs to deduplicate the x2APIC and Hyper-V code needed to service reads/writes to ICR. Future support for IPI virtualization will add yet another path where KVM must handle 64-bit APIC MSR reads/write (to ICR). Opportunistically fix the comment in the write path; ICR2 holds the destination (if there's no shorthand), not the vector. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220204214205.3306634-9-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-03-01 08:50:47 -05:00
Sean Christopherson	7018005235	KVM: x86: Make kvm_lapic_reg_{read,write}() static Make the low level read/write lapic helpers static, any accesses to the local APIC from vendor code or non-APIC code should be routed through proper helpers. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220204214205.3306634-8-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-03-01 08:50:47 -05:00
Sean Christopherson	bd17f417c0	KVM: x86: WARN if KVM emulates an IPI without clearing the BUSY flag WARN if KVM emulates an IPI without clearing the BUSY flag, failure to do so could hang the guest if it waits for the IPI be sent. Opportunistically use APIC_ICR_BUSY macro instead of open coding the magic number, and add a comment to clarify why kvm_recalculate_apic_map() is unconditionally invoked (it's really, really confusing for IPIs due to the existence of fast paths that don't trigger a potential recalc). Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220204214205.3306634-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-03-01 08:50:47 -05:00
Sean Christopherson	b51818afdc	KVM: SVM: Don't rewrite guest ICR on AVIC IPI virtualization failure Don't bother rewriting the ICR value into the vAPIC page on an AVIC IPI virtualization failure, the access is a trap, i.e. the value has already been written to the vAPIC page. The one caveat is if hardware left the BUSY flag set (which appears to happen somewhat arbitrarily), in which case go through the "nodecode" APIC-write path in order to clear the BUSY flag. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220204214205.3306634-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-03-01 08:50:46 -05:00
Sean Christopherson	ed60920efe	KVM: SVM: Use common kvm_apic_write_nodecode() for AVIC write traps Use the common kvm_apic_write_nodecode() to handle AVIC/APIC-write traps instead of open coding the same exact code. This will allow making the low level lapic helpers inaccessible outside of lapic.c code. Opportunistically clean up the params to eliminate a bunch of svm=>vcpu reflection. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220204214205.3306634-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-03-01 08:50:46 -05:00
Sean Christopherson	b031f10435	KVM: x86: Use "raw" APIC register read for handling APIC-write VM-Exit Use the "raw" helper to read the vAPIC register after an APIC-write trap VM-Exit. Hardware is responsible for vetting the write, and the caller is responsible for sanitizing the offset. This is a functional change, as it means KVM will consume whatever happens to be in the vAPIC page if the write was dropped by hardware. But, unless userspace deliberately wrote garbage into the vAPIC page via KVM_SET_LAPIC, the value should be zero since it's not writable by the guest. This aligns common x86 with SVM's AVIC logic, i.e. paves the way for using the nodecode path to handle APIC-write traps when AVIC is enabled. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220204214205.3306634-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-03-01 08:50:45 -05:00
Sean Christopherson	b5ede3df79	KVM: VMX: Handle APIC-write offset wrangling in VMX code Move the vAPIC offset adjustments done in the APIC-write trap path from common x86 to VMX in anticipation of using the nodecode path for SVM's AVIC. The adjustment reflects hardware behavior, i.e. it's technically a property of VMX, no common x86. SVM's AVIC behavior is identical, so it's a bit of a moot point, the goal is purely to make it easier to understand why the adjustment is ok. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220204214205.3306634-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-03-01 08:50:45 -05:00
Paolo Bonzini	d22a81b304	KVM: x86: Do not change ICR on write to APIC_SELF_IPI Emulating writes to SELF_IPI with a write to ICR has an unwanted side effect: the value of ICR in vAPIC page gets changed. The lists SELF_IPI as write-only, with no associated MMIO offset, so any write should have no visible side effect in the vAPIC page. Reported-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-03-01 08:50:45 -05:00
Zhenzhong Duan	f66af9f222	KVM: x86: Fix emulation in writing cr8 In emulation of writing to cr8, one of the lowest four bits in TPR[3:0] is kept. According to Intel SDM 10.8.6.1(baremetal scenario): "APIC.TPR[bits 7:4] = CR8[bits 3:0], APIC.TPR[bits 3:0] = 0"; and SDM 28.3(use TPR shadow): "MOV to CR8. The instruction stores bits 3:0 of its source operand into bits 7:4 of VTPR; the remainder of VTPR (bits 3:0 and bits 31:8) are cleared."; and AMD's APM 16.6.4: "Task Priority Sub-class (TPS)-Bits 3 : 0. The TPS field indicates the current sub-priority to be used when arbitrating lowest-priority messages. This field is written with zero when TPR is written using the architectural CR8 register."; so in KVM emulated scenario, clear TPR[3:0] to make a consistent behavior as in other scenarios. This doesn't impact evaluation and delivery of pending virtual interrupts because processor does not use the processor-priority sub-class to determine which interrupts to delivery and which to inhibit. Sub-class is used by hardware to arbitrate lowest priority interrupts, but KVM just does a round-robin style delivery. Fixes: b93463aa59d6 ("KVM: Accelerated apic support") Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220210094506.20181-1-zhenzhong.duan@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-03-01 08:50:44 -05:00
Paolo Bonzini	b5f61c035d	KVM: x86: flush TLB separately from MMU reset For both CR0 and CR4, disassociate the TLB flush logic from the MMU role logic. Instead of relying on kvm_mmu_reset_context() being a superset of various TLB flushes (which is not necessarily going to be the case in the future), always call it if the role changes but also set the various TLB flush requests according to what is in the manual. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-03-01 08:50:42 -05:00
Paolo Bonzini	6d58f275e6	KVM: x86/mmu: clear MMIO cache when unloading the MMU For cleanliness, do not leave a stale GVA in the cache after all the roots are cleared. In practice, kvm_mmu_load will go through kvm_mmu_sync_roots if paging is on, and will not use vcpu_match_mmio_gva at all if paging is off. However, leaving data in the cache might cause bugs in the future. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-25 08:20:19 -05:00
Paolo Bonzini	d2e5f33341	KVM: x86/mmu: Always use current mmu's role when loading new PGD Since the guest PGD is now loaded after the MMU has been set up completely, the desired role for a cache hit is simply the current mmu_role. There is no need to compute it again, so __kvm_mmu_new_pgd can be folded in kvm_mmu_new_pgd. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-25 08:20:18 -05:00
Paolo Bonzini	3cffc89d9d	KVM: x86/mmu: load new PGD after the shadow MMU is initialized Now that __kvm_mmu_new_pgd does not look at the MMU's root_level and shadow_root_level anymore, pull the PGD load after the initialization of the shadow MMUs. Besides being more intuitive, this enables future simplifications and optimizations because it's not necessary anymore to compute the role outside kvm_init_mmu. In particular, kvm_mmu_reset_context was not attempting to use a cached PGD to avoid having to figure out the new role. With this change, it could follow what nested_{vmx,svm}_load_cr3 are doing, and avoid unloading all the cached roots. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-25 08:20:18 -05:00
Paolo Bonzini	5499ea73e7	KVM: x86/mmu: look for a cached PGD when going from 32-bit to 64-bit Right now, PGD caching avoids placing a PAE root in the cache by using the old value of mmu->root_level and mmu->shadow_root_level; it does not look for a cached PGD if the old root is a PAE one, and then frees it using kvm_mmu_free_roots. Change the logic instead to free the uncacheable root early. This way, __kvm_new_mmu_pgd is able to look up the cache when going from 32-bit to 64-bit (if there is a hit, the invalid root becomes the least recently used). An example of this is nested virtualization with shadow paging, when a 64-bit L1 runs a 32-bit L2. As a side effect (which is actually the reason why this patch was written), PGD caching does not use the old value of mmu->root_level and mmu->shadow_root_level anymore. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-25 08:20:18 -05:00
Paolo Bonzini	0c1c92f15f	KVM: x86/mmu: do not pass vcpu to root freeing functions These functions only operate on a given MMU, of which there is more than one in a vCPU (we care about two, because the third does not have any roots and is only used to walk guest page tables). They do need a struct kvm in order to lock the mmu_lock, but they do not needed anything else in the struct kvm_vcpu. So, pass the vcpu->kvm directly to them. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-25 08:20:17 -05:00
Paolo Bonzini	594bef7931	KVM: x86/mmu: do not consult levels when freeing roots Right now, PGD caching requires a complicated dance of first computing the MMU role and passing it to __kvm_mmu_new_pgd(), and then separately calling kvm_init_mmu(). Part of this is due to kvm_mmu_free_roots using mmu->root_level and mmu->shadow_root_level to distinguish whether the page table uses a single root or 4 PAE roots. Because kvm_init_mmu() can overwrite mmu->root_level, kvm_mmu_free_roots() must be called before kvm_init_mmu(). However, even after kvm_init_mmu() there is a way to detect whether the page table may hold PAE roots, as root.hpa isn't backed by a shadow when it points at PAE roots. Using this method results in simpler code, and is one less obstacle in moving all calls to __kvm_mmu_new_pgd() after the MMU has been initialized. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-25 08:20:17 -05:00
Paolo Bonzini	b9e5603c2a	KVM: x86: use struct kvm_mmu_root_info for mmu->root The root_hpa and root_pgd fields form essentially a struct kvm_mmu_root_info. Use the struct to have more consistency between mmu->root and mmu->prev_roots. The patch is entirely search and replace except for cached_root_available, which does not need a temporary struct kvm_mmu_root_info anymore. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-25 08:20:16 -05:00
Paolo Bonzini	9191b8f074	KVM: x86/mmu: avoid NULL-pointer dereference on page freeing bugs WARN and bail if KVM attempts to free a root that isn't backed by a shadow page. KVM allocates a bare page for "special" roots, e.g. when using PAE paging or shadowing 2/3/4-level page tables with 4/5-level, and so root_hpa will be valid but won't be backed by a shadow page. It's all too easy to blindly call mmu_free_root_page() on root_hpa, be nice and WARN instead of crashing KVM and possibly the kernel. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-25 08:20:16 -05:00
Paolo Bonzini	57cb3bb0dc	KVM: x86: do not deliver asynchronous page faults if CR0.PG=0 Enabling async page faults is nonsensical if paging is disabled, but it is allowed because CR0.PG=0 does not clear the async page fault MSR. Just ignore them and only use the artificial halt state, similar to what happens in guest mode if async #PF vmexits are disabled. Given the increasingly complex logic, and the nicer code if the new "if" is placed last, opportunistically change the "\|\|" into a chain of "if (...) return false" statements. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-25 08:20:16 -05:00
Paolo Bonzini	d617429936	KVM: x86: Reinitialize context if host userspace toggles EFER.LME While the guest runs, EFER.LME cannot change unless CR0.PG is clear, and therefore EFER.NX is the only bit that can affect the MMU role. However, set_efer accepts a host-initiated change to EFER.LME even with CR0.PG=1. In that case, the MMU has to be reset. Fixes: 11988499e62b ("KVM: x86: Skip EFER vs. guest CPUID checks for host-initiated writes") Cc: stable@vger.kernel.org Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-25 08:20:15 -05:00
David Dunn	20e416720e	KVM: selftests: Verify disabling PMU virtualization via KVM_CAP_CONFIG_PMU On a VM with PMU disabled via KVM_CAP_PMU_CONFIG, the PMU should not be usable by the guest. Signed-off-by: David Dunn <daviddunn@google.com> Message-Id: <20220223225743.2703915-4-daviddunn@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-25 08:20:15 -05:00
David Dunn	f49b8138e6	KVM: selftests: Carve out helper to create "default" VM without vCPUs Carve out portion of vm_create_default so that selftests can modify a "default" VM prior to creating vcpus. Signed-off-by: David Dunn <daviddunn@google.com> Message-Id: <20220223225743.2703915-3-daviddunn@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-25 08:20:15 -05:00
David Dunn	ba7bb663f5	KVM: x86: Provide per VM capability for disabling PMU virtualization Add a new capability, KVM_CAP_PMU_CAPABILITY, that takes a bitmask of settings/features to allow userspace to configure PMU virtualization on a per-VM basis. For now, support a single flag, KVM_PMU_CAP_DISABLE, to allow disabling PMU virtualization for a VM even when KVM is configured with enable_pmu=true a module level. To keep KVM simple, disallow changing VM's PMU configuration after vCPUs have been created. Signed-off-by: David Dunn <daviddunn@google.com> Message-Id: <20220223225743.2703915-2-daviddunn@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-25 08:20:14 -05:00
Sean Christopherson	925088781e	KVM: x86: Fix pointer mistmatch warning when patching RET0 static calls Cast kvm_x86_ops.func to 'void ' when updating KVM static calls that are conditionally patched to __static_call_return0(). clang complains about using mismatching pointers in the ternary operator, which breaks the build when compiling with CONFIG_KVM_WERROR=y. >> arch/x86/include/asm/kvm-x86-ops.h:82:1: warning: pointer type mismatch ('bool ()(struct kvm_vcpu )' and 'void ') [-Wpointer-type-mismatch] Fixes: 5be2226f417d ("KVM: x86: allow defining return-0 static calls") Reported-by: Like Xu <like.xu.linux@gmail.com> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: David Dunn <daviddunn@google.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Message-Id: <20220223162355.3174907-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-25 08:20:14 -05:00
Vipin Sharma	e45cce30ea	KVM: Move VM's worker kthreads back to the original cgroup before exiting. VM worker kthreads can linger in the VM process's cgroup for sometime after KVM terminates the VM process. KVM terminates the worker kthreads by calling kthread_stop() which waits on the 'exited' completion, triggered by exit_mm(), via mm_release(), in do_exit() during the kthread's exit. However, these kthreads are removed from the cgroup using the cgroup_exit() which happens after the exit_mm(). Therefore, A VM process can terminate in between the exit_mm() and cgroup_exit() calls, leaving only worker kthreads in the cgroup. Moving worker kthreads back to the original cgroup (kthreadd_task's cgroup) makes sure that the cgroup is empty as soon as the main VM process is terminated. Signed-off-by: Vipin Sharma <vipinsh@google.com> Suggested-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220222054848.563321-1-vipinsh@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-25 08:20:14 -05:00
Peng Hao	0b8934d3a9	KVM: VMX: Remove scratch 'cpu' variable that shadows an identical scratch var From: Peng Hao <flyingpeng@tencent.com> Remove a redundant 'cpu' declaration from inside an if-statement that that shadows an identical declaration at function scope. Both variables are used as scratch variables in for_each_*_cpu() loops, thus there's no harm in sharing a variable. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Peng Hao <flyingpeng@tencent.com> Message-Id: <20220222103954.70062-1-flyingpeng@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-25 08:20:13 -05:00
Peng Hao	105e0c441a	kvm: vmx: Fix typos comment in __loaded_vmcs_clear() Fix a comment documenting the memory barrier related to clearing a loaded_vmcs; loaded_vmcs tracks the host CPU the VMCS is loaded on via the field 'cpu', it doesn't have a 'vcpu' field. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Peng Hao <flyingpeng@tencent.com> Message-Id: <20220222104029.70129-1-flyingpeng@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-25 08:20:13 -05:00
Peng Hao	fbc2dfe53a	KVM: nVMX: Make setup/unsetup under the same conditions Make sure nested_vmx_hardware_setup/unsetup() are called in pairs under the same conditions. Calling nested_vmx_hardware_unsetup() when nested is false "works" right now because it only calls free_page() on zero- initialized pointers, but it's possible that more code will be added to nested_vmx_hardware_unsetup() in the future. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Peng Hao <flyingpeng@tencent.com> Message-Id: <20220222104054.70286-1-flyingpeng@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-25 08:20:13 -05:00
Paolo Bonzini	c0f1eaeb9e	Merge branch 'kvm-hv-xmm-hypercall-fixes' into HEAD The fixes for 5.17 conflict with cleanups made in the same area earlier in the 5.18 development cycle.	2022-02-25 08:20:08 -05:00
Vitaly Kuznetsov	47d3e5cdfe	KVM: x86: hyper-v: HVCALL_SEND_IPI_EX is an XMM fast hypercall It has been proven on practice that at least Windows Server 2019 tries using HVCALL_SEND_IPI_EX in 'XMM fast' mode when it has more than 64 vCPUs and it needs to send an IPI to a vCPU > 63. Similarly to other XMM Fast hypercalls (HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE}{,_EX}), this information is missing in TLFS as of 6.0b. Currently, KVM returns an error (HV_STATUS_INVALID_HYPERCALL_INPUT) and Windows crashes. Note, HVCALL_SEND_IPI is a 'standard' fast hypercall (not 'XMM fast') as all its parameters fit into RDX:R8 and this is handled by KVM correctly. Cc: stable@vger.kernel.org # 5.14.x: 3244867af8c0: KVM: x86: Ignore sparse banks size for an "all CPUs", non-sparse IPI req Cc: stable@vger.kernel.org # 5.14.x Fixes: d8f5537a8816 ("KVM: hyper-v: Advertise support for fast XMM hypercalls") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220222154642.684285-5-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-25 05:52:22 -05:00
Vitaly Kuznetsov	7321f47ead	KVM: x86: hyper-v: Fix the maximum number of sparse banks for XMM fast TLB flush hypercalls When TLB flush hypercalls (HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE}_EX are issued in 'XMM fast' mode, the maximum number of allowed sparse_banks is not 'HV_HYPERCALL_MAX_XMM_REGISTERS - 1' (5) but twice as many (10) as each XMM register is 128 bit long and can hold two 64 bit long banks. Cc: stable@vger.kernel.org # 5.14.x Fixes: 5974565bc26d ("KVM: x86: kvm_hv_flush_tlb use inputs from XMM registers") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220222154642.684285-4-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-25 05:52:12 -05:00
Vitaly Kuznetsov	82c1ead0d6	KVM: x86: hyper-v: Drop redundant 'ex' parameter from kvm_hv_flush_tlb() 'struct kvm_hv_hcall' has all the required information already, there's no need to pass 'ex' additionally. No functional change intended. Cc: stable@vger.kernel.org # 5.14.x Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220222154642.684285-3-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-25 05:52:04 -05:00
Vitaly Kuznetsov	50e523dd79	KVM: x86: hyper-v: Drop redundant 'ex' parameter from kvm_hv_send_ipi() 'struct kvm_hv_hcall' has all the required information already, there's no need to pass 'ex' additionally. No functional change intended. Cc: stable@vger.kernel.org # 5.14.x Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220222154642.684285-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-25 05:51:53 -05:00
Paolo Bonzini	4dfc4ec2b7	Merge branch 'kvm-ppc-cap-210' into kvm-next-5.18	2022-02-24 08:49:56 -05:00
Paolo Bonzini	0828824158	KVM: s390: Changes for 5.18 part1 - add Claudio as Maintainer - first step to do proper storage key checking - testcase for missing memop check -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE+SKTgaM0CPnbq/vKEXu8gLWmHHwFAmIUnCkACgkQEXu8gLWm HHxagA//ZY02kmPB8Q6cd7xju+9wxAtdvBAorSEFO8nyW0tPwI8tPGX2AbrGywwh ZDSXRRDGK5PgYWzWAlOO59YbZ9b+HqDJV3yDfDVwGX8JauRVo70DFYxLudD4IKIZ 9bbxm9jjc1S0+nxpBaQZzuTGGKVxl88/GAH4s5Ce5liYPjYn1XWOuHh3Udl1fx5g 4DiKGcpTSYKwcU9aGuv8l680T8CEpzhluLHuz/IG1p5do+NCmyxXAt9IUWnR7LhS 2ram6IB2kHo3ygbiB354Qd6A/uy8I/Ow3HKoyN9QfppOYEowyLM0q2BT0tMwba7j bkd2fYjegUw4e5eWCozZ5T6OWNt4YPZO7Y8R/AW3WeE//AeTJWKeqX3tckU5gukh Th0SQyujLzTlJUFCxEsKbH2ljmX5qiIl/67C6nSdycY5du1cCMezattV0FC/iX3q 0nnDTk5jcMd2G1gHEmrBvBXlhN29e0VNiRgZ5KG6F53DRkJzNdP18Q5DMHi5icJ+ rkYFfQ/ROc+TSsL2xaczmvHMesN78ts+V/4tIrObG7CCe6taSCPwrj4u1tIPvQ++ jfxSw0v5rqXUwnCUQ+F9vt2H4nryijCooNt9rsWY63JPblmLs/8rfF72wJd0edY4 6+HUCAyYksoW7MKl3DyOJnG/AgHy5Kq114oloPH8NykpcL3RiHE= =qGQI -----END PGP SIGNATURE----- Merge tag 'kvm-s390-next-5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: Changes for 5.18 part1 - add Claudio as Maintainer - first step to do proper storage key checking - testcase for missing memop check	2022-02-22 13:07:28 -05:00
Nicholas Piggin	93b71801a8	KVM: PPC: reserve capability 210 for KVM_CAP_PPC_AIL_MODE_3 Add KVM_CAP_PPC_AIL_MODE_3 to advertise the capability to set the AIL resource mode to 3 with the H_SET_MODE hypercall. This capability differs between processor types and KVM types (PR, HV, Nested HV), and affects guest-visible behaviour. QEMU will implement a cap-ail-mode-3 to control this behaviour[1], and use the KVM CAP if available to determine KVM support[2]. Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-22 09:06:54 -05:00
Janis Schoetterl-Glausch	3d9042f8b9	KVM: s390: Add missing vm MEM_OP size check Check that size is not zero, preventing the following warning: WARNING: CPU: 0 PID: 9692 at mm/vmalloc.c:3059 __vmalloc_node_range+0x528/0x648 Modules linked in: CPU: 0 PID: 9692 Comm: memop Not tainted 5.17.0-rc3-e4+ #80 Hardware name: IBM 8561 T01 701 (LPAR) Krnl PSW : 0704c00180000000 0000000082dc584c (__vmalloc_node_range+0x52c/0x648) R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3 Krnl GPRS: 0000000000000083 ffffffffffffffff 0000000000000000 0000000000000001 0000038000000000 000003ff80000000 0000000000000cc0 000000008ebb8000 0000000087a8a700 000000004040aeb1 000003ffd9f7dec8 000000008ebb8000 000000009d9b8000 000000000102a1b4 00000380035afb68 00000380035afaa8 Krnl Code: 0000000082dc583e: d028a7f4ff80 trtr 2036(41,%r10),3968(%r15) 0000000082dc5844: af000000 mc 0,0 #0000000082dc5848: af000000 mc 0,0 >0000000082dc584c: a7d90000 lghi %r13,0 0000000082dc5850: b904002d lgr %r2,%r13 0000000082dc5854: eb6ff1080004 lmg %r6,%r15,264(%r15) 0000000082dc585a: 07fe bcr 15,%r14 0000000082dc585c: 47000700 bc 0,1792 Call Trace: [<0000000082dc584c>] __vmalloc_node_range+0x52c/0x648 [<0000000082dc5b62>] vmalloc+0x5a/0x68 [<000003ff8067f4ca>] kvm_arch_vm_ioctl+0x2da/0x2a30 [kvm] [<000003ff806705bc>] kvm_vm_ioctl+0x4ec/0x978 [kvm] [<0000000082e562fe>] __s390x_sys_ioctl+0xbe/0x100 [<000000008360a9bc>] __do_syscall+0x1d4/0x200 [<0000000083618bd2>] system_call+0x82/0xb0 Last Breaking-Event-Address: [<0000000082dc5348>] __vmalloc_node_range+0x28/0x648 Other than the warning, there is no ill effect from the missing check, the condition is detected by subsequent code and causes a return with ENOMEM. Fixes: ef11c9463ae0 (KVM: s390: Add vm IOCTL for key checked guest absolute memory access) Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Link: https://lore.kernel.org/r/20220221163237.4122868-1-scgl@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>	2022-02-22 09:16:18 +01:00
Janis Schoetterl-Glausch	cbf9b8109d	KVM: s390: Clarify key argument for MEM_OP in api docs Clarify that the key argument represents the access key, not the whole storage key. Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Link: https://lore.kernel.org/r/20220221143657.3712481-1-scgl@linux.ibm.com Fixes: 5e35d0eb472b ("KVM: s390: Update api documentation for memop ioctl") Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>	2022-02-22 09:16:18 +01:00
Sean Christopherson	1bbc60d0c7	KVM: x86/mmu: Remove MMU auditing Remove mmu_audit.c and all its collateral, the auditing code has suffered severe bitrot, ironically partly due to shadow paging being more stable and thus not benefiting as much from auditing, but mostly due to TDP supplanting shadow paging for non-nested guests and shadowing of nested TDP not heavily stressing the logic that is being audited. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-18 13:46:23 -05:00
Paolo Bonzini	5be2226f41	KVM: x86: allow defining return-0 static calls A few vendor callbacks are only used by VMX, but they return an integer or bool value. Introduce KVM_X86_OP_OPTIONAL_RET0 for them: if a func is NULL in struct kvm_x86_ops, it will be changed to __static_call_return0 when updating static calls. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-18 12:44:22 -05:00
Paolo Bonzini	abb6d479e2	KVM: x86: make several APIC virtualization callbacks optional All their invocations are conditional on vcpu->arch.apicv_active, meaning that they need not be implemented by vendor code: even though at the moment both vendors implement APIC virtualization, all of them can be optional. In fact SVM does not need many of them, and their implementation can be deleted now. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-18 12:41:23 -05:00
Paolo Bonzini	dd2319c618	KVM: x86: warn on incorrectly NULL members of kvm_x86_ops Use the newly corrected KVM_X86_OP annotations to warn about possible NULL pointer dereferences as soon as the vendor module is loaded. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-18 12:31:43 -05:00
Paolo Bonzini	e4fc23bad8	KVM: x86: remove KVM_X86_OP_NULL and mark optional kvm_x86_ops The original use of KVM_X86_OP_NULL, which was to mark calls that do not follow a specific naming convention, is not in use anymore. Instead, let's mark calls that are optional because they are always invoked within conditionals or with static_call_cond. Those that are _not_, i.e. those that are defined with KVM_X86_OP, must be defined by both vendor modules or some kind of NULL pointer dereference is bound to happen at runtime. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-18 12:31:27 -05:00
Paolo Bonzini	2a89061451	KVM: x86: use static_call_cond for optional callbacks SVM implements neither update_emulated_instruction nor set_apic_access_page_addr. Remove an "if" by calling them with static_call_cond(). Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-18 12:31:17 -05:00
Paolo Bonzini	8a2897853c	KVM: x86: return 1 unconditionally for availability of KVM_CAP_VAPIC The two ioctls used to implement userspace-accelerated TPR, KVM_TPR_ACCESS_REPORTING and KVM_SET_VAPIC_ADDR, are available even if hardware-accelerated TPR can be used. So there is no reason not to report KVM_CAP_VAPIC. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-18 12:30:56 -05:00
Paolo Bonzini	1e8ff29fbb	selftests: KVM: allow sev_migrate_tests on machines without SEV-ES I managed to get hold of a machine that has SEV but not SEV-ES, and sev_migrate_tests fails because sev_vm_create(true) returns ENOTTY. Fix this, and while at it also return KSFT_SKIP on machines that do not have SEV at all, instead of returning 0. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-18 05:16:43 -05:00

1 2 3 4 5 ...

1073154 Commits