David Dunn
ba7bb663f5 KVM: x86: Provide per VM capability for disabling PMU virtualization
Add a new capability, KVM_CAP_PMU_CAPABILITY, that takes a bitmask of
settings/features to allow userspace to configure PMU virtualization on
a per-VM basis.  For now, support a single flag, KVM_PMU_CAP_DISABLE,
to allow disabling PMU virtualization for a VM even when KVM is configured
with enable_pmu=true at the module level.

To keep KVM simple, disallow changing a VM's PMU configuration after vCPUs
have been created.

Signed-off-by: David Dunn <daviddunn@google.com>
Message-Id: <20220223225743.2703915-2-daviddunn@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-25 08:20:14 -05:00
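
For context, a minimal userspace sketch of what the new capability enables
(not part of the commit; it assumes the bitmask goes in args[0] of struct
kvm_enable_cap and that the ioctl is issued before any vCPU is created):

  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  /* Disable PMU virtualization for one VM; must run before KVM_CREATE_VCPU. */
  static int disable_vm_pmu(int vm_fd)
  {
          struct kvm_enable_cap cap = {
                  .cap     = KVM_CAP_PMU_CAPABILITY,
                  .args[0] = KVM_PMU_CAP_DISABLE,
          };

          return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
  }
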
Sean Christopherson
925088781e KVM: x86: Fix pointer mismatch warning when patching RET0 static calls
Cast kvm_x86_ops.func to 'void *' when updating KVM static calls that are
conditionally patched to __static_call_return0().  clang complains about
using mismatching pointers in the ternary operator, which breaks the
build when compiling with CONFIG_KVM_WERROR=y.

  >> arch/x86/include/asm/kvm-x86-ops.h:82:1: warning: pointer type mismatch
  ('bool (*)(struct kvm_vcpu *)' and 'void *') [-Wpointer-type-mismatch]

Fixes: 5be2226f417d ("KVM: x86: allow defining return-0 static calls")
Reported-by: Like Xu <like.xu.linux@gmail.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: David Dunn <daviddunn@google.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Message-Id: <20220223162355.3174907-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-25 08:20:14 -05:00
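
A standalone reduction of the warning (illustrative only, not the kernel
macros; per the commit, the actual fix casts kvm_x86_ops.func to 'void *'
inside the static-call update code):

  /* clang warns when the two arms of a ternary have unrelated
   * function-pointer types; casting both arms to 'void *' silences it.
   * Converting 'void *' to a function pointer is a GNU/POSIX-style
   * extension that the kernel relies on. */
  typedef int (*ret_int_fn)(void);

  static int  real_impl(void)    { return 1; }
  static long return0_stub(void) { return 0; }  /* stand-in for __static_call_return0() */

  static void pick(ret_int_fn *slot, int have_impl)
  {
          /* *slot = have_impl ? real_impl : return0_stub;  <-- pointer type mismatch */
          *slot = have_impl ? (void *)real_impl : (void *)return0_stub;
  }
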
Peng Hao
0b8934d3a9 KVM: VMX: Remove scratch 'cpu' variable that shadows an identical scratch var
From: Peng Hao <flyingpeng@tencent.com>

Remove a redundant 'cpu' declaration from inside an if-statement that
shadows an identical declaration at function scope.  Both variables are
used as scratch variables in for_each_*_cpu() loops, so there's no harm
in sharing a single variable.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peng Hao <flyingpeng@tencent.com>
Message-Id: <20220222103954.70062-1-flyingpeng@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-25 08:20:13 -05:00
Peng Hao
105e0c441a kvm: vmx: Fix typo in a comment in __loaded_vmcs_clear()
Fix a comment documenting the memory barrier related to clearing a
loaded_vmcs; loaded_vmcs tracks the host CPU the VMCS is loaded on via
the field 'cpu', it doesn't have a 'vcpu' field.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peng Hao <flyingpeng@tencent.com>
Message-Id: <20220222104029.70129-1-flyingpeng@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-25 08:20:13 -05:00
Peng Hao
fbc2dfe53a KVM: nVMX: Do setup/unsetup under the same conditions
Make sure nested_vmx_hardware_setup/unsetup() are called in pairs under
the same conditions.  Calling nested_vmx_hardware_unsetup() when nested
is false "works" right now because it only calls free_page() on zero-
initialized pointers, but it's possible that more code will be added to
nested_vmx_hardware_unsetup() in the future.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peng Hao <flyingpeng@tencent.com>
Message-Id: <20220222104054.70286-1-flyingpeng@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-25 08:20:13 -05:00
Paolo Bonzini
c0f1eaeb9e Merge branch 'kvm-hv-xmm-hypercall-fixes' into HEAD
The fixes for 5.17 conflict with cleanups made in the same area
earlier in the 5.18 development cycle.
2022-02-25 08:20:08 -05:00
Vitaly Kuznetsov
47d3e5cdfe KVM: x86: hyper-v: HVCALL_SEND_IPI_EX is an XMM fast hypercall
It has been observed in practice that at least Windows Server 2019 uses
HVCALL_SEND_IPI_EX in 'XMM fast' mode when it has more than 64 vCPUs and
needs to send an IPI to a vCPU with an index above 63. As with the other
XMM fast hypercalls (HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE}{,_EX}), this
information is missing from the TLFS as of version 6.0b. Currently, KVM
returns an error (HV_STATUS_INVALID_HYPERCALL_INPUT) and Windows crashes.

Note, HVCALL_SEND_IPI is a 'standard' fast hypercall (not 'XMM fast') as
all its parameters fit into RDX:R8 and this is handled by KVM correctly.

Cc: stable@vger.kernel.org # 5.14.x: 3244867af8c0: KVM: x86: Ignore sparse banks size for an "all CPUs", non-sparse IPI req
Cc: stable@vger.kernel.org # 5.14.x
Fixes: d8f5537a8816 ("KVM: hyper-v: Advertise support for fast XMM hypercalls")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220222154642.684285-5-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-25 05:52:22 -05:00
Vitaly Kuznetsov
7321f47ead KVM: x86: hyper-v: Fix the maximum number of sparse banks for XMM fast TLB flush hypercalls
When TLB flush hypercalls (HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE}_EX) are
issued in 'XMM fast' mode, the maximum number of allowed sparse_banks is
not 'HV_HYPERCALL_MAX_XMM_REGISTERS - 1' (5) but twice as many (10), as each
XMM register is 128 bits long and can hold two 64-bit banks.

Cc: stable@vger.kernel.org # 5.14.x
Fixes: 5974565bc26d ("KVM: x86: kvm_hv_flush_tlb use inputs from XMM registers")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220222154642.684285-4-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-25 05:52:12 -05:00
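
The arithmetic behind the new limit, as a self-contained illustration (the
constant's value is inferred from the commit text, which names 5 usable
registers, not taken from the header):

  #include <stdio.h>

  #define HV_HYPERCALL_MAX_XMM_REGISTERS 6   /* per the commit: 6 - 1 = 5 usable */

  int main(void)
  {
          int regs  = HV_HYPERCALL_MAX_XMM_REGISTERS - 1;  /* XMM regs carrying sparse banks */
          int banks = regs * (128 / 64);                   /* two 64-bit banks per 128-bit reg */

          printf("max sparse_banks in 'XMM fast' mode: %d\n", banks);  /* prints 10 */
          return 0;
  }
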
Vitaly Kuznetsov
82c1ead0d6 KVM: x86: hyper-v: Drop redundant 'ex' parameter from kvm_hv_flush_tlb()
'struct kvm_hv_hcall' has all the required information already,
there's no need to pass 'ex' additionally.

No functional change intended.

Cc: stable@vger.kernel.org # 5.14.x
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220222154642.684285-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-25 05:52:04 -05:00
Vitaly Kuznetsov
50e523dd79 KVM: x86: hyper-v: Drop redundant 'ex' parameter from kvm_hv_send_ipi()
'struct kvm_hv_hcall' has all the required information already,
there's no need to pass 'ex' additionally.

No functional change intended.

Cc: stable@vger.kernel.org # 5.14.x
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220222154642.684285-2-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-25 05:51:53 -05:00
Sean Christopherson
1bbc60d0c7 KVM: x86/mmu: Remove MMU auditing
Remove mmu_audit.c and all its collateral; the auditing code has suffered
severe bitrot, ironically partly because shadow paging is more stable and
thus benefits less from auditing, but mostly because TDP has supplanted
shadow paging for non-nested guests and shadowing of nested TDP does not
heavily stress the logic being audited.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-18 13:46:23 -05:00
Paolo Bonzini
5be2226f41 KVM: x86: allow defining return-0 static calls
A few vendor callbacks are only used by VMX, but they return an integer
or bool value.  Introduce KVM_X86_OP_OPTIONAL_RET0 for them: if a func is
NULL in struct kvm_x86_ops, it will be changed to __static_call_return0
when updating static calls.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-18 12:44:22 -05:00
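
A plain-C sketch of the idea (the kernel does this via static-call patching
and __static_call_return0; this only shows the NULL-becomes-return-0
substitution, with made-up names):

  #include <stddef.h>
  #include <stdio.h>

  struct x86_ops {
          int (*optional_op)(int arg);   /* may legitimately be left NULL */
  };

  static int return0_stub(int arg) { (void)arg; return 0; }

  /* Analogous to patching a NULL op to a return-0 stub at module load time. */
  static void patch_optional_ret0(struct x86_ops *ops)
  {
          if (!ops->optional_op)
                  ops->optional_op = return0_stub;
  }

  int main(void)
  {
          struct x86_ops ops = { .optional_op = NULL };

          patch_optional_ret0(&ops);
          printf("%d\n", ops.optional_op(42));   /* prints 0, no NULL check at the call site */
          return 0;
  }
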
Paolo Bonzini
abb6d479e2 KVM: x86: make several APIC virtualization callbacks optional
All of these callbacks are invoked only when vcpu->arch.apicv_active is
set, meaning that they need not be implemented by vendor code: even
though at the moment both vendors implement APIC virtualization, all of
the callbacks can be optional.  In fact SVM does not need many of them,
and their implementations can be deleted now.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-18 12:41:23 -05:00
Paolo Bonzini
dd2319c618 KVM: x86: warn on incorrectly NULL members of kvm_x86_ops
Use the newly corrected KVM_X86_OP annotations to warn about possible
NULL pointer dereferences as soon as the vendor module is loaded.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-18 12:31:43 -05:00
Paolo Bonzini
e4fc23bad8 KVM: x86: remove KVM_X86_OP_NULL and mark optional kvm_x86_ops
The original purpose of KVM_X86_OP_NULL, marking calls that do not
follow a specific naming convention, is no longer in use.  Instead, use
it to mark calls that are optional because they are always invoked
within conditionals or with static_call_cond.  Those that are _not_,
i.e. those that are defined with KVM_X86_OP, must be defined by both
vendor modules, or some kind of NULL pointer dereference is bound to
happen at runtime.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-18 12:31:27 -05:00
Paolo Bonzini
2a89061451 KVM: x86: use static_call_cond for optional callbacks
SVM implements neither update_emulated_instruction nor
set_apic_access_page_addr.  Remove an "if" by calling them
with static_call_cond().

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-18 12:31:17 -05:00
Paolo Bonzini
8a2897853c KVM: x86: return 1 unconditionally for availability of KVM_CAP_VAPIC
The two ioctls used to implement userspace-accelerated TPR,
KVM_TPR_ACCESS_REPORTING and KVM_SET_VAPIC_ADDR, are available
even if hardware-accelerated TPR can be used.  So there is
no reason not to report KVM_CAP_VAPIC.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-18 12:30:56 -05:00
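
For reference, a hedged sketch of how userspace queries the capability that
is now reported unconditionally (KVM_CHECK_EXTENSION returns a positive
value when the capability is present):

  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  /* Works on the /dev/kvm fd; after this commit it always returns > 0 on x86. */
  static int vapic_ioctls_available(int kvm_fd)
  {
          return ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_VAPIC) > 0;
  }
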
Peter Gonda
b2125513df KVM: SEV: Allow SEV intra-host migration of VM with mirrors
For SEV-ES VMs with mirrors to be intra-host migrated, they need to be
able to migrate with the mirror. This is due to the fact that all VMSAs
need to be added into the VM with LAUNCH_UPDATE_VMSA before
LAUNCH_FINISH. Allowing migration with mirrors lets users of SEV-ES
keep the mirror VMs' VMSAs during migration.

Add a list of mirror VMs for the original VM to iterate through during
its migration. During the iteration the owner pointers can be updated
from the source to the destination. This fixes the ASID-leaking issue
which caused migration of VMs with mirrors to be blocked.

Signed-off-by: Peter Gonda <pgonda@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Marc Orr <marcorr@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Message-Id: <20220211193634.3183388-1-pgonda@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-18 04:43:56 -05:00
Sean Christopherson
db6e7adf8d KVM: SVM: Rename AVIC helpers to use "avic" prefix instead of "svm"
Use "avic" instead of "svm" for SVM's all of APICv hooks and make a few
additional funciton name tweaks so that the AVIC functions conform to
their associated kvm_x86_ops hooks.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220128005208.4008533-19-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-14 07:49:34 -05:00
Paolo Bonzini
4e71cad31c Merge remote-tracking branch 'kvm/master' into HEAD
Merge bugfix patches from Linux 5.17-rc.
2022-02-14 07:49:10 -05:00
Jim Mattson
710c476514 KVM: x86/pmu: Use AMD64_RAW_EVENT_MASK for PERF_TYPE_RAW
AMD's event select is 3 nybbles, with the high nybble in bits 35:32 of
a PerfEvtSeln MSR. Don't mask off the high nybble when configuring a
RAW perf event.

Fixes: ca724305a2b0 ("KVM: x86/vPMU: Implement AMD vPMU code for KVM")
Signed-off-by: Jim Mattson <jmattson@google.com>
Message-Id: <20220203014813.2130559-2-jmattson@google.com>
Reviewed-by: David Dunn <daviddunn@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-14 07:44:51 -05:00
Jim Mattson
b8bfee85f1 KVM: x86/pmu: Don't truncate the PerfEvtSeln MSR when creating a perf event
AMD's event select is 3 nybbles, with the high nybble in bits 35:32 of
a PerfEvtSeln MSR. Don't drop the high nybble when setting up the
config field of a perf_event_attr structure for a call to
perf_event_create_kernel_counter().

Fixes: ca724305a2b0 ("KVM: x86/vPMU: Implement AMD vPMU code for KVM")
Reported-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Message-Id: <20220203014813.2130559-1-jmattson@google.com>
Reviewed-by: David Dunn <daviddunn@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-14 07:43:46 -05:00
Maxim Levitsky
66fa226c13 KVM: SVM: fix race between interrupt delivery and AVIC inhibition
If svm_deliver_avic_intr is called just after the target vcpu's AVIC got
inhibited, it might read a stale value of vcpu->arch.apicv_active
which can lead to the target vCPU not noticing the interrupt.

To fix this use load-acquire/store-release so that, if the target vCPU
is IN_GUEST_MODE, we're guaranteed to see a previous disabling of the
AVIC.  If AVIC has been disabled in the meanwhile, proceed with the
KVM_REQ_EVENT-based delivery.

Incomplete IPI vmexit has the same races as svm_deliver_avic_intr, and
in fact it can be handled in exactly the same way; the only difference
lies in who has set IRR, whether svm_deliver_interrupt or the processor.
Therefore, svm_complete_interrupt_delivery can be used to fix incomplete
IPI vmexits as well.

Co-developed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-11 12:53:02 -05:00
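
A generic, self-contained illustration of the ordering the fix relies on
(C11 atomics; the names are invented for the example, and the actual SVM
code uses the kernel's own barriers and fields):

  #include <stdatomic.h>
  #include <stdbool.h>

  enum { OUTSIDE_GUEST_MODE, IN_GUEST_MODE };

  static _Atomic int  vcpu_mode    = OUTSIDE_GUEST_MODE;
  static _Atomic bool apicv_active = true;

  /* vCPU side: the AVIC inhibit is published before the release store of the mode. */
  static void inhibit_avic_then_enter_guest(void)
  {
          atomic_store_explicit(&apicv_active, false, memory_order_relaxed);
          atomic_store_explicit(&vcpu_mode, IN_GUEST_MODE, memory_order_release);
  }

  /* Sender side: the acquire load of the mode orders the apicv_active read
   * after it, so observing IN_GUEST_MODE guarantees observing any earlier
   * inhibit as well; otherwise fall back to KVM_REQ_EVENT-style delivery. */
  static bool may_ring_doorbell(void)
  {
          if (atomic_load_explicit(&vcpu_mode, memory_order_acquire) != IN_GUEST_MODE)
                  return false;
          return atomic_load_explicit(&apicv_active, memory_order_relaxed);
  }
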
Paolo Bonzini
30811174f0 KVM: SVM: set IRR in svm_deliver_interrupt
SVM has to set IRR for both the AVIC and the software-LAPIC case,
so pull it up to the common function that handles both configurations.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-11 12:53:02 -05:00
Maxim Levitsky
0a5f784273 KVM: SVM: extract avic_ring_doorbell
The check on the current CPU adds an extra level of indentation to
svm_deliver_avic_intr and conflates documentation on what happens
if the vCPU exits (of interest to svm_deliver_avic_intr) and migrates
(only of interest to avic_ring_doorbell, which calls get/put_cpu()).
Extract the wrmsr to a separate function and rewrite the
comment in svm_deliver_avic_intr().

Co-developed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-11 12:53:02 -05:00
Oliver Upton
48ebd0cf23 KVM: VMX: Use local pointer to vcpu_vmx in vmx_vcpu_after_set_cpuid()
There is already a local variable that contains a pointer to vcpu_vmx.
Use it to access the structure directly instead of redoing the pointer
arithmetic.

No functional change intended.

Signed-off-by: Oliver Upton <oupton@google.com>
Message-Id: <20220204204705.3538240-8-oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:50:48 -05:00
Vitaly Kuznetsov
66c03a926f KVM: nSVM: Implement Enlightened MSR-Bitmap feature
Similar to nVMX commit 502d2bf5f2fd ("KVM: nVMX: Implement Enlightened MSR
Bitmap feature"), add support for the feature for nSVM (Hyper-V on KVM).

Notable differences from nVMX implementation:
- As the feature uses SW reserved fields in VMCB control, KVM needs to
make sure it's dealing with a Hyper-V guest (kvm_hv_hypercall_enabled()).

- 'msrpm_base_pa' always needs to be overwritten in
nested_svm_vmrun_msrpm(), even when the update is skipped. As an
optimization, nested_vmcb02_prepare_control() copies it from VMCB01,
so when the MSR-Bitmap feature for L2 is disabled nothing needs to be done.

- 'struct vmcb_ctrl_area_cached' needs to be extended with clean
fields/sw reserved data and __nested_copy_vmcb_control_to_cache() needs to
copy it so nested_svm_vmrun_msrpm() can use it later.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220202095100.129834-5-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:50:45 -05:00
Vitaly Kuznetsov
9e083ec7bb KVM: nSVM: Split off common definitions for Hyper-V on KVM and KVM on Hyper-V
In preparation to implementing Enlightened MSR-Bitmap feature for Hyper-V
on KVM, split off the required definitions into common 'svm/hyperv.h'
header.

No functional change intended.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220202095100.129834-4-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:50:45 -05:00
Vitaly Kuznetsov
ce3859172c KVM: x86: Make kvm_hv_hypercall_enabled() static inline
In preparation for using kvm_hv_hypercall_enabled() from SVM code, make
it static inline to avoid the need to export it. The function is a
simple check with only two call sites currently.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220202095100.129834-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:50:44 -05:00
Vitaly Kuznetsov
73c25546d4 KVM: nSVM: Track whether changes in L0 require MSR bitmap for L2 to be rebuilt
Similar to nVMX commit ed2a4800ae9d ("KVM: nVMX: Track whether changes in
L0 require MSR bitmap for L2 to be rebuilt"), introduce a flag to keep
track of whether MSR bitmap for L2 needs to be rebuilt due to changes in
MSR bitmap for L1 or switching to a different L2.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220202095100.129834-2-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:50:44 -05:00
David Matlack
e0b728b1f1 KVM: x86/mmu: Add tracepoint for splitting huge pages
Add a tracepoint that records whenever KVM eagerly splits a huge page,
along with the error status of the split, to indicate whether it
succeeded or failed and why.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220119230739.2234394-18-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:50:43 -05:00
David Matlack
cb00a70bd4 KVM: x86/mmu: Split huge pages mapped by the TDP MMU during KVM_CLEAR_DIRTY_LOG
When using KVM_DIRTY_LOG_INITIALLY_SET, huge pages are not
write-protected when dirty logging is enabled on the memslot. Instead
they are write-protected once userspace invokes KVM_CLEAR_DIRTY_LOG for
the first time and only for the specific sub-region being cleared.

Enhance KVM_CLEAR_DIRTY_LOG to also try to split huge pages prior to
write-protecting to avoid causing write-protection faults on vCPU
threads. This also allows userspace to smear the cost of huge page
splitting across multiple ioctls, rather than splitting the entire
memslot as is the case when initially-all-set is not used.

Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220119230739.2234394-17-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:50:43 -05:00
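
A hedged userspace sketch of the ioctl whose behavior this extends (field
names follow my understanding of struct kvm_clear_dirty_log in
<linux/kvm.h>; each set bit in the passed bitmap asks KVM to clear that
page's dirty state, and with this change huge pages covering the range are
split eagerly before being write-protected):

  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  /* Requires manual dirty-log protection (KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2)
   * to be enabled on the VM. */
  static int clear_dirty_range(int vm_fd, unsigned int slot,
                               unsigned long first_page, unsigned int num_pages,
                               void *harvested_bitmap)
  {
          struct kvm_clear_dirty_log clear = {
                  .slot         = slot,
                  .first_page   = first_page,
                  .num_pages    = num_pages,
                  .dirty_bitmap = harvested_bitmap,
          };

          return ioctl(vm_fd, KVM_CLEAR_DIRTY_LOG, &clear);
  }
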
David Matlack
a3fe5dbda0 KVM: x86/mmu: Split huge pages mapped by the TDP MMU when dirty logging is enabled
When dirty logging is enabled without initially-all-set, try to split
all huge pages in the memslot down to 4KB pages so that vCPUs do not
have to take expensive write-protection faults to split huge pages.

Eager page splitting is best-effort only. This commit only adds the
support for the TDP MMU, and even there splitting may fail due to
out-of-memory conditions. Failure to split a huge page is fine from a
correctness standpoint because KVM will always follow up splitting by
write-protecting any remaining huge pages.

Eager page splitting moves the cost of splitting huge pages off of the
vCPU threads and onto the thread enabling dirty logging on the memslot.
This is useful because:

 1. Splitting on the vCPU thread interrupts vCPUs execution and is
    disruptive to customers whereas splitting on VM ioctl threads can
    run in parallel with vCPU execution.

 2. Splitting all huge pages at once is more efficient because it does
    not require performing VM-exit handling or walking the page table for
    every 4KiB page in the memslot, and greatly reduces the amount of
    contention on the mmu_lock.

For example, when running dirty_log_perf_test with 96 virtual CPUs, 1GiB
per vCPU, and 1GiB HugeTLB memory, the time it takes vCPUs to write to
all of their memory after dirty logging is enabled decreased by 95% from
2.94s to 0.14s.

Eager Page Splitting is over 100x more efficient than the current
implementation of splitting on fault under the read lock. For example,
taking the same workload as above, Eager Page Splitting reduced the CPU
required to split all huge pages from ~270 CPU-seconds ((2.94s - 0.14s)
* 96 vCPU threads) to only 1.55 CPU-seconds.

Eager page splitting does increase the amount of time it takes to enable
dirty logging, since it has to split all huge pages. For example, the
time it took to enable dirty logging in the 96GiB region of the
aforementioned test increased from 0.001s to 1.55s.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220119230739.2234394-16-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:50:42 -05:00
David Matlack
a82070b6e7 KVM: x86/mmu: Separate TDP MMU shadow page allocation and initialization
Separate the allocation of shadow pages from their initialization.  This
is in preparation for splitting huge pages outside of the vCPU fault
context, which requires a different allocation mechanism.

No functional change intended.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220119230739.2234394-15-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:50:41 -05:00
David Matlack
a3aca4de0d KVM: x86/mmu: Derive page role for TDP MMU shadow pages from parent
Derive the page role from the parent shadow page, since the only thing
that changes is the level. This is in preparation for splitting huge
pages during VM-ioctls which do not have access to the vCPU MMU context.

No functional change intended.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220119230739.2234394-14-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:50:41 -05:00
David Matlack
a81399a573 KVM: x86/mmu: Remove redundant role overrides for TDP MMU shadow pages
The vCPU's mmu_role already has the correct values for direct,
has_4_byte_gpte, access, and ad_disabled. Remove the code that was
redundantly overwriting these fields with the same values.

No functional change intended.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220119230739.2234394-13-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:50:41 -05:00
David Matlack
77aa60753a KVM: x86/mmu: Refactor TDP MMU iterators to take kvm_mmu_page root
Instead of passing a pointer to the root page table and the root level
separately, pass in a pointer to the root kvm_mmu_page struct.  This
reduces the number of arguments by 1, cutting down on line lengths.

No functional change intended.

Reviewed-by: Ben Gardon <bgardon@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220119230739.2234394-12-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:50:40 -05:00
David Matlack
315d86da89 KVM: x86/mmu: Move restore_acc_track_spte() to spte.h
restore_acc_track_spte() is pure SPTE bit manipulation, making it a good
fit for spte.h. And now that the WARN_ON_ONCE() calls have been removed,
there isn't any good reason to not inline it.

This move also prepares for a follow-up commit that will need to call
restore_acc_track_spte() from spte.c.

No functional change intended.

Reviewed-by: Ben Gardon <bgardon@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220119230739.2234394-11-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:50:40 -05:00
David Matlack
77c23c77f9 KVM: x86/mmu: Drop new_spte local variable from restore_acc_track_spte()
The new_spte local variable is unnecessary. Deleting it can save a line
of code and simplify the remaining lines a bit.

No functional change intended.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220119230739.2234394-10-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:50:39 -05:00
David Matlack
59940e76d1 KVM: x86/mmu: Remove unnecessary warnings from restore_acc_track_spte()
The warnings in restore_acc_track_spte() can be removed because the only
caller checks is_access_track_spte(), and is_access_track_spte() checks
!spte_ad_enabled(). In other words, the warnings can never trigger.

No functional change intended.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220119230739.2234394-9-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:50:39 -05:00
David Matlack
7b7e1ab6fd KVM: x86/mmu: Consolidate logic to atomically install a new TDP MMU page table
Consolidate the logic to atomically replace an SPTE with an SPTE that
points to a new page table into a single helper function. This will be
used in a follow-up commit to split huge pages, which involves replacing
each huge page SPTE with an SPTE that points to a page table.

Opportunistically drop the call to trace_kvm_mmu_get_page() in
kvm_tdp_mmu_map() since it is redundant with the identical tracepoint in
tdp_mmu_alloc_sp().

No functional change intended.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220119230739.2234394-8-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:50:39 -05:00
David Matlack
0f53dfa34e KVM: x86/mmu: Rename handle_removed_tdp_mmu_page() to handle_removed_pt()
First remove tdp_mmu_ from the name since it is redundant given that it
is a static function in tdp_mmu.c. There is a pattern of using tdp_mmu_
as a prefix in the names of static TDP MMU functions, but all of the
other handle_*() variants do not include such a prefix. So drop it
entirely.

Then change "page" to "pt" to convey that this is operating on a page
table rather than an struct page. Purposely use "pt" instead of "sp"
since this function takes the raw RCU-protected page table pointer as an
argument rather than  a pointer to the struct kvm_mmu_page.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220119230739.2234394-7-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:50:38 -05:00
David Matlack
c298a30c28 KVM: x86/mmu: Rename TDP MMU functions that handle shadow pages
Rename 3 functions in tdp_mmu.c that handle shadow pages:

  alloc_tdp_mmu_page()  -> tdp_mmu_alloc_sp()
  tdp_mmu_link_page()   -> tdp_mmu_link_sp()
  tdp_mmu_unlink_page() -> tdp_mmu_unlink_sp()

These changes make tdp_mmu_ a consistent prefix before the verb in the
function name, and make it clearer that these functions deal with
kvm_mmu_page structs rather than struct pages.

One could argue that "shadow page" is the wrong term for a page table in
the TDP MMU since it never actually shadows a guest page table.
However, "shadow page" (or "sp" for short) has evolved to become the
standard term in KVM when referring to a kvm_mmu_page struct, and its
associated page table and other metadata, regardless of whether the page
table shadows a guest page table. So this commit just makes the TDP MMU
more consistent with the rest of KVM.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220119230739.2234394-6-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:50:38 -05:00
David Matlack
3e72c791fd KVM: x86/mmu: Change tdp_mmu_{set,zap}_spte_atomic() to return 0/-EBUSY
tdp_mmu_set_spte_atomic() and tdp_mmu_zap_spte_atomic() return a bool
with true indicating the SPTE modification was successful and false
indicating failure. Change these functions to return an int instead
since that is the common practice.

Opportunistically fix up the kernel-doc style for the Return section
above tdp_mmu_set_spte_atomic().

No functional change intended.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220119230739.2234394-5-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:50:37 -05:00
David Matlack
3255530ab1 KVM: x86/mmu: Automatically update iter->old_spte if cmpxchg fails
Consolidate a bunch of code that was manually re-reading the spte if the
cmpxchg failed. There is no extra cost of doing this because we already
have the spte value as a result of the cmpxchg (and in fact this
eliminates re-reading the spte), and none of the call sites depend on
iter->old_spte retaining the stale spte value.

Reviewed-by: Ben Gardon <bgardon@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220119230739.2234394-4-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:50:37 -05:00
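
A generic illustration of the pattern being consolidated (C11 atomics
rather than the kernel's cmpxchg64; the point is that a failed
compare-exchange already hands back the current value, so no separate
re-read of the SPTE is needed):

  #include <stdatomic.h>
  #include <stdbool.h>
  #include <stdint.h>

  /* On failure, *old_spte is updated to the value currently in *sptep,
   * mirroring how iter->old_spte is refreshed when the cmpxchg loses a race. */
  static bool try_set_spte(_Atomic uint64_t *sptep, uint64_t *old_spte,
                           uint64_t new_spte)
  {
          return atomic_compare_exchange_strong(sptep, old_spte, new_spte);
  }
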
David Matlack
1346bbb6b4 KVM: x86/mmu: Rename __rmap_write_protect() to rmap_write_protect()
The function formerly known as rmap_write_protect() has been renamed to
kvm_vcpu_write_protect_gfn(), so we can get rid of the double
underscores in front of __rmap_write_protect().

No functional change intended.

Reviewed-by: Ben Gardon <bgardon@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220119230739.2234394-3-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:50:37 -05:00
David Matlack
cf48f9e286 KVM: x86/mmu: Rename rmap_write_protect() to kvm_vcpu_write_protect_gfn()
rmap_write_protect() is a poor name because it also write-protects SPTEs
in the TDP MMU, not just SPTEs in the rmap. It is also confusing that
rmap_write_protect() is not a simple wrapper around
__rmap_write_protect(), since that is the common pattern for functions
with double-underscore names.

Rename rmap_write_protect() to kvm_vcpu_write_protect_gfn() to convey
that KVM is write-protecting a specific gfn in the context of a vCPU.

No functional change intended.

Reviewed-by: Ben Gardon <bgardon@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220119230739.2234394-2-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:50:36 -05:00
Sean Christopherson
413af6601f KVM: x86: Add checks for reserved-to-zero Hyper-V hypercall fields
Add checks for the three fields in Hyper-V's hypercall params that must
be zero.  Per the TLFS, HV_STATUS_INVALID_HYPERCALL_INPUT is returned if
"A reserved bit in the specified hypercall input value is non-zero."

Note, some versions of the TLFS have an off-by-one bug for the last
reserved field, and define it as being bits 64:60.  See
https://github.com/MicrosoftDocs/Virtualization-Documentation/pull/1682.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20211207220926.718794-9-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:50:36 -05:00
Sean Christopherson
40421f38f6 KVM: x86: Reject fixed-size Hyper-V hypercalls with non-zero "var_cnt"
Reject Hyper-V hypercalls if the guest specifies a non-zero variable size
header (var_cnt in KVM) for a hypercall that has a fixed header size.
Per the TLFS:

  It is illegal to specify a non-zero variable header size for a
  hypercall that is not explicitly documented as accepting variable sized
  input headers. In such a case the hypercall will result in a return
  code of HV_STATUS_INVALID_HYPERCALL_INPUT.

Note, at least some of the various DEBUG commands likely aren't allowed
to use variable size headers, but the TLFS documentation doesn't clearly
state what is/isn't allowed.  Omit them for now to avoid unnecessary
breakage.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20211207220926.718794-8-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:50:35 -05:00
Sean Christopherson
9c52f6b3d8 KVM: x86: Shove vp_bitmap handling down into sparse_set_to_vcpu_mask()
Move the vp_bitmap "allocation" that's needed to handle mismatched vp_index
values down into sparse_set_to_vcpu_mask() and drop __always_inline from
said helper.  The need for an intermediate vp_bitmap is a detail that's
specific to the sparse translation with mismatched VP<=>vCPU indexes and
does not need to be exposed to the caller.

Regarding the __always_inline, prior to commit f21dd494506a ("KVM: x86:
hyperv: optimize sparse VP set processing") the helper, then named
hv_vcpu_in_sparse_set(), was a tiny bit of code that effectively boiled
down to a handful of bit ops.  The __always_inline was understandable, if
not justifiable.  Since the aforementioned change, sparse_set_to_vcpu_mask()
is a chunky 350-450+ bytes of code without KASAN=y, and balloons to 1100+
with KASAN=y.  In other words, it has no business being forcefully inlined.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20211207220926.718794-7-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:50:35 -05:00