iv/linux

Author	SHA1	Message	Date
Sean Christopherson	23ef21f58c	KVM: selftests: Fold x86's descriptor tables helpers into vcpu_init_sregs() Now that the per-VM, on-demand allocation logic in kvm_setup_gdt() and vcpu_init_descriptor_tables() is gone, fold them into vcpu_init_sregs(). Note, both kvm_setup_gdt() and vcpu_init_descriptor_tables() configured the GDT, which is why it looks like kvm_setup_gdt() disappears. Opportunistically delete the pointless zeroing of the IDT limit (it was being unconditionally overwritten by vcpu_init_descriptor_tables()). Reviewed-by: Ackerley Tng <ackerleytng@google.com> Link: https://lore.kernel.org/r/20240314232637.2538648-15-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2024-04-29 12:55:18 -07:00
Sean Christopherson	1051e29cb9	KVM: selftests: Drop superfluous switch() on vm->mode in vcpu_init_sregs() Replace the switch statement on vm->mode in x86's vcpu_init_sregs() with a simple assert that the VM has a 48-bit virtual address space. A switch statement is both overkill and misleading, as the existing code incorrectly implies that VMs with LA57 would need different configuration for the LDT, TSS, and flat segments. In all likelihood, the only difference that would be needed for selftests is CR4.LA57 itself. Reviewed-by: Ackerley Tng <ackerleytng@google.com> Link: https://lore.kernel.org/r/20240314232637.2538648-14-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2024-04-29 12:55:17 -07:00
Sean Christopherson	2a511ca994	KVM: selftests: Allocate x86's GDT during VM creation Allocate the GDT during creation of non-barebones VMs instead of waiting until the first vCPU is created, as the whole point of non-barebones VMs is to be able to run vCPUs, i.e. the GDT is going to get allocated no matter what. Reviewed-by: Ackerley Tng <ackerleytng@google.com> Link: https://lore.kernel.org/r/20240314232637.2538648-13-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2024-04-29 12:55:17 -07:00
Sean Christopherson	44c93b2772	KVM: selftests: Map x86's exception_handlers at VM creation, not vCPU setup Map x86's exception handlers at VM creation, not vCPU setup, as the mapping is per-VM, i.e. doesn't need to be (re)done for every vCPU. Reviewed-by: Ackerley Tng <ackerleytng@google.com> Link: https://lore.kernel.org/r/20240314232637.2538648-12-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2024-04-29 12:55:16 -07:00
Sean Christopherson	c1b9793b45	KVM: selftests: Init IDT and exception handlers for all VMs/vCPUs on x86 Initialize the IDT and exception handlers for all non-barebones VMs and vCPUs on x86. Forcing tests to manually configure the IDT just to save 8KiB of memory is a terrible tradeoff, and also leads to weird tests (multiple tests have deliberately relied on shutdown to indicate success), and hard-to-debug failures, e.g. instead of a precise unexpected exception failure, tests see only shutdown. Reviewed-by: Ackerley Tng <ackerleytng@google.com> Link: https://lore.kernel.org/r/20240314232637.2538648-11-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2024-04-29 12:55:15 -07:00
Sean Christopherson	d8c63805e4	KVM: selftests: Rename x86's vcpu_setup() to vcpu_init_sregs() Rename vcpu_setup() to be more descriptive and precise, there is a whole lot of "setup" that is done for a vCPU that isn't in said helper. No functional change intended. Reviewed-by: Ackerley Tng <ackerleytng@google.com> Link: https://lore.kernel.org/r/20240314232637.2538648-10-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2024-04-29 12:55:14 -07:00
Sean Christopherson	b62c32c532	KVM: selftests: Move x86's descriptor table helpers "up" in processor.c Move x86's various descriptor table helpers in processor.c up above kvm_arch_vm_post_create() and vcpu_setup() so that the helpers can be made static and invoked from the aforementioned functions. No functional change intended. Reviewed-by: Ackerley Tng <ackerleytng@google.com> Link: https://lore.kernel.org/r/20240314232637.2538648-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2024-04-29 12:55:13 -07:00
Sean Christopherson	61c3cffd4c	KVM: selftests: Explicitly clobber the IDT in the "delete memslot" testcase Explicitly clobber the guest IDT in the "delete memslot" test, which expects the deleted memslot to result in either a KVM emulation error, or a triple fault shutdown. A future change to the core selftests library will configure the guest IDT and exception handlers by default, i.e. will install a guest #PF handler and put the guest into an infinite #NPF loop (the guest hits a !PRESENT SPTE when trying to vector a #PF, and KVM reinjects the #PF without fixing the #NPF, because there is no memslot). Note, it's not clear whether or not KVM's behavior is reasonable in this case, e.g. arguably KVM should try (and fail) to emulate in response to the #NPF. But barring a goofy/broken userspace, this scenario will likely never happen in practice. Punt the KVM investigation to the future. Link: https://lore.kernel.org/r/20240314232637.2538648-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2024-04-29 12:55:12 -07:00
Sean Christopherson	dec79eab2b	KVM: selftests: Rework platform_info_test to actually verify #GP Rework platform_info_test to actually handle and verify the expected #GP on RDMSR when the associated KVM capability is disabled. Currently, the test _deliberately_ doesn't handle the #GP, and instead lets it escalate to a triple fault shutdown. In addition to verifying that KVM generates the correct fault, handling the #GP will be necessary (without even more shenanigans) when a future change to the core KVM selftests library configures the IDT and exception handlers by default (the test subtly relies on the IDT limit being '0'). Link: https://lore.kernel.org/r/20240314232637.2538648-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2024-04-29 12:55:11 -07:00
Sean Christopherson	53635ec253	KVM: selftests: Move platform_info_test's main assert into guest code As a first step toward gracefully handling the expected #GP on RDMSR in platform_info_test, move the test's assert on the non-faulting RDMSR result into the guest itself. This will allow using a unified flow for the host userspace side of things. Link: https://lore.kernel.org/r/20240314232637.2538648-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2024-04-29 12:55:11 -07:00
Ackerley Tng	0d95817e07	KVM: selftests: Fix off-by-one initialization of GDT limit Fix an off-by-one bug in the initialization of the GDT limit, which as defined in the SDM is inclusive, not exclusive. Note, vcpu_init_descriptor_tables() gets the limit correct, it's only vcpu_setup() that is broken, i.e. only tests that _don't_ invoke vcpu_init_descriptor_tables() can have problems. And the fact that KVM effectively initializes the GDT twice will be cleaned up in the near future. Signed-off-by: Ackerley Tng <ackerleytng@google.com> [sean: rewrite changelog] Link: https://lore.kernel.org/r/20240314232637.2538648-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2024-04-29 12:55:10 -07:00
Sean Christopherson	3a085fbf82	KVM: selftests: Move GDT, IDT, and TSS fields to x86's kvm_vm_arch Now that kvm_vm_arch exists, move the GDT, IDT, and TSS fields to x86's implementation, as the structures are firmly x86-only. Reviewed-by: Ackerley Tng <ackerleytng@google.com> Link: https://lore.kernel.org/r/20240314232637.2538648-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2024-04-29 12:55:07 -07:00
Sean Christopherson	f54884f938	KVM: sefltests: Add kvm_util_types.h to hold common types, e.g. vm_vaddr_t Move the base types unique to KVM selftests out of kvm_util.h and into a new header, kvm_util_types.h. This will allow kvm_util_arch.h, i.e. core arch headers, to reference common types, e.g. vm_vaddr_t and vm_paddr_t. No functional change intended. Reviewed-by: Ackerley Tng <ackerleytng@google.com> Link: https://lore.kernel.org/r/20240314232637.2538648-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2024-04-29 12:54:16 -07:00
Sean Christopherson	2b7deea3ec	Revert "kvm: selftests: move base kvm_util.h declarations to kvm_util_base.h" Effectively revert the movement of code from kvm_util.h => kvm_util_base.h, as the TL;DR of the justification for the move was to avoid #ifdefs and/or circular dependencies between what ended up being ucall_common.h and what was (and now again, is), kvm_util.h. But avoiding #ifdef and circular includes is trivial: don't do that. The cost of removing kvm_util_base.h is a few extra includes of ucall_common.h, but that cost is practically nothing. On the other hand, having a "base" version of a header that is really just the header itself is confusing, and makes it weird/hard to choose names for headers that actually are "base" headers, e.g. to hold core KVM selftests typedefs. For all intents and purposes, this reverts commit 7d9a662ed9f0403e7b94940dceb81552b8edb931. Reviewed-by: Ackerley Tng <ackerleytng@google.com> Link: https://lore.kernel.org/r/20240314232637.2538648-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2024-04-29 12:54:13 -07:00
Sean Christopherson	87aa264cd8	KVM: selftests: Randomly force emulation on x86 writes from guest code Override vcpu_arch_put_guest() to randomly force emulation on supported accesses. Force emulation of LOCK CMPXCHG as well as a regular MOV to stress KVM's emulation of atomic accesses, which has a unique path in KVM's emulator. Arbitrarily give all the decisions 50/50 odds; absent much, much more sophisticated infrastructure for generating random numbers, it's highly unlikely that doing more than a coin flip will affect selftests' ability to find KVM bugs. This is effectively a regression test for commit 910c57dfa4d1 ("KVM: x86: Mark target gfn of emulated atomic instruction as dirty"). Link: https://lore.kernel.org/r/20240314185459.2439072-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2024-04-29 12:50:43 -07:00
Sean Christopherson	2f2bc6af6a	KVM: selftests: Add vcpu_arch_put_guest() to do writes from guest code Introduce a macro, vcpu_arch_put_guest(), for "putting" values to memory from guest code in "interesting" situations, e.g. when writing memory that is being dirty logged. Structure the macro so that arch code can provide a custom implementation, e.g. x86 will use the macro to force emulation of the access. Use the helper in dirty_log_test, which is of particular interest (see above), and in xen_shinfo_test, which isn't all that interesting, but provides a second usage of the macro with a different size operand (uint8_t versus uint64_t), i.e. to help verify that the macro works for more than just 64-bit values. Use "put" as the verb to align with the kernel's {get,put}_user() terminology. Link: https://lore.kernel.org/r/20240314185459.2439072-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2024-04-29 12:50:43 -07:00
Sean Christopherson	e1ff11525d	KVM: selftests: Add global snapshot of kvm_is_forced_emulation_enabled() Add a global snapshot of kvm_is_forced_emulation_enabled() and sync it to all VMs by default so that core library code can force emulation, e.g. to allow for easier testing of the intersections between emulation and other features in KVM. Link: https://lore.kernel.org/r/20240314185459.2439072-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2024-04-29 12:50:43 -07:00
Sean Christopherson	73369acd9f	KVM: selftests: Provide an API for getting a random bool from an RNG Move memstress' random bool logic into common code to avoid reinventing the wheel for basic yes/no decisions. Provide an outer wrapper to handle the basic/common case of just wanting a 50/50 chance of something happening. Link: https://lore.kernel.org/r/20240314185459.2439072-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2024-04-29 12:50:42 -07:00
Sean Christopherson	cb6c691478	KVM: selftests: Provide a global pseudo-RNG instance for all tests Add a global guest_random_state instance, i.e. a pseudo-RNG, so that an RNG is available for all tests. This will allow randomizing behavior in core library code, e.g. x86 will utilize the pRNG to conditionally force emulation of writes from within common guest code. To allow for deterministic runs, and to be compatible with existing tests, allow tests to override the seed used to initialize the pRNG. Note, the seed must be overwritten before a VM is created in order for the seed to take effect, though it's perfectly fine for a test to initialize multiple VMs with different seeds. And as evidenced by memstress_guest_code(), it's also a-ok to instantiate more RNGs using the global seed (or a modified version of it). The goal of the global RNG is purely to ensure that _a_ source of random numbers is available, it doesn't have to be the _only_ RNG. Link: https://lore.kernel.org/r/20240314185459.2439072-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2024-04-29 12:50:41 -07:00
Sean Christopherson	730cfa45b5	KVM: selftests: Define _GNU_SOURCE for all selftests code Define _GNU_SOURCE in the base CFLAGS instead of relying on selftests to manually #define _GNU_SOURCE, which is repetitive and error prone. E.g. kselftest_harness.h requires _GNU_SOURCE for asprintf(), but if a selftest includes kvm_test_harness.h after stdio.h, the include guards result in the effective version of stdio.h consumed by kvm_test_harness.h not defining asprintf(): In file included from x86_64/fix_hypercall_test.c:12: In file included from include/kvm_test_harness.h:11: ../kselftest_harness.h:1169:2: error: call to undeclared function 'asprintf'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration] 1169 \| asprintf(&test_name, "%s%s%s.%s", f->name, \| ^ When including the rseq selftest's "library" code, #undef _GNU_SOURCE so that rseq.c controls whether or not it wants to build with _GNU_SOURCE. Reported-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Acked-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Acked-by: Oliver Upton <oliver.upton@linux.dev> Acked-by: Anup Patel <anup@brainfault.org> Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Link: https://lore.kernel.org/r/20240423190308.2883084-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2024-04-29 12:49:10 -07:00
Paolo Bonzini	a96cb3bf39	Merge x86 bugfixes from Linux 6.9-rc3 Pull fix for SEV-SNP late disable bugs. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-19 09:02:22 -04:00
Paolo Bonzini	1ab157ce57	KVM: SEV: use u64_to_user_ptr throughout Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-12 04:42:25 -04:00
Sean Christopherson	2325a21ac1	KVM: VMX: Modify NMI and INTR handlers to take intr_info as function argument TDX uses different ABI to get information about VM exit. Pass intr_info to the NMI and INTR handlers instead of pulling it from vcpu_vmx in preparation for sharing the bulk of the handlers with TDX. When the guest TD exits to VMM, RAX holds status and exit reason, RCX holds exit qualification etc rather than the VMCS fields because VMM doesn't have access to the VMCS. The eventual code will be VMX: - get exit reason, intr_info, exit_qualification, and etc from VMCS - call NMI/INTR handlers (common code) TDX: - get exit reason, intr_info, exit_qualification, and etc from guest registers - call NMI/INTR handlers (common code) Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <0396a9ae70d293c9d0b060349dae385a8a4fbcec.1705965635.git.isaku.yamahata@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-12 04:42:24 -04:00
Paolo Bonzini	5f18c642ff	KVM: VMX: Move out vmx_x86_ops to 'main.c' to dispatch VMX and TDX KVM accesses Virtual Machine Control Structure (VMCS) with VMX instructions to operate on VM. TDX doesn't allow VMM to operate VMCS directly. Instead, TDX has its own data structures, and TDX SEAMCALL APIs for VMM to indirectly operate those data structures. This means we must have a TDX version of kvm_x86_ops. The existing global struct kvm_x86_ops already defines an interface which can be adapted to TDX, but kvm_x86_ops is a system-wide, not per-VM structure. To allow VMX to coexist with TDs, the kvm_x86_ops callbacks will have wrappers "if (tdx) tdx_op() else vmx_op()" to pick VMX or TDX at run time. To split the runtime switch, the VMX implementation, and the TDX implementation, add main.c, and move out the vmx_x86_ops hooks in preparation for adding TDX. Use 'vt' for the naming scheme as a nod to VT-x and as a concatenation of VmxTdx. The eventually converted code will look like this: vmx.c: vmx_op() { ... } VMX initialization tdx.c: tdx_op() { ... } TDX initialization x86_ops.h: vmx_op(); tdx_op(); main.c: static vt_op() { if (tdx) tdx_op() else vmx_op() } static struct kvm_x86_ops vt_x86_ops = { .op = vt_op, initialization functions call both VMX and TDX initialization Opportunistically, fix the name inconsistency from vmx_create_vcpu() and vmx_free_vcpu() to vmx_vcpu_create() and vmx_vcpu_free(). Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Yuan Yao <yuan.yao@intel.com> Message-Id: <e603c317587f933a9d1bee8728c84e4935849c16.1705965634.git.isaku.yamahata@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-12 04:42:24 -04:00
Sean Christopherson	e913ef159f	KVM: x86: Split core of hypercall emulation to helper function By necessity, TDX will use a different register ABI for hypercalls. Break out the core functionality so that it may be reused for TDX. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Message-Id: <5134caa55ac3dec33fb2addb5545b52b3b52db02.1705965635.git.isaku.yamahata@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-12 04:42:23 -04:00
Paolo Bonzini	f9cecb3c50	Merge branch 'kvm-sev-init2' into HEAD The idea that no parameter would ever be necessary when enabling SEV or SEV-ES for a VM was decidedly optimistic. The first source of variability that was encountered is the desired set of VMSA features, as that affects the measurement of the VM's initial state and cannot be changed arbitrarily by the hypervisor. This series adds all the APIs that are needed to customize the features, with room for future enhancements: - a new /dev/kvm device attribute to retrieve the set of supported features (right now, only debug swap) - a new sub-operation for KVM_MEM_ENCRYPT_OP that can take a struct, replacing the existing KVM_SEV_INIT and KVM_SEV_ES_INIT It then puts the new op to work by including the VMSA features as a field of the struct. The existing KVM_SEV_INIT and KVM_SEV_ES_INIT use the full set of supported VMSA features for backwards compatibility; but I am considering also making them use zero as the feature mask, and will gladly adjust the patches if so requested. In order to avoid creating two new KVM_MEM_ENCRYPT_OPs, I decided that I could as well make SEV and SEV-ES use VM types. This allows SEV-SNP to reuse the KVM_SEV_INIT2 ioctl. And while at it, KVM_SEV_INIT2 also includes two bugfixes. First of all, SEV-ES VMs, when created with the new VM type instead of KVM_SEV_ES_INIT, reject KVM_GET_REGS/KVM_SET_REGS and friends on the vCPU file descriptor once the VMSA has been encrypted... which is how the API should have always behaved. Second, they also synchronize the FPU and AVX state. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-12 04:41:34 -04:00
Paolo Bonzini	531f520024	Merge branch 'mm-delete-change-gpte' into HEAD The .change_pte() MMU notifier callback was intended as an optimization and for this reason it was initially called without a surrounding mmu_notifier_invalidate_range_{start,end}() pair. It was only ever implemented by KVM (which was also the original user of MMU notifiers) and the rules on when to call set_pte_at_notify() rather than set_pte_at() have always been pretty obscure. It may seem a miracle that it has never caused any hard to trigger bugs, but there's a good reason for that: KVM's implementation has been nonfunctional for a good part of its existence. Already in 2012, commit 6bdb913f0a70 ("mm: wrap calls to set_pte_at_notify with invalidate_range_start and invalidate_range_end", 2012-10-09) changed the .change_pte() callback to occur within an invalidate_range_start/end() pair; and because KVM unmaps the sPTEs during .invalidate_range_start(), .change_pte() has no hope of finding a sPTE to change. Therefore, all the code for .change_pte() can be removed from both KVM and mm/, and set_pte_at_notify() can be replaced with just set_pte_at(). Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-12 04:41:14 -04:00
Paolo Bonzini	f7842747d1	mm: replace set_pte_at_notify() with just set_pte_at() With the demise of the .change_pte() MMU notifier callback, there is no notification happening in set_pte_at_notify(). It is a synonym of set_pte_at() and can be replaced with it. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-ID: <20240405115815.3226315-5-pbonzini@redhat.com> Acked-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-12 04:40:27 -04:00
Paolo Bonzini	997308f9ae	mmu_notifier: remove the .change_pte() callback The scope of set_pte_at_notify() has reduced more and more through the years. Initially, it was meant for when the change to the PTE was not bracketed by mmu_notifier_invalidate_range_{start,end}(). However, that has not been so for over ten years. During all this period the only implementation of .change_pte() was KVM and it had no actual functionality, because it was called after mmu_notifier_invalidate_range_start() zapped the secondary PTE. Now that this (nonfunctional) user of the .change_pte() callback is gone, the whole callback can be removed. For now, leave in place set_pte_at_notify() even though it is just a synonym for set_pte_at(). Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Message-ID: <20240405115815.3226315-4-pbonzini@redhat.com> Acked-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-11 13:18:36 -04:00
Paolo Bonzini	5257de954c	KVM: remove unused argument of kvm_handle_hva_range() The only user was kvm_mmu_notifier_change_pte(), which is now gone. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-ID: <20240405115815.3226315-3-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-11 13:18:35 -04:00
Paolo Bonzini	f3b65bbaed	KVM: delete .change_pte MMU notifier callback The .change_pte() MMU notifier callback was intended as an optimization. The original point of it was that KSM could tell KVM to flip its secondary PTE to a new location without having to first zap it. At the time there was also an .invalidate_page() callback; both of them were not bracketed by calls to mmu_notifier_invalidate_range_{start,end}(), and .invalidate_page() also doubled as a fallback implementation of .change_pte(). Later on, however, both callbacks were changed to occur within an invalidate_range_start/end() block. In the case of .change_pte(), commit 6bdb913f0a70 ("mm: wrap calls to set_pte_at_notify with invalidate_range_start and invalidate_range_end", 2012-10-09) did so to remove the fallback from .invalidate_page() to .change_pte() and allow sleepable .invalidate_page() hooks. This however made KVM's usage of the .change_pte() callback completely moot, because KVM unmaps the sPTEs during .invalidate_range_start() and therefore .change_pte() has no hope of finding a sPTE to change. Drop the generic KVM code that dispatches to kvm_set_spte_gfn(), as well as all the architecture specific implementations. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Anup Patel <anup@brainfault.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Reviewed-by: Bibo Mao <maobibo@loongson.cn> Message-ID: <20240405115815.3226315-2-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-11 13:18:27 -04:00
Paolo Bonzini	8c53183dba	selftests: kvm: add test for transferring FPU state into VMSA Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20240404121327.3107131-18-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-11 13:08:28 -04:00
Paolo Bonzini	4c180a57b0	selftests: kvm: split "launch" phase of SEV VM creation Allow the caller to set the initial state of the VM. Doing this before sev_vm_launch() matters for SEV-ES, since that is the place where the VMSA is updated and after which the guest state becomes sealed. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20240404121327.3107131-17-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-11 13:08:27 -04:00
Paolo Bonzini	d18c864816	selftests: kvm: switch to using KVM_X86_*_VM This removes the concept of "subtypes", instead letting the tests use proper VM types that were recently added. While the sev_init_vm() and sev_es_init_vm() are still able to operate with the legacy KVM_SEV_INIT and KVM_SEV_ES_INIT ioctls, this is limited to VMs that are created manually with vm_create_barebones(). Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20240404121327.3107131-16-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-11 13:08:27 -04:00
Paolo Bonzini	dfc083a181	selftests: kvm: add tests for KVM_SEV_INIT2 Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20240404121327.3107131-15-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-11 13:08:26 -04:00
Paolo Bonzini	4dd5ecacb9	KVM: SEV: allow SEV-ES DebugSwap again The DebugSwap feature of SEV-ES provides a way for confidential guests to use data breakpoints. Its status is recorded in the VMSA, and therefore attestation signatures depend on whether it is enabled or not. In order to avoid invalidating the signatures depending on the host machine, it was disabled by default (see commit 5abf6dceb066, "SEV: disable SEV-ES DebugSwap by default", 2024-03-09). However, we now have a new API to create SEV VMs that allows enabling DebugSwap based on what the user tells KVM to do, and we also changed the legacy KVM_SEV_ES_INIT API to never enable DebugSwap. It is therefore possible to re-enable the feature without breaking compatibility with kernels that pre-date the introduction of DebugSwap, so go ahead. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20240404121327.3107131-14-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-11 13:08:26 -04:00
Paolo Bonzini	4f5defae70	KVM: SEV: introduce KVM_SEV_INIT2 operation The idea that no parameter would ever be necessary when enabling SEV or SEV-ES for a VM was decidedly optimistic. In fact, in some sense it's already a parameter whether SEV or SEV-ES is desired. Another possible source of variability is the desired set of VMSA features, as that affects the measurement of the VM's initial state and cannot be changed arbitrarily by the hypervisor. Create a new sub-operation for KVM_MEMORY_ENCRYPT_OP that can take a struct, and put the new op to work by including the VMSA features as a field of the struct. The existing KVM_SEV_INIT and KVM_SEV_ES_INIT use the full set of supported VMSA features for backwards compatibility. The struct also includes the usual bells and whistles for future extensibility: a flags field that must be zero for now, and some padding at the end. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20240404121327.3107131-13-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-11 13:08:25 -04:00
Paolo Bonzini	eb4441864e	KVM: SEV: sync FPU and AVX state at LAUNCH_UPDATE_VMSA time SEV-ES allows passing custom contents for x87, SSE and AVX state into the VMSA. Allow userspace to do that with the usual KVM_SET_XSAVE API and only mark FPU contents as confidential after it has been copied and encrypted into the VMSA. Since the XSAVE state for AVX is the first, it does not need the compacted-state handling of get_xsave_addr(). However, there are other parts of XSAVE state in the VMSA that currently are not handled, and the validation logic of get_xsave_addr() is pointless to duplicate in KVM, so move get_xsave_addr() to public FPU API; it is really just a facility to operate on XSAVE state and does not expose any internal details of arch/x86/kernel/fpu. Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20240404121327.3107131-12-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-11 13:08:25 -04:00
Paolo Bonzini	26c44aa9e0	KVM: SEV: define VM types for SEV and SEV-ES Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20240404121327.3107131-11-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-11 13:08:25 -04:00
Paolo Bonzini	4ebb105e6c	KVM: SEV: introduce to_kvm_sev_info Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20240404121327.3107131-10-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-11 13:08:24 -04:00
Paolo Bonzini	2a955c4db1	KVM: x86: Add supported_vm_types to kvm_caps This simplifies the implementation of KVM_CHECK_EXTENSION(KVM_CAP_VM_TYPES), and also allows the vendor module to specify which VM types are supported. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20240404121327.3107131-9-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-11 13:08:24 -04:00
Paolo Bonzini	517987e3fb	KVM: x86: add fields to struct kvm_arch for CoCo features Some VM types have characteristics in common; in fact, the only use of VM types right now is kvm_arch_has_private_mem and it assumes that _all_ nonzero VM types have private memory. We will soon introduce a VM type for SEV and SEV-ES VMs, and at that point we will have two special characteristics of confidential VMs that depend on the VM type: not just if memory is private, but also whether guest state is protected. For the latter we have kvm->arch.guest_state_protected, which is only set on a fully initialized VM. For VM types with protected guest state, we can actually fix a problem in the SEV-ES implementation, where ioctls to set registers do not cause an error even if the VM has been initialized and the guest state encrypted. Make sure that when using VM types that will become an error. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20240209183743.22030-7-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com> Message-ID: <20240404121327.3107131-8-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-11 13:08:23 -04:00
Paolo Bonzini	605bbdc12b	KVM: SEV: store VMSA features in kvm_sev_info Right now, the set of features that are stored in the VMSA upon initialization is fixed and depends on the module parameters for kvm-amd.ko. However, the hypervisor cannot really change it at will because the feature word has to match between the hypervisor and whatever computes a measurement of the VMSA for attestation purposes. Add a field to kvm_sev_info that holds the set of features to be stored in the VMSA; and query it instead of referring to the module parameters. Because KVM_SEV_INIT and KVM_SEV_ES_INIT accept no parameters, this does not yet introduce any functional change, but it paves the way for an API that allows customization of the features per-VM. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20240209183743.22030-6-pbonzini@redhat.com> Reviewed-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20240404121327.3107131-7-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-11 13:08:23 -04:00
Paolo Bonzini	ac5c48027b	KVM: SEV: publish supported VMSA features Compute the set of features to be stored in the VMSA when KVM is initialized; move it from there into kvm_sev_info when SEV is initialized, and then into the initial VMSA. The new variable can then be used to return the set of supported features to userspace, via the KVM_GET_DEVICE_ATTR ioctl. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com> Message-ID: <20240404121327.3107131-6-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-11 13:08:22 -04:00
Paolo Bonzini	546d714b08	KVM: introduce new vendor op for KVM_GET_DEVICE_ATTR Allow vendor modules to provide their own attributes on /dev/kvm. To avoid proliferation of vendor ops, implement KVM_HAS_DEVICE_ATTR and KVM_GET_DEVICE_ATTR in terms of the same function. You're not supposed to use KVM_GET_DEVICE_ATTR to do complicated computations, especially on /dev/kvm. Reviewed-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com> Message-ID: <20240404121327.3107131-5-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-11 13:08:22 -04:00
Paolo Bonzini	8d2aec3b2d	KVM: x86: use u64_to_user_ptr() There is no danger to the kernel if 32-bit userspace provides a 64-bit value that has the high bits set, but for whatever reason happens to resolve to an address that has something mapped there. KVM uses the checked version of get_user() and put_user(), so any faults are caught properly. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20240404121327.3107131-4-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-11 13:08:22 -04:00
Paolo Bonzini	0d7bf5e5b0	KVM: SVM: Compile sev.c if and only if CONFIG_KVM_AMD_SEV=y Stop compiling sev.c when CONFIG_KVM_AMD_SEV=n, as the number of #ifdefs in sev.c is getting ridiculous, and having #ifdefs inside of SEV helpers is quite confusing. To minimize #ifdefs in code flows, #ifdef away only the kvm_x86_ops hooks and the #VMGEXIT handler. Stubs are also restricted to functions that check sev_enabled and to the destruction functions sev_free_cpu() and sev_vm_destroy(), where the style of their callers is to leave checks to the callers. Most call sites instead rely on dead code elimination to take care of functions that are guarded with sev_guest() or sev_es_guest(). Signed-off-by: Sean Christopherson <seanjc@google.com> Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20240404121327.3107131-3-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-11 13:08:21 -04:00
Sean Christopherson	1ff3c89032	KVM: SVM: Invert handling of SEV and SEV_ES feature flags Leave SEV and SEV_ES '0' in kvm_cpu_caps by default, and instead set them in sev_set_cpu_caps() if SEV and SEV-ES support are fully enabled. Aside from the fact that sev_set_cpu_caps() is wildly misleading when it clears capabilities, this will allow compiling out sev.c without falsely advertising SEV/SEV-ES support in KVM_GET_SUPPORTED_CPUID. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20240404121327.3107131-2-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-04-11 13:08:21 -04:00
Borislav Petkov (AMD)	b377c66ae3	x86/retpoline: Add NOENDBR annotation to the SRSO dummy return thunk srso_alias_untrain_ret() is special code, even if it is a dummy which is called in the !SRSO case, so annotate it like its real counterpart, to address the following objtool splat: vmlinux.o: warning: objtool: .export_symbol+0x2b290: data relocation to !ENDBR: srso_alias_untrain_ret+0x0 Fixes: 4535e1a4174c ("x86/bugs: Fix the SRSO mitigation on Zen3/4") Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20240405144637.17908-1-bp@kernel.org	2024-04-06 13:01:50 +02:00
Ingo Molnar	5f2ca44ed2	Merge branch 'linus' into x86/urgent, to pick up dependent commit We want to fix: 0e110732473e ("x86/retpoline: Do the necessary fixup to the Zen3/4 srso return thunk for !SRSO") So merge in Linus's latest into x86/urgent to have it available. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2024-04-06 13:00:32 +02:00

1265364 Commits