linux

iv/linux

Author	SHA1	Message	Date
Vitaly Kuznetsov	334c59d58d	x86/kvm: Disable all PV features on crash commit 3d6b84132d2a57b5a74100f6923a8feb679ac2ce upstream. Crash shutdown handler only disables kvmclock and steal time, other PV features remain active so we risk corrupting memory or getting some side-effects in kdump kernel. Move crash handler to kvm.c and unify with CPU offline. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210414123544.1060604-5-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-06-10 13:39:29 +02:00
Vitaly Kuznetsov	3b0becf8b1	x86/kvm: Disable kvmclock on all CPUs on shutdown commit c02027b5742b5aa804ef08a4a9db433295533046 upstream. Currenly, we disable kvmclock from machine_shutdown() hook and this only happens for boot CPU. We need to disable it for all CPUs to guard against memory corruption e.g. on restore from hibernate. Note, writing '0' to kvmclock MSR doesn't clear memory location, it just prevents hypervisor from updating the location so for the short while after write and while CPU is still alive, the clock remains usable and correct so we don't need to switch to some other clocksource. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210414123544.1060604-4-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Andrea Righi <andrea.righi@canonical.com> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-06-10 13:39:29 +02:00
Vitaly Kuznetsov	38b858da1c	x86/kvm: Teardown PV features on boot CPU as well commit 8b79feffeca28c5459458fe78676b081e87c93a4 upstream. Various PV features (Async PF, PV EOI, steal time) work through memory shared with hypervisor and when we restore from hibernation we must properly teardown all these features to make sure hypervisor doesn't write to stale locations after we jump to the previously hibernated kernel (which can try to place anything there). For secondary CPUs the job is already done by kvm_cpu_down_prepare(), register syscore ops to do the same for boot CPU. Krzysztof: This fixes memory corruption visible after second resume from hibernation: BUG: Bad page state in process dbus-daemon pfn:18b01 page:ffffea000062c040 refcount:0 mapcount:0 mapping:0000000000000000 index:0x1 compound_mapcount: -30591 flags: 0xfffffc0078141(locked\|error\|workingset\|writeback\|head\|mappedtodisk\|reclaim) raw: 000fffffc0078141 dead0000000002d0 dead000000000100 0000000000000000 raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: PAGE_FLAGS_CHECK_AT_PREP flag set bad because of flags: 0x78141(locked\|error\|workingset\|writeback\|head\|mappedtodisk\|reclaim) Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210414123544.1060604-3-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Andrea Righi <andrea.righi@canonical.com> [krzysztof: Extend the commit message, adjust for v5.10 context] Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-06-10 13:39:29 +02:00
Sean Christopherson	b3ee3f50ab	KVM: SVM: Truncate GPR value for DR and CR accesses in !64-bit mode commit 0884335a2e653b8a045083aa1d57ce74269ac81d upstream. Drop bits 63:32 on loads/stores to/from DRs and CRs when the vCPU is not in 64-bit mode. The APM states bits 63:32 are dropped for both DRs and CRs: In 64-bit mode, the operand size is fixed at 64 bits without the need for a REX prefix. In non-64-bit mode, the operand size is fixed at 32 bits and the upper 32 bits of the destination are forced to 0. Fixes: 7ff76d58a9dc ("KVM: SVM: enhance MOV CR intercept handler") Fixes: cae3797a4639 ("KVM: SVM: enhance mov DR intercept handler") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210422022128.3464144-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-06-10 13:39:28 +02:00
Thomas Gleixner	42f75a4381	x86/apic: Mark _all_ legacy interrupts when IO/APIC is missing commit 7d65f9e80646c595e8c853640a9d0768a33e204c upstream. PIC interrupts do not support affinity setting and they can end up on any online CPU. Therefore, it's required to mark the associated vectors as system-wide reserved. Otherwise, the corresponding irq descriptors are copied to the secondary CPUs but the vectors are not marked as assigned or reserved. This works correctly for the IO/APIC case. When the IO/APIC is disabled via config, kernel command line or lack of enumeration then all legacy interrupts are routed through the PIC, but nothing marks them as system-wide reserved vectors. As a consequence, a subsequent allocation on a secondary CPU can result in allocating one of these vectors, which triggers the BUG() in apic_update_vector() because the interrupt descriptor slot is not empty. Imran tried to work around that by marking those interrupts as allocated when a CPU comes online. But that's wrong in case that the IO/APIC is available and one of the legacy interrupts, e.g. IRQ0, has been switched to PIC mode because then marking them as allocated will fail as they are already marked as system vectors. Stay consistent and update the legacy vectors after attempting IO/APIC initialization and mark them as system vectors in case that no IO/APIC is available. Fixes: 69cde0004a4b ("x86/vector: Use matrix allocator for vector assignment") Reported-by: Imran Khan <imran.f.khan@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20210519233928.2157496-1-imran.f.khan@oracle.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-06-10 13:39:27 +02:00
Pu Wen	445477e927	x86/sev: Check SME/SEV support in CPUID first commit 009767dbf42ac0dbe3cf48c1ee224f6b778aa85a upstream. The first two bits of the CPUID leaf 0x8000001F EAX indicate whether SEV or SME is supported, respectively. It's better to check whether SEV or SME is actually supported before accessing the MSR_AMD64_SEV to check whether SEV or SME is enabled. This is both a bare-metal issue and a guest/VM issue. Since the first generation Hygon Dhyana CPU doesn't support the MSR_AMD64_SEV, reading that MSR results in a #GP - either directly from hardware in the bare-metal case or via the hypervisor (because the RDMSR is actually intercepted) in the guest/VM case, resulting in a failed boot. And since this is very early in the boot phase, rdmsrl_safe()/native_read_msr_safe() can't be used. So check the CPUID bits first, before accessing the MSR. [ tlendacky: Expand and improve commit message. ] [ bp: Massage commit message. ] Fixes: eab696d8e8b9 ("x86/sev: Do not require Hypervisor CPUID bit for SEV guests") Signed-off-by: Pu Wen <puwen@hygon.cn> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Tom Lendacky <thomas.lendacky@amd.com> Cc: <stable@vger.kernel.org> # v5.10+ Link: https://lkml.kernel.org/r/20210602070207.2480-1-puwen@hygon.cn Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-06-10 13:39:27 +02:00
Thomas Gleixner	942c5864de	x86/cpufeatures: Force disable X86_FEATURE_ENQCMD and remove update_pasid() commit 9bfecd05833918526cc7357d55e393393440c5fa upstream. While digesting the XSAVE-related horrors which got introduced with the supervisor/user split, the recent addition of ENQCMD-related functionality got on the radar and turned out to be similarly broken. update_pasid(), which is only required when X86_FEATURE_ENQCMD is available, is invoked from two places: 1) From switch_to() for the incoming task 2) Via a SMP function call from the IOMMU/SMV code #1 is half-ways correct as it hacks around the brokenness of get_xsave_addr() by enforcing the state to be 'present', but all the conditionals in that code are completely pointless for that. Also the invocation is just useless overhead because at that point it's guaranteed that TIF_NEED_FPU_LOAD is set on the incoming task and all of this can be handled at return to user space. #2 is broken beyond repair. The comment in the code claims that it is safe to invoke this in an IPI, but that's just wishful thinking. FPU state of a running task is protected by fregs_lock() which is nothing else than a local_bh_disable(). As BH-disabled regions run usually with interrupts enabled the IPI can hit a code section which modifies FPU state and there is absolutely no guarantee that any of the assumptions which are made for the IPI case is true. Also the IPI is sent to all CPUs in mm_cpumask(mm), but the IPI is invoked with a NULL pointer argument, so it can hit a completely unrelated task and unconditionally force an update for nothing. Worse, it can hit a kernel thread which operates on a user space address space and set a random PASID for it. The offending commit does not cleanly revert, but it's sufficient to force disable X86_FEATURE_ENQCMD and to remove the broken update_pasid() code to make this dysfunctional all over the place. Anything more complex would require more surgery and none of the related functions outside of the x86 core code are blatantly wrong, so removing those would be overkill. As nothing enables the PASID bit in the IA32_XSS MSR yet, which is required to make this actually work, this cannot result in a regression except for related out of tree train-wrecks, but they are broken already today. Fixes: 20f0afd1fb3d ("x86/mmu: Allocate/free a PASID") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/87mtsd6gr9.ffs@nanos.tec.linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-06-10 13:39:27 +02:00
Wanpeng Li	bb2e3adf23	KVM: X86: Fix vCPU preempted state from guest's point of view commit 1eff0ada88b48e4ac1e3fe26483b3684fedecd27 upstream. Commit 66570e966dd9 (kvm: x86: only provide PV features if enabled in guest's CPUID) avoids to access pv tlb shootdown host side logic when this pv feature is not exposed to guest, however, kvm_steal_time.preempted not only leveraged by pv tlb shootdown logic but also mitigate the lock holder preemption issue. From guest's point of view, vCPU is always preempted since we lose the reset of kvm_steal_time.preempted before vmentry if pv tlb shootdown feature is not exposed. This patch fixes it by clearing kvm_steal_time.preempted before vmentry. Fixes: 66570e966dd9 (kvm: x86: only provide PV features if enabled in guest's CPUID) Reviewed-by: Sean Christopherson <seanjc@google.com> Cc: stable@vger.kernel.org Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1621339235-11131-3-git-send-email-wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-06-03 09:00:32 +02:00
Wanpeng Li	514883ebac	KVM: x86: Defer vtime accounting 'til after IRQ handling commit 160457140187c5fb127b844e5a85f87f00a01b14 upstream. Defer the call to account guest time until after servicing any IRQ(s) that happened in the guest or immediately after VM-Exit. Tick-based accounting of vCPU time relies on PF_VCPU being set when the tick IRQ handler runs, and IRQs are blocked throughout the main sequence of vcpu_enter_guest(), including the call into vendor code to actually enter and exit the guest. This fixes a bug where reported guest time remains '0', even when running an infinite loop in the guest: https://bugzilla.kernel.org/show_bug.cgi?id=209831 Fixes: 87fa7f3e98a131 ("x86/kvm: Move context tracking where it belongs") Suggested-by: Thomas Gleixner <tglx@linutronix.de> Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210505002735.1684165-4-seanjc@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-05-28 13:17:43 +02:00
Joerg Roedel	d28aa3c157	x86/boot/compressed/64: Check SEV encryption in the 32-bit boot-path commit fef81c86262879d4b1176ef51a834c15b805ebb9 upstream. Check whether the hypervisor reported the correct C-bit when running as an SEV guest. Using a wrong C-bit position could be used to leak sensitive data from the guest to the hypervisor. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210312123824.306-8-joro@8bytes.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-05-26 12:06:57 +02:00
Jan Beulich	e2c26ddd4e	x86/Xen: swap NX determination and GDT setup on BSP commit ae897fda4f507e4b239f0bdfd578b3688ca96fb4 upstream. xen_setup_gdt(), via xen_load_gdt_boot(), wants to adjust page tables. For this to work when NX is not available, x86_configure_nx() needs to be called first. [jgross] Note that this is a revert of 36104cb9012a82e73 ("x86/xen: Delay get_cpu_cap until stack canary is established"), which is possible now that we no longer support running as PV guest in 32-bit mode. Cc: <stable.vger.kernel.org> # 5.9 Fixes: 36104cb9012a82e73 ("x86/xen: Delay get_cpu_cap until stack canary is established") Reported-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/12a866b0-9e89-59f7-ebeb-a2a6cec0987a@suse.com Signed-off-by: Juergen Gross <jgross@suse.com>	2021-05-26 12:06:57 +02:00
Joerg Roedel	367c90f2bc	x86/sev-es: Forward page-faults which happen during emulation commit c25bbdb564060adaad5c3a8a10765c13487ba6a3 upstream. When emulating guest instructions for MMIO or IOIO accesses, the #VC handler might get a page-fault and will not be able to complete. Forward the page-fault in this case to the correct handler instead of killing the machine. Fixes: 0786138c78e7 ("x86/sev-es: Add a Runtime #VC Exception Handler") Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org # v5.10+ Link: https://lkml.kernel.org/r/20210519135251.30093-3-joro@8bytes.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-05-26 12:06:53 +02:00
Joerg Roedel	5af89eeb74	x86/sev-es: Use __put_user()/__get_user() for data accesses commit 4954f5b8ef0baf70fe978d1a99a5f70e4dd5c877 upstream. The put_user() and get_user() functions do checks on the address which is passed to them. They check whether the address is actually a user-space address and whether its fine to access it. They also call might_fault() to indicate that they could fault and possibly sleep. All of these checks are neither wanted nor needed in the #VC exception handler, which can be invoked from almost any context and also for MMIO instructions from kernel space on kernel memory. All the #VC handler wants to know is whether a fault happened when the access was tried. This is provided by __put_user()/__get_user(), which just do the access no matter what. Also add comments explaining why __get_user() and __put_user() are the best choice here and why it is safe to use them in this context. Also explain why copy_to/from_user can't be used. In addition, also revert commit 7024f60d6552 ("x86/sev-es: Handle string port IO to kernel memory properly") because using __get_user()/__put_user() fixes the same problem while the above commit introduced several problems: 1) It uses access_ok() which is only allowed in task context. 2) It uses memcpy() which has no fault handling at all and is thus unsafe to use here. [ bp: Fix up commit ID of the reverted commit above. ] Fixes: f980f9c31a92 ("x86/sev-es: Compile early handler code into kernel image") Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org # v5.10+ Link: https://lkml.kernel.org/r/20210519135251.30093-4-joro@8bytes.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-05-26 12:06:53 +02:00
Joerg Roedel	be4cba71b2	x86/sev-es: Don't return NULL from sev_es_get_ghcb() commit b250f2f7792d15bcde98e0456781e2835556d5fa upstream. sev_es_get_ghcb() is called from several places but only one of them checks the return value. The reaction to returning NULL is always the same: calling panic() and kill the machine. Instead of adding checks to all call sites, move the panic() into the function itself so that it will no longer return NULL. Fixes: 0786138c78e7 ("x86/sev-es: Add a Runtime #VC Exception Handler") Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org # v5.10+ Link: https://lkml.kernel.org/r/20210519135251.30093-2-joro@8bytes.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-05-26 12:06:53 +02:00
Tom Lendacky	e7174da8c4	x86/sev-es: Invalidate the GHCB after completing VMGEXIT commit a50c5bebc99c525e7fbc059988c6a5ab8680cb76 upstream. Since the VMGEXIT instruction can be issued from userspace, invalidate the GHCB after performing VMGEXIT processing in the kernel. Invalidation is only required after userspace is available, so call vc_ghcb_invalidate() from sev_es_put_ghcb(). Update vc_ghcb_invalidate() to additionally clear the GHCB exit code so that it is always presented as 0 when VMGEXIT has been issued by anything else besides the kernel. Fixes: 0786138c78e79 ("x86/sev-es: Add a Runtime #VC Exception Handler") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/5a8130462e4f0057ee1184509cd056eedd78742b.1621273353.git.thomas.lendacky@amd.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-05-26 12:06:52 +02:00
Tom Lendacky	193e02196f	x86/sev-es: Move sev_es_put_ghcb() in prep for follow on patch commit fea63d54f7a3e74f8ab489a8b82413a29849a594 upstream. Move the location of sev_es_put_ghcb() in preparation for an update to it in a follow-on patch. This will better highlight the changes being made to the function. No functional change. Fixes: 0786138c78e79 ("x86/sev-es: Add a Runtime #VC Exception Handler") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/8c07662ec17d3d82e5c53841a1d9e766d3bdbab6.1621273353.git.thomas.lendacky@amd.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-05-26 12:06:52 +02:00
Like Xu	075becedce	perf/x86: Avoid touching LBR_TOS MSR for Arch LBR [ Upstream commit 3317c26a4b413b41364f2c4b83c778c6aba1576d ] The Architecture LBR does not have MSR_LBR_TOS (0x000001c9). In a guest that should support Architecture LBR, check_msr() will be a non-related check for the architecture MSR 0x0 (IA32_P5_MC_ADDR) that is also not supported by KVM. The failure will cause x86_pmu.lbr_nr = 0, thereby preventing the initialization of the guest Arch LBR. Fix it by avoiding this extraneous check in intel_pmu_init() for Arch LBR. Fixes: 47125db27e47 ("perf/x86/intel/lbr: Support Architectural LBR") Signed-off-by: Like Xu <like.xu@linux.intel.com> [peterz: simpler still] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210430052247.3079672-1-like.xu@linux.intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-05-26 12:06:50 +02:00
Arnd Bergmann	ee387de3ca	x86/msr: Fix wr/rdmsr_safe_regs_on_cpu() prototypes commit 396a66aa1172ef2b78c21651f59b40b87b2e5e1e upstream. gcc-11 warns about mismatched prototypes here: arch/x86/lib/msr-smp.c:255:51: error: argument 2 of type ‘u32 ’ {aka ‘unsigned int ’} declared as a pointer [-Werror=array-parameter=] 255 \| int rdmsr_safe_regs_on_cpu(unsigned int cpu, u32 *regs) \| ~~~~~^~~~ arch/x86/include/asm/msr.h:347:50: note: previously declared as an array ‘u32[8]’ {aka ‘unsigned int[8]’} GCC is right here - fix up the types. [ mingo: Twiddled the changelog. ] Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210322164541.912261-1-arnd@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-05-22 11:40:51 +02:00
Sean Christopherson	31f29749ee	KVM: VMX: Disable preemption when probing user return MSRs commit 5104d7ffcf24749939bea7fdb5378d186473f890 upstream. Disable preemption when probing a user return MSR via RDSMR/WRMSR. If the MSR holds a different value per logical CPU, the WRMSR could corrupt the host's value if KVM is preempted between the RDMSR and WRMSR, and then rescheduled on a different CPU. Opportunistically land the helper in common x86, SVM will use the helper in a future commit. Fixes: 4be534102624 ("KVM: VMX: Initialize vmx->guest_msrs[] right after allocation") Cc: stable@vger.kernel.org Cc: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210504171734.1434054-6-seanjc@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-05-19 10:13:16 +02:00
Sean Christopherson	79abde761e	KVM: VMX: Do not advertise RDPID if ENABLE_RDTSCP control is unsupported commit 8aec21c04caa2000f91cf8822ae0811e4b0c3971 upstream. Clear KVM's RDPID capability if the ENABLE_RDTSCP secondary exec control is unsupported. Despite being enumerated in a separate CPUID flag, RDPID is bundled under the same VMCS control as RDTSCP and will #UD in VMX non-root if ENABLE_RDTSCP is not enabled. Fixes: 41cd02c6f7f6 ("kvm: x86: Expose RDPID in KVM_GET_SUPPORTED_CPUID") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210504171734.1434054-2-seanjc@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-05-19 10:13:16 +02:00
Vitaly Kuznetsov	c8bf64e3fb	KVM: nVMX: Always make an attempt to map eVMCS after migration commit f5c7e8425f18fdb9bdb7d13340651d7876890329 upstream. When enlightened VMCS is in use and nested state is migrated with vmx_get_nested_state()/vmx_set_nested_state() KVM can't map evmcs page right away: evmcs gpa is not 'struct kvm_vmx_nested_state_hdr' and we can't read it from VP assist page because userspace may decide to restore HV_X64_MSR_VP_ASSIST_PAGE after restoring nested state (and QEMU, for example, does exactly that). To make sure eVMCS is mapped /vmx_set_nested_state() raises KVM_REQ_GET_NESTED_STATE_PAGES request. Commit f2c7ef3ba955 ("KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES on nested vmexit") added KVM_REQ_GET_NESTED_STATE_PAGES clearing to nested_vmx_vmexit() to make sure MSR permission bitmap is not switched when an immediate exit from L2 to L1 happens right after migration (caused by a pending event, for example). Unfortunately, in the exact same situation we still need to have eVMCS mapped so nested_sync_vmcs12_to_shadow() reflects changes in VMCS12 to eVMCS. As a band-aid, restore nested_get_evmcs_page() when clearing KVM_REQ_GET_NESTED_STATE_PAGES in nested_vmx_vmexit(). The 'fix' is far from being ideal as we can't easily propagate possible failures and even if we could, this is most likely already too late to do so. The whole 'KVM_REQ_GET_NESTED_STATE_PAGES' idea for mapping eVMCS after migration seems to be fragile as we diverge too much from the 'native' path when vmptr loading happens on vmx_set_nested_state(). Fixes: f2c7ef3ba955 ("KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES on nested vmexit") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210503150854.1144255-2-vkuznets@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-05-19 10:13:16 +02:00
Sean Christopherson	2f86dd3d2b	KVM: x86: Move RDPID emulation intercept to its own enum commit 2183de4161b90bd3851ccd3910c87b2c9adfc6ed upstream. Add a dedicated intercept enum for RDPID instead of piggybacking RDTSCP. Unlike VMX's ENABLE_RDTSCP, RDPID is not bound to SVM's RDTSCP intercept. Fixes: fb6d4d340e05 ("KVM: x86: emulate RDPID") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210504171734.1434054-5-seanjc@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-05-19 10:13:16 +02:00
Sean Christopherson	abbf8c99a9	KVM: x86: Emulate RDPID only if RDTSCP is supported commit 85d0011264da24be08ae907d7f29983a597ca9b1 upstream. Do not advertise emulation support for RDPID if RDTSCP is unsupported. RDPID emulation subtly relies on MSR_TSC_AUX to exist in hardware, as both vmx_get_msr() and svm_get_msr() will return an error if the MSR is unsupported, i.e. ctxt->ops->get_msr() will fail and the emulator will inject a #UD. Note, RDPID emulation also relies on RDTSCP being enabled in the guest, but this is a KVM bug and will eventually be fixed. Fixes: fb6d4d340e05 ("KVM: x86: emulate RDPID") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210504171734.1434054-3-seanjc@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-05-19 10:13:16 +02:00
Thomas Gleixner	b9c663dc9a	KVM: x86: Prevent deadlock against tk_core.seq [ Upstream commit 3f804f6d201ca93adf4c3df04d1bfd152c1129d6 ] syzbot reported a possible deadlock in pvclock_gtod_notify(): CPU 0 CPU 1 write_seqcount_begin(&tk_core.seq); pvclock_gtod_notify() spin_lock(&pool->lock); queue_work(..., &pvclock_gtod_work) ktime_get() spin_lock(&pool->lock); do { seq = read_seqcount_begin(tk_core.seq) ... } while (read_seqcount_retry(&tk_core.seq, seq); While this is unlikely to happen, it's possible. Delegate queue_work() to irq_work() which postpones it until the tk_core.seq write held region is left and interrupts are reenabled. Fixes: 16e8d74d2da9 ("KVM: x86: notifier for clocksource changes") Reported-by: syzbot+6beae4000559d41d80f8@syzkaller.appspotmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Message-Id: <87h7jgm1zy.ffs@nanos.tec.linutronix.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-05-19 10:13:12 +02:00
Thomas Gleixner	8aa7227a5d	KVM: x86: Cancel pvclock_gtod_work on module removal [ Upstream commit 594b27e677b35f9734b1969d175ebc6146741109 ] Nothing prevents the following: pvclock_gtod_notify() queue_work(system_long_wq, &pvclock_gtod_work); ... remove_module(kvm); ... work_queue_run() pvclock_gtod_work() <- UAF Ditto for any other operation on that workqueue list head which touches pvclock_gtod_work after module removal. Cancel the work in kvm_arch_exit() to prevent that. Fixes: 16e8d74d2da9 ("KVM: x86: notifier for clocksource changes") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Message-Id: <87czu4onry.ffs@nanos.tec.linutronix.de> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-05-19 10:13:12 +02:00
Wanpeng Li	2e0ce36d0b	KVM: LAPIC: Accurately guarantee busy wait for timer to expire when using hv_timer [ Upstream commit d981dd15498b188636ec5a7d8ad485e650f63d8d ] Commit ee66e453db13d (KVM: lapic: Busy wait for timer to expire when using hv_timer) tries to set ktime->expired_tscdeadline by checking ktime->hv_timer_in_use since lapic timer oneshot/periodic modes which are emulated by vmx preemption timer also get advanced, they leverage the same vmx preemption timer logic with tsc-deadline mode. However, ktime->hv_timer_in_use is cleared before apic_timer_expired() handling, let's delay this clearing in preemption-disabled region. Fixes: ee66e453db13d ("KVM: lapic: Busy wait for timer to expire when using hv_timer") Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1619608082-4187-1-git-send-email-wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-05-19 10:13:12 +02:00
Lai Jiangshan	bfccc4eade	KVM/VMX: Invoke NMI non-IST entry instead of IST entry commit a217a6593cec8b315d4c2f344bae33660b39b703 upstream. In VMX, the host NMI handler needs to be invoked after NMI VM-Exit. Before commit 1a5488ef0dcf6 ("KVM: VMX: Invoke NMI handler via indirect call instead of INTn"), this was done by INTn ("int $2"). But INTn microcode is relatively expensive, so the commit reworked NMI VM-Exit handling to invoke the kernel handler by function call. But this missed a detail. The NMI entry point for direct invocation is fetched from the IDT table and called on the kernel stack. But on 64-bit the NMI entry installed in the IDT expects to be invoked on the IST stack. It relies on the "NMI executing" variable on the IST stack to work correctly, which is at a fixed position in the IST stack. When the entry point is unexpectedly called on the kernel stack, the RSP-addressed "NMI executing" variable is obviously also on the kernel stack and is "uninitialized" and can cause the NMI entry code to run in the wrong way. Provide a non-ist entry point for VMX which shares the C-function with the regular NMI entry and invoke the new asm entry point instead. On 32-bit this just maps to the regular NMI entry point as 32-bit has no ISTs and is not affected. [ tglx: Made it independent for backporting, massaged changelog ] Fixes: 1a5488ef0dcf6 ("KVM: VMX: Invoke NMI handler via indirect call instead of INTn") Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Lai Jiangshan <laijs@linux.alibaba.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/87r1imi8i1.ffs@nanos.tec.linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-05-19 10:12:51 +02:00
Sean Christopherson	21f317826e	KVM: x86/mmu: Remove the defunct update_pte() paging hook commit c5e2184d1544f9e56140791eff1a351bea2e63b9 upstream. Remove the update_pte() shadow paging logic, which was obsoleted by commit 4731d4c7a077 ("KVM: MMU: out of sync shadow core"), but never removed. As pointed out by Yu, KVM never write protects leaf page tables for the purposes of shadow paging, and instead marks their associated shadow page as unsync so that the guest can write PTEs at will. The update_pte() path, which predates the unsync logic, optimizes COW scenarios by refreshing leaf SPTEs when they are written, as opposed to zapping the SPTE, restarting the guest, and installing the new SPTE on the subsequent fault. Since KVM no longer write-protects leaf page tables, update_pte() is unreachable and can be dropped. Reported-by: Yu Zhang <yu.c.zhang@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210115004051.4099250-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-05-19 10:12:51 +02:00
Sean Christopherson	8fcdfa71ba	KVM: VMX: Intercept FS/GS_BASE MSR accesses for 32-bit KVM [ Upstream commit dbdd096a5a74b94f6b786a47baef2085859b0dce ] Disable pass-through of the FS and GS base MSRs for 32-bit KVM. Intel's SDM unequivocally states that the MSRs exist if and only if the CPU supports x86-64. FS_BASE and GS_BASE are mostly a non-issue; a clever guest could opportunistically use the MSRs without issue. KERNEL_GS_BASE is a bigger problem, as a clever guest would subtly be broken if it were migrated, as KVM disallows software access to the MSRs, and unlike the direct variants, KERNEL_GS_BASE needs to be explicitly migrated as it's not captured in the VMCS. Fixes: 25c5f225beda ("KVM: VMX: Enable MSR Bitmap feature") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210422023831.3473491-1-seanjc@google.com> [NOT for stable kernels. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-05-14 09:50:43 +02:00
David Edmondson	e9bd1af4c0	KVM: x86: dump_vmcs should not assume GUEST_IA32_EFER is valid [ Upstream commit d9e46d344e62a0d56fd86a8289db5bed8a57c92e ] If the VM entry/exit controls for loading/saving MSR_EFER are either not available (an older processor or explicitly disabled) or not used (host and guest values are the same), reading GUEST_IA32_EFER from the VMCS returns an inaccurate value. Because of this, in dump_vmcs() don't use GUEST_IA32_EFER to decide whether to print the PDPTRs - always do so if the fields exist. Fixes: 4eb64dce8d0a ("KVM: x86: dump VMCS on invalid entry") Signed-off-by: David Edmondson <david.edmondson@oracle.com> Message-Id: <20210318120841.133123-2-david.edmondson@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-05-14 09:50:39 +02:00
Sean Christopherson	5cce890e5d	KVM: x86/mmu: Retry page faults that hit an invalid memslot [ Upstream commit e0c378684b6545ad2d4403bb701d0ac4932b4e95 ] Retry page faults (re-enter the guest) that hit an invalid memslot instead of treating the memslot as not existing, i.e. handling the page fault as an MMIO access. When deleting a memslot, SPTEs aren't zapped and the TLBs aren't flushed until after the memslot has been marked invalid. Handling the invalid slot as MMIO means there's a small window where a page fault could replace a valid SPTE with an MMIO SPTE. The legacy MMU handles such a scenario cleanly, but the TDP MMU assumes such behavior is impossible (see the BUG() in __handle_changed_spte()). There's really no good reason why the legacy MMU should allow such a scenario, and closing this hole allows for additional cleanups. Fixes: 2f2fad0897cb ("kvm: x86/mmu: Add functions to handle changed TDP SPTEs") Cc: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210225204749.1512652-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-05-14 09:50:29 +02:00
Nathan Chancellor	db4645fbae	perf/amd/uncore: Fix sysfs type mismatch [ Upstream commit 5deac80d4571dffb51f452f0027979d72259a1b9 ] dev_attr_show() calls the __uncore_*_show() functions via an indirect call but their type does not currently match the type of the show() member in 'struct device_attribute', resulting in a Control Flow Integrity violation. $ cat /sys/devices/amd_l3/format/umask config:8-15 $ dmesg \| grep "CFI failure" [ 1258.174653] CFI failure (target: __uncore_umask_show...): Update the type in the DEFINE_UNCORE_FORMAT_ATTR macro to match 'struct device_attribute' so that there is no more CFI violation. Fixes: 06f2c24584f3 ("perf/amd/uncore: Prepare to scale for more attributes that vary per family") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210415001112.3024673-2-nathan@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-05-14 09:50:28 +02:00
Nathan Chancellor	c8a54b4d66	x86/events/amd/iommu: Fix sysfs type mismatch [ Upstream commit de5bc7b425d4c27ae5faa00ea7eb6b9780b9a355 ] dev_attr_show() calls _iommu_event_show() via an indirect call but _iommu_event_show()'s type does not currently match the type of the show() member in 'struct device_attribute', resulting in a Control Flow Integrity violation. $ cat /sys/devices/amd_iommu_1/events/mem_dte_hit csource=0x0a $ dmesg \| grep "CFI failure" [ 3526.735140] CFI failure (target: _iommu_event_show...): Change _iommu_event_show() and 'struct amd_iommu_event_desc' to 'struct device_attribute' so that there is no more CFI violation. Fixes: 7be6296fdd75 ("perf/x86/amd: AMD IOMMU Performance Counter PERF uncore PMU implementation") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210415001112.3024673-1-nathan@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-05-14 09:50:28 +02:00
Masami Hiramatsu	296da2049f	x86/kprobes: Fix to check non boostable prefixes correctly [ Upstream commit 6dd3b8c9f58816a1354be39559f630cd1bd12159 ] There are 2 bugs in the can_boost() function because of using x86 insn decoder. Since the insn->opcode never has a prefix byte, it can not find CS override prefix in it. And the insn->attr is the attribute of the opcode, thus inat_is_address_size_prefix( insn->attr) always returns false. Fix those by checking each prefix bytes with for_each_insn_prefix loop and getting the correct attribute for each prefix byte. Also, this removes unlikely, because this is a slow path. Fixes: a8d11cd0714f ("kprobes/x86: Consolidate insn decoder users for copying code") Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/161666691162.1120877.2808435205294352583.stgit@devnote2 Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-05-14 09:50:24 +02:00
Chris von Recklinghausen	1789737ca9	PM: hibernate: x86: Use crc32 instead of md5 for hibernation e820 integrity check [ Upstream commit f5d1499ae2096d7ea301023c4cc54e427300eb0a ] Hibernation fails on a system in fips mode because md5 is used for the e820 integrity check and is not available. Use crc32 instead. The check is intended to detect whether the E820 memory map provided by the firmware after cold boot unexpectedly differs from the one that was in use when the hibernation image was created. In this case, the hibernation image cannot be restored, as it may cover memory regions that are no longer available to the OS. A non-cryptographic checksum such as CRC-32 is sufficient to detect such inadvertent deviations. Fixes: 62a03defeabd ("PM / hibernate: Verify the consistent of e820 memory map by md5 digest") Reviewed-by: Eric Biggers <ebiggers@google.com> Tested-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com> [ rjw: Subject edit ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-05-14 09:50:21 +02:00
Ingo Molnar	ee9bc379e4	x86/platform/uv: Fix !KEXEC build failure [ Upstream commit c2209ea55612efac75de0a58ef5f7394fae7fa0f ] When KEXEC is disabled, the UV build fails: arch/x86/platform/uv/uv_nmi.c:875:14: error: ‘uv_nmi_kexec_failed’ undeclared (first use in this function) Since uv_nmi_kexec_failed is only defined in the KEXEC_CORE #ifdef branch, this code cannot ever have been build tested: if (main) pr_err("UV: NMI kdump: KEXEC not supported in this kernel\n"); atomic_set(&uv_nmi_kexec_failed, 1); Nor is this use possible in uv_handle_nmi(): atomic_set(&uv_nmi_kexec_failed, 0); These bugs were introduced in this commit: d0a9964e9873: ("x86/platform/uv: Implement simple dump failover if kdump fails") Which added the uv_nmi_kexec_failed assignments to !KEXEC code, while making the definition KEXEC-only - apparently without testing the !KEXEC case. Instead of complicating the #ifdef maze, simplify the code by requiring X86_UV to depend on KEXEC_CORE. This pattern is present in other architectures as well. ( We'll remove the untested, 7 years old !KEXEC complications from the file in a separate commit. ) Fixes: d0a9964e9873: ("x86/platform/uv: Implement simple dump failover if kdump fails") Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Mike Travis <travis@sgi.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-05-14 09:50:20 +02:00
Arnd Bergmann	bbd61fa05c	crypto: poly1305 - fix poly1305_core_setkey() declaration [ Upstream commit 8d195e7a8ada68928f2aedb2c18302a4518fe68e ] gcc-11 points out a mismatch between the declaration and the definition of poly1305_core_setkey(): lib/crypto/poly1305-donna32.c:13:67: error: argument 2 of type ‘const u8[16]’ {aka ‘const unsigned char[16]’} with mismatched bound [-Werror=array-parameter=] 13 \| void poly1305_core_setkey(struct poly1305_core_key key, const u8 raw_key[16]) \| ~~~~~~~~~^~~~~~~~~~~ In file included from lib/crypto/poly1305-donna32.c:11: include/crypto/internal/poly1305.h:21:68: note: previously declared as ‘const u8 ’ {aka ‘const unsigned char ’} 21 \| void poly1305_core_setkey(struct poly1305_core_key key, const u8 *raw_key); This is harmless in principle, as the calling conventions are the same, but the more specific prototype allows better type checking in the caller. Change the declaration to match the actual function definition. The poly1305_simd_init() is a bit suspicious here, as it previously had a 32-byte argument type, but looks like it needs to take the 16-byte POLY1305_BLOCK_SIZE array instead. Fixes: 1c08a104360f ("crypto: poly1305 - add new 32 and 64-bit generic versions") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-05-14 09:50:13 +02:00
Otavio Pontes	bac2031321	x86/microcode: Check for offline CPUs before requesting new microcode [ Upstream commit 7189b3c11903667808029ec9766a6e96de5012a5 ] Currently, the late microcode loading mechanism checks whether any CPUs are offlined, and, in such a case, aborts the load attempt. However, this must be done before the kernel caches new microcode from the filesystem. Otherwise, when offlined CPUs are onlined later, those cores are going to be updated through the CPU hotplug notifier callback with the new microcode, while CPUs previously onine will continue to run with the older microcode. For example: Turn off one core (2 threads): echo 0 > /sys/devices/system/cpu/cpu3/online echo 0 > /sys/devices/system/cpu/cpu1/online Install the ucode fails because a primary SMT thread is offline: cp intel-ucode/06-8e-09 /lib/firmware/intel-ucode/ echo 1 > /sys/devices/system/cpu/microcode/reload bash: echo: write error: Invalid argument Turn the core back on echo 1 > /sys/devices/system/cpu/cpu3/online echo 1 > /sys/devices/system/cpu/cpu1/online cat /proc/cpuinfo \|grep microcode microcode : 0x30 microcode : 0xde microcode : 0x30 microcode : 0xde The rationale for why the update is aborted when at least one primary thread is offline is because even if that thread is soft-offlined and idle, it will still have to participate in broadcasted MCE's synchronization dance or enter SMM, and in both examples it will execute instructions so it better have the same microcode revision as the other cores. [ bp: Heavily edit and extend commit message with the reasoning behind all this. ] Fixes: 30ec26da9967 ("x86/microcode: Do not upload microcode if CPUs are offline") Signed-off-by: Otavio Pontes <otavio.pontes@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Acked-by: Ashok Raj <ashok.raj@intel.com> Link: https://lkml.kernel.org/r/20210319165515.9240-2-otavio.pontes@intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-05-14 09:50:11 +02:00
Mike Travis	7c5e96e89c	x86/platform/uv: Set section block size for hubless architectures [ Upstream commit 6840a150b9daf35e4d21ab9780d0a03b4ed74a5b ] Commit bbbd2b51a2aa ("x86/platform/UV: Use new set memory block size function") added a call to set the block size value that is needed by the kernel to set the boundaries in the section list. This was done for UV Hubbed systems but missed in the UV Hubless setup. Fix that mistake by adding that same set call for hubless systems, which support the same NVRAMs and Intel BIOS, thus the same problem occurs. [ bp: Massage commit message. ] Fixes: bbbd2b51a2aa ("x86/platform/UV: Use new set memory block size function") Signed-off-by: Mike Travis <mike.travis@hpe.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Steve Wahl <steve.wahl@hpe.com> Reviewed-by: Russ Anderson <rja@hpe.com> Link: https://lkml.kernel.org/r/20210305162853.299892-1-mike.travis@hpe.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-05-14 09:50:07 +02:00
Sean Christopherson	a947f95b6b	KVM: nVMX: Truncate base/index GPR value on address calc in !64-bit commit 82277eeed65eed6c6ee5b8f97bd978763eab148f upstream. Drop bits 63:32 of the base and/or index GPRs when calculating the effective address of a VMX instruction memory operand. Outside of 64-bit mode, memory encodings are strictly limited to E*X and below. Fixes: 064aea774768 ("KVM: nVMX: Decoding memory operands of VMX instructions") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210422022128.3464144-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-05-14 09:50:04 +02:00
Sean Christopherson	6b7028de66	KVM: nVMX: Truncate bits 63:32 of VMCS field on nested check in !64-bit commit ee050a577523dfd5fac95e6cc182ebe0293ead59 upstream. Drop bits 63:32 of the VMCS field encoding when checking for a nested VM-Exit on VMREAD/VMWRITE in !64-bit mode. VMREAD and VMWRITE always use 32-bit operands outside of 64-bit mode. The actual emulation of VMREAD/VMWRITE does the right thing, this bug is purely limited to incorrectly causing a nested VM-Exit if a GPR happens to have bits 63:32 set outside of 64-bit mode. Fixes: a7cde481b6e8 ("KVM: nVMX: Do not forward VMREAD/VMWRITE VMExits to L1 if required so by vmcs12 vmread/vmwrite bitmaps") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210422022128.3464144-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-05-14 09:50:04 +02:00
Sean Christopherson	fa9b4ee318	KVM: nVMX: Defer the MMU reload to the normal path on an EPTP switch commit c805f5d5585ab5e0cdac6b1ccf7086eb120fb7db upstream. Defer reloading the MMU after a EPTP successful EPTP switch. The VMFUNC instruction itself is executed in the previous EPTP context, any side effects, e.g. updating RIP, should occur in the old context. Practically speaking, this bug is benign as VMX doesn't touch the MMU when skipping an emulated instruction, nor does queuing a single-step #DB. No other post-switch side effects exist. Fixes: 41ab93727467 ("KVM: nVMX: Emulate EPTP switching for the L1 hypervisor") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210305011101.3597423-14-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-05-14 09:50:04 +02:00
Sean Christopherson	6748f80aea	KVM: SVM: Inject #GP on guest MSR_TSC_AUX accesses if RDTSCP unsupported commit 6f2b296aa6432d8274e258cc3220047ca04f5de0 upstream. Inject #GP on guest accesses to MSR_TSC_AUX if RDTSCP is unsupported in the guest's CPUID model. Fixes: 46896c73c1a4 ("KVM: svm: add support for RDTSCP") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210423223404.3860547-2-seanjc@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-05-14 09:50:04 +02:00
Sean Christopherson	6ccdbedd16	KVM: SVM: Do not allow SEV/SEV-ES initialization after vCPUs are created commit 8727906fde6ea665b52e68ddc58833772537f40a upstream. Reject KVM_SEV_INIT and KVM_SEV_ES_INIT if they are attempted after one or more vCPUs have been created. KVM assumes a VM is tagged SEV/SEV-ES prior to vCPU creation, e.g. init_vmcb() needs to mark the VMCB as SEV enabled, and svm_create_vcpu() needs to allocate the VMSA. At best, creating vCPUs before SEV/SEV-ES init will lead to unexpected errors and/or behavior, and at worst it will crash the host, e.g. sev_launch_update_vmsa() will dereference a null svm->vmsa pointer. Fixes: 1654efcbc431 ("KVM: SVM: Add KVM_SEV_INIT command") Fixes: ad73109ae7ec ("KVM: SVM: Provide support to launch and run an SEV-ES guest") Cc: stable@vger.kernel.org Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210331031936.2495277-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-05-14 09:50:04 +02:00
Sean Christopherson	ead4fb53fd	KVM: SVM: Don't strip the C-bit from CR2 on #PF interception commit 6d1b867d045699d6ce0dfa0ef35d1b87dd36db56 upstream. Don't strip the C-bit from the faulting address on an intercepted #PF, the address is a virtual address, not a physical address. Fixes: 0ede79e13224 ("KVM: SVM: Clear C-bit from the page fault address") Cc: stable@vger.kernel.org Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210305011101.3597423-13-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-05-14 09:50:04 +02:00
Sean Christopherson	12d6843025	KVM: nSVM: Set the shadow root level to the TDP level for nested NPT commit a3322d5cd87fef5ec0037fd1b14068a533f9a60f upstream. Override the shadow root level in the MMU context when configuring NPT for shadowing nested NPT. The level is always tied to the TDP level of the host, not whatever level the guest happens to be using. Fixes: 096586fda522 ("KVM: nSVM: Correctly set the shadow NPT root level in its MMU role") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210305011101.3597423-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-05-14 09:50:04 +02:00
Sean Christopherson	f59c2220f6	KVM: x86: Remove emulator's broken checks on CR0/CR3/CR4 loads commit d0fe7b6404408835ed60232cb3bf28324b2f95db upstream. Remove the emulator's checks for illegal CR0, CR3, and CR4 values, as the checks are redundant, outdated, and in the case of SEV's C-bit, broken. The emulator manually calculates MAXPHYADDR from CPUID and neglects to mask off the C-bit. For all other checks, kvm_set_cr*() are a superset of the emulator checks, e.g. see CR4.LA57. Fixes: a780a3ea6282 ("KVM: X86: Fix reserved bits check for MOV to CR3") Cc: Babu Moger <babu.moger@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210422022128.3464144-2-seanjc@google.com> Cc: stable@vger.kernel.org [Unify check_cr_read and check_cr_write. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-05-14 09:50:03 +02:00
Sean Christopherson	c8b49e01a2	KVM: x86/mmu: Alloc page for PDPTEs when shadowing 32-bit NPT with 64-bit commit 04d45551a1eefbea42655da52f56e846c0af721a upstream. Allocate the so called pae_root page on-demand, along with the lm_root page, when shadowing 32-bit NPT with 64-bit NPT, i.e. when running a 32-bit L1. KVM currently only allocates the page when NPT is disabled, or when L0 is 32-bit (using PAE paging). Note, there is an existing memory leak involving the MMU roots, as KVM fails to free the PAE roots on failure. This will be addressed in a future commit. Fixes: ee6268ba3a68 ("KVM: x86: Skip pae_root shadow allocation if tdp enabled") Fixes: b6b80c78af83 ("KVM: x86/mmu: Allocate PAE root array when using SVM's 32-bit NPT") Cc: stable@vger.kernel.org Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210305011101.3597423-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-05-14 09:50:03 +02:00
Alison Schofield	a4c421b12c	x86, sched: Treat Intel SNC topology as default, COD as exception commit 2c88d45edbb89029c1190bb3b136d2602f057c98 upstream. Commit 1340ccfa9a9a ("x86,sched: Allow topologies where NUMA nodes share an LLC") added a vendor and model specific check to never call topology_sane() for Intel Skylake Server systems where NUMA nodes share an LLC. Intel Ice Lake and Sapphire Rapids CPUs also enumerate an LLC that is shared by multiple NUMA nodes. The LLC on these CPUs is shared for off-package data access but private to the NUMA node for on-package access. Rather than managing a list of allowable SNC topologies, make this SNC topology the default, and treat Intel's Cluster-On-Die (COD) topology as the exception. In SNC mode, Sky Lake, Ice Lake, and Sapphire Rapids servers do not emit this warning: sched: CPU #3's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency. Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Alison Schofield <alison.schofield@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20210310190233.31752-1-alison.schofield@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-05-14 09:49:59 +02:00
Sean Christopherson	451a3e7570	KVM: x86: Defer the MMU unload to the normal path on an global INVPCID commit f66c53b3b94f658590e1012bf6d922f8b7e01bda upstream. Defer unloading the MMU after a INVPCID until the instruction emulation has completed, i.e. until after RIP has been updated. On VMX, this is a benign bug as VMX doesn't touch the MMU when skipping an emulated instruction. However, on SVM, if nrip is disabled, the emulator is used to skip an instruction, which would lead to fireworks if the emulator were invoked without a valid MMU. Fixes: eb4b248e152d ("kvm: vmx: Support INVPCID in shadow paging mode") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210305011101.3597423-15-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-05-14 09:49:57 +02:00

1 2 3 4 5 ...

37149 Commits