commit 1eff0ada88b48e4ac1e3fe26483b3684fedecd27 upstream.
Commit 66570e966dd9 ("kvm: x86: only provide PV features if enabled in
guest's CPUID") avoids accessing the PV TLB shootdown host-side logic when
this PV feature is not exposed to the guest. However, kvm_steal_time.preempted
is not only used by the PV TLB shootdown logic, it also mitigates the lock
holder preemption issue. From the guest's point of view, the vCPU now always
appears preempted, because the reset of kvm_steal_time.preempted before
vmentry is lost when the PV TLB shootdown feature is not exposed. Fix this
by clearing kvm_steal_time.preempted before vmentry.
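Illustratively, the steal-time update then needs to take the shape below.
This is only a sketch based on the description above; field and helper
names are assumed from the surrounding KVM code, not taken from the
actual diff:

  if (guest_pv_has(vcpu, KVM_FEATURE_PV_TLB_FLUSH)) {
          /* PV TLB shootdown consumes and clears the flag */
          if (xchg(&st->preempted, 0) & KVM_VCPU_FLUSH_TLB)
                  kvm_vcpu_flush_tlb_guest(vcpu);
  } else {
          /* still reset the flag so the guest does not see the
           * vCPU as permanently preempted */
          st->preempted = 0;
  }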
Fixes: 66570e966dd9 (kvm: x86: only provide PV features if enabled in guest's CPUID)
Reviewed-by: Sean Christopherson <seanjc@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1621339235-11131-3-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 160457140187c5fb127b844e5a85f87f00a01b14 upstream.
Defer the call to account guest time until after servicing any IRQ(s)
that happened in the guest or immediately after VM-Exit. Tick-based
accounting of vCPU time relies on PF_VCPU being set when the tick IRQ
handler runs, and IRQs are blocked throughout the main sequence of
vcpu_enter_guest(), including the call into vendor code to actually
enter and exit the guest.
This fixes a bug where reported guest time remains '0', even when
running an infinite loop in the guest:
https://bugzilla.kernel.org/show_bug.cgi?id=209831
Fixes: 87fa7f3e98a131 ("x86/kvm: Move context tracking where it belongs")
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210505002735.1684165-4-seanjc@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fef81c86262879d4b1176ef51a834c15b805ebb9 upstream.
Check whether the hypervisor reported the correct C-bit when running
as an SEV guest. A wrong C-bit position could be used to leak
sensitive data from the guest to the hypervisor.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210312123824.306-8-joro@8bytes.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ae897fda4f507e4b239f0bdfd578b3688ca96fb4 upstream.
xen_setup_gdt(), via xen_load_gdt_boot(), wants to adjust page tables.
For this to work when NX is not available, x86_configure_nx() needs to
be called first.
[jgross] Note that this is a revert of 36104cb9012a82e73 ("x86/xen:
Delay get_cpu_cap until stack canary is established"), which is possible
now that we no longer support running as PV guest in 32-bit mode.
Cc: <stable@vger.kernel.org> # 5.9
Fixes: 36104cb9012a82e73 ("x86/xen: Delay get_cpu_cap until stack canary is established")
Reported-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/12a866b0-9e89-59f7-ebeb-a2a6cec0987a@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
commit c25bbdb564060adaad5c3a8a10765c13487ba6a3 upstream.
When emulating guest instructions for MMIO or IOIO accesses, the #VC
handler might get a page-fault and will not be able to complete. Forward
the page-fault in this case to the correct handler instead of killing
the machine.
Fixes: 0786138c78e7 ("x86/sev-es: Add a Runtime #VC Exception Handler")
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org # v5.10+
Link: https://lkml.kernel.org/r/20210519135251.30093-3-joro@8bytes.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4954f5b8ef0baf70fe978d1a99a5f70e4dd5c877 upstream.
The put_user() and get_user() functions do checks on the address which is
passed to them. They check whether the address is actually a user-space
address and whether it's fine to access it. They also call might_fault()
to indicate that they could fault and possibly sleep.
All of these checks are neither wanted nor needed in the #VC exception
handler, which can be invoked from almost any context and also for MMIO
instructions from kernel space on kernel memory. All the #VC handler
wants to know is whether a fault happened when the access was tried.
This is provided by __put_user()/__get_user(), which just do the access
no matter what. Also add comments explaining why __get_user() and
__put_user() are the best choice here and why it is safe to use them
in this context. Also explain why copy_to/from_user can't be used.
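As an illustration, a probe from the #VC handler then boils down to
something like the following sketch ('vaddr' and the ES_EXCEPTION return
value are assumed from the handler's context):

  char byte;

  /* __get_user() only reports whether the access faulted; it does no
   * task-context checks and no might_fault() */
  if (__get_user(byte, (char __user *)vaddr))
          return ES_EXCEPTION;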
In addition, also revert commit
7024f60d6552 ("x86/sev-es: Handle string port IO to kernel memory properly")
because using __get_user()/__put_user() fixes the same problem while
the above commit introduced several problems:
1) It uses access_ok() which is only allowed in task context.
2) It uses memcpy() which has no fault handling at all and is
thus unsafe to use here.
[ bp: Fix up commit ID of the reverted commit above. ]
Fixes: f980f9c31a92 ("x86/sev-es: Compile early handler code into kernel image")
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org # v5.10+
Link: https://lkml.kernel.org/r/20210519135251.30093-4-joro@8bytes.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b250f2f7792d15bcde98e0456781e2835556d5fa upstream.
sev_es_get_ghcb() is called from several places but only one of them
checks the return value. The reaction to returning NULL is always the
same: calling panic() and killing the machine.
Instead of adding checks to all call sites, move the panic() into the
function itself so that it will no longer return NULL.
Fixes: 0786138c78e7 ("x86/sev-es: Add a Runtime #VC Exception Handler")
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org # v5.10+
Link: https://lkml.kernel.org/r/20210519135251.30093-2-joro@8bytes.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a50c5bebc99c525e7fbc059988c6a5ab8680cb76 upstream.
Since the VMGEXIT instruction can be issued from userspace, invalidate
the GHCB after performing VMGEXIT processing in the kernel.
Invalidation is only required after userspace is available, so call
vc_ghcb_invalidate() from sev_es_put_ghcb(). Update vc_ghcb_invalidate()
to additionally clear the GHCB exit code so that it is always presented
as 0 when VMGEXIT has been issued by anything else besides the kernel.
Fixes: 0786138c78e79 ("x86/sev-es: Add a Runtime #VC Exception Handler")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/5a8130462e4f0057ee1184509cd056eedd78742b.1621273353.git.thomas.lendacky@amd.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fea63d54f7a3e74f8ab489a8b82413a29849a594 upstream.
Move the location of sev_es_put_ghcb() in preparation for an update to it
in a follow-on patch. This will better highlight the changes being made
to the function.
No functional change.
Fixes: 0786138c78e79 ("x86/sev-es: Add a Runtime #VC Exception Handler")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/8c07662ec17d3d82e5c53841a1d9e766d3bdbab6.1621273353.git.thomas.lendacky@amd.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3317c26a4b413b41364f2c4b83c778c6aba1576d ]
The Architecture LBR does not have MSR_LBR_TOS (0x000001c9).
In a guest that should support Architecture LBR, check_msr()
ends up performing an unrelated check on the architectural MSR 0x0
(IA32_P5_MC_ADDR), which is also not supported by KVM.
The failure will cause x86_pmu.lbr_nr = 0, thereby preventing
the initialization of the guest Arch LBR. Fix it by avoiding
this extraneous check in intel_pmu_init() for Arch LBR.
Fixes: 47125db27e47 ("perf/x86/intel/lbr: Support Architectural LBR")
Signed-off-by: Like Xu <like.xu@linux.intel.com>
[peterz: simpler still]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210430052247.3079672-1-like.xu@linux.intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 396a66aa1172ef2b78c21651f59b40b87b2e5e1e upstream.
gcc-11 warns about mismatched prototypes here:
arch/x86/lib/msr-smp.c:255:51: error: argument 2 of type ‘u32 *’ {aka ‘unsigned int *’} declared as a pointer [-Werror=array-parameter=]
255 | int rdmsr_safe_regs_on_cpu(unsigned int cpu, u32 *regs)
| ~~~~~^~~~
arch/x86/include/asm/msr.h:347:50: note: previously declared as an array ‘u32[8]’ {aka ‘unsigned int[8]’}
GCC is right here - fix up the types.
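The warning class is easy to reproduce outside the kernel; the following
stand-alone snippet (hypothetical function name) triggers it with gcc-11
until the declaration is changed to use 'u32 regs[8]' as well:

  /* demo.c: build with  gcc-11 -Wall -Warray-parameter -c demo.c  */
  typedef unsigned int u32;

  int rdmsr_regs(u32 *regs);      /* declaration: plain pointer    */

  int rdmsr_regs(u32 regs[8])     /* definition: array with bound  */
  {
          return (int)(regs[0] + regs[7]);
  }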
[ mingo: Twiddled the changelog. ]
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210322164541.912261-1-arnd@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5104d7ffcf24749939bea7fdb5378d186473f890 upstream.
Disable preemption when probing a user return MSR via RDMSR/WRMSR. If
the MSR holds a different value per logical CPU, the WRMSR could corrupt
the host's value if KVM is preempted between the RDMSR and WRMSR, and
then rescheduled on a different CPU.
Opportunistically land the helper in common x86; SVM will use the helper
in a future commit.
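The probe helper therefore follows the usual pattern sketched below (not
the exact upstream code): keep the RDMSR/WRMSR pair on one logical CPU by
disabling preemption around it.

  int ret;
  u64 val;

  preempt_disable();
  ret = rdmsrl_safe(msr, &val);
  if (!ret)
          ret = wrmsrl_safe(msr, val);
  preempt_enable();

  return ret;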
Fixes: 4be534102624 ("KVM: VMX: Initialize vmx->guest_msrs[] right after allocation")
Cc: stable@vger.kernel.org
Cc: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210504171734.1434054-6-seanjc@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8aec21c04caa2000f91cf8822ae0811e4b0c3971 upstream.
Clear KVM's RDPID capability if the ENABLE_RDTSCP secondary exec control is
unsupported. Despite being enumerated in a separate CPUID flag, RDPID is
bundled under the same VMCS control as RDTSCP and will #UD in VMX non-root
if ENABLE_RDTSCP is not enabled.
Fixes: 41cd02c6f7f6 ("kvm: x86: Expose RDPID in KVM_GET_SUPPORTED_CPUID")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210504171734.1434054-2-seanjc@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f5c7e8425f18fdb9bdb7d13340651d7876890329 upstream.
When enlightened VMCS is in use and nested state is migrated with
vmx_get_nested_state()/vmx_set_nested_state(), KVM can't map the eVMCS
page right away: the eVMCS GPA is not part of
'struct kvm_vmx_nested_state_hdr' and we can't read it from the VP assist
page, because userspace may decide to restore HV_X64_MSR_VP_ASSIST_PAGE
after restoring nested state (and QEMU, for example, does exactly that).
To make sure the eVMCS is mapped, vmx_set_nested_state() raises a
KVM_REQ_GET_NESTED_STATE_PAGES request.
Commit f2c7ef3ba955 ("KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES
on nested vmexit") added KVM_REQ_GET_NESTED_STATE_PAGES clearing to
nested_vmx_vmexit() to make sure MSR permission bitmap is not switched
when an immediate exit from L2 to L1 happens right after migration (caused
by a pending event, for example). Unfortunately, in the exact same
situation we still need to have eVMCS mapped so
nested_sync_vmcs12_to_shadow() reflects changes in VMCS12 to eVMCS.
As a band-aid, restore nested_get_evmcs_page() when clearing
KVM_REQ_GET_NESTED_STATE_PAGES in nested_vmx_vmexit(). The 'fix' is far
from being ideal as we can't easily propagate possible failures and even if
we could, this is most likely already too late to do so. The whole
'KVM_REQ_GET_NESTED_STATE_PAGES' idea for mapping eVMCS after migration
seems to be fragile as we diverge too much from the 'native' path when
vmptr loading happens on vmx_set_nested_state().
Fixes: f2c7ef3ba955 ("KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES on nested vmexit")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210503150854.1144255-2-vkuznets@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 85d0011264da24be08ae907d7f29983a597ca9b1 upstream.
Do not advertise emulation support for RDPID if RDTSCP is unsupported.
RDPID emulation subtly relies on MSR_TSC_AUX to exist in hardware, as
both vmx_get_msr() and svm_get_msr() will return an error if the MSR is
unsupported, i.e. ctxt->ops->get_msr() will fail and the emulator will
inject a #UD.
Note, RDPID emulation also relies on RDTSCP being enabled in the guest,
but this is a KVM bug and will eventually be fixed.
Fixes: fb6d4d340e05 ("KVM: x86: emulate RDPID")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210504171734.1434054-3-seanjc@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3f804f6d201ca93adf4c3df04d1bfd152c1129d6 ]
syzbot reported a possible deadlock in pvclock_gtod_notify():
  CPU 0                                    CPU 1

  write_seqcount_begin(&tk_core.seq);
    pvclock_gtod_notify()                  spin_lock(&pool->lock);
      queue_work(..., &pvclock_gtod_work)  ktime_get()
       spin_lock(&pool->lock);               do {
                                               seq = read_seqcount_begin(tk_core.seq)
                                               ...
                                             } while (read_seqcount_retry(&tk_core.seq, seq);
While this is unlikely to happen, it's possible.
Delegate queue_work() to irq_work() which postpones it until the
tk_core.seq write held region is left and interrupts are reenabled.
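A sketch of that deferral (names assumed, not the exact patch):
irq_work_queue() is safe to call from the seqcount write side, and the
actual queue_work() then runs from the irq_work callback once interrupts
are enabled again.

  static void pvclock_irq_work_fn(struct irq_work *w)
  {
          queue_work(system_long_wq, &pvclock_gtod_work);
  }

  static DEFINE_IRQ_WORK(pvclock_irq_work, pvclock_irq_work_fn);

  /* from the timekeeping notifier, with tk_core.seq write held: */
  irq_work_queue(&pvclock_irq_work);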
Fixes: 16e8d74d2da9 ("KVM: x86: notifier for clocksource changes")
Reported-by: syzbot+6beae4000559d41d80f8@syzkaller.appspotmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Message-Id: <87h7jgm1zy.ffs@nanos.tec.linutronix.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 594b27e677b35f9734b1969d175ebc6146741109 ]
Nothing prevents the following:
  pvclock_gtod_notify()
    queue_work(system_long_wq, &pvclock_gtod_work);
  ...
  remove_module(kvm);
  ...
  work_queue_run()
    pvclock_gtod_work()    <- UAF
Ditto for any other operation on that workqueue list head which touches
pvclock_gtod_work after module removal.
Cancel the work in kvm_arch_exit() to prevent that.
Fixes: 16e8d74d2da9 ("KVM: x86: notifier for clocksource changes")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Message-Id: <87czu4onry.ffs@nanos.tec.linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d981dd15498b188636ec5a7d8ad485e650f63d8d ]
Commit ee66e453db13d ("KVM: lapic: Busy wait for timer to expire when
using hv_timer") tries to set ktimer->expired_tscdeadline by checking
ktimer->hv_timer_in_use, since the lapic timer oneshot/periodic modes,
which are emulated by the VMX preemption timer, also get advanced: they
leverage the same VMX preemption timer logic as tsc-deadline mode.
However, ktimer->hv_timer_in_use is cleared before the apic_timer_expired()
handling, so delay this clearing to the preemption-disabled region.
Fixes: ee66e453db13d ("KVM: lapic: Busy wait for timer to expire when using hv_timer")
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1619608082-4187-1-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit a217a6593cec8b315d4c2f344bae33660b39b703 upstream.
In VMX, the host NMI handler needs to be invoked after NMI VM-Exit.
Before commit 1a5488ef0dcf6 ("KVM: VMX: Invoke NMI handler via indirect
call instead of INTn"), this was done by INTn ("int $2"). But INTn
microcode is relatively expensive, so the commit reworked NMI VM-Exit
handling to invoke the kernel handler by function call.
But this missed a detail. The NMI entry point for direct invocation is
fetched from the IDT table and called on the kernel stack. But on 64-bit
the NMI entry installed in the IDT expects to be invoked on the IST stack.
It relies on the "NMI executing" variable on the IST stack to work
correctly, which is at a fixed position in the IST stack. When the entry
point is unexpectedly called on the kernel stack, the RSP-addressed "NMI
executing" variable is obviously also on the kernel stack and is
"uninitialized" and can cause the NMI entry code to run in the wrong way.
Provide a non-ist entry point for VMX which shares the C-function with
the regular NMI entry and invoke the new asm entry point instead.
On 32-bit this just maps to the regular NMI entry point as 32-bit has no
ISTs and is not affected.
[ tglx: Made it independent for backporting, massaged changelog ]
Fixes: 1a5488ef0dcf6 ("KVM: VMX: Invoke NMI handler via indirect call instead of INTn")
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Lai Jiangshan <laijs@linux.alibaba.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87r1imi8i1.ffs@nanos.tec.linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c5e2184d1544f9e56140791eff1a351bea2e63b9 upstream.
Remove the update_pte() shadow paging logic, which was obsoleted by
commit 4731d4c7a077 ("KVM: MMU: out of sync shadow core"), but never
removed. As pointed out by Yu, KVM never write protects leaf page
tables for the purposes of shadow paging, and instead marks their
associated shadow page as unsync so that the guest can write PTEs at
will.
The update_pte() path, which predates the unsync logic, optimizes COW
scenarios by refreshing leaf SPTEs when they are written, as opposed to
zapping the SPTE, restarting the guest, and installing the new SPTE on
the subsequent fault. Since KVM no longer write-protects leaf page
tables, update_pte() is unreachable and can be dropped.
Reported-by: Yu Zhang <yu.c.zhang@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210115004051.4099250-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit dbdd096a5a74b94f6b786a47baef2085859b0dce ]
Disable pass-through of the FS and GS base MSRs for 32-bit KVM. Intel's
SDM unequivocally states that the MSRs exist if and only if the CPU
supports x86-64. FS_BASE and GS_BASE are mostly a non-issue; a clever
guest could opportunistically use the MSRs without issue. KERNEL_GS_BASE
is a bigger problem, as a clever guest would subtly be broken if it were
migrated, as KVM disallows software access to the MSRs, and unlike the
direct variants, KERNEL_GS_BASE needs to be explicitly migrated as it's
not captured in the VMCS.
Fixes: 25c5f225beda ("KVM: VMX: Enable MSR Bitmap feature")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210422023831.3473491-1-seanjc@google.com>
[*NOT* for stable kernels. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d9e46d344e62a0d56fd86a8289db5bed8a57c92e ]
If the VM entry/exit controls for loading/saving MSR_EFER are either
not available (an older processor or explicitly disabled) or not
used (host and guest values are the same), reading GUEST_IA32_EFER
from the VMCS returns an inaccurate value.
Because of this, in dump_vmcs() don't use GUEST_IA32_EFER to decide
whether to print the PDPTRs - always do so if the fields exist.
Fixes: 4eb64dce8d0a ("KVM: x86: dump VMCS on invalid entry")
Signed-off-by: David Edmondson <david.edmondson@oracle.com>
Message-Id: <20210318120841.133123-2-david.edmondson@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e0c378684b6545ad2d4403bb701d0ac4932b4e95 ]
Retry page faults (re-enter the guest) that hit an invalid memslot
instead of treating the memslot as not existing, i.e. handling the
page fault as an MMIO access. When deleting a memslot, SPTEs aren't
zapped and the TLBs aren't flushed until after the memslot has been
marked invalid.
Handling the invalid slot as MMIO means there's a small window where a
page fault could replace a valid SPTE with an MMIO SPTE. The legacy
MMU handles such a scenario cleanly, but the TDP MMU assumes such
behavior is impossible (see the BUG() in __handle_changed_spte()).
There's really no good reason why the legacy MMU should allow such a
scenario, and closing this hole allows for additional cleanups.
Fixes: 2f2fad0897cb ("kvm: x86/mmu: Add functions to handle changed TDP SPTEs")
Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210225204749.1512652-6-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5deac80d4571dffb51f452f0027979d72259a1b9 ]
dev_attr_show() calls the __uncore_*_show() functions via an indirect
call but their type does not currently match the type of the show()
member in 'struct device_attribute', resulting in a Control Flow
Integrity violation.
$ cat /sys/devices/amd_l3/format/umask
config:8-15
$ dmesg | grep "CFI failure"
[ 1258.174653] CFI failure (target: __uncore_umask_show...):
Update the type in the DEFINE_UNCORE_FORMAT_ATTR macro to match
'struct device_attribute' so that there is no more CFI violation.
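Schematically, the generated show() routine has to carry exactly the
struct device_attribute prototype so that the indirect call types match
(simplified sketch, not the real macro body):

  static ssize_t __uncore_umask_show(struct device *dev,
                                     struct device_attribute *attr,
                                     char *page)
  {
          return sprintf(page, "%s\n", "config:8-15");
  }

  static struct device_attribute format_attr_umask =
          __ATTR(umask, 0444, __uncore_umask_show, NULL);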
Fixes: 06f2c24584f3 ("perf/amd/uncore: Prepare to scale for more attributes that vary per family")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210415001112.3024673-2-nathan@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit de5bc7b425d4c27ae5faa00ea7eb6b9780b9a355 ]
dev_attr_show() calls _iommu_event_show() via an indirect call but
_iommu_event_show()'s type does not currently match the type of the
show() member in 'struct device_attribute', resulting in a Control Flow
Integrity violation.
$ cat /sys/devices/amd_iommu_1/events/mem_dte_hit
csource=0x0a
$ dmesg | grep "CFI failure"
[ 3526.735140] CFI failure (target: _iommu_event_show...):
Change _iommu_event_show() and 'struct amd_iommu_event_desc' to
'struct device_attribute' so that there is no more CFI violation.
Fixes: 7be6296fdd75 ("perf/x86/amd: AMD IOMMU Performance Counter PERF uncore PMU implementation")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210415001112.3024673-1-nathan@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6dd3b8c9f58816a1354be39559f630cd1bd12159 ]
There are 2 bugs in the can_boost() function related to its use of the
x86 insn decoder. Since insn->opcode never has a prefix byte, the CS
override prefix can not be found in it. And insn->attr is the attribute
of the opcode, so inat_is_address_size_prefix(insn->attr) always returns
false.
Fix those by checking each prefix byte with a for_each_insn_prefix loop
and getting the correct attribute for each prefix byte.
Also remove the unlikely() annotation, because this is a slow path.
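After the fix, the prefix walk in can_boost() looks roughly like this
(a sketch of the corrected check, not the verbatim patch):

  insn_byte_t prefix;
  int i;

  for_each_insn_prefix(insn, i, prefix) {
          insn_attr_t attr = inat_get_opcode_attribute(prefix);

          /* can't boost a CS override or an address-size override */
          if (prefix == 0x2e || inat_is_address_size_prefix(attr))
                  return 0;
  }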
Fixes: a8d11cd0714f ("kprobes/x86: Consolidate insn decoder users for copying code")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/161666691162.1120877.2808435205294352583.stgit@devnote2
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f5d1499ae2096d7ea301023c4cc54e427300eb0a ]
Hibernation fails on a system in fips mode because md5 is used for the e820
integrity check and is not available. Use crc32 instead.
The check is intended to detect whether the E820 memory map provided
by the firmware after cold boot unexpectedly differs from the one that
was in use when the hibernation image was created. In this case, the
hibernation image cannot be restored, as it may cover memory regions
that are no longer available to the OS.
A non-cryptographic checksum such as CRC-32 is sufficient to detect such
inadvertent deviations.
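For reference, computing such a checksum with the kernel's generic CRC-32
helper could look like the sketch below; the patch itself may go through
a different interface:

  #include <linux/crc32.h>

  static u32 e820_checksum(const void *map, size_t len)
  {
          /* non-cryptographic integrity check of the saved e820 snapshot */
          return crc32_le(~0U, map, len);
  }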
Fixes: 62a03defeabd ("PM / hibernate: Verify the consistent of e820 memory map by md5 digest")
Reviewed-by: Eric Biggers <ebiggers@google.com>
Tested-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
[ rjw: Subject edit ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c2209ea55612efac75de0a58ef5f7394fae7fa0f ]
When KEXEC is disabled, the UV build fails:
arch/x86/platform/uv/uv_nmi.c:875:14: error: ‘uv_nmi_kexec_failed’ undeclared (first use in this function)
Since uv_nmi_kexec_failed is only defined in the KEXEC_CORE #ifdef branch,
this code cannot ever have been build tested:
  if (main)
          pr_err("UV: NMI kdump: KEXEC not supported in this kernel\n");
  atomic_set(&uv_nmi_kexec_failed, 1);
Nor is this use possible in uv_handle_nmi():
  atomic_set(&uv_nmi_kexec_failed, 0);
These bugs were introduced in this commit:
d0a9964e9873: ("x86/platform/uv: Implement simple dump failover if kdump fails")
Which added the uv_nmi_kexec_failed assignments to !KEXEC code, while making the
definition KEXEC-only - apparently without testing the !KEXEC case.
Instead of complicating the #ifdef maze, simplify the code by requiring X86_UV
to depend on KEXEC_CORE. This pattern is present in other architectures as well.
( We'll remove the untested, 7 years old !KEXEC complications from the file in a
separate commit. )
Fixes: d0a9964e9873: ("x86/platform/uv: Implement simple dump failover if kdump fails")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Mike Travis <travis@sgi.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8d195e7a8ada68928f2aedb2c18302a4518fe68e ]
gcc-11 points out a mismatch between the declaration and the definition
of poly1305_core_setkey():
lib/crypto/poly1305-donna32.c:13:67: error: argument 2 of type ‘const u8[16]’ {aka ‘const unsigned char[16]’} with mismatched bound [-Werror=array-parameter=]
13 | void poly1305_core_setkey(struct poly1305_core_key *key, const u8 raw_key[16])
| ~~~~~~~~~^~~~~~~~~~~
In file included from lib/crypto/poly1305-donna32.c:11:
include/crypto/internal/poly1305.h:21:68: note: previously declared as ‘const u8 *’ {aka ‘const unsigned char *’}
21 | void poly1305_core_setkey(struct poly1305_core_key *key, const u8 *raw_key);
This is harmless in principle, as the calling conventions are the same,
but the more specific prototype allows better type checking in the
caller.
Change the declaration to match the actual function definition.
The poly1305_simd_init() is a bit suspicious here, as it previously
had a 32-byte argument type, but looks like it needs to take the
16-byte POLY1305_BLOCK_SIZE array instead.
Fixes: 1c08a104360f ("crypto: poly1305 - add new 32 and 64-bit generic versions")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7189b3c11903667808029ec9766a6e96de5012a5 ]
Currently, the late microcode loading mechanism checks whether any CPUs
are offlined, and, in such a case, aborts the load attempt.
However, this must be done before the kernel caches new microcode from
the filesystem. Otherwise, when offlined CPUs are onlined later, those
cores are going to be updated through the CPU hotplug notifier callback
with the new microcode, while CPUs previously online will continue to run
with the older microcode.
For example:
Turn off one core (2 threads):

  echo 0 > /sys/devices/system/cpu/cpu3/online
  echo 0 > /sys/devices/system/cpu/cpu1/online

Installing the ucode then fails because a primary SMT thread is offline:

  cp intel-ucode/06-8e-09 /lib/firmware/intel-ucode/
  echo 1 > /sys/devices/system/cpu/microcode/reload
  bash: echo: write error: Invalid argument

Turn the core back on:

  echo 1 > /sys/devices/system/cpu/cpu3/online
  echo 1 > /sys/devices/system/cpu/cpu1/online

  cat /proc/cpuinfo | grep microcode
  microcode : 0x30
  microcode : 0xde
  microcode : 0x30
  microcode : 0xde
The rationale for aborting the update when at least one primary
thread is offline is that, even if that thread is soft-offlined
and idle, it will still have to participate in the broadcast MCE
synchronization dance or enter SMM, and in both cases it will execute
instructions, so it had better have the same microcode revision as the
other cores.
[ bp: Heavily edit and extend commit message with the reasoning behind all
this. ]
Fixes: 30ec26da9967 ("x86/microcode: Do not upload microcode if CPUs are offline")
Signed-off-by: Otavio Pontes <otavio.pontes@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Acked-by: Ashok Raj <ashok.raj@intel.com>
Link: https://lkml.kernel.org/r/20210319165515.9240-2-otavio.pontes@intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6840a150b9daf35e4d21ab9780d0a03b4ed74a5b ]
Commit
bbbd2b51a2aa ("x86/platform/UV: Use new set memory block size function")
added a call to set the block size value that is needed by the kernel
to set the boundaries in the section list. This was done for UV Hubbed
systems but missed in the UV Hubless setup. Fix that mistake by adding
the same set call for hubless systems, which support the same NVRAMs
and Intel BIOS and thus hit the same problem.
[ bp: Massage commit message. ]
Fixes: bbbd2b51a2aa ("x86/platform/UV: Use new set memory block size function")
Signed-off-by: Mike Travis <mike.travis@hpe.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Reviewed-by: Russ Anderson <rja@hpe.com>
Link: https://lkml.kernel.org/r/20210305162853.299892-1-mike.travis@hpe.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 82277eeed65eed6c6ee5b8f97bd978763eab148f upstream.
Drop bits 63:32 of the base and/or index GPRs when calculating the
effective address of a VMX instruction memory operand. Outside of 64-bit
mode, memory encodings are strictly limited to E*X and below.
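Conceptually the effective-address calculation needs a 32-bit truncation
outside of 64-bit mode, along the lines of this sketch (variable names
assumed):

  u64 base = 0, index = 0;

  if (base_is_valid)
          base = kvm_register_read(vcpu, base_reg);
  if (index_is_valid)
          index = kvm_register_read(vcpu, index_reg);

  if (!is_64_bit_mode(vcpu)) {
          /* only E*X and below can be encoded here */
          base  &= 0xffffffffull;
          index &= 0xffffffffull;
  }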
Fixes: 064aea774768 ("KVM: nVMX: Decoding memory operands of VMX instructions")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210422022128.3464144-7-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ee050a577523dfd5fac95e6cc182ebe0293ead59 upstream.
Drop bits 63:32 of the VMCS field encoding when checking for a nested
VM-Exit on VMREAD/VMWRITE in !64-bit mode. VMREAD and VMWRITE always
use 32-bit operands outside of 64-bit mode.
The actual emulation of VMREAD/VMWRITE does the right thing, this bug is
purely limited to incorrectly causing a nested VM-Exit if a GPR happens
to have bits 63:32 set outside of 64-bit mode.
Fixes: a7cde481b6e8 ("KVM: nVMX: Do not forward VMREAD/VMWRITE VMExits to L1 if required so by vmcs12 vmread/vmwrite bitmaps")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210422022128.3464144-6-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c805f5d5585ab5e0cdac6b1ccf7086eb120fb7db upstream.
Defer reloading the MMU until after a successful EPTP switch. The VMFUNC
instruction itself is executed in the previous EPTP context; any side
effects, e.g. updating RIP, should occur in the old context. Practically
speaking, this bug is benign as VMX doesn't touch the MMU when skipping
an emulated instruction, nor does queuing a single-step #DB. No other
post-switch side effects exist.
Fixes: 41ab93727467 ("KVM: nVMX: Emulate EPTP switching for the L1 hypervisor")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210305011101.3597423-14-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6f2b296aa6432d8274e258cc3220047ca04f5de0 upstream.
Inject #GP on guest accesses to MSR_TSC_AUX if RDTSCP is unsupported in
the guest's CPUID model.
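The guard in the SVM MSR accessors is essentially the sketch below;
returning non-zero makes the common MSR code inject #GP into the guest:

  case MSR_TSC_AUX:
          if (!msr_info->host_initiated &&
              !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
                  return 1;       /* -> #GP in the guest */
          break;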
Fixes: 46896c73c1a4 ("KVM: svm: add support for RDTSCP")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210423223404.3860547-2-seanjc@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8727906fde6ea665b52e68ddc58833772537f40a upstream.
Reject KVM_SEV_INIT and KVM_SEV_ES_INIT if they are attempted after one
or more vCPUs have been created. KVM assumes a VM is tagged SEV/SEV-ES
prior to vCPU creation, e.g. init_vmcb() needs to mark the VMCB as SEV
enabled, and svm_create_vcpu() needs to allocate the VMSA. At best,
creating vCPUs before SEV/SEV-ES init will lead to unexpected errors
and/or behavior, and at worst it will crash the host, e.g.
sev_launch_update_vmsa() will dereference a null svm->vmsa pointer.
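The check itself is a one-liner at the top of the SEV init path (sketch):

  /* SEV/SEV-ES init must happen before any vCPU exists */
  if (kvm->created_vcpus)
          return -EINVAL;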
Fixes: 1654efcbc431 ("KVM: SVM: Add KVM_SEV_INIT command")
Fixes: ad73109ae7ec ("KVM: SVM: Provide support to launch and run an SEV-ES guest")
Cc: stable@vger.kernel.org
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210331031936.2495277-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6d1b867d045699d6ce0dfa0ef35d1b87dd36db56 upstream.
Don't strip the C-bit from the faulting address on an intercepted #PF,
the address is a virtual address, not a physical address.
Fixes: 0ede79e13224 ("KVM: SVM: Clear C-bit from the page fault address")
Cc: stable@vger.kernel.org
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210305011101.3597423-13-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a3322d5cd87fef5ec0037fd1b14068a533f9a60f upstream.
Override the shadow root level in the MMU context when configuring
NPT for shadowing nested NPT. The level is always tied to the TDP level
of the host, not whatever level the guest happens to be using.
Fixes: 096586fda522 ("KVM: nSVM: Correctly set the shadow NPT root level in its MMU role")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210305011101.3597423-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d0fe7b6404408835ed60232cb3bf28324b2f95db upstream.
Remove the emulator's checks for illegal CR0, CR3, and CR4 values, as
the checks are redundant, outdated, and in the case of SEV's C-bit,
broken. The emulator manually calculates MAXPHYADDR from CPUID and
neglects to mask off the C-bit. For all other checks, kvm_set_cr*() are
a superset of the emulator checks, e.g. see CR4.LA57.
Fixes: a780a3ea6282 ("KVM: X86: Fix reserved bits check for MOV to CR3")
Cc: Babu Moger <babu.moger@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210422022128.3464144-2-seanjc@google.com>
Cc: stable@vger.kernel.org
[Unify check_cr_read and check_cr_write. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 04d45551a1eefbea42655da52f56e846c0af721a upstream.
Allocate the so called pae_root page on-demand, along with the lm_root
page, when shadowing 32-bit NPT with 64-bit NPT, i.e. when running a
32-bit L1. KVM currently only allocates the page when NPT is disabled,
or when L0 is 32-bit (using PAE paging).
Note, there is an existing memory leak involving the MMU roots, as KVM
fails to free the PAE roots on failure. This will be addressed in a
future commit.
Fixes: ee6268ba3a68 ("KVM: x86: Skip pae_root shadow allocation if tdp enabled")
Fixes: b6b80c78af83 ("KVM: x86/mmu: Allocate PAE root array when using SVM's 32-bit NPT")
Cc: stable@vger.kernel.org
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210305011101.3597423-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2c88d45edbb89029c1190bb3b136d2602f057c98 upstream.
Commit 1340ccfa9a9a ("x86,sched: Allow topologies where NUMA nodes
share an LLC") added a vendor and model specific check to never
call topology_sane() for Intel Skylake Server systems where NUMA
nodes share an LLC.
Intel Ice Lake and Sapphire Rapids CPUs also enumerate an LLC that is
shared by multiple NUMA nodes. The LLC on these CPUs is shared for
off-package data access but private to the NUMA node for on-package
access. Rather than managing a list of allowable SNC topologies, make
this SNC topology the default, and treat Intel's Cluster-On-Die (COD)
topology as the exception.
In SNC mode, Sky Lake, Ice Lake, and Sapphire Rapids servers do not
emit this warning:
sched: CPU #3's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20210310190233.31752-1-alison.schofield@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f66c53b3b94f658590e1012bf6d922f8b7e01bda upstream.
Defer unloading the MMU after a INVPCID until the instruction emulation
has completed, i.e. until after RIP has been updated.
On VMX, this is a benign bug as VMX doesn't touch the MMU when skipping
an emulated instruction. However, on SVM, if nrip is disabled, the
emulator is used to skip an instruction, which would lead to fireworks
if the emulator were invoked without a valid MMU.
Fixes: eb4b248e152d ("kvm: vmx: Support INVPCID in shadow paging mode")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210305011101.3597423-15-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b6b4fbd90b155a0025223df2c137af8a701d53b3 upstream.
Initialize MSR_TSC_AUX with CPU node information if RDTSCP or RDPID is
supported. This fixes a bug where vdso_read_cpunode() will read garbage
via RDPID if RDPID is supported but RDTSCP is not. While no known CPU
supports RDPID but not RDTSCP, both Intel's SDM and AMD's APM allow for
RDPID to exist without RDTSCP, e.g. it's technically a legal CPU model
for a virtual machine.
Note, technically MSR_TSC_AUX could be initialized if and only if RDPID
is supported since RDTSCP is currently not used to retrieve the CPU node.
But, the cost of the superfluous WRMSR is negligible, whereas leaving
MSR_TSC_AUX uninitialized is just asking for future breakage if someone
decides to utilize RDTSCP.
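The resulting initialization condition looks like the sketch below
('cpudata' is assumed to hold the encoded CPU and node numbers, as in
setup_getcpu()):

  if (boot_cpu_has(X86_FEATURE_RDTSCP) ||
      boot_cpu_has(X86_FEATURE_RDPID))
          wrmsr(MSR_TSC_AUX, cpudata, 0);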
Fixes: a582c540ac1b ("x86/vdso: Use RDPID in preference to LSL when available")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210504225632.1532621-2-seanjc@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d5cbd80e302dfea59726c44c56ab7957f822409f ]
When cross compiling x86 on an ARM machine with clang, there are several
errors along the lines of:
arch/x86/include/asm/string_64.h:27:10: error: invalid output constraint '=&c' in asm
This happens because the compressed boot Makefile reassigns KBUILD_CFLAGS
and drops the clang flags that set the target architecture ('--target=')
and the path to the GNU cross tools ('--prefix='), meaning that the host
architecture is targeted.
These flags are available as $(CLANG_FLAGS) from the main Makefile so
add them to the compressed boot folder's KBUILD_CFLAGS so that cross
compiling works as expected.
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lkml.kernel.org/r/20210326000435.4785-3-nathan@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8abe7fc26ad8f28bfdf78adbed56acd1fa93f82d ]
When cross-compiling with Clang, the `$(CLANG_FLAGS)' variable
contains additional flags needed to build C and assembly sources
for the target platform. Normally this variable is automatically
included in `$(KBUILD_CFLAGS)' via the top-level Makefile.
The x86 real-mode makefile builds `$(REALMODE_CFLAGS)' from a
plain assignment and therefore drops the Clang flags. This causes
Clang to not recognize x86-specific assembler directives:
arch/x86/realmode/rm/header.S:36:1: error: unknown directive
.type real_mode_header STT_OBJECT ; .size real_mode_header, .-real_mode_header
^
Explicit propagation of `$(CLANG_FLAGS)' to `$(REALMODE_CFLAGS)',
which is inherited by real-mode make rules, fixes cross-compilation
with Clang for x86 targets.
Relevant flags:
* `--target' sets the target architecture when cross-compiling. This
flag must be set for both compilation and assembly (`KBUILD_AFLAGS')
to support architecture-specific assembler directives.
* `-no-integrated-as' tells clang to assemble with GNU Assembler
instead of its built-in LLVM assembler. This flag is set by default
unless `LLVM_IAS=1' is set, because the LLVM assembler can't yet
parse certain GNU extensions.
Signed-off-by: John Millikin <john@john-millikin.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Link: https://lkml.kernel.org/r/20210326000435.4785-2-nathan@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit eab696d8e8b9c9d600be6fad8dd8dfdfaca6ca7c ]
A malicious hypervisor could disable the CPUID intercept for an SEV or
SEV-ES guest and trick it into the no-SEV boot path, where it could
potentially reveal secrets. This is not an issue for SEV-SNP guests,
as the CPUID intercept can't be disabled for those.
Remove the Hypervisor CPUID bit check from the SEV detection code to
protect against this kind of attack and add a Hypervisor bit equals zero
check to the SME detection path to prevent non-encrypted guests from
trying to enable SME.
This handles the following cases:
1) SEV(-ES) guest where CPUID intercept is disabled. The guest
will still see leaf 0x8000001f and the SEV bit. It can
retrieve the C-bit and boot normally.
2) Non-encrypted guests with intercepted CPUID will check
the SEV_STATUS MSR and find it 0 and will try to enable SME.
This will fail when the guest finds MSR_K8_SYSCFG to be zero,
as it is emulated by KVM. But we can't rely on that, as there
might be other hypervisors which return this MSR with bit
23 set. The Hypervisor bit check will prevent the guest from
trying to enable SME in this case.
3) Non-encrypted guests on SEV capable hosts with CPUID intercept
disabled (by a malicious hypervisor) will try to boot into
the SME path. This will fail, but it is also not considered
a problem because non-encrypted guests have no protection
against the hypervisor anyway.
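The Hypervisor bit check on the SME path boils down to the following
sketch (CPUID leaf 0x1, ECX bit 31 is the Hypervisor bit):

  unsigned int eax = 1, ebx = 0, ecx = 0, edx = 0;

  native_cpuid(&eax, &ebx, &ecx, &edx);
  if (ecx & BIT(31))
          return;         /* running as a guest, do not enable SME */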
[ bp: s/non-SEV/non-encrypted/g ]
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lkml.kernel.org/r/20210312123824.306-3-joro@8bytes.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 0ef3439cd80ba7770723edb0470d15815914bb62 upstream.
Fix a regression caused by making the 486SX separately selectable in
Kconfig: the HIGHMEM64G setting was not updated accordingly and has
therefore become exposed as a user-selectable option for the M486SX
configuration setting, unlike with the original M486 and all the other
settings that choose non-PAE-enabled processors:
  High Memory Support
  > 1. off (NOHIGHMEM)
    2. 4GB (HIGHMEM4G)
    3. 64GB (HIGHMEM64G)
  choice[1-3?]:
With the fix in place the setting is now correctly removed:
  High Memory Support
  > 1. off (NOHIGHMEM)
    2. 4GB (HIGHMEM4G)
  choice[1-2?]:
[ bp: Massage commit message. ]
Fixes: 87d6021b8143 ("x86/math-emu: Limit MATH_EMULATION to 486SX compatibles")
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org # v5.5+
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.2104141221340.44318@angie.orcam.me.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5849cdf8c120e3979c57d34be55b92d90a77a47e upstream.
Commit in Fixes: added support for kexec-ing a kernel on panic using a
new system call. As part of it, it does prepare a memory map for the new
kernel.
However, while doing so, it wrongly accesses memory it has not
allocated: it accesses the first element of the cmem->ranges[] array in
memmap_exclude_ranges() but it has not allocated the memory for it in
crash_setup_memmap_entries(). As KASAN reports:
  BUG: KASAN: vmalloc-out-of-bounds in crash_setup_memmap_entries+0x17e/0x3a0
  Write of size 8 at addr ffffc90000426008 by task kexec/1187

  (gdb) list *crash_setup_memmap_entries+0x17e
  0xffffffff8107cafe is in crash_setup_memmap_entries (arch/x86/kernel/crash.c:322).
  317                                             unsigned long long mend)
  318     {
  319             unsigned long start, end;
  320
  321             cmem->ranges[0].start = mstart;
  322             cmem->ranges[0].end = mend;
  323             cmem->nr_ranges = 1;
  324
  325             /* Exclude elf header region */
  326             start = image->arch.elf_load_addr;
  (gdb)
Make sure that at least a single element of the ranges array is allocated.
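The allocation fix is essentially (sketch):

  struct crash_mem *cmem;

  /* make room for the first element of the flexible ranges[] array */
  cmem = vzalloc(struct_size(cmem, ranges, 1));
  if (!cmem)
          return -ENOMEM;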
[ bp: Write a proper commit message. ]
Fixes: dd5f726076cc ("kexec: support for kexec on panic using new system call")
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Dave Young <dyoung@redhat.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/725fa3dc1da2737f0f6188a1a9701bead257ea9d.camel@gmx.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4b2f1e59229b9da319d358828cdfa4ddbc140769 ]
The only stepping of Broadwell Xeon parts is stepping 1. Fix the
relevant isolation_ucodes[] entry, which previously enumerated
stepping 2.
Although the original commit was characterized as an optimization, it
is also a workaround for a correctness issue.
If a PMI arrives between kvm's call to perf_guest_get_msrs() and the
subsequent VM-entry, a stale value for the IA32_PEBS_ENABLE MSR may be
restored at the next VM-exit. This is because, unbeknownst to kvm, PMI
throttling may clear bits in the IA32_PEBS_ENABLE MSR. CPUs with "PEBS
isolation" don't suffer from this issue, because perf_guest_get_msrs()
doesn't report the IA32_PEBS_ENABLE value.
Fixes: 9b545c04abd4f ("perf/x86/kvm: Avoid unnecessary work in guest filtering")
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Peter Shier <pshier@google.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Link: https://lkml.kernel.org/r/20210422001834.1748319-1-jmattson@google.com
Signed-off-by: Sasha Levin <sashal@kernel.org>