linux

iv/linux

Author	SHA1	Message	Date
Borislav Petkov	64840a4a2d	task_stack, x86/cea: Force-inline stack helpers [ Upstream commit e87f4152e542610d0b4c6c8548964a68a59d2040 ] Force-inline two stack helpers to fix the following objtool warnings: vmlinux.o: warning: objtool: in_task_stack()+0xc: call to task_stack_page() leaves .noinstr.text section vmlinux.o: warning: objtool: in_entry_stack()+0x10: call to cpu_entry_stack() leaves .noinstr.text section Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220324183607.31717-2-bp@alien8.de Stable-dep-of: 54c3931957f6 ("tracing: hold caller_addr to hardirq_{enable,disable}_ip") Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-09-20 12:39:43 +02:00
Borislav Petkov	0b009e5fd1	x86/mm: Force-inline __phys_addr_nodebug() [ Upstream commit ace1a98519270c586c0d4179419292df67441cd1 ] Fix: vmlinux.o: warning: objtool: __sev_es_nmi_complete()+0x8b: call to __phys_addr_nodebug() leaves .noinstr.text section Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220324183607.31717-4-bp@alien8.de Stable-dep-of: 54c3931957f6 ("tracing: hold caller_addr to hardirq_{enable,disable}_ip") Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-09-20 12:39:43 +02:00
Nick Desaulniers	f9571a9699	lockdep: Fix -Wunused-parameter for _THIS_IP_ [ Upstream commit 8b023accc8df70e72f7704d29fead7ca914d6837 ] While looking into a bug related to the compiler's handling of addresses of labels, I noticed some uses of _THIS_IP_ seemed unused in lockdep. Drive by cleanup. -Wunused-parameter: kernel/locking/lockdep.c:1383:22: warning: unused parameter 'ip' kernel/locking/lockdep.c:4246:48: warning: unused parameter 'ip' kernel/locking/lockdep.c:4844:19: warning: unused parameter 'ip' Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Waiman Long <longman@redhat.com> Link: https://lore.kernel.org/r/20220314221909.2027027-1-ndesaulniers@google.com Stable-dep-of: 54c3931957f6 ("tracing: hold caller_addr to hardirq_{enable,disable}_ip") Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-09-20 12:39:42 +02:00
Jim Mattson	03b1870fbc	KVM: x86: Mask off unsupported and unknown bits of IA32_ARCH_CAPABILITIES [ Upstream commit 0204750bd4c6ccc2fb7417618477f10373b33f56 ] KVM should not claim to virtualize unknown IA32_ARCH_CAPABILITIES bits. When kvm_get_arch_capabilities() was originally written, there were only a few bits defined in this MSR, and KVM could virtualize all of them. However, over the years, several bits have been defined that KVM cannot just blindly pass through to the guest without additional work (such as virtualizing an MSR promised by the IA32_ARCH_CAPABILITES feature bit). Define a mask of supported IA32_ARCH_CAPABILITIES bits, and mask off any other bits that are set in the hardware MSR. Cc: Paolo Bonzini <pbonzini@redhat.com> Fixes: 5b76a3cff011 ("KVM: VMX: Tell the nested hypervisor to skip L1D flush on vmentry") Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Vipin Sharma <vipinsh@google.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-Id: <20220830174947.2182144-1-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-09-08 12:28:05 +02:00
Jim Mattson	fec48eba47	KVM: VMX: Heed the 'msr' argument in msr_write_intercepted() [ Upstream commit 020dac4187968535f089f83f376a72beb3451311 ] Regardless of the 'msr' argument passed to the VMX version of msr_write_intercepted(), the function always checks to see if a specific MSR (IA32_SPEC_CTRL) is intercepted for write. This behavior seems unintentional and unexpected. Modify the function so that it checks to see if the provided 'msr' index is intercepted for write. Fixes: 67f4b9969c30 ("KVM: nVMX: Handle dynamic MSR intercept toggling") Cc: Sean Christopherson <seanjc@google.com> Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220810213050.2655000-1-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-09-08 12:28:04 +02:00
Stephane Eranian	b5f5fee03d	perf/x86/intel/ds: Fix precise store latency handling commit d4bdb0bebc5ba3299d74f123c782d99cd4e25c49 upstream. With the existing code in store_latency_data(), the memory operation (mem_op) returned to the user is always OP_LOAD where in fact, it should be OP_STORE. This comes from the fact that the function is simply grabbing the information from a data source map which covers only load accesses. Intel 12th gen CPU offers precise store sampling that captures both the data source and latency. Therefore it can use the data source mapping table but must override the memory operation to reflect stores instead of loads. Fixes: 61b985e3e775 ("perf/x86/intel: Add perf core PMU support for Sapphire Rapids") Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220818054613.1548130-1-eranian@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-31 17:16:51 +02:00
Stephane Eranian	83bd6d1212	perf/x86/intel/uncore: Fix broken read_counter() for SNB IMC PMU commit 11745ecfe8fea4b4a4c322967a7605d2ecbd5080 upstream. Existing code was generating bogus counts for the SNB IMC bandwidth counters: $ perf stat -a -I 1000 -e uncore_imc/data_reads/,uncore_imc/data_writes/ 1.000327813 1,024.03 MiB uncore_imc/data_reads/ 1.000327813 20.73 MiB uncore_imc/data_writes/ 2.000580153 261,120.00 MiB uncore_imc/data_reads/ 2.000580153 23.28 MiB uncore_imc/data_writes/ The problem was introduced by commit: 07ce734dd8ad ("perf/x86/intel/uncore: Clean up client IMC") Where the read_counter callback was replace to point to the generic uncore_mmio_read_counter() function. The SNB IMC counters are freerunnig 32-bit counters laid out contiguously in MMIO. But uncore_mmio_read_counter() is using a readq() call to read from MMIO therefore reading 64-bit from MMIO. Although this is okay for the uncore_perf_event_update() function because it is shifting the value based on the actual counter width to compute a delta, it is not okay for the uncore_pmu_event_start() which is simply reading the counter and therefore priming the event->prev_count with a bogus value which is responsible for causing bogus deltas in the perf stat command above. The fix is to reintroduce the custom callback for read_counter for the SNB IMC PMU and use readl() instead of readq(). With the change the output of perf stat is back to normal: $ perf stat -a -I 1000 -e uncore_imc/data_reads/,uncore_imc/data_writes/ 1.000120987 296.94 MiB uncore_imc/data_reads/ 1.000120987 138.42 MiB uncore_imc/data_writes/ 2.000403144 175.91 MiB uncore_imc/data_reads/ 2.000403144 68.50 MiB uncore_imc/data_writes/ Fixes: 07ce734dd8ad ("perf/x86/intel/uncore: Clean up client IMC") Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20220803160031.1379788-1-eranian@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-31 17:16:51 +02:00
Peter Zijlstra	992d2fc2fe	x86/nospec: Fix i386 RSB stuffing commit 332924973725e8cdcc783c175f68cf7e162cb9e5 upstream. Turns out that i386 doesn't unconditionally have LFENCE, as such the loop in __FILL_RETURN_BUFFER isn't actually speculation safe on such chips. Fixes: ba6e31af2be9 ("x86/speculation: Add LFENCE to RSB fill sequence") Reported-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/Yv9tj9vbQ9nNlXoY@worktop.programming.kicks-ass.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-31 17:16:50 +02:00
Peter Zijlstra	500195a109	x86/nospec: Unwreck the RSB stuffing commit 4e3aa9238277597c6c7624f302d81a7b568b6f2d upstream. Commit 2b1299322016 ("x86/speculation: Add RSB VM Exit protections") made a right mess of the RSB stuffing, rewrite the whole thing to not suck. Thanks to Andrew for the enlightening comment about Post-Barrier RSB things so we can make this code less magical. Cc: stable@vger.kernel.org Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/YvuNdDWoUZSBjYcm@worktop.programming.kicks-ass.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-31 17:16:47 +02:00
Pawan Gupta	75fa6c733b	x86/bugs: Add "unknown" reporting for MMIO Stale Data commit 7df548840c496b0141fb2404b889c346380c2b22 upstream. Older Intel CPUs that are not in the affected processor list for MMIO Stale Data vulnerabilities currently report "Not affected" in sysfs, which may not be correct. Vulnerability status for these older CPUs is unknown. Add known-not-affected CPUs to the whitelist. Report "unknown" mitigation status for CPUs that are not in blacklist, whitelist and also don't enumerate MSR ARCH_CAPABILITIES bits that reflect hardware immunity to MMIO Stale Data vulnerabilities. Mitigation is not deployed when the status is unknown. [ bp: Massage, fixup. ] Fixes: 8d50cdf8b834 ("x86/speculation/mmio: Add sysfs reporting for Processor MMIO Stale Data") Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Suggested-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/a932c154772f2121794a5f2eded1a11013114711.1657846269.git.pawan.kumar.gupta@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-31 17:16:47 +02:00
Chen Zhongjin	a7484eb9f3	x86/unwind/orc: Unwind ftrace trampolines with correct ORC entry commit fc2e426b1161761561624ebd43ce8c8d2fa058da upstream. When meeting ftrace trampolines in ORC unwinding, unwinder uses address of ftrace_{regs_}call address to find the ORC entry, which gets next frame at sp+176. If there is an IRQ hitting at sub $0xa8,%rsp, the next frame should be sp+8 instead of 176. It makes unwinder skip correct frame and throw warnings such as "wrong direction" or "can't access registers", etc, depending on the content of the incorrect frame address. By adding the base address ftrace_{regs_}caller with the offset ip - ops->trampoline, we can get the correct address to find the ORC entry. Also change "caller" to "tramp_addr" to make variable name conform to its content. [ mingo: Clarified the changelog a bit. ] Fixes: 6be7fa3c74d1 ("ftrace, orc, x86: Handle ftrace dynamically allocated trampolines") Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220819084334.244016-1-chenzhongjin@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-31 17:16:47 +02:00
Kan Liang	1cdfef6cd2	perf/x86/lbr: Enable the branch type for the Arch LBR by default commit 32ba156df1b1c8804a4e5be5339616945eafea22 upstream. On the platform with Arch LBR, the HW raw branch type encoding may leak to the perf tool when the SAVE_TYPE option is not set. In the intel_pmu_store_lbr(), the HW raw branch type is stored in lbr_entries[].type. If the SAVE_TYPE option is set, the lbr_entries[].type will be converted into the generic PERF_BR_* type in the intel_pmu_lbr_filter() and exposed to the user tools. But if the SAVE_TYPE option is NOT set by the user, the current perf kernel doesn't clear the field. The HW raw branch type leaks. There are two solutions to fix the issue for the Arch LBR. One is to clear the field if the SAVE_TYPE option is NOT set. The other solution is to unconditionally convert the branch type and expose the generic type to the user tools. The latter is implemented here, because - The branch type is valuable information. I don't see a case where you would not benefit from the branch type. (Stephane Eranian) - Not having the branch type DOES NOT save any space in the branch record (Stephane Eranian) - The Arch LBR HW can retrieve the common branch types from the LBR_INFO. It doesn't require the high overhead SW disassemble. Fixes: 47125db27e47 ("perf/x86/intel/lbr: Support Architectural LBR") Reported-by: Stephane Eranian <eranian@google.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20220816125612.2042397-1-kan.liang@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-31 17:16:46 +02:00
Lai Jiangshan	a50d9fde46	x86/entry: Move CLD to the start of the idtentry macro commit c64cc2802a784ecfd25d39945e57e7a147854a5b upstream. Move it after CLAC. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220503032107.680190-5-jiangshanlai@gmail.com Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-31 17:16:34 +02:00
Nadav Amit	14674e47ff	x86/kprobes: Fix JNG/JNLE emulation commit 8924779df820c53875abaeb10c648e9cb75b46d4 upstream. When kprobes emulates JNG/JNLE instructions on x86 it uses the wrong condition. For JNG (opcode: 0F 8E), according to Intel SDM, the jump is performed if (ZF == 1 or SF != OF). However the kernel emulation currently uses 'and' instead of 'or'. As a result, setting a kprobe on JNG/JNLE might cause the kernel to behave incorrectly whenever the kprobe is hit. Fix by changing the 'and' to 'or'. Fixes: 6256e668b7af ("x86/kprobes: Use int3 instead of debug trap for single-step") Signed-off-by: Nadav Amit <namit@vmware.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220813225943.143767-1-namit@vmware.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-25 11:39:57 +02:00
Aaron Lu	d26beb9109	x86/mm: Use proper mask when setting PUD mapping commit 88e0a74902f894fbbc55ad3ad2cb23b4bfba555c upstream. Commit c164fbb40c43f("x86/mm: thread pgprot_t through init_memory_mapping()") mistakenly used __pgprot() which doesn't respect __default_kernel_pte_mask when setting PUD mapping. Fix it by only setting the one bit we actually need (PSE) and leaving the other bits (that have been properly masked) alone. Fixes: c164fbb40c43 ("x86/mm: thread pgprot_t through init_memory_mapping()") Signed-off-by: Aaron Lu <aaron.lu@intel.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-25 11:39:53 +02:00
Peter Zijlstra	3eb602ad6a	x86/ftrace: Use alternative RET encoding commit 1f001e9da6bbf482311e45e48f53c2bd2179e59c upstream. Use the return thunk in ftrace trampolines, if needed. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> [cascardo: use memcpy(text_gen_insn) as there is no __text_gen_insn] Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-21 15:17:48 +02:00
Peter Zijlstra	543138c555	x86/ibt,ftrace: Make function-graph play nice commit e52fc2cf3f662828cc0d51c4b73bed73ad275fce upstream. Return trampoline must not use indirect branch to return; while this preserves the RSB, it is fundamentally incompatible with IBT. Instead use a retpoline like ROP gadget that defeats IBT while not unbalancing the RSB. And since ftrace_stub is no longer a plain RET, don't use it to copy from. Since RET is a trivial instruction, poke it directly. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lore.kernel.org/r/20220308154318.347296408@infradead.org [cascardo: remove ENDBR] Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-21 15:17:48 +02:00
Thadeu Lima de Souza Cascardo	f663276348	Revert "x86/ftrace: Use alternative RET encoding" This reverts commit e54fcb0812faebd147de72bd37ad87cc4951c68c. This temporarily reverts the backport of upstream commit 1f001e9da6bbf482311e45e48f53c2bd2179e59c. It was not correct to copy the ftrace stub as it would contain a relative jump to the return thunk which would not apply to the context where it was being copied to, leading to ftrace support to be broken. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-21 15:17:48 +02:00
Sean Christopherson	34bef00a32	KVM: nVMX: Attempt to load PERF_GLOBAL_CTRL on nVMX xfer iff it exists [ Upstream commit 4496a6f9b45e8cd83343ad86a3984d614e22cf54 ] Attempt to load PERF_GLOBAL_CTRL during nested VM-Enter/VM-Exit if and only if the MSR exists (according to the guest vCPU model). KVM has very misguided handling of VM_{ENTRY,EXIT}_LOAD_IA32_PERF_GLOBAL_CTRL and attempts to force the nVMX MSR settings to match the vPMU model, i.e. to hide/expose the control based on whether or not the MSR exists from the guest's perspective. KVM's modifications fail to handle the scenario where the vPMU is hidden from the guest _after_ being exposed to the guest, e.g. by userspace doing multiple KVM_SET_CPUID2 calls, which is allowed if done before any KVM_RUN. nested_vmx_pmu_refresh() is called if and only if there's a recognized vPMU, i.e. KVM will leave the bits in the allow state and then ultimately reject the MSR load and WARN. KVM should not force the VMX MSRs in the first place. KVM taking control of the MSRs was a misguided attempt at mimicking what commit 5f76f6f5ff96 ("KVM: nVMX: Do not expose MPX VMX controls when guest MPX disabled", 2018-10-01) did for MPX. However, the MPX commit was a workaround for another KVM bug and not something that should be imitated (and it should never been done in the first place). In other words, KVM's ABI _should_ be that userspace has full control over the MSRs, at which point triggering the WARN that loading the MSR must not fail is trivial. The intent of the WARN is still valid; KVM has consistency checks to ensure that vmcs12->{guest,host}_ia32_perf_global_ctrl is valid. The problem is that '0' must be considered a valid value at all times, and so the simple/obvious solution is to just not actually load the MSR when it does not exist. It is userspace's responsibility to provide a sane vCPU model, i.e. KVM is well within its ABI and Intel's VMX architecture to skip the loads if the MSR does not exist. Fixes: 03a8871add95 ("KVM: nVMX: Expose load IA32_PERF_GLOBAL_CTRL VM-{Entry,Exit} control") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220722224409.1336532-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-08-17 14:24:25 +02:00
Sean Christopherson	e365c817be	KVM: VMX: Add helper to check if the guest PMU has PERF_GLOBAL_CTRL [ Upstream commit b663f0b5f3d665c261256d1f76e98f077c6e56af ] Add a helper to check of the guest PMU has PERF_GLOBAL_CTRL, which is unintuitive _and_ diverges from Intel's architecturally defined behavior. Even worse, KVM currently implements the check using two different (but equivalent) checks, _and_ there has been at least one attempt to add a _third_ flavor. Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220722224409.1336532-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-08-17 14:24:25 +02:00
Like Xu	1eedac05b2	KVM: x86/pmu: Ignore pmu->global_ctrl check if vPMU doesn't support global_ctrl [ Upstream commit 98defd2e17803263f49548fea930cfc974d505aa ] MSR_CORE_PERF_GLOBAL_CTRL is introduced as part of Architecture PMU V2, as indicated by Intel SDM 19.2.2 and the intel_is_valid_msr() function. So in the absence of global_ctrl support, all PMCs are enabled as AMD does. Signed-off-by: Like Xu <likexu@tencent.com> Message-Id: <20220509102204.62389-1-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-08-17 14:24:25 +02:00
Sean Christopherson	9f1a17222a	KVM: VMX: Mark all PERF_GLOBAL_(OVF)_CTRL bits reserved if there's no vPMU [ Upstream commit 93255bf92939d948bc86d81c6bb70bb0fecc5db1 ] Mark all MSR_CORE_PERF_GLOBAL_CTRL and MSR_CORE_PERF_GLOBAL_OVF_CTRL bits as reserved if there is no guest vPMU. The nVMX VM-Entry consistency checks do not check for a valid vPMU prior to consuming the masks via kvm_valid_perf_global_ctrl(), i.e. may incorrectly allow a non-zero mask to be loaded via VM-Enter or VM-Exit (well, attempted to be loaded, the actual MSR load will be rejected by intel_is_valid_msr()). Fixes: f5132b01386b ("KVM: Expose a version 2 architectural PMU to a guests") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220722224409.1336532-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-08-17 14:24:25 +02:00
Like Xu	81f723a006	KVM: x86/pmu: Introduce the ctrl_mask value for fixed counter [ Upstream commit 2c985527dd8d283e786ad7a67e532ef7f6f00fac ] The mask value of fixed counter control register should be dynamic adjusted with the number of fixed counters. This patch introduces a variable that includes the reserved bits of fixed counter control registers. This is a generic code refactoring. Co-developed-by: Luwei Kang <luwei.kang@intel.com> Signed-off-by: Luwei Kang <luwei.kang@intel.com> Signed-off-by: Like Xu <like.xu@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Message-Id: <20220411101946.20262-6-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-08-17 14:24:25 +02:00
Sean Christopherson	f2145a1bf7	KVM: x86: Signal #GP, not -EPERM, on bad WRMSR(MCi_CTL/STATUS) [ Upstream commit 2368048bf5c2ec4b604ac3431564071e89a0bc71 ] Return '1', not '-1', when handling an illegal WRMSR to a MCi_CTL or MCi_STATUS MSR. The behavior of "all zeros' or "all ones" for CTL MSRs is architectural, as is the "only zeros" behavior for STATUS MSRs. I.e. the intent is to inject a #GP, not exit to userspace due to an unhandled emulation case. Returning '-1' gets interpreted as -EPERM up the stack and effecitvely kills the guest. Fixes: 890ca9aefa78 ("KVM: Add MCE support") Fixes: 9ffd986c6e4e ("KVM: X86: #GP when guest attempts to write MCi_STATUS register w/o 0") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Link: https://lore.kernel.org/r/20220512222716.4112548-2-seanjc@google.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-08-17 14:24:22 +02:00
Lev Kujawski	1f71d1f7f4	KVM: set_msr_mce: Permit guests to ignore single-bit ECC errors [ Upstream commit 0471a7bd1bca2a47a5f378f2222c5cf39ce94152 ] Certain guest operating systems (e.g., UNIXWARE) clear bit 0 of MC1_CTL to ignore single-bit ECC data errors. Single-bit ECC data errors are always correctable and thus are safe to ignore because they are informational in nature rather than signaling a loss of data integrity. Prior to this patch, these guests would crash upon writing MC1_CTL, with resultant error messages like the following: error: kvm run failed Operation not permitted EAX=fffffffe EBX=fffffffe ECX=00000404 EDX=ffffffff ESI=ffffffff EDI=00000001 EBP=fffdaba4 ESP=fffdab20 EIP=c01333a5 EFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES =0108 00000000 ffffffff 00c09300 DPL=0 DS [-WA] CS =0100 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA] SS =0108 00000000 ffffffff 00c09300 DPL=0 DS [-WA] DS =0108 00000000 ffffffff 00c09300 DPL=0 DS [-WA] FS =0000 00000000 ffffffff 00c00000 GS =0000 00000000 ffffffff 00c00000 LDT=0118 c1026390 00000047 00008200 DPL=0 LDT TR =0110 ffff5af0 00000067 00008b00 DPL=0 TSS32-busy GDT= ffff5020 000002cf IDT= ffff52f0 000007ff CR0=8001003b CR2=00000000 CR3=0100a000 CR4=00000230 DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000 DR6=ffff0ff0 DR7=00000400 EFER=0000000000000000 Code=08 89 01 89 51 04 c3 8b 4c 24 08 8b 01 8b 51 04 8b 4c 24 04 <0f> 30 c3 f7 05 a4 6d ff ff 10 00 00 00 74 03 0f 31 c3 33 c0 33 d2 c3 8d 74 26 00 0f 31 c3 Signed-off-by: Lev Kujawski <lkujaw@member.fsf.org> Message-Id: <20220521081511.187388-1-lkujaw@member.fsf.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-08-17 14:24:22 +02:00
Jason A. Donenfeld	3dd33a09f5	crypto: blake2s - remove shash module [ Upstream commit 2d16803c562ecc644803d42ba98a8e0aef9c014e ] BLAKE2s has no currently known use as an shash. Just remove all of this unnecessary plumbing. Removing this shash was something we talked about back when we were making BLAKE2s a built-in, but I simply never got around to doing it. So this completes that project. Importantly, this fixs a bug in which the lib code depends on crypto_simd_disabled_for_test, causing linker errors. Also add more alignment tests to the selftests and compare SIMD and non-SIMD compression functions, to make up for what we lose from testmgr.c. Reported-by: gaochao <gaochao49@huawei.com> Cc: Eric Biggers <ebiggers@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: stable@vger.kernel.org Fixes: 6048fdcc5f26 ("lib/crypto: blake2s: include as built-in") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-08-17 14:24:19 +02:00
Alexander Lobakin	c273671ae8	x86/olpc: fix 'logical not is only applied to the left hand side' commit 3a2ba42cbd0b669ce3837ba400905f93dd06c79f upstream. The bitops compile-time optimization series revealed one more problem in olpc-xo1-sci.c:send_ebook_state(), resulted in GCC warnings: arch/x86/platform/olpc/olpc-xo1-sci.c: In function 'send_ebook_state': arch/x86/platform/olpc/olpc-xo1-sci.c:83:63: warning: logical not is only applied to the left hand side of comparison [-Wlogical-not-parentheses] 83 \| if (!!test_bit(SW_TABLET_MODE, ebook_switch_idev->sw) == state) \| ^~ arch/x86/platform/olpc/olpc-xo1-sci.c:83:13: note: add parentheses around left hand side expression to silence this warning Despite this code working as intended, this redundant double negation of boolean value, together with comparing to `char` with no explicit conversion to bool, makes compilers think the author made some unintentional logical mistakes here. Make it the other way around and negate the char instead to silence the warnings. Fixes: d2aa37411b8e ("x86/olpc/xo1/sci: Produce wakeup events for buttons and switches") Cc: stable@vger.kernel.org # 3.5+ Reported-by: Guenter Roeck <linux@roeck-us.net> Reported-by: kernel test robot <lkp@intel.com> Reviewed-and-tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Signed-off-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:24:18 +02:00
Masami Hiramatsu (Google)	1cbf3882cb	x86/kprobes: Update kcb status flag after singlestepping commit dec8784c9088b131a1523f582c2194cfc8107dc0 upstream. Fix kprobes to update kcb (kprobes control block) status flag to KPROBE_HIT_SSDONE even if the kp->post_handler is not set. This bug may cause a kernel panic if another INT3 user runs right after kprobes because kprobe_int3_handler() misunderstands the INT3 is kprobe's single stepping INT3. Fixes: 6256e668b7af ("x86/kprobes: Use int3 instead of debug trap for single-step") Reported-by: Daniel Müller <deso@posteo.net> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Daniel Müller <deso@posteo.net> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20220727210136.jjgc3lpqeq42yr3m@muellerd-fedora-PC2BDTX9 Link: https://lore.kernel.org/r/165942025658.342061.12452378391879093249.stgit@devnote2 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:24:18 +02:00
Steven Rostedt (Google)	7c91c8da43	ftrace/x86: Add back ftrace_expected assignment commit ac6c1b2ca77e722a1e5d651f12f437f2f237e658 upstream. When a ftrace_bug happens (where ftrace fails to modify a location) it is helpful to have what was at that location as well as what was expected to be there. But with the conversion to text_poke() the variable that assigns the expected for debugging was dropped. Unfortunately, I noticed this when I needed it. Add it back. Link: https://lkml.kernel.org/r/20220726101851.069d2e70@gandalf.local.home Cc: "x86@kernel.org" <x86@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: stable@vger.kernel.org Fixes: 768ae4406a5c ("x86/ftrace: Use text_poke()") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:24:18 +02:00
Kim Phillips	0b00cb428f	x86/bugs: Enable STIBP for IBPB mitigated RETBleed commit e6cfcdda8cbe81eaf821c897369a65fec987b404 upstream. AMD's "Technical Guidance for Mitigating Branch Type Confusion, Rev. 1.0 2022-07-12" whitepaper, under section 6.1.2 "IBPB On Privileged Mode Entry / SMT Safety" says: Similar to the Jmp2Ret mitigation, if the code on the sibling thread cannot be trusted, software should set STIBP to 1 or disable SMT to ensure SMT safety when using this mitigation. So, like already being done for retbleed=unret, and now also for retbleed=ibpb, force STIBP on machines that have it, and report its SMT vulnerability status accordingly. [ bp: Remove the "we" and remove "[AMD]" applicability parameter which doesn't work here. ] Fixes: 3ebc17006888 ("x86/bugs: Add retbleed=ibpb") Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org # 5.10, 5.15, 5.19 Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 Link: https://lore.kernel.org/r/20220804192201.439596-1-kim.phillips@amd.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:24:18 +02:00
Andrea Righi	608d4c5f9f	x86/entry: Build thunk_$(BITS) only if CONFIG_PREEMPTION=y [ Upstream commit de979c83574abf6e78f3fa65b716515c91b2613d ] With CONFIG_PREEMPTION disabled, arch/x86/entry/thunk_$(BITS).o becomes an empty object file. With some old versions of binutils (i.e., 2.35.90.20210113-1ubuntu1) the GNU assembler doesn't generate a symbol table for empty object files and objtool fails with the following error when a valid symbol table cannot be found: arch/x86/entry/thunk_64.o: warning: objtool: missing symbol table To prevent this from happening, build thunk_$(BITS).o only if CONFIG_PREEMPTION is enabled. BugLink: https://bugs.launchpad.net/bugs/1911359 Fixes: 320100a5ffe5 ("x86/entry: Remove the TRACE_IRQS cruft") Signed-off-by: Andrea Righi <andrea.righi@canonical.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/Ys/Ke7EWjcX+ZlXO@arighi-desktop Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-08-17 14:24:15 +02:00
Siddh Raman Pant	3bb94ff1e7	x86/numa: Use cpumask_available instead of hardcoded NULL check [ Upstream commit 625395c4a0f4775e0fe00f616888d2e6c1ba49db ] GCC-12 started triggering a new warning: arch/x86/mm/numa.c: In function ‘cpumask_of_node’: arch/x86/mm/numa.c:916:39: warning: the comparison will always evaluate as ‘false’ for the address of ‘node_to_cpumask_map’ will never be NULL [-Waddress] 916 \| if (node_to_cpumask_map[node] == NULL) { \| ^~ node_to_cpumask_map is of type cpumask_var_t[]. When CONFIG_CPUMASK_OFFSTACK is set, cpumask_var_t is typedef'd to a pointer for dynamic allocation, else to an array of one element. The "wicked game" can be checked on line 700 of include/linux/cpumask.h. The original code in debug_cpumask_set_cpu() and cpumask_of_node() were probably written by the original authors with CONFIG_CPUMASK_OFFSTACK=y (i.e. dynamic allocation) in mind, checking if the cpumask was available via a direct NULL check. When CONFIG_CPUMASK_OFFSTACK is not set, GCC gives the above warning while compiling the kernel. Fix that by using cpumask_available(), which does the NULL check when CONFIG_CPUMASK_OFFSTACK is set, otherwise returns true. Use it wherever such checks are made. Conditional definitions of cpumask_available() can be found along with the definition of cpumask_var_t. Check the cpumask.h reference mentioned above. Fixes: c032ef60d1aa ("cpumask: convert node_to_cpumask_map[] to cpumask_var_t") Fixes: de2d9445f162 ("x86: Unify node_to_cpumask_map handling between 32 and 64bit") Signed-off-by: Siddh Raman Pant <code@siddh.me> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20220731160913.632092-1-code@siddh.me Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-08-17 14:24:15 +02:00
Chenyi Qiang	31dad89b16	x86/bus_lock: Don't assume the init value of DEBUGCTLMSR.BUS_LOCK_DETECT to be zero [ Upstream commit ffa6482e461ff550325356ae705b79e256702ea9 ] It's possible that this kernel has been kexec'd from a kernel that enabled bus lock detection, or (hypothetically) BIOS/firmware has set DEBUGCTLMSR_BUS_LOCK_DETECT. Disable bus lock detection explicitly if not wanted. Fixes: ebb1064e7c2e ("x86/traps: Handle #DB for bus lock") Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20220802033206.21333-1-chenyi.qiang@intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-08-17 14:24:14 +02:00
Sean Christopherson	e389e927e8	KVM: nVMX: Set UMIP bit CR4_FIXED1 MSR when emulating UMIP [ Upstream commit a910b5ab6b250a88fff1866bf708642d83317466 ] Make UMIP an "allowed-1" bit CR4_FIXED1 MSR when KVM is emulating UMIP. KVM emulates UMIP for both L1 and L2, and so should enumerate that L2 is allowed to have CR4.UMIP=1. Not setting the bit doesn't immediately break nVMX, as KVM does set/clear the bit in CR4_FIXED1 in response to a guest CPUID update, i.e. KVM will correctly (dis)allow nested VM-Entry based on whether or not UMIP is exposed to L1. That said, KVM should enumerate the bit as being allowed from time zero, e.g. userspace will see the wrong value if the MSR is read before CPUID is written. Fixes: 0367f205a3b7 ("KVM: vmx: add support for emulating UMIP") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220607213604.3346000-12-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-08-17 14:23:58 +02:00
Sean Christopherson	7840dce796	KVM: SVM: Stuff next_rip on emulated INT3 injection if NRIPS is supported [ Upstream commit 3741aec4c38fa4123ab08ae552f05366d4fd05d8 ] If NRIPS is supported in hardware but disabled in KVM, set next_rip to the next RIP when advancing RIP as part of emulating INT3 injection. There is no flag to tell the CPU that KVM isn't using next_rip, and so leaving next_rip is left as is will result in the CPU pushing garbage onto the stack when vectoring the injected event. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Fixes: 66b7138f9136 ("KVM: SVM: Emulate nRIP feature when reinjecting INT3") Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <cd328309a3b88604daa2359ad56f36cb565ce2d4.1651440202.git.maciej.szmigiero@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-08-17 14:23:40 +02:00
Sean Christopherson	2adc7032ec	KVM: SVM: Unwind "speculative" RIP advancement if INTn injection "fails" [ Upstream commit cd9e6da8048c5b40315ee2d929b6230ce1252c3c ] Unwind the RIP advancement done by svm_queue_exception() when injecting an INT3 ultimately "fails" due to the CPU encountering a VM-Exit while vectoring the injected event, even if the exception reported by the CPU isn't the same event that was injected. If vectoring INT3 encounters an exception, e.g. #NP, and vectoring the #NP encounters an intercepted exception, e.g. #PF when KVM is using shadow paging, then the #NP will be reported as the event that was in-progress. Note, this is still imperfect, as it will get a false positive if the INT3 is cleanly injected, no VM-Exit occurs before the IRET from the INT3 handler in the guest, the instruction following the INT3 generates an exception (directly or indirectly), _and_ vectoring that exception encounters an exception that is intercepted by KVM. The false positives could theoretically be solved by further analyzing the vectoring event, e.g. by comparing the error code against the expected error code were an exception to occur when vectoring the original injected exception, but SVM without NRIPS is a complete disaster, trying to make it 100% correct is a waste of time. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Fixes: 66b7138f9136 ("KVM: SVM: Emulate nRIP feature when reinjecting INT3") Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <450133cf0a026cb9825a2ff55d02cb136a1cb111.1651440202.git.maciej.szmigiero@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-08-17 14:23:40 +02:00
Peter Zijlstra	7bc43ab2b9	x86/extable: Fix ex_handler_msr() print condition [ Upstream commit a1a5482a2c6e38a3ebed32e571625c56a8cc41a6 ] On Fri, Jun 17, 2022 at 02:08:52PM +0300, Stephane Eranian wrote: > Some changes to the way invalid MSR accesses are reported by the > kernel is causing some problems with messages printed on the > console. > > We have seen several cases of ex_handler_msr() printing invalid MSR > accesses once but the callstack multiple times causing confusion on > the console. > The problem here is that another earlier commit (5.13): > > a358f40600b3 ("once: implement DO_ONCE_LITE for non-fast-path "do once" functionality") > > Modifies all the pr_*_once() calls to always return true claiming > that no caller is ever checking the return value of the functions. > > This is why we are seeing the callstack printed without the > associated printk() msg. Extract the ONCE_IF(cond) part into __ONCE_LTE_IF() and use that to implement DO_ONCE_LITE_IF() and fix the extable code. Fixes: a358f40600b3 ("once: implement DO_ONCE_LITE for non-fast-path "do once" functionality") Reported-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Stephane Eranian <eranian@google.com> Link: https://lkml.kernel.org/r/YqyVFsbviKjVGGZ9@worktop.programming.kicks-ass.net Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-08-17 14:23:14 +02:00
Johan Hovold	377a4c5cb7	x86/pmem: Fix platform-device leak in error path [ Upstream commit 229e73d46994f15314f58b2d39bf952111d89193 ] Make sure to free the platform device in the unlikely event that registration fails. Fixes: 7a67832c7e44 ("libnvdimm, e820: make CONFIG_X86_PMEM_LEGACY a tristate option") Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220620140723.9810-1-johan@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-08-17 14:23:07 +02:00
Mark Rutland	d1e812beae	arch: make TRACE_IRQFLAGS_NMI_SUPPORT generic [ Upstream commit 4510bffb4d0246cdcc1f14c7367c026b807a862d ] On most architectures, IRQ flag tracing is disabled in NMI context, and architectures need to define and select TRACE_IRQFLAGS_NMI_SUPPORT in order to enable this. Commit: 859d069ee1ddd878 ("lockdep: Prepare for NMI IRQ state tracking") Permitted IRQ flag tracing in NMI context, allowing lockdep to work in NMI context where an architecture had suitable entry logic. At the time, most architectures did not have such suitable entry logic, and this broke lockdep on such architectures. Thus, this was partially disabled in commit: ed00495333ccc80f ("locking/lockdep: Fix TRACE_IRQFLAGS vs. NMIs") ... with architectures needing to select TRACE_IRQFLAGS_NMI_SUPPORT to enable IRQ flag tracing in NMI context. Currently TRACE_IRQFLAGS_NMI_SUPPORT is defined under arch/x86/Kconfig.debug. Move it to arch/Kconfig so architectures can select it without having to provide their own definition. Since the regular TRACE_IRQFLAGS_SUPPORT is selected by arch/x86/Kconfig, the select of TRACE_IRQFLAGS_NMI_SUPPORT is moved there too. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20220511131733.4074499-2-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-08-17 14:23:00 +02:00
Wyes Karny	932b5e6524	x86: Handle idle=nomwait cmdline properly for x86_idle [ Upstream commit 8bcedb4ce04750e1ccc9a6b6433387f6a9166a56 ] When kernel is booted with idle=nomwait do not use MWAIT as the default idle state. If the user boots the kernel with idle=nomwait, it is a clear direction to not use mwait as the default idle state. However, the current code does not take this into consideration while selecting the default idle state on x86. Fix it by checking for the idle=nomwait boot option in prefer_mwait_c1_over_halt(). Also update the documentation around idle=nomwait appropriately. [ dhansen: tweak commit message ] Signed-off-by: Wyes Karny <wyes.karny@amd.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Tested-by: Zhang Rui <rui.zhang@intel.com> Link: https://lkml.kernel.org/r/fdc2dc2d0a1bc21c2f53d989ea2d2ee3ccbc0dbe.1654538381.git-series.wyes.karny@amd.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-08-17 14:23:00 +02:00
Paolo Bonzini	abedd69baf	KVM: x86: revalidate steal time cache if MSR value changes commit 901d3765fa804ce42812f1d5b1f3de2dfbb26723 upstream. Commit 7e2175ebd695 ("KVM: x86: Fix recording of guest steal time / preempted status", 2021-11-11) open coded the previous call to kvm_map_gfn, but in doing so it dropped the comparison between the cached guest physical address and the one in the MSR. This cause an incorrect cache hit if the guest modifies the steal time address while the memslots remain the same. This can happen with kexec, in which case the steal time data is written at the address used by the old kernel instead of the old one. While at it, rename the variable from gfn to gpa since it is a plain physical address and not a right-shifted one. Reported-by: Dave Young <ruyang@redhat.com> Reported-by: Xiaoying Yan <yiyan@redhat.com> Analyzed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: stable@vger.kernel.org Fixes: 7e2175ebd695 ("KVM: x86: Fix recording of guest steal time / preempted status") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:22:49 +02:00
Paolo Bonzini	77e26cdf5c	KVM: x86: do not report preemption if the steal time cache is stale commit c3c28d24d910a746b02f496d190e0e8c6560224b upstream. Commit 7e2175ebd695 ("KVM: x86: Fix recording of guest steal time / preempted status", 2021-11-11) open coded the previous call to kvm_map_gfn, but in doing so it dropped the comparison between the cached guest physical address and the one in the MSR. This cause an incorrect cache hit if the guest modifies the steal time address while the memslots remain the same. This can happen with kexec, in which case the preempted bit is written at the address used by the old kernel instead of the old one. Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: stable@vger.kernel.org Fixes: 7e2175ebd695 ("KVM: x86: Fix recording of guest steal time / preempted status") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:22:49 +02:00
Sean Christopherson	69704ca43e	KVM: x86: Tag kvm_mmu_x86_module_init() with __init commit 982bae43f11c37b51d2f1961bb25ef7cac3746fa upstream. Mark kvm_mmu_x86_module_init() with __init, the entire reason it exists is to initialize variables when kvm.ko is loaded, i.e. it must never be called after module initialization. Fixes: 1d0e84806047 ("KVM: x86/mmu: Resolve nx_huge_pages when kvm.ko is loaded") Cc: stable@vger.kernel.org Reviewed-by: Kai Huang <kai.huang@intel.com> Tested-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220803224957.1285926-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:22:49 +02:00
Vitaly Kuznetsov	439fcac3d0	KVM: nVMX: Always enable TSC scaling for L2 when it was enabled for L1 commit 156b9d76e8822f2956c15029acf2d4b171502f3a upstream. Windows 10/11 guests with Hyper-V role (WSL2) enabled are observed to hang upon boot or shortly after when a non-default TSC frequency was set for L1. The issue is observed on a host where TSC scaling is supported. The problem appears to be that Windows doesn't use TSC scaling for its guests, even when the feature is advertised, and KVM filters SECONDARY_EXEC_TSC_SCALING out when creating L2 controls from L1's VMCS. This leads to L2 running with the default frequency (matching host's) while L1 is running with an altered one. Keep SECONDARY_EXEC_TSC_SCALING in secondary exec controls for L2 when it was set for L1. TSC_MULTIPLIER is already correctly computed and written by prepare_vmcs02(). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Fixes: d041b5ea93352b ("KVM: nVMX: Enable nested TSC scaling") Cc: stable@vger.kernel.org Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220712135009.952805-1-vkuznets@redhat.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:22:49 +02:00
Sean Christopherson	14aebe952f	KVM: x86: Set error code to segment selector on LLDT/LTR non-canonical #GP commit 2626206963ace9e8bf92b6eea5ff78dd674c555c upstream. When injecting a #GP on LLDT/LTR due to a non-canonical LDT/TSS base, set the error code to the selector. Intel SDM's says nothing about the #GP, but AMD's APM explicitly states that both LLDT and LTR set the error code to the selector, not zero. Note, a non-canonical memory operand on LLDT/LTR does generate a #GP(0), but the KVM code in question is specific to the base from the descriptor. Fixes: e37a75a13cda ("KVM: x86: Emulator ignores LDTR/TR extended base on LLDT/LTR") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220711232750.1092012-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:22:48 +02:00
Sean Christopherson	ccbf3f955c	KVM: x86: Mark TSS busy during LTR emulation _after_ all fault checks commit ec6e4d863258d4bfb36d48d5e3ef68140234d688 upstream. Wait to mark the TSS as busy during LTR emulation until after all fault checks for the LTR have passed. Specifically, don't mark the TSS busy if the new TSS base is non-canonical. Opportunistically drop the one-off !seg_desc.PRESENT check for TR as the only reason for the early check was to avoid marking a !PRESENT TSS as busy, i.e. the common !PRESENT is now done before setting the busy bit. Fixes: e37a75a13cda ("KVM: x86: Emulator ignores LDTR/TR extended base on LLDT/LTR") Reported-by: syzbot+760a73552f47a8cd0fd9@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Cc: Hou Wenlong <houwenlong.hwl@antgroup.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220711232750.1092012-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:22:48 +02:00
Sean Christopherson	2a117667f3	KVM: nVMX: Inject #UD if VMXON is attempted with incompatible CR0/CR4 commit c7d855c2aff2d511fd60ee2e356134c4fb394799 upstream. Inject a #UD if L1 attempts VMXON with a CR0 or CR4 that is disallowed per the associated nested VMX MSRs' fixed0/1 settings. KVM cannot rely on hardware to perform the checks, even for the few checks that have higher priority than VM-Exit, as (a) KVM may have forced CR0/CR4 bits in hardware while running the guest, (b) there may incompatible CR0/CR4 bits that have lower priority than VM-Exit, e.g. CR0.NE, and (c) userspace may have further restricted the allowed CR0/CR4 values by manipulating the guest's nested VMX MSRs. Note, despite a very strong desire to throw shade at Jim, commit 70f3aac964ae ("kvm: nVMX: Remove superfluous VMX instruction fault checks") is not to blame for the buggy behavior (though the comment...). That commit only removed the CR0.PE, EFLAGS.VM, and COMPATIBILITY mode checks (though it did erroneously drop the CPL check, but that has already been remedied). KVM may force CR0.PE=1, but will do so only when also forcing EFLAGS.VM=1 to emulate Real Mode, i.e. hardware will still #UD. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216033 Fixes: ec378aeef9df ("KVM: nVMX: Implement VMXON and VMXOFF") Reported-by: Eric Li <ercli@ucdavis.edu> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220607213604.3346000-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:22:48 +02:00
Sean Christopherson	3868687afa	KVM: nVMX: Account for KVM reserved CR4 bits in consistency checks commit ca58f3aa53d165afe4ab74c755bc2f6d168617ac upstream. Check that the guest (L2) and host (L1) CR4 values that would be loaded by nested VM-Enter and VM-Exit respectively are valid with respect to KVM's (L0 host) allowed CR4 bits. Failure to check KVM reserved bits would allow L1 to load an illegal CR4 (or trigger hardware VM-Fail or failed VM-Entry) by massaging guest CPUID to allow features that are not supported by KVM. Amusingly, KVM itself is an accomplice in its doom, as KVM adjusts L1's MSR_IA32_VMX_CR4_FIXED1 to allow L1 to enable bits for L2 based on L1's CPUID model. Note, although nested_{guest,host}_cr4_valid() are _currently_ used if and only if the vCPU is post-VMXON (nested.vmxon == true), that may not be true in the future, e.g. emulating VMXON has a bug where it doesn't check the allowed/required CR0/CR4 bits. Cc: stable@vger.kernel.org Fixes: 3899152ccbf4 ("KVM: nVMX: fix checks on CR{0,4} during virtual VMX operation") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220607213604.3346000-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:22:48 +02:00
Sean Christopherson	76e6038cfa	KVM: nVMX: Let userspace set nVMX MSR to any _host_ supported value commit f8ae08f9789ad59d318ea75b570caa454aceda81 upstream. Restrict the nVMX MSRs based on KVM's config, not based on the guest's current config. Using the guest's config to audit the new config prevents userspace from restoring the original config (KVM's config) if at any point in the past the guest's config was restricted in any way. Fixes: 62cc6b9dc61e ("KVM: nVMX: support restore of VMX capability MSRs") Cc: stable@vger.kernel.org Cc: David Matlack <dmatlack@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220607213604.3346000-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:22:48 +02:00
Sean Christopherson	9953f86a67	KVM: x86: Split kvm_is_valid_cr4() and export only the non-vendor bits commit c33f6f2228fe8517e38941a508e9f905f99ecba9 upstream. Split the common x86 parts of kvm_is_valid_cr4(), i.e. the reserved bits checks, into a separate helper, __kvm_is_valid_cr4(), and export only the inner helper to vendor code in order to prevent nested VMX from calling back into vmx_is_valid_cr4() via kvm_is_valid_cr4(). On SVM, this is a nop as SVM doesn't place any additional restrictions on CR4. On VMX, this is also currently a nop, but only because nested VMX is missing checks on reserved CR4 bits for nested VM-Enter. That bug will be fixed in a future patch, and could simply use kvm_is_valid_cr4() as-is, but nVMX has _another_ bug where VMXON emulation doesn't enforce VMX's restrictions on CR0/CR4. The cleanest and most intuitive way to fix the VMXON bug is to use nested_host_cr{0,4}_valid(). If the CR4 variant routes through kvm_is_valid_cr4(), using nested_host_cr4_valid() won't do the right thing for the VMXON case as vmx_is_valid_cr4() enforces VMX's restrictions if and only if the vCPU is post-VMXON. Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220607213604.3346000-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:22:48 +02:00

1 2 3 4 5 ...

39876 Commits