linux

iv/linux

Author	SHA1	Message	Date
Feng Tang	97640d8e2c	x86/fpu: Set X86_FEATURE_OSXSAVE feature after enabling OSXSAVE in CR4 commit 2c66ca3949dc701da7f4c9407f2140ae425683a5 upstream. 0-Day found a 34.6% regression in stress-ng's 'af-alg' test case, and bisected it to commit b81fac906a8f ("x86/fpu: Move FPU initialization into arch_cpu_finalize_init()"), which optimizes the FPU init order, and moves the CR4_OSXSAVE enabling into a later place: arch_cpu_finalize_init identify_boot_cpu identify_cpu generic_identify get_cpu_cap --> setup cpu capability ... fpu__init_cpu fpu__init_cpu_xstate cr4_set_bits(X86_CR4_OSXSAVE); As the FPU is not yet initialized the CPU capability setup fails to set X86_FEATURE_OSXSAVE. Many security module like 'camellia_aesni_avx_x86_64' depend on this feature and therefore fail to load, causing the regression. Cure this by setting X86_FEATURE_OSXSAVE feature right after OSXSAVE enabling. [ tglx: Moved it into the actual BSP FPU initialization code and added a comment ] Fixes: b81fac906a8f ("x86/fpu: Move FPU initialization into arch_cpu_finalize_init()") Reported-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Feng Tang <feng.tang@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/lkml/202307192135.203ac24e-oliver.sang@intel.com Link: https://lore.kernel.org/lkml/20230823065747.92257-1-feng.tang@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-30 16:27:26 +02:00
Arnd Bergmann	ed8dcd9543	x86: Move gds_ucode_mitigated() declaration to header commit eb3515dc99c7c85f4170b50838136b2a193f8012 upstream. The declaration got placed in the .c file of the caller, but that causes a warning for the definition: arch/x86/kernel/cpu/bugs.c:682:6: error: no previous prototype for 'gds_ucode_mitigated' [-Werror=missing-prototypes] Move it to a header where both sides can observe it instead. Fixes: 81ac7e5d74174 ("KVM: Add GDS_NO support to KVM") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Tested-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Cc: stable@kernel.org Link: https://lore.kernel.org/all/20230809130530.1913368-2-arnd%40kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-16 18:19:23 +02:00
Kirill A. Shutemov	6b342b1f3b	x86/mm: Fix VDSO and VVAR placement on 5-level paging machines commit 1b8b1aa90c9c0e825b181b98b8d9e249dc395470 upstream. Yingcong has noticed that on the 5-level paging machine, VDSO and VVAR VMAs are placed above the 47-bit border: 8000001a9000-8000001ad000 r--p 00000000 00:00 0 [vvar] 8000001ad000-8000001af000 r-xp 00000000 00:00 0 [vdso] This might confuse users who are not aware of 5-level paging and expect all userspace addresses to be under the 47-bit border. So far problem has only been triggered with ASLR disabled, although it may also occur with ASLR enabled if the layout is randomized in a just right way. The problem happens due to custom placement for the VMAs in the VDSO code: vdso_addr() tries to place them above the stack and checks the result against TASK_SIZE_MAX, which is wrong. TASK_SIZE_MAX is set to the 56-bit border on 5-level paging machines. Use DEFAULT_MAP_WINDOW instead. Fixes: b569bab78d8d ("x86/mm: Prepare to expose larger address space to userspace") Reported-by: Yingcong Wu <yingcong.wu@intel.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20230803151609.22141-1-kirill.shutemov%40linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-16 18:19:23 +02:00
Cristian Ciocaltea	91a5e755e1	x86/cpu/amd: Enable Zenbleed fix for AMD Custom APU 0405 commit 6dbef74aeb090d6bee7d64ef3fa82ae6fa53f271 upstream. Commit 522b1d69219d ("x86/cpu/amd: Add a Zenbleed fix") provided a fix for the Zen2 VZEROUPPER data corruption bug affecting a range of CPU models, but the AMD Custom APU 0405 found on SteamDeck was not listed, although it is clearly affected by the vulnerability. Add this CPU variant to the Zenbleed erratum list, in order to unconditionally enable the fallback fix until a proper microcode update is available. Fixes: 522b1d69219d ("x86/cpu/amd: Add a Zenbleed fix") Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230811203705.1699914-1-cristian.ciocaltea@collabora.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-16 18:19:23 +02:00
Thomas Gleixner	655716938d	x86/pkeys: Revert a5eff7259790 ("x86/pkeys: Add PKRU value to init_fpstate") commit b3607269ff57fd3c9690cb25962c5e4b91a0fd3b upstream. This cannot work and it's unclear how that ever made a difference. init_fpstate.xsave.header.xfeatures is always 0 so get_xsave_addr() will always return a NULL pointer, which will prevent storing the default PKRU value in init_fpstate. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121451.451391598@linutronix.de Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-16 18:19:23 +02:00
Greg Kroah-Hartman	9399ea1ce4	x86: fix backwards merge of GDS/SRSO bit Stable-tree-only change. Due to the way the GDS and SRSO patches flowed into the stable tree, it was a 50% chance that the merge of the which value GDS and SRSO should be. Of course, I lost that bet, and chose the opposite of what Linus chose in commit 64094e7e3118 ("Merge tag 'gds-for-linus-2023-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip") Fix this up by switching the values to match what is now in Linus's tree as that is the correct value to mirror. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-08 19:56:37 +02:00
Kim Phillips	43ed6f79b3	x86/cpu, kvm: Add support for CPUID_80000021_EAX commit 8415a74852d7c24795007ee9862d25feb519007c upstream. Add support for CPUID leaf 80000021, EAX. The majority of the features will be used in the kernel and thus a separate leaf is appropriate. Include KVM's reverse_cpuid entry because features are used by VM guests, too. [ bp: Massage commit message. ] Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20230124163319.2277355-2-kim.phillips@amd.com [bwh: Backported to 6.1: adjust context] Signed-off-by: Ben Hutchings <benh@debian.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-08 19:56:37 +02:00
Borislav Petkov (AMD)	1f0618bb24	x86/bugs: Increase the x86 bugs vector size to two u32s Upstream commit: 0e52740ffd10c6c316837c6c128f460f1aaba1ea There was never a doubt in my mind that they would not fit into a single u32 eventually. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-08 19:56:37 +02:00
Sean Christopherson	694b40dcfb	x86/cpufeatures: Assign dedicated feature word for CPUID_0x8000001F[EAX] commit fb35d30fe5b06cc24444f0405da8fbe0be5330d1 upstream. Collect the scattered SME/SEV related feature flags into a dedicated word. There are now five recognized features in CPUID.0x8000001F.EAX, with at least one more on the horizon (SEV-SNP). Using a dedicated word allows KVM to use its automagic CPUID adjustment logic when reporting the set of supported features to userspace. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Brijesh Singh <brijesh.singh@amd.com> Link: https://lkml.kernel.org/r/20210122204047.2860075-2-seanjc@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-08 19:56:36 +02:00
Tom Lendacky	4fa849d4af	x86/cpu: Add VM page flush MSR availablility as a CPUID feature commit 69372cf01290b9587d2cee8fbe161d75d55c3adc upstream. On systems that do not have hardware enforced cache coherency between encrypted and unencrypted mappings of the same physical page, the hypervisor can use the VM page flush MSR (0xc001011e) to flush the cache contents of an SEV guest page. When a small number of pages are being flushed, this can be used in place of issuing a WBINVD across all CPUs. CPUID 0x8000001f_eax[2] is used to determine if the VM page flush MSR is available. Add a CPUID feature to indicate it is supported and define the MSR. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <f1966379e31f9b208db5257509c4a089a87d33d0.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-08 19:56:36 +02:00
Tom Lendacky	998eec0666	x86/cpufeatures: Add SEV-ES CPU feature commit 360e7c5c4ca4fd8e627781ed42f95d58bc3bb732 upstream. Add CPU feature detection for Secure Encrypted Virtualization with Encrypted State. This feature enhances SEV by also encrypting the guest register state, making it in-accessible to the hypervisor. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-6-joro@8bytes.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-08 19:56:36 +02:00
Peter Zijlstra	3d1b8cfdd0	x86/mm: Use mm_alloc() in poking_init() commit 3f4c8211d982099be693be9aa7d6fc4607dff290 upstream. Instead of duplicating init_mm, allocate a fresh mm. The advantage is that mm_alloc() has much simpler dependencies. Additionally it makes more conceptual sense, init_mm has no (and must not have) user state to duplicate. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20221025201057.816175235@infradead.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-08 19:56:36 +02:00
Juergen Gross	ddcf05fe88	x86/mm: fix poking_init() for Xen PV guests commit 26ce6ec364f18d2915923bc05784084e54a5c4cc upstream. Commit 3f4c8211d982 ("x86/mm: Use mm_alloc() in poking_init()") broke the kernel for running as Xen PV guest. It seems as if the new address space is never activated before being used, resulting in Xen rejecting to accept the new CR3 value (the PGD isn't pinned). Fix that by adding the now missing call of paravirt_arch_dup_mmap() to poking_init(). That call was previously done by dup_mm()->dup_mmap() and it is a NOP for all cases but for Xen PV, where it is just doing the pinning of the PGD. Fixes: 3f4c8211d982 ("x86/mm: Use mm_alloc() in poking_init()") Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20230109150922.10578-1-jgross@suse.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-08 19:56:36 +02:00
Juergen Gross	3f8968f1f0	x86/xen: Fix secondary processors' FPU initialization commit fe3e0a13e597c1c8617814bf9b42ab732db5c26e upstream. Moving the call of fpu__init_cpu() from cpu_init() to start_secondary() broke Xen PV guests, as those don't call start_secondary() for APs. Call fpu__init_cpu() in Xen's cpu_bringup(), which is the Xen PV replacement of start_secondary(). Fixes: b81fac906a8f ("x86/fpu: Move FPU initialization into arch_cpu_finalize_init()") Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20230703130032.22916-1-jgross@suse.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-08 19:56:35 +02:00
Daniel Sneddon	e56c1e0f91	KVM: Add GDS_NO support to KVM commit 81ac7e5d741742d650b4ed6186c4826c1a0631a7 upstream Gather Data Sampling (GDS) is a transient execution attack using gather instructions from the AVX2 and AVX512 extensions. This attack allows malicious code to infer data that was previously stored in vector registers. Systems that are not vulnerable to GDS will set the GDS_NO bit of the IA32_ARCH_CAPABILITIES MSR. This is useful for VM guests that may think they are on vulnerable systems that are, in fact, not affected. Guests that are running on affected hosts where the mitigation is enabled are protected as if they were running on an unaffected system. On all hosts that are not affected or that are mitigated, set the GDS_NO bit. Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-08 19:56:35 +02:00
Daniel Sneddon	ed56430ab2	x86/speculation: Add Kconfig option for GDS commit 53cf5797f114ba2bd86d23a862302119848eff19 upstream Gather Data Sampling (GDS) is mitigated in microcode. However, on systems that haven't received the updated microcode, disabling AVX can act as a mitigation. Add a Kconfig option that uses the microcode mitigation if available and disables AVX otherwise. Setting this option has no effect on systems not affected by GDS. This is the equivalent of setting gather_data_sampling=force. Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-08 19:56:35 +02:00
Daniel Sneddon	e35c657943	x86/speculation: Add force option to GDS mitigation commit 553a5c03e90a6087e88f8ff878335ef0621536fb upstream The Gather Data Sampling (GDS) vulnerability allows malicious software to infer stale data previously stored in vector registers. This may include sensitive data such as cryptographic keys. GDS is mitigated in microcode, and systems with up-to-date microcode are protected by default. However, any affected system that is running with older microcode will still be vulnerable to GDS attacks. Since the gather instructions used by the attacker are part of the AVX2 and AVX512 extensions, disabling these extensions prevents gather instructions from being executed, thereby mitigating the system from GDS. Disabling AVX2 is sufficient, but we don't have the granularity to do this. The XCR0[2] disables AVX, with no option to just disable AVX2. Add a kernel parameter gather_data_sampling=force that will enable the microcode mitigation if available, otherwise it will disable AVX on affected systems. This option will be ignored if cmdline mitigations=off. This is a big hammer. It is known to break buggy userspace that uses incomplete, buggy AVX enumeration. Unfortunately, such userspace does exist in the wild: https://www.mail-archive.com/bug-coreutils@gnu.org/msg33046.html [ dhansen: add some more ominous warnings about disabling AVX ] Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-08 19:56:35 +02:00
Daniel Sneddon	f68f9f2df6	x86/speculation: Add Gather Data Sampling mitigation commit 8974eb588283b7d44a7c91fa09fcbaf380339f3a upstream Gather Data Sampling (GDS) is a hardware vulnerability which allows unprivileged speculative access to data which was previously stored in vector registers. Intel processors that support AVX2 and AVX512 have gather instructions that fetch non-contiguous data elements from memory. On vulnerable hardware, when a gather instruction is transiently executed and encounters a fault, stale data from architectural or internal vector registers may get transiently stored to the destination vector register allowing an attacker to infer the stale data using typical side channel techniques like cache timing attacks. This mitigation is different from many earlier ones for two reasons. First, it is enabled by default and a bit must be set to DISABLE it. This is the opposite of normal mitigation polarity. This means GDS can be mitigated simply by updating microcode and leaving the new control bit alone. Second, GDS has a "lock" bit. This lock bit is there because the mitigation affects the hardware security features KeyLocker and SGX. It needs to be enabled and STAY enabled for these features to be mitigated against GDS. The mitigation is enabled in the microcode by default. Disable it by setting gather_data_sampling=off or by disabling all mitigations with mitigations=off. The mitigation status can be checked by reading: /sys/devices/system/cpu/vulnerabilities/gather_data_sampling Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-08 19:56:35 +02:00
Thomas Gleixner	6e60443668	x86/fpu: Move FPU initialization into arch_cpu_finalize_init() commit b81fac906a8f9e682e513ddd95697ec7a20878d4 upstream Initializing the FPU during the early boot process is a pointless exercise. Early boot is convoluted and fragile enough. Nothing requires that the FPU is set up early. It has to be initialized before fork_init() because the task_struct size depends on the FPU register buffer size. Move the initialization to arch_cpu_finalize_init() which is the perfect place to do so. No functional change. This allows to remove quite some of the custom early command line parsing, but that's subject to the next installment. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20230613224545.902376621@linutronix.de Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-08 19:56:35 +02:00
Thomas Gleixner	2ee37a46aa	x86/fpu: Mark init functions __init commit 1703db2b90c91b2eb2d699519fc505fe431dde0e upstream No point in keeping them around. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20230613224545.841685728@linutronix.de Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-08 19:56:35 +02:00
Thomas Gleixner	77fe815057	x86/fpu: Remove cpuinfo argument from init functions commit 1f34bb2a24643e0087652d81078e4f616562738d upstream Nothing in the call chain requires it Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20230613224545.783704297@linutronix.de Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-08 19:56:35 +02:00
Thomas Gleixner	95356fff6f	init, x86: Move mem_encrypt_init() into arch_cpu_finalize_init() commit 439e17576eb47f26b78c5bbc72e344d4206d2327 upstream Invoke the X86ism mem_encrypt_init() from X86 arch_cpu_finalize_init() and remove the weak fallback from the core code. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20230613224545.670360645@linutronix.de Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-08 19:56:34 +02:00
Thomas Gleixner	1743bc756b	x86/cpu: Switch to arch_cpu_finalize_init() commit 7c7077a72674402654f3291354720cd73cdf649e upstream check_bugs() is a dumping ground for finalizing the CPU bringup. Only parts of it has to do with actual CPU bugs. Split it apart into arch_cpu_finalize_init() and cpu_select_mitigations(). Fixup the bogus 32bit comments while at it. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230613224545.019583869@linutronix.de Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-08 19:56:33 +02:00
Shawn Wang	4841049183	x86/resctrl: Only show tasks' pid in current pid namespace [ Upstream commit 2997d94b5dd0e8b10076f5e0b6f18410c73e28bd ] When writing a task id to the "tasks" file in an rdtgroup, rdtgroup_tasks_write() treats the pid as a number in the current pid namespace. But when reading the "tasks" file, rdtgroup_tasks_show() shows the list of global pids from the init namespace, which is confusing and incorrect. To be more robust, let the "tasks" file only show pids in the current pid namespace. Fixes: e02737d5b826 ("x86/intel_rdt: Add tasks files") Signed-off-by: Shawn Wang <shawnwang@linux.alibaba.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Reinette Chatre <reinette.chatre@intel.com> Acked-by: Fenghua Yu <fenghua.yu@intel.com> Tested-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/all/20230116071246.97717-1-shawnwang@linux.alibaba.com/ Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-07-27 08:37:04 +02:00
James Morse	7206eca1ac	x86/resctrl: Use is_closid_match() in more places [ Upstream commit e6b2fac36fcc0b73cbef063d700a9841850e37a0 ] rdtgroup_tasks_assigned() and show_rdt_tasks() loop over threads testing for a CTRL/MON group match by closid/rmid with the provided rdtgrp. Further down the file are helpers to do this, move these further up and make use of them here. These helpers additionally check for alloc/mon capable. This is harmless as rdtgroup_mkdir() tests these capable flags before allowing the config directories to be created. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lkml.kernel.org/r/20200708163929.2783-7-james.morse@arm.com Stable-dep-of: 2997d94b5dd0 ("x86/resctrl: Only show tasks' pid in current pid namespace") Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-07-27 08:37:04 +02:00
Thomas Gleixner	f042d80a63	x86/smp: Use dedicated cache-line for mwait_play_dead() commit f9c9987bf52f4e42e940ae217333ebb5a4c3b506 upstream. Monitoring idletask::thread_info::flags in mwait_play_dead() has been an obvious choice as all what is needed is a cache line which is not written by other CPUs. But there is a use case where a "dead" CPU needs to be brought out of MWAIT: kexec(). This is required as kexec() can overwrite text, pagetables, stacks and the monitored cacheline of the original kernel. The latter causes MWAIT to resume execution which obviously causes havoc on the kexec kernel which results usually in triple faults. Use a dedicated per CPU storage to prepare for that. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ashok Raj <ashok.raj@intel.com> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230615193330.434553750@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-07-27 08:37:03 +02:00
Borislav Petkov (AMD)	00363ef307	x86/cpu/amd: Add a Zenbleed fix Upstream commit: 522b1d69219d8f083173819fde04f994aa051a98 Add a fix for the Zen2 VZEROUPPER data corruption bug where under certain circumstances executing VZEROUPPER can cause register corruption or leak data. The optimal fix is through microcode but in the case the proper microcode revision has not been applied, enable a fallback fix using a chicken bit. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-07-24 19:10:53 +02:00
Borislav Petkov (AMD)	92b292bed6	x86/cpu/amd: Move the errata checking functionality up Upstream commit: 8b6f687743dacce83dbb0c7cfacf88bab00f808a Avoid new and remove old forward declarations. No functional changes. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-07-24 19:10:53 +02:00
Borislav Petkov (AMD)	4d4112e284	x86/microcode/AMD: Load late on both threads too commit a32b0f0db3f396f1c9be2fe621e77c09ec3d8e7d upstream. Do the same as early loading - load on both threads. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable@kernel.org> Link: https://lore.kernel.org/r/20230605141332.25948-1-bp@alien8.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-07-24 19:10:53 +02:00
Dheeraj Kumar Srivastava	f89bcf03e9	x86/apic: Fix kernel panic when booting with intremap=off and x2apic_phys [ Upstream commit 85d38d5810e285d5aec7fb5283107d1da70c12a9 ] When booting with "intremap=off" and "x2apic_phys" on the kernel command line, the physical x2APIC driver ends up being used even when x2APIC mode is disabled ("intremap=off" disables x2APIC mode). This happens because the first compound condition check in x2apic_phys_probe() is false due to x2apic_mode == 0 and so the following one returns true after default_acpi_madt_oem_check() having already selected the physical x2APIC driver. This results in the following panic: kernel BUG at arch/x86/kernel/apic/io_apic.c:2409! invalid opcode: 0000 [#1] PREEMPT SMP NOPTI CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0-rc2-ver4.1rc2 #2 Hardware name: Dell Inc. PowerEdge R6515/07PXPY, BIOS 2.3.6 07/06/2021 RIP: 0010:setup_IO_APIC+0x9c/0xaf0 Call Trace: <TASK> ? native_read_msr apic_intr_mode_init x86_late_time_init start_kernel x86_64_start_reservations x86_64_start_kernel secondary_startup_64_no_verify </TASK> which is: setup_IO_APIC: apic_printk(APIC_VERBOSE, "ENABLING IO-APIC IRQs\n"); for_each_ioapic(ioapic) BUG_ON(mp_irqdomain_create(ioapic)); Return 0 to denote that x2APIC has not been enabled when probing the physical x2APIC driver. [ bp: Massage commit message heavily. ] Fixes: 9ebd680bd029 ("x86, apic: Use probe routines to simplify apic selection") Signed-off-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Kishon Vijay Abraham I <kvijayab@amd.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20230616212236.1389-1-dheerajkumar.srivastava@amd.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-06-28 10:18:42 +02:00
Lee Jones	94199d4727	x86/mm: Avoid using set_pgd() outside of real PGD pages commit d082d48737c75d2b3cc1f972b8c8674c25131534 upstream. KPTI keeps around two PGDs: one for userspace and another for the kernel. Among other things, set_pgd() contains infrastructure to ensure that updates to the kernel PGD are reflected in the user PGD as well. One side-effect of this is that set_pgd() expects to be passed whole pages. Unfortunately, init_trampoline_kaslr() passes in a single entry: 'trampoline_pgd_entry'. When KPTI is on, set_pgd() will update 'trampoline_pgd_entry' (an 8-Byte globally stored [.bss] variable) and will then proceed to replicate that value into the non-existent neighboring user page (located +4k away), leading to the corruption of other global [.bss] stored variables. Fix it by directly assigning 'trampoline_pgd_entry' and avoiding set_pgd(). [ dhansen: tweak subject and changelog ] Fixes: 0925dda5962e ("x86/mm/KASLR: Use only one PUD entry for real mode trampoline") Suggested-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Lee Jones <lee@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/all/20230614163859.924309-1-lee@kernel.org/g Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-06-28 10:18:38 +02:00
Ricardo Ribalda	b9b61fd1f7	x86/purgatory: remove PGO flags commit 97b6b9cbba40a21c1d9a344d5c1991f8cfbf136e upstream. If profile-guided optimization is enabled, the purgatory ends up with multiple .text sections. This is not supported by kexec and crashes the system. Link: https://lkml.kernel.org/r/20230321-kexec_clang16-v7-2-b05c520b7296@chromium.org Fixes: 930457057abe ("kernel/kexec_file.c: split up __kexec_load_puragory") Signed-off-by: Ricardo Ribalda <ribalda@chromium.org> Cc: <stable@vger.kernel.org> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Palmer Dabbelt <palmer@rivosinc.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Philipp Rudo <prudo@redhat.com> Cc: Ross Zwisler <zwisler@google.com> Cc: Simon Horman <horms@kernel.org> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Rix <trix@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ricardo Ribalda Delgado <ribalda@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-06-28 10:18:35 +02:00
Kees Cook	0638dcc7e7	treewide: Remove uninitialized_var() usage commit 3f649ab728cda8038259d8f14492fe400fbab911 upstream. Using uninitialized_var() is dangerous as it papers over real bugs[1] (or can in the future), and suppresses unrelated compiler warnings (e.g. "unused variable"). If the compiler thinks it is uninitialized, either simply initialize the variable or make compiler changes. In preparation for removing[2] the[3] macro[4], remove all remaining needless uses with the following script: git grep '\buninitialized_var\b' \| cut -d: -f1 \| sort -u \| \ xargs perl -pi -e \ 's/\buninitialized_var$([^$]+)\)/\1/g; s:\s/\ (GCC be quiet\|to make compiler happy) \*/$::g;' drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid pathological white-space. No outstanding warnings were found building allmodconfig with GCC 9.3.0 for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64, alpha, and m68k. [1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/ [2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/ [3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/ [4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/ Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5 Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-06-09 10:29:01 +02:00
Kees Cook	a010f8e646	x86/boot: Wrap literal addresses in absolute_pointer() commit aeb84412037b89e06f45e382f044da6f200e12f8 upstream. GCC 11 (incorrectly[1]) assumes that literal values cast to (void ) should be treated like a NULL pointer with an offset, and raises diagnostics when doing bounds checking under -Warray-bounds. GCC 12 got "smarter" about finding these: In function 'rdfs8', inlined from 'vga_recalc_vertical' at /srv/code/arch/x86/boot/video-mode.c:124:29, inlined from 'set_mode' at /srv/code/arch/x86/boot/video-mode.c:163:3: /srv/code/arch/x86/boot/boot.h:114:9: warning: array subscript 0 is outside array bounds of 'u8[0]' {aka 'unsigned char[]'} [-Warray-bounds] 114 \| asm volatile("movb %%fs:%1,%0" : "=q" (v) : "m" ((u8 *)addr)); \| ^~~ This has been solved in other places[2] already by using the recently added absolute_pointer() macro. Do the same here. [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99578 [2] https://lore.kernel.org/all/20210912160149.2227137-1-linux@roeck-us.net/ Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20220227195918.705219-1-keescook@chromium.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-06-09 10:29:01 +02:00
Vernon Lovejoy	0099a29bc5	x86/show_trace_log_lvl: Ensure stack pointer is aligned, again commit 2e4be0d011f21593c6b316806779ba1eba2cd7e0 upstream. The commit e335bb51cc15 ("x86/unwind: Ensure stack pointer is aligned") tried to align the stack pointer in show_trace_log_lvl(), otherwise the "stack < stack_info.end" check can't guarantee that the last read does not go past the end of the stack. However, we have the same problem with the initial value of the stack pointer, it can also be unaligned. So without this patch this trivial kernel module #include <linux/module.h> static int init(void) { asm volatile("sub $0x4,%rsp"); dump_stack(); asm volatile("add $0x4,%rsp"); return -EAGAIN; } module_init(init); MODULE_LICENSE("GPL"); crashes the kernel. Fixes: e335bb51cc15 ("x86/unwind: Ensure stack pointer is aligned") Signed-off-by: Vernon Lovejoy <vlovejoy@redhat.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Link: https://lore.kernel.org/r/20230512104232.GA10227@redhat.com Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-05-30 12:44:10 +01:00
Zhang Rui	4e5a7181a6	x86/topology: Fix erroneous smp_num_siblings on Intel Hybrid platforms commit edc0a2b5957652f4685ef3516f519f84807087db upstream. Traditionally, all CPUs in a system have identical numbers of SMT siblings. That changes with hybrid processors where some logical CPUs have a sibling and others have none. Today, the CPU boot code sets the global variable smp_num_siblings when every CPU thread is brought up. The last thread to boot will overwrite it with the number of siblings of that thread. That last thread to boot will "win". If the thread is a Pcore, smp_num_siblings == 2. If it is an Ecore, smp_num_siblings == 1. smp_num_siblings describes if the system supports SMT. It should specify the maximum number of SMT threads among all cores. Ensure that smp_num_siblings represents the system-wide maximum number of siblings by always increasing its value. Never allow it to decrease. On MeteorLake-P platform, this fixes a problem that the Ecore CPUs are not updated in any cpu sibling map because the system is treated as an UP system when probing Ecore CPUs. Below shows part of the CPU topology information before and after the fix, for both Pcore and Ecore CPU (cpu0 is Pcore, cpu 12 is Ecore). ... -/sys/devices/system/cpu/cpu0/topology/package_cpus:000fff -/sys/devices/system/cpu/cpu0/topology/package_cpus_list:0-11 +/sys/devices/system/cpu/cpu0/topology/package_cpus:3fffff +/sys/devices/system/cpu/cpu0/topology/package_cpus_list:0-21 ... -/sys/devices/system/cpu/cpu12/topology/package_cpus:001000 -/sys/devices/system/cpu/cpu12/topology/package_cpus_list:12 +/sys/devices/system/cpu/cpu12/topology/package_cpus:3fffff +/sys/devices/system/cpu/cpu12/topology/package_cpus_list:0-21 Notice that the "before" 'package_cpus_list' has only one CPU. This means that userspace tools like lscpu will see a little laptop like an 11-socket system: -Core(s) per socket: 1 -Socket(s): 11 +Core(s) per socket: 16 +Socket(s): 1 This is also expected to make the scheduler do rather wonky things too. [ dhansen: remove CPUID detail from changelog, add end user effects ] CC: stable@kernel.org Fixes: bbb65d2d365e ("x86: use cpuid vector 0xb when available for detecting cpu topology") Fixes: 95f3d39ccf7a ("x86/cpu/topology: Provide detect_extended_topology_early()") Suggested-by: Len Brown <len.brown@intel.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/all/20230323015640.27906-1-rui.zhang%40intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-05-30 12:44:09 +01:00
Dave Hansen	04aee084a3	x86/mm: Avoid incomplete Global INVLPG flushes commit ce0b15d11ad837fbacc5356941712218e38a0a83 upstream. The INVLPG instruction is used to invalidate TLB entries for a specified virtual address. When PCIDs are enabled, INVLPG is supposed to invalidate TLB entries for the specified address for both the current PCID and Global entries. (Note: Only kernel mappings set Global=1.) Unfortunately, some INVLPG implementations can leave Global translations unflushed when PCIDs are enabled. As a workaround, never enable PCIDs on affected processors. I expect there to eventually be microcode mitigations to replace this software workaround. However, the exact version numbers where that will happen are not known today. Once the version numbers are set in stone, the processor list can be tweaked to only disable PCIDs on affected processors with affected microcode. Note: if anyone wants a quick fix that doesn't require patching, just stick 'nopcid' on your kernel command-line. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-05-30 12:44:09 +01:00
Paolo Bonzini	1eb3e32de7	KVM: x86: do not report a vCPU as preempted outside instruction boundaries commit 6cd88243c7e03845a450795e134b488fc2afb736 upstream. If a vCPU is outside guest mode and is scheduled out, it might be in the process of making a memory access. A problem occurs if another vCPU uses the PV TLB flush feature during the period when the vCPU is scheduled out, and a virtual address has already been translated but has not yet been accessed, because this is equivalent to using a stale TLB entry. To avoid this, only report a vCPU as preempted if sure that the guest is at an instruction boundary. A rescheduling request will be delivered to the host physical CPU as an external interrupt, so for simplicity consider any vmexit not instruction boundary except for external interrupts. It would in principle be okay to report the vCPU as preempted also if it is sleeping in kvm_vcpu_block(): a TLB flush IPI will incur the vmentry/vmexit overhead unnecessarily, and optimistic spinning is also unlikely to succeed. However, leave it for later because right now kvm_vcpu_check_block() is doing memory accesses. Even though the TLB flush issue only applies to virtual memory address, it's very much preferrable to be conservative. Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [OP: use VCPU_STAT() for debugfs entries] Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-05-30 12:44:07 +01:00
Saurabh Sengar	442470948c	x86/ioapic: Don't return 0 from arch_dynirq_lower_bound() [ Upstream commit 5af507bef93c09a94fb8f058213b489178f4cbe5 ] arch_dynirq_lower_bound() is invoked by the core interrupt code to retrieve the lowest possible Linux interrupt number for dynamically allocated interrupts like MSI. The x86 implementation uses this to exclude the IO/APIC GSI space. This works correctly as long as there is an IO/APIC registered, but returns 0 if not. This has been observed in VMs where the BIOS does not advertise an IO/APIC. 0 is an invalid interrupt number except for the legacy timer interrupt on x86. The return value is unchecked in the core code, so it ends up to allocate interrupt number 0 which is subsequently considered to be invalid by the caller, e.g. the MSI allocation code. The function has already a check for 0 in the case that an IO/APIC is registered, as ioapic_dynirq_base is 0 in case of device tree setups. Consolidate this and zero check for both ioapic_dynirq_base and gsi_top, which is used in the case that no IO/APIC is registered. Fixes: 3e5bedc2c258 ("x86/apic: Fix arch_dynirq_lower_bound() bug for DT enabled machines") Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/1679988604-20308-1-git-send-email-ssengar@linux.microsoft.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-05-17 11:35:39 +02:00
Uros Bizjak	0c61a6897c	x86/apic: Fix atomic update of offset in reserve_eilvt_offset() [ Upstream commit f96fb2df3eb31ede1b34b0521560967310267750 ] The detection of atomic update failure in reserve_eilvt_offset() is not correct. The value returned by atomic_cmpxchg() should be compared to the old value from the location to be updated. If these two are the same, then atomic update succeeded and "eilvt_offsets[offset]" location is updated to "new" in an atomic way. Otherwise, the atomic update failed and it should be retried with the value from "eilvt_offsets[offset]" - exactly what atomic_try_cmpxchg() does in a correct and more optimal way. Fixes: a68c439b1966c ("apic, x86: Check if EILVT APIC registers are available (AMD only)") Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230227160917.107820-1-ubizjak@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-05-17 11:35:37 +02:00
Sean Christopherson	d482617fa6	KVM: nVMX: Emulate NOPs in L2, and PAUSE if it's not intercepted commit 4984563823f0034d3533854c1b50e729f5191089 upstream. Extend VMX's nested intercept logic for emulated instructions to handle "pause" interception, in quotes because KVM's emulator doesn't filter out NOPs when checking for nested intercepts. Failure to allow emulation of NOPs results in KVM injecting a #UD into L2 on any NOP that collides with the emulator's definition of PAUSE, i.e. on all single-byte NOPs. For PAUSE itself, honor L1's PAUSE-exiting control, but ignore PLE to avoid unnecessarily injecting a #UD into L2. Per the SDM, the first execution of PAUSE after VM-Entry is treated as the beginning of a new loop, i.e. will never trigger a PLE VM-Exit, and so L1 can't expect any given execution of PAUSE to deterministically exit. ... the processor considers this execution to be the first execution of PAUSE in a loop. (It also does so for the first execution of PAUSE at CPL 0 after VM entry.) All that said, the PLE side of things is currently a moot point, as KVM doesn't expose PLE to L1. Note, vmx_check_intercept() is still wildly broken when L1 wants to intercept an instruction, as KVM injects a #UD instead of synthesizing a nested VM-Exit. That issue extends far beyond NOP/PAUSE and needs far more effort to fix, i.e. is a problem for the future. Fixes: 07721feee46b ("KVM: nVMX: Don't emulate instructions in guest mode") Cc: Mathias Krause <minipli@grsecurity.net> Cc: stable@vger.kernel.org Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20230405002359.418138-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-05-17 11:35:33 +02:00
Pingfan Liu	ef2aab86c3	x86/purgatory: Don't generate debug info for purgatory.ro commit 52416ffcf823ee11aa19792715664ab94757f111 upstream. Purgatory.ro is a standalone binary that is not linked against the rest of the kernel. Its image is copied into an array that is linked to the kernel, and from there kexec relocates it wherever it desires. Unlike the debug info for vmlinux, which can be used for analyzing crash such info is useless in purgatory.ro. And discarding them can save about 200K space. Original: 259080 kexec-purgatory.o Stripped debug info: 29152 kexec-purgatory.o Signed-off-by: Pingfan Liu <kernelfans@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Steve Wahl <steve.wahl@hpe.com> Acked-by: Dave Young <dyoung@redhat.com> Link: https://lore.kernel.org/r/1596433788-3784-1-git-send-email-kernelfans@gmail.com [Alyssa: fixed for LLVM_IAS=1 by adding -g to AFLAGS_REMOVE_*] Signed-off-by: Alyssa Ross <hi@alyssa.is> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-04-26 11:24:04 +02:00
Hans de Goede	b3052e5d46	efi: sysfb_efi: Add quirk for Lenovo Yoga Book X91F/L [ Upstream commit 5ed213dd64681f84a01ceaa82fb336cf7d59ddcf ] Another Lenovo convertable which reports a landscape resolution of 1920x1200 with a pitch of (1920 * 4) bytes, while the actual framebuffer has a resolution of 1200x1920 with a pitch of (1200 * 4) bytes. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-04-20 12:07:36 +02:00
Basavaraj Natikar	4311ae04b3	x86/PCI: Add quirk for AMD XHCI controller that loses MSI-X state in D3hot commit f195fc1e9715ba826c3b62d58038f760f66a4fe9 upstream. The AMD [1022:15b8] USB controller loses some internal functional MSI-X context when transitioning from D0 to D3hot. BIOS normally traps D0->D3hot and D3hot->D0 transitions so it can save and restore that internal context, but some firmware in the field can't do this because it fails to clear the AMD_15B8_RCC_DEV2_EPF0_STRAP2 NO_SOFT_RESET bit. Clear AMD_15B8_RCC_DEV2_EPF0_STRAP2 NO_SOFT_RESET bit before USB controller initialization during boot. Link: https://lore.kernel.org/linux-usb/Y%2Fz9GdHjPyF2rNG3@glanzmann.de/T/#u Link: https://lore.kernel.org/r/20230329172859.699743-1-Basavaraj.Natikar@amd.com Reported-by: Thomas Glanzmann <thomas@glanzmann.de> Tested-by: Thomas Glanzmann <thomas@glanzmann.de> Signed-off-by: Basavaraj Natikar <Basavaraj.Natikar@amd.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-04-20 12:07:32 +02:00
Nikita Zhandarovich	8d26a4fecc	x86/mm: Fix use of uninitialized buffer in sme_enable() commit cbebd68f59f03633469f3ecf9bea99cd6cce3854 upstream. cmdline_find_option() may fail before doing any initialization of the buffer array. This may lead to unpredictable results when the same buffer is used later in calls to strncmp() function. Fix the issue by returning early if cmdline_find_option() returns an error. Found by Linux Verification Center (linuxtesting.org) with static analysis tool SVACE. Fixes: aca20d546214 ("x86/mm: Add support to make use of Secure Memory Encryption") Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Tom Lendacky <thomas.lendacky@amd.com> Cc: <stable@kernel.org> Link: https://lore.kernel.org/r/20230306160656.14844-1-n.zhandarovich@fintech.ru Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-03-22 13:28:09 +01:00
Paolo Bonzini	65e4c9a6d0	KVM: nVMX: add missing consistency checks for CR0 and CR4 commit 112e66017bff7f2837030f34c2bc19501e9212d5 upstream. The effective values of the guest CR0 and CR4 registers may differ from those included in the VMCS12. In particular, disabling EPT forces CR4.PAE=1 and disabling unrestricted guest mode forces CR0.PG=CR0.PE=1. Therefore, checks on these bits cannot be delegated to the processor and must be performed by KVM. Reported-by: Reima ISHII <ishiir@g.ecc.u-tokyo.ac.jp> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-03-22 13:28:09 +01:00
H.J. Lu	a4bd6d4df3	x86, vmlinux.lds: Add RUNTIME_DISCARD_EXIT to generic DISCARDS commit 84d5f77fc2ee4e010c2c037750e32f06e55224b0 upstream. In the x86 kernel, .exit.text and .exit.data sections are discarded at runtime, not by the linker. Add RUNTIME_DISCARD_EXIT to generic DISCARDS and define it in the x86 kernel linker script to keep them. The sections are added before the DISCARD directive so document here only the situation explicitly as this change doesn't have any effect on the generated kernel. Also, other architectures like ARM64 will use it too so generalize the approach with the RUNTIME_DISCARD_EXIT define. [ bp: Massage and extend commit message. ] Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20200326193021.255002-1-hjl.tools@gmail.com Signed-off-by: Tom Saeger <tom.saeger@oracle.com>	2023-03-17 08:32:53 +01:00
Andrew Cooper	e40c1e9da1	x86/CPU/AMD: Disable XSAVES on AMD family 0x17 commit b0563468eeac88ebc70559d52a0b66efc37e4e9d upstream. AMD Erratum 1386 is summarised as: XSAVES Instruction May Fail to Save XMM Registers to the Provided State Save Area This piece of accidental chronomancy causes the %xmm registers to occasionally reset back to an older value. Ignore the XSAVES feature on all AMD Zen1/2 hardware. The XSAVEC instruction (which works fine) is equivalent on affected parts. [ bp: Typos, move it into the F17h-specific function. ] Reported-by: Tavis Ormandy <taviso@gmail.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable@kernel.org> Link: https://lore.kernel.org/r/20230307174643.1240184-1-andrew.cooper3@citrix.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-03-17 08:32:47 +01:00
Linus Torvalds	27c64d90d9	x86/resctl: fix scheduler confusion with 'current' commit 7fef099702527c3b2c5234a2ea6a24411485a13a upstream. The implementation of 'current' on x86 is very intentionally special: it is a very common thing to look up, and it uses 'this_cpu_read_stable()' to get the current thread pointer efficiently from per-cpu storage. And the keyword in there is 'stable': the current thread pointer never changes as far as a single thread is concerned. Even if when a thread is preempted, or moved to another CPU, or even across an explicit call 'schedule()' that thread will still have the same value for 'current'. It is, after all, the kernel base pointer to thread-local storage. That's why it's stable to begin with, but it's also why it's important enough that we have that special 'this_cpu_read_stable()' access for it. So this is all done very intentionally to allow the compiler to treat 'current' as a value that never visibly changes, so that the compiler can do CSE and combine multiple different 'current' accesses into one. However, there is obviously one very special situation when the currently running thread does actually change: inside the scheduler itself. So the scheduler code paths are special, and do not have a 'current' thread at all. Instead there are _two_ threads: the previous and the next thread - typically called 'prev' and 'next' (or prev_p/next_p) internally. So this is all actually quite straightforward and simple, and not all that complicated. Except for when you then have special code that is run in scheduler context, that code then has to be aware that 'current' isn't really a valid thing. Did you mean 'prev'? Did you mean 'next'? In fact, even if then look at the code, and you use 'current' after the new value has been assigned to the percpu variable, we have explicitly told the compiler that 'current' is magical and always stable. So the compiler is quite free to use an older (or newer) value of 'current', and the actual assignment to the percpu storage is not relevant even if it might look that way. Which is exactly what happened in the resctl code, that blithely used 'current' in '__resctrl_sched_in()' when it really wanted the new process state (as implied by the name: we're scheduling 'into' that new resctl state). And clang would end up just using the old thread pointer value at least in some configurations. This could have happened with gcc too, and purely depends on random compiler details. Clang just seems to have been more aggressive about moving the read of the per-cpu current_task pointer around. The fix is trivial: just make the resctl code adhere to the scheduler rules of using the prev/next thread pointer explicitly, instead of using 'current' in a situation where it just wasn't valid. That same code is then also used outside of the scheduler context (when a thread resctl state is explicitly changed), and then we will just pass in 'current' as that pointer, of course. There is no ambiguity in that case. The fix may be trivial, but noticing and figuring out what went wrong was not. The credit for that goes to Stephane Eranian. Reported-by: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/lkml/20230303231133.1486085-1-eranian@google.com/ Link: https://lore.kernel.org/lkml/alpine.LFD.2.01.0908011214330.3304@localhost.localdomain/ Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Tony Luck <tony.luck@intel.com> Tested-by: Stephane Eranian <eranian@google.com> Tested-by: Babu Moger <babu.moger@amd.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-03-11 16:44:16 +01:00
Valentin Schneider	81da72aaf5	x86/resctrl: Apply READ_ONCE/WRITE_ONCE to task_struct.{rmid,closid} commit 6d3b47ddffed70006cf4ba360eef61e9ce097d8f upstream. A CPU's current task can have its {closid, rmid} fields read locally while they are being concurrently written to from another CPU. This can happen anytime __resctrl_sched_in() races with either __rdtgroup_move_task() or rdt_move_group_tasks(). Prevent load / store tearing for those accesses by giving them the READ_ONCE() / WRITE_ONCE() treatment. Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/9921fda88ad81afb9885b517fbe864a2bc7c35a9.1608243147.git.reinette.chatre@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-03-11 16:44:16 +01:00

1 2 3 4 5 ...

34162 Commits