linux

iv/linux

Author	SHA1	Message	Date
Jakub Kicinski	7ff57803d2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/ethernet/sfc/tc.c fa165e194997 ("sfc: don't unregister flow_indr if it was never registered") 3bf969e88ada ("sfc: add MAE table machinery for conntrack table") https://lore.kernel.org/all/20230818112159.7430e9b4@canb.auug.org.au/ No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-18 12:44:56 -07:00
Pengfei Xu	0d345996e4	x86/kernel: increase kcov coverage under arch/x86/kernel folder Currently kcov instrument is disabled for object files under arch/x86/kernel folder. For object files under arch/x86/kernel, actually just disabling the kcov instrument of files:"head32.o or head64.o and sev.o" could achieve successful booting and provide kcov coverage for object files that do not disable kcov instrument. The additional kcov coverage collected from arch/x86/kernel folder helps kernel fuzzing efforts to find bugs. Link to related improvement discussion is below: https://groups.google.com/g/syzkaller/c/Dsl-RYGCqs8/m/x-tfpTyFBAAJ Related ticket is as follow: https://bugzilla.kernel.org/show_bug.cgi?id=198443 Link: https://lkml.kernel.org/r/06c0bb7b5f61e5884bf31180e8c122648c752010.1690771380.git.pengfei.xu@intel.com Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Tested-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Pengfei Xu <pengfei.xu@intel.com> Cc: Aleksandr Nogikh <nogikh@google.com> Cc: <heng.su@intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Kees Cook <keescook@google.com>, Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Sohil Mehta <sohil.mehta@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-08-18 10:19:01 -07:00
Douglas Anderson	8d539b84f1	nmi_backtrace: allow excluding an arbitrary CPU The APIs that allow backtracing across CPUs have always had a way to exclude the current CPU. This convenience means callers didn't need to find a place to allocate a CPU mask just to handle the common case. Let's extend the API to take a CPU ID to exclude instead of just a boolean. This isn't any more complex for the API to handle and allows the hardlockup detector to exclude a different CPU (the one it already did a trace for) without needing to find space for a CPU mask. Arguably, this new API also encourages safer behavior. Specifically if the caller wants to avoid tracing the current CPU (maybe because they already traced the current CPU) this makes it more obvious to the caller that they need to make sure that the current CPU ID can't change. [akpm@linux-foundation.org: fix trigger_allbutcpu_cpu_backtrace() stub] Link: https://lkml.kernel.org/r/20230804065935.v4.1.Ia35521b91fc781368945161d7b28538f9996c182@changeid Signed-off-by: Douglas Anderson <dianders@chromium.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: kernel test robot <lkp@intel.com> Cc: Lecopzer Chen <lecopzer.chen@mediatek.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Pingfan Liu <kernelfans@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-08-18 10:19:00 -07:00
Andy Shevchenko	3c9d017cc2	range.h: Move resource API and constant to respective files range.h works with struct range data type. The resource_size_t is an alien here. (1) Move cap_resource() implementation into its only user, and (2) rename and move RESOURCE_SIZE_MAX to limits.h. Link: https://lkml.kernel.org/r/20230804064636.15368-1-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Alexander Sverdlin <alexander.sverdlin@nokia.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-08-18 10:19:00 -07:00
Andy Shevchenko	15beb1b746	x86/asm: replace custom COUNT_ARGS() & CONCATENATE() implementations Replace custom implementation of the macros from args.h. Link: https://lkml.kernel.org/r/20230718211147.18647-3-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Brendan Higgins <brendan.higgins@linux.dev> Cc: Daniel Latypov <dlatypov@google.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Gow <davidgow@google.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lorenzo Pieralisi <lpieralisi@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Sudeep Holla <sudeep.holla@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-08-18 10:18:56 -07:00
Eric DeVolder	e6265fe777	kexec: rename ARCH_HAS_KEXEC_PURGATORY The Kconfig refactor to consolidate KEXEC and CRASH options utilized option names of the form ARCH_SUPPORTS_<option>. Thus rename the ARCH_HAS_KEXEC_PURGATORY to ARCH_SUPPORTS_KEXEC_PURGATORY to follow the same. Link: https://lkml.kernel.org/r/20230712161545.87870-15-eric.devolder@oracle.com Signed-off-by: Eric DeVolder <eric.devolder@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-08-18 10:18:54 -07:00
Eric DeVolder	6af5138083	x86/kexec: refactor for kernel/Kconfig.kexec The kexec and crash kernel options are provided in the common kernel/Kconfig.kexec. Utilize the common options and provide the ARCH_SUPPORTS_ and ARCH_SELECTS_ entries to recreate the equivalent set of KEXEC and CRASH options. Link: https://lkml.kernel.org/r/20230712161545.87870-3-eric.devolder@oracle.com Signed-off-by: Eric DeVolder <eric.devolder@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-08-18 10:18:52 -07:00
Aneesh Kumar K.V	0b6f15824c	mm/vmemmap optimization: split hugetlb and devdax vmemmap optimization Arm disabled hugetlb vmemmap optimization [1] because hugetlb vmemmap optimization includes an update of both the permissions (writeable to read-only) and the output address (pfn) of the vmemmap ptes. That is not supported without unmapping of pte(marking it invalid) by some architectures. With DAX vmemmap optimization we don't require such pte updates and architectures can enable DAX vmemmap optimization while having hugetlb vmemmap optimization disabled. Hence split DAX optimization support into a different config. s390, loongarch and riscv don't have devdax support. So the DAX config is not enabled for them. With this change, arm64 should be able to select DAX optimization [1] commit 060a2c92d1b6 ("arm64: mm: hugetlb: Disable HUGETLB_PAGE_OPTIMIZE_VMEMMAP") Link: https://lkml.kernel.org/r/20230724190759.483013-8-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-08-18 10:12:54 -07:00
Matthew Wilcox (Oracle)	284e059204	mm: remove CONFIG_PER_VMA_LOCK ifdefs Patch series "Handle most file-backed faults under the VMA lock", v3. This patchset adds the ability to handle page faults on parts of files which are already in the page cache without taking the mmap lock. This patch (of 10): Provide lock_vma_under_rcu() when CONFIG_PER_VMA_LOCK is not defined to eliminate ifdefs in the users. Link: https://lkml.kernel.org/r/20230724185410.1124082-1-willy@infradead.org Link: https://lkml.kernel.org/r/20230724185410.1124082-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Cc: Punit Agrawal <punit.agrawal@bytedance.com> Cc: Arjun Roy <arjunroy@google.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-08-18 10:12:50 -07:00
Alistair Popple	1af5a81099	mmu_notifiers: rename invalidate_range notifier There are two main use cases for mmu notifiers. One is by KVM which uses mmu_notifier_invalidate_range_start()/end() to manage a software TLB. The other is to manage hardware TLBs which need to use the invalidate_range() callback because HW can establish new TLB entries at any time. Hence using start/end() can lead to memory corruption as these callbacks happen too soon/late during page unmap. mmu notifier users should therefore either use the start()/end() callbacks or the invalidate_range() callbacks. To make this usage clearer rename the invalidate_range() callback to arch_invalidate_secondary_tlbs() and update documention. Link: https://lkml.kernel.org/r/6f77248cd25545c8020a54b4e567e8b72be4dca1.1690292440.git-series.apopple@nvidia.com Signed-off-by: Alistair Popple <apopple@nvidia.com> Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Cc: Andrew Donnellan <ajd@linux.ibm.com> Cc: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com> Cc: Frederic Barrat <fbarrat@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kevin Tian <kevin.tian@intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Nicolin Chen <nicolinc@nvidia.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sean Christopherson <seanjc@google.com> Cc: SeongJae Park <sj@kernel.org> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Will Deacon <will@kernel.org> Cc: Zhi Wang <zhi.wang.linux@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-08-18 10:12:41 -07:00
Alistair Popple	6bbd42e2df	mmu_notifiers: call invalidate_range() when invalidating TLBs The invalidate_range() is going to become an architecture specific mmu notifier used to keep the TLB of secondary MMUs such as an IOMMU in sync with the CPU page tables. Currently it is called from separate code paths to the main CPU TLB invalidations. This can lead to a secondary TLB not getting invalidated when required and makes it hard to reason about when exactly the secondary TLB is invalidated. To fix this move the notifier call to the architecture specific TLB maintenance functions for architectures that have secondary MMUs requiring explicit software invalidations. This fixes a SMMU bug on ARM64. On ARM64 PTE permission upgrades require a TLB invalidation. This invalidation is done by the architecture specific ptep_set_access_flags() which calls flush_tlb_page() if required. However this doesn't call the notifier resulting in infinite faults being generated by devices using the SMMU if it has previously cached a read-only PTE in it's TLB. Moving the invalidations into the TLB invalidation functions ensures all invalidations happen at the same time as the CPU invalidation. The architecture specific flush_tlb_all() routines do not call the notifier as none of the IOMMUs require this. Link: https://lkml.kernel.org/r/0287ae32d91393a582897d6c4db6f7456b1001f2.1690292440.git-series.apopple@nvidia.com Signed-off-by: Alistair Popple <apopple@nvidia.com> Suggested-by: Jason Gunthorpe <jgg@ziepe.ca> Tested-by: SeongJae Park <sj@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Andrew Donnellan <ajd@linux.ibm.com> Cc: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com> Cc: Frederic Barrat <fbarrat@linux.ibm.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kevin Tian <kevin.tian@intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Nicolin Chen <nicolinc@nvidia.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Will Deacon <will@kernel.org> Cc: Zhi Wang <zhi.wang.linux@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-08-18 10:12:41 -07:00
Yicong Yang	db6c1f6f23	mm/tlbbatch: introduce arch_flush_tlb_batched_pending() Currently we'll flush the mm in flush_tlb_batched_pending() to avoid race between reclaim unmaps pages by batched TLB flush and mprotect/munmap/etc. Other architectures like arm64 may only need a synchronization barrier(dsb) here rather than a full mm flush. So add arch_flush_tlb_batched_pending() to allow an arch-specific implementation here. This intends no functional changes on x86 since still a full mm flush for x86. Link: https://lkml.kernel.org/r/20230717131004.12662-4-yangyicong@huawei.com Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Barry Song <baohua@kernel.org> Cc: Barry Song <v-songbaohua@oppo.com> Cc: Darren Hart <darren@os.amperecomputing.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: lipeifeng <lipeifeng@oppo.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Nadav Amit <namit@vmware.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Punit Agrawal <punit.agrawal@bytedance.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Steven Miao <realmz6@gmail.com> Cc: Will Deacon <will@kernel.org> Cc: Xin Hao <xhao@linux.alibaba.com> Cc: Zeng Tao <prime.zeng@hisilicon.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-08-18 10:12:37 -07:00
Barry Song	f73419bb89	mm/tlbbatch: rename and extend some functions This patch does some preparation works to extend batched TLB flush to arm64. Including: - Extend set_tlb_ubc_flush_pending() and arch_tlbbatch_add_mm() to accept an additional argument for address, architectures like arm64 may need this for tlbi. - Rename arch_tlbbatch_add_mm() to arch_tlbbatch_add_pending() to match its current function since we don't need to handle mm on architectures like arm64 and add_mm is not proper, add_pending will make sense to both as on x86 we're pending the TLB flush operations while on arm64 we're pending the synchronize operations. This intends no functional changes on x86. Link: https://lkml.kernel.org/r/20230717131004.12662-3-yangyicong@huawei.com Tested-by: Yicong Yang <yangyicong@hisilicon.com> Tested-by: Xin Hao <xhao@linux.alibaba.com> Tested-by: Punit Agrawal <punit.agrawal@bytedance.com> Signed-off-by: Barry Song <v-songbaohua@oppo.com> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Xin Hao <xhao@linux.alibaba.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Nadav Amit <namit@vmware.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Barry Song <baohua@kernel.org> Cc: Darren Hart <darren@os.amperecomputing.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: lipeifeng <lipeifeng@oppo.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Steven Miao <realmz6@gmail.com> Cc: Will Deacon <will@kernel.org> Cc: Zeng Tao <prime.zeng@hisilicon.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-08-18 10:12:36 -07:00
Anshuman Khandual	65c8d30e67	mm/tlbbatch: introduce arch_tlbbatch_should_defer() Patch series "arm64: support batched/deferred tlb shootdown during page reclamation/migration", v11. Though ARM64 has the hardware to do tlb shootdown, the hardware broadcasting is not free. A simplest micro benchmark shows even on snapdragon 888 with only 8 cores, the overhead for ptep_clear_flush is huge even for paging out one page mapped by only one process: 5.36% a.out [kernel.kallsyms] [k] ptep_clear_flush While pages are mapped by multiple processes or HW has more CPUs, the cost should become even higher due to the bad scalability of tlb shootdown. The same benchmark can result in 16.99% CPU consumption on ARM64 server with around 100 cores according to the test on patch 4/4. This patchset leverages the existing BATCHED_UNMAP_TLB_FLUSH by 1. only send tlbi instructions in the first stage - arch_tlbbatch_add_mm() 2. wait for the completion of tlbi by dsb while doing tlbbatch sync in arch_tlbbatch_flush() Testing on snapdragon shows the overhead of ptep_clear_flush is removed by the patchset. The micro benchmark becomes 5% faster even for one page mapped by single process on snapdragon 888. Since BATCHED_UNMAP_TLB_FLUSH is implemented only on x86, the patchset does some renaming/extension for the current implementation first (Patch 1-3), then add the support on arm64 (Patch 4). This patch (of 4): The entire scheme of deferred TLB flush in reclaim path rests on the fact that the cost to refill TLB entries is less than flushing out individual entries by sending IPI to remote CPUs. But architecture can have different ways to evaluate that. Hence apart from checking TTU_BATCH_FLUSH in the TTU flags, rest of the decision should be architecture specific. [yangyicong@hisilicon.com: rebase and fix incorrect return value type] Link: https://lkml.kernel.org/r/20230717131004.12662-1-yangyicong@huawei.com Link: https://lkml.kernel.org/r/20230717131004.12662-2-yangyicong@huawei.com Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> [https://lore.kernel.org/linuxppc-dev/20171101101735.2318-2-khandual@linux.vnet.ibm.com/] Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Barry Song <baohua@kernel.org> Reviewed-by: Xin Hao <xhao@linux.alibaba.com> Tested-by: Punit Agrawal <punit.agrawal@bytedance.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Darren Hart <darren@os.amperecomputing.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: lipeifeng <lipeifeng@oppo.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Steven Miao <realmz6@gmail.com> Cc: Will Deacon <will@kernel.org> Cc: Zeng Tao <prime.zeng@hisilicon.com> Cc: Barry Song <v-songbaohua@oppo.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Nadav Amit <namit@vmware.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-08-18 10:12:36 -07:00
Baoquan He	0b1f77e74b	asm-generic/iomap.h: remove ARCH_HAS_IOREMAP_xx macros Patch series "mm: ioremap: Convert architectures to take GENERIC_IOREMAP way", v8. Motivation and implementation: ============================== Currently, many architecutres have't taken the standard GENERIC_IOREMAP way to implement ioremap_prot(), iounmap(), and ioremap_xx(), but make these functions specifically under each arch's folder. Those cause many duplicated code of ioremap() and iounmap(). In this patchset, firstly introduce generic_ioremap_prot() and generic_iounmap() to extract the generic code for GENERIC_IOREMAP. By taking GENERIC_IOREMAP method, the generic generic_ioremap_prot(), generic_iounmap(), and their generic wrapper ioremap_prot(), ioremap() and iounmap() are all visible and available to arch. Arch needs to provide wrapper functions to override the generic version if there's arch specific handling in its corresponding ioremap_prot(), ioremap() or iounmap(). With these changes, duplicated ioremap/iounmap() code uder ARCH-es are removed, and the equivalent functioality is kept as before. Background info: ================ 1) The converting more architectures to take GENERIC_IOREMAP way is suggested by Christoph in below discussion: https://lore.kernel.org/all/Yp7h0Jv6vpgt6xdZ@infradead.org/T/#u 2) In the previous v1 to v3, it's basically further action after arm64 has converted to GENERIC_IOREMAP way in below patchset. It's done by adding hook ioremap_allowed() and iounmap_allowed() in ARCH to add ARCH specific handling the middle of ioremap_prot() and iounmap(). [PATCH v5 0/6] arm64: Cleanup ioremap() and support ioremap_prot() https://lore.kernel.org/all/20220607125027.44946-1-wangkefeng.wang@huawei.com/T/#u Later, during v3 reviewing, Christophe Leroy suggested to introduce generic_ioremap_prot() and generic_iounmap() to generic codes, and ARCH can provide wrapper function ioremap_prot(), ioremap() or iounmap() if needed. Christophe made a RFC patchset as below to specially demonstrate his idea. This is what v4 and now v5 is doing. [RFC PATCH 0/8] mm: ioremap: Convert architectures to take GENERIC_IOREMAP way https://lore.kernel.org/all/cover.1665568707.git.christophe.leroy@csgroup.eu/T/#u Testing: ======== In v8, I only applied this patchset onto the latest linus's tree to build and run on arm64 and s390. This patch (of 19): Let's use '#define ioremap_xx' and "#ifdef ioremap_xx" instead. To remove defined ARCH_HAS_IOREMAP_xx macros in <asm/io.h> of each ARCH, the ARCH's own ioremap_wc\|wt\|np definition need be above "#include <asm-generic/iomap.h>. Otherwise the redefinition error would be seen during compiling. So the relevant adjustments are made to avoid compiling error: loongarch: - doesn't include <asm-generic/iomap.h>, defining ARCH_HAS_IOREMAP_WC is redundant, so simply remove it. m68k: - selected GENERIC_IOMAP, <asm-generic/iomap.h> has been added in <asm-generic/io.h>, and <asm/kmap.h> is included above <asm-generic/iomap.h>, so simply remove ARCH_HAS_IOREMAP_WT defining. mips: - move "#include <asm-generic/iomap.h>" below ioremap_wc definition in <asm/io.h> powerpc: - remove "#include <asm-generic/iomap.h>" in <asm/io.h> because it's duplicated with the one in <asm-generic/io.h>, let's rely on the latter. x86: - selected GENERIC_IOMAP, remove #include <asm-generic/iomap.h> in the middle of <asm/io.h>. Let's rely on <asm-generic/io.h>. Link: https://lkml.kernel.org/r/20230706154520.11257-2-bhe@redhat.com Signed-off-by: Baoquan He <bhe@redhat.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David Laight <David.Laight@ACULAB.COM> Cc: Helge Deller <deller@gmx.de> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Niklas Schnelle <schnelle@linux.ibm.com> Cc: Stafford Horne <shorne@gmail.com> Cc: Brian Cain <bcain@quicinc.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Chris Zankel <chris@zankel.net> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Jonas Bonn <jonas@southpole.se> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Rich Felker <dalias@libc.org> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-08-18 10:12:32 -07:00
Kemeng Shi	6d144436d9	mm/page_table_check: remove unused parameter in [__]page_table_check_pud_set Remove unused addr in __page_table_check_pud_set and page_table_check_pud_set. Link: https://lkml.kernel.org/r/20230713172636.1705415-9-shikemeng@huaweicloud.com Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-08-18 10:12:29 -07:00
Kemeng Shi	a3b837130b	mm/page_table_check: remove unused parameter in [__]page_table_check_pmd_set Remove unused addr in __page_table_check_pmd_set and page_table_check_pmd_set. Link: https://lkml.kernel.org/r/20230713172636.1705415-8-shikemeng@huaweicloud.com Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-08-18 10:12:29 -07:00
Kemeng Shi	1066293d42	mm/page_table_check: remove unused parameter in [__]page_table_check_pte_set Remove unused addr in __page_table_check_pte_set and page_table_check_pte_set. Link: https://lkml.kernel.org/r/20230713172636.1705415-7-shikemeng@huaweicloud.com Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-08-18 10:12:29 -07:00
Kemeng Shi	931c38e164	mm/page_table_check: remove unused parameter in [__]page_table_check_pud_clear Remove unused addr in __page_table_check_pud_clear and page_table_check_pud_clear. Link: https://lkml.kernel.org/r/20230713172636.1705415-6-shikemeng@huaweicloud.com Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-08-18 10:12:28 -07:00
Kemeng Shi	1831414cd7	mm/page_table_check: remove unused parameter in [__]page_table_check_pmd_clear Remove unused addr in page_table_check_pmd_clear and __page_table_check_pmd_clear. Link: https://lkml.kernel.org/r/20230713172636.1705415-5-shikemeng@huaweicloud.com Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-08-18 10:12:28 -07:00
Kemeng Shi	aa232204c4	mm/page_table_check: remove unused parameter in [__]page_table_check_pte_clear Remove unused addr in page_table_check_pte_clear and __page_table_check_pte_clear. Link: https://lkml.kernel.org/r/20230713172636.1705415-4-shikemeng@huaweicloud.com Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-08-18 10:12:28 -07:00
Yazen Ghannam	4240e2ebe6	x86/MCE: Always save CS register on AMD Zen IF Poison errors The Instruction Fetch (IF) units on current AMD Zen-based systems do not guarantee a synchronous #MC is delivered for poison consumption errors. Therefore, MCG_STATUS[EIPV\|RIPV] will not be set. However, the microarchitecture does guarantee that the exception is delivered within the same context. In other words, the exact rIP is not known, but the context is known to not have changed. There is no architecturally-defined method to determine this behavior. The Code Segment (CS) register is always valid on such IF unit poison errors regardless of the value of MCG_STATUS[EIPV\|RIPV]. Add a quirk to save the CS register for poison consumption from the IF unit banks. This is needed to properly determine the context of the error. Otherwise, the severity grading function will assume the context is IN_KERNEL due to the m->cs value being 0 (the initialized value). This leads to unnecessary kernel panics on data poison errors due to the kernel believing the poison consumption occurred in kernel context. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230814200853.29258-1-yazen.ghannam@amd.com	2023-08-18 13:05:52 +02:00
Borislav Petkov (AMD)	6405b72e8d	x86/srso: Correct the mitigation status when SMT is disabled Specify how is SRSO mitigated when SMT is disabled. Also, correct the SMT check for that. Fixes: e9fbc47b818b ("x86/srso: Disable the mitigation on unaffected configurations") Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lore.kernel.org/r/20230814200813.p5czl47zssuej7nv@treble	2023-08-18 12:43:10 +02:00
Sean Christopherson	9717efbe5b	KVM: x86: Disallow guest CPUID lookups when IRQs are disabled Now that KVM has a framework for caching guest CPUID feature flags, add a "rule" that IRQs must be enabled when doing guest CPUID lookups, and enforce the rule via a lockdep assertion. CPUID lookups are slow, and within KVM, IRQs are only ever disabled in hot paths, e.g. the core run loop, fast page fault handling, etc. I.e. querying guest CPUID with IRQs disabled, especially in the run loop, should be avoided. Link: https://lore.kernel.org/r/20230815203653.519297-16-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2023-08-17 11:43:32 -07:00
Sean Christopherson	ee785c870d	KVM: nSVM: Use KVM-governed feature framework to track "vNMI enabled" Track "virtual NMI exposed to L1" via a governed feature flag instead of using a dedicated bit/flag in vcpu_svm. Note, checking KVM's capabilities instead of the "vnmi" param means that the code isn't strictly equivalent, as vnmi_enabled could have been set if nested=false where as that the governed feature cannot. But that's a glorified nop as the feature/flag is consumed only by paths that are gated by nSVM being enabled. Link: https://lore.kernel.org/r/20230815203653.519297-15-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2023-08-17 11:43:31 -07:00
Sean Christopherson	b89456aee7	KVM: nSVM: Use KVM-governed feature framework to track "vGIF enabled" Track "virtual GIF exposed to L1" via a governed feature flag instead of using a dedicated bit/flag in vcpu_svm. Note, checking KVM's capabilities instead of the "vgif" param means that the code isn't strictly equivalent, as vgif_enabled could have been set if nested=false where as that the governed feature cannot. But that's a glorified nop as the feature/flag is consumed only by paths that are Link: https://lore.kernel.org/r/20230815203653.519297-14-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2023-08-17 11:43:31 -07:00
Sean Christopherson	59d67fc1f0	KVM: nSVM: Use KVM-governed feature framework to track "Pause Filter enabled" Track "Pause Filtering is exposed to L1" via governed feature flags instead of using dedicated bits/flags in vcpu_svm. No functional change intended. Reviewed-by: Yuan Yao <yuan.yao@intel.com> Link: https://lore.kernel.org/r/20230815203653.519297-13-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2023-08-17 11:43:30 -07:00
Sean Christopherson	e183d17ac3	KVM: nSVM: Use KVM-governed feature framework to track "LBRv enabled" Track "LBR virtualization exposed to L1" via a governed feature flag instead of using a dedicated bit/flag in vcpu_svm. Note, checking KVM's capabilities instead of the "lbrv" param means that the code isn't strictly equivalent, as lbrv_enabled could have been set if nested=false where as that the governed feature cannot. But that's a glorified nop as the feature/flag is consumed only by paths that are gated by nSVM being enabled. Link: https://lore.kernel.org/r/20230815203653.519297-12-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2023-08-17 11:43:30 -07:00
Sean Christopherson	4d2a1560ff	KVM: nSVM: Use KVM-governed feature framework to track "vVM{SAVE,LOAD} enabled" Track "virtual VMSAVE/VMLOAD exposed to L1" via a governed feature flag instead of using a dedicated bit/flag in vcpu_svm. Opportunistically add a comment explaining why KVM disallows virtual VMLOAD/VMSAVE when the vCPU model is Intel. No functional change intended. Link: https://lore.kernel.org/r/20230815203653.519297-11-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2023-08-17 11:43:29 -07:00
Sean Christopherson	4365a45571	KVM: nSVM: Use KVM-governed feature framework to track "TSC scaling enabled" Track "TSC scaling exposed to L1" via a governed feature flag instead of using a dedicated bit/flag in vcpu_svm. Note, this fixes a benign bug where KVM would mark TSC scaling as exposed to L1 even if overall nested SVM supported is disabled, i.e. KVM would let L1 write MSR_AMD64_TSC_RATIO even when KVM didn't advertise TSCRATEMSR support to userspace. Reviewed-by: Yuan Yao <yuan.yao@intel.com> Link: https://lore.kernel.org/r/20230815203653.519297-10-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2023-08-17 11:43:28 -07:00
Sean Christopherson	7a6a6a3bf5	KVM: nSVM: Use KVM-governed feature framework to track "NRIPS enabled" Track "NRIPS exposed to L1" via a governed feature flag instead of using a dedicated bit/flag in vcpu_svm. No functional change intended. Reviewed-by: Yuan Yao <yuan.yao@intel.com> Link: https://lore.kernel.org/r/20230815203653.519297-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2023-08-17 11:43:28 -07:00
Sean Christopherson	1c18efdaa3	KVM: nVMX: Use KVM-governed feature framework to track "nested VMX enabled" Track "VMX exposed to L1" via a governed feature flag instead of using a dedicated helper to provide the same functionality. The main goal is to drive convergence between VMX and SVM with respect to querying features that are controllable via module param (SVM likes to cache nested features), avoiding the guest CPUID lookups at runtime is just a bonus and unlikely to provide any meaningful performance benefits. Note, X86_FEATURE_VMX is set in kvm_cpu_caps if and only if "nested" is true, and the CPU obviously supports VMX if KVM+VMX is running. I.e. the check on "nested" is now implicitly down by the kvm_cpu_cap_has() check in kvm_governed_feature_check_and_set(). No functional change intended. Reviewed-by: Yuan Yao <yuan.yao@intel.com> Reviwed-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20230815203653.519297-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2023-08-17 11:40:55 -07:00
Sean Christopherson	fe60e8f65f	KVM: x86: Use KVM-governed feature framework to track "XSAVES enabled" Use the governed feature framework to track if XSAVES is "enabled", i.e. if XSAVES can be used by the guest. Add a comment in the SVM code to explain the very unintuitive logic of deliberately NOT checking if XSAVES is enumerated in the guest CPUID model. No functional change intended. Reviewed-by: Yuan Yao <yuan.yao@intel.com> Link: https://lore.kernel.org/r/20230815203653.519297-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2023-08-17 11:38:28 -07:00
Sean Christopherson	662f681578	KVM: VMX: Rename XSAVES control to follow KVM's preferred "ENABLE_XYZ" Rename the XSAVES secondary execution control to follow KVM's preferred style so that XSAVES related logic can use common macros that depend on KVM's preferred style. No functional change intended. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20230815203653.519297-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2023-08-17 11:38:28 -07:00
Sean Christopherson	0497d2ac9b	KVM: VMX: Check KVM CPU caps, not just VMX MSR support, for XSAVE enabling Check KVM CPU capabilities instead of raw VMX support for XSAVES when determining whether or not XSAVER can/should be exposed to the guest. Practically speaking, it's nonsensical/impossible for a CPU to support "enable XSAVES" without XSAVES being supported natively. The real motivation for checking kvm_cpu_cap_has() is to allow using the governed feature's standard check-and-set logic. Reviewed-by: Yuan Yao <yuan.yao@intel.com> Link: https://lore.kernel.org/r/20230815203653.519297-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2023-08-17 11:38:28 -07:00
Sean Christopherson	1143c0b85c	KVM: VMX: Recompute "XSAVES enabled" only after CPUID update Recompute whether or not XSAVES is enabled for the guest only if the guest's CPUID model changes instead of redoing the computation every time KVM generates vmcs01's secondary execution controls. The boot_cpu_has() and cpu_has_vmx_xsaves() checks should never change after KVM is loaded, and if they do the kernel/KVM is hosed. Opportunistically add a comment explaining _why_ XSAVES is effectively exposed to the guest if and only if XSAVE is also exposed to the guest. Practically speaking, no functional change intended (KVM will do fewer computations, but should still see the same xsaves_enabled value whenever KVM looks at it). Reviewed-by: Yuan Yao <yuan.yao@intel.com> Link: https://lore.kernel.org/r/20230815203653.519297-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2023-08-17 11:38:28 -07:00
Sean Christopherson	ccf31d6e6c	KVM: x86/mmu: Use KVM-governed feature framework to track "GBPAGES enabled" Use the governed feature framework to track whether or not the guest can use 1GiB pages, and drop the one-off helper that wraps the surprisingly non-trivial logic surrounding 1GiB page usage in the guest. No functional change intended. Reviewed-by: Yuan Yao <yuan.yao@intel.com> Link: https://lore.kernel.org/r/20230815203653.519297-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2023-08-17 11:38:27 -07:00
Sean Christopherson	42764413d1	KVM: x86: Add a framework for enabling KVM-governed x86 features Introduce yet another X86_FEATURE flag framework to manage and cache KVM governed features (for lack of a better name). "Governed" in this case means that KVM has some level of involvement and/or vested interest in whether or not an X86_FEATURE can be used by the guest. The intent of the framework is twofold: to simplify caching of guest CPUID flags that KVM needs to frequently query, and to add clarity to such caching, e.g. it isn't immediately obvious that SVM's bundle of flags for "optional nested SVM features" track whether or not a flag is exposed to L1. Begrudgingly define KVM_MAX_NR_GOVERNED_FEATURES for the size of the bitmap to avoid exposing governed_features.h in arch/x86/include/asm/, but add a FIXME to call out that it can and should be cleaned up once "struct kvm_vcpu_arch" is no longer expose to the kernel at large. Cc: Zeng Guang <guang.zeng@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Yuan Yao <yuan.yao@intel.com> Link: https://lore.kernel.org/r/20230815203653.519297-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2023-08-17 11:38:27 -07:00
Manali Shukla	f67063414c	KVM: SVM: correct the size of spec_ctrl field in VMCB save area Correct the spec_ctrl field in the VMCB save area based on the AMD Programmer's manual. Originally, the spec_ctrl was listed as u32 with 4 bytes of reserved area. The AMD Programmer's Manual now lists the spec_ctrl as 8 bytes in VMCB save area. The Public Processor Programming reference for Genoa, shows SPEC_CTRL as 64b register, but the AMD Programmer's Manual lists SPEC_CTRL as 32b register. This discrepancy will be cleaned up in next revision of the AMD Programmer's Manual. Since remaining bits above bit 7 are reserved bits in SPEC_CTRL MSR and thus, not being used, the spec_ctrl added as u32 in the VMCB save area is currently not an issue. Fixes: 3dd2775b74c9 ("KVM: SVM: Create a separate mapping for the SEV-ES save area") Suggested-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Manali Shukla <manali.shukla@amd.com> Link: https://lore.kernel.org/r/20230717041903.85480-1-manali.shukla@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2023-08-17 11:36:24 -07:00
Li zeming	392a532462	x86: kvm: x86: Remove unnecessary initial values of variables bitmap and khz is assigned first, so it does not need to initialize the assignment. Signed-off-by: Li zeming <zeming@nfschina.com> Link: https://lore.kernel.org/r/20230817002631.2885-1-zeming@nfschina.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2023-08-17 11:35:28 -07:00
Shiyuan Gao	7d18eef136	KVM: VMX: Rename vmx_get_max_tdp_level() to vmx_get_max_ept_level() In VMX, ept_level looks better than tdp_level and is consistent with SVM's get_npt_level(). Signed-off-by: Shiyuan Gao <gaoshiyuan@baidu.com> Link: https://lore.kernel.org/r/20230810113853.98114-1-gaoshiyuan@baidu.com [sean: massage changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>	2023-08-17 11:34:04 -07:00
Sean Christopherson	f3cebc75e7	KVM: SVM: Set target pCPU during IRTE update if target vCPU is running Update the target pCPU for IOMMU doorbells when updating IRTE routing if KVM is actively running the associated vCPU. KVM currently only updates the pCPU when loading the vCPU (via avic_vcpu_load()), and so doorbell events will be delayed until the vCPU goes through a put+load cycle (which might very well "never" happen for the lifetime of the VM). To avoid inserting a stale pCPU, e.g. due to racing between updating IRTE routing and vCPU load/put, get the pCPU information from the vCPU's Physical APIC ID table entry (a.k.a. avic_physical_id_cache in KVM) and update the IRTE while holding ir_list_lock. Add comments with --verbose enabled to explain exactly what is and isn't protected by ir_list_lock. Fixes: 411b44ba80ab ("svm: Implements update_pi_irte hook to setup posted interrupt") Reported-by: dengqiao.joey <dengqiao.joey@bytedance.com> Cc: stable@vger.kernel.org Cc: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Maxim Levitsky <mlevitsk@redhat.com> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Link: https://lore.kernel.org/r/20230808233132.2499764-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2023-08-17 11:32:07 -07:00
Sean Christopherson	4c08e737f0	KVM: SVM: Take and hold ir_list_lock when updating vCPU's Physical ID entry Hoist the acquisition of ir_list_lock from avic_update_iommu_vcpu_affinity() to its two callers, avic_vcpu_load() and avic_vcpu_put(), specifically to encapsulate the write to the vCPU's entry in the AVIC Physical ID table. This will allow a future fix to pull information from the Physical ID entry when updating the IRTE, without potentially consuming stale information, i.e. without racing with the vCPU being (un)loaded. Add a comment to call out that ir_list_lock does NOT protect against multiple writers, specifically that reading the Physical ID entry in avic_vcpu_put() outside of the lock is safe. To preserve some semblance of independence from ir_list_lock, keep the READ_ONCE() in avic_vcpu_load() even though acuiring the spinlock effectively ensures the load(s) will be generated after acquiring the lock. Cc: stable@vger.kernel.org Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Link: https://lore.kernel.org/r/20230808233132.2499764-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2023-08-17 11:31:37 -07:00
Sean Christopherson	7b0151caf7	KVM: x86: Remove WARN sanity check on hypervisor timer vs. UNINITIALIZED vCPU Drop the WARN in KVM_RUN that asserts that KVM isn't using the hypervisor timer, a.k.a. the VMX preemption timer, for a vCPU that is in the UNINITIALIZIED activity state. The intent of the WARN is to sanity check that KVM won't drop a timer interrupt due to an unexpected transition to UNINITIALIZED, but unfortunately userspace can use various ioctl()s to force the unexpected state. Drop the sanity check instead of switching from the hypervisor timer to a software based timer, as the only reason to switch to a software timer when a vCPU is blocking is to ensure the timer interrupt wakes the vCPU, but said interrupt isn't a valid wake event for vCPUs in UNINITIALIZED state and the interrupt will be dropped in the end. Reported-by: Yikebaer Aizezi <yikebaer61@gmail.com> Closes: https://lore.kernel.org/all/CALcu4rbFrU4go8sBHk3FreP+qjgtZCGcYNpSiEXOLm==qFv7iQ@mail.gmail.com Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20230808232057.2498287-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2023-08-17 11:30:43 -07:00
Like Xu	765da7fe0e	KVM: x86: Remove break statements that will never be executed Fix compiler warnings when compiling KVM with [-Wunreachable-code-break]. No functional change intended. Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Like Xu <likexu@tencent.com> Link: https://lore.kernel.org/r/20230807094243.32516-1-likexu@tencent.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2023-08-17 11:28:00 -07:00
Sean Christopherson	3e1efe2b67	KVM: Wrap kvm_{gfn,hva}_range.pte in a per-action union Wrap kvm_{gfn,hva}_range.pte in a union so that future notifier events can pass event specific information up and down the stack without needing to constantly expand and churn the APIs. Lockless aging of SPTEs will pass around a bitmap, and support for memory attributes will pass around the new attributes for the range. Add a "KVM_NO_ARG" placeholder to simplify handling events without an argument (creating a dummy union variable is midly annoying). Opportunstically drop explicit zero-initialization of the "pte" field, as omitting the field (now a union) has the same effect. Cc: Yu Zhao <yuzhao@google.com> Link: https://lore.kernel.org/all/CAOUHufagkd2Jk3_HrVoFFptRXM=hX2CV8f+M-dka-hJU4bP8kw@mail.gmail.com Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Acked-by: Yu Zhao <yuzhao@google.com> Link: https://lore.kernel.org/r/20230729004144.1054885-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2023-08-17 11:26:53 -07:00
Josh Poimboeuf	c6cfcbd8ca	x86/ibt: Convert IBT selftest to asm The following warning is reported when frame pointers and kernel IBT are enabled: vmlinux.o: warning: objtool: ibt_selftest+0x11: sibling call from callable instruction with modified stack frame The problem is that objtool interprets the indirect branch in ibt_selftest() as a sibling call, and GCC inserts a (partial) frame pointer prologue before it: 0000 000000000003f550 <ibt_selftest>: 0000 3f550: f3 0f 1e fa endbr64 0004 3f554: e8 00 00 00 00 call 3f559 <ibt_selftest+0x9> 3f555: R_X86_64_PLT32 __fentry__-0x4 0009 3f559: 55 push %rbp 000a 3f55a: 48 8d 05 02 00 00 00 lea 0x2(%rip),%rax # 3f563 <ibt_selftest_ip> 0011 3f561: ff e0 jmp *%rax Note the inline asm is missing ASM_CALL_CONSTRAINT, so the 'push %rbp' happens before the indirect branch and the 'mov %rsp, %rbp' happens afterwards. Simplify the generated code and make it easier to understand for both tools and humans by moving the selftest to proper asm. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/99a7e16b97bda97bf0a04aa141d6241cd8a839a2.1680912949.git.jpoimboe@kernel.org	2023-08-17 17:07:09 +02:00
Peter Zijlstra	5409730962	x86/static_call: Fix __static_call_fixup() Christian reported spurious module load crashes after some of Song's module memory layout patches. Turns out that if the very last instruction on the very last page of the module is a 'JMP __x86_return_thunk' then __static_call_fixup() will trip a fault and die. And while the module rework made this slightly more likely to happen, it's always been possible. Fixes: ee88d363d156 ("x86,static_call: Use alternative RET encoding") Reported-by: Christian Bricart <christian@bricart.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lkml.kernel.org/r/20230816104419.GA982867@hirez.programming.kicks-ass.net	2023-08-17 13:24:09 +02:00
David Matlack	619b507244	KVM: Move kvm_arch_flush_remote_tlbs_memslot() to common code Move kvm_arch_flush_remote_tlbs_memslot() to common code and drop "arch_" from the name. kvm_arch_flush_remote_tlbs_memslot() is just a range-based TLB invalidation where the range is defined by the memslot. Now that kvm_flush_remote_tlbs_range() can be called from common code we can just use that and drop a bunch of duplicate code from the arch directories. Note this adds a lockdep assertion for slots_lock being held when calling kvm_flush_remote_tlbs_memslot(), which was previously only asserted on x86. MIPS has calls to kvm_flush_remote_tlbs_memslot(), but they all hold the slots_lock, so the lockdep assertion continues to hold true. Also drop the CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT ifdef gating kvm_flush_remote_tlbs_memslot(), since it is no longer necessary. Signed-off-by: David Matlack <dmatlack@google.com> Signed-off-by: Raghavendra Rao Ananta <rananta@google.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Shaoqin Huang <shahuang@redhat.com> Acked-by: Anup Patel <anup@brainfault.org> Acked-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230811045127.3308641-7-rananta@google.com	2023-08-17 09:40:35 +01:00
David Matlack	d478899605	KVM: Allow range-based TLB invalidation from common code Make kvm_flush_remote_tlbs_range() visible in common code and create a default implementation that just invalidates the whole TLB. This paves the way for several future features/cleanups: - Introduction of range-based TLBI on ARM. - Eliminating kvm_arch_flush_remote_tlbs_memslot() - Moving the KVM/x86 TDP MMU to common code. No functional change intended. Signed-off-by: David Matlack <dmatlack@google.com> Signed-off-by: Raghavendra Rao Ananta <rananta@google.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Shaoqin Huang <shahuang@redhat.com> Reviewed-by: Anup Patel <anup@brainfault.org> Acked-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230811045127.3308641-6-rananta@google.com	2023-08-17 09:40:35 +01:00

... 2 3 4 5 6 ...

44763 Commits