1407 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Linus Torvalds
|
6c1b980a7e |
dma-maping updates for Linux 6.6
- allow dynamic sizing of the swiotlb buffer, to cater for secure virtualization workloads that require all I/O to be bounce buffered (Petr Tesarik) - move a declaration to a header (Arnd Bergmann) - check for memory region overlap in dma-contiguous (Binglei Wang) - remove the somewhat dangerous runtime swiotlb-xen enablement and unexport is_swiotlb_active (Christoph Hellwig, Juergen Gross) - per-node CMA improvements (Yajun Deng) -----BEGIN PGP SIGNATURE----- iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmTuDHkLHGhjaEBsc3Qu ZGUACgkQD55TZVIEUYOqvhAApMk2/ceTgVH17sXaKE822+xKvgv377O6TlggMeGG W4zA0KD69DNz0AfaaCc5U5f7n8Ld/YY1RsvkHW4b3jgw+KRTeQr0jjitBgP5kP2M A1+qxdyJpCTwiPt9s2+JFVPeyZ0s52V6OJODKRG3s0ore55R+U09VySKtASON+q3 GMKfWqQteKC+thg7NkrQ7JUixuo84oICws+rZn4K9ifsX2O0HYW6aMW0feRfZjJH r0TgqZc4RdPTSaF22oapR9Ls39+7hp/pBvoLm5sBNA3cl5C3X4VWo9ERMU1jW9h+ VYQv39NycUspgskWJmpbU06/+ooYqQlwHSR/vdNusmFIvxo4tf6/UX72YO5F8Dar ap0wYGauiEwTjSnhVxPTXk3obWyWEsgFAeRnPdTlH2CNmv38QZU2HLb8eU1pcXxX j+WI2Ewy9z22uBVYiPOKpdW1jkSfmlmfPp/8SbAdua7I3YQ90rQN6AvU06zAi/cL NQTgO81E4jPkygqAVgS/LeYziWAQ73yM7m9ExThtTgqFtHortwhJ4Fd8XKtvtvEb viXAZ/WZtQBv/CIKAW98NhgIDP/SPOT8ym6V35WK+kkNFMS6LMSQUfl9GgbHGyFa n9icMm7BmbDtT1+AKNafG9En4DtAf9M9QNidAVOyfrsIk6S0gZoZwvIStkA7on8a cNY= =kVVr -----END PGP SIGNATURE----- Merge tag 'dma-mapping-6.6-2023-08-29' of git://git.infradead.org/users/hch/dma-mapping Pull dma-maping updates from Christoph Hellwig: - allow dynamic sizing of the swiotlb buffer, to cater for secure virtualization workloads that require all I/O to be bounce buffered (Petr Tesarik) - move a declaration to a header (Arnd Bergmann) - check for memory region overlap in dma-contiguous (Binglei Wang) - remove the somewhat dangerous runtime swiotlb-xen enablement and unexport is_swiotlb_active (Christoph Hellwig, Juergen Gross) - per-node CMA improvements (Yajun Deng) * tag 'dma-mapping-6.6-2023-08-29' of git://git.infradead.org/users/hch/dma-mapping: swiotlb: optimize get_max_slots() swiotlb: move slot allocation explanation comment where it belongs swiotlb: search the software IO TLB only if the device makes use of it swiotlb: allocate a new memory pool when existing pools are full swiotlb: determine potential physical address limit swiotlb: if swiotlb is full, fall back to a transient memory pool swiotlb: add a flag whether SWIOTLB is allowed to grow swiotlb: separate memory pool data from other allocator data swiotlb: add documentation and rename swiotlb_do_find_slots() swiotlb: make io_tlb_default_mem local to swiotlb.c swiotlb: bail out of swiotlb_init_late() if swiotlb is already allocated dma-contiguous: check for memory region overlap dma-contiguous: support numa CMA for specified node dma-contiguous: support per-numa CMA for all architectures dma-mapping: move arch_dma_set_mask() declaration to header swiotlb: unexport is_swiotlb_active x86: always initialize xen-swiotlb when xen-pcifront is enabling xen/pci: add flag for PCI passthrough being possible |
||
Linus Torvalds
|
b96a3e9142 |
- Some swap cleanups from Ma Wupeng ("fix WARN_ON in add_to_avail_list")
- Peter Xu has a series (mm/gup: Unify hugetlb, speed up thp") which reduces the special-case code for handling hugetlb pages in GUP. It also speeds up GUP handling of transparent hugepages. - Peng Zhang provides some maple tree speedups ("Optimize the fast path of mas_store()"). - Sergey Senozhatsky has improved te performance of zsmalloc during compaction (zsmalloc: small compaction improvements"). - Domenico Cerasuolo has developed additional selftest code for zswap ("selftests: cgroup: add zswap test program"). - xu xin has doe some work on KSM's handling of zero pages. These changes are mainly to enable the user to better understand the effectiveness of KSM's treatment of zero pages ("ksm: support tracking KSM-placed zero-pages"). - Jeff Xu has fixes the behaviour of memfd's MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED sysctl ("mm/memfd: fix sysctl MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED"). - David Howells has fixed an fscache optimization ("mm, netfs, fscache: Stop read optimisation when folio removed from pagecache"). - Axel Rasmussen has given userfaultfd the ability to simulate memory poisoning ("add UFFDIO_POISON to simulate memory poisoning with UFFD"). - Miaohe Lin has contributed some routine maintenance work on the memory-failure code ("mm: memory-failure: remove unneeded PageHuge() check"). - Peng Zhang has contributed some maintenance work on the maple tree code ("Improve the validation for maple tree and some cleanup"). - Hugh Dickins has optimized the collapsing of shmem or file pages into THPs ("mm: free retracted page table by RCU"). - Jiaqi Yan has a patch series which permits us to use the healthy subpages within a hardware poisoned huge page for general purposes ("Improve hugetlbfs read on HWPOISON hugepages"). - Kemeng Shi has done some maintenance work on the pagetable-check code ("Remove unused parameters in page_table_check"). - More folioification work from Matthew Wilcox ("More filesystem folio conversions for 6.6"), ("Followup folio conversions for zswap"). And from ZhangPeng ("Convert several functions in page_io.c to use a folio"). - page_ext cleanups from Kemeng Shi ("minor cleanups for page_ext"). - Baoquan He has converted some architectures to use the GENERIC_IOREMAP ioremap()/iounmap() code ("mm: ioremap: Convert architectures to take GENERIC_IOREMAP way"). - Anshuman Khandual has optimized arm64 tlb shootdown ("arm64: support batched/deferred tlb shootdown during page reclamation/migration"). - Better maple tree lockdep checking from Liam Howlett ("More strict maple tree lockdep"). Liam also developed some efficiency improvements ("Reduce preallocations for maple tree"). - Cleanup and optimization to the secondary IOMMU TLB invalidation, from Alistair Popple ("Invalidate secondary IOMMU TLB on permission upgrade"). - Ryan Roberts fixes some arm64 MM selftest issues ("selftests/mm fixes for arm64"). - Kemeng Shi provides some maintenance work on the compaction code ("Two minor cleanups for compaction"). - Some reduction in mmap_lock pressure from Matthew Wilcox ("Handle most file-backed faults under the VMA lock"). - Aneesh Kumar contributes code to use the vmemmap optimization for DAX on ppc64, under some circumstances ("Add support for DAX vmemmap optimization for ppc64"). - page-ext cleanups from Kemeng Shi ("add page_ext_data to get client data in page_ext"), ("minor cleanups to page_ext header"). - Some zswap cleanups from Johannes Weiner ("mm: zswap: three cleanups"). - kmsan cleanups from ZhangPeng ("minor cleanups for kmsan"). - VMA handling cleanups from Kefeng Wang ("mm: convert to vma_is_initial_heap/stack()"). - DAMON feature work from SeongJae Park ("mm/damon/sysfs-schemes: implement DAMOS tried total bytes file"), ("Extend DAMOS filters for address ranges and DAMON monitoring targets"). - Compaction work from Kemeng Shi ("Fixes and cleanups to compaction"). - Liam Howlett has improved the maple tree node replacement code ("maple_tree: Change replacement strategy"). - ZhangPeng has a general code cleanup - use the K() macro more widely ("cleanup with helper macro K()"). - Aneesh Kumar brings memmap-on-memory to ppc64 ("Add support for memmap on memory feature on ppc64"). - pagealloc cleanups from Kemeng Shi ("Two minor cleanups for pcp list in page_alloc"), ("Two minor cleanups for get pageblock migratetype"). - Vishal Moola introduces a memory descriptor for page table tracking, "struct ptdesc" ("Split ptdesc from struct page"). - memfd selftest maintenance work from Aleksa Sarai ("memfd: cleanups for vm.memfd_noexec"). - MM include file rationalization from Hugh Dickins ("arch: include asm/cacheflush.h in asm/hugetlb.h"). - THP debug output fixes from Hugh Dickins ("mm,thp: fix sloppy text output"). - kmemleak improvements from Xiaolei Wang ("mm/kmemleak: use object_cache instead of kmemleak_initialized"). - More folio-related cleanups from Matthew Wilcox ("Remove _folio_dtor and _folio_order"). - A VMA locking scalability improvement from Suren Baghdasaryan ("Per-VMA lock support for swap and userfaults"). - pagetable handling cleanups from Matthew Wilcox ("New page table range API"). - A batch of swap/thp cleanups from David Hildenbrand ("mm/swap: stop using page->private on tail pages for THP_SWAP + cleanups"). - Cleanups and speedups to the hugetlb fault handling from Matthew Wilcox ("Change calling convention for ->huge_fault"). - Matthew Wilcox has also done some maintenance work on the MM subsystem documentation ("Improve mm documentation"). -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZO1JUQAKCRDdBJ7gKXxA jrMwAP47r/fS8vAVT3zp/7fXmxaJYTK27CTAM881Gw1SDhFM/wEAv8o84mDenCg6 Nfio7afS1ncD+hPYT8947UnLxTgn+ww= =Afws -----END PGP SIGNATURE----- Merge tag 'mm-stable-2023-08-28-18-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Some swap cleanups from Ma Wupeng ("fix WARN_ON in add_to_avail_list") - Peter Xu has a series (mm/gup: Unify hugetlb, speed up thp") which reduces the special-case code for handling hugetlb pages in GUP. It also speeds up GUP handling of transparent hugepages. - Peng Zhang provides some maple tree speedups ("Optimize the fast path of mas_store()"). - Sergey Senozhatsky has improved te performance of zsmalloc during compaction (zsmalloc: small compaction improvements"). - Domenico Cerasuolo has developed additional selftest code for zswap ("selftests: cgroup: add zswap test program"). - xu xin has doe some work on KSM's handling of zero pages. These changes are mainly to enable the user to better understand the effectiveness of KSM's treatment of zero pages ("ksm: support tracking KSM-placed zero-pages"). - Jeff Xu has fixes the behaviour of memfd's MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED sysctl ("mm/memfd: fix sysctl MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED"). - David Howells has fixed an fscache optimization ("mm, netfs, fscache: Stop read optimisation when folio removed from pagecache"). - Axel Rasmussen has given userfaultfd the ability to simulate memory poisoning ("add UFFDIO_POISON to simulate memory poisoning with UFFD"). - Miaohe Lin has contributed some routine maintenance work on the memory-failure code ("mm: memory-failure: remove unneeded PageHuge() check"). - Peng Zhang has contributed some maintenance work on the maple tree code ("Improve the validation for maple tree and some cleanup"). - Hugh Dickins has optimized the collapsing of shmem or file pages into THPs ("mm: free retracted page table by RCU"). - Jiaqi Yan has a patch series which permits us to use the healthy subpages within a hardware poisoned huge page for general purposes ("Improve hugetlbfs read on HWPOISON hugepages"). - Kemeng Shi has done some maintenance work on the pagetable-check code ("Remove unused parameters in page_table_check"). - More folioification work from Matthew Wilcox ("More filesystem folio conversions for 6.6"), ("Followup folio conversions for zswap"). And from ZhangPeng ("Convert several functions in page_io.c to use a folio"). - page_ext cleanups from Kemeng Shi ("minor cleanups for page_ext"). - Baoquan He has converted some architectures to use the GENERIC_IOREMAP ioremap()/iounmap() code ("mm: ioremap: Convert architectures to take GENERIC_IOREMAP way"). - Anshuman Khandual has optimized arm64 tlb shootdown ("arm64: support batched/deferred tlb shootdown during page reclamation/migration"). - Better maple tree lockdep checking from Liam Howlett ("More strict maple tree lockdep"). Liam also developed some efficiency improvements ("Reduce preallocations for maple tree"). - Cleanup and optimization to the secondary IOMMU TLB invalidation, from Alistair Popple ("Invalidate secondary IOMMU TLB on permission upgrade"). - Ryan Roberts fixes some arm64 MM selftest issues ("selftests/mm fixes for arm64"). - Kemeng Shi provides some maintenance work on the compaction code ("Two minor cleanups for compaction"). - Some reduction in mmap_lock pressure from Matthew Wilcox ("Handle most file-backed faults under the VMA lock"). - Aneesh Kumar contributes code to use the vmemmap optimization for DAX on ppc64, under some circumstances ("Add support for DAX vmemmap optimization for ppc64"). - page-ext cleanups from Kemeng Shi ("add page_ext_data to get client data in page_ext"), ("minor cleanups to page_ext header"). - Some zswap cleanups from Johannes Weiner ("mm: zswap: three cleanups"). - kmsan cleanups from ZhangPeng ("minor cleanups for kmsan"). - VMA handling cleanups from Kefeng Wang ("mm: convert to vma_is_initial_heap/stack()"). - DAMON feature work from SeongJae Park ("mm/damon/sysfs-schemes: implement DAMOS tried total bytes file"), ("Extend DAMOS filters for address ranges and DAMON monitoring targets"). - Compaction work from Kemeng Shi ("Fixes and cleanups to compaction"). - Liam Howlett has improved the maple tree node replacement code ("maple_tree: Change replacement strategy"). - ZhangPeng has a general code cleanup - use the K() macro more widely ("cleanup with helper macro K()"). - Aneesh Kumar brings memmap-on-memory to ppc64 ("Add support for memmap on memory feature on ppc64"). - pagealloc cleanups from Kemeng Shi ("Two minor cleanups for pcp list in page_alloc"), ("Two minor cleanups for get pageblock migratetype"). - Vishal Moola introduces a memory descriptor for page table tracking, "struct ptdesc" ("Split ptdesc from struct page"). - memfd selftest maintenance work from Aleksa Sarai ("memfd: cleanups for vm.memfd_noexec"). - MM include file rationalization from Hugh Dickins ("arch: include asm/cacheflush.h in asm/hugetlb.h"). - THP debug output fixes from Hugh Dickins ("mm,thp: fix sloppy text output"). - kmemleak improvements from Xiaolei Wang ("mm/kmemleak: use object_cache instead of kmemleak_initialized"). - More folio-related cleanups from Matthew Wilcox ("Remove _folio_dtor and _folio_order"). - A VMA locking scalability improvement from Suren Baghdasaryan ("Per-VMA lock support for swap and userfaults"). - pagetable handling cleanups from Matthew Wilcox ("New page table range API"). - A batch of swap/thp cleanups from David Hildenbrand ("mm/swap: stop using page->private on tail pages for THP_SWAP + cleanups"). - Cleanups and speedups to the hugetlb fault handling from Matthew Wilcox ("Change calling convention for ->huge_fault"). - Matthew Wilcox has also done some maintenance work on the MM subsystem documentation ("Improve mm documentation"). * tag 'mm-stable-2023-08-28-18-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (489 commits) maple_tree: shrink struct maple_tree maple_tree: clean up mas_wr_append() secretmem: convert page_is_secretmem() to folio_is_secretmem() nios2: fix flush_dcache_page() for usage from irq context hugetlb: add documentation for vma_kernel_pagesize() mm: add orphaned kernel-doc to the rst files. mm: fix clean_record_shared_mapping_range kernel-doc mm: fix get_mctgt_type() kernel-doc mm: fix kernel-doc warning from tlb_flush_rmaps() mm: remove enum page_entry_size mm: allow ->huge_fault() to be called without the mmap_lock held mm: move PMD_ORDER to pgtable.h mm: remove checks for pte_index memcg: remove duplication detection for mem_cgroup_uncharge_swap mm/huge_memory: work on folio->swap instead of page->private when splitting folio mm/swap: inline folio_set_swap_entry() and folio_swap_entry() mm/swap: use dedicated entry for swap in folio mm/swap: stop using page->private on tail pages for THP_SWAP selftests/mm: fix WARNING comparing pointer to 0 selftests: cgroup: fix test_kmem_memcg_deletion kernel mem check ... |
||
David Hildenbrand
|
cfeed8ffe5 |
mm/swap: stop using page->private on tail pages for THP_SWAP
Patch series "mm/swap: stop using page->private on tail pages for THP_SWAP + cleanups". This series stops using page->private on tail pages for THP_SWAP, replaces folio->private by folio->swap for swapcache folios, and starts using "new_folio" for tail pages that we are splitting to remove the usage of page->private for swapcache handling completely. This patch (of 4): Let's stop using page->private on tail pages, making it possible to just unconditionally reuse that field in the tail pages of large folios. The remaining usage of the private field for THP_SWAP is in the THP splitting code (mm/huge_memory.c), that we'll handle separately later. Update the THP_SWAP documentation and sanity checks in mm_types.h and __split_huge_page_tail(). [david@redhat.com: stop using page->private on tail pages for THP_SWAP] Link: https://lkml.kernel.org/r/6f0a82a3-6948-20d9-580b-be1dbf415701@redhat.com Link: https://lkml.kernel.org/r/20230821160849.531668-1-david@redhat.com Link: https://lkml.kernel.org/r/20230821160849.531668-2-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64] Reviewed-by: Yosry Ahmed <yosryahmed@google.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Peter Xu <peterx@redhat.com> Cc: Seth Jennings <sjenning@redhat.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Qi Zheng
|
00de2c9f26 |
arm64: mm: use ptep_clear() instead of pte_clear() in clear_flush()
In clear_flush(), the original pte may be a present entry, so we should use ptep_clear() to let page_table_check track the pte clearing operation, otherwise it may cause false positive in subsequent set_pte_at(). Link: https://lkml.kernel.org/r/20230810093241.1181142-1-qi.zheng@linux.dev Fixes: 42b2547137f5 ("arm64/mm: enable ARCH_SUPPORTS_PAGE_TABLE_CHECK") Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Acked-by: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Matthew Wilcox (Oracle)
|
4a169d61c2 |
arm64: implement the new page table range API
Add set_ptes(), update_mmu_cache_range() and flush_dcache_folio(). Change the PG_dcache_clean flag from being per-page to per-folio. Link: https://lkml.kernel.org/r/20230802151406.3735276-11-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Suren Baghdasaryan
|
4089eef0e6 |
mm: drop per-VMA lock when returning VM_FAULT_RETRY or VM_FAULT_COMPLETED
handle_mm_fault returning VM_FAULT_RETRY or VM_FAULT_COMPLETED means mmap_lock has been released. However with per-VMA locks behavior is different and the caller should still release it. To make the rules consistent for the caller, drop the per-VMA lock when returning VM_FAULT_RETRY or VM_FAULT_COMPLETED. Currently the only path returning VM_FAULT_RETRY under per-VMA locks is do_swap_page and no path returns VM_FAULT_COMPLETED for now. [willy@infradead.org: fix riscv] Link: https://lkml.kernel.org/r/CAJuCfpE6GWEx1rPBmNpUfoD5o-gNFz9-UFywzCE2PbEGBiVz7g@mail.gmail.com Link: https://lkml.kernel.org/r/20230630211957.1341547-4-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Acked-by: Peter Xu <peterx@redhat.com> Tested-by: Conor Dooley <conor.dooley@microchip.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Hillf Danton <hdanton@sina.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Michel Lespinasse <michel@lespinasse.org> Cc: Minchan Kim <minchan@google.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Punit Agrawal <punit.agrawal@bytedance.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Vishal Moola (Oracle)
|
11b4fa8b2a |
arm64: convert various functions to use ptdescs
As part of the conversions to replace pgtable constructor/destructors with ptdesc equivalents, convert various page table functions to use ptdescs. Link: https://lkml.kernel.org/r/20230807230513.102486-19-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Guo Ren <guoren@kernel.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Jonas Bonn <jonas@southpole.se> Cc: Matthew Wilcox <willy@infradead.org> Cc: Palmer Dabbelt <palmer@rivosinc.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Richard Weinberger <richard@nod.at> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Matthew Wilcox (Oracle)
|
284e059204 |
mm: remove CONFIG_PER_VMA_LOCK ifdefs
Patch series "Handle most file-backed faults under the VMA lock", v3. This patchset adds the ability to handle page faults on parts of files which are already in the page cache without taking the mmap lock. This patch (of 10): Provide lock_vma_under_rcu() when CONFIG_PER_VMA_LOCK is not defined to eliminate ifdefs in the users. Link: https://lkml.kernel.org/r/20230724185410.1124082-1-willy@infradead.org Link: https://lkml.kernel.org/r/20230724185410.1124082-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Cc: Punit Agrawal <punit.agrawal@bytedance.com> Cc: Arjun Roy <arjunroy@google.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Baoquan He
|
8f03d74f71 |
arm64 : mm: add wrapper function ioremap_prot()
Since hook functions ioremap_allowed() and iounmap_allowed() will be obsoleted, add wrapper function ioremap_prot() to contain the the specific handling in addition to generic_ioremap_prot() invocation. Link: https://lkml.kernel.org/r/20230706154520.11257-19-bhe@redhat.com Signed-off-by: Baoquan He <bhe@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Brian Cain <bcain@quicinc.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Chris Zankel <chris@zankel.net> Cc: David Laight <David.Laight@ACULAB.COM> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Jonas Bonn <jonas@southpole.se> Cc: Matthew Wilcox <willy@infradead.org> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Niklas Schnelle <schnelle@linux.ibm.com> Cc: Rich Felker <dalias@libc.org> Cc: Stafford Horne <shorne@gmail.com> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Zhang Jianhua
|
4e0bacd65e |
arm64: fix build warning for ARM64_MEMSTART_SHIFT
When building with W=1, the following warning occurs. arch/arm64/include/asm/kernel-pgtable.h:129:41: error: "PUD_SHIFT" is not defined, evaluates to 0 [-Werror=undef] 129 | #define ARM64_MEMSTART_SHIFT PUD_SHIFT | ^~~~~~~~~ arch/arm64/include/asm/kernel-pgtable.h:142:5: note: in expansion of macro ‘ARM64_MEMSTART_SHIFT’ 142 | #if ARM64_MEMSTART_SHIFT < SECTION_SIZE_BITS | ^~~~~~~~~~~~~~~~~~~~ The generic PUD_SHIFT was defined in include/asm-generic/pgtable-nopud.h, however the #ifndef __ASSEMBLY__ guard in this header file makes it unavailable for assembly files. While someone .S file include the <asm/kernel-pgtable.h>, the build warning would occur. Now move the macro ARM64_MEMSTART_SHIFT and ARM64_MEMSTART_ALIGN to arch/arm64/mm/init.c where it is used only, to avoid this issue. Signed-off-by: Zhang Jianhua <chris.zjh@huawei.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20230804075615.3334756-1-chris.zjh@huawei.com Signed-off-by: Will Deacon <will@kernel.org> |
||
Yajun Deng
|
22e4a348f8 |
dma-contiguous: support per-numa CMA for all architectures
In the commit b7176c261cdb ("dma-contiguous: provide the ability to reserve per-numa CMA"), Barry adds DMA_PERNUMA_CMA for ARM64. But this feature is architecture independent, so support per-numa CMA for all architectures, and enable it by default if NUMA. Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Tested-by: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Christoph Hellwig <hch@lst.de> |
||
Anshuman Khandual
|
d0999555e3 |
arm64/mm: Replace an open coding with ID_AA64MMFR1_EL1_HAFDBS_MASK
Replace '0xf' with ID_AA64MMFR1_EL1_HAFDBS_MASK while evaluating if the cpu supports implicit page table entry access flag update in HW. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20230711090458.238346-1-anshuman.khandual@arm.com Signed-off-by: Will Deacon <will@kernel.org> |
||
Nikhil V
|
a8bd38dbc5 |
arm64: mm: Make hibernation aware of KFENCE
In the restore path, swsusp_arch_suspend_exit uses copy_page() to over-write memory. However, with features like KFENCE enabled, there could be situations where it may have marked some pages as not valid, due to which it could be reported as invalid accesses. Consider a situation where page 'P' was part of the hibernation image. Now, when the resume kernel tries to restore the pages, the same page 'P' is already in use in the resume kernel and is kfence protected, due to which its mapping is removed from linear map. Since restoring pages happens with the resume kernel page tables, we would end up accessing 'P' during copy and results in kernel pagefault. The proposed fix tries to solve this issue by marking PTE as valid for such kfence protected pages. Co-developed-by: Pavankumar Kondeti <quic_pkondeti@quicinc.com> Signed-off-by: Pavankumar Kondeti <quic_pkondeti@quicinc.com> Signed-off-by: Nikhil V <quic_nprakash@quicinc.com> Link: https://lore.kernel.org/r/20230713070757.4093-1-quic_nprakash@quicinc.com Signed-off-by: Will Deacon <will@kernel.org> |
||
SeongJae Park
|
24be4d0b46 |
arch/arm64/mm/fault: Fix undeclared variable error in do_page_fault()
Commit ae870a68b5d1 ("arm64/mm: Convert to using lock_mm_and_find_vma()") made do_page_fault() to use 'vma' even if CONFIG_PER_VMA_LOCK is not defined, but the declaration is still in the ifdef. As a result, building kernel without the config fails with undeclared variable error as below: arch/arm64/mm/fault.c: In function 'do_page_fault': arch/arm64/mm/fault.c:624:2: error: 'vma' undeclared (first use in this function); did you mean 'vmap'? 624 | vma = lock_mm_and_find_vma(mm, addr, regs); | ^~~ | vmap Fix it by moving the declaration out of the ifdef. Fixes: ae870a68b5d1 ("arm64/mm: Convert to using lock_mm_and_find_vma()") Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Linus Torvalds
|
9471f1f2f5 |
Merge branch 'expand-stack'
This modifies our user mode stack expansion code to always take the mmap_lock for writing before modifying the VM layout. It's actually something we always technically should have done, but because we didn't strictly need it, we were being lazy ("opportunistic" sounds so much better, doesn't it?) about things, and had this hack in place where we would extend the stack vma in-place without doing the proper locking. And it worked fine. We just needed to change vm_start (or, in the case of grow-up stacks, vm_end) and together with some special ad-hoc locking using the anon_vma lock and the mm->page_table_lock, it all was fairly straightforward. That is, it was all fine until Ruihan Li pointed out that now that the vma layout uses the maple tree code, we *really* don't just change vm_start and vm_end any more, and the locking really is broken. Oops. It's not actually all _that_ horrible to fix this once and for all, and do proper locking, but it's a bit painful. We have basically three different cases of stack expansion, and they all work just a bit differently: - the common and obvious case is the page fault handling. It's actually fairly simple and straightforward, except for the fact that we have something like 24 different versions of it, and you end up in a maze of twisty little passages, all alike. - the simplest case is the execve() code that creates a new stack. There are no real locking concerns because it's all in a private new VM that hasn't been exposed to anybody, but lockdep still can end up unhappy if you get it wrong. - and finally, we have GUP and page pinning, which shouldn't really be expanding the stack in the first place, but in addition to execve() we also use it for ptrace(). And debuggers do want to possibly access memory under the stack pointer and thus need to be able to expand the stack as a special case. None of these cases are exactly complicated, but the page fault case in particular is just repeated slightly differently many many times. And ia64 in particular has a fairly complicated situation where you can have both a regular grow-down stack _and_ a special grow-up stack for the register backing store. So to make this slightly more manageable, the bulk of this series is to first create a helper function for the most common page fault case, and convert all the straightforward architectures to it. Thus the new 'lock_mm_and_find_vma()' helper function, which ends up being used by x86, arm, powerpc, mips, riscv, alpha, arc, csky, hexagon, loongarch, nios2, sh, sparc32, and xtensa. So we not only convert more than half the architectures, we now have more shared code and avoid some of those twisty little passages. And largely due to this common helper function, the full diffstat of this series ends up deleting more lines than it adds. That still leaves eight architectures (ia64, m68k, microblaze, openrisc, parisc, s390, sparc64 and um) that end up doing 'expand_stack()' manually because they are doing something slightly different from the normal pattern. Along with the couple of special cases in execve() and GUP. So there's a couple of patches that first create 'locked' helper versions of the stack expansion functions, so that there's a obvious path forward in the conversion. The execve() case is then actually pretty simple, and is a nice cleanup from our old "grow-up stackls are special, because at execve time even they grow down". The #ifdef CONFIG_STACK_GROWSUP in that code just goes away, because it's just more straightforward to write out the stack expansion there manually, instead od having get_user_pages_remote() do it for us in some situations but not others and have to worry about locking rules for GUP. And the final step is then to just convert the remaining odd cases to a new world order where 'expand_stack()' is called with the mmap_lock held for reading, but where it might drop it and upgrade it to a write, only to return with it held for reading (in the success case) or with it completely dropped (in the failure case). In the process, we remove all the stack expansion from GUP (where dropping the lock wouldn't be ok without special rules anyway), and add it in manually to __access_remote_vm() for ptrace(). Thanks to Adrian Glaubitz and Frank Scheiner who tested the ia64 cases. Everything else here felt pretty straightforward, but the ia64 rules for stack expansion are really quite odd and very different from everything else. Also thanks to Vegard Nossum who caught me getting one of those odd conditions entirely the wrong way around. Anyway, I think I want to actually move all the stack expansion code to a whole new file of its own, rather than have it split up between mm/mmap.c and mm/memory.c, but since this will have to be backported to the initial maple tree vma introduction anyway, I tried to keep the patches _fairly_ minimal. Also, while I don't think it's valid to expand the stack from GUP, the final patch in here is a "warn if some crazy GUP user wants to try to expand the stack" patch. That one will be reverted before the final release, but it's left to catch any odd cases during the merge window and release candidates. Reported-by: Ruihan Li <lrh2000@pku.edu.cn> * branch 'expand-stack': gup: add warning if some caller would seem to want stack expansion mm: always expand the stack with the mmap write lock held execve: expand new process stack manually ahead of time mm: make find_extend_vma() fail if write lock not held powerpc/mm: convert coprocessor fault to lock_mm_and_find_vma() mm/fault: convert remaining simple cases to lock_mm_and_find_vma() arm/mm: Convert to using lock_mm_and_find_vma() riscv/mm: Convert to using lock_mm_and_find_vma() mips/mm: Convert to using lock_mm_and_find_vma() powerpc/mm: Convert to using lock_mm_and_find_vma() arm64/mm: Convert to using lock_mm_and_find_vma() mm: make the page fault mmap locking killable mm: introduce new 'lock_mm_and_find_vma()' page fault helper |
||
Linus Torvalds
|
6e17c6de3d |
- Yosry Ahmed brought back some cgroup v1 stats in OOM logs.
- Yosry has also eliminated cgroup's atomic rstat flushing. - Nhat Pham adds the new cachestat() syscall. It provides userspace with the ability to query pagecache status - a similar concept to mincore() but more powerful and with improved usability. - Mel Gorman provides more optimizations for compaction, reducing the prevalence of page rescanning. - Lorenzo Stoakes has done some maintanance work on the get_user_pages() interface. - Liam Howlett continues with cleanups and maintenance work to the maple tree code. Peng Zhang also does some work on maple tree. - Johannes Weiner has done some cleanup work on the compaction code. - David Hildenbrand has contributed additional selftests for get_user_pages(). - Thomas Gleixner has contributed some maintenance and optimization work for the vmalloc code. - Baolin Wang has provided some compaction cleanups, - SeongJae Park continues maintenance work on the DAMON code. - Huang Ying has done some maintenance on the swap code's usage of device refcounting. - Christoph Hellwig has some cleanups for the filemap/directio code. - Ryan Roberts provides two patch series which yield some rationalization of the kernel's access to pte entries - use the provided APIs rather than open-coding accesses. - Lorenzo Stoakes has some fixes to the interaction between pagecache and directio access to file mappings. - John Hubbard has a series of fixes to the MM selftesting code. - ZhangPeng continues the folio conversion campaign. - Hugh Dickins has been working on the pagetable handling code, mainly with a view to reducing the load on the mmap_lock. - Catalin Marinas has reduced the arm64 kmalloc() minimum alignment from 128 to 8. - Domenico Cerasuolo has improved the zswap reclaim mechanism by reorganizing the LRU management. - Matthew Wilcox provides some fixups to make gfs2 work better with the buffer_head code. - Vishal Moola also has done some folio conversion work. - Matthew Wilcox has removed the remnants of the pagevec code - their functionality is migrated over to struct folio_batch. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZJejewAKCRDdBJ7gKXxA joggAPwKMfT9lvDBEUnJagY7dbDPky1cSYZdJKxxM2cApGa42gEA6Cl8HRAWqSOh J0qXCzqaaN8+BuEyLGDVPaXur9KirwY= =B7yQ -----END PGP SIGNATURE----- Merge tag 'mm-stable-2023-06-24-19-15' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull mm updates from Andrew Morton: - Yosry Ahmed brought back some cgroup v1 stats in OOM logs - Yosry has also eliminated cgroup's atomic rstat flushing - Nhat Pham adds the new cachestat() syscall. It provides userspace with the ability to query pagecache status - a similar concept to mincore() but more powerful and with improved usability - Mel Gorman provides more optimizations for compaction, reducing the prevalence of page rescanning - Lorenzo Stoakes has done some maintanance work on the get_user_pages() interface - Liam Howlett continues with cleanups and maintenance work to the maple tree code. Peng Zhang also does some work on maple tree - Johannes Weiner has done some cleanup work on the compaction code - David Hildenbrand has contributed additional selftests for get_user_pages() - Thomas Gleixner has contributed some maintenance and optimization work for the vmalloc code - Baolin Wang has provided some compaction cleanups, - SeongJae Park continues maintenance work on the DAMON code - Huang Ying has done some maintenance on the swap code's usage of device refcounting - Christoph Hellwig has some cleanups for the filemap/directio code - Ryan Roberts provides two patch series which yield some rationalization of the kernel's access to pte entries - use the provided APIs rather than open-coding accesses - Lorenzo Stoakes has some fixes to the interaction between pagecache and directio access to file mappings - John Hubbard has a series of fixes to the MM selftesting code - ZhangPeng continues the folio conversion campaign - Hugh Dickins has been working on the pagetable handling code, mainly with a view to reducing the load on the mmap_lock - Catalin Marinas has reduced the arm64 kmalloc() minimum alignment from 128 to 8 - Domenico Cerasuolo has improved the zswap reclaim mechanism by reorganizing the LRU management - Matthew Wilcox provides some fixups to make gfs2 work better with the buffer_head code - Vishal Moola also has done some folio conversion work - Matthew Wilcox has removed the remnants of the pagevec code - their functionality is migrated over to struct folio_batch * tag 'mm-stable-2023-06-24-19-15' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (380 commits) mm/hugetlb: remove hugetlb_set_page_subpool() mm: nommu: correct the range of mmap_sem_read_lock in task_mem() hugetlb: revert use of page_cache_next_miss() Revert "page cache: fix page_cache_next/prev_miss off by one" mm/vmscan: fix root proactive reclaim unthrottling unbalanced node mm: memcg: rename and document global_reclaim() mm: kill [add|del]_page_to_lru_list() mm: compaction: convert to use a folio in isolate_migratepages_block() mm: zswap: fix double invalidate with exclusive loads mm: remove unnecessary pagevec includes mm: remove references to pagevec mm: rename invalidate_mapping_pagevec to mapping_try_invalidate mm: remove struct pagevec net: convert sunrpc from pagevec to folio_batch i915: convert i915_gpu_error to use a folio_batch pagevec: rename fbatch_count() mm: remove check_move_unevictable_pages() drm: convert drm_gem_put_pages() to use a folio_batch i915: convert shmem_sg_free_table() to use a folio_batch scatterlist: add sg_set_folio() ... |
||
Linus Torvalds
|
2605e80d34 |
arm64 updates for 6.5:
- Support for the Armv8.9 Permission Indirection Extensions. While this feature doesn't add new functionality, it enables future support for Guarded Control Stacks (GCS) and Permission Overlays. - User-space support for the Armv8.8 memcpy/memset instructions. - arm64 perf: support the HiSilicon SoC uncore PMU, Arm CMN sysfs identifier, support for the NXP i.MX9 SoC DDRC PMU, fixes and cleanups. - Removal of superfluous ISBs on context switch (following retrospective architecture tightening). - Decode the ISS2 register during faults for additional information to help with debugging. - KPTI clean-up/simplification of the trampoline exit code. - Addressing several -Wmissing-prototype warnings. - Kselftest improvements for signal handling and ptrace. - Fix TPIDR2_EL0 restoring on sigreturn - Clean-up, robustness improvements of the module allocation code. - More sysreg conversions to the automatic register/bitfields generation. - CPU capabilities handling cleanup. - Arm documentation updates: ACPI, ptdump. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmSZyXwACgkQa9axLQDI XvEM3BAAkMzHGTDhNVNGLSO07PVmdzTiuoNFlfX7bktdIb+El76VhGXhHeEywTje wAq9JIYBf/Src2HbgZLwuly8Fn2vCrhyp++bRJW82o9SiBnx91+0mH7zLf+XHiQ4 FHKZxvaE6PaDc9o8WXr+IeucPRb5W2HgH37mktxh7ShMLsxorwS94V1oL29A2mV9 t4XQY7/tdmrDKMKMuQnIr1DurNXBhJ1OKvDnSN/Zzm96JOU/QQ32N2wEE7Y0aHOh bBzClksx2mguQqV515mySGFe5yy9NqaAfx2hTAciq+1rwbiCSjqQQmEswoUH8WLX JNLylxADWT2qXThFe8W6uyFzEshSAoI1yKxlCGuOsQpu4sFJtR8oh8dDj5669g4Y j0jR87r9rWm0iyYI5I+XDMxFVyuh2eFInvjtynRbj+mtS3f/SkO8fXG6Uya+I76C UGLlBUKnLr/zHuIGN0LE/V4dYTqsi9EtHoc2Am2xCZsS9jqkxKJG8C93Zsm4GlJC OcUtBSjW0rYJq+tLk0yhR6hbh59QbiRh05KnZsPpOKi8purlKSL9ZNPRi7TndLdm HjHUY+vQwNIpPIb6pyK4aYZuTdGEQIsQykQ8CULiIGlHi7kc4g9029ouLc5bBAeU mU8D62I2ztzPoYljYWNtO7K6g/Dq8c4lpsaMAJ+1Wp2iq2xBJjo= =rNBK -----END PGP SIGNATURE----- Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: "Notable features are user-space support for the memcpy/memset instructions and the permission indirection extension. - Support for the Armv8.9 Permission Indirection Extensions. While this feature doesn't add new functionality, it enables future support for Guarded Control Stacks (GCS) and Permission Overlays - User-space support for the Armv8.8 memcpy/memset instructions - arm64 perf: support the HiSilicon SoC uncore PMU, Arm CMN sysfs identifier, support for the NXP i.MX9 SoC DDRC PMU, fixes and cleanups - Removal of superfluous ISBs on context switch (following retrospective architecture tightening) - Decode the ISS2 register during faults for additional information to help with debugging - KPTI clean-up/simplification of the trampoline exit code - Addressing several -Wmissing-prototype warnings - Kselftest improvements for signal handling and ptrace - Fix TPIDR2_EL0 restoring on sigreturn - Clean-up, robustness improvements of the module allocation code - More sysreg conversions to the automatic register/bitfields generation - CPU capabilities handling cleanup - Arm documentation updates: ACPI, ptdump" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (124 commits) kselftest/arm64: Add a test case for TPIDR2 restore arm64/signal: Restore TPIDR2 register rather than memory state arm64: alternatives: make clean_dcache_range_nopatch() noinstr-safe Documentation/arm64: Add ptdump documentation arm64: hibernate: remove WARN_ON in save_processor_state kselftest/arm64: Log signal code and address for unexpected signals docs: perf: Fix warning from 'make htmldocs' in hisi-pmu.rst arm64/fpsimd: Exit streaming mode when flushing tasks docs: perf: Add new description for HiSilicon UC PMU drivers/perf: hisi: Add support for HiSilicon UC PMU driver drivers/perf: hisi: Add support for HiSilicon H60PA and PAv3 PMU driver perf: arm_cspmu: Add missing MODULE_DEVICE_TABLE perf/arm-cmn: Add sysfs identifier perf/arm-cmn: Revamp model detection perf/arm_dmc620: Add cpumask arm64: mm: fix VA-range sanity check arm64/mm: remove now-superfluous ISBs from TTBR writes Documentation/arm64: Update ACPI tables from BBR Documentation/arm64: Update references in arm-acpi Documentation/arm64: Update ARM and arch reference ... |
||
Linus Torvalds
|
ae870a68b5 |
arm64/mm: Convert to using lock_mm_and_find_vma()
This converts arm64 to use the new page fault helper. It was very straightforward, but still needed a fix for the "obvious" conversion I initially did. Thanks to Suren for the fix and testing. Fixed-and-tested-by: Suren Baghdasaryan <surenb@google.com> Unnecessary-code-removal-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Catalin Marinas
|
abc17128c8 |
Merge branch 'for-next/feat_s1pie' into for-next/core
* for-next/feat_s1pie: : Support for the Armv8.9 Permission Indirection Extensions (stage 1 only) KVM: selftests: get-reg-list: add Permission Indirection registers KVM: selftests: get-reg-list: support ID register features arm64: Document boot requirements for PIE arm64: transfer permission indirection settings to EL2 arm64: enable Permission Indirection Extension (PIE) arm64: add encodings of PIRx_ELx registers arm64: disable EL2 traps for PIE arm64: reorganise PAGE_/PROT_ macros arm64: add PTE_WRITE to PROT_SECT_NORMAL arm64: add PTE_UXN/PTE_WRITE to SWAPPER_*_FLAGS KVM: arm64: expose ID_AA64MMFR3_EL1 to guests KVM: arm64: Save/restore PIE registers KVM: arm64: Save/restore TCR2_EL1 arm64: cpufeature: add Permission Indirection Extension cpucap arm64: cpufeature: add TCR2 cpucap arm64: cpufeature: add system register ID_AA64MMFR3 arm64/sysreg: add PIR*_ELx registers arm64/sysreg: update HCRX_EL2 register arm64/sysreg: add system registers TCR2_ELx arm64/sysreg: Add ID register ID_AA64MMFR3 |
||
Catalin Marinas
|
f42039d10b |
Merge branches 'for-next/kpti', 'for-next/missing-proto-warn', 'for-next/iss2-decode', 'for-next/kselftest', 'for-next/misc', 'for-next/feat_mops', 'for-next/module-alloc', 'for-next/sysreg', 'for-next/cpucap', 'for-next/acpi', 'for-next/kdump', 'for-next/acpi-doc', 'for-next/doc' and 'for-next/tpidr2-fix', remote-tracking branch 'arm64/for-next/perf' into for-next/core
* arm64/for-next/perf: docs: perf: Fix warning from 'make htmldocs' in hisi-pmu.rst docs: perf: Add new description for HiSilicon UC PMU drivers/perf: hisi: Add support for HiSilicon UC PMU driver drivers/perf: hisi: Add support for HiSilicon H60PA and PAv3 PMU driver perf: arm_cspmu: Add missing MODULE_DEVICE_TABLE perf/arm-cmn: Add sysfs identifier perf/arm-cmn: Revamp model detection perf/arm_dmc620: Add cpumask dt-bindings: perf: fsl-imx-ddr: Add i.MX93 compatible drivers/perf: imx_ddr: Add support for NXP i.MX9 SoC DDRC PMU driver perf/arm_cspmu: Decouple APMT dependency perf/arm_cspmu: Clean up ACPI dependency ACPI/APMT: Don't register invalid resource perf/arm_cspmu: Fix event attribute type perf: arm_cspmu: Set irq affinitiy only if overflow interrupt is used drivers/perf: hisi: Don't migrate perf to the CPU going to teardown drivers/perf: apple_m1: Force 63bit counters for M2 CPUs perf/arm-cmn: Fix DTC reset perf: qcom_l2_pmu: Make l2_cache_pmu_probe_cluster() more robust perf/arm-cci: Slightly optimize cci_pmu_sync_counters() * for-next/kpti: : Simplify KPTI trampoline exit code arm64: entry: Simplify tramp_alias macro and tramp_exit routine arm64: entry: Preserve/restore X29 even for compat tasks * for-next/missing-proto-warn: : Address -Wmissing-prototype warnings arm64: add alt_cb_patch_nops prototype arm64: move early_brk64 prototype to header arm64: signal: include asm/exception.h arm64: kaslr: add kaslr_early_init() declaration arm64: flush: include linux/libnvdimm.h arm64: module-plts: inline linux/moduleloader.h arm64: hide unused is_valid_bugaddr() arm64: efi: add efi_handle_corrupted_x18 prototype arm64: cpuidle: fix #ifdef for acpi functions arm64: kvm: add prototypes for functions called in asm arm64: spectre: provide prototypes for internal functions arm64: move cpu_suspend_set_dbg_restorer() prototype to header arm64: avoid prototype warnings for syscalls arm64: add scs_patch_vmlinux prototype arm64: xor-neon: mark xor_arm64_neon_*() static * for-next/iss2-decode: : Add decode of ISS2 to data abort reports arm64/esr: Add decode of ISS2 to data abort reporting arm64/esr: Use GENMASK() for the ISS mask * for-next/kselftest: : Various arm64 kselftest improvements kselftest/arm64: Log signal code and address for unexpected signals kselftest/arm64: Add a smoke test for ptracing hardware break/watch points * for-next/misc: : Miscellaneous patches arm64: alternatives: make clean_dcache_range_nopatch() noinstr-safe arm64: hibernate: remove WARN_ON in save_processor_state arm64/fpsimd: Exit streaming mode when flushing tasks arm64: mm: fix VA-range sanity check arm64/mm: remove now-superfluous ISBs from TTBR writes arm64: consolidate rox page protection logic arm64: set __exception_irq_entry with __irq_entry as a default arm64: syscall: unmask DAIF for tracing status arm64: lockdep: enable checks for held locks when returning to userspace arm64/cpucaps: increase string width to properly format cpucaps.h arm64/cpufeature: Use helper for ECV CNTPOFF cpufeature * for-next/feat_mops: : Support for ARMv8.8 memcpy instructions in userspace kselftest/arm64: add MOPS to hwcap test arm64: mops: allow disabling MOPS from the kernel command line arm64: mops: detect and enable FEAT_MOPS arm64: mops: handle single stepping after MOPS exception arm64: mops: handle MOPS exceptions KVM: arm64: hide MOPS from guests arm64: mops: don't disable host MOPS instructions from EL2 arm64: mops: document boot requirements for MOPS KVM: arm64: switch HCRX_EL2 between host and guest arm64: cpufeature: detect FEAT_HCX KVM: arm64: initialize HCRX_EL2 * for-next/module-alloc: : Make the arm64 module allocation code more robust (clean-up, VA range expansion) arm64: module: rework module VA range selection arm64: module: mandate MODULE_PLTS arm64: module: move module randomization to module.c arm64: kaslr: split kaslr/module initialization arm64: kasan: remove !KASAN_VMALLOC remnants arm64: module: remove old !KASAN_VMALLOC logic * for-next/sysreg: (21 commits) : More sysreg conversions to automatic generation arm64/sysreg: Convert TRBIDR_EL1 register to automatic generation arm64/sysreg: Convert TRBTRG_EL1 register to automatic generation arm64/sysreg: Convert TRBMAR_EL1 register to automatic generation arm64/sysreg: Convert TRBSR_EL1 register to automatic generation arm64/sysreg: Convert TRBBASER_EL1 register to automatic generation arm64/sysreg: Convert TRBPTR_EL1 register to automatic generation arm64/sysreg: Convert TRBLIMITR_EL1 register to automatic generation arm64/sysreg: Rename TRBIDR_EL1 fields per auto-gen tools format arm64/sysreg: Rename TRBTRG_EL1 fields per auto-gen tools format arm64/sysreg: Rename TRBMAR_EL1 fields per auto-gen tools format arm64/sysreg: Rename TRBSR_EL1 fields per auto-gen tools format arm64/sysreg: Rename TRBBASER_EL1 fields per auto-gen tools format arm64/sysreg: Rename TRBPTR_EL1 fields per auto-gen tools format arm64/sysreg: Rename TRBLIMITR_EL1 fields per auto-gen tools format arm64/sysreg: Convert OSECCR_EL1 to automatic generation arm64/sysreg: Convert OSDTRTX_EL1 to automatic generation arm64/sysreg: Convert OSDTRRX_EL1 to automatic generation arm64/sysreg: Convert OSLAR_EL1 to automatic generation arm64/sysreg: Standardise naming of bitfield constants in OSL[AS]R_EL1 arm64/sysreg: Convert MDSCR_EL1 to automatic register generation ... * for-next/cpucap: : arm64 cpucap clean-up arm64: cpufeature: fold cpus_set_cap() into update_cpu_capabilities() arm64: cpufeature: use cpucap naming arm64: alternatives: use cpucap naming arm64: standardise cpucap bitmap names * for-next/acpi: : Various arm64-related ACPI patches ACPI: bus: Consolidate all arm specific initialisation into acpi_arm_init() * for-next/kdump: : Simplify the crashkernel reservation behaviour of crashkernel=X,high on arm64 arm64: add kdump.rst into index.rst Documentation: add kdump.rst to present crashkernel reservation on arm64 arm64: kdump: simplify the reservation behaviour of crashkernel=,high * for-next/acpi-doc: : Update ACPI documentation for Arm systems Documentation/arm64: Update ACPI tables from BBR Documentation/arm64: Update references in arm-acpi Documentation/arm64: Update ARM and arch reference * for-next/doc: : arm64 documentation updates Documentation/arm64: Add ptdump documentation * for-next/tpidr2-fix: : Fix the TPIDR2_EL0 register restoring on sigreturn kselftest/arm64: Add a test case for TPIDR2 restore arm64/signal: Restore TPIDR2 register rather than memory state |
||
Catalin Marinas
|
1c1a429efd |
arm64: enable ARCH_WANT_KMALLOC_DMA_BOUNCE for arm64
With the DMA bouncing of unaligned kmalloc() buffers now in place, enable it for arm64 to allow the kmalloc-{8,16,32,48,96} caches. In addition, always create the swiotlb buffer even when the end of RAM is within the 32-bit physical address range (the swiotlb buffer can still be disabled on the kernel command line). Link: https://lkml.kernel.org/r/20230612153201.554742-18-catalin.marinas@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Isaac J. Manjarres <isaacmanjarres@google.com> Cc: Will Deacon <will@kernel.org> Cc: Alasdair Kergon <agk@redhat.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Jerry Snitselaar <jsnitsel@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jonathan Cameron <jic23@kernel.org> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Lars-Peter Clausen <lars@metafoo.de> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Mike Snitzer <snitzer@kernel.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Saravana Kannan <saravanak@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Hugh Dickins
|
cafcb9ca5a |
arm64/hugetlb: pte_alloc_huge() pte_offset_huge()
pte_alloc_map() expects to be followed by pte_unmap(), but hugetlb omits that: to keep balance in future, use the recently added pte_alloc_huge() instead; with pte_offset_huge() a better name for pte_offset_kernel(). Link: https://lkml.kernel.org/r/5849464-7191-40c5-c55f-fba9c3802e5d@google.com Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Chris Zankel <chris@zankel.net> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: David Hildenbrand <david@redhat.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: John David Anglin <dave.anglin@bell.net> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Hugh Dickins
|
52924726f4 |
arm64: allow pte_offset_map() to fail
In rare transient cases, not yet made possible, pte_offset_map() and pte_offset_map_lock() may not find a page table: handle appropriately. Link: https://lkml.kernel.org/r/35e46485-8499-4337-c51f-b8fa495a1a93@google.com Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Chris Zankel <chris@zankel.net> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: David Hildenbrand <david@redhat.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: John David Anglin <dave.anglin@bell.net> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Mark Rutland
|
ab9b400809 |
arm64: mm: fix VA-range sanity check
Both create_mapping_noalloc() and update_mapping_prot() sanity-check their 'virt' parameter, but the check itself doesn't make much sense. The condition used today appears to be a historical accident. The sanity-check condition: if ((virt >= PAGE_END) && (virt < VMALLOC_START)) { [ ... warning here ... ] return; } ... can only be true for the KASAN shadow region or the module region, and there's no reason to exclude these specifically for creating and updateing mappings. When arm64 support was first upstreamed in commit: c1cc1552616d0f35 ("arm64: MMU initialisation") ... the condition was: if (virt < VMALLOC_START) { [ ... warning here ... ] return; } At the time, VMALLOC_START was the lowest kernel address, and this was checking whether 'virt' would be translated via TTBR1. Subsequently in commit: 14c127c957c1c607 ("arm64: mm: Flip kernel VA space") ... the condition was changed to: if ((virt >= VA_START) && (virt < VMALLOC_START)) { [ ... warning here ... ] return; } This appear to have been a thinko. The commit moved the linear map to the bottom of the kernel address space, with VMALLOC_START being at the halfway point. The old condition would warn for changes to the linear map below this, and at the time VA_START was the end of the linear map. Subsequently we cleaned up the naming of VA_START in commit: 77ad4ce69321abbe ("arm64: memory: rename VA_START to PAGE_END") ... keeping the erroneous condition as: if ((virt >= PAGE_END) && (virt < VMALLOC_START)) { [ ... warning here ... ] return; } Correct the condition to check against the start of the TTBR1 address space, which is currently PAGE_OFFSET. This simplifies the logic, and more clearly matches the "outside kernel range" message in the warning. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Steve Capper <steve.capper@arm.com> Cc: Will Deacon <will@kernel.org> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/20230615102628.1052103-1-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> |
||
Jamie Iles
|
b9293d457f |
arm64/mm: remove now-superfluous ISBs from TTBR writes
At the time of authoring 7655abb95386 ("arm64: mm: Move ASID from TTBR0 to TTBR1"), the Arm ARM did not specify any ordering guarantees for direct writes to TTBR0_ELx and TTBR1_ELx and so an ISB was required after each write to ensure TLBs would only be populated from the expected (or reserved tables). In a recent update to the Arm ARM, the requirements have been relaxed to reflect the implementation of current CPUs and required implementation of future CPUs to read (RDYDPX in D8.2.3 Translation table base address register): Direct writes to TTBR0_ELx and TTBR1_ELx occur in program order relative to one another, without the need for explicit synchronization. For any one translation, all indirect reads of TTBR0_ELx and TTBR1_ELx that are made as part of the translation observe only one point in that order of direct writes. Remove the superfluous ISBs to optimize uaccess helpers and context switch. Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Jamie Iles <quic_jiles@quicinc.com> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20230613141959.92697-1-quic_jiles@quicinc.com [catalin.marinas@arm.com: rename __cpu_set_reserved_ttbr0 to ..._nosync] [catalin.marinas@arm.com: move the cpu_set_reserved_ttbr0_nosync() call to cpu_do_switch_mm()] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> |
||
Russell King
|
601eaec513 |
arm64: consolidate rox page protection logic
Consolidate the arm64 decision making for the page protections used for executable pages, used by both the trampoline code and the kernel text mapping code. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/E1q9T3v-00EDmW-BH@rmk-PC.armlinux.org.uk Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> |
||
Arnd Bergmann
|
bb6e04a173 |
kasan: use internal prototypes matching gcc-13 builtins
gcc-13 warns about function definitions for builtin interfaces that have a different prototype, e.g.: In file included from kasan_test.c:31: kasan.h:574:6: error: conflicting types for built-in function '__asan_register_globals'; expected 'void(void *, long int)' [-Werror=builtin-declaration-mismatch] 574 | void __asan_register_globals(struct kasan_global *globals, size_t size); kasan.h:577:6: error: conflicting types for built-in function '__asan_alloca_poison'; expected 'void(void *, long int)' [-Werror=builtin-declaration-mismatch] 577 | void __asan_alloca_poison(unsigned long addr, size_t size); kasan.h:580:6: error: conflicting types for built-in function '__asan_load1'; expected 'void(void *)' [-Werror=builtin-declaration-mismatch] 580 | void __asan_load1(unsigned long addr); kasan.h:581:6: error: conflicting types for built-in function '__asan_store1'; expected 'void(void *)' [-Werror=builtin-declaration-mismatch] 581 | void __asan_store1(unsigned long addr); kasan.h:643:6: error: conflicting types for built-in function '__hwasan_tag_memory'; expected 'void(void *, unsigned char, long int)' [-Werror=builtin-declaration-mismatch] 643 | void __hwasan_tag_memory(unsigned long addr, u8 tag, unsigned long size); The two problems are: - Addresses are passes as 'unsigned long' in the kernel, but gcc-13 expects a 'void *'. - sizes meant to use a signed ssize_t rather than size_t. Change all the prototypes to match these. Using 'void *' consistently for addresses gets rid of a couple of type casts, so push that down to the leaf functions where possible. This now passes all randconfig builds on arm, arm64 and x86, but I have not tested it on the other architectures that support kasan, since they tend to fail randconfig builds in other ways. This might fail if any of the 32-bit architectures expect a 'long' instead of 'int' for the size argument. The __asan_allocas_unpoison() function prototype is somewhat weird, since it uses a pointer for 'stack_top' and an size_t for 'stack_bottom'. This looks like it is meant to be 'addr' and 'size' like the others, but the implementation clearly treats them as 'top' and 'bottom'. Link: https://lkml.kernel.org/r/20230509145735.9263-2-arnd@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Marco Elver <elver@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Baoquan He
|
6c4dcaddbd |
arm64: kdump: simplify the reservation behaviour of crashkernel=,high
On arm64, reservation for 'crashkernel=xM,high' is taken by searching for suitable memory region top down. If the 'xM' of crashkernel high memory is reserved from high memory successfully, it will try to reserve crashkernel low memory later accoringly. Otherwise, it will try to search low memory area for the 'xM' suitable region. Please see the details in Documentation/admin-guide/kernel-parameters.txt. While we observed an unexpected case where a reserved region crosses the high and low meomry boundary. E.g on a system with 4G as low memory end, user added the kernel parameters like: 'crashkernel=512M,high', it could finally have [4G-126M, 4G+386M], [1G, 1G+128M] regions in running kernel. The crashkernel high region crossing low and high memory boudary will bring issues: 1) For crashkernel=x,high, if getting crashkernel high region across low and high memory boundary, then user will see two memory regions in low memory, and one memory region in high memory. The two crashkernel low memory regions are confusing as shown in above example. 2) If people explicityly specify "crashkernel=x,high crashkernel=y,low" and y <= 128M, when crashkernel high region crosses low and high memory boundary and the part of crashkernel high reservation below boundary is bigger than y, the expected crahskernel low reservation will be skipped. But the expected crashkernel high reservation is shrank and could not satisfy user space requirement. 3) The crossing boundary behaviour of crahskernel high reservation is different than x86 arch. On x86_64, the low memory end is 4G fixedly, and the memory near 4G is reserved by system, e.g for mapping firmware, pci mapping, so the crashkernel reservation crossing boundary never happens. From distros point of view, this brings inconsistency and confusion. Users need to dig into x86 and arm64 system details to find out why. For kernel itself, the impact of issue 3) could be slight. While issue 1) and 2) cause actual impact because it brings obscure semantics and behaviour to crashkernel=,high reservation. Here, for crashkernel=xM,high, search the high memory for the suitable region only in high memory. If failed, try reserving the suitable region only in low memory. Like this, the crashkernel high region will only exist in high memory, and crashkernel low region only exists in low memory. The reservation behaviour for crashkernel=,high is clearer and simpler. Note: RPi4 has different zone ranges than normal memory. Its DMA zone is 0~1G, and DMA32 zone is 1G~4G if CONFIG_ZONE_DMA|DMA32 are enabled by default. The low memory end is 1G in order to validate all devices, high memory starts at 1G memory. However, for being consistent with normal arm64 system, its low memory end is still 1G, while reserving crashkernel high memory from 4G if crashkernel=size,high specified. This will remove confusion. With above change applied, summary of arm64 crashkernel reservation range: 1) RPi4(zone DMA:0~1G; DMA32:1G~4G): crashkernel=size 0~1G: low memory | 1G~top: high memory crashkernel=size,high 0~1G: low memory | 4G~top: high memory 2) Other normal system: crashkernel=size crashkernel=size,high 0~4G: low memory | 4G~top: high memory 3) Systems w/o zone DMA|DMA32 crashkernel=size crashkernel=size,high 0~top: low memory Signed-off-by: Baoquan He <bhe@redhat.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com> Link: https://lore.kernel.org/r/ZGIBSEoZ7VRVvP8H@MiWiFi-R3L-srv Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> |
||
Mark Rutland
|
55123afffe |
arm64: kasan: remove !KASAN_VMALLOC remnants
Historically, KASAN could be selected with or without KASAN_VMALLOC, but since commit: f6f37d9320a11e90 ("arm64: select KASAN_VMALLOC for SW/HW_TAGS modes") ... we can never select KASAN without KASAN_VMALLOC on arm64, and thus arm64 code for KASAN && !KASAN_VMALLOC is redundant and can be removed. Remove the redundant code kasan_init.c Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Alexander Potapenko <glider@google.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Will Deacon <will@kernel.org> Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> Link: https://lore.kernel.org/r/20230530110328.2213762-3-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> |
||
Joey Gouly
|
9e9bb6ede0 |
arm64: enable Permission Indirection Extension (PIE)
Now that the necessary changes have been made, set the Permission Indirection registers and enable the Permission Indirection Extension. Signed-off-by: Joey Gouly <joey.gouly@arm.com> Cc: Will Deacon <will@kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20230606145859.697944-17-joey.gouly@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> |
||
Joey Gouly
|
f0af339fc4 |
arm64: add PTE_UXN/PTE_WRITE to SWAPPER_*_FLAGS
With PIE enabled, the swapper PTEs would have a Permission Indirection Index (PIIndex) of 0. A PIIndex of 0 is not currently used by any other PTEs. To avoid using index 0 specifically for the swapper PTEs, mark them as PTE_UXN and PTE_WRITE, so that they map to a PAGE_KERNEL_EXEC equivalent. This also adds PTE_WRITE to KPTI_NG_PTE_FLAGS, which was tested by booting with kpti=on. Signed-off-by: Joey Gouly <joey.gouly@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20230606145859.697944-12-joey.gouly@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> |
||
Jisheng Zhang
|
0e2aba6948 |
arm64: mm: pass original fault address to handle_mm_fault() in PER_VMA_LOCK block
When reading the arm64's PER_VMA_LOCK support code, I found a bit difference between arm64 and other arch when calling handle_mm_fault() during VMA lock-based page fault handling: the fault address is masked before passing to handle_mm_fault(). This is also different from the usage in mmap_lock-based handling. I think we need to pass the original fault address to handle_mm_fault() as we did in commit 84c5e23edecd ("arm64: mm: Pass original fault address to handle_mm_fault()"). If we go through the code path further, we can find that the "masked" fault address can cause mismatched fault address between perf sw major/minor page fault sw event and perf page fault sw event: do_page_fault perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, ..., addr) // orig addr handle_mm_fault mm_account_fault perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, ...) // masked addr Fixes: cd7f176aea5f ("arm64/mm: try VMA lock-based page fault handling first") Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20230524131305.2808-1-jszhang@kernel.org Signed-off-by: Will Deacon <will@kernel.org> |
||
Mark Brown
|
1f9d4ba683 |
arm64/esr: Add decode of ISS2 to data abort reporting
The architecture has added more information about faults to ISS2 within ESR. Add decode of this to our data abort fault decode to aid diagnostics. Features that are not currently enabled are included here for completeness. Since the architecture specifies the values of bits within ISS2 in terms of ISS2 rather than in terms of the register as a whole we do so for our definitions as well, this makes it easier to review bitfield definitions. Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20230417-arm64-iss2-dabt-decode-v3-2-c1fa503e503a@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> |
||
Arnd Bergmann
|
e13d32e992 |
arm64: move early_brk64 prototype to header
The prototype used for calling early_brk64() is in the file that calls it, which is the wrong place, as it is not included for the definition: arch/arm64/kernel/traps.c:1100:12: error: no previous prototype for 'early_brk64' [-Werror=missing-prototypes] Move it to an appropriate header instead. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20230516160642.523862-15-arnd@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> |
||
Arnd Bergmann
|
1a11839389 |
arm64: flush: include linux/libnvdimm.h
The two cache management functions are declared in libnvdimm.h but provided by architecture specific code. Without including the header, this causes a W=1 warning: arch/arm64/mm/flush.c:96:6: error: no previous prototype for 'arch_wb_cache_pmem' [-Werror=missing-prototypes] arch/arm64/mm/flush.c:104:6: error: no previous prototype for 'arch_invalidate_pmem' [-Werror=missing-prototypes] Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20230516160642.523862-12-arnd@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> |
||
Peter Collingbourne
|
2efbafb91e |
arm64: Also reset KASAN tag if page is not PG_mte_tagged
Consider the following sequence of events: 1) A page in a PROT_READ|PROT_WRITE VMA is faulted. 2) Page migration allocates a page with the KASAN allocator, causing it to receive a non-match-all tag, and uses it to replace the page faulted in 1. 3) The program uses mprotect() to enable PROT_MTE on the page faulted in 1. As a result of step 3, we are left with a non-match-all tag for a page with tags accessible to userspace, which can lead to the same kind of tag check faults that commit e74a68468062 ("arm64: Reset KASAN tag in copy_highpage with HW tags only") intended to fix. The general invariant that we have for pages in a VMA with VM_MTE_ALLOWED is that they cannot have a non-match-all tag. As a result of step 2, the invariant is broken. This means that the fix in the referenced commit was incomplete and we also need to reset the tag for pages without PG_mte_tagged. Fixes: e5b8d9218951 ("arm64: mte: reset the page tag in page->flags") Cc: <stable@vger.kernel.org> # 5.15 Link: https://linux-review.googlesource.com/id/I7409cdd41acbcb215c2a7417c1e50d37b875beff Signed-off-by: Peter Collingbourne <pcc@google.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20230420210945.2313627-1-pcc@google.com Signed-off-by: Will Deacon <will@kernel.org> |
||
Min-Hua Chen
|
d91d580878 |
arm64/mm: mark private VM_FAULT_X defines as vm_fault_t
This patch fixes several sparse warnings for fault.c: arch/arm64/mm/fault.c:493:24: sparse: warning: incorrect type in return expression (different base types) arch/arm64/mm/fault.c:493:24: sparse: expected restricted vm_fault_t arch/arm64/mm/fault.c:493:24: sparse: got int arch/arm64/mm/fault.c:501:32: sparse: warning: incorrect type in return expression (different base types) arch/arm64/mm/fault.c:501:32: sparse: expected restricted vm_fault_t arch/arm64/mm/fault.c:501:32: sparse: got int arch/arm64/mm/fault.c:503:32: sparse: warning: incorrect type in return expression (different base types) arch/arm64/mm/fault.c:503:32: sparse: expected restricted vm_fault_t arch/arm64/mm/fault.c:503:32: sparse: got int arch/arm64/mm/fault.c:511:24: sparse: warning: incorrect type in return expression (different base types) arch/arm64/mm/fault.c:511:24: sparse: expected restricted vm_fault_t arch/arm64/mm/fault.c:511:24: sparse: got int arch/arm64/mm/fault.c:670:13: sparse: warning: restricted vm_fault_t degrades to integer arch/arm64/mm/fault.c:670:13: sparse: warning: restricted vm_fault_t degrades to integer arch/arm64/mm/fault.c:713:39: sparse: warning: restricted vm_fault_t degrades to integer Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Min-Hua Chen <minhuadotchen@gmail.com> Link: https://lore.kernel.org/r/20230502151909.128810-1-minhuadotchen@gmail.com Signed-off-by: Will Deacon <will@kernel.org> |
||
Linus Torvalds
|
671e148d07 |
arm64 fixes for -rc1
- Fix regression in CPU erratum workaround when disabling the MMU - Fix detection of pointer authentication hwcaps - Avoid writeable, executable ELF sections in vmlinux -----BEGIN PGP SIGNATURE----- iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmRTe+UQHHdpbGxAa2Vy bmVsLm9yZwAKCRC3rHDchMFjNJXqB/9C9DrrbHUg9ZPqAIUpXkyaxem4gpIS+kyU +ard53uweuQHchuR/x2s2K9Sp/ano5jGnQXEjikNy29Opu2UYI/wmsqdJEn3km8q kohTRsiFgQ40Y85/3iJ8ug6+llxCxK6AXdZCskdWTP56Jur0WpNiQd0a/ShYQLdX wBHdInT3QpDVzd5bEWDtUEj4H//tTCy4rESQyGsLhrHgb/x8uZKgZMtPJp6+Q3Eq ofs+PQc0qHr/Ri3ahQOCMbxTbNaLIgUzkyXbZN+y2JtxgE+l8E3Gsir2+Pv7mcSx 1gCSLCmpwE7rVJpTykN+jA6OSsoSUSJHXs6565nF4n8+ugdL7aqR =Tba+ -----END PGP SIGNATURE----- Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: "A few arm64 fixes that came in during the merge window for -rc1. The main thing is restoring the pointer authentication hwcaps, which disappeared during some recent refactoring - Fix regression in CPU erratum workaround when disabling the MMU - Fix detection of pointer authentication hwcaps - Avoid writeable, executable ELF sections in vmlinux" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: lds: move .got section out of .text arm64: kernel: remove SHF_WRITE|SHF_EXECINSTR from .idmap.text arm64: cpufeature: Fix pointer auth hwcaps arm64: Fix label placement in record_mmu_state() |
||
ndesaulniers@google.com
|
4df69e0df2 |
arm64: kernel: remove SHF_WRITE|SHF_EXECINSTR from .idmap.text
commit d54170812ef1 ("arm64: fix .idmap.text assertion for large kernels") modified some of the section assembler directives that declare .idmap.text to be SHF_ALLOC instead of SHF_ALLOC|SHF_WRITE|SHF_EXECINSTR. This patch fixes up the remaining stragglers that were left behind. Add Fixes tag so that this doesn't precede related change in stable. Fixes: d54170812ef1 ("arm64: fix .idmap.text assertion for large kernels") Reported-by: Greg Thelen <gthelen@google.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Link: https://lore.kernel.org/r/20230428-awx-v2-1-b197ffa16edc@google.com Signed-off-by: Will Deacon <will@kernel.org> |
||
Linus Torvalds
|
7fa8a8ee94 |
- Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of
switching from a user process to a kernel thread. - More folio conversions from Kefeng Wang, Zhang Peng and Pankaj Raghav. - zsmalloc performance improvements from Sergey Senozhatsky. - Yue Zhao has found and fixed some data race issues around the alteration of memcg userspace tunables. - VFS rationalizations from Christoph Hellwig: - removal of most of the callers of write_one_page(). - make __filemap_get_folio()'s return value more useful - Luis Chamberlain has changed tmpfs so it no longer requires swap backing. Use `mount -o noswap'. - Qi Zheng has made the slab shrinkers operate locklessly, providing some scalability benefits. - Keith Busch has improved dmapool's performance, making part of its operations O(1) rather than O(n). - Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultd, permitting userspace to wr-protect anon memory unpopulated ptes. - Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive rather than exclusive, and has fixed a bunch of errors which were caused by its unintuitive meaning. - Axel Rasmussen give userfaultfd the UFFDIO_CONTINUE_MODE_WP feature, which causes minor faults to install a write-protected pte. - Vlastimil Babka has done some maintenance work on vma_merge(): cleanups to the kernel code and improvements to our userspace test harness. - Cleanups to do_fault_around() by Lorenzo Stoakes. - Mike Rapoport has moved a lot of initialization code out of various mm/ files and into mm/mm_init.c. - Lorenzo Stoakes removd vmf_insert_mixed_prot(), which was added for DRM, but DRM doesn't use it any more. - Lorenzo has also coverted read_kcore() and vread() to use iterators and has thereby removed the use of bounce buffers in some cases. - Lorenzo has also contributed further cleanups of vma_merge(). - Chaitanya Prakash provides some fixes to the mmap selftesting code. - Matthew Wilcox changes xfs and afs so they no longer take sleeping locks in ->map_page(), a step towards RCUification of pagefaults. - Suren Baghdasaryan has improved mmap_lock scalability by switching to per-VMA locking. - Frederic Weisbecker has reworked the percpu cache draining so that it no longer causes latency glitches on cpu isolated workloads. - Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig logic. - Liu Shixin has changed zswap's initialization so we no longer waste a chunk of memory if zswap is not being used. - Yosry Ahmed has improved the performance of memcg statistics flushing. - David Stevens has fixed several issues involving khugepaged, userfaultfd and shmem. - Christoph Hellwig has provided some cleanup work to zram's IO-related code paths. - David Hildenbrand has fixed up some issues in the selftest code's testing of our pte state changing. - Pankaj Raghav has made page_endio() unneeded and has removed it. - Peter Xu contributed some rationalizations of the userfaultfd selftests. - Yosry Ahmed has fixed an issue around memcg's page recalim accounting. - Chaitanya Prakash has fixed some arm-related issues in the selftests/mm code. - Longlong Xia has improved the way in which KSM handles hwpoisoned pages. - Peter Xu fixes a few issues with uffd-wp at fork() time. - Stefan Roesch has changed KSM so that it may now be used on a per-process and per-cgroup basis. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZEr3zQAKCRDdBJ7gKXxA jlLoAP0fpQBipwFxED0Us4SKQfupV6z4caXNJGPeay7Aj11/kQD/aMRC2uPfgr96 eMG3kwn2pqkB9ST2QpkaRbxA//eMbQY= =J+Dj -----END PGP SIGNATURE----- Merge tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of switching from a user process to a kernel thread. - More folio conversions from Kefeng Wang, Zhang Peng and Pankaj Raghav. - zsmalloc performance improvements from Sergey Senozhatsky. - Yue Zhao has found and fixed some data race issues around the alteration of memcg userspace tunables. - VFS rationalizations from Christoph Hellwig: - removal of most of the callers of write_one_page() - make __filemap_get_folio()'s return value more useful - Luis Chamberlain has changed tmpfs so it no longer requires swap backing. Use `mount -o noswap'. - Qi Zheng has made the slab shrinkers operate locklessly, providing some scalability benefits. - Keith Busch has improved dmapool's performance, making part of its operations O(1) rather than O(n). - Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultd, permitting userspace to wr-protect anon memory unpopulated ptes. - Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive rather than exclusive, and has fixed a bunch of errors which were caused by its unintuitive meaning. - Axel Rasmussen give userfaultfd the UFFDIO_CONTINUE_MODE_WP feature, which causes minor faults to install a write-protected pte. - Vlastimil Babka has done some maintenance work on vma_merge(): cleanups to the kernel code and improvements to our userspace test harness. - Cleanups to do_fault_around() by Lorenzo Stoakes. - Mike Rapoport has moved a lot of initialization code out of various mm/ files and into mm/mm_init.c. - Lorenzo Stoakes removd vmf_insert_mixed_prot(), which was added for DRM, but DRM doesn't use it any more. - Lorenzo has also coverted read_kcore() and vread() to use iterators and has thereby removed the use of bounce buffers in some cases. - Lorenzo has also contributed further cleanups of vma_merge(). - Chaitanya Prakash provides some fixes to the mmap selftesting code. - Matthew Wilcox changes xfs and afs so they no longer take sleeping locks in ->map_page(), a step towards RCUification of pagefaults. - Suren Baghdasaryan has improved mmap_lock scalability by switching to per-VMA locking. - Frederic Weisbecker has reworked the percpu cache draining so that it no longer causes latency glitches on cpu isolated workloads. - Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig logic. - Liu Shixin has changed zswap's initialization so we no longer waste a chunk of memory if zswap is not being used. - Yosry Ahmed has improved the performance of memcg statistics flushing. - David Stevens has fixed several issues involving khugepaged, userfaultfd and shmem. - Christoph Hellwig has provided some cleanup work to zram's IO-related code paths. - David Hildenbrand has fixed up some issues in the selftest code's testing of our pte state changing. - Pankaj Raghav has made page_endio() unneeded and has removed it. - Peter Xu contributed some rationalizations of the userfaultfd selftests. - Yosry Ahmed has fixed an issue around memcg's page recalim accounting. - Chaitanya Prakash has fixed some arm-related issues in the selftests/mm code. - Longlong Xia has improved the way in which KSM handles hwpoisoned pages. - Peter Xu fixes a few issues with uffd-wp at fork() time. - Stefan Roesch has changed KSM so that it may now be used on a per-process and per-cgroup basis. * tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits) mm,unmap: avoid flushing TLB in batch if PTE is inaccessible shmem: restrict noswap option to initial user namespace mm/khugepaged: fix conflicting mods to collapse_file() sparse: remove unnecessary 0 values from rc mm: move 'mmap_min_addr' logic from callers into vm_unmapped_area() hugetlb: pte_alloc_huge() to replace huge pte_alloc_map() maple_tree: fix allocation in mas_sparse_area() mm: do not increment pgfault stats when page fault handler retries zsmalloc: allow only one active pool compaction context selftests/mm: add new selftests for KSM mm: add new KSM process and sysfs knobs mm: add new api to enable ksm per process mm: shrinkers: fix debugfs file permissions mm: don't check VMA write permissions if the PTE/PMD indicates write permissions migrate_pages_batch: fix statistics for longterm pin retry userfaultfd: use helper function range_in_vma() lib/show_mem.c: use for_each_populated_zone() simplify code mm: correct arg in reclaim_pages()/reclaim_clean_pages_from_list() fs/buffer: convert create_page_buffers to folio_create_buffers fs/buffer: add folio_create_empty_buffers helper ... |
||
Will Deacon
|
1bb31cc7af |
Merge branch 'for-next/mm' into for-next/core
* for-next/mm: arm64: mm: always map fixmap at page granularity arm64: mm: move fixmap code to its own file arm64: add FIXADDR_TOT_{START,SIZE} Revert "Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()"" arm: uaccess: Remove memcpy_page_flushcache() mm,kfence: decouple kfence from page granularity mapping judgement |
||
Baoquan He
|
504cae453f |
arm64: kdump: defer the crashkernel reservation for platforms with no DMA memory zones
In commit 031495635b46 ("arm64: Do not defer reserve_crashkernel() for platforms with no DMA memory zones"), reserve_crashkernel() is called much earlier in arm64_memblock_init() to avoid causing base apge mapping on platforms with no DMA meomry zones. With taking off protection on crashkernel memory region, no need to call reserve_crashkernel() specially in advance. The deferred invocation of reserve_crashkernel() in bootmem_init() can cover all cases. So revert the whole commit now. Signed-off-by: Baoquan He <bhe@redhat.com> Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com> Link: https://lore.kernel.org/r/20230407011507.17572-4-bhe@redhat.com Signed-off-by: Will Deacon <will@kernel.org> |
||
Baoquan He
|
04a2a7af3d |
arm64: kdump: do not map crashkernel region specifically
After taking off the protection functions on crashkernel memory region, there's no need to map crashkernel region with page granularity during linear mapping. With this change, the system can make use of block or section mapping on linear region to largely improve perforcemence during system bootup and running. Signed-off-by: Baoquan He <bhe@redhat.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com> Link: https://lore.kernel.org/r/20230407011507.17572-3-bhe@redhat.com Signed-off-by: Will Deacon <will@kernel.org> |
||
Mark Rutland
|
414c109bdf |
arm64: mm: always map fixmap at page granularity
Today the fixmap code largely maps elements at PAGE_SIZE granularity, but we special-case the FDT mapping such that it can be mapped with 2M block mappings when 4K pages are in use. The original rationale for this was simplicity, but it has some unfortunate side-effects, and complicates portions of the fixmap code (i.e. is not so simple after all). The FDT can be up to 2M in size but is only required to have 8-byte alignment, and so it may straddle a 2M boundary. Thus when using 2M block mappings we may map up to 4M of memory surrounding the FDT. This is unfortunate as most of that memory will be unrelated to the FDT, and any pages which happen to share a 2M block with the FDT will by mapped with Normal Write-Back Cacheable attributes, which might not be what we want elsewhere (e.g. for carve-outs using Non-Cacheable attributes). The logic to handle mapping the FDT with 2M blocks requires some special cases in the fixmap code, and ties it to the early page table configuration by virtue of the SWAPPER_TABLE_SHIFT and SWAPPER_BLOCK_SIZE constants used to determine the granularity used to map the FDT. This patch simplifies the FDT logic and removes the unnecessary mappings of surrounding pages by always mapping the FDT at page granularity as with all other fixmap mappings. To do so we statically reserve multiple PTE tables to cover the fixmap VA range. Since the FDT can be at most 2M, for 4K pages we only need to allocate a single additional PTE table, and for 16K and 64K pages the existing single PTE table is sufficient. The PTE table allocation scales with the number of slots reserved in the fixmap, and so this also makes it easier to add more fixmap entries if we require those in future. Our VA layout means that the fixmap will always fall within a single PMD table (and consequently, within a single PUD/P4D/PGD entry), which we can verify at compile time with a static_assert(). With that assert a number of runtime warnings become impossible, and are removed. I've boot-tested this patch with both 4K and 64K pages. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Will Deacon <will@kernel.org> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Link: https://lore.kernel.org/r/20230406152759.4164229-4-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org> |
||
Mark Rutland
|
b97547761b |
arm64: mm: move fixmap code to its own file
Over time, arm64's mm/mmu.c has become increasingly large and painful to navigate. Move the fixmap code to its own file where it can be understood in isolation. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Will Deacon <will@kernel.org> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Link: https://lore.kernel.org/r/20230406152759.4164229-3-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org> |
||
Mark Rutland
|
32f5b6995f |
arm64: add FIXADDR_TOT_{START,SIZE}
Currently arm64's FIXADDR_{START,SIZE} definitions only cover the runtime fixmap slots (and not the boot-time fixmap slots), but the code for creating the fixmap assumes that these definitions cover the entire fixmap range. This means that the ptdump boundaries are reported in a misleading way, missing the VA region of the runtime slots. In theory this could also cause the fixmap creation to go wrong if the boot-time fixmap slots end up spilling into a separate PMD entry, though luckily this is not currently the case in any configuration. While it seems like we could extend FIXADDR_{START,SIZE} to cover the entire fixmap area, core code relies upon these *only* covering the runtime slots. For example, fix_to_virt() and virt_to_fix() try to reject manipulation of the boot-time slots based upon FIXADDR_{START,SIZE}, while __fix_to_virt() and __virt_to_fix() can handle any fixmap slot. This patch follows the lead of x86 in commit: 55f49fcb879fbeeb ("x86/mm: Fix overlap of i386 CPU_ENTRY_AREA with FIX_BTMAP") ... and add new FIXADDR_TOT_{START,SIZE} definitions which cover the entire fixmap area, using these for the fixmap creation and ptdump code. As the boot-time fixmap slots are now rejected by fix_to_virt(), the early_fixmap_init() code is changed to consistently use __fix_to_virt(), as it already does in a few cases. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Will Deacon <will@kernel.org> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Link: https://lore.kernel.org/r/20230406152759.4164229-2-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org> |
||
Suren Baghdasaryan
|
cd7f176aea |
arm64/mm: try VMA lock-based page fault handling first
Attempt VMA lock-based page fault handling first, and fall back to the existing mmap_lock-based handling if that fails. Link: https://lkml.kernel.org/r/20230227173632.3292573-31-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Will Deacon
|
7bd6680b47 |
Revert "Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()""
This reverts commit b7d9aae404841d9999b7476170867ae441e948d2. With the Qualcomm remoteproc driver now modified to use a carveout memory region in 57f72170a2b2 ("remoteproc: qcom_q6v5_mss: Use a carveout to authenticate modem headers"), we can reinstate c44094eee32f ("arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()") which relaxes the arm64 implementation of arch_dma_prep_coherent() to perform only a data cache clean operation, rather than a clean-and-invalidate. Signed-off-by: Will Deacon <will@kernel.org> |
||
Zhenhua Huang
|
bfa7965b33 |
mm,kfence: decouple kfence from page granularity mapping judgement
Kfence only needs its pool to be mapped as page granularity, if it is inited early. Previous judgement was a bit over protected. From [1], Mark suggested to "just map the KFENCE region a page granularity". So I decouple it from judgement and do page granularity mapping for kfence pool only. Need to be noticed that late init of kfence pool still requires page granularity mapping. Page granularity mapping in theory cost more(2M per 1GB) memory on arm64 platform. Like what I've tested on QEMU(emulated 1GB RAM) with gki_defconfig, also turning off rodata protection: Before: [root@liebao ]# cat /proc/meminfo MemTotal: 999484 kB After: [root@liebao ]# cat /proc/meminfo MemTotal: 1001480 kB To implement this, also relocate the kfence pool allocation before the linear mapping setting up, arm64_kfence_alloc_pool is to allocate phys addr, __kfence_pool is to be set after linear mapping set up. LINK: [1] https://lore.kernel.org/linux-arm-kernel/Y+IsdrvDNILA59UN@FVFF77S0Q05N/ Suggested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Zhenhua Huang <quic_zhenhuah@quicinc.com> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Marco Elver <elver@google.com> Link: https://lore.kernel.org/r/1679066974-690-1-git-send-email-quic_zhenhuah@quicinc.com Signed-off-by: Will Deacon <will@kernel.org> |
||
Linus Torvalds
|
39ce4395c3 |
arm64 fixes:
- In copy_highpage(), only reset the tag of the destination pointer if KASAN_HW_TAGS is enabled so that user-space MTE does not interfere with KASAN_SW_TAGS (which relies on top-byte-ignore). - Remove warning if SME is detected without SVE, the kernel can cope with such configuration (though none in the field currently). - In cfi_handler(), pass the ESR_EL1 value to die() for consistency with other die() callers. - Disable HUGETLB_PAGE_OPTIMIZE_VMEMMAP on arm64 since the pte manipulation from the generic vmemmap_remap_pte() does not follow the required ARM break-before-make sequence (clear the pte, flush the TLBs, set the new pte). It may be re-enabled once this sequence is sorted. - Fix possible memory leak in the arm64 ACPI code if the SMCCC version and conduit checks fail. - Forbid CALL_OPS with CC_OPTIMIZE_FOR_SIZE since gcc ignores -falign-functions=N with -Os. - Don't pretend KASLR is enabled if offset < MIN_KIMG_ALIGN as no randomisation would actually take place. -----BEGIN PGP SIGNATURE----- iQIyBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmQBHFEACgkQa9axLQDI XvHf9Q/3Zg8o/8HchnWSvzgV//9ljGrrDfAjbfZHrE2W4PCniSd0op0uXYsVK3IH Nk6ZDiRe5uIXKgHuSq5caOoL4aRk0hk1TpQ3RKCuh8E3ybhQe9gwYm8xEWXDSSWh QzcfENsKlZLpuMoSMILJ2NlMPMbMLprXNCUlgENBbRT7KUToHZKTwE6BL2AUI3tg RdMntccorybxk1hiXV1YKT8482i+x2gAnylYXFsq3eI+G54rdfiks+tft0CQV3ng 1/i1PfbnGC45sBoxXPqYXzBSUDNHpAqb5dwvtlVinGo3J6STxIvbM6Zi5Ma5hl3u QrhwyduwCTZ6wVOqzd4KAH9gmhJSzRG75OzCek2dTwU9KXVMOPEvp1ZfTwUXDx7J 5j8UkjGgrbtj6IioGqBAO/HiFfoty8EBtmlSZIj0thwxkM73ZBG6efQOJaVWh85m ioUzMC2Y5yfKLfHEcy9yKIQVizMYoz6fl+QHOEbVSoFhJKNRc4wt5CCJCvsbMHsu K8rvD/CI9jFMP9GEK7ObTaC7ICjUz/+8wbIrRrm5ObRQ65Tm2zv3OLqGnK8O5O4W gcDEraTnSPHDUtgG6dAEPFN5Wi9hT3zYC0xAcNhc3aZC5ofS5RD6YXIWJvqjWrvL 5k8G1gfa57C/hfxO6pPw7bg/nY8vvYpxUkZ9erRWD430g7y0Sg== =jfPa -----END PGP SIGNATURE----- Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Catalin Marinas: - In copy_highpage(), only reset the tag of the destination pointer if KASAN_HW_TAGS is enabled so that user-space MTE does not interfere with KASAN_SW_TAGS (which relies on top-byte-ignore). - Remove warning if SME is detected without SVE, the kernel can cope with such configuration (though none in the field currently). - In cfi_handler(), pass the ESR_EL1 value to die() for consistency with other die() callers. - Disable HUGETLB_PAGE_OPTIMIZE_VMEMMAP on arm64 since the pte manipulation from the generic vmemmap_remap_pte() does not follow the required ARM break-before-make sequence (clear the pte, flush the TLBs, set the new pte). It may be re-enabled once this sequence is sorted. - Fix possible memory leak in the arm64 ACPI code if the SMCCC version and conduit checks fail. - Forbid CALL_OPS with CC_OPTIMIZE_FOR_SIZE since gcc ignores -falign-functions=N with -Os. - Don't pretend KASLR is enabled if offset < MIN_KIMG_ALIGN as no randomisation would actually take place. * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: kaslr: don't pretend KASLR is enabled if offset < MIN_KIMG_ALIGN arm64: ftrace: forbid CALL_OPS with CC_OPTIMIZE_FOR_SIZE arm64: acpi: Fix possible memory leak of ffh_ctxt arm64: mm: hugetlb: Disable HUGETLB_PAGE_OPTIMIZE_VMEMMAP arm64: pass ESR_ELx to die() of cfi_handler arm64/fpsimd: Remove warning for SME without SVE arm64: Reset KASAN tag in copy_highpage with HW tags only |