linux/arch/arm64/mm/flush.c
Muchun Song 1e63ac088f arm64: mm: hugetlb: enable HUGETLB_PAGE_FREE_VMEMMAP for arm64
The feature of minimizing overhead of struct page associated with each
HugeTLB page aims to free its vmemmap pages (used as struct page) to save
memory, where is ~14GB/16GB per 1TB HugeTLB pages (2MB/1GB type).  In
short, when a HugeTLB page is allocated or freed, the vmemmap array
representing the range associated with the page will need to be remapped. 
When a page is allocated, vmemmap pages are freed after remapping.  When a
page is freed, previously discarded vmemmap pages must be allocated before
remapping.  More implementations and details can be found here [1].

The infrastructure of freeing vmemmap pages associated with each HugeTLB
page is already there, we can easily enable HUGETLB_PAGE_FREE_VMEMMAP for
arm64, the only thing to be fixed is flush_dcache_page() .

flush_dcache_page() need to be adapted to operate on the head page's flags
since the tail vmemmap pages are mapped with read-only after the feature
is enabled (clear operation is not permitted).

There was some discussions about this in the thread [2], but there was no
conclusion in the end.  And I copied the concern proposed by Anshuman to
here and explain why those concern is superfluous.  It is safe to enable
it for x86_64 as well as arm64.

1st concern:
'''
But what happens when a hot remove section's vmemmap area (which is
being teared down) is nearby another vmemmap area which is either created
or being destroyed for HugeTLB alloc/free purpose. As you mentioned
HugeTLB pages inside the hot remove section might be safe. But what about
other HugeTLB areas whose vmemmap area shares page table entries with
vmemmap entries for a section being hot removed ? Massive HugeTLB alloc
/use/free test cycle using memory just adjacent to a memory hotplug area,
which is always added and removed periodically, should be able to expose
this problem.
'''

Answer: At the time memory is removed, all HugeTLB pages either have been
migrated away or dissolved.  So there is no race between memory hot remove
and free_huge_page_vmemmap().  Therefore, HugeTLB pages inside the hot
remove section is safe.  Let's talk your question "what about other
HugeTLB areas whose vmemmap area shares page table entries with vmemmap
entries for a section being hot removed ?", the question is not
established.  The minimal granularity size of hotplug memory 128MB (on
arm64, 4k base page), any HugeTLB smaller than 128MB is within a section,
then, there is no share PTE page tables between HugeTLB in this section
and ones in other sections and a HugeTLB page could not cross two
sections.  In this case, the section cannot be freed.  Any HugeTLB bigger
than 128MB (section size) whose vmemmap pages is an integer multiple of
2MB (PMD-mapped).  As long as:

  1) HugeTLBs are naturally aligned, power-of-two sizes
  2) The HugeTLB size >= the section size
  3) The HugeTLB size >= the vmemmap leaf mapping size

Then a HugeTLB will not share any leaf page table entries with *anything
else*, but will share intermediate entries.  In this case, at the time
memory is removed, all HugeTLB pages either have been migrated away or
dissolved.  So there is also no race between memory hot remove and
free_huge_page_vmemmap().

2nd concern:
'''
differently, not sure if ptdump would require any synchronization.

Dumping an wrong value is probably okay but crashing because a page table
entry is being freed after ptdump acquired the pointer is bad. On arm64,
ptdump() is protected against hotremove via [get|put]_online_mems().
'''

Answer: The ptdump should be fine since vmemmap_remap_free() only
exchanges PTEs or splits the PMD entry (which means allocating a PTE page
table).  Both operations do not free any page tables (PTE), so ptdump
cannot run into a UAF on any page tables.  The worst case is just dumping
an wrong value.

[1] https://lore.kernel.org/all/20210510030027.56044-1-songmuchun@bytedance.com/
[2] https://lore.kernel.org/all/20210518091826.36937-1-songmuchun@bytedance.com/

[songmuchun@bytedance.com: restructure the code comment inside flush_dcache_page()]
  Link: https://lkml.kernel.org/r/20220414072646.21910-1-songmuchun@bytedance.com
Link: https://lkml.kernel.org/r/20220331065640.5777-2-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Barry Song <baohua@kernel.org>
Tested-by: Barry Song <baohua@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Bodeddula Balasubramaniam <bodeddub@amazon.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-28 23:16:03 -07:00

117 lines
3.2 KiB
C

// SPDX-License-Identifier: GPL-2.0-only
/*
* Based on arch/arm/mm/flush.c
*
* Copyright (C) 1995-2002 Russell King
* Copyright (C) 2012 ARM Ltd.
*/
#include <linux/export.h>
#include <linux/mm.h>
#include <linux/pagemap.h>
#include <asm/cacheflush.h>
#include <asm/cache.h>
#include <asm/tlbflush.h>
void sync_icache_aliases(unsigned long start, unsigned long end)
{
if (icache_is_aliasing()) {
dcache_clean_pou(start, end);
icache_inval_all_pou();
} else {
/*
* Don't issue kick_all_cpus_sync() after I-cache invalidation
* for user mappings.
*/
caches_clean_inval_pou(start, end);
}
}
static void flush_ptrace_access(struct vm_area_struct *vma, unsigned long start,
unsigned long end)
{
if (vma->vm_flags & VM_EXEC)
sync_icache_aliases(start, end);
}
/*
* Copy user data from/to a page which is mapped into a different processes
* address space. Really, we want to allow our "user space" model to handle
* this.
*/
void copy_to_user_page(struct vm_area_struct *vma, struct page *page,
unsigned long uaddr, void *dst, const void *src,
unsigned long len)
{
memcpy(dst, src, len);
flush_ptrace_access(vma, (unsigned long)dst, (unsigned long)dst + len);
}
void __sync_icache_dcache(pte_t pte)
{
struct page *page = pte_page(pte);
/*
* HugeTLB pages are always fully mapped, so only setting head page's
* PG_dcache_clean flag is enough.
*/
if (PageHuge(page))
page = compound_head(page);
if (!test_bit(PG_dcache_clean, &page->flags)) {
sync_icache_aliases((unsigned long)page_address(page),
(unsigned long)page_address(page) +
page_size(page));
set_bit(PG_dcache_clean, &page->flags);
}
}
EXPORT_SYMBOL_GPL(__sync_icache_dcache);
/*
* This function is called when a page has been modified by the kernel. Mark
* it as dirty for later flushing when mapped in user space (if executable,
* see __sync_icache_dcache).
*/
void flush_dcache_page(struct page *page)
{
/*
* Only the head page's flags of HugeTLB can be cleared since the tail
* vmemmap pages associated with each HugeTLB page are mapped with
* read-only when CONFIG_HUGETLB_PAGE_FREE_VMEMMAP is enabled (more
* details can refer to vmemmap_remap_pte()). Although
* __sync_icache_dcache() only set PG_dcache_clean flag on the head
* page struct, there is more than one page struct with PG_dcache_clean
* associated with the HugeTLB page since the head vmemmap page frame
* is reused (more details can refer to the comments above
* page_fixed_fake_head()).
*/
if (hugetlb_free_vmemmap_enabled() && PageHuge(page))
page = compound_head(page);
if (test_bit(PG_dcache_clean, &page->flags))
clear_bit(PG_dcache_clean, &page->flags);
}
EXPORT_SYMBOL(flush_dcache_page);
/*
* Additional functions defined in assembly.
*/
EXPORT_SYMBOL(caches_clean_inval_pou);
#ifdef CONFIG_ARCH_HAS_PMEM_API
void arch_wb_cache_pmem(void *addr, size_t size)
{
/* Ensure order against any prior non-cacheable writes */
dmb(osh);
dcache_clean_pop((unsigned long)addr, (unsigned long)addr + size);
}
EXPORT_SYMBOL_GPL(arch_wb_cache_pmem);
void arch_invalidate_pmem(void *addr, size_t size)
{
dcache_inval_poc((unsigned long)addr, (unsigned long)addr + size);
}
EXPORT_SYMBOL_GPL(arch_invalidate_pmem);
#endif