linux/mm
Mike Kravetz 40549ba8f8 hugetlb: use new vma_lock for pmd sharing synchronization
The new hugetlb vma lock is used to address this race:

Faulting thread                                 Unsharing thread
...                                                  ...
ptep = huge_pte_offset()
      or
ptep = huge_pte_alloc()
...
                                                i_mmap_lock_write
                                                lock page table
ptep invalid   <------------------------        huge_pmd_unshare()
Could be in a previously                        unlock_page_table
sharing process or worse                        i_mmap_unlock_write
...

The vma_lock is used as follows:
- During fault processing. The lock is acquired in read mode before
  doing a page table lock and allocation (huge_pte_alloc).  The lock is
  held until code is finished with the page table entry (ptep).
- The lock must be held in write mode whenever huge_pmd_unshare is
  called.

Lock ordering issues come into play when unmapping a page from all
vmas mapping the page.  The i_mmap_rwsem must be held to search for the
vmas, and the vma lock must be held before calling unmap which will
call huge_pmd_unshare.  This is done today in:
- try_to_migrate_one and try_to_unmap_ for page migration and memory
  error handling.  In these routines we 'try' to obtain the vma lock and
  fail to unmap if unsuccessful.  Calling routines already deal with the
  failure of unmapping.
- hugetlb_vmdelete_list for truncation and hole punch.  This routine
  also tries to acquire the vma lock.  If it fails, it skips the
  unmapping.  However, we can not have file truncation or hole punch
  fail because of contention.  After hugetlb_vmdelete_list, truncation
  and hole punch call remove_inode_hugepages.  remove_inode_hugepages
  checks for mapped pages and call hugetlb_unmap_file_page to unmap them.
  hugetlb_unmap_file_page is designed to drop locks and reacquire in the
  correct order to guarantee unmap success.

Link: https://lkml.kernel.org/r/20220914221810.95771-9-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: James Houghton <jthoughton@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Prakash Sangappa <prakash.sangappa@oracle.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-03 14:03:17 -07:00
..
damon mm/damon/core: simplify the kdamond stop mechanism by removing 'done' 2022-10-03 14:03:14 -07:00
kasan kasan: better invalid/double-free report header 2022-10-03 14:03:02 -07:00
kfence mm: kfence: convert to DEFINE_SEQ_ATTRIBUTE 2022-10-03 14:03:07 -07:00
backing-dev.c mm: backing-dev: Remove the unneeded result variable 2022-09-11 20:26:02 -07:00
balloon_compaction.c mm: Convert all PageMovable users to movable_operations 2022-08-02 12:34:03 -04:00
bootmem_info.c bootmem: remove the vmemmap pages from kmemleak in put_page_bootmem 2022-08-28 14:02:45 -07:00
cma_debug.c mm/cma_debug: show complete cma name in debugfs directories 2022-09-11 20:25:50 -07:00
cma_sysfs.c
cma.c Revert "mm/cma.c: remove redundant cma_mutex lock" 2022-05-13 15:11:26 -07:00
cma.h mm/cma: provide option to opt out from exposing pages on activation failure 2022-03-22 15:57:09 -07:00
compaction.c mm: add pageblock_aligned() macro 2022-10-03 14:03:04 -07:00
debug_page_ref.c
debug_vm_pgtable.c docs: rename Documentation/vm to Documentation/mm 2022-06-27 12:52:53 -07:00
debug.c mm: remove the vma linked list 2022-09-26 19:46:26 -07:00
dmapool.c
early_ioremap.c mm/early_ioremap: declare early_memremap_pgprot_adjust() 2022-03-22 15:57:11 -07:00
fadvise.c riscv: compat: syscall: Add compat_sys_call_table implementation 2022-04-26 13:36:25 -07:00
failslab.c mm: fix missing handler for __GFP_NOWARN 2022-05-19 14:08:55 -07:00
filemap.c mm/filemap: make folio_put_wait_locked static 2022-10-03 14:03:15 -07:00
folio-compat.c mm: remove try_to_free_swap() 2022-10-03 14:02:53 -07:00
frontswap.c frontswap: don't call ->init if no ops are registered 2022-09-26 12:14:34 -07:00
gup_test.c mm: rename is_pinnable_page() to is_longterm_pinnable_page() 2022-07-17 17:14:27 -07:00
gup_test.h
gup.c mm/gup: use gup_can_follow_protnone() also in GUP-fast 2022-09-26 19:46:28 -07:00
highmem.c - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe 2022-08-05 16:32:45 -07:00
hmm.c mm/swap: add swp_offset_pfn() to fetch PFN from swap entry 2022-09-26 19:46:05 -07:00
huge_memory.c mm/huge_memory: prevent THP_ZERO_PAGE_ALLOC increased twice 2022-10-03 14:03:08 -07:00
hugetlb_cgroup.c hugetlb_cgroup: use helper for_each_hstate and hstate_index 2022-09-11 20:25:53 -07:00
hugetlb_vmemmap.c mm: hugetlb_vmemmap: simplify reset_struct_pages() 2022-09-11 20:25:58 -07:00
hugetlb_vmemmap.h mm: hugetlb_vmemmap: improve hugetlb_vmemmap code readability 2022-08-08 18:06:43 -07:00
hugetlb.c hugetlb: use new vma_lock for pmd sharing synchronization 2022-10-03 14:03:17 -07:00
hwpoison-inject.c mm/hwpoison: add __init/__exit annotations to module init/exit funcs 2022-10-03 14:03:05 -07:00
init-mm.c mm: remove rb tree. 2022-09-26 19:46:16 -07:00
internal.h mm: use nth_page instead of mem_map_offset mem_map_next 2022-10-03 14:03:08 -07:00
interval_tree.c
io-mapping.c
ioremap.c mm: ioremap: Add ioremap/iounmap_allowed() 2022-06-27 12:22:31 +01:00
Kconfig mm: multi-gen LRU: admin guide 2022-09-26 19:46:10 -07:00
Kconfig.debug Two followon fixes for the post-5.19 series "Use pageblock_order for cma 2022-05-27 11:40:49 -07:00
khugepaged.c khugepaged: call shmem_get_folio() 2022-10-03 14:02:50 -07:00
kmemleak.c mm/kmemleak: make create_object return void 2022-09-11 20:26:10 -07:00
ksm.c ksm: use a folio in replace_page() 2022-10-03 14:02:53 -07:00
list_lru.c mm: kmem: make mem_cgroup_from_obj() vmalloc()-safe 2022-06-16 19:48:31 -07:00
maccess.c asm-generic updates for 5.18 2022-03-23 18:03:08 -07:00
madvise.c madvise: convert madvise_free_pte_range() to use a folio 2022-10-03 14:02:52 -07:00
Makefile mm: remove vmacache 2022-09-26 19:46:18 -07:00
mapping_dirty_helpers.c
memblock.c mm: add pageblock_align() macro 2022-10-03 14:03:04 -07:00
memcontrol.c mm/memcontrol: use kstrtobool for swapaccount param parsing 2022-10-03 14:03:14 -07:00
memfd.c
memory_hotplug.c mm: add pageblock_aligned() macro 2022-10-03 14:03:04 -07:00
memory-failure.c rmap: remove page_unlock_anon_vma_read() 2022-10-03 14:02:54 -07:00
memory-tiers.c mm/demotion: expose memory tier details via sysfs 2022-09-26 19:46:13 -07:00
memory.c hugetlb: use new vma_lock for pmd sharing synchronization 2022-10-03 14:03:17 -07:00
mempolicy.c mm/mempolicy: use PAGE_ALIGN instead of open-coding it 2022-10-03 14:03:15 -07:00
mempool.c mm/mempool: use might_alloc() 2022-06-16 19:48:30 -07:00
memremap.c mm/mremap_pages: save a few cycles in get_dev_pagemap() 2022-09-11 20:26:11 -07:00
memtest.c
migrate_device.c mm: remember young/dirty bit for page migrations 2022-09-26 19:46:05 -07:00
migrate.c mm: convert page_get_anon_vma() to folio_get_anon_vma() 2022-10-03 14:02:54 -07:00
mincore.c mm: teach core mm about pte markers 2022-05-13 07:20:09 -07:00
mlock.c mm/mlock: drop dead code in count_mm_mlocked_page_nr() 2022-09-26 19:46:27 -07:00
mm_init.c mm: multi-gen LRU: groundwork 2022-09-26 19:46:09 -07:00
mm_slot.h mm: introduce common struct mm_slot 2022-10-03 14:02:43 -07:00
mmap_lock.c
mmap.c mm: refactor of vma_merge() 2022-09-26 19:46:27 -07:00
mmu_gather.c mm/mmu_gather: limit free batch count and add schedule point in tlb_batch_pages_flush 2022-04-28 23:16:12 -07:00
mmu_notifier.c mm/mmu_notifier.c: fix race in mmu_interval_notifier_remove() 2022-04-21 20:01:10 -07:00
mmzone.c mm: multi-gen LRU: groundwork 2022-09-26 19:46:09 -07:00
mprotect.c mm/mprotect: use maple tree navigation instead of VMA linked list 2022-09-26 19:46:24 -07:00
mremap.c mm: add merging after mremap resize 2022-09-26 19:46:28 -07:00
msync.c mm/msync: use vma_find() instead of vma linked list 2022-09-26 19:46:25 -07:00
nommu.c mm: remove the vma linked list 2022-09-26 19:46:26 -07:00
oom_kill.c mm: reduce noise in show_mem for lowmem allocations 2022-09-26 19:46:29 -07:00
page_alloc.c mm/page_alloc.c: document bulkfree_pcp_prepare() return value 2022-10-03 14:03:14 -07:00
page_counter.c mm: page_counter: remove unneeded atomic ops for low/min 2022-09-11 20:26:01 -07:00
page_ext.c page_ext: introduce boot parameter 'early_page_ext' 2022-09-11 20:26:02 -07:00
page_idle.c mm: don't be stuck to rmap lock on reclaim path 2022-05-19 14:08:54 -07:00
page_io.c swap: convert swap_writepage() to use a folio 2022-10-03 14:02:52 -07:00
page_isolation.c mm: add pageblock_aligned() macro 2022-10-03 14:03:04 -07:00
page_owner.c mm: reuse pageblock_start/end_pfn() macro 2022-10-03 14:03:03 -07:00
page_poison.c
page_reporting.c
page_reporting.h
page_table_check.c mm: fix use-after free of page_ext after race with memory-offline 2022-09-11 20:25:57 -07:00
page_vma_mapped.c mm/swap: add swp_offset_pfn() to fetch PFN from swap entry 2022-09-26 19:46:05 -07:00
page-writeback.c writeback: avoid use-after-free after removing device 2022-08-28 14:02:43 -07:00
pagewalk.c mm/pagewalk: use vma_find() instead of vma linked list 2022-09-26 19:46:25 -07:00
percpu-internal.h percpu: improve percpu_alloc_percpu event trace 2022-05-13 07:20:18 -07:00
percpu-km.c
percpu-stats.c
percpu-vm.c
percpu.c mm: percpu: use kmemleak_ignore_phys() instead of kmemleak_free() 2022-07-17 17:14:47 -07:00
pgalloc-track.h
pgtable-generic.c mm: avoid unnecessary flush on change_huge_pmd() 2022-05-13 07:20:05 -07:00
process_vm_access.c
ptdump.c mm: sparsemem: use page table lock to protect kernel pmd operations 2022-03-22 15:57:08 -07:00
readahead.c filemap: Fix serialization adding transparent huge pages to page cache 2022-06-23 12:22:00 -04:00
rmap.c hugetlb: use new vma_lock for pmd sharing synchronization 2022-10-03 14:03:17 -07:00
rodata_test.c mm/rodata_test: use PAGE_ALIGNED() helper 2022-10-03 14:03:05 -07:00
secretmem.c mm: fix dereferencing possible ERR_PTR 2022-09-11 16:22:31 -07:00
shmem.c tmpfs: add support for an i_version counter 2022-10-03 14:03:06 -07:00
shrinker_debug.c mm: shrinkers: fix double kfree on shrinker name 2022-07-29 18:07:13 -07:00
shuffle.c mm/shuffle: convert module_param_call to module_param_cb 2022-10-03 14:03:07 -07:00
shuffle.h
slab_common.c mm/slab_common: move generic bulk alloc/free functions to SLOB 2022-07-20 13:30:12 +02:00
slab.c - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe 2022-08-05 16:32:45 -07:00
slab.h mm/slab_common: move generic bulk alloc/free functions to SLOB 2022-07-20 13:30:12 +02:00
slob.c mm/slab_common: move generic bulk alloc/free functions to SLOB 2022-07-20 13:30:12 +02:00
slub.c kfence: add sysfs interface to disable kfence for selected slabs. 2022-09-11 20:25:52 -07:00
sparse-vmemmap.c mm: hugetlb_vmemmap: move vmemmap code related to HugeTLB to hugetlb_vmemmap.c 2022-08-08 18:06:42 -07:00
sparse.c mm: memory_hotplug: enumerate all supported section flags 2022-07-03 18:08:49 -07:00
swap_cgroup.c
swap_slots.c mm/swap: convert put_swap_page() to put_swap_folio() 2022-10-03 14:02:46 -07:00
swap_state.c swap_state: convert free_swap_cache() to use a folio 2022-10-03 14:02:51 -07:00
swap.c mm: add folio_add_lru_vma() 2022-10-03 14:02:45 -07:00
swap.h mm: remove lookup_swap_cache() 2022-10-03 14:02:51 -07:00
swapfile.c memcg: convert mem_cgroup_swap_full() to take a folio 2022-10-03 14:02:53 -07:00
truncate.c mm: add split_folio() 2022-10-03 14:02:45 -07:00
usercopy.c usercopy: use unsigned long instead of uintptr_t 2022-07-01 17:03:38 -07:00
userfaultfd.c hugetlb: use new vma_lock for pmd sharing synchronization 2022-10-03 14:03:17 -07:00
util.c mm: remove the vma linked list 2022-09-26 19:46:26 -07:00
vmalloc.c mm/vmalloc: extend find_vmap_lowest_match_check with extra arguments 2022-09-11 20:26:05 -07:00
vmpressure.c
vmscan.c memcg: convert mem_cgroup_swap_full() to take a folio 2022-10-03 14:02:53 -07:00
vmstat.c mm: remove vmacache 2022-09-26 19:46:18 -07:00
workingset.c mm: multi-gen LRU: minimal implementation 2022-09-26 19:46:09 -07:00
z3fold.c mm: Convert all PageMovable users to movable_operations 2022-08-02 12:34:03 -04:00
zbud.c
zpool.c
zsmalloc.c zsmalloc: use correct types in _first_obj_offset functions 2022-10-03 14:03:07 -07:00
zswap.c mm/swap: remove the end_write_func argument to __swap_writepage 2022-09-11 20:25:50 -07:00