linux/mm
Mike Kravetz 4643d67e8c hugetlbfs: fix hugetlb page migration/fault race causing SIGBUS
Li Wang discovered that LTP/move_page12 V2 sometimes triggers SIGBUS in
the kernel-v5.2.3 testing.  This is caused by a race between hugetlb
page migration and page fault.

If a hugetlb page can not be allocated to satisfy a page fault, the task
is sent SIGBUS.  This is normal hugetlbfs behavior.  A hugetlb fault
mutex exists to prevent two tasks from trying to instantiate the same
page.  This protects against the situation where there is only one
hugetlb page, and both tasks would try to allocate.  Without the mutex,
one would fail and SIGBUS even though the other fault would be
successful.

There is a similar race between hugetlb page migration and fault.
Migration code will allocate a page for the target of the migration.  It
will then unmap the original page from all page tables.  It does this
unmap by first clearing the pte and then writing a migration entry.  The
page table lock is held for the duration of this clear and write
operation.  However, the beginnings of the hugetlb page fault code
optimistically checks the pte without taking the page table lock.  If
clear (as it can be during the migration unmap operation), a hugetlb
page allocation is attempted to satisfy the fault.  Note that the page
which will eventually satisfy this fault was already allocated by the
migration code.  However, the allocation within the fault path could
fail which would result in the task incorrectly being sent SIGBUS.

Ideally, we could take the hugetlb fault mutex in the migration code
when modifying the page tables.  However, locks must be taken in the
order of hugetlb fault mutex, page lock, page table lock.  This would
require significant rework of the migration code.  Instead, the issue is
addressed in the hugetlb fault code.  After failing to allocate a huge
page, take the page table lock and check for huge_pte_none before
returning an error.  This is the same check that must be made further in
the code even if page allocation is successful.

Link: http://lkml.kernel.org/r/20190808000533.7701-1-mike.kravetz@oracle.com
Fixes: 290408d4a2 ("hugetlb: hugepage migration core")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: Li Wang <liwang@redhat.com>
Tested-by: Li Wang <liwang@redhat.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Cyril Hrubis <chrubis@suse.cz>
Cc: Xishi Qiu <xishi.qiuxishi@alibaba-inc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-13 16:06:53 -07:00
..
kasan mm/kasan: change kasan_check_{read,write} to return boolean 2019-07-12 11:05:42 -07:00
backing-dev.c backing-dev: no need to check return value of debugfs_create functions 2019-06-03 15:49:07 +02:00
balloon_compaction.c balloon: fix up comments 2019-07-22 11:19:26 -04:00
cleancache.c Driver Core and debugfs changes for 5.3-rc1 2019-07-12 12:24:03 -07:00
cma_debug.c mm/cma_debug.c: fix the break condition in cma_maxchunk_get() 2019-05-14 09:47:45 -07:00
cma.c mm/cma.c: fail if fixed declaration can't be honored 2019-07-16 19:23:21 -07:00
cma.h
compaction.c mm: compaction: avoid 100% CPU usage during compaction when a task is killed 2019-08-03 07:02:00 -07:00
debug_page_ref.c
debug.c mm: update references to page _refcount 2019-05-14 19:52:47 -07:00
dmapool.c mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options 2019-07-12 11:05:46 -07:00
early_ioremap.c
fadvise.c vfs: implement readahead(2) using POSIX_FADV_WILLNEED 2018-08-30 20:01:32 +02:00
failslab.c mm/failslab.c: by default, do not fail allocations with direct reclaim only 2019-07-12 11:05:43 -07:00
filemap.c mm/filemap.c: correct the comment about VM_FAULT_RETRY 2019-07-12 11:05:43 -07:00
frame_vector.c
frontswap.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 482 2019-06-19 17:09:52 +02:00
gup_benchmark.c mm/gup: replace get_user_pages_longterm() with FOLL_LONGTERM 2019-05-14 09:47:45 -07:00
gup.c mm: introduce ARCH_HAS_PTE_DEVMAP 2019-07-16 19:23:25 -07:00
highmem.c mm: convert totalram_pages and totalhigh_pages variables to atomic 2018-12-28 12:11:47 -08:00
hmm.c mm/hmm: always return EBUSY for invalid ranges in hmm_range_{fault,snapshot} 2019-07-25 16:14:39 -03:00
huge_memory.c Revert "mm, thp: restore node-local hugepage allocations" 2019-08-13 16:06:52 -07:00
hugetlb_cgroup.c
hugetlb.c hugetlbfs: fix hugetlb page migration/fault race causing SIGBUS 2019-08-13 16:06:53 -07:00
hwpoison-inject.c hwpoison-inject: no need to check return value of debugfs_create functions 2019-06-03 15:39:40 +02:00
init-mm.c
internal.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
interval_tree.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 248 2019-06-19 17:09:08 +02:00
Kconfig mm: introduce ARCH_HAS_PTE_DEVMAP 2019-07-16 19:23:25 -07:00
Kconfig.debug mm, debug_pagealloc: use a page type instead of page_ext flag 2019-07-12 11:05:43 -07:00
khugepaged.c Revert "mm: page cache: store only head pages in i_pages" 2019-07-05 19:55:18 -07:00
kmemleak-test.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 333 2019-06-05 17:37:06 +02:00
kmemleak.c mm: kmemleak: disable early logging in case of error 2019-08-13 16:06:52 -07:00
ksm.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 482 2019-06-19 17:09:52 +02:00
list_lru.c mm: memcg/slab: stop setting page->mem_cgroup pointer for slab pages 2019-07-12 11:05:44 -07:00
maccess.c The main changes in this release include: 2019-07-18 11:51:00 -07:00
madvise.c mm: remove MEMORY_DEVICE_PUBLIC support 2019-07-02 14:32:43 -03:00
Makefile memremap: move from kernel/ to mm/ 2019-08-03 07:02:01 -07:00
memblock.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
memcontrol.c mm: workingset: fix vmstat counters for shadow nodes 2019-08-13 16:06:52 -07:00
memfd.c Revert "mm: page cache: store only head pages in i_pages" 2019-07-05 19:55:18 -07:00
memory_hotplug.c mm/memory_hotplug.c: remove unneeded return for void function 2019-08-03 07:02:01 -07:00
memory-failure.c HMM patches for 5.3 2019-07-14 19:42:11 -07:00
memory.c mm: thp: make transhuge_vma_suitable available for anonymous THP 2019-07-18 17:08:06 -07:00
mempolicy.c Revert "mm, thp: restore node-local hugepage allocations" 2019-08-13 16:06:52 -07:00
mempool.c docs/core-api/mm: fix return value descriptions in mm/ 2019-03-05 21:07:20 -08:00
memremap.c mm/hmm: fix ZONE_DEVICE anon page mapping reuse 2019-08-13 16:06:52 -07:00
memtest.c
migrate.c mm/migrate.c: initialize pud_entry in migrate_vma() 2019-08-03 07:02:01 -07:00
mincore.c mm/mincore.c: fix race between swapoff and mincore 2019-07-12 11:05:43 -07:00
mlock.c mm/mlock.c: change count_mm_mlocked_page_nr return type 2019-06-13 17:34:56 -10:00
mm_init.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
mmap.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
mmu_context.c
mmu_gather.c mm: mmu_gather: remove __tlb_reset_range() for force flush 2019-06-13 17:34:56 -10:00
mmu_notifier.c mm/mmu_notifier: use hlist_add_head_rcu() 2019-07-12 11:05:46 -07:00
mmzone.c
mprotect.c mm/mprotect.c: fix compilation warning because of unused 'mm' variable 2019-05-14 09:47:51 -07:00
mremap.c mm/mmu_notifier: contextual information for event triggering invalidation 2019-05-14 09:47:49 -07:00
msync.c
nommu.c mm: fix the MAP_UNINITIALIZED flag 2019-07-16 19:23:21 -07:00
oom_kill.c mm/oom_kill.c: remove redundant OOM score normalization in select_bad_process() 2019-07-12 11:05:47 -07:00
page_alloc.c mm/sparsemem: support sub-section hotplug 2019-07-18 17:08:07 -07:00
page_counter.c
page_ext.c mm, debug_pagealloc: use a page type instead of page_ext flag 2019-07-12 11:05:43 -07:00
page_idle.c mm/page_idle.c: fix oops because end_pfn is larger than max_pfn 2019-06-29 16:43:45 +08:00
page_io.c mm, swap: use rbtree for swap_extent 2019-07-12 11:05:43 -07:00
page_isolation.c mm/page_isolation.c: change the prototype of undo_isolate_page_range() 2019-07-12 11:05:43 -07:00
page_owner.c mm/page_owner: Simplify stack trace handling 2019-04-29 12:37:50 +02:00
page_poison.c page_poison: play nicely with KASAN 2019-03-05 21:07:13 -08:00
page_vma_mapped.c mm/rmap: map_pte() was not handling private ZONE_DEVICE page properly 2018-10-31 08:54:11 -07:00
page-writeback.c mm: remove the account_page_dirtied export 2019-07-12 11:05:42 -07:00
pagewalk.c
percpu-internal.h percpu: convert chunk hints to be based on pcpu_block_md 2019-03-13 12:25:31 -07:00
percpu-km.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428 2019-06-05 17:37:16 +02:00
percpu-stats.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428 2019-06-05 17:37:16 +02:00
percpu-vm.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428 2019-06-05 17:37:16 +02:00
percpu.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428 2019-06-05 17:37:16 +02:00
pgtable-generic.c x86/mm: Page size aware flush_tlb_mm_range() 2018-10-09 16:51:11 +02:00
process_vm_access.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
quicklist.c
readahead.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
rmap.c mm/hmm: fix bad subpage pointer in try_to_unmap_one 2019-08-13 16:06:52 -07:00
rodata_test.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441 2019-06-05 17:37:17 +02:00
shmem.c Revert "Revert "mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask"" 2019-08-13 16:06:52 -07:00
shuffle.c mm: maintain randomization of page free lists 2019-05-14 19:52:48 -07:00
shuffle.h mm: maintain randomization of page free lists 2019-05-14 19:52:48 -07:00
slab_common.c mm/slab_common.c: work around clang bug #42570 2019-07-16 19:23:21 -07:00
slab.c mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options 2019-07-12 11:05:46 -07:00
slab.h mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options 2019-07-12 11:05:46 -07:00
slob.c mm/slab: refactor common ksize KASAN logic into slab_common.c 2019-07-12 11:05:42 -07:00
slub.c mm: slub: Fix slab walking for init_on_free 2019-07-31 13:16:06 -07:00
sparse-vmemmap.c mm/sparsemem: convert kmalloc_section_memmap() to populate_section_memmap() 2019-07-18 17:08:07 -07:00
sparse.c mm/sparsemem: cleanup 'section number' data types 2019-07-18 17:08:07 -07:00
swap_cgroup.c
swap_slots.c mm, swap, get_swap_pages: use entry_size instead of cluster in parameter 2018-08-22 10:52:44 -07:00
swap_state.c mm/swap_state.c: simplify total_swapcache_pages() with get_swap_device() 2019-07-12 11:05:43 -07:00
swap.c docs: admin-guide: move sysctl directory to it 2019-07-15 11:03:01 -03:00
swapfile.c mm, swap: use rbtree for swap_extent 2019-07-12 11:05:43 -07:00
truncate.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
usercopy.c mm/usercopy: use memory range to be accessed for wraparound check 2019-08-13 16:06:52 -07:00
userfaultfd.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 499 2019-06-19 17:09:53 +02:00
util.c mm: add account_locked_vm utility function 2019-07-16 19:23:25 -07:00
vmacache.c mm: get rid of vmacache_flush_all() entirely 2018-09-13 15:18:04 -10:00
vmalloc.c mm/vmalloc.c: fix percpu free VM area search criteria 2019-08-13 16:06:52 -07:00
vmpressure.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
vmscan.c mm, vmscan: do not special-case slab reclaim when watermarks are boosted 2019-08-13 16:06:53 -07:00
vmstat.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
workingset.c mm: workingset: fix vmstat counters for shadow nodes 2019-08-13 16:06:52 -07:00
z3fold.c mm/z3fold.c: fix z3fold_destroy_pool() race condition 2019-08-13 16:06:52 -07:00
zbud.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
zpool.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
zsmalloc.c Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-07-19 10:42:02 -07:00
zswap.c zswap: ignore debugfs_create_dir() return value 2019-06-03 15:39:39 +02:00