linux/mm
Andrea Arcangeli 59ea6d06cf coredump: fix race condition between collapse_huge_page() and core dumping
When fixing the race conditions between the coredump and the mmap_sem
holders outside the context of the process, we focused on
mmget_not_zero()/get_task_mm() callers in 04f5866e41 ("coredump: fix
race condition between mmget_not_zero()/get_task_mm() and core
dumping"), but those aren't the only cases where the mmap_sem can be
taken outside of the context of the process as Michal Hocko noticed
while backporting that commit to older -stable kernels.

If mmgrab() is called in the context of the process, but then the
mm_count reference is transferred outside the context of the process,
that can also be a problem if the mmap_sem has to be taken for writing
through that mm_count reference.

khugepaged registration calls mmgrab() in the context of the process,
but the mmap_sem for writing is taken later in the context of the
khugepaged kernel thread.

collapse_huge_page() after taking the mmap_sem for writing doesn't
modify any vma, so it's not obvious that it could cause a problem to the
coredump, but it happens to modify the pmd in a way that breaks an
invariant that pmd_trans_huge_lock() relies upon.  collapse_huge_page()
needs the mmap_sem for writing just to block concurrent page faults that
call pmd_trans_huge_lock().

Specifically the invariant that "!pmd_trans_huge()" cannot become a
"pmd_trans_huge()" doesn't hold while collapse_huge_page() runs.

The coredump will call __get_user_pages() without mmap_sem for reading,
which eventually can invoke a lockless page fault which will need a
functional pmd_trans_huge_lock().

So collapse_huge_page() needs to use mmget_still_valid() to check it's
not running concurrently with the coredump...  as long as the coredump
can invoke page faults without holding the mmap_sem for reading.

This has "Fixes: khugepaged" to facilitate backporting, but in my view
it's more a bug in the coredump code that will eventually have to be
rewritten to stop invoking page faults without the mmap_sem for reading.
So the long term plan is still to drop all mmget_still_valid().

Link: http://lkml.kernel.org/r/20190607161558.32104-1-aarcange@redhat.com
Fixes: ba76149f47 ("thp: khugepaged")
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-13 17:34:56 -10:00
..
kasan kasan: initialize tag to 0xff in __kasan_kmalloc 2019-06-01 15:51:31 -07:00
backing-dev.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
balloon_compaction.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
cleancache.c mm: use octal not symbolic permissions 2018-06-15 07:55:25 +09:00
cma_debug.c mm/cma_debug.c: fix the break condition in cma_maxchunk_get() 2019-05-14 09:47:45 -07:00
cma.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 98 2019-05-24 17:37:54 +02:00
cma.h
compaction.c mm, compaction: make sure we isolate a valid PFN 2019-06-01 15:51:32 -07:00
debug_page_ref.c
debug.c mm: update references to page _refcount 2019-05-14 19:52:47 -07:00
dmapool.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 403 2019-06-05 17:37:13 +02:00
early_ioremap.c
fadvise.c vfs: implement readahead(2) using POSIX_FADV_WILLNEED 2018-08-30 20:01:32 +02:00
failslab.c mm: no need to check return value of debugfs_create functions 2019-03-05 21:07:17 -08:00
filemap.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
frame_vector.c
frontswap.c mm: use octal not symbolic permissions 2018-06-15 07:55:25 +09:00
gup_benchmark.c mm/gup: replace get_user_pages_longterm() with FOLL_LONGTERM 2019-05-14 09:47:45 -07:00
gup.c mm/gup: continue VM_FAULT_RETRY processing even for pre-faults 2019-06-01 15:51:31 -07:00
highmem.c mm: convert totalram_pages and totalhigh_pages variables to atomic 2018-12-28 12:11:47 -08:00
hmm.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 157 2019-05-30 11:26:37 -07:00
huge_memory.c mm/huge_memory.c: make __thp_get_unmapped_area static 2019-05-14 09:47:51 -07:00
hugetlb_cgroup.c mm: rename page_counter's count/limit into usage/max 2018-06-07 17:34:35 -07:00
hugetlb.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
hwpoison-inject.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
init-mm.c mm: Allocate the mm_cpumask (mm->cpu_bitmap[]) dynamically based on nr_cpu_ids 2018-07-17 09:35:30 +02:00
internal.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
interval_tree.c
Kconfig treewide: Add SPDX license identifier - Makefile/Kconfig 2019-05-21 10:50:46 +02:00
Kconfig.debug treewide: Add SPDX license identifier - Makefile/Kconfig 2019-05-21 10:50:46 +02:00
khugepaged.c coredump: fix race condition between collapse_huge_page() and core dumping 2019-06-13 17:34:56 -10:00
kmemleak-test.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 333 2019-06-05 17:37:06 +02:00
kmemleak.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 333 2019-06-05 17:37:06 +02:00
ksm.c mm/mmu_notifier: use correct mmu_notifier events for each invalidation 2019-05-14 09:47:49 -07:00
list_lru.c mm/list_lru.c: fix memory leak in __memcg_init_list_lru_node 2019-06-13 17:34:56 -10:00
maccess.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
madvise.c mm/mmu_notifier: use correct mmu_notifier events for each invalidation 2019-05-14 09:47:49 -07:00
Makefile mm: shuffle initial free memory to improve memory-side-cache utilization 2019-05-14 19:52:48 -07:00
memblock.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
memcontrol.c mm: memcontrol: don't batch updates of local VM stats and events 2019-06-13 17:34:56 -10:00
memfd.c mm: page cache: store only head pages in i_pages 2019-05-14 09:47:45 -07:00
memory_hotplug.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
memory-failure.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 263 2019-06-05 17:30:28 +02:00
memory.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
mempolicy.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 225 2019-05-30 11:29:56 -07:00
mempool.c docs/core-api/mm: fix return value descriptions in mm/ 2019-03-05 21:07:20 -08:00
memtest.c
migrate.c mm/mmu_notifier: use correct mmu_notifier events for each invalidation 2019-05-14 09:47:49 -07:00
mincore.c mm/mincore.c: make mincore() more conservative 2019-05-14 19:52:48 -07:00
mlock.c mm/mlock.c: change count_mm_mlocked_page_nr return type 2019-06-13 17:34:56 -10:00
mm_init.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
mmap.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
mmu_context.c
mmu_gather.c mm: mmu_gather: remove __tlb_reset_range() for force flush 2019-06-13 17:34:56 -10:00
mmu_notifier.c mm/mmu_notifier: mmu_notifier_range_update_to_read_only() helper 2019-05-14 09:47:49 -07:00
mmzone.c
mprotect.c mm/mprotect.c: fix compilation warning because of unused 'mm' variable 2019-05-14 09:47:51 -07:00
mremap.c mm/mmu_notifier: contextual information for event triggering invalidation 2019-05-14 09:47:49 -07:00
msync.c
nommu.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
oom_kill.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
page_alloc.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
page_counter.c memcg: introduce memory.min 2018-06-07 17:34:36 -07:00
page_ext.c memblock: drop memblock_alloc_*_nopanic() variants 2019-03-12 10:04:02 -07:00
page_idle.c mm: remove zone_lru_lock() function, access ->lru_lock directly 2019-03-05 21:07:21 -08:00
page_io.c mm/page_io.c: fix polled swap page in 2019-01-04 13:13:48 -08:00
page_isolation.c mm/page_isolation.c: remove redundant pfn_valid_within() in __first_valid_page() 2019-05-14 09:47:46 -07:00
page_owner.c mm/page_owner: Simplify stack trace handling 2019-04-29 12:37:50 +02:00
page_poison.c page_poison: play nicely with KASAN 2019-03-05 21:07:13 -08:00
page_vma_mapped.c mm/rmap: map_pte() was not handling private ZONE_DEVICE page properly 2018-10-31 08:54:11 -07:00
page-writeback.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
pagewalk.c mm: kernel-doc: add missing parameter descriptions 2018-04-05 21:36:27 -07:00
percpu-internal.h percpu: convert chunk hints to be based on pcpu_block_md 2019-03-13 12:25:31 -07:00
percpu-km.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428 2019-06-05 17:37:16 +02:00
percpu-stats.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428 2019-06-05 17:37:16 +02:00
percpu-vm.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428 2019-06-05 17:37:16 +02:00
percpu.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428 2019-06-05 17:37:16 +02:00
pgtable-generic.c x86/mm: Page size aware flush_tlb_mm_range() 2018-10-09 16:51:11 +02:00
process_vm_access.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
quicklist.c
readahead.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
rmap.c mm/rmap.c: use the pra.mapcount to do the check 2019-05-14 09:47:49 -07:00
rodata_test.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441 2019-06-05 17:37:17 +02:00
shmem.c mm: page cache: store only head pages in i_pages 2019-05-14 09:47:45 -07:00
shuffle.c mm: maintain randomization of page free lists 2019-05-14 19:52:48 -07:00
shuffle.h mm: maintain randomization of page free lists 2019-05-14 19:52:48 -07:00
slab_common.c mm: add support for kmem caches in DMA32 zone 2019-03-29 10:01:37 -07:00
slab.c slab: remove /proc/slab_allocators 2019-05-16 15:51:55 -07:00
slab.h mm: add support for kmem caches in DMA32 zone 2019-03-29 10:01:37 -07:00
slob.c slob: use slab_list instead of lru 2019-05-14 09:47:44 -07:00
slub.c mm/slub.c: update the comment about slab frozen 2019-05-14 09:47:45 -07:00
sparse-vmemmap.c mm: remove include/linux/bootmem.h 2018-10-31 08:54:16 -07:00
sparse.c mm/sparse.c: clean up obsolete code comment 2019-05-14 09:47:48 -07:00
swap_cgroup.c
swap_slots.c mm, swap, get_swap_pages: use entry_size instead of cluster in parameter 2018-08-22 10:52:44 -07:00
swap_state.c mm: page cache: store only head pages in i_pages 2019-05-14 09:47:45 -07:00
swap.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
swapfile.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
truncate.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
usercopy.c mm/usercopy.c: no check page span for stack objects 2019-01-08 17:15:11 -08:00
userfaultfd.c hugetlb: use same fault hash key for shared and private mappings 2019-05-14 09:47:48 -07:00
util.c prctl_set_mm: downgrade mmap_sem to read lock 2019-06-01 15:51:31 -07:00
vmacache.c mm: get rid of vmacache_flush_all() entirely 2018-09-13 15:18:04 -10:00
vmalloc.c mm/vmalloc.c: fix typo in comment 2019-06-01 15:51:31 -07:00
vmpressure.c mm/vmpressure.c: convert to use match_string() helper 2018-06-07 17:34:36 -07:00
vmscan.c mm/vmscan.c: fix recent_rotated history 2019-06-13 17:34:56 -10:00
vmstat.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
workingset.c mm: memcontrol: make cgroup stats and events query API explicitly local 2019-05-14 19:52:53 -07:00
z3fold.c z3fold: fix sheduling while atomic 2019-06-01 15:51:31 -07:00
zbud.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
zpool.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
zsmalloc.c mm/zsmalloc.c: fix fall-through annotation 2018-10-26 16:26:35 -07:00
zswap.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 157 2019-05-30 11:26:37 -07:00