linux/mm
Qian Cai e4f8e513c3 mm/slub: fix a deadlock in show_slab_objects()
A long time ago we fixed a similar deadlock in show_slab_objects() [1].
However, it is apparently due to the commits like 01fb58bcba ("slab:
remove synchronous synchronize_sched() from memcg cache deactivation
path") and 03afc0e25f ("slab: get_online_mems for
kmem_cache_{create,destroy,shrink}"), this kind of deadlock is back by
just reading files in /sys/kernel/slab which will generate a lockdep
splat below.

Since the "mem_hotplug_lock" here is only to obtain a stable online node
mask while racing with NUMA node hotplug, in the worst case, the results
may me miscalculated while doing NUMA node hotplug, but they shall be
corrected by later reads of the same files.

  WARNING: possible circular locking dependency detected
  ------------------------------------------------------
  cat/5224 is trying to acquire lock:
  ffff900012ac3120 (mem_hotplug_lock.rw_sem){++++}, at:
  show_slab_objects+0x94/0x3a8

  but task is already holding lock:
  b8ff009693eee398 (kn->count#45){++++}, at: kernfs_seq_start+0x44/0xf0

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

  -> #2 (kn->count#45){++++}:
         lock_acquire+0x31c/0x360
         __kernfs_remove+0x290/0x490
         kernfs_remove+0x30/0x44
         sysfs_remove_dir+0x70/0x88
         kobject_del+0x50/0xb0
         sysfs_slab_unlink+0x2c/0x38
         shutdown_cache+0xa0/0xf0
         kmemcg_cache_shutdown_fn+0x1c/0x34
         kmemcg_workfn+0x44/0x64
         process_one_work+0x4f4/0x950
         worker_thread+0x390/0x4bc
         kthread+0x1cc/0x1e8
         ret_from_fork+0x10/0x18

  -> #1 (slab_mutex){+.+.}:
         lock_acquire+0x31c/0x360
         __mutex_lock_common+0x16c/0xf78
         mutex_lock_nested+0x40/0x50
         memcg_create_kmem_cache+0x38/0x16c
         memcg_kmem_cache_create_func+0x3c/0x70
         process_one_work+0x4f4/0x950
         worker_thread+0x390/0x4bc
         kthread+0x1cc/0x1e8
         ret_from_fork+0x10/0x18

  -> #0 (mem_hotplug_lock.rw_sem){++++}:
         validate_chain+0xd10/0x2bcc
         __lock_acquire+0x7f4/0xb8c
         lock_acquire+0x31c/0x360
         get_online_mems+0x54/0x150
         show_slab_objects+0x94/0x3a8
         total_objects_show+0x28/0x34
         slab_attr_show+0x38/0x54
         sysfs_kf_seq_show+0x198/0x2d4
         kernfs_seq_show+0xa4/0xcc
         seq_read+0x30c/0x8a8
         kernfs_fop_read+0xa8/0x314
         __vfs_read+0x88/0x20c
         vfs_read+0xd8/0x10c
         ksys_read+0xb0/0x120
         __arm64_sys_read+0x54/0x88
         el0_svc_handler+0x170/0x240
         el0_svc+0x8/0xc

  other info that might help us debug this:

  Chain exists of:
    mem_hotplug_lock.rw_sem --> slab_mutex --> kn->count#45

   Possible unsafe locking scenario:

         CPU0                    CPU1
         ----                    ----
    lock(kn->count#45);
                                 lock(slab_mutex);
                                 lock(kn->count#45);
    lock(mem_hotplug_lock.rw_sem);

   *** DEADLOCK ***

  3 locks held by cat/5224:
   #0: 9eff00095b14b2a0 (&p->lock){+.+.}, at: seq_read+0x4c/0x8a8
   #1: 0eff008997041480 (&of->mutex){+.+.}, at: kernfs_seq_start+0x34/0xf0
   #2: b8ff009693eee398 (kn->count#45){++++}, at:
  kernfs_seq_start+0x44/0xf0

  stack backtrace:
  Call trace:
   dump_backtrace+0x0/0x248
   show_stack+0x20/0x2c
   dump_stack+0xd0/0x140
   print_circular_bug+0x368/0x380
   check_noncircular+0x248/0x250
   validate_chain+0xd10/0x2bcc
   __lock_acquire+0x7f4/0xb8c
   lock_acquire+0x31c/0x360
   get_online_mems+0x54/0x150
   show_slab_objects+0x94/0x3a8
   total_objects_show+0x28/0x34
   slab_attr_show+0x38/0x54
   sysfs_kf_seq_show+0x198/0x2d4
   kernfs_seq_show+0xa4/0xcc
   seq_read+0x30c/0x8a8
   kernfs_fop_read+0xa8/0x314
   __vfs_read+0x88/0x20c
   vfs_read+0xd8/0x10c
   ksys_read+0xb0/0x120
   __arm64_sys_read+0x54/0x88
   el0_svc_handler+0x170/0x240
   el0_svc+0x8/0xc

I think it is important to mention that this doesn't expose the
show_slab_objects to use-after-free.  There is only a single path that
might really race here and that is the slab hotplug notifier callback
__kmem_cache_shrink (via slab_mem_going_offline_callback) but that path
doesn't really destroy kmem_cache_node data structures.

[1] http://lkml.iu.edu/hypermail/linux/kernel/1101.0/02850.html

[akpm@linux-foundation.org: add comment explaining why we don't need mem_hotplug_lock]
Link: http://lkml.kernel.org/r/1570192309-10132-1-git-send-email-cai@lca.pw
Fixes: 01fb58bcba ("slab: remove synchronous synchronize_sched() from memcg cache deactivation path")
Fixes: 03afc0e25f ("slab: get_online_mems for kmem_cache_{create,destroy,shrink}")
Signed-off-by: Qian Cai <cai@lca.pw>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-14 15:04:00 -07:00
..
kasan mm: introduce compound_nr() 2019-09-24 15:54:08 -07:00
backing-dev.c bdi: Do not use freezable workqueue 2019-10-06 09:11:35 -06:00
balloon_compaction.c mm/balloon_compaction: suppress allocation warnings 2019-09-04 07:42:01 -04:00
cleancache.c Driver Core and debugfs changes for 5.3-rc1 2019-07-12 12:24:03 -07:00
cma_debug.c mm/cma_debug.c: fix the break condition in cma_maxchunk_get() 2019-05-14 09:47:45 -07:00
cma.c mm/cma.c: fail if fixed declaration can't be honored 2019-07-16 19:23:21 -07:00
cma.h
compaction.c mm/compaction.c: remove unnecessary zone parameter in isolate_migratepages() 2019-09-24 15:54:10 -07:00
debug_page_ref.c
debug.c mm: update references to page _refcount 2019-05-14 19:52:47 -07:00
dmapool.c mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options 2019-07-12 11:05:46 -07:00
early_ioremap.c
fadvise.c fs: Export generic_fadvise() 2019-08-30 22:43:58 -07:00
failslab.c mm/failslab.c: by default, do not fail allocations with direct reclaim only 2019-07-12 11:05:43 -07:00
filemap.c mm,thp: avoid writes to file with THP in pagecache 2019-09-24 15:54:11 -07:00
frame_vector.c mm: untag user pointers in get_vaddr_frames 2019-09-25 17:51:41 -07:00
frontswap.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 482 2019-06-19 17:09:52 +02:00
gup_benchmark.c mm/gup: replace get_user_pages_longterm() with FOLL_LONGTERM 2019-05-14 09:47:45 -07:00
gup.c mm: untag user pointers in mm/gup.c 2019-09-25 17:51:41 -07:00
highmem.c mm: convert totalram_pages and totalhigh_pages variables to atomic 2018-12-28 12:11:47 -08:00
hmm.c pagewalk: separate function pointers from iterator data 2019-09-07 04:28:04 -03:00
huge_memory.c Merge branch 'hugepage-fallbacks' (hugepatch patches from David Rientjes) 2019-09-28 14:26:47 -07:00
hugetlb_cgroup.c mm: introduce compound_nr() 2019-09-24 15:54:08 -07:00
hugetlb.c hugetlbfs: don't retry when pool page allocations start to fail 2019-09-24 15:54:10 -07:00
hwpoison-inject.c hwpoison-inject: no need to check return value of debugfs_create functions 2019-06-03 15:39:40 +02:00
init-mm.c mm: use CPU_BITS_NONE to initialize init_mm.cpu_bitmask 2019-09-24 15:54:10 -07:00
internal.h mm: introduce MADV_COLD 2019-09-25 17:51:41 -07:00
interval_tree.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 248 2019-06-19 17:09:08 +02:00
Kconfig mm,thp: add read-only THP support for (non-shmem) FS 2019-09-24 15:54:11 -07:00
Kconfig.debug mm, page_owner, debug_pagealloc: save and dump freeing stack trace 2019-09-24 15:54:08 -07:00
khugepaged.c khugepaged: enable collapse pmd for pte-mapped THP 2019-09-24 15:54:11 -07:00
kmemleak-test.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 333 2019-06-05 17:37:06 +02:00
kmemleak.c kmemleak: Do not corrupt the object_list during clean-up 2019-10-14 08:56:16 -07:00
ksm.c mm: move memcmp_pages() and pages_identical() 2019-09-24 15:54:11 -07:00
list_lru.c mm: memcg/slab: stop setting page->mem_cgroup pointer for slab pages 2019-07-12 11:05:44 -07:00
maccess.c The main changes in this release include: 2019-07-18 11:51:00 -07:00
madvise.c mm: factor out common parts between MADV_COLD and MADV_PAGEOUT 2019-09-25 17:51:41 -07:00
Makefile mm: silence -Woverride-init/initializer-overrides 2019-09-24 15:54:10 -07:00
memblock.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
memcontrol.c mm, memcg: proportional memory.{low,min} reclaim 2019-10-07 15:47:20 -07:00
memfd.c mm: page cache: store only head pages in i_pages 2019-09-24 15:54:08 -07:00
memory_hotplug.c mm/memory_hotplug.c: s/is/if 2019-09-24 15:54:09 -07:00
memory-failure.c HMM patches for 5.3 2019-07-14 19:42:11 -07:00
memory.c mm: do not hash address in print_bad_pte() 2019-09-24 15:54:09 -07:00
mempolicy.c Merge branch 'hugepage-fallbacks' (hugepatch patches from David Rientjes) 2019-09-28 14:26:47 -07:00
mempool.c docs/core-api/mm: fix return value descriptions in mm/ 2019-03-05 21:07:20 -08:00
memremap.c mm/memremap: drop unused SECTION_SIZE and SECTION_MASK 2019-10-07 15:47:19 -07:00
memtest.c
migrate.c mm: untag user pointers passed to memory syscalls 2019-09-25 17:51:41 -07:00
mincore.c mm: untag user pointers passed to memory syscalls 2019-09-25 17:51:41 -07:00
mlock.c mm: untag user pointers passed to memory syscalls 2019-09-25 17:51:41 -07:00
mm_init.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
mmap.c mm: untag user pointers in mmap/munmap/mremap/brk 2019-09-25 17:51:41 -07:00
mmu_context.c
mmu_gather.c mm: remove quicklist page table caches 2019-09-24 15:54:09 -07:00
mmu_notifier.c mm, notifier: Catch sleeping/blocking for !blockable 2019-09-07 04:28:05 -03:00
mmzone.c
mprotect.c mm: untag user pointers passed to memory syscalls 2019-09-25 17:51:41 -07:00
mremap.c mm: untag user pointers in mmap/munmap/mremap/brk 2019-09-25 17:51:41 -07:00
msync.c mm: untag user pointers passed to memory syscalls 2019-09-25 17:51:41 -07:00
nommu.c mm: introduce page_size() 2019-09-24 15:54:08 -07:00
oom_kill.c mm: introduce MADV_COLD 2019-09-25 17:51:41 -07:00
page_alloc.c mm/page_alloc.c: fix a crash in free_pages_prepare() 2019-10-07 15:47:19 -07:00
page_counter.c
page_ext.c mm, page_owner: fix off-by-one error in __set_page_owner_handle() 2019-10-14 15:04:00 -07:00
page_idle.c mm/page_idle.c: fix oops because end_pfn is larger than max_pfn 2019-06-29 16:43:45 +08:00
page_io.c mm, swap: use rbtree for swap_extent 2019-07-12 11:05:43 -07:00
page_isolation.c mm/page_isolation.c: change the prototype of undo_isolate_page_range() 2019-07-12 11:05:43 -07:00
page_owner.c mm, page_owner: rename flag indicating that page is allocated 2019-10-14 15:04:00 -07:00
page_poison.c mm/page_poison.c: fix a typo in a comment 2019-09-24 15:54:08 -07:00
page_vma_mapped.c mm: introduce page_size() 2019-09-24 15:54:08 -07:00
page-writeback.c writeback, memcg: Implement foreign dirty flushing 2019-08-27 09:22:38 -06:00
pagewalk.c pagewalk: use lockdep_assert_held for locking validation 2019-09-07 04:28:04 -03:00
percpu-internal.h percpu: convert chunk hints to be based on pcpu_block_md 2019-03-13 12:25:31 -07:00
percpu-km.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428 2019-06-05 17:37:16 +02:00
percpu-stats.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428 2019-06-05 17:37:16 +02:00
percpu-vm.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428 2019-06-05 17:37:16 +02:00
percpu.c percpu: Use struct_size() helper 2019-09-04 13:40:49 -07:00
pgtable-generic.c x86/mm: Page size aware flush_tlb_mm_range() 2018-10-09 16:51:11 +02:00
process_vm_access.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
readahead.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
rmap.c mm,thp: add read-only THP support for (non-shmem) FS 2019-09-24 15:54:11 -07:00
rodata_test.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441 2019-06-05 17:37:17 +02:00
shmem.c Merge branch 'work.mount3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-10-10 08:16:44 -07:00
shuffle.c mm: fix -Wmissing-prototypes warnings 2019-10-07 15:47:19 -07:00
shuffle.h mm: maintain randomization of page free lists 2019-05-14 19:52:48 -07:00
slab_common.c mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two) 2019-10-07 15:47:20 -07:00
slab.c mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options 2019-07-12 11:05:46 -07:00
slab.h mm, slab: move memcg_cache_params structure to mm/slab.h 2019-09-24 15:54:07 -07:00
slob.c mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two) 2019-10-07 15:47:20 -07:00
slub.c mm/slub: fix a deadlock in show_slab_objects() 2019-10-14 15:04:00 -07:00
sparse-vmemmap.c mm/sparsemem: convert kmalloc_section_memmap() to populate_section_memmap() 2019-07-18 17:08:07 -07:00
sparse.c mm: fix -Wmissing-prototypes warnings 2019-10-07 15:47:19 -07:00
swap_cgroup.c
swap_slots.c mm, swap, get_swap_pages: use entry_size instead of cluster in parameter 2018-08-22 10:52:44 -07:00
swap_state.c mm: page cache: store only head pages in i_pages 2019-09-24 15:54:08 -07:00
swap.c mm: introduce MADV_COLD 2019-09-25 17:51:41 -07:00
swapfile.c vfs: don't allow writes to swap files 2019-08-20 07:55:16 -07:00
truncate.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
usercopy.c usercopy: Avoid HIGHMEM pfn warning 2019-09-17 15:20:17 -07:00
userfaultfd.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 499 2019-06-19 17:09:53 +02:00
util.c arm64, mm: make randomization selected by generic topdown mmap layout 2019-09-24 15:54:11 -07:00
vmacache.c mm: get rid of vmacache_flush_all() entirely 2018-09-13 15:18:04 -10:00
vmalloc.c augmented rbtree: add new RB_DECLARE_CALLBACKS_MAX macro 2019-09-25 17:51:39 -07:00
vmpressure.c mm/vmpressure.c: fix a signedness bug in vmpressure_register_event() 2019-10-07 15:47:19 -07:00
vmscan.c mm, memcg: make scan aggression always exclude protection 2019-10-07 15:47:20 -07:00
vmstat.c mm,thp: stats for file backed THP 2019-09-24 15:54:11 -07:00
workingset.c mm: workingset: fix vmstat counters for shadow nodes 2019-08-13 16:06:52 -07:00
z3fold.c mm/z3fold.c: claim page in the beginning of free 2019-10-07 15:47:19 -07:00
zbud.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
zpool.c zpool: add malloc_support_movable to zpool_driver 2019-09-24 15:54:12 -07:00
zsmalloc.c mm/zsmalloc.c: fix a -Wunused-function warning 2019-09-24 15:54:12 -07:00
zswap.c zswap: do not map same object twice 2019-09-24 15:54:12 -07:00