linux/mm
Yang Shi 3c6f17e6c5 mm: vmscan: add per memcg shrinker nr_deferred
Currently the number of deferred objects is tracked per shrinker, but some
slabs, for example the vfs inode/dentry caches, are per memcg.  This
results in poor isolation among memcgs.

Deferred objects are typically generated by __GFP_NOFS allocations.  One
memcg with excessive __GFP_NOFS allocations may blow up the deferred
count, and other innocent memcgs may then suffer from over-shrinking,
excessive reclaim latency, etc.

For example, suppose two workloads run in memcgA and memcgB respectively,
and the workload in B is vfs heavy.  If the workload in A generates
excessive deferred objects, B's vfs cache might be hit heavily (half of
the caches dropped) by B's limit reclaim or by global reclaim.

We observed this in our production environment, which was running a vfs
heavy workload, as shown in the tracing log below:

  <...>-409454 [016] .... 28286961.747146: mm_shrink_slab_start: super_cache_scan+0x0/0x1a0 ffff9a83046f3458:
  nid: 1 objects to shrink 3641681686040 gfp_flags GFP_HIGHUSER_MOVABLE|__GFP_ZERO pgs_scanned 1 lru_pgs 15721
  cache items 246404277 delta 31345 total_scan 123202138
  <...>-409454 [022] .... 28287105.928018: mm_shrink_slab_end: super_cache_scan+0x0/0x1a0 ffff9a83046f3458:
  nid: 1 unused scan count 3641681686040 new scan count 3641798379189 total_scan 602
  last shrinker return val 123186855

The vfs cache to page cache ratio was 10:1 on this machine, and half of
the caches were dropped.  This also caused a significant amount of page
cache to be dropped due to inode eviction.

Making nr_deferred per memcg for memcg-aware shrinkers solves the
unfairness and brings better isolation.

A following patch will add nr_deferred to the parent memcg when a memcg
goes offline.  To preserve nr_deferred when reparenting memcgs to root,
the root memcg needs shrinker_info allocated too.

When memcg is not enabled (!CONFIG_MEMCG or memcg disabled), the
shrinker's own nr_deferred is used.  Non memcg-aware shrinkers always
use the shrinker's nr_deferred.

Link: https://lkml.kernel.org/r/20210311190845.9708-10-shy828301@gmail.com
Signed-off-by: Yang Shi <shy828301@gmail.com>
Acked-by: Roman Gushchin <guro@fb.com>
Acked-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-05 11:27:23 -07:00
kasan kasan: record task_work_add() call stack 2021-04-30 11:20:42 -07:00
kfence kfence: make compatible with kmemleak 2021-03-25 09:22:55 -07:00
backing-dev.c mm/backing-dev.c: use might_alloc() 2021-02-26 09:41:01 -08:00
balloon_compaction.c
cleancache.c
cma_debug.c mm/cma: change cma mutex to irq safe spinlock 2021-05-05 11:27:21 -07:00
cma.c mm/cma: change cma mutex to irq safe spinlock 2021-05-05 11:27:21 -07:00
cma.h mm/cma: change cma mutex to irq safe spinlock 2021-05-05 11:27:21 -07:00
compaction.c mm: make alloc_contig_range handle in-use hugetlb pages 2021-05-05 11:27:22 -07:00
debug_page_ref.c
debug_vm_pgtable.c mm: HUGE_VMAP arch support cleanup 2021-04-30 11:20:40 -07:00
debug.c mm/debug: improve memcg debugging 2021-02-24 13:38:27 -08:00
dmapool.c mm/dmapool: switch from strlcpy to strscpy 2021-04-30 11:20:39 -07:00
early_ioremap.c mm/early_ioremap.c: use __func__ instead of function name 2021-02-26 09:41:02 -08:00
fadvise.c mm, fadvise: improve the expensive remote LRU cache draining after FADV_DONTNEED 2020-10-13 18:38:29 -07:00
failslab.c
filemap.c dax: account DAX entries as nrpages 2021-05-05 11:27:19 -07:00
frontswap.c
gup_test.c mm/gup_test.c: mark gup_test_init as __init function 2020-12-15 12:13:38 -08:00
gup_test.h selftests/vm: gup_test: introduce the dump_pages() sub-test 2020-12-15 12:13:38 -08:00
gup.c mm: gup: remove FOLL_SPLIT 2021-04-30 11:20:37 -07:00
highmem.c mm/highmem: fix CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP 2021-03-25 09:22:55 -07:00
hmm.c
huge_memory.c mm: vmscan: consolidate shrinker_maps handling code 2021-05-05 11:27:23 -07:00
hugetlb_cgroup.c hugetlb: make free_huge_page irq safe 2021-05-05 11:27:22 -07:00
hugetlb.c userfaultfd: add UFFDIO_CONTINUE ioctl 2021-05-05 11:27:22 -07:00
hwpoison-inject.c mm,hwpoison-inject: don't pin for hwpoison_filter 2020-10-16 11:11:16 -07:00
init-mm.c mm/gup: prevent gup_fast from racing with COW during fork 2020-12-15 12:13:39 -08:00
internal.h mm,compaction: let isolate_migratepages_{range,block} return error codes 2021-05-05 11:27:22 -07:00
interval_tree.c mm/interval_tree: add comments to improve code readability 2021-04-30 11:20:38 -07:00
io-mapping.c mm: add a io_mapping_map_user helper 2021-04-30 11:20:39 -07:00
ioremap.c mm: move vmap_range from mm/ioremap.c to mm/vmalloc.c 2021-04-30 11:20:40 -07:00
Kconfig mm: generalize HUGETLB_PAGE_SIZE_VARIABLE 2021-05-05 11:27:20 -07:00
Kconfig.debug mm, page_poison: remove CONFIG_PAGE_POISONING_ZERO 2020-12-15 12:13:46 -08:00
khugepaged.c mm/vmscan: replace implicit RECLAIM_ZONE checks with explicit checks 2021-05-05 11:27:23 -07:00
kmemleak.c mm/kmemleak.c: fix a typo 2021-04-30 11:20:36 -07:00
ksm.c mm: cleanup kstrto*() usage 2020-12-15 12:13:47 -08:00
list_lru.c mm: vmscan: consolidate shrinker_maps handling code 2021-05-05 11:27:23 -07:00
maccess.c
madvise.c mm/madvise: replace ptrace attach requirement for process_madvise 2021-03-13 11:27:30 -08:00
Makefile mm: add a io_mapping_map_user helper 2021-04-30 11:20:39 -07:00
mapping_dirty_helpers.c mm/mapping_dirty_helpers: guard hugepage pud's usage 2021-04-16 16:10:37 -07:00
memblock.c memblock: remove return value of memblock_free_all() 2021-02-22 13:01:23 -08:00
memcontrol.c mm: memcontrol: rename shrinker_map to shrinker_info 2021-05-05 11:27:23 -07:00
memfd.c
memory_hotplug.c arm64: mte: Map hotplugged memory as Normal Tagged 2021-03-10 10:56:46 +00:00
memory-failure.c mm/memory-failure: unnecessary amount of unmapping 2021-04-30 11:20:44 -07:00
memory.c mm: apply_to_pte_range warn and fail if a large pte is encountered 2021-04-30 11:20:39 -07:00
mempolicy.c mm/mempolicy: fix mpol_misplaced kernel-doc 2021-04-30 11:20:43 -07:00
mempool.c kasan, mm: integrate page_alloc init with HW_TAGS 2021-04-30 11:20:41 -07:00
memremap.c mm/memremap.c: fix improper SPDX comment style 2021-04-30 11:20:37 -07:00
memtest.c
migrate.c mm/page_alloc: combine __alloc_pages and __alloc_pages_nodemask 2021-04-30 11:20:42 -07:00
mincore.c inode: make init and permission helpers idmapped mount aware 2021-01-24 14:27:16 +01:00
mlock.c mm/mlock: stop counting mlocked pages when none vma is found 2021-02-26 09:41:01 -08:00
mm_init.c include/linux/page-flags-layout.h: cleanups 2021-04-30 11:20:42 -07:00
mmap_lock.c mm: mmap_lock: add tracepoints around lock acquisition 2020-12-15 12:13:41 -08:00
mmap.c Revert "mremap: don't allow MREMAP_DONTUNMAP on special_mappings and aio" 2021-04-30 11:20:39 -07:00
mmu_gather.c mm: eliminate "expecting prototype" kernel-doc warnings 2021-04-16 16:10:36 -07:00
mmu_notifier.c mm/mmu_notifiers: ensure range_end() is paired with range_start() 2021-03-25 09:22:55 -07:00
mmzone.c mm/lru: replace pgdat lru_lock with lruvec lock 2020-12-15 14:48:04 -08:00
mprotect.c mm/mprotect.c: optimize error detection in do_mprotect_pkey() 2021-02-24 13:38:30 -08:00
mremap.c Revert "mremap: don't allow MREMAP_DONTUNMAP on special_mappings and aio" 2021-04-30 11:20:39 -07:00
msync.c mm/msync: exit early when the flags is an MS_ASYNC and start < vm_start 2021-04-30 11:20:37 -07:00
nommu.c mm/nommu: Fix return type of filemap_map_pages() 2021-01-28 14:10:31 +00:00
oom_kill.c mm: eliminate "expecting prototype" kernel-doc warnings 2021-04-16 16:10:36 -07:00
page_alloc.c mm/vmscan: replace implicit RECLAIM_ZONE checks with explicit checks 2021-05-05 11:27:23 -07:00
page_counter.c mm: page_counter: mitigate consequences of a page_counter underflow 2021-04-30 11:20:38 -07:00
page_ext.c mm: fix some spelling mistakes in comments 2020-12-15 22:46:19 -08:00
page_idle.c mm: page_idle_get_page() does not need lru_lock 2020-12-15 14:48:03 -08:00
page_io.c swap: fix swapfile read/write offset 2021-03-02 17:25:46 -07:00
page_isolation.c mm/page_isolation: do not isolate the max order page 2020-12-15 12:13:45 -08:00
page_owner.c mm: page_owner: detect page_owner recursion via task_struct 2021-04-30 11:20:36 -07:00
page_poison.c mm: page_poison: print page info when corruption is caught 2021-04-30 11:20:36 -07:00
page_reporting.c mm/page_reporting: use list_entry_is_head() in page_reporting_cycle() 2021-02-24 13:38:30 -08:00
page_reporting.h
page_vma_mapped.c mm/page_vma_mapped.c: add colon to fix kernel-doc markups error for check_pte 2020-12-15 12:13:41 -08:00
page-writeback.c mm: page-writeback: simplify memcg handling in test_clear_page_writeback() 2021-04-30 11:20:37 -07:00
pagewalk.c
percpu-internal.h percpu: make pcpu_nr_empty_pop_pages per chunk type 2021-04-09 13:58:38 +00:00
percpu-km.c
percpu-stats.c percpu: make pcpu_nr_empty_pop_pages per chunk type 2021-04-09 13:58:38 +00:00
percpu-vm.c mm/vmalloc: remove unmap_kernel_range 2021-04-30 11:20:40 -07:00
percpu.c percpu: make pcpu_nr_empty_pop_pages per chunk type 2021-04-09 13:58:38 +00:00
pgalloc-track.h
pgtable-generic.c mm/pgtable-generic.c: optimize the VM_BUG_ON condition in pmdp_huge_clear_flush() 2021-02-24 13:38:30 -08:00
process_vm_access.c mm/process_vm_access.c: include compat.h 2021-01-12 18:12:54 -08:00
ptdump.c mm: ptdump: fix build failure 2021-04-16 16:10:37 -07:00
readahead.c mm: Implement readahead_control pageset expansion 2021-04-23 10:14:29 +01:00
rmap.c mm/rmap: correct obsolete comment of page_get_anon_vma() 2021-02-26 09:41:01 -08:00
rodata_test.c
shmem.c shmem: allow reporting fanotify events with file handles on tmpfs 2021-04-19 16:03:48 +02:00
shuffle.c mm: eliminate "expecting prototype" kernel-doc warnings 2021-04-16 16:10:36 -07:00
shuffle.h
slab_common.c mm/slab_common: provide "slab_merge" option for !IS_ENABLED(CONFIG_SLAB_MERGE_DEFAULT) builds 2021-04-30 11:20:36 -07:00
slab.c kasan, mm: integrate slab init_on_free with HW_TAGS 2021-04-30 11:20:41 -07:00
slab.h kasan, mm: integrate slab init_on_alloc with HW_TAGS 2021-04-30 11:20:41 -07:00
slob.c mm: Don't build mm_dump_obj() on CONFIG_PRINTK=n kernels 2021-03-08 14:18:46 -08:00
slub.c kasan, mm: integrate slab init_on_free with HW_TAGS 2021-04-30 11:20:41 -07:00
sparse-vmemmap.c
sparse.c mm/sparse: add the missing sparse_buffer_fini() in error branch 2021-04-30 11:20:39 -07:00
swap_cgroup.c
swap_slots.c mm/swap_slots.c: remove redundant NULL check 2021-02-24 13:38:28 -08:00
swap_state.c mm: stop accounting shadow entries 2021-05-05 11:27:19 -07:00
swap.c mm: remove pagevec_lookup_entries 2021-02-26 09:40:59 -08:00
swapfile.c swap: fix swapfile read/write offset 2021-03-02 17:25:46 -07:00
truncate.c mm: stop accounting shadow entries 2021-05-05 11:27:19 -07:00
usercopy.c
userfaultfd.c userfaultfd: add UFFDIO_CONTINUE ioctl 2021-05-05 11:27:22 -07:00
util.c mm: move page_mapping_file to pagemap.h 2021-04-30 11:20:37 -07:00
vmacache.c
vmalloc.c mm/vmalloc: remove an empty line 2021-04-30 11:20:40 -07:00
vmpressure.c
vmscan.c mm: vmscan: add per memcg shrinker nr_deferred 2021-05-05 11:27:23 -07:00
vmstat.c mm/vmstat.c: erase latency in vmstat_shepherd 2021-02-26 09:41:00 -08:00
workingset.c mm: stop accounting shadow entries 2021-05-05 11:27:19 -07:00
z3fold.c z3fold: prevent reclaim/free race for headless pages 2021-03-25 09:22:55 -07:00
zbud.c mm: set the sleep_mapped to true for zbud and z3fold 2021-02-26 09:41:01 -08:00
zpool.c mm/zswap: add the flag can_sleep_mapped 2021-02-26 09:41:01 -08:00
zsmalloc.c mm/zsmalloc.c: use page_private() to access page->private 2021-02-26 09:41:01 -08:00
zswap.c mm/zswap: add the flag can_sleep_mapped 2021-02-26 09:41:01 -08:00