iv/linux

Yang Shi 3c6f17e6c5 mm: vmscan: add per memcg shrinker nr_deferred

Currently the number of deferred objects is tracked per shrinker, but some slabs, for example the vfs inode/dentry caches, are per memcg. This results in poor isolation among memcgs.

Deferred objects are typically generated by __GFP_NOFS allocations. One memcg with excessive __GFP_NOFS allocations may blow up the deferred count, and other innocent memcgs may then suffer from over-shrinking, excessive reclaim latency, etc.

For example, two workloads run in memcgA and memcgB respectively, and the workload in B is vfs heavy. The workload in A generates excessive deferred objects, and B's vfs cache might then be hit heavily (half of its caches dropped) by B's limit reclaim or by global reclaim.

We observed this in our production environment, which was running a vfs heavy workload, as shown in the tracing log below:

    <...>-409454 [016] .... 28286961.747146: mm_shrink_slab_start: super_cache_scan+0x0/0x1a0 ffff9a83046f3458: nid: 1 objects to shrink 3641681686040 gfp_flags GFP_HIGHUSER_MOVABLE|__GFP_ZERO pgs_scanned 1 lru_pgs 15721 cache items 246404277 delta 31345 total_scan 123202138
    <...>-409454 [022] .... 28287105.928018: mm_shrink_slab_end: super_cache_scan+0x0/0x1a0 ffff9a83046f3458: nid: 1 unused scan count 3641681686040 new scan count 3641798379189 total_scan 602 last shrinker return val 123186855

The vfs cache to page cache ratio was 10:1 on this machine, and half of the caches were dropped. This also caused a significant amount of page cache to be dropped due to inode eviction.

Making nr_deferred per memcg for memcg aware shrinkers solves the unfairness and brings better isolation.

The following patch will add nr_deferred to the parent memcg when a memcg goes offline. To preserve nr_deferred when reparenting memcgs to root, the root memcg needs shrinker_info allocated too.

When memcg is not enabled (!CONFIG_MEMCG or memcg disabled), the shrinker's nr_deferred is used. Non memcg aware shrinkers use the shrinker's nr_deferred all the time.

Link: https://lkml.kernel.org/r/20210311190845.9708-10-shy828301@gmail.com
Signed-off-by: Yang Shi <shy828301@gmail.com>
Acked-by: Roman Gushchin <guro@fb.com>
Acked-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2021-05-05 11:27:23 -07:00
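The counter selection described in the commit message above can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel code: the types `shrinker_model` and `memcg_model`, the helpers `deferred_slot()`, `add_deferred()`, `take_deferred()` and `reparent_deferred()`, and the fixed sizes `MAX_NODES` and `MAX_SHRINKERS` are all hypothetical names invented for illustration. Plain `long` counters stand in for whatever atomic types the kernel uses, and a `NULL` memcg pointer models the "memcg not enabled" case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES     2   /* hypothetical number of NUMA nodes */
#define MAX_SHRINKERS 4   /* hypothetical number of registered shrinkers */

/* Hypothetical model of a shrinker with its own per-node deferred count. */
struct shrinker_model {
    int  id;                       /* index into the per-memcg table */
    bool memcg_aware;
    long nr_deferred[MAX_NODES];   /* used when no per-memcg slot applies */
};

/* Hypothetical model of a memcg holding per-node, per-shrinker counts. */
struct memcg_model {
    struct memcg_model *parent;    /* NULL for the root memcg */
    long nr_deferred[MAX_NODES][MAX_SHRINKERS];
};

/*
 * Pick the counter to use: the per-memcg slot for memcg aware shrinkers
 * when a memcg is given, otherwise the shrinker's own counter.
 */
static long *deferred_slot(struct shrinker_model *s,
                           struct memcg_model *memcg, int nid)
{
    assert(nid >= 0 && nid < MAX_NODES);
    assert(s->id >= 0 && s->id < MAX_SHRINKERS);

    if (memcg && s->memcg_aware)
        return &memcg->nr_deferred[nid][s->id];
    return &s->nr_deferred[nid];
}

/* Record work that could not be done now; returns the new total. */
static long add_deferred(struct shrinker_model *s,
                         struct memcg_model *memcg, int nid, long nr)
{
    long *slot = deferred_slot(s, memcg, nid);

    *slot += nr;
    return *slot;
}

/* Read and reset the deferred count before a scan. */
static long take_deferred(struct shrinker_model *s,
                          struct memcg_model *memcg, int nid)
{
    long *slot = deferred_slot(s, memcg, nid);
    long nr = *slot;

    *slot = 0;
    return nr;
}

/* When a memcg goes offline, fold its deferred counts into its parent. */
static void reparent_deferred(struct memcg_model *memcg)
{
    struct memcg_model *parent = memcg->parent;

    if (!parent)
        return;
    for (int nid = 0; nid < MAX_NODES; nid++) {
        for (int i = 0; i < MAX_SHRINKERS; i++) {
            parent->nr_deferred[nid][i] += memcg->nr_deferred[nid][i];
            memcg->nr_deferred[nid][i] = 0;
        }
    }
}
```

In this model, work deferred on behalf of memcgA lands only in A's slot, so a later scan for memcgB starts from B's own (possibly zero) count. That is the isolation property the commit message argues for.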
..
kasan	kasan: record task_work_add() call stack	2021-04-30 11:20:42 -07:00
kfence	kfence: make compatible with kmemleak	2021-03-25 09:22:55 -07:00
backing-dev.c	mm/backing-dev.c: use might_alloc()	2021-02-26 09:41:01 -08:00
balloon_compaction.c
cleancache.c
cma_debug.c	mm/cma: change cma mutex to irq safe spinlock	2021-05-05 11:27:21 -07:00
cma.c	mm/cma: change cma mutex to irq safe spinlock	2021-05-05 11:27:21 -07:00
cma.h	mm/cma: change cma mutex to irq safe spinlock	2021-05-05 11:27:21 -07:00
compaction.c	mm: make alloc_contig_range handle in-use hugetlb pages	2021-05-05 11:27:22 -07:00
debug_page_ref.c
debug_vm_pgtable.c	mm: HUGE_VMAP arch support cleanup	2021-04-30 11:20:40 -07:00
debug.c	mm/debug: improve memcg debugging	2021-02-24 13:38:27 -08:00
dmapool.c	mm/dmapool: switch from strlcpy to strscpy	2021-04-30 11:20:39 -07:00
early_ioremap.c	mm/early_ioremap.c: use __func__ instead of function name	2021-02-26 09:41:02 -08:00
fadvise.c	mm, fadvise: improve the expensive remote LRU cache draining after FADV_DONTNEED	2020-10-13 18:38:29 -07:00
failslab.c
filemap.c	dax: account DAX entries as nrpages	2021-05-05 11:27:19 -07:00
frontswap.c
gup_test.c	mm/gup_test.c: mark gup_test_init as __init function	2020-12-15 12:13:38 -08:00
gup_test.h	selftests/vm: gup_test: introduce the dump_pages() sub-test	2020-12-15 12:13:38 -08:00
gup.c	mm: gup: remove FOLL_SPLIT	2021-04-30 11:20:37 -07:00
highmem.c	mm/highmem: fix CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP	2021-03-25 09:22:55 -07:00
hmm.c
huge_memory.c	mm: vmscan: consolidate shrinker_maps handling code	2021-05-05 11:27:23 -07:00
hugetlb_cgroup.c	hugetlb: make free_huge_page irq safe	2021-05-05 11:27:22 -07:00
hugetlb.c	userfaultfd: add UFFDIO_CONTINUE ioctl	2021-05-05 11:27:22 -07:00
hwpoison-inject.c	mm,hwpoison-inject: don't pin for hwpoison_filter	2020-10-16 11:11:16 -07:00
init-mm.c	mm/gup: prevent gup_fast from racing with COW during fork	2020-12-15 12:13:39 -08:00
internal.h	mm,compaction: let isolate_migratepages_{range,block} return error codes	2021-05-05 11:27:22 -07:00
interval_tree.c	mm/interval_tree: add comments to improve code readability	2021-04-30 11:20:38 -07:00
io-mapping.c	mm: add a io_mapping_map_user helper	2021-04-30 11:20:39 -07:00
ioremap.c	mm: move vmap_range from mm/ioremap.c to mm/vmalloc.c	2021-04-30 11:20:40 -07:00
Kconfig	mm: generalize HUGETLB_PAGE_SIZE_VARIABLE	2021-05-05 11:27:20 -07:00
Kconfig.debug	mm, page_poison: remove CONFIG_PAGE_POISONING_ZERO	2020-12-15 12:13:46 -08:00
khugepaged.c	mm/vmscan: replace implicit RECLAIM_ZONE checks with explicit checks	2021-05-05 11:27:23 -07:00
kmemleak.c	mm/kmemleak.c: fix a typo	2021-04-30 11:20:36 -07:00
ksm.c	mm: cleanup kstrto*() usage	2020-12-15 12:13:47 -08:00
list_lru.c	mm: vmscan: consolidate shrinker_maps handling code	2021-05-05 11:27:23 -07:00
maccess.c
madvise.c	mm/madvise: replace ptrace attach requirement for process_madvise	2021-03-13 11:27:30 -08:00
Makefile	mm: add a io_mapping_map_user helper	2021-04-30 11:20:39 -07:00
mapping_dirty_helpers.c	mm/mapping_dirty_helpers: guard hugepage pud's usage	2021-04-16 16:10:37 -07:00
memblock.c	memblock: remove return value of memblock_free_all()	2021-02-22 13:01:23 -08:00
memcontrol.c	mm: memcontrol: rename shrinker_map to shrinker_info	2021-05-05 11:27:23 -07:00
memfd.c
memory_hotplug.c	arm64: mte: Map hotplugged memory as Normal Tagged	2021-03-10 10:56:46 +00:00
memory-failure.c	mm/memory-failure: unnecessary amount of unmapping	2021-04-30 11:20:44 -07:00
memory.c	mm: apply_to_pte_range warn and fail if a large pte is encountered	2021-04-30 11:20:39 -07:00
mempolicy.c	mm/mempolicy: fix mpol_misplaced kernel-doc	2021-04-30 11:20:43 -07:00
mempool.c	kasan, mm: integrate page_alloc init with HW_TAGS	2021-04-30 11:20:41 -07:00
memremap.c	mm/memremap.c: fix improper SPDX comment style	2021-04-30 11:20:37 -07:00
memtest.c
migrate.c	mm/page_alloc: combine __alloc_pages and __alloc_pages_nodemask	2021-04-30 11:20:42 -07:00
mincore.c	inode: make init and permission helpers idmapped mount aware	2021-01-24 14:27:16 +01:00
mlock.c	mm/mlock: stop counting mlocked pages when none vma is found	2021-02-26 09:41:01 -08:00
mm_init.c	include/linux/page-flags-layout.h: cleanups	2021-04-30 11:20:42 -07:00
mmap_lock.c	mm: mmap_lock: add tracepoints around lock acquisition	2020-12-15 12:13:41 -08:00
mmap.c	Revert "mremap: don't allow MREMAP_DONTUNMAP on special_mappings and aio"	2021-04-30 11:20:39 -07:00
mmu_gather.c	mm: eliminate "expecting prototype" kernel-doc warnings	2021-04-16 16:10:36 -07:00
mmu_notifier.c	mm/mmu_notifiers: ensure range_end() is paired with range_start()	2021-03-25 09:22:55 -07:00
mmzone.c	mm/lru: replace pgdat lru_lock with lruvec lock	2020-12-15 14:48:04 -08:00
mprotect.c	mm/mprotect.c: optimize error detection in do_mprotect_pkey()	2021-02-24 13:38:30 -08:00
mremap.c	Revert "mremap: don't allow MREMAP_DONTUNMAP on special_mappings and aio"	2021-04-30 11:20:39 -07:00
msync.c	mm/msync: exit early when the flags is an MS_ASYNC and start < vm_start	2021-04-30 11:20:37 -07:00
nommu.c	mm/nommu: Fix return type of filemap_map_pages()	2021-01-28 14:10:31 +00:00
oom_kill.c	mm: eliminate "expecting prototype" kernel-doc warnings	2021-04-16 16:10:36 -07:00
page_alloc.c	mm/vmscan: replace implicit RECLAIM_ZONE checks with explicit checks	2021-05-05 11:27:23 -07:00
page_counter.c	mm: page_counter: mitigate consequences of a page_counter underflow	2021-04-30 11:20:38 -07:00
page_ext.c	mm: fix some spelling mistakes in comments	2020-12-15 22:46:19 -08:00
page_idle.c	mm: page_idle_get_page() does not need lru_lock	2020-12-15 14:48:03 -08:00
page_io.c	swap: fix swapfile read/write offset	2021-03-02 17:25:46 -07:00
page_isolation.c	mm/page_isolation: do not isolate the max order page	2020-12-15 12:13:45 -08:00
page_owner.c	mm: page_owner: detect page_owner recursion via task_struct	2021-04-30 11:20:36 -07:00
page_poison.c	mm: page_poison: print page info when corruption is caught	2021-04-30 11:20:36 -07:00
page_reporting.c	mm/page_reporting: use list_entry_is_head() in page_reporting_cycle()	2021-02-24 13:38:30 -08:00
page_reporting.h
page_vma_mapped.c	mm/page_vma_mapped.c: add colon to fix kernel-doc markups error for check_pte	2020-12-15 12:13:41 -08:00
page-writeback.c	mm: page-writeback: simplify memcg handling in test_clear_page_writeback()	2021-04-30 11:20:37 -07:00
pagewalk.c
percpu-internal.h	percpu: make pcpu_nr_empty_pop_pages per chunk type	2021-04-09 13:58:38 +00:00
percpu-km.c
percpu-stats.c	percpu: make pcpu_nr_empty_pop_pages per chunk type	2021-04-09 13:58:38 +00:00
percpu-vm.c	mm/vmalloc: remove unmap_kernel_range	2021-04-30 11:20:40 -07:00
percpu.c	percpu: make pcpu_nr_empty_pop_pages per chunk type	2021-04-09 13:58:38 +00:00
pgalloc-track.h
pgtable-generic.c	mm/pgtable-generic.c: optimize the VM_BUG_ON condition in pmdp_huge_clear_flush()	2021-02-24 13:38:30 -08:00
process_vm_access.c	mm/process_vm_access.c: include compat.h	2021-01-12 18:12:54 -08:00
ptdump.c	mm: ptdump: fix build failure	2021-04-16 16:10:37 -07:00
readahead.c	mm: Implement readahead_control pageset expansion	2021-04-23 10:14:29 +01:00
rmap.c	mm/rmap: correct obsolete comment of page_get_anon_vma()	2021-02-26 09:41:01 -08:00
rodata_test.c
shmem.c	shmem: allow reporting fanotify events with file handles on tmpfs	2021-04-19 16:03:48 +02:00
shuffle.c	mm: eliminate "expecting prototype" kernel-doc warnings	2021-04-16 16:10:36 -07:00
shuffle.h
slab_common.c	mm/slab_common: provide "slab_merge" option for !IS_ENABLED(CONFIG_SLAB_MERGE_DEFAULT) builds	2021-04-30 11:20:36 -07:00
slab.c	kasan, mm: integrate slab init_on_free with HW_TAGS	2021-04-30 11:20:41 -07:00
slab.h	kasan, mm: integrate slab init_on_alloc with HW_TAGS	2021-04-30 11:20:41 -07:00
slob.c	mm: Don't build mm_dump_obj() on CONFIG_PRINTK=n kernels	2021-03-08 14:18:46 -08:00
slub.c	kasan, mm: integrate slab init_on_free with HW_TAGS	2021-04-30 11:20:41 -07:00
sparse-vmemmap.c
sparse.c	mm/sparse: add the missing sparse_buffer_fini() in error branch	2021-04-30 11:20:39 -07:00
swap_cgroup.c
swap_slots.c	mm/swap_slots.c: remove redundant NULL check	2021-02-24 13:38:28 -08:00
swap_state.c	mm: stop accounting shadow entries	2021-05-05 11:27:19 -07:00
swap.c	mm: remove pagevec_lookup_entries	2021-02-26 09:40:59 -08:00
swapfile.c	swap: fix swapfile read/write offset	2021-03-02 17:25:46 -07:00
truncate.c	mm: stop accounting shadow entries	2021-05-05 11:27:19 -07:00
usercopy.c
userfaultfd.c	userfaultfd: add UFFDIO_CONTINUE ioctl	2021-05-05 11:27:22 -07:00
util.c	mm: move page_mapping_file to pagemap.h	2021-04-30 11:20:37 -07:00
vmacache.c
vmalloc.c	mm/vmalloc: remove an empty line	2021-04-30 11:20:40 -07:00
vmpressure.c
vmscan.c	mm: vmscan: add per memcg shrinker nr_deferred	2021-05-05 11:27:23 -07:00
vmstat.c	mm/vmstat.c: erase latency in vmstat_shepherd	2021-02-26 09:41:00 -08:00
workingset.c	mm: stop accounting shadow entries	2021-05-05 11:27:19 -07:00
z3fold.c	z3fold: prevent reclaim/free race for headless pages	2021-03-25 09:22:55 -07:00
zbud.c	mm: set the sleep_mapped to true for zbud and z3fold	2021-02-26 09:41:01 -08:00
zpool.c	mm/zswap: add the flag can_sleep_mapped	2021-02-26 09:41:01 -08:00
zsmalloc.c	mm/zsmalloc.c: use page_private() to access page->private	2021-02-26 09:41:01 -08:00
zswap.c	mm/zswap: add the flag can_sleep_mapped	2021-02-26 09:41:01 -08:00