linux

iv/linux


Johannes Weiner 815744d751 mm: memcontrol: don't batch updates of local VM stats and events

The kernel test robot noticed a 26% will-it-scale pagefault regression from commit `42a3003535` ("mm: memcontrol: fix recursive statistics correctness & scalabilty"). This appears to be caused by bouncing the additional cachelines from the new hierarchical statistics counters.

We can fix this by getting rid of the batched local counters instead.

Originally, there were only group-local counters, and they were fully maintained per cpu. A reader of a stats file high up in the cgroup tree would have to walk the entire subtree and collect each level's per-cpu counters to get the recursive view. This was prohibitively expensive, and so we switched to per-cpu batched updates of the local counters in `a983b5ebee` ("mm: memcontrol: fix excessive complexity in memory.stat reporting"), reducing the complexity from nr_subgroups * nr_cpus to nr_subgroups.

With growing machines and cgroup trees, the tree walk itself became too expensive for monitoring top-level groups, and this is when the culprit patch added hierarchy counters on each cgroup level. When the per-cpu batch size would be reached, both the local and the hierarchy counters would get batch-updated from the per-cpu delta simultaneously.

This makes local and hierarchical counter reads blazingly fast, but it unfortunately makes the write-side too cache line intense.

Since local counter reads were never a problem - we only centralized them to accelerate the hierarchy walk - and use of the local counters is becoming rarer due to replacement with hierarchical views (ongoing rework in the page reclaim and workingset code), we can make those local counters unbatched per-cpu counters again.

The scheme will then be as such:

when a memcg statistic changes, the writer will:
- update the local counter (per-cpu)
- update the batch counter (per-cpu). If the batch is full:
  - spill the batch into the group's atomic_t
  - spill the batch into all ancestors' atomic_ts
  - empty out the batch counter (per-cpu)

when a local memcg counter is read, the reader will:
- collect the local counter from all cpus

when a hierarchy memcg counter is read, the reader will:
- read the atomic_t

We might be able to simplify this further and make the recursive counters unbatched per-cpu counters as well (batch upward propagation, but leave per-cpu collection to the readers), but that will require a more in-depth analysis and testing of all the callsites. Deal with the immediate regression for now.

Link: http://lkml.kernel.org/r/20190521151647.GB2870@cmpxchg.org
Fixes: `42a3003535` ("mm: memcontrol: fix recursive statistics correctness & scalabilty")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: kernel test robot <rong.a.chen@intel.com>
Tested-by: kernel test robot <rong.a.chen@intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2019-06-13 17:34:56 -10:00
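The writer and reader scheme in the commit message can be illustrated with a small user-space simulation. This is only a sketch under stated assumptions, not the code in memcontrol.c: `struct memcg`, `NR_CPUS`, `BATCH`, and the three function names are hypothetical, per-cpu data is modelled as a plain array indexed by CPU number, and a plain `long` stands in for the group's atomic_t because the simulation is single-threaded.

```c
#include <assert.h>
#include <stdlib.h>

#define NR_CPUS 4   /* hypothetical CPU count for this simulation */
#define BATCH   32  /* hypothetical per-cpu batch threshold */

/* Hypothetical stand-in for one cgroup's counters for a single statistic. */
struct memcg {
    struct memcg *parent;   /* NULL for the root group */
    long local[NR_CPUS];    /* unbatched per-cpu local counter */
    long batch[NR_CPUS];    /* per-cpu delta not yet propagated upward */
    long hier;              /* stands in for the group's atomic_t */
};

/* Writer: a statistic of group cg changes by val on the given cpu. */
void memcg_stat_mod(struct memcg *cg, int cpu, long val)
{
    assert(cpu >= 0 && cpu < NR_CPUS);

    cg->local[cpu] += val;  /* update the local counter (per-cpu) */
    cg->batch[cpu] += val;  /* update the batch counter (per-cpu) */

    if (labs(cg->batch[cpu]) > BATCH) {
        /* Spill the batch into this group and all of its ancestors. */
        for (struct memcg *m = cg; m != NULL; m = m->parent)
            m->hier += cg->batch[cpu];
        cg->batch[cpu] = 0; /* empty out the batch counter */
    }
}

/* Local read: collect the local counter from all cpus. */
long memcg_read_local(const struct memcg *cg)
{
    long sum = 0;

    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        sum += cg->local[cpu];
    return sum;
}

/* Hierarchical read: a single load of the shared counter. */
long memcg_read_hier(const struct memcg *cg)
{
    return cg->hier;
}
```

In this sketch the hierarchical value can lag the true total by up to `BATCH` per CPU, while the local read is exact but costs one pass over all CPUs. For example, after 40 single-unit updates to a child group on CPU 0, the child's local read is 40, whereas the child's and its parent's hierarchical reads are both 33, with the remaining 7 still sitting in the per-cpu batch.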
kasan	kasan: initialize tag to 0xff in __kasan_kmalloc	2019-06-01 15:51:31 -07:00
backing-dev.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
balloon_compaction.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
cleancache.c	mm: use octal not symbolic permissions	2018-06-15 07:55:25 +09:00
cma_debug.c	mm/cma_debug.c: fix the break condition in cma_maxchunk_get()	2019-05-14 09:47:45 -07:00
cma.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 98	2019-05-24 17:37:54 +02:00
cma.h
compaction.c	mm, compaction: make sure we isolate a valid PFN	2019-06-01 15:51:32 -07:00
debug_page_ref.c
debug.c	mm: update references to page _refcount	2019-05-14 19:52:47 -07:00
dmapool.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 403	2019-06-05 17:37:13 +02:00
early_ioremap.c
fadvise.c	vfs: implement readahead(2) using POSIX_FADV_WILLNEED	2018-08-30 20:01:32 +02:00
failslab.c	mm: no need to check return value of debugfs_create functions	2019-03-05 21:07:17 -08:00
filemap.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
frame_vector.c
frontswap.c	mm: use octal not symbolic permissions	2018-06-15 07:55:25 +09:00
gup_benchmark.c	mm/gup: replace get_user_pages_longterm() with FOLL_LONGTERM	2019-05-14 09:47:45 -07:00
gup.c	mm/gup: continue VM_FAULT_RETRY processing even for pre-faults	2019-06-01 15:51:31 -07:00
highmem.c	mm: convert totalram_pages and totalhigh_pages variables to atomic	2018-12-28 12:11:47 -08:00
hmm.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 157	2019-05-30 11:26:37 -07:00
huge_memory.c	mm/huge_memory.c: make __thp_get_unmapped_area static	2019-05-14 09:47:51 -07:00
hugetlb_cgroup.c	mm: rename page_counter's count/limit into usage/max	2018-06-07 17:34:35 -07:00
hugetlb.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
hwpoison-inject.c	treewide: Add SPDX license identifier for more missed files	2019-05-21 10:50:45 +02:00
init-mm.c	mm: Allocate the mm_cpumask (mm->cpu_bitmap[]) dynamically based on nr_cpu_ids	2018-07-17 09:35:30 +02:00
internal.h	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152	2019-05-30 11:26:32 -07:00
interval_tree.c
Kconfig	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
Kconfig.debug	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
khugepaged.c	mm/mmu_notifier: use correct mmu_notifier events for each invalidation	2019-05-14 09:47:49 -07:00
kmemleak-test.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 333	2019-06-05 17:37:06 +02:00
kmemleak.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 333	2019-06-05 17:37:06 +02:00
ksm.c	mm/mmu_notifier: use correct mmu_notifier events for each invalidation	2019-05-14 09:47:49 -07:00
list_lru.c	memcg: make it work on sparse non-0-node systems	2019-06-01 15:51:31 -07:00
maccess.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
madvise.c	mm/mmu_notifier: use correct mmu_notifier events for each invalidation	2019-05-14 09:47:49 -07:00
Makefile	mm: shuffle initial free memory to improve memory-side-cache utilization	2019-05-14 19:52:48 -07:00
memblock.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152	2019-05-30 11:26:32 -07:00
memcontrol.c	mm: memcontrol: don't batch updates of local VM stats and events	2019-06-13 17:34:56 -10:00
memfd.c	mm: page cache: store only head pages in i_pages	2019-05-14 09:47:45 -07:00
memory_hotplug.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
memory-failure.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 263	2019-06-05 17:30:28 +02:00
memory.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
mempolicy.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 225	2019-05-30 11:29:56 -07:00
mempool.c	docs/core-api/mm: fix return value descriptions in mm/	2019-03-05 21:07:20 -08:00
memtest.c
migrate.c	mm/mmu_notifier: use correct mmu_notifier events for each invalidation	2019-05-14 09:47:49 -07:00
mincore.c	mm/mincore.c: make mincore() more conservative	2019-05-14 19:52:48 -07:00
mlock.c	mm: remove zone_lru_lock() function, access ->lru_lock directly	2019-03-05 21:07:21 -08:00
mm_init.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
mmap.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
mmu_context.c
mmu_gather.c	asm-generic/tlb: Remove tlb_table_flush()	2019-04-03 10:33:02 +02:00
mmu_notifier.c	mm/mmu_notifier: mmu_notifier_range_update_to_read_only() helper	2019-05-14 09:47:49 -07:00
mmzone.c
mprotect.c	mm/mprotect.c: fix compilation warning because of unused 'mm' variable	2019-05-14 09:47:51 -07:00
mremap.c	mm/mmu_notifier: contextual information for event triggering invalidation	2019-05-14 09:47:49 -07:00
msync.c
nommu.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
oom_kill.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
page_alloc.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
page_counter.c	memcg: introduce memory.min	2018-06-07 17:34:36 -07:00
page_ext.c	memblock: drop memblock_alloc_*_nopanic() variants	2019-03-12 10:04:02 -07:00
page_idle.c	mm: remove zone_lru_lock() function, access ->lru_lock directly	2019-03-05 21:07:21 -08:00
page_io.c	mm/page_io.c: fix polled swap page in	2019-01-04 13:13:48 -08:00
page_isolation.c	mm/page_isolation.c: remove redundant pfn_valid_within() in __first_valid_page()	2019-05-14 09:47:46 -07:00
page_owner.c	mm/page_owner: Simplify stack trace handling	2019-04-29 12:37:50 +02:00
page_poison.c	page_poison: play nicely with KASAN	2019-03-05 21:07:13 -08:00
page_vma_mapped.c	mm/rmap: map_pte() was not handling private ZONE_DEVICE page properly	2018-10-31 08:54:11 -07:00
page-writeback.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
pagewalk.c
percpu-internal.h	percpu: convert chunk hints to be based on pcpu_block_md	2019-03-13 12:25:31 -07:00
percpu-km.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428	2019-06-05 17:37:16 +02:00
percpu-stats.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428	2019-06-05 17:37:16 +02:00
percpu-vm.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428	2019-06-05 17:37:16 +02:00
percpu.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428	2019-06-05 17:37:16 +02:00
pgtable-generic.c	x86/mm: Page size aware flush_tlb_mm_range()	2018-10-09 16:51:11 +02:00
process_vm_access.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152	2019-05-30 11:26:32 -07:00
quicklist.c
readahead.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
rmap.c	mm/rmap.c: use the pra.mapcount to do the check	2019-05-14 09:47:49 -07:00
rodata_test.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441	2019-06-05 17:37:17 +02:00
shmem.c	mm: page cache: store only head pages in i_pages	2019-05-14 09:47:45 -07:00
shuffle.c	mm: maintain randomization of page free lists	2019-05-14 19:52:48 -07:00
shuffle.h	mm: maintain randomization of page free lists	2019-05-14 19:52:48 -07:00
slab_common.c	mm: add support for kmem caches in DMA32 zone	2019-03-29 10:01:37 -07:00
slab.c	slab: remove /proc/slab_allocators	2019-05-16 15:51:55 -07:00
slab.h	mm: add support for kmem caches in DMA32 zone	2019-03-29 10:01:37 -07:00
slob.c	slob: use slab_list instead of lru	2019-05-14 09:47:44 -07:00
slub.c	mm/slub.c: update the comment about slab frozen	2019-05-14 09:47:45 -07:00
sparse-vmemmap.c	mm: remove include/linux/bootmem.h	2018-10-31 08:54:16 -07:00
sparse.c	mm/sparse.c: clean up obsolete code comment	2019-05-14 09:47:48 -07:00
swap_cgroup.c
swap_slots.c	mm, swap, get_swap_pages: use entry_size instead of cluster in parameter	2018-08-22 10:52:44 -07:00
swap_state.c	mm: page cache: store only head pages in i_pages	2019-05-14 09:47:45 -07:00
swap.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
swapfile.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
truncate.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
usercopy.c	mm/usercopy.c: no check page span for stack objects	2019-01-08 17:15:11 -08:00
userfaultfd.c	hugetlb: use same fault hash key for shared and private mappings	2019-05-14 09:47:48 -07:00
util.c	prctl_set_mm: downgrade mmap_sem to read lock	2019-06-01 15:51:31 -07:00
vmacache.c	mm: get rid of vmacache_flush_all() entirely	2018-09-13 15:18:04 -10:00
vmalloc.c	mm/vmalloc.c: fix typo in comment	2019-06-01 15:51:31 -07:00
vmpressure.c	mm/vmpressure.c: convert to use match_string() helper	2018-06-07 17:34:36 -07:00
vmscan.c	mm: memcontrol: make cgroup stats and events query API explicitly local	2019-05-14 19:52:53 -07:00
vmstat.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
workingset.c	mm: memcontrol: make cgroup stats and events query API explicitly local	2019-05-14 19:52:53 -07:00
z3fold.c	z3fold: fix sheduling while atomic	2019-06-01 15:51:31 -07:00
zbud.c	treewide: Add SPDX license identifier for more missed files	2019-05-21 10:50:45 +02:00
zpool.c	treewide: Add SPDX license identifier for more missed files	2019-05-21 10:50:45 +02:00
zsmalloc.c	mm/zsmalloc.c: fix fall-through annotation	2018-10-26 16:26:35 -07:00
zswap.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 157	2019-05-30 11:26:37 -07:00