linux

Shakeel Butt 1f828223b7 memcg: flush lruvec stats in the refault

Prior to commit 7e1c0d6f5820 ("memcg: switch lruvec stats to rstat")
and commit aa48e47e3906 ("memcg: infrastructure to flush memcg
stats"), each lruvec memcg stat could be off by up to (nr_cgroups *
nr_cpus * 32), and for an unbounded amount of time.  Commit
7e1c0d6f5820 moved the lruvec stats to the rstat infrastructure, and
commit aa48e47e3906 bounded the error for all the lruvec stats to at
most (nr_cpus * 32) for at most 2 seconds.  More specifically, it
decoupled the error rate from the number of stats and the number of
cgroups.

However, this reduction in error comes at the cost of triggering the
slowpath of the stats update more frequently.  Previously, the slowpath
added the stats up the memcg tree.  After aa48e47e3906, the slowpath
triggers an async lruvec stats flush through queue_work().  This caused
regression reports from the 0day kernel bot [1] as well as from the
Phoronix Test Suite [2].

We tried two options to fix the regression:

 1) Increase the threshold for triggering the slowpath in the lruvec
    stats update codepath from 32 to 512.

 2) Remove the slowpath from the lruvec stats update codepath and
    instead flush the stats in the page refault codepath.  The
    assumption is that the kernel flushes the stats in a timely manner,
    so the update tree will be small enough in the refault codepath not
    to cause a performance impact.

The following are the results of the will-it-scale/page_fault[1|2|3]
benchmarks on four settings: (1) 5.15-rc1 as the baseline, (2) 5.15-rc1
with aa48e47e3906 and 7e1c0d6f5820 reverted, (3) 5.15-rc1 with
option-1, and (4) 5.15-rc1 with option-2.  Percentages are the change
relative to the baseline (1); higher is better.

  test       (1)      (2)               (3)               (4)
  pg_f1   368563   406277 (10.23%)   399693  (8.44%)   416398 (12.97%)
  pg_f2   338399   372133  (9.96%)   369180  (9.09%)   381024 (12.59%)
  pg_f3   500853   575399 (14.88%)   570388 (13.88%)   576083 (15.02%)

From the above results, option-2 not only resolves the regression but
also outperforms the revert (2) on all three of these benchmarks.

Feng Tang (Intel) ran the aim7 benchmark with both options and
confirmed that option-1 reduces the regression while option-2 removes
it.

Michael Larabel (Phoronix) ran multiple benchmarks with these options
and reported the results at [3], which show that for most benchmarks
option-2 removes the regression introduced by commit aa48e47e3906
("memcg: infrastructure to flush memcg stats").

Based on these results, this patch implements option-2 to resolve the
regression.

Link: https://lore.kernel.org/all/20210726022421.GB21872@xsang-OptiPlex-9020 [1]
Link: https://www.phoronix.com/scan.php?page=article&item=linux515-compile-regress [2]
Link: https://openbenchmarking.org/result/2109226-DEBU-LINUX5104 [3]
Fixes: aa48e47e3906 ("memcg: infrastructure to flush memcg stats")
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Tested-by: Michael Larabel <Michael@phoronix.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2021-09-23 10:09:13 -07:00

damon

mm/damon: add kunit tests

2021-09-08 11:50:25 -07:00

kasan

Merge branch 'akpm' (patches from Andrew)

2021-09-03 10:08:28 -07:00

kfence

Merge branch 'akpm' (patches from Andrew)

2021-09-08 12:55:35 -07:00

backing-dev.c

Merge branch 'akpm' (patches from Andrew)

2021-09-03 10:08:28 -07:00

balloon_compaction.c

mm: fix typos in comments

2021-05-07 00:26:35 -07:00

bootmem_info.c

mm/bootmem_info.c: mark __init on register_page_bootmem_info_section

2021-09-03 09:58:14 -07:00

cleancache.c

…

cma_debug.c

mm/cma: change cma mutex to irq safe spinlock

2021-05-05 11:27:21 -07:00

cma_sysfs.c

mm: cma: support sysfs

2021-05-05 11:27:24 -07:00

cma.c

mm: use proper type for cma_[alloc|release]

2021-05-05 11:27:24 -07:00

cma.h

mm: cma: support sysfs

2021-05-05 11:27:24 -07:00

compaction.c

Merge branch 'akpm' (patches from Andrew)

2021-09-08 12:55:35 -07:00

debug_page_ref.c

…

debug_vm_pgtable.c

mm/debug_vm_pgtable: fix corrupted page flag

2021-09-03 09:58:10 -07:00

debug.c

mm/debug: factor PagePoisoned out of __dump_page

2021-06-29 10:53:53 -07:00

dmapool.c

mm/dmapool: use DEVICE_ATTR_RO macro

2021-06-29 10:53:52 -07:00

early_ioremap.c

mm/early_ioremap.c: remove redundant early_ioremap_shutdown()

2021-09-08 11:50:24 -07:00

fadvise.c

mm, fadvise: improve the expensive remote LRU cache draining after FADV_DONTNEED

2020-10-13 18:38:29 -07:00

failslab.c

…

filemap.c

Merge branch 'akpm' (patches from Andrew)

2021-09-03 10:08:28 -07:00

frontswap.c

mm/mempool: minor coding style tweaks

2021-05-05 11:27:27 -07:00

gup_test.c

selftests/vm: gup_test: test faulting in kernel, and verify pinnable pages

2021-05-05 11:27:26 -07:00

gup_test.h

selftests/vm: gup_test: fix test flag

2021-05-05 11:27:26 -07:00

gup.c

Revert "mm/gup: remove try_get_page(), call try_get_compound_head() directly"

2021-09-07 11:03:45 -07:00

highmem.c

mm: in_irq() cleanup

2021-09-08 11:50:24 -07:00

hmm.c

mm/hmm: bypass devmap pte when all pfn requested flags are fulfilled

2021-09-08 18:45:52 -07:00

huge_memory.c

mm,do_huge_pmd_numa_page: remove unnecessary TLB flushing code

2021-09-03 09:58:13 -07:00

hugetlb_cgroup.c

hugetlb: make free_huge_page irq safe

2021-05-05 11:27:22 -07:00

hugetlb_vmemmap.c

mm: hugetlb: introduce CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON

2021-06-30 20:47:26 -07:00

hugetlb_vmemmap.h

mm: hugetlb: introduce nr_free_vmemmap_pages in the struct hstate

2021-06-30 20:47:25 -07:00

hugetlb.c

mm/hugetlb: add support for mempolicy MPOL_PREFERRED_MANY

2021-09-03 09:58:17 -07:00

hwpoison-inject.c

mm: hwpoison: don't drop slab caches for offlining non-LRU page

2021-09-03 09:58:15 -07:00

init-mm.c

mm: add setup_initial_init_mm() helper

2021-07-08 11:48:21 -07:00

internal.h

mm/numa: automatically generate node migration order

2021-09-03 09:58:16 -07:00

interval_tree.c

mm/interval_tree: add comments to improve code readability

2021-04-30 11:20:38 -07:00

io-mapping.c

mm: add a io_mapping_map_user helper

2021-04-30 11:20:39 -07:00

ioremap.c

mm: move ioremap_page_range to vmalloc.c

2021-09-08 11:50:24 -07:00

Kconfig

mm/idle_page_tracking: make PG_idle reusable

2021-09-08 11:50:24 -07:00

Kconfig.debug

mm, page_poison: remove CONFIG_PAGE_POISONING_ZERO

2020-12-15 12:13:46 -08:00

khugepaged.c

huge tmpfs: SGP_NOALLOC to stop collapse_file() on race

2021-09-03 09:58:12 -07:00

kmemleak.c

mm/kmemleak: allow __GFP_NOLOCKDEP passed to kmemleak's gfp

2021-09-08 18:45:53 -07:00

ksm.c

mm/ksm: remove old GCC 4.9+ check

2021-09-13 10:18:28 -07:00

list_lru.c

mm: vmscan: consolidate shrinker_maps handling code

2021-05-05 11:27:23 -07:00

maccess.c

ARM: 9115/1: mm/maccess: fix unaligned copy_{from,to}_kernel_nofault

2021-08-20 11:39:25 +01:00

madvise.c

Merge branch 'akpm' (patches from Andrew)

2021-09-03 10:08:28 -07:00

Makefile

mm: introduce Data Access MONitor (DAMON)

2021-09-08 11:50:24 -07:00

mapping_dirty_helpers.c

mm/mapping_dirty_helpers: remove double Note in kerneldoc

2021-07-01 11:06:02 -07:00

memblock.c

memblock: introduce saner 'memblock_free_ptr()' interface

2021-09-14 13:23:22 -07:00

memcontrol.c

memcg: flush lruvec stats in the refault

2021-09-23 10:09:13 -07:00

memfd.c

Reimplement RLIMIT_MEMLOCK on top of ucounts

2021-04-30 14:14:02 -05:00

memory_hotplug.c

Merge branch 'akpm' (patches from Andrew)

2021-09-08 12:55:35 -07:00

memory-failure.c

Merge branch 'akpm' (patches from Andrew)

2021-09-03 10:08:28 -07:00

memory.c

afs: Fix mmap coherency vs 3rd-party changes

2021-09-13 09:10:39 +01:00

mempolicy.c

Merge branches 'akpm' and 'akpm-hotfixes' (patches from Andrew)

2021-09-08 18:52:05 -07:00

mempool.c

kasan: use separate (un)poison implementation for integrated init

2021-06-04 19:32:21 +01:00

memremap.c

mm/memory_hotplug: remove nid parameter from arch_remove_memory()

2021-09-08 11:50:23 -07:00

memtest.c

…

migrate.c

compat: remove some compat entry points

2021-09-08 15:32:35 -07:00

mincore.c

inode: make init and permission helpers idmapped mount aware

2021-01-24 14:27:16 +01:00

mlock.c

mm: introduce memfd_secret system call to create "secret" memory areas

2021-07-08 11:48:21 -07:00

mm_init.c

include/linux/page-flags-layout.h: cleanups

2021-04-30 11:20:42 -07:00

mmap_lock.c

mm: mmap_lock: fix disabling preemption directly

2021-07-23 17:43:28 -07:00

mmap.c

Merge tag 'denywrite-for-5.15' of git://github.com/davidhildenbrand/linux

2021-09-04 11:35:47 -07:00

mmu_gather.c

mm: eliminate "expecting prototype" kernel-doc warnings

2021-04-16 16:10:36 -07:00

mmu_notifier.c

mm/mmu_notifiers: ensure range_end() is paired with range_start()

2021-03-25 09:22:55 -07:00

mmzone.c

mm/lru: replace pgdat lru_lock with lruvec lock

2020-12-15 14:48:04 -08:00

mprotect.c

mm: device exclusive memory access

2021-07-01 11:06:03 -07:00

mremap.c

mm/mremap: fix memory account on do_munmap() failure

2021-09-03 09:58:14 -07:00

msync.c

mm/msync: exit early when the flags is an MS_ASYNC and start < vm_start

2021-04-30 11:20:37 -07:00

nommu.c

Merge tag 'denywrite-for-5.15' of git://github.com/davidhildenbrand/linux

2021-09-04 11:35:47 -07:00

oom_kill.c

mm: introduce process_mrelease system call

2021-09-03 09:58:17 -07:00

page_alloc.c

mm/page_alloc.c: avoid accessing uninitialized pcp page migratetype

2021-09-08 18:45:53 -07:00

page_counter.c

mm: page_counter: mitigate consequences of a page_counter underflow

2021-04-30 11:20:38 -07:00

page_ext.c

mm/idle_page_tracking: make PG_idle reusable

2021-09-08 11:50:24 -07:00

page_idle.c

mm/idle_page_tracking: make PG_idle reusable

2021-09-08 11:50:24 -07:00

page_io.c

swap: fix swapfile read/write offset

2021-03-02 17:25:46 -07:00

page_isolation.c

Merge branch 'akpm' (patches from Andrew)

2021-09-08 12:55:35 -07:00

page_owner.c

mm: remove pfn_valid_within() and CONFIG_HOLES_IN_ZONE

2021-09-08 11:50:22 -07:00

page_poison.c

mm: page_poison: print page info when corruption is caught

2021-04-30 11:20:36 -07:00

page_reporting.c

mm/page_reporting: allow driver to specify reporting order

2021-06-29 10:53:47 -07:00

page_reporting.h

mm/page_reporting: export reporting order as module parameter

2021-06-29 10:53:47 -07:00

page_vma_mapped.c

mm: device exclusive memory access

2021-07-01 11:06:03 -07:00

page-writeback.c

Merge branch 'akpm' (patches from Andrew)

2021-09-03 10:08:28 -07:00

pagewalk.c

mm: pagewalk: fix walk for hugepage tables

2021-06-29 10:53:49 -07:00

percpu-internal.h

Merge branch 'for-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu

2021-07-01 17:17:24 -07:00

percpu-km.c

percpu: flush tlb in pcpu_reclaim_populated()

2021-07-04 18:30:17 +00:00

percpu-stats.c

percpu: rework memcg accounting

2021-06-05 20:43:15 +00:00

percpu-vm.c

percpu: flush tlb in pcpu_reclaim_populated()

2021-07-04 18:30:17 +00:00

percpu.c

Merge branch 'akpm' (patches from Andrew)

2021-09-08 12:55:35 -07:00

pgalloc-track.h

mm: fix typos in comments

2021-05-07 00:26:35 -07:00

pgtable-generic.c

mm/thp: fix __split_huge_pmd_locked() on shmem migration entry

2021-06-16 09:24:42 -07:00

process_vm_access.c

mm/process_vm_access.c: remove duplicate include

2021-05-05 11:27:27 -07:00

ptdump.c

mm: ptdump: fix build failure

2021-04-16 16:10:37 -07:00

readahead.c

mm: Protect operations adding pages to page cache with invalidate_lock

2021-07-13 13:14:27 +02:00

rmap.c

Merge branch 'akpm' (patches from Andrew)

2021-09-08 12:55:35 -07:00

rodata_test.c

mm/rodata_test.c: fix missing function declaration

2020-08-21 09:52:53 -07:00

secretmem.c

mm/secretmem: use refcount_t instead of atomic_t

2021-09-08 11:50:24 -07:00

shmem.c

Merge branch 'akpm' (patches from Andrew)

2021-09-03 10:08:28 -07:00

shuffle.c

mm: eliminate "expecting prototype" kernel-doc warnings

2021-04-16 16:10:36 -07:00

shuffle.h

mm/shuffle: fix section mismatch warning

2021-05-22 15:09:07 -10:00

slab_common.c

mm: slub: move flush_cpu_slab() invocations __free_slab() invocations out of IRQ context

2021-09-04 01:12:23 +02:00

slab.c

mm: fix typos in comments

2021-05-07 00:26:35 -07:00

slab.h

mm/memcg: fix NULL pointer dereference in memcg_slab_free_hook()

2021-07-30 10:14:39 -07:00

slob.c

mm: Don't build mm_dump_obj() on CONFIG_PRINTK=n kernels

2021-03-08 14:18:46 -08:00

slub.c

mm, slub: convert kmem_cpu_slab protection to local_lock

2021-09-04 10:22:01 +02:00

sparse-vmemmap.c

mm: sparsemem: split the huge PMD mapping of vmemmap pages

2021-06-30 20:47:26 -07:00

sparse.c

mm: introduce memmap_alloc() to unify memory map allocation

2021-09-03 09:58:15 -07:00

swap_cgroup.c

…

swap_slots.c

mm: Replace deprecated CPU-hotplug functions.

2021-08-28 01:46:17 +02:00

swap_state.c

Revert "mm: swap: check if swap backing device is congested or not"

2021-08-20 11:31:42 -07:00

swap.c

mm: delete unused get_kernel_page()

2021-09-03 09:58:11 -07:00

swapfile.c

mm, memcg: inline swap-related functions to improve disabled memcg config

2021-09-03 09:58:12 -07:00

truncate.c

Merge branch 'akpm' (patches from Andrew)

2021-09-03 10:08:28 -07:00

usercopy.c

mm/usercopy.c: delete duplicated word

2020-08-12 10:57:58 -07:00

userfaultfd.c

userfaultfd: change mmap_changing to atomic

2021-09-03 09:58:16 -07:00

util.c

mm: don't allow oversized kvmalloc() calls

2021-09-02 09:47:01 -07:00

vmacache.c

kernel: better document the use_mm/unuse_mm API contract

2020-06-10 19:14:18 -07:00

vmalloc.c

Merge branch 'akpm' (patches from Andrew)

2021-09-08 12:55:35 -07:00

vmpressure.c

mm/vmpressure: replace vmpressure_to_css() with vmpressure_to_memcg()

2021-09-03 09:58:17 -07:00

vmscan.c

mm,vmscan: fix divide by zero in get_scan_count

2021-09-08 18:45:53 -07:00

vmstat.c

mm/vmstat: protect per cpu variables with preempt disable on RT

2021-09-08 15:32:34 -07:00

workingset.c

memcg: flush lruvec stats in the refault

2021-09-23 10:09:13 -07:00

z3fold.c

mm/z3fold: add kerneldoc fields for z3fold_pool

2021-07-01 11:06:03 -07:00

zbud.c

mm/zbud: add kerneldoc fields for zbud_pool

2021-07-01 11:06:03 -07:00

zpool.c

mm: fix typos in comments

2021-05-07 00:26:35 -07:00

zsmalloc.c

mm/zsmalloc.c: improve readability for async_free_zspage()

2021-07-01 11:06:02 -07:00

zswap.c

mm/zswap.c: fix two bugs in zswap_writeback_entry()

2021-06-30 20:47:31 -07:00