Go to file
Shakeel Butt 1f828223b7 memcg: flush lruvec stats in the refault
Prior to the commit 7e1c0d6f58 ("memcg: switch lruvec stats to rstat")
and the commit aa48e47e39 ("memcg: infrastructure to flush memcg
stats"), each lruvec memcg stats can be off by (nr_cgroups * nr_cpus *
32) at worst and for unbounded amount of time.  The commit aa48e47e39
moved the lruvec stats to rstat infrastructure and the commit
7e1c0d6f58 bounded the error for all the lruvec stats to (nr_cpus *
32) at worst for at most 2 seconds.  More specifically it decoupled the
number of stats and the number of cgroups from the error rate.

However this reduction in error comes with the cost of triggering the
slowpath of stats update more frequently.  Previously in the slowpath
the kernel adds the stats up the memcg tree.  After aa48e47e39, the
kernel triggers the asyn lruvec stats flush through queue_work().  This
causes regression reports from 0day kernel bot [1] as well as from
phoronix test suite [2].

We tried two options to fix the regression:

 1) Increase the threshold to trigger the slowpath in lruvec stats
    update codepath from 32 to 512.

 2) Remove the slowpath from lruvec stats update codepath and instead
    flush the stats in the page refault codepath. The assumption is that
    the kernel timely flush the stats, so, the update tree would be
    small in the refault codepath to not cause the preformance impact.

Following are the results of will-it-scale/page_fault[1|2|3] benchmark
on four settings i.e.  (1) 5.15-rc1 as baseline (2) 5.15-rc1 with
aa48e47e39 and 7e1c0d6f58 reverted (3) 5.15-rc1 with option-1
(4) 5.15-rc1 with option-2.

  test       (1)      (2)               (3)               (4)
  pg_f1   368563   406277 (10.23%)   399693  (8.44%)   416398 (12.97%)
  pg_f2   338399   372133  (9.96%)   369180  (9.09%)   381024 (12.59%)
  pg_f3   500853   575399 (14.88%)   570388 (13.88%)   576083 (15.02%)

From the above result, it seems like the option-2 not only solves the
regression but also improves the performance for at least these
benchmarks.

Feng Tang (intel) ran the aim7 benchmark with these two options and
confirms that option-1 reduces the regression but option-2 removes the
regression.

Michael Larabel (phoronix) ran multiple benchmarks with these options
and reported the results at [3] and it shows for most benchmarks
option-2 removes the regression introduced by the commit aa48e47e39
("memcg: infrastructure to flush memcg stats").

Based on the experiment results, this patch proposed the option-2 as the
solution to resolve the regression.

Link: https://lore.kernel.org/all/20210726022421.GB21872@xsang-OptiPlex-9020 [1]
Link: https://www.phoronix.com/scan.php?page=article&item=linux515-compile-regress [2]
Link: https://openbenchmarking.org/result/2109226-DEBU-LINUX5104 [3]
Fixes: aa48e47e39 ("memcg: infrastructure to flush memcg stats")
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Tested-by: Michael Larabel <Michael@phoronix.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Hillf Danton <hdanton@sina.com>,
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-23 10:09:13 -07:00
arch s390 eBPF JIT miscompilation issues fixes. 2021-09-21 09:36:11 -07:00
block blk-cgroup: fix UAF by grabbing blkcg lock before destroying blkg pd 2021-09-15 12:03:18 -06:00
certs certs: Add support for using elliptic curve keys for signing modules 2021-08-23 19:55:42 +03:00
crypto Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2021-08-30 12:57:10 -07:00
Documentation Devicetree fixes for v5.15, take 2: 2021-09-18 12:40:55 -07:00
drivers spi: Fix modalias issues 2021-09-22 11:58:24 -07:00
fs Critical bug fixes: 2021-09-22 09:21:02 -07:00
include AFS fixes 2021-09-20 15:49:02 -07:00
init init: Revert accidental changes to print irqs_disabled() 2021-09-22 13:02:30 -07:00
ipc ipc: remove memcg accounting for sops objects in do_semtimedop() 2021-09-14 10:22:11 -07:00
kernel A single fix for the perf core where a value read with READ_ONCE() was 2021-09-19 13:22:40 -07:00
lib pci_iounmap'2: Electric Boogaloo: try to make sense of it all 2021-09-19 17:13:35 -07:00
LICENSES LICENSES/dual/CC-BY-4.0: Git rid of "smart quotes" 2021-07-15 06:31:24 -06:00
mm memcg: flush lruvec stats in the refault 2021-09-23 10:09:13 -07:00
net Networking fixes for 5.15-rc2, including fixes from bpf. 2021-09-16 13:05:42 -07:00
samples kgdb patches for 5.15 2021-09-07 12:08:04 -07:00
scripts Kbuild fixes for v5.15 2021-09-19 12:55:12 -07:00
security Kbuild updates for v5.15 2021-09-03 15:33:47 -07:00
sound sound fixes for 5.15-rc1 2021-09-09 16:05:10 -07:00
tools powerpc fixes for 5.15 #2 2021-09-19 13:00:23 -07:00
usr .gitignore: prefix local generated files with a slash 2021-05-02 00:43:35 +09:00
virt KVM: Drop unused kvm_dirty_gfn_invalid() 2021-09-06 08:23:46 -04:00
.clang-format clang-format: Update with the latest for_each macro list 2021-05-12 23:32:39 +02:00
.cocciconfig
.get_maintainer.ignore
.gitattributes
.gitignore .gitignore: ignore only top-level modules.builtin 2021-05-02 00:43:35 +09:00
.mailmap mailmap: update email address of Matthias Fuchs and Thomas Körper 2021-08-19 09:39:44 +02:00
COPYING
CREDITS MAINTAINERS: move Murali Karicheri to credits 2021-04-29 15:47:30 -07:00
Kbuild
Kconfig
MAINTAINERS MAINTAINERS: Update Xen-[PCI,SWIOTLB,Block] maintainership 2021-09-22 12:49:48 -07:00
Makefile Linux 5.15-rc2 2021-09-19 17:28:22 -07:00
README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.