2b4f3b4987
Patch series "Avoid memory corruption caused by per-VMA locks", v4.
A memory corruption was reported in [1] with bisection pointing to the
patch [2] enabling per-VMA locks for x86. Based on the reproducer
provided in [1] we suspect this is caused by the lack of VMA locking while
forking a child process.
Patch 1/2 in the series implements proper VMA locking during fork. I
tested the fix locally using the reproducer and was unable to reproduce
the memory corruption problem.
This fix can potentially regress some fork-heavy workloads. Kernel build
time did not show noticeable regression on a 56-core machine while a
stress test mapping 10000 VMAs and forking 5000 times in a tight loop
shows ~7% regression. If such fork time regression is unacceptable,
disabling CONFIG_PER_VMA_LOCK should restore its performance. Further
optimizations are possible if this regression proves to be problematic.
Patch 2/2 disables per-VMA locks until the fix is tested and verified.
This patch (of 2):
When forking a child process, parent write-protects an anonymous page and
COW-shares it with the child being forked using copy_present_pte().
Parent's TLB is flushed right before we drop the parent's mmap_lock in
dup_mmap(). If we get a write-fault before that TLB flush in the parent,
and we end up replacing that anonymous page in the parent process in
do_wp_page() (because, COW-shared with the child), this might lead to some
stale writable TLB entries targeting the wrong (old) page. Similar issue
happened in the past with userfaultfd (see flush_tlb_page() call inside
do_wp_page()).
Lock VMAs of the parent process when forking a child, which prevents
concurrent page faults during fork operation and avoids this issue. This
fix can potentially regress some fork-heavy workloads. Kernel build time
did not show noticeable regression on a 56-core machine while a stress
test mapping 10000 VMAs and forking 5000 times in a tight loop shows ~7%
regression. If such fork time regression is unacceptable, disabling
CONFIG_PER_VMA_LOCK should restore its performance. Further optimizations
are possible if this regression proves to be problematic.
Link: https://lkml.kernel.org/r/20230706011400.2949242-1-surenb@google.com
Link: https://lkml.kernel.org/r/20230706011400.2949242-2-surenb@google.com
Fixes:
|
||
---|---|---|
arch | ||
block | ||
certs | ||
crypto | ||
Documentation | ||
drivers | ||
fs | ||
include | ||
init | ||
io_uring | ||
ipc | ||
kernel | ||
lib | ||
LICENSES | ||
mm | ||
net | ||
rust | ||
samples | ||
scripts | ||
security | ||
sound | ||
tools | ||
usr | ||
virt | ||
.clang-format | ||
.cocciconfig | ||
.get_maintainer.ignore | ||
.gitattributes | ||
.gitignore | ||
.mailmap | ||
.rustfmt.toml | ||
COPYING | ||
CREDITS | ||
Kbuild | ||
Kconfig | ||
MAINTAINERS | ||
Makefile | ||
README |
Linux kernel ============ There are several guides for kernel developers and users. These guides can be rendered in a number of formats, like HTML and PDF. Please read Documentation/admin-guide/README.rst first. In order to build the documentation, use ``make htmldocs`` or ``make pdfdocs``. The formatted documentation can also be read online at: https://www.kernel.org/doc/html/latest/ There are various text files in the Documentation/ subdirectory, several of them using the Restructured Text markup notation. Please read the Documentation/process/changes.rst file, as it contains the requirements for building and running the kernel, and information about the problems which may result by upgrading your kernel.