Go to file
Darrick J. Wong cb357bf3d1 xfs: implement per-inode writeback completion queues
When scheduling writeback of dirty file data in the page cache, XFS uses
IO completion workqueue items to ensure that filesystem metadata only
updates after the write completes successfully.  This is essential for
converting unwritten extents to real extents at the right time and
performing COW remappings.

Unfortunately, XFS queues each IO completion work item to an unbounded
workqueue, which means that the kernel can spawn dozens of threads to
try to handle the items quickly.  These threads need to take the ILOCK
to update file metadata, which results in heavy ILOCK contention if a
large number of the work items target a single file, which is
inefficient.

Worse yet, the writeback completion threads get stuck waiting for the
ILOCK while holding transaction reservations, which can use up all
available log reservation space.  When that happens, metadata updates to
other parts of the filesystem grind to a halt, even if the filesystem
could otherwise have handled it.

Even worse, if one of the things grinding to a halt happens to be a
thread in the middle of a defer-ops finish holding the same ILOCK and
trying to obtain more log reservation having exhausted the permanent
reservation, we now have an ABBA deadlock - writeback completion has a
transaction reserved and wants the ILOCK, and someone else has the ILOCK
and wants a transaction reservation.

Therefore, we create a per-inode writeback io completion queue + work
item.  When writeback finishes, it can add the ioend to the per-inode
queue and let the single worker item process that queue.  This
dramatically cuts down on the number of kworkers and ILOCK contention in
the system, and seems to have eliminated an occasional deadlock I was
seeing while running generic/476.

Testing with a program that simulates a heavy random-write workload to a
single file demonstrates that the number of kworkers drops from
approximately 120 threads per file to 1, without dramatically changing
write bandwidth or pagecache access latency.

Note that we leave the xfs-conv workqueue's max_active alone because we
still want to be able to run ioend processing for as many inodes as the
system can handle.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-04-16 10:01:57 -07:00
arch powerpc fixes for 5.1 #5 2019-04-13 09:03:09 -07:00
block block: do not leak memory in bio_copy_user_iov() 2019-04-10 16:14:40 -06:00
certs kexec, KEYS: Make use of platform keyring for signature verify 2019-02-04 17:34:07 -05:00
crypto lib/lzo: separate lzo-rle from lzo 2019-03-07 18:32:03 -08:00
Documentation ARM: SoC fixes 2019-04-07 13:46:17 -10:00
drivers for-linus-20190412 2019-04-13 16:23:16 -07:00
fs xfs: implement per-inode writeback completion queues 2019-04-16 10:01:57 -07:00
include Merge branch 'page-refs' (page ref overflow) 2019-04-14 15:09:40 -07:00
init init/main: add checks for the return value of memblock_alloc*() 2019-03-12 10:04:02 -07:00
ipc Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-03-12 14:08:19 -07:00
kernel Merge branch 'page-refs' (page ref overflow) 2019-04-14 15:09:40 -07:00
lib Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-04-09 16:20:59 -10:00
LICENSES LICENSES: Add GCC runtime library exception text 2019-01-16 14:54:15 -07:00
mm Merge branch 'page-refs' (page ref overflow) 2019-04-14 15:09:40 -07:00
net Revert "SUNRPC: Micro-optimise when the task is known not to be sleeping" 2019-04-11 15:41:14 -04:00
samples Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-03-11 08:54:01 -07:00
scripts fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock 2019-04-06 07:01:55 -10:00
security apparmor: Restore Y/N in /sys for apparmor's "enabled" 2019-04-10 04:24:48 -07:00
sound ASoC: Fixes for v5.1 2019-04-11 14:36:30 +02:00
tools for-linus-20190412 2019-04-13 16:23:16 -07:00
usr user/Makefile: Fix typo and capitalization in comment section 2018-12-11 00:18:03 +09:00
virt KVM/ARM fixes for 5.1 2019-03-28 19:07:30 +01:00
.clang-format clang-format: Update with the latest for_each macro list 2019-04-12 12:49:54 +02:00
.cocciconfig
.get_maintainer.ignore
.gitattributes .gitattributes: set git diff driver for C source code files 2016-10-07 18:46:30 -07:00
.gitignore kbuild: Add support for DT binding schema checks 2018-12-13 09:41:32 -06:00
.mailmap Update Nicolas Pitre's email address 2019-04-02 18:12:44 -10:00
COPYING COPYING: use the new text with points to the license files 2018-03-23 12:41:45 -06:00
CREDITS Char/Misc driver patches for 5.1-rc1 2019-03-06 14:18:59 -08:00
Kbuild Kbuild updates for v5.1 2019-03-10 17:48:21 -07:00
Kconfig kconfig: move the "Executable file formats" menu to fs/Kconfig.binfmt 2018-08-02 08:06:55 +09:00
MAINTAINERS virtio: fixes, reviewers 2019-04-10 06:42:51 -10:00
Makefile Linux 5.1-rc5 2019-04-14 15:17:41 -07:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.