linux/fs/ext4
Jens Axboe fe0f07d08e direct-io: only inc/dec inode->i_dio_count for file systems
do_blockdev_direct_IO() increments and decrements the inode
->i_dio_count for each IO operation. It does this to protect against
truncate of a file. Block devices don't need this sort of protection.

For a capable multiqueue setup, this atomic int is the only shared
state between applications accessing the device for O_DIRECT, and it
presents a scaling wall for that. In my testing, as much as 30% of
system time is spent incrementing and decrementing this value. A mixed
read/write workload improved from ~2.5M IOPS to ~9.6M IOPS, with
better latencies too. Before:

clat percentiles (usec):
 |  1.00th=[   33],  5.00th=[   34], 10.00th=[   34], 20.00th=[   34],
 | 30.00th=[   34], 40.00th=[   34], 50.00th=[   35], 60.00th=[   35],
 | 70.00th=[   35], 80.00th=[   35], 90.00th=[   37], 95.00th=[   80],
 | 99.00th=[   98], 99.50th=[  151], 99.90th=[  155], 99.95th=[  155],
 | 99.99th=[  165]

After:

clat percentiles (usec):
 |  1.00th=[   95],  5.00th=[  108], 10.00th=[  129], 20.00th=[  149],
 | 30.00th=[  155], 40.00th=[  161], 50.00th=[  167], 60.00th=[  171],
 | 70.00th=[  177], 80.00th=[  185], 90.00th=[  201], 95.00th=[  270],
 | 99.00th=[  390], 99.50th=[  398], 99.90th=[  418], 99.95th=[  422],
 | 99.99th=[  438]

In other setups, Robert Elliott reported seeing good performance
improvements:

https://lkml.org/lkml/2015/4/3/557

The more applications accessing the device, the worse it gets.

Add a new direct-io flags, DIO_SKIP_DIO_COUNT, which tells
do_blockdev_direct_IO() that it need not worry about incrementing
or decrementing the inode i_dio_count for this caller.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Elliott, Robert (Server Storage) <elliott@hp.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-24 15:45:28 -04:00
..
acl.c ext2/3/4: use generic posix ACL infrastructure 2014-01-25 23:58:19 -05:00
acl.h ext2/3/4: use generic posix ACL infrastructure 2014-01-25 23:58:19 -05:00
balloc.c ext4: move error report out of atomic context in ext4_init_block_bitmap() 2014-10-13 03:42:12 -04:00
bitmap.c ext4: Replace open coded mdata csum feature to helper function 2014-10-13 03:36:16 -04:00
block_validity.c fs/ext4: use rbtree postorder iteration helper instead of opencoding 2014-01-23 16:37:03 -08:00
dir.c ext4: convert ext4_bread() to use the ERR_PTR convention 2014-08-29 20:52:15 -04:00
ext4_extents.h ext4: teach ext4_ext_find_extent() to realloc path if necessary 2014-09-01 14:40:09 -04:00
ext4_jbd2.c ext4: fix over-defensive complaint after journal abort 2014-10-01 22:23:15 -04:00
ext4_jbd2.h ext4: don't use MAXQUOTAS value 2014-09-11 11:15:15 -04:00
ext4.h direct_IO: use iov_iter_rw() instead of rw everywhere 2015-04-11 22:29:45 -04:00
extents_status.c ext4: introduce aging to extent status tree 2014-11-25 11:55:24 -05:00
extents_status.h ext4: introduce aging to extent status tree 2014-11-25 11:55:24 -05:00
extents.c Revert "ext4: fix suboptimal seek_{data,hole} extents traversial" 2015-01-02 15:16:00 -05:00
file.c mirror O_APPEND and O_DIRECT into iocb->ki_flags 2015-04-11 22:30:22 -04:00
fsync.c VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
hash.c ext4: reduce one "if" comparison in ext4_dirhash() 2013-02-01 22:33:21 -05:00
ialloc.c VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
indirect.c direct-io: only inc/dec inode->i_dio_count for file systems 2015-04-24 15:45:28 -04:00
inline.c VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
inode.c direct-io: only inc/dec inode->i_dio_count for file systems 2015-04-24 15:45:28 -04:00
ioctl.c ext4: move handling of list of shrinkable inodes into extent status code 2014-11-25 11:49:25 -05:00
Kconfig ext4: fix Kconfig documentation for CONFIG_EXT4_DEBUG 2013-04-21 20:32:03 -04:00
Makefile ext4: Remove CONFIG_EXT4_FS_XATTR 2012-12-10 16:30:43 -05:00
mballoc.c ext4: Remove an unnecessary check for NULL before iput() 2014-11-25 20:01:37 -05:00
mballoc.h ext4: remove unused ac_ex_scanned 2014-02-20 13:32:10 -05:00
migrate.c VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
mmp.c ext4: Replace open coded mdata csum feature to helper function 2014-10-13 03:36:16 -04:00
move_extent.c move_extent_per_page(): get rid of unused w_flags 2014-12-17 06:43:56 -05:00
namei.c VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
page-io.c fs: move struct kiocb to fs.h 2015-03-25 20:28:11 -04:00
resize.c ext4: prevent online resize with backup superblock 2014-12-26 23:58:21 -05:00
super.c VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
symlink.c VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
truncate.h
xattr_security.c VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
xattr_trusted.c VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
xattr_user.c VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
xattr.c VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
xattr.h ext4: each filesystem creates and uses its own mb_cache 2014-03-18 19:24:49 -04:00