linux/fs/ext4
Dmitry Monakhov 28a535f9a0 ext4: completed_io locking cleanup
Current unwritten extent conversion state-machine is very fuzzy.
- For unknown reason it performs conversion under i_mutex. What for?
  My diagnosis:
  We already protect extent tree with i_data_sem, truncate and punch_hole
  should wait for DIO, so the only data we have to protect is end_io->flags
  modification, but only flush_completed_IO and end_io_work modified this
  flags and we can serialize them via i_completed_io_lock.

  Currently all these games with mutex_trylock result in the following deadlock
   truncate:                          kworker:
    ext4_setattr                       ext4_end_io_work
    mutex_lock(i_mutex)
    inode_dio_wait(inode)  ->BLOCK
                             DEADLOCK<- mutex_trylock()
                                        inode_dio_done()
  #TEST_CASE1_BEGIN
  MNT=/mnt_scrach
  unlink $MNT/file
  fallocate -l $((1024*1024*1024)) $MNT/file
  aio-stress -I 100000 -O -s 100m -n -t 1 -c 10 -o 2 -o 3 $MNT/file
  sleep 2
  truncate -s 0 $MNT/file
  #TEST_CASE1_END

Or use 286's xfstests https://github.com/dmonakhov/xfstests/blob/devel/286

This patch makes state machine simple and clean:

(1) xxx_end_io schedule final extent conversion simply by calling
    ext4_add_complete_io(), which append it to ei->i_completed_io_list
    NOTE1: because of (2A) work should be queued only if
    ->i_completed_io_list was empty, otherwise the work is scheduled already.

(2) ext4_flush_completed_IO is responsible for handling all pending
    end_io from ei->i_completed_io_list
    Flushing sequence consists of following stages:
    A) LOCKED: Atomically drain completed_io_list to local_list
    B) Perform extents conversion
    C) LOCKED: move converted io's to to_free list for final deletion
       	     This logic depends on context which we was called from.
    D) Final end_io context destruction
    NOTE1: i_mutex is no longer required because end_io->flags modification
    is protected by ei->ext4_complete_io_lock

Full list of changes:
- Move all completion end_io related routines to page-io.c in order to improve
  logic locality
- Move open coded logic from various xx_end_xx routines to ext4_add_complete_io()
- remove EXT4_IO_END_FSYNC
- Improve SMP scalability by removing useless i_mutex which does not
  protect io->flags anymore.
- Reduce lock contention on i_completed_io_lock by optimizing list walk.
- Rename ext4_end_io_nolock to end4_end_io and make it static
- Check flush completion status to ext4_ext_punch_hole(). Because it is
  not good idea to punch blocks from corrupted inode.

Changes since V3 (in request to Jan's comments):
  Fall back to active flush_completed_IO() approach in order to prevent
  performance issues with nolocked DIO reads.
Changes since V2:
  Fix use-after-free caused by race truncate vs end_io_work

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-09-29 00:14:55 -04:00
..
acl.c switch posix_acl_equiv_mode() to umode_t * 2011-08-01 02:10:06 -04:00
acl.h fs: take the ACL checks to common code 2011-07-25 14:30:23 -04:00
balloc.c ext4: don't call ext4_error while block group is locked 2012-08-17 09:06:06 -04:00
bitmap.c ext4: don't call ext4_error while block group is locked 2012-08-17 09:06:06 -04:00
block_validity.c ext2/3/4: delete unneeded includes of module.h 2012-01-09 13:52:10 +01:00
dir.c ext4: use core vfs llseek code for dir seeks 2012-07-23 00:00:28 +04:00
ext4_extents.h ext4: verify and calculate checksums for extent tree blocks 2012-04-29 18:37:10 -04:00
ext4_jbd2.c ext4: remove unnecessary argument from __ext4_handle_dirty_metadata() 2012-07-22 20:37:31 -04:00
ext4_jbd2.h ext4: remove unnecessary argument from __ext4_handle_dirty_metadata() 2012-07-22 20:37:31 -04:00
ext4.h ext4: completed_io locking cleanup 2012-09-29 00:14:55 -04:00
extents.c ext4: completed_io locking cleanup 2012-09-29 00:14:55 -04:00
file.c ext4: give i_aiodio_unwritten a more appropriate name 2012-09-28 23:24:52 -04:00
fsync.c ext4: completed_io locking cleanup 2012-09-29 00:14:55 -04:00
hash.c ext4: return 32/64-bit dir name hash according to usage type 2012-03-18 22:44:40 -04:00
ialloc.c ext4: check free inode count before allocating an inode 2012-09-23 23:16:03 -04:00
indirect.c ext4: completed_io locking cleanup 2012-09-29 00:14:55 -04:00
inode.c ext4: completed_io locking cleanup 2012-09-29 00:14:55 -04:00
ioctl.c ext4: release donor reference when EXT4_IOC_MOVE_EXT ioctl fails 2012-09-26 22:58:50 -04:00
Kconfig ext4: load the crc32c driver if necessary 2012-04-29 18:27:10 -04:00
Makefile ext4: move ext4_ind_* functions from inode.c to indirect.c 2011-06-27 19:40:50 -04:00
mballoc.c ext4: enable FITRIM ioctl on bigalloc file system 2012-09-26 22:21:21 -04:00
mballoc.h ext4: remove unused macro MB_DEFAULT_MAX_GROUPS_TO_SCAN 2012-08-17 10:00:17 -04:00
migrate.c userns: Convert ext4 to user kuid/kgid where appropriate 2012-05-15 14:59:27 -07:00
mmp.c ext4: Convert to new freezing mechanism 2012-07-31 09:45:48 +04:00
move_extent.c ext4: convert to use leXX_add_cpu() 2012-09-27 09:37:53 -04:00
namei.c ext4: ext4_bread usage audit 2012-09-27 09:31:33 -04:00
page-io.c ext4: completed_io locking cleanup 2012-09-29 00:14:55 -04:00
resize.c ext4: don't call update_backups() multiple times for the same bg 2012-09-26 00:08:57 -04:00
super.c ext4: give i_aiodio_unwritten a more appropriate name 2012-09-28 23:24:52 -04:00
symlink.c
truncate.h ext4: move common truncate functions to header file 2011-06-27 19:16:04 -04:00
xattr_security.c Merge branch 'for_linus' into for_linus_merged 2012-01-10 11:54:07 -05:00
xattr_trusted.c ext2/3/4: delete unneeded includes of module.h 2012-01-09 13:52:10 +01:00
xattr_user.c ext2/3/4: delete unneeded includes of module.h 2012-01-09 13:52:10 +01:00
xattr.c ext4: use s_csum_seed instead of i_csum_seed for xattr block 2012-07-09 16:29:27 -04:00
xattr.h ext4: change on-disk layout to support extended metadata checksumming 2012-04-29 18:23:10 -04:00