linux/fs/nilfs2
Ryusuke Konishi 7ef3ff2fea nilfs2: fix deadlock of segment constructor over I_SYNC flag
Nilfs2 eventually hangs in a stress test with the fsstress program.  This
issue is caused by the following deadlock over the I_SYNC flag between
nilfs_segctor_thread() and writeback_sb_inodes():

  nilfs_segctor_thread()
    nilfs_segctor_thread_construct()
      nilfs_segctor_unlock()
        nilfs_dispose_list()
          iput()
            iput_final()
              evict()
                inode_wait_for_writeback()  * wait for I_SYNC flag

  writeback_sb_inodes()
     * set I_SYNC flag on inode->i_state
    __writeback_single_inode()
      do_writepages()
        nilfs_writepages()
          nilfs_construct_dsync_segment()
            nilfs_segctor_sync()
               * wait for completion of segment constructor
    inode_sync_complete()
       * clear I_SYNC flag after __writeback_single_inode() completed

writeback_sb_inodes() calls do_writepages() for dirty inodes after
setting the I_SYNC flag on inode->i_state.  do_writepages() in turn calls
nilfs_writepages(), which can run the segment constructor and wait for
its completion.  On the other hand, the segment constructor calls iput(),
which can call evict() and wait for the I_SYNC flag in
inode_wait_for_writeback().

Since the segment constructor doesn't know when I_SYNC will be set, it
cannot know whether iput() will block unless inode->i_nlink has a
non-zero count.  We can prevent evict() from being called in iput() by
implementing sop->drop_inode(), but it's not preferable to leave inodes
with i_nlink == 0 for long periods because that defers file truncation
and inode deallocation.  So, this patch instead resolves the deadlock by
calling iput() asynchronously with a workqueue for inodes with
i_nlink == 0.
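
The deferral described above can be illustrated with a small userspace
model.  This is only a sketch under stated assumptions, not the code in
segment.c: struct model_inode, struct model_iput_queue, model_iput(),
model_dispose() and model_iput_work() are hypothetical names invented
for this illustration, a plain linked list stands in for the workqueue,
and the worker retries instead of sleeping where the real iput() would
block in inode_wait_for_writeback().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct inode; not the kernel type. */
struct model_inode {
	unsigned int nlink;       /* link count, like inode->i_nlink */
	bool i_sync;              /* true while "writeback" holds I_SYNC */
	bool evicted;             /* set once the final iput() has run */
	struct model_inode *next; /* link in the deferred-iput queue */
};

/* Plain list standing in for the workqueue (an assumption of this model). */
struct model_iput_queue {
	struct model_inode *head;
	size_t pending;
};

/*
 * Model of a final iput().  Returns false when the real call would have
 * to sleep: the inode is unlinked and I_SYNC is still set.
 */
static bool model_iput(struct model_inode *inode)
{
	if (inode->nlink == 0) {
		if (inode->i_sync)
			return false;   /* evict() would wait for I_SYNC here */
		inode->evicted = true;
	}
	return true;
}

/* Called from the "segment constructor" side, which must never block. */
static void model_dispose(struct model_iput_queue *q, struct model_inode *inode)
{
	if (inode->nlink == 0) {
		/* Might evict and wait on I_SYNC: hand it to the worker. */
		inode->next = q->head;
		q->head = inode;
		q->pending++;
		return;
	}
	/* Non-zero link count: no eviction in this model, so no waiting. */
	model_iput(inode);
}

/*
 * Worker side: runs outside the segment constructor, so waiting here no
 * longer closes the deadlock cycle.  Returns how many inodes were released.
 */
static size_t model_iput_work(struct model_iput_queue *q)
{
	struct model_inode **pp = &q->head;
	size_t done = 0;

	while (*pp) {
		struct model_inode *inode = *pp;

		if (model_iput(inode)) {
			assert(q->pending > 0);
			*pp = inode->next;      /* unlink from the queue */
			inode->next = NULL;
			q->pending--;
			done++;
		} else {
			pp = &inode->next;      /* still under writeback */
		}
	}
	return done;
}
```

In this model, disposing of an inode with nlink == 0 while i_sync is set
only queues it, so the caller returns immediately; the inode is evicted
by a later model_iput_work() pass once i_sync has been cleared.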

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-05 13:35:29 -08:00
alloc.c nilfs2: implement calculation of free inodes count 2013-07-03 16:08:01 -07:00
alloc.h nilfs2: implement calculation of free inodes count 2013-07-03 16:08:01 -07:00
bmap.c
bmap.h nilfs2: add omitted comments for different structures in driver implementation 2012-07-30 17:25:19 -07:00
btnode.c nilfs2: use mark_buffer_dirty to mark btnode or meta data dirty 2011-05-10 22:21:57 +09:00
btnode.h nilfs2: add omitted comments for different structures in driver implementation 2012-07-30 17:25:19 -07:00
btree.c nilfs2: fix missing block address termination in btree node shrinking 2011-06-11 15:51:15 +09:00
btree.h
cpfile.c nilfs2: verify metadata sizes read from disk 2014-04-03 16:21:26 -07:00
cpfile.h
dat.c nilfs2: verify metadata sizes read from disk 2014-04-03 16:21:26 -07:00
dat.h
dir.c [readdir] convert nilfs2 2013-06-29 12:56:36 +04:00
direct.c
direct.h
export.h nilfs2: add omitted comments for different structures in driver implementation 2012-07-30 17:25:19 -07:00
file.c nilfs2: avoid duplicate segment construction for fsync() 2014-12-10 17:41:16 -08:00
gcinode.c nilfs2: ensure proper cache clearing for gc-inodes 2012-06-20 14:39:35 -07:00
ifile.c nilfs2: use atomic64_t type for inodes_count and blocks_count fields in nilfs_root struct 2013-07-03 16:08:01 -07:00
ifile.h nilfs2: implement calculation of free inodes count 2013-07-03 16:08:01 -07:00
inode.c nilfs2: fix the nilfs_iget() vs. nilfs_new_inode() races 2014-12-10 17:41:16 -08:00
ioctl.c nilfs2: add missing blkdev_issue_flush() to nilfs_sync_fs() 2014-10-14 02:18:20 +02:00
Kconfig fs/nilfs2: remove depends on CONFIG_EXPERIMENTAL 2013-01-11 11:39:04 -08:00
Makefile nilfs2: integrate sysfs support into driver 2014-08-08 15:57:21 -07:00
mdt.c nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption 2013-04-30 17:04:04 -07:00
mdt.h nilfs2: add omitted comments for different structures in driver implementation 2012-07-30 17:25:19 -07:00
namei.c nilfs2: fix the nilfs_iget() vs. nilfs_new_inode() races 2014-12-10 17:41:16 -08:00
nilfs.h nilfs2: fix deadlock of segment constructor over I_SYNC flag 2015-02-05 13:35:29 -08:00
page.c nilfs2: fix issue with race condition of competition between segments for dirty blocks 2013-09-30 14:31:02 -07:00
page.h nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption 2013-04-30 17:04:04 -07:00
recovery.c nilfs2: drop vmtruncate 2012-12-20 18:40:54 -05:00
segbuf.c block: Abstract out bvec iterator 2013-11-23 22:33:47 -08:00
segbuf.h
segment.c nilfs2: fix deadlock of segment constructor over I_SYNC flag 2015-02-05 13:35:29 -08:00
segment.h nilfs2: fix deadlock of segment constructor over I_SYNC flag 2015-02-05 13:35:29 -08:00
sufile.c nilfs2: verify metadata sizes read from disk 2014-04-03 16:21:26 -07:00
sufile.h nilfs2: add nilfs_sufile_trim_fs to trim clean segs 2014-04-03 16:21:25 -07:00
super.c nilfs2: add missing blkdev_issue_flush() to nilfs_sync_fs() 2014-10-14 02:18:20 +02:00
sysfs.c nilfs2: integrate sysfs support into driver 2014-08-08 15:57:21 -07:00
sysfs.h nilfs2: add /sys/fs/nilfs2/<device>/mounted_snapshots/<snapshot> group 2014-08-08 15:57:21 -07:00
the_nilfs.c nilfs2: deletion of an unnecessary check before the function call "iput" 2014-12-10 17:41:16 -08:00
the_nilfs.h nilfs2: add missing blkdev_issue_flush() to nilfs_sync_fs() 2014-10-14 02:18:20 +02:00