linux/fs/ext4
Theodore Ts'o 00d4e7362e ext4: fix potential deadlock in ext4_nonda_switch()
In ext4_nonda_switch(), if the file system is getting full we used to
call writeback_inodes_sb_if_idle().  The problem is that we can be
holding i_mutex already, and this causes a potential deadlock when
writeback_inodes_sb_if_idle() when it tries to take s_umount.  (See
lockdep output below).

As it turns out we don't need need to hold s_umount; the fact that we
are in the middle of the write(2) system call will keep the superblock
pinned.  Unfortunately writeback_inodes_sb() checks to make sure
s_umount is taken, and the VFS uses a different mechanism for making
sure the file system doesn't get unmounted out from under us.  The
simplest way of dealing with this is to just simply grab s_umount
using a trylock, and skip kicking the writeback flusher thread in the
very unlikely case that we can't take a read lock on s_umount without
blocking.

Also, we now check the cirteria for kicking the writeback thread
before we decide to whether to fall back to non-delayed writeback, so
if there are any outstanding delayed allocation writes, we try to get
them resolved as soon as possible.

   [ INFO: possible circular locking dependency detected ]
   3.6.0-rc1-00042-gce894ca #367 Not tainted
   -------------------------------------------------------
   dd/8298 is trying to acquire lock:
    (&type->s_umount_key#18){++++..}, at: [<c02277d4>] writeback_inodes_sb_if_idle+0x28/0x46

   but task is already holding lock:
    (&sb->s_type->i_mutex_key#8){+.+...}, at: [<c01ddcce>] generic_file_aio_write+0x5f/0xd3

   which lock already depends on the new lock.

   2 locks held by dd/8298:
    #0:  (sb_writers#2){.+.+.+}, at: [<c01ddcc5>] generic_file_aio_write+0x56/0xd3
    #1:  (&sb->s_type->i_mutex_key#8){+.+...}, at: [<c01ddcce>] generic_file_aio_write+0x5f/0xd3

   stack backtrace:
   Pid: 8298, comm: dd Not tainted 3.6.0-rc1-00042-gce894ca #367
   Call Trace:
    [<c015b79c>] ? console_unlock+0x345/0x372
    [<c06d62a1>] print_circular_bug+0x190/0x19d
    [<c019906c>] __lock_acquire+0x86d/0xb6c
    [<c01999db>] ? mark_held_locks+0x5c/0x7b
    [<c0199724>] lock_acquire+0x66/0xb9
    [<c02277d4>] ? writeback_inodes_sb_if_idle+0x28/0x46
    [<c06db935>] down_read+0x28/0x58
    [<c02277d4>] ? writeback_inodes_sb_if_idle+0x28/0x46
    [<c02277d4>] writeback_inodes_sb_if_idle+0x28/0x46
    [<c026f3b2>] ext4_nonda_switch+0xe1/0xf4
    [<c0271ece>] ext4_da_write_begin+0x27/0x193
    [<c01dcdb0>] generic_file_buffered_write+0xc8/0x1bb
    [<c01ddc47>] __generic_file_aio_write+0x1dd/0x205
    [<c01ddce7>] generic_file_aio_write+0x78/0xd3
    [<c026d336>] ext4_file_write+0x480/0x4a6
    [<c0198c1d>] ? __lock_acquire+0x41e/0xb6c
    [<c0180944>] ? sched_clock_cpu+0x11a/0x13e
    [<c01967e9>] ? trace_hardirqs_off+0xb/0xd
    [<c018099f>] ? local_clock+0x37/0x4e
    [<c0209f2c>] do_sync_write+0x67/0x9d
    [<c0209ec5>] ? wait_on_retry_sync_kiocb+0x44/0x44
    [<c020a7b9>] vfs_write+0x7b/0xe6
    [<c020a9a6>] sys_write+0x3b/0x64
    [<c06dd4bd>] syscall_call+0x7/0xb

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
2012-09-19 22:42:36 -04:00
..
acl.c switch posix_acl_equiv_mode() to umode_t * 2011-08-01 02:10:06 -04:00
acl.h fs: take the ACL checks to common code 2011-07-25 14:30:23 -04:00
balloc.c ext4: don't call ext4_error while block group is locked 2012-08-17 09:06:06 -04:00
bitmap.c ext4: don't call ext4_error while block group is locked 2012-08-17 09:06:06 -04:00
block_validity.c ext2/3/4: delete unneeded includes of module.h 2012-01-09 13:52:10 +01:00
dir.c ext4: use core vfs llseek code for dir seeks 2012-07-23 00:00:28 +04:00
ext4_extents.h ext4: verify and calculate checksums for extent tree blocks 2012-04-29 18:37:10 -04:00
ext4_jbd2.c ext4: remove unnecessary argument from __ext4_handle_dirty_metadata() 2012-07-22 20:37:31 -04:00
ext4_jbd2.h ext4: remove unnecessary argument from __ext4_handle_dirty_metadata() 2012-07-22 20:37:31 -04:00
ext4.h ext4: grow the s_group_info array as needed 2012-09-05 01:31:50 -04:00
extents.c ext4: speed up truncate/unlink by not using bforget() unless needed 2012-09-19 14:14:53 -04:00
file.c The usual collection of bug fixes and optimizations. Perhaps of 2012-07-27 20:52:25 -07:00
fsync.c ext4: check return value of blkdev_issue_flush() 2012-08-17 09:58:17 -04:00
hash.c ext4: return 32/64-bit dir name hash according to usage type 2012-03-18 22:44:40 -04:00
ialloc.c ext4: remove useless marking of superblock dirty 2012-07-22 20:29:31 -04:00
indirect.c ext2/3/4: delete unneeded includes of module.h 2012-01-09 13:52:10 +01:00
inode.c ext4: fix potential deadlock in ext4_nonda_switch() 2012-09-19 22:42:36 -04:00
ioctl.c ext4: add online resizing support for meta_bg and 64-bit file systems 2012-09-05 01:33:50 -04:00
Kconfig ext4: load the crc32c driver if necessary 2012-04-29 18:27:10 -04:00
Makefile ext4: move ext4_ind_* functions from inode.c to indirect.c 2011-06-27 19:40:50 -04:00
mballoc.c ext4: re-enable -o discard functionality in no-journal mode 2012-09-18 13:33:44 -04:00
mballoc.h ext4: remove unused macro MB_DEFAULT_MAX_GROUPS_TO_SCAN 2012-08-17 10:00:17 -04:00
migrate.c userns: Convert ext4 to user kuid/kgid where appropriate 2012-05-15 14:59:27 -07:00
mmp.c ext4: Convert to new freezing mechanism 2012-07-31 09:45:48 +04:00
move_extent.c ext4: add some tracepoints in ext4/extents.c 2011-09-09 19:18:51 -04:00
namei.c ext4: make orphan functions be no-op in no-journal mode 2012-09-18 13:38:59 -04:00
page-io.c Revert "ext4: don't release page refs in ext4_end_bio()" 2012-03-29 17:00:56 -07:00
resize.c ext4: fix online resizing when the # of block groups is constant 2012-09-19 00:55:56 -04:00
super.c ext4: do not enable delalloc by default for ext2 2012-09-17 22:54:36 -04:00
symlink.c
truncate.h ext4: move common truncate functions to header file 2011-06-27 19:16:04 -04:00
xattr_security.c Merge branch 'for_linus' into for_linus_merged 2012-01-10 11:54:07 -05:00
xattr_trusted.c ext2/3/4: delete unneeded includes of module.h 2012-01-09 13:52:10 +01:00
xattr_user.c ext2/3/4: delete unneeded includes of module.h 2012-01-09 13:52:10 +01:00
xattr.c ext4: use s_csum_seed instead of i_csum_seed for xattr block 2012-07-09 16:29:27 -04:00
xattr.h ext4: change on-disk layout to support extended metadata checksumming 2012-04-29 18:23:10 -04:00