linux/fs
Tejun Heo 6e736be7f2 block: make ioc get/put interface more conventional and fix race on alloction
Ignoring copy_io() during fork, io_context can be allocated from two
places - current_io_context() and set_task_ioprio().  The former is
always called from the local task while the latter can be called from
a different task.  The synchronization between them is peculiar and
dubious.

* current_io_context() doesn't grab task_lock() and assumes that if it
  saw %NULL ->io_context, it would stay that way until allocation and
  assignment is complete.  It has smp_wmb() between alloc/init and
  assignment.

* set_task_ioprio() grabs task_lock() for assignment and does
  smp_read_barrier_depends() between "ioc = task->io_context" and "if
  (ioc)".  Unfortunately, this doesn't achieve anything - the latter
  is not a dependent load of the former.  i.e., if ioc itself were being
  dereferenced "ioc->xxx", it would mean something (not sure what
  though) but as the code currently stands, the dependent read barrier
  is a noop.

As only one of the two test-assignment sequences is task_lock()
protected, the task_lock() can't do much about the race between the
two.  Nothing prevents current_io_context() and set_task_ioprio() from
each allocating its own ioc for the same task and overwriting the
other's.

Also, set_task_ioprio() can race with an exiting task and create a new
ioc after exit_io_context() has finished.

ioc get/put doesn't have any reason to be complex.  The only hot path
is accessing the existing ioc of %current, which is simple to achieve
given that ->io_context is never destroyed as long as the task is
alive.  All other paths can happily go through task_lock() like all
other task sub structures without impacting anything.

This patch updates ioc get/put so that it becomes more conventional.

* alloc_io_context() is replaced with get_task_io_context().  This is
  the only interface which can acquire access to ioc of another task.
  On return, the caller has an explicit reference to the object which
  should be put using put_io_context() afterwards.

* The functionality of current_io_context() remains the same but when
  creating a new ioc, it shares the code path with
  get_task_io_context() and always goes through task_lock().

* get_io_context() now means incrementing ref on an ioc which the
  caller already has access to (be that an explicit refcnt or implicit
  %current one).

* PF_EXITING inhibits creation of new io_context and once
  exit_io_context() is finished, it's guaranteed that both ioc
  acquisition functions return %NULL.

* All users are updated.  Most are trivial but
  smp_read_barrier_depends() removal from cfq_get_io_context() needs a
  bit of explanation.  I suppose the original intention was to ensure
  ioc->ioprio is visible when set_task_ioprio() allocates new
  io_context and installs it; however, this wouldn't have worked
  because set_task_ioprio() doesn't have wmb between init and install.
  There are other problems with this which will be fixed in another
  patch.

* While at it, use NUMA_NO_NODE instead of -1 for wildcard node
  specification.
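
The get/put scheme described above can be illustrated with a small
user-space model.  This is only a sketch under stated assumptions, not
the kernel code: struct task, get_task_ioc(), put_ioc() and
exit_task_ioc() are hypothetical names, a pthread mutex stands in for
task_lock(), and a bool stands in for PF_EXITING.  It shows the test and
the assignment both happening under the lock, and acquisition returning
NULL once the task has exited.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins; none of these are kernel definitions. */
struct io_context {
    atomic_int refcount;
};

struct task {
    pthread_mutex_t lock;     /* plays the role of task_lock() */
    bool exiting;             /* plays the role of PF_EXITING */
    struct io_context *ioc;   /* plays the role of task->io_context */
};

/* Drop one reference; free the object when the last one goes away. */
void put_ioc(struct io_context *ioc)
{
    if (!ioc)
        return;
    int old = atomic_fetch_sub(&ioc->refcount, 1);
    assert(old > 0);          /* catch unbalanced puts */
    if (old == 1)
        free(ioc);
}

/*
 * Return the task's ioc with an extra reference, creating it if needed.
 * The final test and the assignment happen under the lock, so two
 * callers can never install competing objects.  Returns NULL if the
 * task is exiting or if allocation fails.
 */
struct io_context *get_task_ioc(struct task *t)
{
    struct io_context *ioc, *fresh = NULL;

    pthread_mutex_lock(&t->lock);
    if (!t->ioc && !t->exiting) {
        /* Slow path: allocate outside the lock, then recheck. */
        pthread_mutex_unlock(&t->lock);
        fresh = malloc(sizeof(*fresh));
        if (fresh)
            atomic_init(&fresh->refcount, 1);   /* the task's own reference */
        pthread_mutex_lock(&t->lock);
        if (fresh && !t->ioc && !t->exiting) {
            t->ioc = fresh;
            fresh = NULL;
        }
    }
    ioc = t->ioc;
    if (ioc)
        atomic_fetch_add(&ioc->refcount, 1);    /* the caller's reference */
    pthread_mutex_unlock(&t->lock);

    free(fresh);   /* non-NULL only if we lost the race or the task exited */
    return ioc;
}

/* Mark the task as exiting and drop the task's own reference. */
void exit_task_ioc(struct task *t)
{
    struct io_context *ioc;

    pthread_mutex_lock(&t->lock);
    t->exiting = true;
    ioc = t->ioc;
    t->ioc = NULL;
    pthread_mutex_unlock(&t->lock);
    put_ioc(ioc);
}
```

In this model a caller pairs every successful get_task_ioc() with one
put_ioc(), and any get_task_ioc() issued after exit_task_ioc() yields
NULL instead of resurrecting the context.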

-v2: Vivek spotted contamination from debug patch.  Removed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-12-14 00:33:38 +01:00
9p filesystems: add set_nlink() 2011-11-02 12:53:43 +01:00
adfs filesystems: add set_nlink() 2011-11-02 12:53:43 +01:00
affs filesystems: add set_nlink() 2011-11-02 12:53:43 +01:00
afs filesystems: add set_nlink() 2011-11-02 12:53:43 +01:00
autofs4 filesystems: add set_nlink() 2011-11-02 12:53:43 +01:00
befs filesystems: add set_nlink() 2011-11-02 12:53:43 +01:00
bfs filesystems: add set_nlink() 2011-11-02 12:53:43 +01:00
btrfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs 2011-12-08 13:18:59 -08:00
cachefiles
ceph ceph: initialize root dentry 2011-11-11 09:50:17 -08:00
cifs cifs: check for NULL last_entry before calling cifs_save_resume_key 2011-12-08 22:04:47 -06:00
coda filesystems: add set_nlink() 2011-11-02 12:53:43 +01:00
configfs doc: fix broken references 2011-09-27 18:08:04 +02:00
cramfs
debugfs debugfs: Fix a comment mistake 2011-08-22 17:41:48 -07:00
devpts filesystems: add set_nlink() 2011-11-02 12:53:43 +01:00
dlm Merge branch 'for-3.1' of git://linux-nfs.org/~bfields/linux 2011-07-25 22:49:19 -07:00
ecryptfs eCryptfs: Extend array bounds for all filename chars 2011-11-23 15:43:53 -06:00
efs filesystems: add set_nlink() 2011-11-02 12:53:43 +01:00
exofs Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux 2011-11-06 19:44:47 -08:00
exportfs
ext2 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/vfs-queue 2011-11-02 11:41:01 -07:00
ext3 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/vfs-queue 2011-11-02 11:41:01 -07:00
ext4 ext4: fix racy use-after-free in ext4_end_io_dio() 2011-11-24 19:22:24 -05:00
fat filesystems: add set_nlink() 2011-11-02 12:53:43 +01:00
freevxfs filesystems: add set_nlink() 2011-11-02 12:53:43 +01:00
fscache FS-Cache: Fix __fscache_uncache_all_inode_pages()'s outer loop 2011-07-21 10:59:16 -07:00
fuse Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux 2011-11-06 19:44:47 -08:00
gfs2 Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux 2011-11-06 19:44:47 -08:00
hfs hfs: add sanity check for file name length 2011-11-15 14:29:42 -02:00
hfsplus filesystems: add set_nlink() 2011-11-02 12:53:43 +01:00
hostfs Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/vfs-queue 2011-11-02 11:41:01 -07:00
hpfs filesystems: add set_nlink() 2011-11-02 12:53:43 +01:00
hppfs filesystems: add set_nlink() 2011-11-02 12:53:43 +01:00
hugetlbfs filesystems: add missing nlink wrappers 2011-11-02 12:53:43 +01:00
isofs Merge branch 'akpm' (Andrew's incoming - part two) 2011-11-02 16:07:27 -07:00
jbd jbd/jbd2: validate sb->s_first in journal_get_superblock() 2011-11-01 19:04:59 -04:00
jbd2 jbd2: Unify log messages in jbd2 code 2011-11-01 19:09:18 -04:00
jffs2 Merge git://git.infradead.org/mtd-2.6 2011-11-07 09:11:16 -08:00
jfs Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux 2011-11-06 19:44:47 -08:00
lockd SUNRPC: Replace svc_addr_u by sockaddr_storage 2011-09-14 08:21:48 -04:00
logfs Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux 2011-11-06 19:44:47 -08:00
minix minixfs: kill manual hweight(), simplify 2011-11-19 11:13:28 -05:00
ncpfs filesystems: add set_nlink() 2011-11-02 12:53:43 +01:00
nfs Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/linux-nfs 2011-11-22 08:54:15 -08:00
nfs_common
nfsd Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux 2011-11-06 19:44:47 -08:00
nilfs2 filesystems: add set_nlink() 2011-11-02 12:53:43 +01:00
nls
notify atomic: use <linux/atomic.h> 2011-07-26 16:49:47 -07:00
ntfs filesystems: add set_nlink() 2011-11-02 12:53:43 +01:00
ocfs2 Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 2011-12-01 14:55:34 -08:00
omfs omfs: fix (mode & S_IFDIR) abuse 2011-07-26 13:05:28 -04:00
openpromfs filesystems: add set_nlink() 2011-11-02 12:53:43 +01:00
partitions treewide: use __printf not __attribute__((format(printf,...))) 2011-10-31 17:30:54 -07:00
proc procfs: do not overflow get_{idle,iowait}_time for nohz 2011-12-09 07:50:29 -08:00
pstore pstore: pass allocated memory region back to caller 2011-11-17 12:58:07 -08:00
qnx4 filesystems: add set_nlink() 2011-11-02 12:53:43 +01:00
quota Merge branch 'writeback-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux 2011-11-06 19:02:23 -08:00
ramfs ramfs: remove module leftovers 2011-11-02 16:06:58 -07:00
reiserfs filesystems: add set_nlink() 2011-11-02 12:53:43 +01:00
romfs filesystems: add set_nlink() 2011-11-02 12:53:43 +01:00
squashfs Merge git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-next 2011-11-04 16:48:37 -07:00
sysfs filesystems: add set_nlink() 2011-11-02 12:53:43 +01:00
sysv filesystems: add set_nlink() 2011-11-02 12:53:43 +01:00
ubifs Merge branch 'linux-next' of git://git.infradead.org/ubifs-2.6 2011-11-07 08:52:19 -08:00
udf Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/vfs-queue 2011-11-02 11:41:01 -07:00
ufs filesystems: add set_nlink() 2011-11-02 12:53:43 +01:00
xfs xfs: fix the logspace waiting algorithm 2011-12-06 14:19:47 -06:00
aio.c aio: allocate kiocbs in batches 2011-11-02 16:07:03 -07:00
anon_inodes.c vfs: dont chain pipe/anon/socket on superblock s_inodes list 2011-07-26 12:57:09 -04:00
attr.c Merge branch 'next-evm' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/ima-2.6 into next 2011-08-09 10:31:03 +10:00
bad_inode.c fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlers 2011-07-20 20:47:59 -04:00
binfmt_aout.c
binfmt_elf_fdpic.c
binfmt_elf.c binfmt_elf: fix PIE execution with randomization disabled 2011-11-02 16:06:58 -07:00
binfmt_em86.c
binfmt_flat.c
binfmt_misc.c filesystems: add missing nlink wrappers 2011-11-02 12:53:43 +01:00
binfmt_script.c
binfmt_som.c
bio-integrity.c fs: add export.h to files using EXPORT_SYMBOL/THIS_MODULE macros 2011-10-31 19:30:31 -04:00
bio.c bio: change some signed vars to unsigned 2011-11-16 09:21:50 +01:00
block_dev.c Merge branch 'for-3.2/drivers' of git://git.kernel.dk/linux-block 2011-11-04 17:22:14 -07:00
buffer.c Merge branch 'writeback-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux 2011-11-06 19:02:23 -08:00
char_dev.c
compat_binfmt_elf.c
compat_ioctl.c compat_ioctl: add compat handler for PPPIOCGL2TPSTATS 2011-08-07 22:24:41 -07:00
compat.c Cross Memory Attach 2011-10-31 17:30:44 -07:00
dcache.c fix apparmor dereferencing potentially freed dentry, sanitize __d_path() API 2011-12-06 23:57:18 -05:00
dcookies.c
direct-io.c direct-io: merge direct_io_walker into __blockdev_direct_IO 2011-10-28 14:58:58 +02:00
drop_caches.c
eventfd.c
eventpoll.c epoll: fix spurious lockdep warnings 2011-10-31 17:30:57 -07:00
exec.c oom: remove oom_disable_count 2011-10-31 17:30:45 -07:00
fcntl.c
fhandle.c
fifo.c
file_table.c atomic: use <linux/atomic.h> 2011-07-26 16:49:47 -07:00
file.c
filesystems.c
fs_struct.c
fs-writeback.c writeback: Add a 'reason' to wb_writeback_work 2011-10-31 00:33:36 +08:00
generic_acl.c switch posix_acl_equiv_mode() to umode_t * 2011-08-01 02:10:06 -04:00
inode.c vfs: protect i_nlink 2011-11-02 12:53:43 +01:00
internal.h superblock: move pin_sb_for_writeback() to fs/super.c 2011-07-20 01:44:38 -04:00
ioctl.c
ioprio.c block: make ioc get/put interface more conventional and fix race on alloction 2011-12-14 00:33:38 +01:00
Kconfig tmpfs: add "tmpfs" to the Kconfig prompt to make it obvious. 2011-10-31 17:30:45 -07:00
Kconfig.binfmt
libfs.c filesystems: add set_nlink() 2011-11-02 12:53:43 +01:00
locks.c Merge branch 'for-3.2' of git://linux-nfs.org/~bfields/linux 2011-10-25 15:42:01 +02:00
Makefile fs/Makefile: Stupid typo breakage of exofs inclusion 2011-10-27 08:36:51 +02:00
mbcache.c
mpage.c
namei.c VFS: we need to set LOOKUP_JUMPED on mountpoint crossing 2011-11-07 14:58:06 -08:00
namespace.c fix apparmor dereferencing potentially freed dentry, sanitize __d_path() API 2011-12-06 23:57:18 -05:00
no-block.c
open.c leases: fix write-open/read-lease race 2011-10-28 14:59:00 +02:00
pipe.c fs/pipe.c: add ->statfs callback for pipefs 2011-10-31 17:30:51 -07:00
pnode.c
pnode.h
posix_acl.c vfs: pass all mask flags check_acl and posix_acl_permission 2011-10-28 14:58:54 +02:00
read_write.c Cross Memory Attach 2011-10-31 17:30:44 -07:00
read_write.h
readdir.c
select.c
seq_file.c fix apparmor dereferencing potentially freed dentry, sanitize __d_path() API 2011-12-06 23:57:18 -05:00
signalfd.c
splice.c tmpfs: clone shmem_file_splice_read() 2011-07-25 20:57:11 -07:00
stack.c filesystems: add set_nlink() 2011-11-02 12:53:43 +01:00
stat.c readlinkat: ensure we return ENOENT for the empty pathname for normal lookups 2011-11-02 12:53:42 +01:00
statfs.c VFS: fix statfs() automounter semantics regression 2011-11-04 18:15:59 -07:00
super.c vfs: ignore error on forced remount 2011-11-02 12:53:42 +01:00
sync.c writeback: Add a 'reason' to wb_writeback_work 2011-10-31 00:33:36 +08:00
timerfd.c
utimes.c
xattr_acl.c
xattr.c