linux/fs
Hans Verkuil 626cf23660 poll: add poll_requested_events() and poll_does_not_wait() functions
In some cases the poll() implementation in a driver has to do different
things depending on the events the caller wants to poll for.  An example
is when a driver needs to start a DMA engine if the caller polls for
POLLIN, but doesn't want to do that if POLLIN is not requested but instead
only POLLOUT or POLLPRI is requested.  This is something that can happen
in the video4linux subsystem among others.

Unfortunately, the current epoll/poll/select implementation doesn't
provide that information reliably.  The poll_table_struct does have it: it
has a key field with the event mask.  But once a poll() call matches one
or more bits of that mask any following poll() calls are passed a NULL
poll_table pointer.

Also, the eventpoll implementation always left the key field at ~0 instead
of using the requested events mask.

This was changed in eventpoll.c so the key field now contains the actual
events that should be polled for as set by the caller.

The solution to the NULL poll_table pointer is to set the qproc field to
NULL in poll_table once poll() matches the events, rather than setting the
poll_table pointer itself to NULL.  That way drivers can obtain the mask
through a new poll_requested_events() inline function.

The poll_table_struct pointer can still be NULL since some kernel code
calls poll() internally (netfs_state_poll() in
./drivers/staging/pohmelfs/netfs.h).  In that case poll_requested_events()
returns ~0 (i.e.  all events).
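
As an illustration of the paragraphs above, here is a minimal userspace
sketch of how a driver might use the requested-events mask.  It is not the
kernel code: poll_table_model, poll_requested_events_model(), demo_dev and
demo_poll() are hypothetical stand-ins, and only the _qproc/_key field
names and the "NULL means ~0" rule are taken from this commit message.

```c
#include <poll.h>      /* POLLIN, POLLOUT */
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's poll_table (assumption, not kernel code). */
struct poll_table_model;
typedef void (*poll_queue_proc_model)(struct poll_table_model *pt);

typedef struct poll_table_model {
	poll_queue_proc_model _qproc; /* NULL once an earlier fd already matched */
	unsigned long _key;           /* events the caller asked for */
} poll_table_model;

/* Model of the accessor: a NULL table means "all events". */
static inline unsigned long
poll_requested_events_model(const poll_table_model *p)
{
	return p ? p->_key : ~0UL;
}

/* Hypothetical capture device that must start DMA before data can arrive. */
struct demo_dev {
	bool dma_running;
	bool frame_ready;
};

/* Start DMA only when the caller actually polls for POLLIN. */
static unsigned int demo_poll(struct demo_dev *dev, const poll_table_model *wait)
{
	unsigned long req = poll_requested_events_model(wait);
	unsigned int mask = 0;

	if (req & POLLIN) {
		dev->dma_running = true;
		if (dev->frame_ready)
			mask |= POLLIN;
	}
	return mask;
}
```

With this model, polling only for POLLOUT leaves the DMA engine stopped,
while polling for POLLIN, or passing a NULL table, starts it.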

Very rarely drivers might want to know whether poll_wait will actually
wait.  If another earlier file descriptor in the set already matched the
events the caller wanted to wait for, then the kernel will return from the
select() call without waiting.  This might be useful information in order
to avoid doing expensive work.

A new helper function poll_does_not_wait() is added that drivers can use
to detect this situation.  This is now used in sock_poll_wait() in
include/net/sock.h.  This was the only place in the kernel that needed
this information.
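
The following userspace sketch models the helper and a caller in the style
of sock_poll_wait().  All names ending in _model, the registered counter
and count_registration() are assumptions made for illustration; the real
sock_poll_wait() also issues a memory barrier, which is only noted in a
comment here.

```c
#include <stdbool.h>
#include <stddef.h>

struct pt_model;
typedef void (*qproc_model)(struct pt_model *pt);

/* Userspace stand-in for poll_table (assumption, not kernel code). */
struct pt_model {
	qproc_model _qproc;   /* NULL: nothing will be queued */
	unsigned long _key;   /* requested events */
	int registered;       /* hypothetical: counts wait-queue registrations */
};

/* Hypothetical queueing callback that just counts how often it runs. */
static void count_registration(struct pt_model *pt)
{
	pt->registered++;
}

/* Model of the helper: no table or no callback means poll will not wait. */
static inline bool poll_does_not_wait_model(const struct pt_model *p)
{
	return p == NULL || p->_qproc == NULL;
}

/* Model of poll_wait(): queue only when a callback is present. */
static inline void poll_wait_model(struct pt_model *p)
{
	if (p && p->_qproc)
		p->_qproc(p);
}

/* Returns true when the slow path (queueing) was actually taken. */
static bool sock_poll_wait_model(struct pt_model *p)
{
	if (poll_does_not_wait_model(p))
		return false;   /* skip the expensive work entirely */
	poll_wait_model(p);
	/* the kernel version places a memory barrier here */
	return true;
}
```

The point of the design is that the caller asks one question, "will this
poll wait?", without looking at the table's fields itself.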

Drivers should no longer access any of the poll_table internals, but use
the poll_requested_events() and poll_does_not_wait() access functions
instead.  In order to enforce that the poll_table fields are now prepended
with an underscore and a comment was added warning against using them
directly.

This required a change in unix_dgram_poll() in unix/af_unix.c which used
the key field to get the requested events.  It's been replaced by a call
to poll_requested_events().

For qproc it was especially important to change its name because the
behavior of that field changes with this patch: this function pointer can
now be NULL, which wasn't possible in the past.

Any driver accessing the qproc or key fields directly will now fail to compile.

Some notes regarding the correctness of this patch: the driver's poll()
function is called with a 'struct poll_table_struct *wait' argument.  This
pointer may or may not be NULL; drivers can never rely on it being one or
the other, as that depends on whether or not an earlier file descriptor in
the select()'s fdset matched the requested events.

There are only three things a driver can do with the wait argument:

1) obtain the key field:

	events = wait ? wait->key : ~0;

   This will still work, although it should be replaced with the new
   poll_requested_events() function (which does exactly the same).
   This will now work even better, since wait is no longer set to NULL
   unnecessarily.

2) use the qproc callback. This could be deadly since qproc can now be
   NULL. Renaming qproc should prevent this from happening. There are no
   kernel drivers that actually access this callback directly, BTW.

3) test whether wait == NULL to determine whether poll would return without
   waiting. This is no longer sufficient as the correct test is now
   wait == NULL || wait->_qproc == NULL.

   However, the worst that can happen here is a slight performance hit in
   the case where wait != NULL and wait->_qproc == NULL. In that case the
   driver will assume that poll_wait() will actually add the fd to the set
   of waiting file descriptors. Of course, poll_wait() will not do that
   since it tests for wait->_qproc. This will not break anything, though.

   There is only one place in the whole kernel where this happens
   (sock_poll_wait() in include/net/sock.h) and that code will be replaced
   by a call to poll_does_not_wait() in the next patch.

   Note that even if wait->_qproc != NULL drivers cannot rely on poll_wait()
   actually waiting. The next file descriptor from the set might match the
   event mask and thus any possible waits will never happen.
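
The difference between the old and the new test in item 3 can be sketched
in userspace as follows.  This is a model under stated assumptions:
wait_model, noop_qproc(), mark_matched() and the two *_does_not_wait()
functions are hypothetical names, and mark_matched() only imitates what the
poll core does once an earlier file descriptor has matched.

```c
#include <stdbool.h>
#include <stddef.h>

struct wait_model;
typedef void (*qproc_fn)(struct wait_model *w);

/* Userspace stand-in for poll_table (assumption, not kernel code). */
struct wait_model {
	qproc_fn _qproc;
	unsigned long _key;
};

/* Hypothetical queueing callback; does nothing in this model. */
static void noop_qproc(struct wait_model *w)
{
	(void)w;
}

/* Pre-patch test: only a NULL table means "will not wait". */
static bool legacy_does_not_wait(const struct wait_model *w)
{
	return w == NULL;
}

/* Post-patch test: a table without a callback will not wait either. */
static bool correct_does_not_wait(const struct wait_model *w)
{
	return w == NULL || w->_qproc == NULL;
}

/* Imitates the poll core after a match: clear the callback, keep the key. */
static void mark_matched(struct wait_model *w)
{
	w->_qproc = NULL;
}
```

After mark_matched() the legacy test still reports that the poll will wait,
while the correct test reports that it will not; the key remains readable
in both cases, which is the slight performance hit described in item 3.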

Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Reviewed-by: Jonathan Corbet <corbet@lwn.net>
Reviewed-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:38 -07:00
9p 9p: make register_filesystem() the last failure exit 2012-03-20 21:29:45 -04:00
adfs switch open-coded instances of d_make_root() to new helper 2012-03-20 21:29:35 -04:00
affs switch open-coded instances of d_make_root() to new helper 2012-03-20 21:29:35 -04:00
afs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-03-21 13:36:41 -07:00
autofs4 autofs: set things up *before* registering fs type 2012-03-20 21:29:46 -04:00
befs switch open-coded instances of d_make_root() to new helper 2012-03-20 21:29:35 -04:00
bfs switch open-coded instances of d_make_root() to new helper 2012-03-20 21:29:35 -04:00
btrfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-03-21 13:36:41 -07:00
cachefiles switch touch_atime to struct path 2012-03-20 21:29:41 -04:00
ceph switch open-coded instances of d_make_root() to new helper 2012-03-20 21:29:35 -04:00
cifs Merge git://git.samba.org/sfrench/cifs-2.6 2012-03-23 09:07:15 -07:00
coda switch open-coded instances of d_make_root() to new helper 2012-03-20 21:29:35 -04:00
configfs make configfs_pin_fs() return root dentry on success 2012-03-20 21:29:48 -04:00
cramfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-03-21 13:36:41 -07:00
debugfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-03-21 13:36:41 -07:00
devpts Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-03-21 13:36:41 -07:00
dlm dlm for 3.4 2012-03-21 13:54:22 -07:00
ecryptfs ecryptfs: make register_filesystem() the last potential failure exit 2012-03-20 21:29:49 -04:00
efs switch open-coded instances of d_make_root() to new helper 2012-03-20 21:29:35 -04:00
exofs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-03-21 13:36:41 -07:00
exportfs
ext2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-03-21 13:36:41 -07:00
ext3 switch open-coded instances of d_make_root() to new helper 2012-03-20 21:29:35 -04:00
ext4 ext4: initialization of ext4_li_mtx needs to be done earlier 2012-03-20 22:05:02 -04:00
fat fat: switch to d_make_root() 2012-03-20 21:29:36 -04:00
freevxfs switch open-coded instances of d_make_root() to new helper 2012-03-20 21:29:35 -04:00
fscache
fuse Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-03-21 13:36:41 -07:00
gfs2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw 2012-03-21 18:00:03 -07:00
hfs switch open-coded instances of d_make_root() to new helper 2012-03-20 21:29:35 -04:00
hfsplus hfsplus: add an ioctl to bless files 2012-03-20 21:29:53 -04:00
hostfs switch open-coded instances of d_make_root() to new helper 2012-03-20 21:29:35 -04:00
hpfs switch open-coded instances of d_make_root() to new helper 2012-03-20 21:29:35 -04:00
hppfs switch open-coded instances of d_make_root() to new helper 2012-03-20 21:29:35 -04:00
hugetlbfs Merge branch 'akpm' (Andrew's patch-bomb) 2012-03-22 09:04:48 -07:00
isofs switch open-coded instances of d_make_root() to new helper 2012-03-20 21:29:35 -04:00
jbd Power management updates for 3.4 2012-03-21 10:15:51 -07:00
jbd2 Power management updates for 3.4 2012-03-21 10:15:51 -07:00
jffs2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-03-21 13:36:41 -07:00
jfs jfs: mising cleanup on register_filesystem() failure 2012-03-20 21:29:48 -04:00
lockd SUNRPC/LOCKD: Fix build warnings when CONFIG_SUNRPC_DEBUG is undefined 2012-03-21 09:31:44 -04:00
logfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-03-21 13:36:41 -07:00
minix Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-03-21 13:36:41 -07:00
ncpfs switch open-coded instances of d_make_root() to new helper 2012-03-20 21:29:35 -04:00
nfs NFS client updates for Linux 3.4 2012-03-23 08:53:47 -07:00
nfs_common
nfsd NFS client updates for Linux 3.4 2012-03-23 08:53:47 -07:00
nilfs2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-03-21 13:36:41 -07:00
nls NLS: raname "maxlen" to "maxout" in UTF conversion routines 2011-11-26 19:58:47 -08:00
notify fs/notify/notification.c: make subsys_initcall function static 2012-03-23 16:58:31 -07:00
ntfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-03-21 13:36:41 -07:00
ocfs2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-03-21 13:36:41 -07:00
omfs switch open-coded instances of d_make_root() to new helper 2012-03-20 21:29:35 -04:00
openpromfs switch open-coded instances of d_make_root() to new helper 2012-03-20 21:29:35 -04:00
proc Merge branch 'akpm' (Andrew's patch-bomb) 2012-03-22 09:04:48 -07:00
pstore One pstore patch 2012-03-23 09:24:07 -07:00
qnx4 qnx4: new helper - try_extent() 2012-03-20 21:29:52 -04:00
qnx6 fs: initial qnx6fs addition 2012-03-20 21:29:38 -04:00
quota Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs 2012-03-23 09:19:22 -07:00
ramfs tidy up after d_make_root() conversion 2012-03-20 21:29:37 -04:00
reiserfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-03-21 13:36:41 -07:00
romfs switch open-coded instances of d_make_root() to new helper 2012-03-20 21:29:35 -04:00
squashfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-03-21 13:36:41 -07:00
sysfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-03-21 13:36:41 -07:00
sysv switch open-coded instances of d_make_root() to new helper 2012-03-20 21:29:35 -04:00
ubifs - Improve error messages 2012-03-23 09:27:40 -07:00
udf Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-03-21 13:36:41 -07:00
ufs switch open-coded instances of d_make_root() to new helper 2012-03-20 21:29:35 -04:00
xfs Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs 2012-03-23 09:19:22 -07:00
aio.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-03-21 13:36:41 -07:00
anon_inodes.c anon_inodes: move allocation of anon_inode into ->mount() 2012-03-20 21:29:45 -04:00
attr.c switch is_sxid() to umode_t 2012-01-03 22:55:11 -05:00
bad_inode.c switch ->mknod() to umode_t 2012-01-03 22:54:54 -05:00
binfmt_aout.c take removal of PF_FORKNOEXEC to flush_old_exec() 2012-03-20 21:29:51 -04:00
binfmt_elf_fdpic.c take removal of PF_FORKNOEXEC to flush_old_exec() 2012-03-20 21:29:51 -04:00
binfmt_elf.c take removal of PF_FORKNOEXEC to flush_old_exec() 2012-03-20 21:29:51 -04:00
binfmt_em86.c __register_binfmt() made void 2012-03-20 21:29:46 -04:00
binfmt_flat.c take removal of PF_FORKNOEXEC to flush_old_exec() 2012-03-20 21:29:51 -04:00
binfmt_misc.c magic.h: move some FS magic numbers into magic.h 2012-03-23 16:58:31 -07:00
binfmt_script.c __register_binfmt() made void 2012-03-20 21:29:46 -04:00
binfmt_som.c take removal of PF_FORKNOEXEC to flush_old_exec() 2012-03-20 21:29:51 -04:00
bio-integrity.c fs: remove the second argument of k[un]map_atomic() 2012-03-20 21:48:21 +08:00
bio.c bio: don't overflow in bio_get_nr_vecs() 2012-02-08 22:07:18 +01:00
block_dev.c magic.h: move some FS magic numbers into magic.h 2012-03-23 16:58:31 -07:00
buffer.c fs: move code out of buffer.c 2012-01-03 22:54:07 -05:00
char_dev.c char_dev.c: fix up some whitespace errors 2011-12-13 11:18:17 -08:00
compat_binfmt_elf.c
compat_ioctl.c ppp: Replace uses of <linux/if_ppp.h> with <linux/ppp-ioctl.h> 2012-03-04 20:41:38 -05:00
compat.c vfs: fix compat_sys_stat() handling of overflows in st_nlink 2012-02-13 20:45:39 -05:00
dcache.c fs: fix kernel-doc warnings in dcache.c 2012-03-22 15:49:18 -07:00
dcookies.c
direct-io.c Restore direct_io / truncate locking API 2012-02-23 15:56:21 -08:00
drop_caches.c
eventfd.c
eventpoll.c poll: add poll_requested_events() and poll_does_not_wait() functions 2012-03-23 16:58:38 -07:00
exec.c Merge branch 'akpm' (Andrew's patch-bomb) 2012-03-22 09:04:48 -07:00
fcntl.c
fhandle.c vfs: prefer ->dentry->d_sb to ->mnt->mnt_sb 2012-01-06 23:16:53 -05:00
fifo.c
file_table.c vfs: drop_file_write_access() made static 2012-03-20 21:29:32 -04:00
file.c
filesystems.c vfs: convert fs_supers to hlist 2012-01-03 22:52:39 -05:00
fs_struct.c vfs: take path_get_longterm() out of write_seqcount scope 2012-03-20 21:29:42 -04:00
fs-writeback.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2012-03-20 21:12:50 -07:00
generic_acl.c
inode.c trim includes in inode.c 2012-03-20 21:29:51 -04:00
internal.h vfs: protect remounting superblock read-only 2012-01-06 23:20:12 -05:00
ioctl.c vfs: fix up ENOIOCTLCMD error handling 2012-01-05 15:40:12 -08:00
ioprio.c block: strip out locking optimization in put_io_context() 2012-02-07 07:51:30 +01:00
Kconfig Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-03-21 13:36:41 -07:00
Kconfig.binfmt fs: binfmt_elf: create Kconfig variable for PIE randomization 2012-01-10 16:30:51 -08:00
libfs.c make simple_pin_fs() pass MS_KERNMOUNT - it's a kernel-internal one 2012-03-20 21:29:44 -04:00
locks.c vfs: fix handling of lock allocation failure in lease-break case 2011-12-26 10:25:26 -08:00
Makefile fs: initial qnx6fs addition 2012-03-20 21:29:38 -04:00
mbcache.c
mount.h vfs: keep list of mounts for each superblock 2012-01-06 23:20:12 -05:00
mpage.c fs: remove unneeded plug in mpage_readpages() 2012-01-12 09:19:54 +01:00
namei.c vfs: tidy up sparse warnings in fs/namei.c 2012-03-22 16:10:40 -07:00
namespace.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2012-01-08 13:21:22 -08:00
no-block.c
open.c switch security_path_chmod() to struct path * 2012-01-06 23:16:53 -05:00
pipe.c magic.h: move some FS magic numbers into magic.h 2012-03-23 16:58:31 -07:00
pnode.c vfs: switch pnode.h macros to struct mount * 2012-01-03 22:57:11 -05:00
pnode.h vfs: switch pnode.h macros to struct mount * 2012-01-03 22:57:11 -05:00
posix_acl.c
proc_namespace.c vfs: switch ->show_options() to struct dentry * 2012-01-06 23:19:54 -05:00
read_write.c
read_write.h
readdir.c
select.c poll: add poll_requested_events() and poll_does_not_wait() functions 2012-03-23 16:58:38 -07:00
seq_file.c seq_file: fix mishandling of consecutive pread() invocations. 2012-03-21 17:54:54 -07:00
signalfd.c epoll: ep_unregister_pollwait() can use the freed pwq->whead 2012-02-24 11:42:50 -08:00
splice.c fs: remove the second argument of k[un]map_atomic() 2012-03-20 21:48:21 +08:00
stack.c
stat.c switch touch_atime to struct path 2012-03-20 21:29:41 -04:00
statfs.c vfs: new helper - vfs_ustat() 2012-01-03 22:53:07 -05:00
super.c Cleanups: rename of flush to invalidate, moving reporting of statistics 2012-03-22 19:52:47 -07:00
sync.c fs: move code out of buffer.c 2012-01-03 22:54:07 -05:00
timerfd.c
utimes.c
xattr_acl.c
xattr.c vfs: mnt_drop_write_file() 2012-01-03 22:52:40 -05:00