linux

iv/linux

Author	SHA1	Message	Date
Chuck Lever	42691398be	nfsd: Fix race between FREE_STATEID and LOCK When running LTP's nfslock01 test, the Linux client can send a LOCK and a FREE_STATEID request at the same time. The outcome is: Frame 324 R OPEN stateid [2,O] Frame 115004 C LOCK lockowner_is_new stateid [2,O] offset 672000 len 64 Frame 115008 R LOCK stateid [1,L] Frame 115012 C WRITE stateid [0,L] offset 672000 len 64 Frame 115016 R WRITE NFS4_OK Frame 115019 C LOCKU stateid [1,L] offset 672000 len 64 Frame 115022 R LOCKU NFS4_OK Frame 115025 C FREE_STATEID stateid [2,L] Frame 115026 C LOCK lockowner_is_new stateid [2,O] offset 672128 len 64 Frame 115029 R FREE_STATEID NFS4_OK Frame 115030 R LOCK stateid [3,L] Frame 115034 C WRITE stateid [0,L] offset 672128 len 64 Frame 115038 R WRITE NFS4ERR_BAD_STATEID In other words, the server returns stateid L in a successful LOCK reply, but it has already released it. Subsequent uses of stateid L fail. To address this, protect the generation check in nfsd4_free_stateid with the st_mutex. This should guarantee that only one of two outcomes occurs: either LOCK returns a fresh valid stateid, or FREE_STATEID returns NFS4ERR_LOCKS_HELD. Reported-by: Alexey Kodanev <alexey.kodanev@oracle.com> Fix-suggested-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Alexey Kodanev <alexey.kodanev@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-08-11 15:08:39 -04:00
Josef Bacik	502aa0a5be	nfsd: fix dentry refcounting on create `b44061d0b9` introduced a dentry ref counting bug. Previously we were grabbing one ref to dchild in nfsd_create(), but with the creation of nfsd_create_locked() we have a ref for dchild from the lookup in nfsd_create(), and then another ref in nfsd_create_locked(). The ref from the lookup in nfsd_create() is never dropped and results in dentries still in use at unmount. Signed-off-by: Josef Bacik <jbacik@fb.com> Fixes: `b44061d0b9` "nfsd: reorganize nfsd_create" Reported-by: kernel test robot <xiaolong.ye@intel.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-08-11 11:42:08 -04:00
Linus Torvalds	a71e36045e	Highlights: Trond made a change to the server's tcp logic that allows a fast client to better take advantage of high bandwidth networks, but may increase the risk that a single client could starve other clients; a new sunrpc.svc_rpc_per_connection_limit parameter should help mitigate this in the (hopefully unlikely) event this becomes a problem in practice. Tom Haynes added a minimal flex-layout pnfs server, which is of no use in production for now--don't build it unless you're doing client testing or further server development. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJXo7HNAAoJECebzXlCjuG+zqUP/RxO5jZjBhNI8/ayGdDW/Jnq s0Fu6B+aNRV3GnugmIeI4tWNGnPyERNzFtjLKlnwaasz/oW4qBLqGbNUWC5xKARS erODs0hM/1aCYWwNBEc5qXP2u23HrWVuQ+B5fg42ACyliKFGq5faDRmf6XGU/1kB 8unXGWPAiLiNZD/bWP91fYhThlLgpfHBFZ7M3G2IqmzWZTSELPzwp1bpRWt7yWQQ z1oYtXToycbwz3yPVk3cXtaoqpjDUVZf2Guqgqi1BwEyEtYOSaYo1VHNsKDf4OId QXQh64AqIK4uszpvtNhvsEaAECN7IiB+N4n2laFiQVmAf8Hfl3AnV/gKeD4lKmTj TY6knnjZO/X88wn80MB7JR1H1WXvvzNIHwNR95qfub/lVKX+C+0AORRtYhi5F9ec ixNs/z1ImLpYxAjiP/T5anD5xcX2S+LcSv7kRjhEufqNFtRAIqBZO9ZWbCdXAAyE tcH9Cru4jeIlFO/y6O61EVrn9FFj2+0uu+7urefNRQ2Y9pmKeculJrLF6WO8WHms 4IzXMmjZK+358RVdX2Ji5Hw6rBDvfgP+LjB8Jn8CeIiNRONEjT+2/AYQcfk61aLb INUbk6G6Vfd8iMO4aaRI9tmW+vKCOZa0IbnrNE1oHKp/AKBDr25i5YPSCsnl3r4Q iR7rRe9FIkfqBpbfjVFv =mo54 -----END PGP SIGNATURE----- Merge tag 'nfsd-4.8' of git://linux-nfs.org/~bfields/linux Pull nfsd updates from Bruce Fields: "Highlights: - Trond made a change to the server's tcp logic that allows a fast client to better take advantage of high bandwidth networks, but may increase the risk that a single client could starve other clients; a new sunrpc.svc_rpc_per_connection_limit parameter should help mitigate this in the (hopefully unlikely) event this becomes a problem in practice. - Tom Haynes added a minimal flex-layout pnfs server, which is of no use in production for now--don't build it unless you're doing client testing or further server development" * tag 'nfsd-4.8' of git://linux-nfs.org/~bfields/linux: (32 commits) nfsd: remove some dead code in nfsd_create_locked() nfsd: drop unnecessary MAY_EXEC check from create nfsd: clean up bad-type check in nfsd_create_locked nfsd: remove unnecessary positive-dentry check nfsd: reorganize nfsd_create nfsd: check d_can_lookup in fh_verify of directories nfsd: remove redundant zero-length check from create nfsd: Make creates return EEXIST instead of EACCES SUNRPC: Detect immediate closure of accepted sockets SUNRPC: accept() may return sockets that are still in SYN_RECV nfsd: allow nfsd to advertise multiple layout types nfsd: Close race between nfsd4_release_lockowner and nfsd4_lock nfsd/blocklayout: Make sure calculate signature/designator length aligned xfs: abstract block export operations from nfsd layouts SUNRPC: Remove unused callback xpo_adjust_wspace() SUNRPC: Change TCP socket space reservation SUNRPC: Add a server side per-connection limit SUNRPC: Micro optimisation for svc_data_ready SUNRPC: Call the default socket callbacks instead of open coding SUNRPC: lock the socket while detaching it ...	2016-08-04 19:59:06 -04:00
Dan Carpenter	2b11885921	nfsd: remove some dead code in nfsd_create_locked() We changed this around in f135af1041f ('nfsd: reorganize nfsd_create') so "dchild" can't be an error pointer any more. Also, dchild can't be NULL here (and dput would already handle this even if it was). Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-08-04 17:11:53 -04:00
J. Bruce Fields	fa08139d5e	nfsd: drop unnecessary MAY_EXEC check from create We need an fh_verify to make sure we at least have a dentry, but actual permission checks happen later. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-08-04 17:11:52 -04:00
J. Bruce Fields	7142327449	nfsd: clean up bad-type check in nfsd_create_locked Minor cleanup, no change in behavior. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-08-04 17:11:51 -04:00
J. Bruce Fields	d03d9fe476	nfsd: remove unnecessary positive-dentry check vfs_{create,mkdir,mknod} each begin with a call to may_create(), which returns EEXIST if the object already exists. This check is therefore unnecessary. (In the NFSv2 case, nfsd_proc_create also has such a check. Contrary to RFC 1094, our code seems to believe that a CREATE of an existing file should succeed. I'm leaving that behavior alone.) Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-08-04 17:11:50 -04:00
J. Bruce Fields	b44061d0b9	nfsd: reorganize nfsd_create There's some odd logic in nfsd_create() that allows it to be called with the parent directory either locked or unlocked. The only already-locked caller is NFSv2's nfsd_proc_create(). It's less confusing to split out the unlocked case into a separate function which the NFSv2 code can call directly. Also fix some comments while we're here. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-08-04 17:11:49 -04:00
J. Bruce Fields	e75b23f9e3	nfsd: check d_can_lookup in fh_verify of directories Create and other nfsd ops generally assume we can call lookup_one_len on inodes with S_IFDIR set. Al says that this assumption isn't true in general, though it should be for the filesystem objects nfsd sees. Add a check just to make sure our assumption isn't violated. Remove a couple checks for i_op->lookup in create code. Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-08-04 17:11:48 -04:00
J. Bruce Fields	12391d0723	nfsd: remove redundant zero-length check from create lookup_one_len already has this check. The only effect of this patch is to return access instead of perm in the 0-length-filename case. I actually prefer nfserr_perm (or _inval?), but I doubt anyone cares. The isdotent check seems redundant too, but I worry that some client might actually care about that strange nfserr_exist error. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-08-04 17:11:47 -04:00
Oleg Drokin	7eed34f18d	nfsd: Make creates return EEXIST instead of EACCES When doing a create (mkdir/mknod) on a name, it's worth checking the name exists first before returning EACCES in case the directory is not writeable by the user. This makes return values on the client more consistent regardless of whenever the entry there is cached in the local cache or not. Another positive side effect is certain programs only expect EEXIST in that case even despite POSIX allowing any valid error to be returned. Signed-off-by: Oleg Drokin <green@linuxhacker.ru> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-08-04 17:11:46 -04:00
Linus Torvalds	a867d7349e	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull userns vfs updates from Eric Biederman: "This tree contains some very long awaited work on generalizing the user namespace support for mounting filesystems to include filesystems with a backing store. The real world target is fuse but the goal is to update the vfs to allow any filesystem to be supported. This patchset is based on a lot of code review and testing to approach that goal. While looking at what is needed to support the fuse filesystem it became clear that there were things like xattrs for security modules that needed special treatment. That the resolution of those concerns would not be fuse specific. That sorting out these general issues made most sense at the generic level, where the right people could be drawn into the conversation, and the issues could be solved for everyone. At a high level what this patchset does a couple of simple things: - Add a user namespace owner (s_user_ns) to struct super_block. - Teach the vfs to handle filesystem uids and gids not mapping into to kuids and kgids and being reported as INVALID_UID and INVALID_GID in vfs data structures. By assigning a user namespace owner filesystems that are mounted with only user namespace privilege can be detected. This allows security modules and the like to know which mounts may not be trusted. This also allows the set of uids and gids that are communicated to the filesystem to be capped at the set of kuids and kgids that are in the owning user namespace of the filesystem. One of the crazier corner casees this handles is the case of inodes whose i_uid or i_gid are not mapped into the vfs. Most of the code simply doesn't care but it is easy to confuse the inode writeback path so no operation that could cause an inode write-back is permitted for such inodes (aka only reads are allowed). This set of changes starts out by cleaning up the code paths involved in user namespace permirted mounts. Then when things are clean enough adds code that cleanly sets s_user_ns. Then additional restrictions are added that are possible now that the filesystem superblock contains owner information. These changes should not affect anyone in practice, but there are some parts of these restrictions that are changes in behavior. - Andy's restriction on suid executables that does not honor the suid bit when the path is from another mount namespace (think /proc/[pid]/fd/) or when the filesystem was mounted by a less privileged user. - The replacement of the user namespace implicit setting of MNT_NODEV with implicitly setting SB_I_NODEV on the filesystem superblock instead. Using SB_I_NODEV is a stronger form that happens to make this state user invisible. The user visibility can be managed but it caused problems when it was introduced from applications reasonably expecting mount flags to be what they were set to. There is a little bit of work remaining before it is safe to support mounting filesystems with backing store in user namespaces, beyond what is in this set of changes. - Verifying the mounter has permission to read/write the block device during mount. - Teaching the integrity modules IMA and EVM to handle filesystems mounted with only user namespace root and to reduce trust in their security xattrs accordingly. - Capturing the mounters credentials and using that for permission checks in d_automount and the like. (Given that overlayfs already does this, and we need the work in d_automount it make sense to generalize this case). Furthermore there are a few changes that are on the wishlist: - Get all filesystems supporting posix acls using the generic posix acls so that posix_acl_fix_xattr_from_user and posix_acl_fix_xattr_to_user may be removed. [Maintainability] - Reducing the permission checks in places such as remount to allow the superblock owner to perform them. - Allowing the superblock owner to chown files with unmapped uids and gids to something that is mapped so the files may be treated normally. I am not considering even obvious relaxations of permission checks until it is clear there are no more corner cases that need to be locked down and handled generically. Many thanks to Seth Forshee who kept this code alive, and putting up with me rewriting substantial portions of what he did to handle more corner cases, and for his diligent testing and reviewing of my changes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (30 commits) fs: Call d_automount with the filesystems creds fs: Update i_[ug]id_(read\|write) to translate relative to s_user_ns evm: Translate user/group ids relative to s_user_ns when computing HMAC dquot: For now explicitly don't support filesystems outside of init_user_ns quota: Handle quota data stored in s_user_ns in quota_setxquota quota: Ensure qids map to the filesystem vfs: Don't create inodes with a uid or gid unknown to the vfs vfs: Don't modify inodes with a uid or gid unknown to the vfs cred: Reject inodes with invalid ids in set_create_file_as() fs: Check for invalid i_uid in may_follow_link() vfs: Verify acls are valid within superblock's s_user_ns. userns: Handle -1 in k[ug]id_has_mapping when !CONFIG_USER_NS fs: Refuse uid/gid changes which don't map into s_user_ns selinux: Add support for unprivileged mounts from user namespaces Smack: Handle labels consistently in untrusted mounts Smack: Add support for unprivileged mounts from user namespaces fs: Treat foreign mounts as nosuid fs: Limit file caps to the user namespace of the super block userns: Remove the now unnecessary FS_USERNS_DEV_MOUNT flag userns: Remove implicit MNT_NODEV fragility. ...	2016-07-29 15:54:19 -07:00
Linus Torvalds	6784725ab0	Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs updates from Al Viro: "Assorted cleanups and fixes. Probably the most interesting part long-term is ->d_init() - that will have a bunch of followups in (at least) ceph and lustre, but we'll need to sort the barrier-related rules before it can get used for really non-trivial stuff. Another fun thing is the merge of ->d_iput() callers (dentry_iput() and dentry_unlink_inode()) and a bunch of ->d_compare() ones (all except the one in __d_lookup_lru())" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (26 commits) fs/dcache.c: avoid soft-lockup in dput() vfs: new d_init method vfs: Update lookup_dcache() comment bdev: get rid of ->bd_inodes Remove last traces of ->sync_page new helper: d_same_name() dentry_cmp(): use lockless_dereference() instead of smp_read_barrier_depends() vfs: clean up documentation vfs: document ->d_real() vfs: merge .d_select_inode() into .d_real() unify dentry_iput() and dentry_unlink_inode() binfmt_misc: ->s_root is not going anywhere drop redundant ->owner initializations ufs: get rid of redundant checks orangefs: constify inode_operations missed comment updates from ->direct_IO() prototype change file_inode(f)->i_mapping is f->f_mapping trim fsnotify hooks a bit 9p: new helper - v9fs_parent_fid() debugfs: ->d_parent is never NULL or negative ...	2016-07-28 12:59:05 -07:00
Linus Torvalds	0e6acf0204	xfs: update for 4.8-rc1 Changes in this update: o generic iomap based IO path infrastructure o generic iomap based fiemap implementation o xfs iomap based Io path implementation o buffer error handling fixes o tracking of in flight buffer IO for unmount serialisation o direct IO and DAX io path separation and simplification o shortform directory format definition changes for wider platform compatibility o various buffer cache fixes o cleanups in preparation for rmap merge o error injection cleanups and fixes o log item format buffer memory allocation restructuring to prevent rare OOM reclaim deadlocks o sparse inode chunks are now fully supported. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJXmA5XAAoJEK3oKUf0dfodCc0QAKY5Jlfw5HwLria+Ad87HCcM Zi/LGMMC3CPh+vkbqsmDnLKHYjXRwi3HamBoXdufiE8E3UtOjp/sV98/fCw+zwhe tHDLmdAx23RLTn7gUhcsIXydKeXh0+HlRxPa4eBAlmnsJ3nGgrKrKQLgDT7Gjlum nPfRSTYjzm5gs2dpUTYhMV7MplenDW9GFz2uBMct6N9kYQ9m225I99fd/4nb/L7R o/8UocsK7iREUXP6decDoN9uIAzE2mYR720EL+Txy09CTYy+luNyGoNXOsQtxT5O plyoPZbzIIDvC44bvp6bZX96Udm7tAeTloieInCZG13I2zJy9gmTmLqkZ3M2at12 kOyeAMSBOWQYSa3uh++FsEP+JGtBTlZXf+4DAYf+U08s8tMVE/61/RZrtJZF4OjW hyumRBD6zqZ9Y6Qtji2HaA3l9IGxOC2k4URw9JZdDDyMoRTQvawN1QWNAeZINXiv 9ywqTruVsfQnoGDC1Gk1OEfQpubNztTAkEPqVM7ez5dkwOdwuOZXcZPL1Ltvb4Bt PLaWKLIYFYZKrM5kqgQlTERspSQA99++z8H9a21wFezfetaBby28fIqwMMfQAiSw nCq95WshJPwenogMtWjNfOgs/fqOBKdPdLFw0H6Jpmjwna2KpuFIZiTnwu25vvjz dHh4DVSuMTq1pBkXEU7B =vcSd -----END PGP SIGNATURE----- Merge tag 'xfs-for-linus-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs Pull xfs updates from Dave Chinner: "The major addition is the new iomap based block mapping infrastructure. We've been kicking this about locally for years, but there are other filesystems want to use it too (e.g. gfs2). Now it is fully working, reviewed and ready for merge and be used by other filesystems. There are a lot of other fixes and cleanups in the tree, but those are XFS internal things and none are of the scale or visibility of the iomap changes. See below for details. I am likely to send another pull request next week - we're just about ready to merge some new functionality (on disk block->owner reverse mapping infrastructure), but that's a huge chunk of code (74 files changed, 7283 insertions(+), 1114 deletions(-)) so I'm keeping that separate to all the "normal" pull request changes so they don't get lost in the noise. Summary of changes in this update: - generic iomap based IO path infrastructure - generic iomap based fiemap implementation - xfs iomap based Io path implementation - buffer error handling fixes - tracking of in flight buffer IO for unmount serialisation - direct IO and DAX io path separation and simplification - shortform directory format definition changes for wider platform compatibility - various buffer cache fixes - cleanups in preparation for rmap merge - error injection cleanups and fixes - log item format buffer memory allocation restructuring to prevent rare OOM reclaim deadlocks - sparse inode chunks are now fully supported" * tag 'xfs-for-linus-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (53 commits) xfs: remove EXPERIMENTAL tag from sparse inode feature xfs: bufferhead chains are invalid after end_page_writeback xfs: allocate log vector buffers outside CIL context lock libxfs: directory node splitting does not have an extra block xfs: remove dax code from object file when disabled xfs: skip dirty pages in ->releasepage() xfs: remove __arch_pack xfs: kill xfs_dir2_inou_t xfs: kill xfs_dir2_sf_off_t xfs: split direct I/O and DAX path xfs: direct calls in the direct I/O path xfs: stop using generic_file_read_iter for direct I/O xfs: split xfs_file_read_iter into buffered and direct I/O helpers xfs: remove s_maxbytes enforcement in xfs_file_read_iter xfs: kill ioflags xfs: don't pass ioflags around in the ioctl path xfs: track and serialize in-flight async buffers against unmount xfs: exclude never-released buffers from buftarg I/O accounting xfs: don't reset b_retries to 0 on every failure xfs: remove extraneous buffer flag changes ...	2016-07-27 09:53:35 -07:00
Jeff Layton	8a4c392688	nfsd: allow nfsd to advertise multiple layout types If the underlying filesystem supports multiple layout types, then there is little reason not to advertise that fact to clients and let them choose what type to use. Turn the ex_layout_type field into a bitfield. For each supported layout type, we set a bit in that field. When the client requests a layout, ensure that the bit for that layout type is set. When the client requests attributes, send back a list of supported types. Signed-off-by: Jeff Layton <jlayton@poochiereds.net> Reviewed-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-07-15 15:31:32 -04:00
Chuck Lever	885848186f	nfsd: Close race between nfsd4_release_lockowner and nfsd4_lock nfsd4_release_lockowner finds a lock owner that has no lock state, and drops cl_lock. Then release_lockowner picks up cl_lock and unhashes the lock owner. During the window where cl_lock is dropped, I don't see anything preventing a concurrent nfsd4_lock from finding that same lock owner and adding lock state to it. Move release_lockowner() into nfsd4_release_lockowner and hang onto the cl_lock until after the lock owner's state cannot be found again. Found by inspection, we don't currently have a reproducer. Fixes: `2c41beb0e5` ("nfsd: reduce cl_lock thrashing in ... ") Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-07-15 15:31:31 -04:00
Kinglong Mee	dd51db1886	nfsd/blocklayout: Make sure calculate signature/designator length aligned These values are all multiples of 4 already, so there's no change in behavior from this patch. But perhaps this will prevent mistakes in the future. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-07-15 15:31:30 -04:00
Benjamin Coddington	15d66ac209	xfs: abstract block export operations from nfsd layouts Instead of creeping pnfs layout configuration into filesystems, move the definition of block-based export operations under a more abstract configuration. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Dave Chinner <david@fromorbit.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-07-15 15:31:29 -04:00
Christophe JAILLET	d28c442f5b	nfsd: Fix some indent inconsistancy Silent a few smatch warnings about indentation Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-07-13 15:53:41 -04:00
Oleg Drokin	93f580a9a2	nfsd: Correct a comment for NFSD_MAY_ defines location Those are now defined in fs/nfsd/vfs.h Signed-off-by: Oleg Drokin <green@linuxhacker.ru> Reviewed-by: Jeff Layton <jlayton@poochiereds.net> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-07-13 15:53:40 -04:00
Tom Haynes	9b9960a0ca	nfsd: Add a super simple flex file server Have a simple flex file server where the mds (NFSv4.1 or NFSv4.2) is also the ds (NFSv3). I.e., the metadata and the data file are the exact same file. This will allow testing of the flex file client. Simply add the "pnfs" export option to your export in /etc/exports and mount from a client that supports flex files. Signed-off-by: Tom Haynes <loghyr@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-07-13 15:40:48 -04:00
Tom Haynes	d7c920d134	nfsd: flex file device id encoding will need the server address Signed-off-by: Tom Haynes <loghyr@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jeff Layton <jlayton@poochiereds.net> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-07-13 15:40:47 -04:00
Andrew Elble	ed94164398	nfsd: implement machine credential support for some operations This addresses the conundrum referenced in RFC5661 18.35.3, and will allow clients to return state to the server using the machine credentials. The biggest part of the problem is that we need to allow the client to send a compound op with integrity/privacy on mounts that don't have it enabled. Add server support for properly decoding and using spo_must_enforce and spo_must_allow bits. Add support for machine credentials to be used for CLOSE, OPEN_DOWNGRADE, LOCKU, DELEGRETURN, and TEST/FREE STATEID. Implement a check so as to not throw WRONGSEC errors when these operations are used if integrity/privacy isn't turned on. Without this, Linux clients with credentials that expired while holding delegations were getting stuck in an endless loop. Signed-off-by: Andrew Elble <aweits@rit.edu> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-07-13 15:32:47 -04:00
Andrew Elble	dedeb13f9e	nfsd: allow mach_creds_match to be used more broadly Rename mach_creds_match() to nfsd4_mach_creds_match() and un-staticify Signed-off-by: Andrew Elble <aweits@rit.edu> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-07-13 15:32:47 -04:00
Al Viro	b223f4e215	Merge branch 'd_real' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs into work.misc	2016-06-30 23:34:49 -04:00
Ben Hutchings	999653786d	nfsd: check permissions when setting ACLs Use set_posix_acl, which includes proper permission checks, instead of calling ->set_acl directly. Without this anyone may be able to grant themselves permissions to a file by setting the ACL. Lock the inode to make the new checks atomic with respect to set_acl. (Also, nfsd was the only caller of set_acl not locking the inode, so I suspect this may fix other races.) This also simplifies the code, and ensures our ACLs are checked by posix_acl_valid. The permission checks and the inode locking were lost with commit `4ac7249e`, which changed nfsd to use the set_acl inode operation directly instead of going through xattr handlers. Reported-by: David Sinquin <david@sinquin.eu> [agreunba@redhat.com: use set_posix_acl] Fixes: `4ac7249e` Cc: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-06-24 12:11:52 -04:00
Eric W. Biederman	d91ee87d8d	vfs: Pass data, ns, and ns->userns to mount_ns Today what is normally called data (the mount options) is not passed to fill_super through mount_ns. Pass the mount options and the namespace separately to mount_ns so that filesystems such as proc that have mount options, can use mount_ns. Pass the user namespace to mount_ns so that the standard permission check that verifies the mounter has permissions over the namespace can be performed in mount_ns instead of in each filesystems .mount method. Thus removing the duplication between mqueuefs and proc in terms of permission checks. The extra permission check does not currently affect the rpc_pipefs filesystem and the nfsd filesystem as those filesystems do not currently allow unprivileged mounts. Without unpvileged mounts it is guaranteed that the caller has already passed capable(CAP_SYS_ADMIN) which guarantees extra permission check will pass. Update rpc_pipefs and the nfsd filesystem to ensure that the network namespace reference is always taken in fill_super and always put in kill_sb so that the logic is simpler and so that errors originating inside of fill_super do not cause a network namespace leak. Acked-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2016-06-23 15:41:53 -05:00
Christoph Hellwig	199a31c6d9	fs: move struct iomap from exportfs.h to a separate header Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2016-06-21 09:22:39 +10:00
Oleg Drokin	8c7245abda	nfsd: Make init_open_stateid() a bit more whole Move the state selection logic inside from the caller, always making it return correct stp to use. Signed-off-by: J . Bruce Fields <bfields@fieldses.org> Signed-off-by: Oleg Drokin <green@linuxhacker.ru> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-06-15 22:03:53 -04:00
Oleg Drokin	5cc1fb2a09	nfsd: Extend the mutex holding region around in nfsd4_process_open2() To avoid racing entry into nfs4_get_vfs_file(). Make init_open_stateid() return with locked stateid to be unlocked by the caller. Signed-off-by: Oleg Drokin <green@linuxhacker.ru> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-06-15 22:03:41 -04:00
Oleg Drokin	feb9dad520	nfsd: Always lock state exclusively. It used to be the case that state had an rwlock that was locked for write by downgrades, but for read for upgrades (opens). Well, the problem is if there are two competing opens for the same state, they step on each other toes potentially leading to leaking file descriptors from the state structure, since access mode is a bitmap only set once. Signed-off-by: Oleg Drokin <green@linuxhacker.ru> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-06-15 22:03:31 -04:00
J. Bruce Fields	d50039ea5e	nfsd4/rpc: move backchannel create logic into rpc code Also simplify the logic a bit. Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com> Acked-by: Trond Myklebust <trondmy@primarydata.com>	2016-06-15 10:32:25 -04:00
Geert Uytterhoeven	eee930163c	nfsd: Fix NFSD_MDS_PR_KEY on 32-bit by adding ULL postfix On 32-bit: fs/nfsd/blocklayout.c: In function ‘nfsd4_block_get_device_info_scsi’: fs/nfsd/blocklayout.c:337: warning: integer constant is too large for ‘long’ type fs/nfsd/blocklayout.c:344: warning: integer constant is too large for ‘long’ type fs/nfsd/blocklayout.c: In function ‘nfsd4_scsi_fence_client’: fs/nfsd/blocklayout.c:385: warning: integer constant is too large for ‘long’ type Add the missing "ULL" postfix to 64-bit constant NFSD_MDS_PR_KEY to fix this. Fixes: `f99d4fbdae` ("nfsd: add SCSI layout support") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-06-14 11:50:04 -04:00
Al Viro	84c60b1388	drop redundant ->owner initializations it's not needed for file_operations of inodes located on fs defined in the hosting module and for file_operations that go into procfs. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-05-29 19:08:00 -04:00
Linus Torvalds	5d22c5ab85	A very quiet cycle for nfsd, mainly just an RDMA update from Chuck Lever. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJXRL2PAAoJECebzXlCjuG+c34P/1wnkehVxDozBJp7UEzhrsE/ U1dpwfykzVEIMh68TldBvyrt2Lb4ThLPZ7V2dVwNqA831S/VM6fWJyw8WerSgGgU SUGOzdF04rNfy41lXQNpDiiC417Fbp4Js4O+Q5kd+8kqQbXYqCwz0ce3DVbAT571 JmJgBI8gZLhicyNRDOt0y6C+/3P+0bbXYvS8wkzY+CwbNczHJOCLhwViKzWTptm9 LCSgDGm68ckpR7mZkWfEF3WdiZ9+SxeI+pT9dcomzxNfbv8NluDplYmdLbepA2J8 uWHGprVe9WJMDnw4hJhrI2b3/rHIntpxuZYktmnb/z/ezBTyi3FXYWgAEdE1by+Y Gf7OewKOp8XcQ/iHRZ8vwXNrheHAr9++SB49mGBZJ3qj6bO+FrISQKX9FRxo6PrJ SDRgYjt5yUG2oD1AAs1NzuBPqZzR40mA6Yk4zuNAcxzK/S7DdRF/9Kjyk86TVv08 3E3O5i1RyVcU/A7JdnbiyeDFMQoRshdnN0HShIZcSfcfW+qFKghNlO9bFfSl904F jlG6moNB5OBiV8FNOelY+HGAYoUdw120QxqQMv47oZGKCjv+rfK38aB4GBJ4iEuo TrGqNmrMrs/AKdL3Sd+8LuJqSfXggrwUDc/KS6CFz/U0eBbp6k0kcd7FEyG/J8kW JxQ0URgyJ+DHfc60E8LN =k6RP -----END PGP SIGNATURE----- Merge tag 'nfsd-4.7' of git://linux-nfs.org/~bfields/linux Pull nfsd updates from Bruce Fields: "A very quiet cycle for nfsd, mainly just an RDMA update from Chuck Lever" * tag 'nfsd-4.7' of git://linux-nfs.org/~bfields/linux: sunrpc: fix stripping of padded MIC tokens svcrpc: autoload rdma module svcrdma: Generalize svc_rdma_xdr_decode_req() svcrdma: Eliminate code duplication in svc_rdma_recvfrom() svcrdma: Drain QP before freeing svcrdma_xprt svcrdma: Post Receives only for forward channel requests svcrdma: Remove superfluous line from rdma_read_chunks() svcrdma: svc_rdma_put_context() is invoked twice in Send error path svcrdma: Do not add XDR padding to xdr_buf page vector svcrdma: Support IPv6 with NFS/RDMA nfsd: handle seqid wraparound in nfsd4_preprocess_layout_stateid Remove unnecessary allocation	2016-05-24 14:39:20 -07:00
Linus Torvalds	c2e7b20705	Merge branch 'work.preadv2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs cleanups from Al Viro: "More cleanups from Christoph" * 'work.preadv2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: nfsd: use RWF_SYNC fs: add RWF_DSYNC aand RWF_SYNC ceph: use generic_write_sync fs: simplify the generic_write_sync prototype fs: add IOCB_SYNC and IOCB_DSYNC direct-io: remove the offset argument to dio_complete direct-io: eliminate the offset argument to ->direct_IO xfs: eliminate the pos variable in xfs_file_dio_aio_write filemap: remove the pos argument to generic_file_direct_write filemap: remove pos variables in generic_file_read_iter	2016-05-17 15:05:23 -07:00
Chuck Lever	6625d09137	svcrdma: Do not add XDR padding to xdr_buf page vector An xdr_buf has a head, a vector of pages, and a tail. Each RPC request is presented to the NFS server contained in an xdr_buf. The RDMA transport would like to supply the NFS server with only the NFS WRITE payload bytes in the page vector. In some common cases, that would allow the NFS server to swap those pages right into the target file's page cache. Have the transport's RDMA Read logic put XDR pad bytes in the tail iovec, and not in the pages that hold the data payload. The NFSv3 WRITE XDR decoder is finicky about the lengths involved, so make sure it is looking in the correct places when computing the total length of the incoming NFS WRITE request. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-05-13 15:53:05 -04:00
Jeff Layton	14b7f4a1ed	nfsd: handle seqid wraparound in nfsd4_preprocess_layout_stateid Move the existing static function to an inline helper, and call it. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-05-13 15:34:47 -04:00
Christoph Hellwig	24368aad47	nfsd: use RWF_SYNC Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-05-01 19:58:39 -04:00
Al Viro	fc64005c93	don't bother with ->d_inode->i_sb - it's always equal to ->d_sb ... and neither can ever be NULL Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-04-10 17:11:51 -04:00
Linus Torvalds	8b306a2e7c	Various bugfixes, a RDMA update from Chuck Lever, and support for a new pnfs layout type from Christoph Hellwig. The new layout type is a variant of the block layout which uses SCSI features to offer improved fencing and device identification. Note this pull request also includes the client side of SCSI layout, with Trond's permission. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJW9D0/AAoJECebzXlCjuG+fYcP/ibluAOSRrQ523gQcJNS+QSV 3B7YY6diJkfQNkm4oAROwPd1KHT2qhoVAO3JHXA3SZnjVVYQxAHeh2wsZJ2jL6Ft uyZARxix+F9alJVT3S+uYLwagjh9LXLhb0MaRTMheaWGsPKLQTU4JtsLsjAIhCah R0EIIdQfWcb83XoVPmiflVO4Nl/TQWmfA5wHfoVtITJcL3AaC9gzCGNbc8dHLnFC HRjGVgHr3nSL3suvUEFfxSEo4QoNPWIX4kBaWXgqbVgOQqmbtQtaXdnd3gIRtkzj 9Q/lxiwaArtDjdAQdyNtRRBUpkpWo+xWp/vpnNUxTXKoRtpSyqYQX5FaPCPRVAAp GYGw2qHrvWn2hSajtVtKyWwsQ3lYsDmbkxAkgScO9kQdS+kuxNyIzYIEvakdtFyJ txFsauJczkNNFeHKzLPDoGbuX7KB/+pUsjmX5nYtMhwRriXA5S8zcO4AvTrmTPDF vQrLM97mqI60LWmpQUO1OE8CEFPVx5DUZ0KdLMvFNKPZph8BTPJxJMmxJK4R6stV /TWglRTEO8IGUh0ww8+3PfMfxVG5XHnQc99+VGVZOS9hJ4GOXbWYAqZ0m+sRJ2Pi JPawILie5x2gH1FrVYbcTZsQzdmdn/BF9yePNzAkMucjuEUHXFTlf3MMfEhKpYTl 0l8LBCv6ZvtGU+PUJxZn =MToz -----END PGP SIGNATURE----- Merge tag 'nfsd-4.6-1' of git://linux-nfs.org/~bfields/linux Pull more nfsd updates from Bruce Fields: "Apologies for the previous request, which omitted the top 8 commits from my for-next branch (including the SCSI layout commits). Thanks to Trond for spotting my error!" This actually includes the new layout types, so here's that part of the pull message repeated: "Support for a new pnfs layout type from Christoph Hellwig. The new layout type is a variant of the block layout which uses SCSI features to offer improved fencing and device identification. Note this pull request also includes the client side of SCSI layout, with Trond's permission" * tag 'nfsd-4.6-1' of git://linux-nfs.org/~bfields/linux: nfsd: use short read as well as i_size to set eof nfsd: better layoutupdate bounds-checking nfsd: block and scsi layout drivers need to depend on CONFIG_BLOCK nfsd: add SCSI layout support nfsd: move some blocklayout code nfsd: add a new config option for the block layout driver nfs/blocklayout: add SCSI layout support nfs4.h: add SCSI layout definitions	2016-03-24 19:50:32 -07:00
Linus Torvalds	5b1e167d8d	Various bugfixes, a RDMA update from Chuck Lever, and support for a new pnfs layout type from Christoph Hellwig. The new layout type is a variant of the block layout which uses SCSI features to offer improved fencing and device identification. (Also: note this pull request also includes the client side of SCSI layout, with Trond's permission.) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJW8+uhAAoJECebzXlCjuG+26YP/35DP4MPfszEJ5G0dYq5HMwl dJUni8ajSHRswZ/2FqiBsRwmg3Djfc+uoXdOneD1f6ogkDe7S16yp+FRyh8/VwUs Ym6LcxSjT28uqkxO0MblcnUl0G9nNSuOwqIsZ0HG7/UC7E6RmCF4o3r5fFUfOsA+ B3koB5UcHNAFythAk+GDwOQ46Fr96VkZ7Y+OhdNAwmeXZIdKXIufweueI/o2uipB RoJFJ7lqrzAjFe+CqAUBr2l2k6lEKzdxbEH6HXQ5+cvVNwfVIgnrONpF78uF/p9T NNDnZ+fn3YdRhd+W9RxUHZq7ZL5YOEA8kHsAlloeBH74GqCy7IcS+DrKt1ReM3px bhgsXM3dqqJ9xiDGqmeE4VQwRF30SxgYZbO386E+cLHnCYV+vfY6RUaWPrk6On/r FL9g3iyVvhyC4HO06Xm+uvvERw8R+fTZY9KZQKH2RL0Tr5DkWRRNJfasMO+PwGOv Fdku01vyoA4Y6mbqUgQ9DmrbLO4gK3UyMiOTanQV9shrIDxI0MOuLK03zL25vZCM s1A4YBpXmg4gx3XsOFM+tygv6EVujDu6scICeb+hj0vi0oG82Lx7T9e3MJEiYC+T jbi8bu+x+0bX2obMprvDNVUzi/PgSUVpGCnRlbRTaXBa0lB6nV7uUiQ1HC9gGesm ZWWiOv7du+7WlFP5c6r5 =mY8w -----END PGP SIGNATURE----- Merge tag 'nfsd-4.6' of git://linux-nfs.org/~bfields/linux Pull nfsd updates from Bruce Fields: "Various bugfixes, a RDMA update from Chuck Lever, and support for a new pnfs layout type from Christoph Hellwig. The new layout type is a variant of the block layout which uses SCSI features to offer improved fencing and device identification. (Also: note this pull request also includes the client side of SCSI layout, with Trond's permission.)" * tag 'nfsd-4.6' of git://linux-nfs.org/~bfields/linux: sunrpc/cache: drop reference when sunrpc_cache_pipe_upcall() detects a race nfsd: recover: fix memory leak nfsd: fix deadlock secinfo+readdir compound nfsd4: resfh unused in nfsd4_secinfo svcrdma: Use new CQ API for RPC-over-RDMA server send CQs svcrdma: Use new CQ API for RPC-over-RDMA server receive CQs svcrdma: Remove close_out exit path svcrdma: Hook up the logic to return ERR_CHUNK svcrdma: Use correct XID in error replies svcrdma: Make RDMA_ERROR messages work rpcrdma: Add RPCRDMA_HDRLEN_ERR svcrdma: svc_rdma_post_recv() should close connection on error svcrdma: Close connection when a send error occurs nfsd: Lower NFSv4.1 callback message size limit svcrdma: Do not send Write chunk XDR pad with inline content svcrdma: Do not write xdr_buf::tail in a Write chunk svcrdma: Find client-provided write and reply chunks once per reply nfsd: Update NFS server comments related to RDMA support nfsd: Fix a memory leak when meeting unsupported state_protect_how4 nfsd4: fix bad bounds checking	2016-03-24 10:41:00 -07:00
Benjamin Coddington	ac503e4a30	nfsd: use short read as well as i_size to set eof Use the result of a local read to determine when to set the eof flag. This allows us to return the location of the end of the file atomically at the time of the read. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> [bfields: add some documentation] Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-03-23 16:02:39 -04:00
J. Bruce Fields	4b15da44e7	nfsd: better layoutupdate bounds-checking You could add any multiple of 2^32/PNFS_SCSI_RANGE_SIZE to nr_iomaps and still pass this check. You'd probably still fail the following kcalloc, but best to be paranoid since this is from-the-wire data. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-03-22 14:39:35 -04:00
Linus Torvalds	3c2de27d79	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs updates from Al Viro: - Preparations of parallel lookups (the remaining main obstacle is the need to move security_d_instantiate(); once that becomes safe, the rest will be a matter of rather short series local to fs/.c - preadv2/pwritev2 series from Christoph - assorted fixes 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (32 commits) splice: handle zero nr_pages in splice_to_pipe() vfs: show_vfsstat: do not ignore errors from show_devname method dcache.c: new helper: __d_add() don't bother with __d_instantiate(dentry, NULL) untangle fsnotify_d_instantiate() a bit uninline d_add() replace d_add_unique() with saner primitive quota: use lookup_one_len_unlocked() cifs_get_root(): use lookup_one_len_unlocked() nfs_lookup: don't bother with d_instantiate(dentry, NULL) kill dentry_unhash() ceph_fill_trace(): don't bother with d_instantiate(dn, NULL) autofs4: don't bother with d_instantiate(dentry, NULL) in ->lookup() configfs: move d_rehash() into configfs_create() for regular files ceph: don't bother with d_rehash() in splice_dentry() namei: teach lookup_slow() to skip revalidate namei: massage lookup_slow() to be usable by lookup_one_len_unlocked() lookup_one_len_unlocked(): use lookup_dcache() namei: simplify invalidation logics in lookup_dcache() namei: change calling conventions for lookup_{fast,slow} and follow_managed() ...	2016-03-19 18:52:29 -07:00
Christoph Hellwig	10c4de10b2	nfsd: block and scsi layout drivers need to depend on CONFIG_BLOCK Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-03-18 11:42:54 -04:00
Christoph Hellwig	f99d4fbdae	nfsd: add SCSI layout support This is a simple extension to the block layout driver to use SCSI persistent reservations for access control and fencing, as well as SCSI VPD pages for device identification. For this we need to pass the nfs4_client to the proc_getdeviceinfo method to generate the reservation key, and add a new fence_client method to allow for fence actions in the layout driver. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-03-18 11:42:53 -04:00
Christoph Hellwig	368248eeb1	nfsd: move some blocklayout code Trivial reorganization, no change in behavior. Move some code around, pull some code out of block layoutcommit that will be useful for the scsi layout. [bfields@redhat.com: split off from "nfsd: add SCSI layout support"] Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-03-18 11:41:17 -04:00
Christoph Hellwig	81c3932901	nfsd: add a new config option for the block layout driver Split the config symbols into a generic pNFS one, which is invisible and gets selected by the layout drivers, and one for the block layout driver. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-03-18 11:40:57 -04:00
Sudip Mukherjee	956ccef3c9	nfsd: recover: fix memory leak nfsd4_cltrack_grace_start() will allocate the memory for grace_start but when we returned due to error we missed freeing it. Signed-off-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-03-17 14:57:15 -04:00
J. Bruce Fields	2f6fc056e8	nfsd: fix deadlock secinfo+readdir compound nfsd_lookup_dentry exits with the parent filehandle locked. fh_put also unlocks if necessary (nfsd filehandle locking is probably too lenient), so it gets unlocked eventually, but if the following op in the compound needs to lock it again, we can deadlock. A fuzzer ran into this; normal clients don't send a secinfo followed by a readdir in the same compound. Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-03-16 10:51:21 -04:00
Christoph Hellwig	793b80ef14	vfs: pass a flags argument to vfs_readv/vfs_writev This way we can set kiocb flags also from the sync read/write path for the read_iter/write_iter operations. For now there is no way to pass flags to plain read/write operations as there is no real need for that, and all flags passed are explicitly rejected for these files. Signed-off-by: Milosz Tanski <milosz@adfin.com> [hch: rebased on top of my kiocb changes] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Stephen Bates <stephen.bates@pmcs.com> Tested-by: Stephen Bates <stephen.bates@pmcs.com> Acked-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-03-04 12:20:10 -05:00
J. Bruce Fields	0f1738a10b	nfsd4: resfh unused in nfsd4_secinfo Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-03-02 15:26:36 -08:00
Chuck Lever	4500632f60	nfsd: Lower NFSv4.1 callback message size limit The maximum size of a backchannel message on RPC-over-RDMA depends on the connection's inline threshold. Today that threshold is typically 1024 bytes, making the maximum message size 996 bytes. The Linux server's CREATE_SESSION operation checks that the size of callback Calls can be as large as 1044 bytes, to accommodate RPCSEC_GSS. Thus CREATE_SESSION fails if a client advertises the true message size maximum of 996 bytes. But the server's backchannel currently does not support RPCSEC_GSS. The actual maximum size it needs is much smaller. It is safe to reduce the limit to enable NFSv4.1 on RDMA backchannel operation. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-03-01 13:06:35 -08:00
Chuck Lever	4ce85c8cf8	nfsd: Update NFS server comments related to RDMA support The server does indeed now support NFSv4.1 on RDMA transports. It does not support shifting an RDMA-capable TCP transport (such as iWARP) to RDMA mode. Reported-by: Shirley Ma <shirley.ma@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-03-01 13:06:32 -08:00
Kinglong Mee	8edf4b0288	nfsd: Fix a memory leak when meeting unsupported state_protect_how4 Remember free allocated client when meeting unsupported state protect how. Fixes: `50c7b948ad` ("nfsd: minor consolidation of mach_cred handling code") Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-03-01 13:06:31 -08:00
J. Bruce Fields	4aed9c46af	nfsd4: fix bad bounds checking A number of spots in the xdr decoding follow a pattern like n = be32_to_cpup(p++); READ_BUF(n + 4); where n is a u32. The only bounds checking is done in READ_BUF itself, but since it's checking (n + 4), it won't catch cases where n is very large, (u32)(-4) or higher. I'm not sure exactly what the consequences are, but we've seen crashes soon after. Instead, just break these up into two READ_BUF()s. Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-03-01 13:02:57 -08:00
Herbert Xu	1edb82d202	nfsd: Use shash This patch replaces uses of the long obsolete hash interface with shash. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2016-01-27 20:36:13 +08:00
Al Viro	5955102c99	wrappers for ->i_mutex access parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested}, inode_foo(inode) being mutex_foo(&inode->i_mutex). Please, use those for access to ->i_mutex; over the coming cycle ->i_mutex will become rwsem, with ->lookup() done with it held only shared. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-01-22 18:04:28 -05:00
Linus Torvalds	cc80fe0eef	Smaller bugfixes and cleanup, including a fix for a failures of kerberized NFSv4.1 mounts, and Scott Mayhew's work addressing ACK storms that can affect some high-availability NFS setups. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJWmVZHAAoJECebzXlCjuG+Cn4P/3zwSuwIeLuv9b89vzFXU8Xv AbBWHk7WkFXJQGTKdclYjwxqU+l15D5lYHCae1cuD5eXdviraxXf7EcnqrMhJUc0 oRiQx0rAwlEkKUAVrxGCFP7WKjlX3TsEBV6wPpTCP3BEMzTPDEeaDek7+hICFkLF 9a/miEXAopm3jxP7WNmXEkdKpFEHklDDwtv6Av7iIKCW6+7XCGp7Prqo4NQKAKp6 hjE+nvt2HiD06MZhUeyb14cn6547smzt1rbSfK4IB4yHMwLyaoqPrT7ekDh9LDrE uGgo+Y2PBbEcTAE6tJ88EjZx7cMCFPn0te+eKPgnpPy9RqrNqSxj5N/b7JAecKgW a/09BtvFOoYs8fO5ovqeRY5THrE3IRyMIwn4gt7fCYaaAbG3dwGKG1uklTAVXtb1 95DkhOb8He2VhOCCoJ6ybbTnRfjB6b/cv7ZuEGlQfvTE+BtU3Jj9I76ruWFhb3zd HM1dRI20UfwL/0Y8yYhZ+/rje9SSk2jOmVgSCqY9hnCmEqOqOdUU0X/uumIWaBym zfGx9GIM0jQuYVdLQRXtJJbUgJUUN3MilGyU5wx7YoXip5guqTalXqAdQpShzXeW s1ATYh/mY5X9ig51KogkkVlm9bXDQAzJBAnDRpLtJZqy5Cgkrj9RSu0ExN1Rmlhw LKQCddBQxUSWJ+XWycgK =G7V3 -----END PGP SIGNATURE----- Merge tag 'nfsd-4.5' of git://linux-nfs.org/~bfields/linux Pull nfsd updates from Bruce Fields: "Smaller bugfixes and cleanup, including a fix for a failures of kerberized NFSv4.1 mounts, and Scott Mayhew's work addressing ACK storms that can affect some high-availability NFS setups" * tag 'nfsd-4.5' of git://linux-nfs.org/~bfields/linux: nfsd: add new io class tracepoint nfsd: give up on CB_LAYOUTRECALLs after two lease periods nfsd: Fix nfsd leaks sunrpc module references lockd: constify nlmsvc_binding structure lockd: use to_delayed_work nfsd: use to_delayed_work Revert "svcrdma: Do not send XDR roundup bytes for a write chunk" lockd: Register callbacks on the inetaddr_chain and inet6addr_chain nfsd: Register callbacks on the inetaddr_chain and inet6addr_chain sunrpc: Add a function to close temporary transports immediately nfsd: don't base cl_cb_status on stale information nfsd4: fix gss-proxy 4.1 mounts for some AD principals nfsd: fix unlikely NULL deref in mach_creds_match nfsd: minor consolidation of mach_cred handling code nfsd: helper for dup of possibly NULL string svcrpc: move some initialization to common code nfsd: fix a warning message nfsd: constify nfsd4_callback_ops structure nfsd: recover: constify nfsd4_client_tracking_ops structures svcrdma: Do not send XDR roundup bytes for a write chunk	2016-01-15 12:49:44 -08:00
Jeff Layton	6e8b50d16a	nfsd: add new io class tracepoint Add some new tracepoints in the nfsd read/write codepaths. The idea is that this will give us the ability to measure how long each phase of a read or write operation takes. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-01-14 17:32:51 -05:00
Linus Torvalds	33caf82acf	Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc vfs updates from Al Viro: "All kinds of stuff. That probably should've been 5 or 6 separate branches, but by the time I'd realized how large and mixed that bag had become it had been too close to -final to play with rebasing. Some fs/namei.c cleanups there, memdup_user_nul() introduction and switching open-coded instances, burying long-dead code, whack-a-mole of various kinds, several new helpers for ->llseek(), assorted cleanups and fixes from various people, etc. One piece probably deserves special mention - Neil's lookup_one_len_unlocked(). Similar to lookup_one_len(), but gets called without ->i_mutex and tries to avoid ever taking it. That, of course, means that it's not useful for any directory modifications, but things like getting inode attributes in nfds readdirplus are fine with that. I really should've asked for moratorium on lookup-related changes this cycle, but since I hadn't done that early enough... I am asking for that for the coming cycle, though - I'm going to try and get conversion of i_mutex to rwsem with ->lookup() done under lock taken shared. There will be a patch closer to the end of the window, along the lines of the one Linus had posted last May - mechanical conversion of ->i_mutex accesses to inode_lock()/inode_unlock()/inode_trylock()/ inode_is_locked()/inode_lock_nested(). To quote Linus back then: ----- \| This is an automated patch using \| \| sed 's/mutex_lock(&\(.\)->i_mutex)/inode_lock(\1)/' \| sed 's/mutex_unlock(&\(.\)->i_mutex)/inode_unlock(\1)/' \| sed 's/mutex_lock_nested(&\(.\)->i_mutex,[ ]I_MUTEX_\([A-Z0-9_]\))/inode_lock_nested(\1, I_MUTEX_\2)/' \| sed 's/mutex_is_locked(&\(.\)->i_mutex)/inode_is_locked(\1)/' \| sed 's/mutex_trylock(&\(.\)->i_mutex)/inode_trylock(\1)/' \| \| with a very few manual fixups ----- I'm going to send that once the ->i_mutex-affecting stuff in -next gets mostly merged (or when Linus says he's about to stop taking merges)" 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits) nfsd: don't hold i_mutex over userspace upcalls fs:affs:Replace time_t with time64_t fs/9p: use fscache mutex rather than spinlock proc: add a reschedule point in proc_readfd_common() logfs: constify logfs_block_ops structures fcntl: allow to set O_DIRECT flag on pipe fs: __generic_file_splice_read retry lookup on AOP_TRUNCATED_PAGE fs: xattr: Use kvfree() [s390] page_to_phys() always returns a multiple of PAGE_SIZE nbd: use ->compat_ioctl() fs: use block_device name vsprintf helper lib/vsprintf: add %*pg format specifier fs: use gendisk->disk_name where possible poll: plug an unused argument to do_poll amdkfd: don't open-code memdup_user() cdrom: don't open-code memdup_user() rsxx: don't open-code memdup_user() mtip32xx: don't open-code memdup_user() [um] mconsole: don't open-code memdup_user_nul() [um] hostaudio: don't open-code memdup_user() ...	2016-01-12 17:11:47 -08:00
Linus Torvalds	fce205e9da	Merge branch 'work.copy_file_range' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs copy_file_range updates from Al Viro: "Several series around copy_file_range/CLONE" * 'work.copy_file_range' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: btrfs: use new dedupe data function pointer vfs: hoist the btrfs deduplication ioctl to the vfs vfs: wire up compat ioctl for CLONE/CLONE_RANGE cifs: avoid unused variable and label nfsd: implement the NFSv4.2 CLONE operation nfsd: Pass filehandle to nfs4_preprocess_stateid_op() vfs: pull btrfs clone API to vfs layer locks: new locks_mandatory_area calling convention vfs: Add vfs_copy_file_range() support for pagecache copies btrfs: add .copy_file_range file operation x86: add sys_copy_file_range to syscall tables vfs: add copy_file_range syscall and vfs helper	2016-01-12 16:30:34 -08:00
NeilBrown	bbddca8e8f	nfsd: don't hold i_mutex over userspace upcalls We need information about exports when crossing mountpoints during lookup or NFSv4 readdir. If we don't already have that information cached, we may have to ask (and wait for) rpc.mountd. In both cases we currently hold the i_mutex on the parent of the directory we're asking rpc.mountd about. We've seen situations where rpc.mountd performs some operation on that directory that tries to take the i_mutex again, resulting in deadlock. With some care, we may be able to avoid that in rpc.mountd. But it seems better just to avoid holding a mutex while waiting on userspace. It appears that lookup_one_len is pretty much the only operation that needs the i_mutex. So we could just drop the i_mutex elsewhere and do something like mutex_lock() lookup_one_len() mutex_unlock() In many cases though the lookup would have been cached and not required the i_mutex, so it's more efficient to create a lookup_one_len() variant that only takes the i_mutex when necessary. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-01-09 03:07:52 -05:00
Jeff Layton	6b9b21073d	nfsd: give up on CB_LAYOUTRECALLs after two lease periods Have the CB_LAYOUTRECALL code treat NFS4_OK and NFS4ERR_DELAY returns equivalently. Change the code to periodically resend CB_LAYOUTRECALLS until the ls_layouts list is empty or the client returns a different error code. If we go for two lease periods without the list being emptied or the client sending a hard error, then we give up and clean out the list anyway. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Tested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-01-08 16:47:51 -05:00
Kinglong Mee	691412b443	nfsd: Fix nfsd leaks sunrpc module references Stefan Hajnoczi reports, nfsd leaks 3 references to the sunrpc module here: # echo -n "asdf 1234" >/proc/fs/nfsd/portlist bash: echo: write error: Protocol not supported Now stop nfsd and try unloading the kernel modules: # systemctl stop nfs-server # systemctl stop nfs # systemctl stop proc-fs-nfsd.mount # systemctl stop var-lib-nfs-rpc_pipefs.mount # rmmod nfsd # rmmod nfs_acl # rmmod lockd # rmmod auth_rpcgss # rmmod sunrpc rmmod: ERROR: Module sunrpc is in use # lsmod \| grep rpc sunrpc 315392 3 It is caused by nfsd don't cleanup rpcb program for nfsd when destroying svc service after creating xprt fail. Reported-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-01-07 10:10:51 -05:00
Julia Lawall	2a297450dd	lockd: constify nlmsvc_binding structure The nlmsvc_binding structure is never modified, so declare it as const. Done with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-01-07 10:10:50 -05:00
Geliang Tang	2e55f3ab45	nfsd: use to_delayed_work Use to_delayed_work() instead of open-coding it. Signed-off-by: Geliang Tang <geliangtang@163.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-01-07 10:10:49 -05:00
Scott Mayhew	366849966f	nfsd: Register callbacks on the inetaddr_chain and inet6addr_chain Register callbacks on inetaddr_chain and inet6addr_chain to trigger cleanup of nfsd transport sockets when an ip address is deleted. Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2015-12-23 10:08:15 -05:00
J. Bruce Fields	d4f72cb7fa	nfsd: don't base cl_cb_status on stale information The rpc client we use to send callbacks may change occasionally. (In the 4.0 case, the client can use setclientid/setclientid_confirm to update the callback parameters. In the 4.1+ case, sessions and connections can come and go.) The update is done from the callback thread by nfsd4_process_cb_update, which shuts down the old rpc client and then creates a new one. The client shutdown kills any ongoing rpc calls. There won't be any new ones till the new one's created and the callback thread moves on. When an rpc encounters a problem that may suggest callback rpc's aren't working any longer, it normally sets NFSD4_CB_DOWN, so the server can tell the client something's wrong. But if the rpc notices CB_UPDATE is set, then the failure may just be a normal result of shutting down the callback client. Or it could just be a coincidence, but in any case, it means we're runing with the old about-to-be-discarded client, so the failure's not interesting. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2015-12-23 10:08:14 -05:00
Jeff Layton	be20aa00c6	nfsd: don't hold ls_mutex across a layout recall We do need to serialize layout stateid morphing operations, but we currently hold the ls_mutex across a layout recall which is pretty ugly. It's also unnecessary -- once we've bumped the seqid and copied it, we don't need to serialize the rest of the CB_LAYOUTRECALL vs. anything else. Just drop the mutex once the copy is done. This was causing a "workqueue leaked lock or atomic" warning and an occasional deadlock. There's more work to be done here but this fixes the immediate regression. Fixes: `cc8a55320b` "nfsd: serialize layout stateid morphing operations" Cc: stable@vger.kernel.org Reported-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2015-12-16 11:49:58 -05:00
Christoph Hellwig	ffa0160a10	nfsd: implement the NFSv4.2 CLONE operation This is basically a remote version of the btrfs CLONE operation, so the implementation is fairly trivial. Made even more trivial by stealing the XDR code and general framework Anna Schumaker's COPY prototype. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-12-07 23:12:00 -05:00
Anna Schumaker	aa0d6aed45	nfsd: Pass filehandle to nfs4_preprocess_stateid_op() This will be needed so COPY can look up the saved_fh in addition to the current_fh. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-12-07 23:11:52 -05:00
J. Bruce Fields	414ca017a5	nfsd4: fix gss-proxy 4.1 mounts for some AD principals The principal name on a gss cred is used to setup the NFSv4.0 callback, which has to have a client principal name to authenticate to. That code wants the name to be in the form servicetype@hostname. rpc.svcgssd passes down such names (and passes down no principal name at all in the case the principal isn't a service principal). gss-proxy always passes down the principal name, and passes it down in the form servicetype/hostname@REALM. So we've been munging the name gss-proxy passes down into the format the NFSv4.0 callback code expects, or throwing away the name if we can't. Since the introduction of the MACH_CRED enforcement in NFSv4.1, we've also been using the principal name to verify that certain operations are done as the same principal as was used on the original EXCHANGE_ID call. For that application, the original name passed down by gss-proxy is also useful. Lack of that name in some cases was causing some kerberized NFSv4.1 mount failures in an Active Directory environment. This fix only works in the gss-proxy case. The fix for legacy rpc.svcgssd would be more involved, and rpc.svcgssd already has other problems in the AD case. Reported-and-tested-by: James Ralston <ralston@pobox.com> Acked-by: Simo Sorce <simo@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2015-11-24 11:36:31 -07:00
J. Bruce Fields	920dd9bb7d	nfsd: fix unlikely NULL deref in mach_creds_match We really shouldn't allow a client to be created with cl_mach_cred set unless it also has a principal name. This also allows us to fail such cases immediately on EXCHANGE_ID as opposed to waiting and incorrectly returning WRONG_CRED on the following CREATE_SESSION. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2015-11-24 10:39:18 -07:00
J. Bruce Fields	50c7b948ad	nfsd: minor consolidation of mach_cred handling code Minor cleanup, no change in functionality. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2015-11-24 10:39:18 -07:00
J. Bruce Fields	5004385932	nfsd: helper for dup of possibly NULL string Technically the initialization in the NULL case isn't even needed as the only caller already has target zeroed out, but it seems safer to keep copy_cred generic. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2015-11-24 10:39:17 -07:00
Dan Carpenter	d3f03403a8	nfsd: fix a warning message The WARN() macro takes a condition and a format string. The condition was accidentally left out here so it just prints the function name instead of the message. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Jeff Layton <jlayton@poochiereds.net> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2015-11-23 12:15:31 -07:00
Julia Lawall	c4cb897462	nfsd: constify nfsd4_callback_ops structure The nfsd4_callback_ops structure is never modified, so declare it as const. Done with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2015-11-23 12:15:31 -07:00
Julia Lawall	7c582e4faa	nfsd: recover: constify nfsd4_client_tracking_ops structures The nfsd4_client_tracking_ops structures are never modified, so declare them as const. Done with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Reviewed-by: Jeff Layton <jlayton@poochiereds.net> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2015-11-23 12:15:30 -07:00
Andrew Elble	7fc0564e3a	nfsd: fix race with open / open upgrade stateids We observed multiple open stateids on the server for files that seemingly should have been closed. nfsd4_process_open2() tests for the existence of a preexisting stateid. If one is not found, the locks are dropped and a new one is created. The problem is that init_open_stateid(), which is also responsible for hashing the newly initialized stateid, doesn't check to see if another open has raced in and created a matching stateid. This fix is to enable init_open_stateid() to return the matching stateid and have nfsd4_process_open2() swap to that stateid and switch to the open upgrade path. In testing this patch, coverage to the newly created path indicates that the race was indeed happening. Signed-off-by: Andrew Elble <aweits@rit.edu> Reviewed-by: Jeff Layton <jlayton@poochiereds.net> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2015-11-10 09:29:45 -05:00
Andrew Elble	34ed9872e7	nfsd: eliminate sending duplicate and repeated delegations We've observed the nfsd server in a state where there are multiple delegations on the same nfs4_file for the same client. The nfs client does attempt to DELEGRETURN these when they are presented to it - but apparently under some (unknown) circumstances the client does not manage to return all of them. This leads to the eventual attempt to CB_RECALL more than one delegation with the same nfs filehandle to the same client. The first recall will succeed, but the next recall will fail with NFS4ERR_BADHANDLE. This leads to the server having delegations on cl_revoked that the client has no way to FREE or DELEGRETURN, with resulting inability to recover. The state manager on the server will continually assert SEQ4_STATUS_RECALLABLE_STATE_REVOKED, and the state manager on the client will be looping unable to satisfy the server. List discussion also reports a race between OPEN and DELEGRETURN that will be avoided by only sending the delegation once to the client. This is also logically in accordance with RFC5561 9.1.1 and 10.2. So, let's: 1.) Not hand out duplicate delegations. 2.) Only send them to the client once. RFC 5561: 9.1.1: "Delegations and layouts, on the other hand, are not associated with a specific owner but are associated with the client as a whole (identified by a client ID)." 10.2: "...the stateid for a delegation is associated with a client ID and may be used on behalf of all the open-owners for the given client. A delegation is made to the client as a whole and not to any specific process or thread of control within it." Reported-by: Eric Meddaugh <etmsys@rit.edu> Cc: Trond Myklebust <trond.myklebust@primarydata.com> Cc: Olga Kornievskaia <aglo@umich.edu> Signed-off-by: Andrew Elble <aweits@rit.edu> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2015-11-10 09:29:44 -05:00
Jeff Layton	3e80dbcda7	nfsd: remove recurring workqueue job to clean DRC We have a shrinker, we clean out the cache when nfsd is shut down, and prune the chains on each request. A recurring workqueue job seems like unnecessary overhead. Just remove it. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2015-11-10 09:25:51 -05:00
Jeff Layton	9767feb2c6	nfsd: ensure that seqid morphing operations are atomic wrt to copies Bruce points out that the increment of the seqid in stateids is not serialized in any way, so it's possible for racing calls to bump it twice and end up sending the same stateid. While we don't have any reports of this problem it _is_ theoretically possible, and could lead to spurious state recovery by the client. In the current code, update_stateid is always followed by a memcpy of that stateid, so we can combine the two operations. For better atomicity, we add a spinlock to the nfs4_stid and hold that when bumping the seqid and copying the stateid. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2015-10-23 15:57:33 -04:00
Jeff Layton	cc8a55320b	nfsd: serialize layout stateid morphing operations In order to allow the client to make a sane determination of what happened with racing LAYOUTGET/LAYOUTRETURN/CB_LAYOUTRECALL calls, we must ensure that the seqids return accurately represent the order of operations. The simplest way to do that is to ensure that operations on a single stateid are serialized. This patch adds a mutex to the layout stateid, and locks it when checking the layout stateid's seqid. The mutex is held over the entire operation and released after the seqid is bumped. Note that in the case of CB_LAYOUTRECALL we must move the increment of the seqid and setting into a new cb "prepare" operation. The lease infrastructure will call the lm_break callback with a spinlock held, so and we can't take the mutex in that codepath. Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2015-10-23 15:57:32 -04:00
J. Bruce Fields	4eaea13425	nfsd: improve client_has_state to check for unused openowners At least in the v4.0 case openowners can hang around for a while after last close, but they shouldn't really block (for example), a new mount with a different principal. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2015-10-23 15:57:31 -04:00
J. Bruce Fields	2b63482185	nfsd: fix clid_inuse on mount with security change In bakeathon testing Solaris client was getting CLID_INUSE error when doing a krb5 mount soon after an auth_sys mount, or vice versa. That's not really necessary since in this case the old client doesn't have any state any more: http://tools.ietf.org/html/rfc7530#page-103 "when the server gets a SETCLIENTID for a client ID that currently has no state, or it has state but the lease has expired, rather than returning NFS4ERR_CLID_INUSE, the server MUST allow the SETCLIENTID and confirm the new client ID if followed by the appropriate SETCLIENTID_CONFIRM." This doesn't fix the problem completely since our client_has_state() check counts openowners left around to handle close replays, which we should probably just remove in this case. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2015-10-23 15:57:30 -04:00
Jeff Layton	825213e59e	nfsd: move include of state.h from trace.c to trace.h Any file which includes trace.h will need to include state.h, even if they aren't using any state tracepoints. Ensure that we include any headers that might be needed in trace.h instead of relying on the *.c files to have the right ones. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2015-10-23 15:57:29 -04:00
Jeff Layton	aaf91ec148	nfsd: switch unsigned char flags in svc_fh to bools ...just for clarity. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2015-10-12 17:31:04 -04:00
Jeff Layton	fcaba026a5	nfsd: move svc_fh->fh_maxsize to just after fh_handle This moves the hole in the struct down below the flags fields, which allows us to potentially add a new flag without growing the struct. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2015-10-12 17:31:04 -04:00
Julia Lawall	e79017ddce	nfsd: drop null test before destroy functions Remove unneeded NULL test. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression x; @@ -if (x != NULL) { \(kmem_cache_destroy\\|mempool_destroy\\|dma_pool_destroy\)(x); x = NULL; -} // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2015-10-12 17:31:04 -04:00
Jeff Layton	35a92fe877	nfsd: serialize state seqid morphing operations Andrew was seeing a race occur when an OPEN and OPEN_DOWNGRADE were running in parallel. The server would receive the OPEN_DOWNGRADE first and check its seqid, but then an OPEN would race in and bump it. The OPEN_DOWNGRADE would then complete and bump the seqid again. The result was that the OPEN_DOWNGRADE would be applied after the OPEN, even though it should have been rejected since the seqid changed. The only recourse we have here I think is to serialize operations that bump the seqid in a stateid, particularly when we're given a seqid in the call. To address this, we add a new rw_semaphore to the nfs4_ol_stateid struct. We do a down_write prior to checking the seqid after looking up the stateid to ensure that nothing else is going to bump it while we're operating on it. In the case of OPEN, we do a down_read, as the call doesn't contain a seqid. Those can run in parallel -- we just need to serialize them when there is a concurrent OPEN_DOWNGRADE or CLOSE. LOCK and LOCKU however always take the write lock as there is no opportunity for parallelizing those. Reported-and-Tested-by: Andrew W Elble <aweits@rit.edu> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2015-10-12 17:31:03 -04:00
Christoph Hellwig	8c3ad9cb73	nfsd/blocklayout: accept any minlength Recent Linux clients have started to send GETLAYOUT requests with minlength less than blocksize. Servers aren't really allowed to impose this kind of restriction on layouts; see RFC 5661 section 18.43.3 for details. This has been observed to cause indefinite hangs on fsx runs on some clients. Cc: stable@vger.kernel.org Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2015-10-09 16:11:40 -04:00
Linus Torvalds	4e4adb2f46	NFS client updates for Linux 4.3 Highlights include: Stable patches: - Fix atomicity of pNFS commit list updates - Fix NFSv4 handling of open(O_CREAT\|O_EXCL\|O_RDONLY) - nfs_set_pgio_error sometimes misses errors - Fix a thinko in xs_connect() - Fix borkage in _same_data_server_addrs_locked() - Fix a NULL pointer dereference of migration recovery ops for v4.2 client - Don't let the ctime override attribute barriers. - Revert "NFSv4: Remove incorrect check in can_open_delegated()" - Ensure flexfiles pNFS driver updates the inode after write finishes - flexfiles must not pollute the attribute cache with attrbutes from the DS - Fix a protocol error in layoutreturn - Fix a protocol issue with NFSv4.1 CLOSE stateids Bugfixes + cleanups - pNFS blocks bugfixes from Christoph - Various cleanups from Anna - More fixes for delegation corner cases - Don't fsync twice for O_SYNC/IS_SYNC files - Fix pNFS and flexfiles layoutstats bugs - pnfs/flexfiles: avoid duplicate tracking of mirror data - pnfs: Fix layoutget/layoutreturn/return-on-close serialisation issues. - pnfs/flexfiles: error handling retries a layoutget before fallback to MDS Features: - Full support for the OPEN NFS4_CREATE_EXCLUSIVE4_1 mode from Kinglong - More RDMA client transport improvements from Chuck - Removal of the deprecated ib_reg_phys_mr() and ib_rereg_phys_mr() verbs from the SUNRPC, Lustre and core infiniband tree. - Optimise away the close-to-open getattr if there is no cached data -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJV7chgAAoJEGcL54qWCgDyqJQP/3kto9VXnXcatC382jF9Pfj5 F55XeSnviOXH7CyiKA4nSBhnxg/sLuWOTpbkVI/4Y+VyWhLby9h+mtcKURHOlBnj d5BFoPwaBVDnUiKlHFQDkRjIyxjj2Sb6/uEb2V/u3v+3znR5AZZ4lzFx4cD85oaz mcru7yGiSxaQCIH6lHExcCEKXaDP5YdvS9YFsyQfv2976JSaQHM9ZG04E0v6MzTo E5wwC4CLMKmhuX9kmQMj85jzs1ASAKZ3N7b4cApTIo6F8DCDH0vKQphq/nEQC497 ECjEs5/fpxtNJUpSBu0gT7G4LCiW3PzE7pHa+8bhbaAn9OzxIR5+qWujKsfGYQhO Oomp3K9zO6omshAc5w4MkknPpbImjoZjGAj/q/6DbtrDpnD7DzOTirwYY2yX0CA8 qcL81uJUb8+j4jJj4RTO+lTUBItrM1XTqTSd/3eSMr5DDRVZj+ERZxh17TaxRBZL YrbrLHxCHrcbdSbPlovyvY+BwjJUUFJRcOxGQXLmNYR9u92fF59rb53bzVyzcRRO wBozzrNRCFL+fPgfNPLEapIb6VtExdM3rl2HYsJGckHj4DPQdnoB3ytIT9iEFZEN +/rE14XEZte7kuH3OP4el2UsP/hVsm7A49mtwrkdbd7rxMWD6XfQUp8DAggWUEtI 1H6T7RC1Y6wsu0X1fnVz =knJA -----END PGP SIGNATURE----- Merge tag 'nfs-for-4.3-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client updates from Trond Myklebust: "Highlights include: Stable patches: - Fix atomicity of pNFS commit list updates - Fix NFSv4 handling of open(O_CREAT\|O_EXCL\|O_RDONLY) - nfs_set_pgio_error sometimes misses errors - Fix a thinko in xs_connect() - Fix borkage in _same_data_server_addrs_locked() - Fix a NULL pointer dereference of migration recovery ops for v4.2 client - Don't let the ctime override attribute barriers. - Revert "NFSv4: Remove incorrect check in can_open_delegated()" - Ensure flexfiles pNFS driver updates the inode after write finishes - flexfiles must not pollute the attribute cache with attrbutes from the DS - Fix a protocol error in layoutreturn - Fix a protocol issue with NFSv4.1 CLOSE stateids Bugfixes + cleanups - pNFS blocks bugfixes from Christoph - Various cleanups from Anna - More fixes for delegation corner cases - Don't fsync twice for O_SYNC/IS_SYNC files - Fix pNFS and flexfiles layoutstats bugs - pnfs/flexfiles: avoid duplicate tracking of mirror data - pnfs: Fix layoutget/layoutreturn/return-on-close serialisation issues - pnfs/flexfiles: error handling retries a layoutget before fallback to MDS Features: - Full support for the OPEN NFS4_CREATE_EXCLUSIVE4_1 mode from Kinglong - More RDMA client transport improvements from Chuck - Removal of the deprecated ib_reg_phys_mr() and ib_rereg_phys_mr() verbs from the SUNRPC, Lustre and core infiniband tree. - Optimise away the close-to-open getattr if there is no cached data" * tag 'nfs-for-4.3-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (108 commits) NFSv4: Respect the server imposed limit on how many changes we may cache NFSv4: Express delegation limit in units of pages Revert "NFS: Make close(2) asynchronous when closing NFS O_DIRECT files" NFS: Optimise away the close-to-open getattr if there is no cached data NFSv4.1/flexfiles: Clean up ff_layout_write_done_cb/ff_layout_commit_done_cb NFSv4.1/flexfiles: Mark the layout for return in ff_layout_io_track_ds_error() nfs: Remove unneeded checking of the return value from scnprintf nfs: Fix truncated client owner id without proto type NFSv4.1/flexfiles: Mark layout for return if the mirrors are invalid NFSv4.1/flexfiles: RW layouts are valid only if all mirrors are valid NFSv4.1/flexfiles: Fix incorrect usage of pnfs_generic_mark_devid_invalid() NFSv4.1/flexfiles: Fix freeing of mirrors NFSv4.1/pNFS: Don't request a minimal read layout beyond the end of file NFSv4.1/pnfs: Handle LAYOUTGET return values correctly NFSv4.1/pnfs: Don't ask for a read layout for an empty file. NFSv4.1: Fix a protocol issue with CLOSE stateids NFSv4.1/flexfiles: Don't mark the entire deviceid as bad for file errors SUNRPC: Prevent SYN+SYNACK+RST storms SUNRPC: xs_reset_transport must mark the connection as disconnected NFSv4.1/pnfs: Ensure layoutreturn reserves space for the opaque payload ...	2015-09-07 14:02:24 -07:00
Andrew Elble	a457974f1b	nfsd: deal with DELEGRETURN racing with CB_RECALL We have observed the server sending recalls for delegation stateids that have already been successfully returned. Change nfsd4_cb_recall_done() to return success if the client has returned the delegation. While this does not completely eliminate the sending of recalls for delegations that have already been returned, this does prevent unnecessarily declaring the callback path to be down. Reported-by: Eric Meddaugh <etmsys@rit.edu> Signed-off-by: Andrew Elble <aweits@rit.edu> Acked-by: Jeff Layton <jlayton@poochiereds.net> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2015-09-02 10:05:28 -04:00
J. Bruce Fields	f984a7ce58	nfsd: return CLID_INUSE for unexpected SETCLIENTID_CONFIRM case Somebody with a Solaris client was hitting this case. We haven't figured out why yet, and don't have a reproducer. Meanwhile Frank noticed that RFC 7530 actually recommends CLID_INUSE for this case. Unlikely to help the original reporter, but may as well fix it. Reported-by: Frank Filz <ffilzlnx@mindspring.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2015-09-01 13:53:40 -04:00
Jeff Layton	3fcbbd244e	nfsd: ensure that delegation stateid hash references are only put once It's possible that a DELEGRETURN could race with (e.g.) client expiry, in which case we could end up putting the delegation hash reference more than once. Have unhash_delegation_locked return a bool that indicates whether it was already unhashed. In the case of destroy_delegation we only conditionally put the hash reference if that returns true. The other callers of unhash_delegation_locked call it while walking list_heads that shouldn't yet be detached. If we find that it doesn't return true in those cases, then throw a WARN_ON as that indicates that we have a partially hashed delegation, and that something is likely very wrong. Tested-by: Andrew W Elble <aweits@rit.edu> Tested-by: Anna Schumaker <Anna.Schumaker@netapp.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2015-08-31 16:32:16 -04:00
Jeff Layton	e85687393f	nfsd: ensure that the ol stateid hash reference is only put once When an open or lock stateid is hashed, we take an extra reference to it. When we unhash it, we drop that reference. The code however does not properly account for the case where we have two callers concurrently trying to unhash the stateid. This can lead to list corruption and the hash reference being put more than once. Fix this by having unhash_ol_stateid use list_del_init on the st_perfile list_head, and then testing to see if that list_head is empty before releasing the hash reference. This means that some of the unhashing wrappers now become bool return functions so we can test to see whether the stateid was unhashed before we put the reference. Reported-by: Andrew W Elble <aweits@rit.edu> Tested-by: Andrew W Elble <aweits@rit.edu> Reported-by: Anna Schumaker <Anna.Schumaker@netapp.com> Tested-by: Anna Schumaker <Anna.Schumaker@netapp.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2015-08-31 16:32:15 -04:00
Jeff Layton	51a5456859	nfsd: allow more than one laundry job to run at a time We can potentially have several nfs4_laundromat jobs running if there are multiple namespaces running nfsd on the box. Those are effectively separated from one another though, so I don't see any reason to serialize them. Also, create_singlethread_workqueue automatically adds the WQ_MEM_RECLAIM flag. Since we run this job on a timer, it's not really involved in any reclaim paths. I see no need for a rescuer thread. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2015-08-31 16:32:14 -04:00
Paul Gortmaker	46cc8ba304	nfsd: don't WARN/backtrace for invalid container deployment. These messages, combined with the backtrace they trigger, makes it seem like a serious problem, though a quick search shows distros marking it as a "won't fix" non-issue when the problem is reported by users. The backtrace is overkill, and only really manages to show that if you follow the code path, you can't really avoid it with bootargs or configuration settings in the container. Given that, lets tone it down a bit and get rid of the WARN severity, and the associated backtrace, so people aren't needlessly alarmed. Also, lets drop the split printk line, since they are grep unfriendly. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2015-08-31 16:32:08 -04:00

1 2 3 4 5 ...

2422 Commits