4493 Commits

Author SHA1 Message Date
Linus Torvalds
1c88e19b0f Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:
 "The rest of MM"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (101 commits)
  mm, compaction: simplify contended compaction handling
  mm, compaction: introduce direct compaction priority
  mm, thp: remove __GFP_NORETRY from khugepaged and madvised allocations
  mm, page_alloc: make THP-specific decisions more generic
  mm, page_alloc: restructure direct compaction handling in slowpath
  mm, page_alloc: don't retry initial attempt in slowpath
  mm, page_alloc: set alloc_flags only once in slowpath
  lib/stackdepot.c: use __GFP_NOWARN for stack allocations
  mm, kasan: switch SLUB to stackdepot, enable memory quarantine for SLUB
  mm, kasan: account for object redzone in SLUB's nearest_obj()
  mm: fix use-after-free if memory allocation failed in vma_adjust()
  zsmalloc: Delete an unnecessary check before the function call "iput"
  mm/memblock.c: fix index adjustment error in __next_mem_range_rev()
  mem-hotplug: alloc new page from a nearest neighbor node when mem-offline
  mm: optimize copy_page_to/from_iter_iovec
  mm: add cond_resched() to generic_swapfile_activate()
  Revert "mm, mempool: only set __GFP_NOMEMALLOC if there are free elements"
  mm, compaction: don't isolate PageWriteback pages in MIGRATE_SYNC_LIGHT mode
  mm: hwpoison: remove incorrect comments
  make __section_nr() more efficient
  ...
2016-07-28 16:36:48 -07:00
Mel Gorman
11fb998986 mm: move most file-based accounting to the node
There are now a number of accounting oddities such as mapped file pages
being accounted for on the node while the total number of file pages are
accounted on the zone.  This can be coped with to some extent but it's
confusing so this patch moves the relevant file-based accounted.  Due to
throttling logic in the page allocator for reliable OOM detection, it is
still necessary to track dirty and writeback pages on a per-zone basis.

[mgorman@techsingularity.net: fix NR_ZONE_WRITE_PENDING accounting]
  Link: http://lkml.kernel.org/r/1468404004-5085-5-git-send-email-mgorman@techsingularity.net
Link: http://lkml.kernel.org/r/1467970510-21195-20-git-send-email-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-28 16:07:41 -07:00
Linus Torvalds
6784725ab0 Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:
 "Assorted cleanups and fixes.

  Probably the most interesting part long-term is ->d_init() - that will
  have a bunch of followups in (at least) ceph and lustre, but we'll
  need to sort the barrier-related rules before it can get used for
  really non-trivial stuff.

  Another fun thing is the merge of ->d_iput() callers (dentry_iput()
  and dentry_unlink_inode()) and a bunch of ->d_compare() ones (all
  except the one in __d_lookup_lru())"

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (26 commits)
  fs/dcache.c: avoid soft-lockup in dput()
  vfs: new d_init method
  vfs: Update lookup_dcache() comment
  bdev: get rid of ->bd_inodes
  Remove last traces of ->sync_page
  new helper: d_same_name()
  dentry_cmp(): use lockless_dereference() instead of smp_read_barrier_depends()
  vfs: clean up documentation
  vfs: document ->d_real()
  vfs: merge .d_select_inode() into .d_real()
  unify dentry_iput() and dentry_unlink_inode()
  binfmt_misc: ->s_root is not going anywhere
  drop redundant ->owner initializations
  ufs: get rid of redundant checks
  orangefs: constify inode_operations
  missed comment updates from ->direct_IO() prototype change
  file_inode(f)->i_mapping is f->f_mapping
  trim fsnotify hooks a bit
  9p: new helper - v9fs_parent_fid()
  debugfs: ->d_parent is never NULL or negative
  ...
2016-07-28 12:59:05 -07:00
Linus Torvalds
554828ee0d Merge branch 'salted-string-hash'
This changes the vfs dentry hashing to mix in the parent pointer at the
_beginning_ of the hash, rather than at the end.

That actually improves both the hash and the code generation, because we
can move more of the computation to the "static" part of the dcache
setup, and do less at lookup runtime.

It turns out that a lot of other hash users also really wanted to mix in
a base pointer as a 'salt' for the hash, and so the slightly extended
interface ends up working well for other cases too.

Users that want a string hash that is purely about the string pass in a
'salt' pointer of NULL.

* merge branch 'salted-string-hash':
  fs/dcache.c: Save one 32-bit multiply in dcache lookup
  vfs: make the string hashes salt the hash
2016-07-28 12:26:31 -07:00
Benjamin Coddington
944171cbf4 pNFS: Actively set attributes as invalid if LAYOUTCOMMIT is outstanding
A LAYOUTCOMMIT then subsequent GETATTR may both return the same attributes,
and in that case NFS_INO_INVALID_ATTR is never set on the second pass
through nfs_update_inode().  The existing check to skip the clearing of
NFS_INO_INVALID_ATTR if a LAYOUTCOMMIT is outstanding does not help in this
case (see commit 10b7e9ad4488: "pNFS: Don't mark the inode as revalidated
if a LAYOUTCOMMIT is outstanding").  We know that if a LAYOUTCOMMIT is
outstanding then attributes will need upating, so always set
NFS_INO_INVALID_ATTR.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-28 14:49:08 -04:00
Linus Torvalds
d05d7f4079 Merge branch 'for-4.8/core' of git://git.kernel.dk/linux-block
Pull core block updates from Jens Axboe:

   - the big change is the cleanup from Mike Christie, cleaning up our
     uses of command types and modified flags.  This is what will throw
     some merge conflicts

   - regression fix for the above for btrfs, from Vincent

   - following up to the above, better packing of struct request from
     Christoph

   - a 2038 fix for blktrace from Arnd

   - a few trivial/spelling fixes from Bart Van Assche

   - a front merge check fix from Damien, which could cause issues on
     SMR drives

   - Atari partition fix from Gabriel

   - convert cfq to highres timers, since jiffies isn't granular enough
     for some devices these days.  From Jan and Jeff

   - CFQ priority boost fix idle classes, from me

   - cleanup series from Ming, improving our bio/bvec iteration

   - a direct issue fix for blk-mq from Omar

   - fix for plug merging not involving the IO scheduler, like we do for
     other types of merges.  From Tahsin

   - expose DAX type internally and through sysfs.  From Toshi and Yigal

* 'for-4.8/core' of git://git.kernel.dk/linux-block: (76 commits)
  block: Fix front merge check
  block: do not merge requests without consulting with io scheduler
  block: Fix spelling in a source code comment
  block: expose QUEUE_FLAG_DAX in sysfs
  block: add QUEUE_FLAG_DAX for devices to advertise their DAX support
  Btrfs: fix comparison in __btrfs_map_block()
  block: atari: Return early for unsupported sector size
  Doc: block: Fix a typo in queue-sysfs.txt
  cfq-iosched: Charge at least 1 jiffie instead of 1 ns
  cfq-iosched: Fix regression in bonnie++ rewrite performance
  cfq-iosched: Convert slice_resid from u64 to s64
  block: Convert fifo_time from ulong to u64
  blktrace: avoid using timespec
  block/blk-cgroup.c: Declare local symbols static
  block/bio-integrity.c: Add #include "blk.h"
  block/partition-generic.c: Remove a set-but-not-used variable
  block: bio: kill BIO_MAX_SIZE
  cfq-iosched: temporarily boost queue priority for idle classes
  block: drbd: avoid to use BIO_MAX_SIZE
  block: bio: remove BIO_MAX_SECTORS
  ...
2016-07-26 15:03:07 -07:00
Trond Myklebust
698c937b0d NFSv4: Clean up lookup of SECINFO_NO_NAME
Use the minor version ops cached in struct nfs_client instead of looking
them up again.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-26 10:59:23 -04:00
Trond Myklebust
6fdf339b0c NFSv4.2: Fix warning "variable ‘stateids’ set but not used"
Replace it with a test for whether or not the sent a stateid in violation
of what we asked for.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24 17:36:06 -04:00
Trond Myklebust
139978239b NFSv4: Fix warning "no previous prototype for ‘nfs4_listxattr’"
Make it static

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24 17:35:56 -04:00
Trond Myklebust
1592c4d62a Merge branch 'nfs-rdma' 2016-07-24 17:09:02 -04:00
Trond Myklebust
668f455dac Merge branch 'pnfs' 2016-07-24 17:08:59 -04:00
Trond Myklebust
362745268c Merge branch 'writeback' 2016-07-24 17:08:31 -04:00
Trond Myklebust
01d7b29f0e pNFS: Remove redundant smp_mb() from pnfs_init_lseg()
It's not visible yet, and won't be until after we grab the inode->i_lock.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24 16:16:43 -04:00
Trond Myklebust
119cef97a4 pNFS: Cleanup - do layout segment initialisation in one place
...instead of splitting the initialisation over init_lseg() and
pnfs_layout_process().

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24 16:16:42 -04:00
Trond Myklebust
28c1acffea pNFS: Remove redundant stateid invalidation
The layout stateid will be invalidated once it holds no more layout
segments anyway.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24 16:16:42 -04:00
Trond Myklebust
f71dfe8fc9 pNFS: Remove redundant pnfs_mark_layout_returned_if_empty()
That's already being taken care of in pnfs_layout_remove_lseg().

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24 16:16:41 -04:00
Trond Myklebust
d9b61708fe pNFS: Clear the layout metadata if the server changed the layout stateid
If the server changed the layout stateid's "other" field, then
we should treat the old layout as being completely gone. In that
case, we want to clear the metadata such as scheduled layoutreturns.

Do this by calling pnfs_mark_layout_stateid_invalid().

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24 16:16:41 -04:00
Trond Myklebust
5f46be049b pNFS: Cleanup - don't open code pnfs_mark_layout_stateid_invalid()
Ensure nfs42_layoutstat_done() layoutget don't open code layout stateid
invalidation.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24 16:16:40 -04:00
Trond Myklebust
e036f46453 NFS: pnfs_mark_matching_lsegs_return() should match the layout sequence id
When determining which layout segments to return, we do want
pnfs_mark_matching_lsegs_return to check that they match the layout
sequence id. This ensures that we don't waste time if the server
is replaying a layout recall that has already been satisfied.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24 16:16:40 -04:00
Trond Myklebust
2d6cf5ab0b pNFS: Do not set plh_return_seq for non-callback related layoutreturns
In cases where we need to send a layoutreturn in order to propagate
an error, we should not tie that to a specific layout stateid.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24 16:16:39 -04:00
Trond Myklebust
e5fd1904b8 pNFS: Ensure layoutreturn acts as a completion for layout callbacks
When we return NFS_OK to the CB_LAYOUTRECALL, we are required to
send a layoutreturn that "completes" that layout recall request, using
the correct stateid.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24 16:16:39 -04:00
Trond Myklebust
793b7fe558 pNFS: Fix CB_LAYOUTRECALL stateid verification
We want to evaluate in this order:

If the client holds no layout for this inode, then return
NFS4ERR_NOMATCHING_LAYOUT; it probably forgot the layout.

If the client finds the inode among the list of layouts, but the corresponding
stateid has not yet been initialised, then return NFS4ERR_DELAY to ask the
server to retry once the outstanding LAYOUTGET is complete.

If the current layout stateid's "other" field does not match the recalled
stateid, return NFS4ERR_BAD_STATEID.

If already processing a layout recall with a newer stateid, return
NFS4ERR_OLD_STATEID. This can only happens for servers that are
non-compliant with the NFSv4.1 protocol.

If already processing a layout recall with an older stateid, return
NFS4ERR_DELAY to ask the server to retry once the outstanding
LAYOUTRETURN is complete. Again, this is technically incompliant with
the NFSv4.1 protocol.

If the current layout sequence id is newer than the recalled stateid's
sequence id, return NFS4ERR_OLD_STATEID. This too implies protocol
non-compliance.

If the current layout sequence id is older than the recalled stateid's
sequence id+1, return NFS4ERR_DELAY.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24 16:16:38 -04:00
Trond Myklebust
ecebb80bf3 pNFS: Always update the layout barrier seqid on LAYOUTGET
Currently, pnfs_set_layout_stateid() will update the layout sequence
id barrier only if the stateid itself is newer than the current
layout stateid. However in a situation where multiple LAYOUTGET calls
and a LAYOUTRETURN raced, it is entirely possible for one of the
LAYOUTGET to set the current stateid to something newer than the
LAYOUTRETURN that needs to set the barrier.

The fix is to allow the "update_barrier" flag to force a check as to
whether or not the barrier needs to be updated.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24 16:16:38 -04:00
Trond Myklebust
13bede18de pNFS: Always update the layout stateid if NFS_LAYOUT_INVALID_STID is set
If the layout stateid is invalid, then pnfs_set_layout_stateid() must
always initialise it.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24 16:16:25 -04:00
Trond Myklebust
8e0acf9046 pNFS: Clear the layout return tracking on layout reinitialisation
Ensure that we don't carry over layoutreturn info from a previous
incarnation of this layout.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24 12:51:49 -04:00
Trond Myklebust
45fcc7bca7 pNFS: LAYOUTRETURN should only update the stateid if the layout is valid
If the layout was completely returned, then ignore the returned layout
stateid.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24 12:51:49 -04:00
Trond Myklebust
dc05973b28 Merge commit 'e7bdea7750eb'
Needed in order to work on top of pNFS changes in Linus' upstream kernel.
2016-07-24 12:51:10 -04:00
Benjamin Coddington
149a4fddd0 nfs: don't create zero-length requests
NFS doesn't expect requests with wb_bytes set to zero and may make
unexpected decisions about how to handle that request at the page IO layer.
Skip request creation if we won't have any wb_bytes in the request.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Weston Andros Adamson <dros@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-22 15:15:16 -04:00
Artem Savkov
297fae4d0b Fix NULL pointer dereference in bl_free_device().
When bl_parse_deviceid() fails in bl_alloc_deviceid_node() on
blkdev_get_by_*() step we get an pnfs_block_dev struct that is
uninitialized except for bdev field which is set to whatever error
blkdev_get_by_*() returns.  bl_free_device() then tries to call
blkdev_put() if bdev is not 0 resulting in a wrong pointer dereference.

Fixing this by setting bdev in struct pnfs_block_dev only if we didn't
get an error from blkdev_get_by_*().

Signed-off-by: Artem Savkov <asavkov@redhat.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-22 15:14:21 -04:00
Trond Myklebust
e033fb51eb pNFS/files: filelayout_write_done_cb must call nfs_writeback_update_inode()
All write callbacks are required to call nfs_writeback_update_inode() upon
success to ensure that file size changes are recorded, and the attribute
cache is invalidated.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-21 09:46:42 -04:00
Al Viro
beffb8feb6 qstr: constify instances in nfs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-07-20 23:30:06 -04:00
Tigran Mkrtchyan
b224f7cb63 nfs4: flexfiles: respect noresvport when establishing connections to DSes
Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-19 16:23:25 -04:00
Tigran Mkrtchyan
3fc75f1208 nfs4: clnt: respect noresvport when establishing connections to DSes
result:

$ mount -o vers=4.1 dcache-lab007:/ /pnfs
$ cp /etc/profile /pnfs
tcp        0      0 131.169.185.68:1005     131.169.191.141:32049   ESTABLISHED
tcp        0      0 131.169.185.68:751      131.169.191.144:2049    ESTABLISHED
$

$ mount -o vers=4.1,noresvport dcache-lab007:/ /pnfs
$ cp /etc/profile /pnfs
tcp        0      0 131.169.185.68:34894    131.169.191.141:32049   ESTABLISHED
tcp        0      0 131.169.185.68:35722    131.169.191.144:2049    ESTABLISHED
$

Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-19 16:23:25 -04:00
Benjamin Coddington
d9c0ce0e45 pnfs/blocklayout: put deviceid node after releasing bl_ext_lock
The last put of deviceid nodes for SCSI layouts may sleep, so we shouldn't
hold any spinlocks.  Make sure we put them outside the bl_ext_lock.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-19 16:23:24 -04:00
Scott Mayhew
ce52914eb7 sunrpc: move NO_CRKEY_TIMEOUT to the auth->au_flags
A generic_cred can be used to look up a unx_cred or a gss_cred, so it's
not really safe to use the the generic_cred->acred->ac_flags to store
the NO_CRKEY_TIMEOUT flag.  A lookup for a unx_cred triggered while the
KEY_EXPIRE_SOON flag is already set will cause both NO_CRKEY_TIMEOUT and
KEY_EXPIRE_SOON to be set in the ac_flags, leaving the user associated
with the auth_cred to be in a state where they're perpetually doing 4K
NFS_FILE_SYNC writes.

This can be reproduced as follows:

1. Mount two NFS filesystems, one with sec=krb5 and one with sec=sys.
They do not need to be the same export, nor do they even need to be from
the same NFS server.  Also, v3 is fine.
$ sudo mount -o v3,sec=krb5 server1:/export /mnt/krb5
$ sudo mount -o v3,sec=sys server2:/export /mnt/sys

2. As the normal user, before accessing the kerberized mount, kinit with
a short lifetime (but not so short that renewing the ticket would leave
you within the 4-minute window again by the time the original ticket
expires), e.g.
$ kinit -l 10m -r 60m

3. Do some I/O to the kerberized mount and verify that the writes are
wsize, UNSTABLE:
$ dd if=/dev/zero of=/mnt/krb5/file bs=1M count=1

4. Wait until you're within 4 minutes of key expiry, then do some more
I/O to the kerberized mount to ensure that RPC_CRED_KEY_EXPIRE_SOON gets
set.  Verify that the writes are 4K, FILE_SYNC:
$ dd if=/dev/zero of=/mnt/krb5/file bs=1M count=1

5. Now do some I/O to the sec=sys mount.  This will cause
RPC_CRED_NO_CRKEY_TIMEOUT to be set:
$ dd if=/dev/zero of=/mnt/sys/file bs=1M count=1

6. Writes for that user will now be permanently 4K, FILE_SYNC for that
user, regardless of which mount is being written to, until you reboot
the client.  Renewing the kerberos ticket (assuming it hasn't already
expired) will have no effect.  Grabbing a new kerberos ticket at this
point will have no effect either.

Move the flag to the auth->au_flags field (which is currently unused)
and rename it slightly to reflect that it's no longer associated with
the auth_cred->ac_flags.  Add the rpc_auth to the arg list of
rpcauth_cred_key_to_expire and check the au_flags there too.  Finally,
add the inode to the arg list of nfs_ctx_key_to_expire so we can
determine the rpc_auth to pass to rpcauth_cred_key_to_expire.

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-19 16:23:24 -04:00
Steve Dickson
e68fd7c807 mount: use sec= that was specified on the command line
When older servers return RPC_AUTH_NULL, it means the
rpc creds will be ignored. In that case use the sec=
that was specified instead of setting sec=null

Fixes Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1112983
Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-19 16:23:23 -04:00
Trond Myklebust
f7db0b2838 pNFS: Fix LAYOUTGET handling of NFS4ERR_BAD_STATEID and NFS4ERR_EXPIRED
We want to recover the open stateid if there is no layout stateid
and/or the stateid argument matches an open stateid.
Otherwise throw out the existing layout and recover from scratch, as
the layout stateid is bad.

Fixes: 183d9e7b112aa ("pnfs: rework LAYOUTGET retry handling")
Cc: stable@vger.kernel.org # 4.7
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
2016-07-19 16:23:23 -04:00
Trond Myklebust
66b53f3258 pNFS: Handle NFS4ERR_RECALLCONFLICT correctly in LAYOUTGET
Instead of giving up altogether and falling back to doing I/O
through the MDS, which may make the situation worse, wait for
2 lease periods for the callback to resolve itself, and then
try destroying the existing layout.

Only if this was an attempt at getting a first layout, do we
give up altogether, as the server is clearly crazy.

Fixes: 183d9e7b112aa ("pnfs: rework LAYOUTGET retry handling")
Cc: stable@vger.kernel.org # 4.7
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
2016-07-19 16:23:22 -04:00
Trond Myklebust
e85d7ee420 pNFS: Separate handling of NFS4ERR_LAYOUTTRYLATER and RECALLCONFLICT
They are not the same error, and need to be handled differently.

Fixes: 183d9e7b112aa ("pnfs: rework LAYOUTGET retry handling")
Cc: stable@vger.kernel.org # 4.7
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
2016-07-19 16:23:22 -04:00
Trond Myklebust
56b38a1f7c pNFS: Fix post-layoutget error handling in pnfs_update_layout()
The non-retry error path is currently broken and ends up releasing the
reference to the layout twice. It also can end up clearing the
NFS_LAYOUT_FIRST_LAYOUTGET flag twice, causing a race.

In addition, the retry path will fail to decrement the plh_outstanding
counter.

Fixes: 183d9e7b112aa ("pnfs: rework LAYOUTGET retry handling")
Cc: stable@vger.kernel.org # 4.7
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
2016-07-19 16:22:47 -04:00
Trond Myklebust
10b7e9ad44 pNFS: Don't mark the inode as revalidated if a LAYOUTCOMMIT is outstanding
We know that the attributes will need updating if there is still a
LAYOUTCOMMIT outstanding.

Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-18 00:51:01 -04:00
Daniel Bristot de Oliveira
752d596b39 tracing: Use __get_str() when manipulating strings
Use __get_str(str) rather than __get_dynamic_array(str) when
deadling with strings.

It is just a code cleanup, no changes on tracepoint ABI.

Link: http://lkml.kernel.org/r/ea260df91817411cca2a1f3db2abd88860094788.1467407618.git.bristot@redhat.com

Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: Anna Schumaker <anna.schumaker@netapp.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-nfs@vger.kernel.org
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-07-15 15:52:20 -04:00
Kinglong Mee
c77efc1e78 nfs/blocklayout: Check max uuids and devices before decoding
Avoid nfs return uuids/devices larger than maximum.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-15 15:51:11 -04:00
Kinglong Mee
ecc2b88c4a nfs/blocklayout: Make sure calculate signature length aligned
Avoid a bad nfs server return an unaligned length of signature.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-15 15:51:11 -04:00
Christoph Hellwig
11487ddbdb nfs/blocklayout: support RH/Fedora dm-mpath device nodes
Instead of reusing the wwn-* names for multipath devices nodes RHEL and
Fedora introduce new dm-mpath-uuid-* nodes with a slightly different
naming scheme.  Try these names first to ensure we always get a
multipath-capable device if it exists.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-15 15:46:08 -04:00
Christoph Hellwig
d702d41ed4 nfs/blocklayout: refactor open-by-wwn
The current code works with the standard udev/systemd names, but we'll have
to add another method in the next patch.  Refactor it into a separate helper
to make room for the new variant.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-15 15:46:08 -04:00
Christoph Hellwig
0173ca0544 nfs/blocklayout: use proper fmode for opening block devices
This was fixed for the original block layout code a while ago, but also
needs to be fixed for the SCSI layout path.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-15 15:46:08 -04:00
Trond Myklebust
8b7d9d09b2 NFSv4: Revert "Truncating file opens should also sync O_DIRECT writes"
We're not holding any locks, so both nfs_wb_all() and inode_dio_wait()
are unenforcible and have livelock potential. Just limit ourselves to
flushing out the data.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-14 12:42:40 -04:00
Chuck Lever
a4e187d83d NFS: Don't drop CB requests with invalid principals
Before commit 778be232a207 ("NFS do not find client in NFSv4
pg_authenticate"), the Linux callback server replied with
RPC_AUTH_ERROR / RPC_AUTH_BADCRED, instead of dropping the CB
request. Let's restore that behavior so the server has a chance to
do something useful about it, and provide a warning that helps
admins correct the problem.

Fixes: 778be232a207 ("NFS do not find client in NFSv4 ...")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-07-11 15:50:43 -04:00
Trond Myklebust
9a773e7c8d NFS nfs_vm_page_mkwrite: Don't freeze me, Bro...
Prevent filesystem freezes while handling the write page fault.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-05 19:11:08 -04:00