39556 Commits

Author SHA1 Message Date
Dave Chinner
bad962662d Merge branch 'xfs-misc-fixes-for-3.20-4' into for-next 2015-02-10 09:24:25 +11:00
Dave Chinner
e9892d3cc8 xfs: only trace buffer items if they exist
The commit 2d3d0c5 ("xfs: lobotomise xfs_trans_read_buf_map()") left
a landmine in the tracing code: trace_xfs_trans_buf_read() is now
call on all buffers that are read through this interface rather than
just buffers in transactions. For buffers outside transaction
context, bp->b_fspriv is null, and so the buf log item tracing
functions cannot be called. This causes a NULL pointer dereference
in the trace_xfs_trans_buf_read() function when tracing is turned
on.

cc: <stable@vger.kernel.org>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-02-10 09:23:40 +11:00
J. Bruce Fields
c23ae60178 nfsd: default NFSv4.2 to on
The code seems to work.  The protocol looks stable.  The kernel's
version defaults can be overridden by rpc.nfsd arguments.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-02-09 14:58:50 -05:00
Linus Torvalds
cdecbb336e Merge git://git.kvack.org/~bcrl/aio-fixes
Pull aio nested sleep annotation from Ben LaHaise,

* git://git.kvack.org/~bcrl/aio-fixes:
  aio: annotate aio_read_event_ring for sleep patterns
2015-02-08 18:27:58 -08:00
Linus Torvalds
bdfeb5a104 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fix from Chris Mason:
 "Forrest Liu tracked down a missing blk_finish_plug in the btrfs
  logging code.  This isn't a new bug, and it's hard to hit.  But, it's
  safe enough for inclusion now, and in my for-linus branch"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: add missing blk_finish_plug in btrfs_sync_log()
2015-02-07 11:04:48 -08:00
Trond Myklebust
4ef2e4f84c NFSv4.1: Fix pnfs_put_lseg races
pnfs_layoutreturn_free_lseg_async() can also race with inode put in
the general case. We can now fix this, and also simplify the code.

Cc: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-05 23:44:18 -05:00
Trond Myklebust
e4af440aaf NFSv4.1: pnfs_send_layoutreturn should use GFP_NOFS
In we want to be able to call pnfs_send_layoutreturn() from within the
writeback path, we really want it to use GFP_NOFS in order to prevent
recursion.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-05 22:16:50 -05:00
Trond Myklebust
5a0ec8acb9 NFSv4.1: Pin the inode and super block in asynchronous layoutreturns
If we're sending an asynchronous layoutreturn, then we need to ensure
that the inode and the super block remain pinned.

Cc: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Peng Tao <tao.peng@primarydata.com>
2015-02-05 22:16:45 -05:00
Trond Myklebust
472e259449 NFSv4.1: Pin the inode and super block in asynchronous layoutcommit
If we're sending an asynchronous layoutcommit, then we need to ensure
that the inode and the super block remain pinned.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Peng Tao <tao.peng@primarydata.com>
2015-02-05 22:16:33 -05:00
Trond Myklebust
ea7c38fef0 NFSv4: Ensure we reference the inode for return-on-close in delegreturn
If we have to do a return-on-close in the delegreturn code, then
we must ensure that the inode and super block remain referenced.

Cc: Peng Tao <tao.peng@primarydata.com>
Cc: stable@vger.kernel.org # 3.17.x
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Peng Tao <tao.peng@primarydata.com>
2015-02-05 21:31:06 -05:00
Eric Sandeen
01f9882eac xfs: report proper f_files in statfs if we overshoot imaxpct
Normally, a statfs syscall reports m_maxicount as f_files
(total file nodes in file system) because it is supposed
to be the upper limit for dynamically-allocated inodes.

It's possible, however, to overshoot imaxpct / m_maxicount.
If this happens, we should report the actual number of allocated
inodes, which is contained in sb_icount.  Add one more adjustment
to the statfs code to make this happen.

Reported-by: Alexander Tsvetkov <alexander.tsvetkov@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-02-06 09:53:02 +11:00
David S. Miller
6e03f896b5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/vxlan.c
	drivers/vhost/net.c
	include/linux/if_vlan.h
	net/core/dev.c

The net/core/dev.c conflict was the overlap of one commit marking an
existing function static whilst another was adding a new function.

In the include/linux/if_vlan.h case, the type used for a local
variable was changed in 'net', whereas the function got rewritten
to fix a stacked vlan bug in 'net-next'.

In drivers/vhost/net.c, Al Viro's iov_iter conversions in 'net-next'
overlapped with an endainness fix for VHOST 1.0 in 'net'.

In drivers/net/vxlan.c, vxlan_find_vni() added a 'flags' parameter
in 'net-next' whereas in 'net' there was a bug fix to pass in the
correct network namespace pointer in calls to this function.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-05 14:33:28 -08:00
Ryusuke Konishi
7ef3ff2fea nilfs2: fix deadlock of segment constructor over I_SYNC flag
Nilfs2 eventually hangs in a stress test with fsstress program.  This
issue was caused by the following deadlock over I_SYNC flag between
nilfs_segctor_thread() and writeback_sb_inodes():

  nilfs_segctor_thread()
    nilfs_segctor_thread_construct()
      nilfs_segctor_unlock()
        nilfs_dispose_list()
          iput()
            iput_final()
              evict()
                inode_wait_for_writeback()  * wait for I_SYNC flag

  writeback_sb_inodes()
     * set I_SYNC flag on inode->i_state
    __writeback_single_inode()
      do_writepages()
        nilfs_writepages()
          nilfs_construct_dsync_segment()
            nilfs_segctor_sync()
               * wait for completion of segment constructor
    inode_sync_complete()
       * clear I_SYNC flag after __writeback_single_inode() completed

writeback_sb_inodes() calls do_writepages() for dirty inodes after
setting I_SYNC flag on inode->i_state.  do_writepages() in turn calls
nilfs_writepages(), which can run segment constructor and wait for its
completion.  On the other hand, segment constructor calls iput(), which
can call evict() and wait for the I_SYNC flag on
inode_wait_for_writeback().

Since segment constructor doesn't know when I_SYNC will be set, it
cannot know whether iput() will block or not unless inode->i_nlink has a
non-zero count.  We can prevent evict() from being called in iput() by
implementing sop->drop_inode(), but it's not preferable to leave inodes
with i_nlink == 0 for long periods because it even defers file
truncation and inode deallocation.  So, this instead resolves the
deadlock by calling iput() asynchronously with a workqueue for inodes
with i_nlink == 0.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-05 13:35:29 -08:00
Fabian Frederick
6981498d79 udf: remove bool assignment to 0/1
Fix the following coccinelle warnings:

fs/udf/inode.c:753:2-13: WARNING: Assignment of bool to 0/1
fs/udf/inode.c:795:2-13: WARNING: Assignment of bool to 0/1

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Jan Kara <jack@suse.cz>
2015-02-05 16:34:25 +01:00
Fabian Frederick
2b8f942111 udf: use bool for done
variable 'done' is only used for true/false in loop.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Jan Kara <jack@suse.cz>
2015-02-05 16:34:22 +01:00
Christoph Hellwig
8650b8a058 nfsd: pNFS block layout driver
Add a small shim between core nfsd and filesystems to translate the
somewhat cumbersome pNFS data structures and semantics to something
more palatable for Linux filesystems.

Thanks to Rick McNeal for the old prototype pNFS blocklayout server
code, which gave a lot of inspiration to this version even if no
code is left from it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2015-02-05 14:35:18 +01:00
Theodore Ts'o
a26f49926d ext4: add optimization for the lazytime mount option
Add an optimization for the MS_LAZYTIME mount option so that we will
opportunistically write out any inodes with the I_DIRTY_TIME flag set
in a particular inode table block when we need to update some inode in
that inode table block anyway.

Also add some temporary code so that we can set the lazytime mount
option without needing a modified /sbin/mount program which can set
MS_LAZYTIME.  We can eventually make this go away once util-linux has
added support.

Google-Bug-Id: 18297052

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-05 02:45:00 -05:00
Theodore Ts'o
fe032c422c vfs: add find_inode_nowait() function
Add a new function find_inode_nowait() which is an even more general
version of ilookup5_nowait().  It is designed for callers which need
very fine grained control over when the function is allowed to block
or increment the inode's reference count.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-05 02:45:00 -05:00
Theodore Ts'o
0ae45f63d4 vfs: add support for a lazytime mount option
Add a new mount option which enables a new "lazytime" mode.  This mode
causes atime, mtime, and ctime updates to only be made to the
in-memory version of the inode.  The on-disk times will only get
updated when (a) if the inode needs to be updated for some non-time
related change, (b) if userspace calls fsync(), syncfs() or sync(), or
(c) just before an undeleted inode is evicted from memory.

This is OK according to POSIX because there are no guarantees after a
crash unless userspace explicitly requests via a fsync(2) call.

For workloads which feature a large number of random write to a
preallocated file, the lazytime mount option significantly reduces
writes to the inode table.  The repeated 4k writes to a single block
will result in undesirable stress on flash devices and SMR disk
drives.  Even on conventional HDD's, the repeated writes to the inode
table block will trigger Adjacent Track Interference (ATI) remediation
latencies, which very negatively impact long tail latencies --- which
is a very big deal for web serving tiers (for example).

Google-Bug-Id: 18297052

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-05 02:45:00 -05:00
Forrest Liu
3da5ab5648 Btrfs: add missing blk_finish_plug in btrfs_sync_log()
Add missing blk_finish_plug in btrfs_sync_log()

Signed-off-by: Forrest Liu <forrestl@synology.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
2015-02-04 18:02:37 -08:00
kbuild test robot
f92090e95c xfs: xfs_ioctl_setattr_check_projid can be static
fs/xfs/xfs_ioctl.c:1146:1: sparse: symbol 'xfs_ioctl_setattr_check_projid' was not declared. Should it be static?

Also fix xfs_ioctl_setattr_check_extsize at the same time.

Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-02-05 11:13:21 +11:00
Christoph Hellwig
f8079b850c xfs: growfs should use synchronous transactions
Growfs updates the secondary superblocks using synchronous unlogged
buffer writes after committing the updates to the primary superblock.

Mark the transaction to the primary superblock as synchronous so that
we guarantee it is committed to disk before we update the secondary
superblocks.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-02-05 11:13:21 +11:00
Linus Torvalds
5ee0e96260 Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
 "Three small cifs fixes.  One fixes a hang under stress, and the other
  two are security related"

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: fix MUST SecurityFlags filtering
  Complete oplock break jobs before closing file handle
  cifs: use memzero_explicit to clear stack buffer
2015-02-04 10:22:08 -08:00
Trond Myklebust
6ae373394c NFSv4.1: Ask for no delegation on OPEN if using O_DIRECT
If we're using NFSv4.1, then we have the ability to let the server know
whether or not we believe that returning a delegation as part of our OPEN
request would be useful.
The feature needs to be used with care, since the client sending the request
doesn't necessarily know how other clients are using that file, and how
they may be affected by the delegation.
For this reason, our initial use of the feature will be to let the server
know when the client believes that handing out a delegation would not be
useful.
The first application for this function is when opening the file using
O_DIRECT.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-04 10:35:32 -05:00
Oleg Drokin
7456a37d55 GFS2: use __vmalloc GFP_NOFS for fs-related allocations.
leaf_dealloc uses vzalloc as a fallback to kzalloc(GFP_NOFS), so
it clearly does not want any shrinker activity within the fs itself.
convert vzalloc into __vmalloc(GFP_NOFS|__GFP_ZERO) to better achieve
this goal.

Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2015-02-04 09:58:41 +00:00
Al Viro
2e90b1c45e rxrpc: make the users of rxrpc_kernel_send_data() set kvec-backed msg_iter properly
Use iov_iter_kvec() there, get rid of set_fs() games - now that
rxrpc_send_data() uses iov_iter primitives, it'll handle ITER_KVEC just
fine.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-04 01:34:14 -05:00
Dave Chinner
9c9ce763b1 aio: annotate aio_read_event_ring for sleep patterns
Under CONFIG_DEBUG_ATOMIC_SLEEP=y, aio_read_event_ring() will throw
warnings like the following due to being called from wait_event
context:

 WARNING: CPU: 0 PID: 16006 at kernel/sched/core.c:7300 __might_sleep+0x7f/0x90()
 do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff810d85a3>] prepare_to_wait_event+0x63/0x110
 Modules linked in:
 CPU: 0 PID: 16006 Comm: aio-dio-fcntl-r Not tainted 3.19.0-rc6-dgc+ #705
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
  ffffffff821c0372 ffff88003c117cd8 ffffffff81daf2bd 000000000000d8d8
  ffff88003c117d28 ffff88003c117d18 ffffffff8109beda ffff88003c117cf8
  ffffffff821c115e 0000000000000061 0000000000000000 00007ffffe4aa300
 Call Trace:
  [<ffffffff81daf2bd>] dump_stack+0x4c/0x65
  [<ffffffff8109beda>] warn_slowpath_common+0x8a/0xc0
  [<ffffffff8109bf56>] warn_slowpath_fmt+0x46/0x50
  [<ffffffff810d85a3>] ? prepare_to_wait_event+0x63/0x110
  [<ffffffff810d85a3>] ? prepare_to_wait_event+0x63/0x110
  [<ffffffff810bdfcf>] __might_sleep+0x7f/0x90
  [<ffffffff81db8344>] mutex_lock+0x24/0x45
  [<ffffffff81216b7c>] aio_read_events+0x4c/0x290
  [<ffffffff81216fac>] read_events+0x1ec/0x220
  [<ffffffff810d8650>] ? prepare_to_wait_event+0x110/0x110
  [<ffffffff810fdb10>] ? hrtimer_get_res+0x50/0x50
  [<ffffffff8121899d>] SyS_io_getevents+0x4d/0xb0
  [<ffffffff81dba5a9>] system_call_fastpath+0x12/0x17
 ---[ end trace bde69eaf655a4fea ]---

There is not actually a bug here, so annotate the code to tell the
debug logic that everything is just fine and not to fire a false
positive.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
2015-02-03 19:29:05 -05:00
Javi Merino
adf305f778 sysfs: fix warning when creating a sysfs group without attributes
When attempting to create a gropu without attrs, the warning prints the
name of the group.  However, the check for name being a NULL pointer is
wrong: it uses the pointer to the name when it's NULL.  Fix it to use
the name if present, otherwise just put an empty string.

Cc: Bruno Prémont <bonbons@linux-vserver.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Javi Merino <javi.merino@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-02-03 15:50:31 -08:00
Trond Myklebust
03a9a42a1a SUNRPC: NULL utsname dereference on NFS umount during namespace cleanup
Fix an Oopsable condition when nsm_mon_unmon is called as part of the
namespace cleanup, which now apparently happens after the utsname
has been freed.

Link: http://lkml.kernel.org/r/20150125220604.090121ae@neptune.home
Reported-by: Bruno Prémont <bonbons@linux-vserver.org>
Cc: stable@vger.kernel.org # 3.18
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-03 16:40:17 -05:00
Trond Myklebust
e2c63e091e Merge branch 'flexfiles'
* flexfiles: (53 commits)
  pnfs: lookup new lseg at lseg boundary
  nfs41: .init_read and .init_write can be called with valid pg_lseg
  pnfs: Update documentation on the Layout Drivers
  pnfs/flexfiles: Add the FlexFile Layout Driver
  nfs: count DIO good bytes correctly with mirroring
  nfs41: wait for LAYOUTRETURN before retrying LAYOUTGET
  nfs: add a helper to set NFS_ODIRECT_RESCHED_WRITES to direct writes
  nfs41: add NFS_LAYOUT_RETRY_LAYOUTGET to layout header flags
  nfs/flexfiles: send layoutreturn before freeing lseg
  nfs41: introduce NFS_LAYOUT_RETURN_BEFORE_CLOSE
  nfs41: allow async version layoutreturn
  nfs41: add range to layoutreturn args
  pnfs: allow LD to ask to resend read through pnfs
  nfs: add nfs_pgio_current_mirror helper
  nfs: only reset desc->pg_mirror_idx when mirroring is supported
  nfs41: add a debug warning if we destroy an unempty layout
  pnfs: fail comparison when bucket verifier not set
  nfs: mirroring support for direct io
  nfs: add mirroring support to pgio layer
  pnfs: pass ds_commit_idx through the commit path
  ...

Conflicts:
	fs/nfs/pnfs.c
	fs/nfs/pnfs.h
2015-02-03 16:01:27 -05:00
Weston Andros Adamson
7c13789e3e pnfs: lookup new lseg at lseg boundary
Before mirroring support was added, the pageio descriptor's pg_lseg was
set to null when an RPC was sent. Because of this, pg_init was called
at lseg boundaries with pg_lseg = NULL, and it could be set to the new
lseg.

Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
2015-02-03 11:06:54 -08:00
Peng Tao
cb5d04bc39 nfs41: .init_read and .init_write can be called with valid pg_lseg
With pgio refactoring in v3.15, .init_read and .init_write can be
called with valid pgio->pg_lseg. file layout was fixed at that time
by commit c6194271f (pnfs: filelayout: support non page aligned
layouts). But the generic helper still needs to be fixed.

Cc: stable@vger.kernel.org # 3.15+
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
2015-02-03 11:06:53 -08:00
Tom Haynes
d67ae825a5 pnfs/flexfiles: Add the FlexFile Layout Driver
The flexfile layout is a new layout that extends the
file layout. It is currently being drafted as a specification at
https://datatracker.ietf.org/doc/draft-ietf-nfsv4-layout-types/

Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Tom Haynes <loghyr@primarydata.com>
Signed-off-by: Tao Peng <bergwolf@primarydata.com>
2015-02-03 11:06:52 -08:00
Peng Tao
5fadeb47dc nfs: count DIO good bytes correctly with mirroring
When resending to MDS, we might resend multiple mirroring
requests to MDS. As a result, nfs_direct_good_bytes() ends
up counting bytes multiple times, causing application to
get wrong return results in read/write syscalls.

Fix it by tracking start of a dreq and checking the range of
pgio header.

Cc: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
2015-02-03 11:06:52 -08:00
Peng Tao
aa8a45ee97 nfs41: wait for LAYOUTRETURN before retrying LAYOUTGET
Also take care to stop waiting if someone clears retry bit.

Signed-off-by: Peng Tao <tao.peng@primarydata.com>
2015-02-03 11:06:51 -08:00
Peng Tao
012fa16dca nfs: add a helper to set NFS_ODIRECT_RESCHED_WRITES to direct writes
To allow pnfs LD to ask direct writes to be resend.

Signed-off-by: Peng Tao <tao.peng@primarydata.com>
2015-02-03 11:06:51 -08:00
Peng Tao
c829013dca nfs41: add NFS_LAYOUT_RETRY_LAYOUTGET to layout header flags
Use it to indicate that LD wants to retry layoutget. LD can set
it whenever it wants the common pnfs code to return and retry
pnfs path through a new layout.

The bit gets cleared when client does a new layoutget, when client
closes the file (ROC case), or when kernel needs to evict the inode
(non-ROC case).

Signed-off-by: Peng Tao <tao.peng@primarydata.com>
2015-02-03 11:06:50 -08:00
Peng Tao
27b6f53987 nfs/flexfiles: send layoutreturn before freeing lseg
Otherwise we'll lose error tracking information when
encoding layoutreturn.

pnfs_put_lseg may be called from rpc callbacks. So we should not
call pnfs_send_layoutreturn directly because it can deadlock in
the rpc layer.

Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Tom Haynes <loghyr@primarydata.com>
2015-02-03 11:06:50 -08:00
Peng Tao
193e3aa2cc nfs41: introduce NFS_LAYOUT_RETURN_BEFORE_CLOSE
When it is set, generic pnfs would try to send layoutreturn right
before last close/delegation_return regard less NFS_LAYOUT_ROC is
set or not. LD can then make sure layoutreturn is always sent
rather than being omitted.

The difference against NFS_LAYOUT_RETURN is that
NFS_LAYOUT_RETURN_BEFORE_CLOSE does not block usage of the layout so
LD can set it and expect generic layer to try pnfs path at the
same time.

Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Tom Haynes <loghyr@primarydata.com>
2015-02-03 11:06:50 -08:00
Peng Tao
6c16605d6e nfs41: allow async version layoutreturn
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Tom Haynes <loghyr@primarydata.com>
2015-02-03 11:06:49 -08:00
Peng Tao
15eb67c153 nfs41: add range to layoutreturn args
So that callers can specify which range to return.

Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Tom Haynes <loghyr@primarydata.com>
2015-02-03 11:06:49 -08:00
Peng Tao
ceb11e13df pnfs: allow LD to ask to resend read through pnfs
If current IO cannot be completed due to some transient errors,
LD may want to ask generic layer to resend the request through
pnfs again.

Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Tom Haynes <loghyr@primarydata.com>
2015-02-03 11:06:48 -08:00
Peng Tao
48d635f14a nfs: add nfs_pgio_current_mirror helper
Let it return current nfs_pgio_mirror in use depending on pg_mirror_count.
For read, we always use pg_mirrors[0], so this effectively gives us freedom
to use pg_mirror_idx to track the actual mirror to read from through out the
IO stack.

Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Tom Haynes <loghyr@primarydata.com>
2015-02-03 11:06:48 -08:00
Peng Tao
47af81f295 nfs: only reset desc->pg_mirror_idx when mirroring is supported
so that we don't reset desc->pg_mirror_idx for read unnecessarily.
Remove WARN_ON_ONCE from __nfs_pageio_add_request to allow LD to
set pg_mirror_idx for read where pg_mirror_count is always 1.

Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Tom Haynes <loghyr@primarydata.com>
2015-02-03 11:06:47 -08:00
Peng Tao
566f873763 nfs41: add a debug warning if we destroy an unempty layout
So that we can detect the case if some layout segments are still
pinned which is surely a bug that we need to fix.

Signed-off-by: Peng Tao <tao.peng@primarydata.com>
2015-02-03 11:06:47 -08:00
Weston Andros Adamson
80c76fe314 pnfs: fail comparison when bucket verifier not set
This skips the WARN_ON_ONCE, but doesnt change behavior (the memcmp would
fail).

Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Tom Haynes <Thomas.Haynes@primarydata.com>
2015-02-03 11:06:46 -08:00
Weston Andros Adamson
0a00b77b33 nfs: mirroring support for direct io
The current mirroring code only notices short writes to the first
mirror. This patch keeps per-mirror byte counts and only considers
a byte to be written once all mirrors report so.

Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
2015-02-03 11:06:46 -08:00
Weston Andros Adamson
a7d42ddb30 nfs: add mirroring support to pgio layer
This patch adds mirrored write support to the pgio layer. The default
is to use one mirror, but pgio callers may define callbacks to change
this to any value up to the (arbitrarily selected) limit of 16.

The basic idea is to break out members of nfs_pageio_descriptor that cannot
be shared between mirrored DSes and put them in a new structure.

Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
2015-02-03 11:06:45 -08:00
Weston Andros Adamson
b57ff1303a pnfs: pass ds_commit_idx through the commit path
Pass ds_commit_idx through the nfs commit path. It's used to select
the commit bucket when using pnfs and is ignored when not using pnfs.
Several functions had to be changed: nfs_retry_commit,
nfs_mark_request_commit, pnfs_mark_request_commit and the pnfs layout
driver .mark_request_commit functions.

Signed-off-by: Tom Haynes <loghyr@primarydata.com>
2015-02-03 11:06:45 -08:00
Weston Andros Adamson
6cccbb6f52 nfs: rename pgio header ds_idx to ds_commit_idx
'ds_commit_idx' is a better name - it is used to select the right
commit bucket for pnfs.

Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
2015-02-03 11:06:44 -08:00