linux

iv/linux

Author	SHA1	Message	Date
Trond Myklebust	4a3a0ebad1	Merge commit '24bab491220f' into client-4.2 - Pull in patch 'NFSD: Implement SEEK' from Bruce's nfsd-next tree for dependencies.	2014-09-30 16:23:39 -04:00
Anna Schumaker	24bab49122	NFSD: Implement SEEK This patch adds server support for the NFS v4.2 operation SEEK, which returns the position of the next hole or data segment in a file. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-09-29 14:35:20 -04:00
Anna Schumaker	87a15a8090	NFSD: Add generic v4.2 infrastructure It's cleaner to introduce everything at once and have the server reply with "not supported" than it would be to introduce extra operations when implementing a specific one in the middle of the list. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-09-29 14:35:19 -04:00
Christoph Hellwig	0162ac2b97	nfsd: introduce nfsd4_callback_ops Add a higher level abstraction than the rpc_ops for callback operations. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-09-26 16:29:29 -04:00
Christoph Hellwig	f0b5de1b6b	nfsd: split nfsd4_callback initialization and use Split out initializing the nfs4_callback structure from using it. For the NULL callback this gets rid of tons of pointless re-initializations. Note that I don't quite understand what protects us from running multiple NULL callbacks at the same time, but at least this chance doesn't make it worse.. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-09-26 16:29:28 -04:00
Christoph Hellwig	326129d02a	nfsd: introduce a generic nfsd4_cb Add a helper to queue up a callback. CB_NULL has a bit of special casing because it is special in the specification, but all other new callback operations will be able to share code with this and a few more changes to refactor the callback code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-09-26 16:29:27 -04:00
Christoph Hellwig	2faf3b4350	nfsd: remove nfsd4_callback.cb_op We can always get at the private data by using container_of, no need for a void pointer. Also introduce a little to_delegation helper to avoid opencoding the container_of everywhere. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-09-26 16:29:26 -04:00
Benny Halevy	341b51df1f	nfsd: do not clear rpc_resp in nfsd4_cb_done_sequence This is incorrect when a callback is has to be restarted, in which case the XDR decoding of the second iteration will see a NULL cb argument. [hch: updated description] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-09-26 16:29:25 -04:00
Christoph Hellwig	444b6e910d	nfsd: fix nfsd4_cb_recall_done error handling For any error that is not EBADHANDLE or NFS4ERR_BAD_STATEID, nfsd4_cb_recall_done first marks the connection down, then retries until dl_retries hits zero, then marks the connection down again and sets cb_done. This changes the code to only retry for EBADHANDLE or NFS4ERR_BAD_STATEID, and factors setting cb_done into a single point in the function. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-09-26 16:29:25 -04:00
J. Bruce Fields	70b2823535	nfsd4: clarify how grace period ends The grace period is ended in two steps--first userland is notified that the grace period is now long enough that any clients who have not yet reclaimed can be safely forgotten, then we flip the switch that forbids reclaims and allows new opens. I had to think a bit to convince myself that the ordering was right here. Document it. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-09-17 16:33:19 -04:00
J. Bruce Fields	bea57fe45b	nfsd4: stop grace_time update at end of grace period The attempt to automatically set a new grace period time at the end of the grace period isn't really helpful. We'll probably shut down and reboot before we actually make use of the new grace period time anyway. So may as well leave it up to the init system to get this right. This just confuses people when they see /proc/fs/nfsd/nfsv4gracetime change from what they set it to. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-09-17 16:33:18 -04:00
Jeff Layton	65decb650a	nfsd: skip subsequent UMH "create" operations after the first one for v4.0 clients In the case of v4.0 clients, we may call into the "create" client tracking operation multiple times (once for each openowner). Upcalling for each one of those is wasteful and slow however. We can skip doing further "create" operations after the first one if we know that one has already been done. v4.1+ clients generally only call into this function once (on RECLAIM_COMPLETE), and we can't skip upcalling on the create even if the STABLE bit is set. Doing so would make it impossible for nfsdcltrack to lift the grace period early since the timestamp has a different meaning in the case where the client is expected to issue a RECLAIM_COMPLETE. Signed-off-by: Jeff Layton <jlayton@primarydata.com>	2014-09-17 16:33:17 -04:00
Jeff Layton	788a7914ad	nfsd: set and test NFSD4_CLIENT_STABLE bit to reduce nfsdcltrack upcalls The nfsdcltrack upcall doesn't utilize the NFSD4_CLIENT_STABLE flag, which basically results in an upcall every time we call into the client tracking ops. Change it to set this bit on a successful "check" or "create" request, and clear it on a "remove" request. Also, check to see if that bit is set before upcalling on a "check" or "remove" request, and skip upcalling appropriately, depending on its state. Signed-off-by: Jeff Layton <jlayton@primarydata.com>	2014-09-17 16:33:17 -04:00
Jeff Layton	d682e750ce	nfsd: serialize nfsdcltrack upcalls for a particular client In a later patch, we want to add a flag that will allow us to reduce the need for upcalls. In order to handle that correctly, we'll need to ensure that racing upcalls for the same client can't occur. In practice it should be rare for this to occur with a well-behaved client, but it is possible. Convert one of the bits in the cl_flags field to be an upcall bitlock, and use it to ensure that upcalls for the same client are serialized. Signed-off-by: Jeff Layton <jlayton@primarydata.com>	2014-09-17 16:33:16 -04:00
Jeff Layton	d4318acd5d	nfsd: pass extra info in env vars to upcalls to allow for early grace period end In order to support lifting the grace period early, we must tell nfsdcltrack what sort of client the "create" upcall is for. We can't reliably tell if a v4.0 client has completed reclaiming, so we can only lift the grace period once all the v4.1+ clients have issued a RECLAIM_COMPLETE and if there are no v4.0 clients. Also, in order to lift the grace period, we have to tell userland when the grace period started so that it can tell whether a RECLAIM_COMPLETE has been issued for each client since then. Since this is all optional info, we pass it along in environment variables to the "init" and "create" upcalls. By doing this, we don't need to revise the upcall format. The UMH upcall can simply make use of this info if it happens to be present. If it's not then it can just avoid lifting the grace period early. Signed-off-by: Jeff Layton <jlayton@primarydata.com>	2014-09-17 16:33:15 -04:00
Jeff Layton	7f5ef2e900	nfsd: add a v4_end_grace file to /proc/fs/nfsd Allow a privileged userland process to end the v4 grace period early. Writing "Y", "y", or "1" to the file will cause the v4 grace period to be lifted. The basic idea with this will be to allow the userland client tracking program to lift the grace period once it knows that no more clients will be reclaiming state. Signed-off-by: Jeff Layton <jlayton@primarydata.com>	2014-09-17 16:33:14 -04:00
Jeff Layton	d68e3c4aa4	lockd: add a /proc/fs/lockd/nlm_end_grace file Add a new procfile that will allow a (privileged) userland process to end the NLM grace period early. The basic idea here will be to have sm-notify write to this file, if it sent out no NOTIFY requests when it runs. In that situation, we can generally expect that there will be no reclaim requests so the grace period can be lifted early. Signed-off-by: Jeff Layton <jlayton@primarydata.com>	2014-09-17 16:33:13 -04:00
Jeff Layton	3b3e7b7223	nfsd: reject reclaim request when client has already sent RECLAIM_COMPLETE As stated in RFC 5661, section 18.51.3: Once a RECLAIM_COMPLETE is done, there can be no further reclaim operations for locks whose scope is defined as having completed recovery. Once the client sends RECLAIM_COMPLETE, the server will not allow the client to do subsequent reclaims of locking state for that scope and, if these are attempted, will return NFS4ERR_NO_GRACE. Ensure that we enforce that requirement. Signed-off-by: Jeff Layton <jlayton@primarydata.com>	2014-09-17 16:33:13 -04:00
Jeff Layton	919b8049f0	nfsd: remove redundant boot_time parm from grace_done client tracking op Since it's stored in nfsd_net, we don't need to pass it in separately. Signed-off-by: Jeff Layton <jlayton@primarydata.com>	2014-09-17 16:33:12 -04:00
Jeff Layton	f779002965	lockd: move lockd's grace period handling into its own module Currently, all of the grace period handling is part of lockd. Eventually though we'd like to be able to build v4-only servers, at which point we'll need to put all of this elsewhere. Move the code itself into fs/nfs_common and have it build a grace.ko module. Then, rejigger the Kconfig options so that both nfsd and lockd enable it automatically. Signed-off-by: Jeff Layton <jlayton@primarydata.com>	2014-09-17 16:33:11 -04:00
Christoph Hellwig	f0c63124a6	nfsd: update mtime on truncate This fixes a failure in xfstests generic/313 because nfs doesn't update mtime on a truncate. The protocol requires this to be done implicity for a size changing setattr. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-09-11 11:12:16 -04:00
Linus Torvalds	9142eadefe	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull filesystem fixes from Al Viro: "Several bugfixes (all of them -stable fodder). Alexey's one deals with double mutex_lock() in UFS (apparently, nobody has tried to test "ufs: sb mutex merge + mutex_destroy" on something like file creation/removal on ufs). Mine deal with two kinds of umount bugs, in umount propagation and in handling of automounted submounts, both resulting in bogus transient EBUSY from umount" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: ufs: fix deadlocks introduced by sb mutex merge fix EBUSY on umount() from MNT_SHRINKABLE get rid of propagate_umount() mistakenly treating slaves as busy.	2014-09-07 10:59:58 -07:00
Alexey Khoroshilov	9ef7db7f38	ufs: fix deadlocks introduced by sb mutex merge Commit `0244756edc` ("ufs: sb mutex merge + mutex_destroy") introduces deadlocks in ufs_new_inode() and ufs_free_inode(). Most callers of that functions acqure the mutex by themselves and ufs_{new,free}_inode() do that via lock_ufs(), i.e we have an unavoidable double lock. The patch proposes to resolve the issue by making sure that ufs_{new,free}_inode() are not called with the mutex held. Found by Linux Driver Verification project (linuxtesting.org). Cc: stable@vger.kernel.org # 3.16 Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-09-07 13:26:39 -04:00
Linus Torvalds	11e9739813	xfs: fixes for v3.17-rc3 Fix: - a direct IO read/buffered read data corruption - the associated fallout from the DIO data corruption fix - collapse range bugs that are potential data corruption issues. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJUCkM+AAoJEK3oKUf0dfodUBgP+gJu50/XV4TFRLPlCRxhvN61 371i3ASls1y7ivhj40NzgbDDAZHM2q8Zqwd//318dFViHhWQDlH/1ga06kscRpZX d8cQEbFHApgUGQL5Gdq2l2hvAzYa75H0m6cq3jveyrN2rscjCSmAXwtlcEmx3AR6 TnCpxuVL5asjEGZYb0KfQACq//rASHJbhukpo1gB4ccZ0boWHOVf5SxuS4remzs9 y+rlPFNl5RD/WVdnJSvu9zu/nP6op3Ax5r7jZanoKbisKHfd7QOa+k65+Vz0Vq9G kxgfhz+yLfkOvcktq+41e1lVBln7fCIlcO9m3b53uxWPx5cla323893UiGYsA4F/ j/gGlh1qaT6C/1M1JBWDLDx931S78XiR1Y+WbtAU1PO+GuO0IEap3+iqtS2+oNAv OrpThLOgqTspK6MJeToCzdn2lRT2BJpcKwxIyDK8g+p9N6qCpyw3DfiKyu0wipGH D2D3mtE6drSHNaSceFAz8CrQvPOR7Ygj92QGpGSfkohxap9h6VJR/wNp/oGnpmN0 qgcxTrHvx3kw1hXssB4gjh6fBDnOUkac0isqxdow22Qt529t9sIanzMBvz+JxHQF zeqeFSh96lOXmB7UFBU+QyOhbDp3cJWChHrtY3Esw/+FmG6fxEy8z6pdZsiJYELr 5tka2richPD+gyXzcZwP =tRQo -----END PGP SIGNATURE----- Merge tag 'xfs-for-linus-3.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs Pull xfs fixes from Dave Chinner: "The fixes all address recently discovered data corruption issues. The original Direct IO issue was discovered by Chris Mason @ Facebook on a production workload which mixed buffered reads with direct reads and writes IO to the same file. The fix for that exposed other issues with page invalidation (exposed by millions of fsx operations) failing due to dirty buffers beyond EOF. Finally, the collapse_range code could also cause problems due to racing writeback changing the extent map while it was being shifted around. The commits for that problem are simple mitigation fixes that prevent the problem from occuring. A more robust fix for 3.18 that addresses the underlying problem is currently being worked on by Brian. Summary of fixes: - a direct IO read/buffered read data corruption - the associated fallout from the DIO data corruption fix - collapse range bugs that are potential data corruption issues" * tag 'xfs-for-linus-3.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: xfs: trim eofblocks before collapse range xfs: xfs_file_collapse_range is delalloc challenged xfs: don't log inode unless extent shift makes extent modifications xfs: use ranged writeback and invalidation for direct IO xfs: don't zero partial page cache pages during O_DIRECT writes xfs: don't zero partial page cache pages during O_DIRECT writes xfs: don't dirty buffers beyond EOF	2014-09-06 12:13:17 -07:00
Anton Altaparmakov	10096fb108	Export sync_filesystem() for modular ->remount_fs() use This patch changes sync_filesystem() to be EXPORT_SYMBOL(). The reason this is needed is that starting with 3.15 kernel, due to Theodore Ts'o's commit `02b9984d64` ("fs: push sync_filesystem() down to the file system's remount_fs()"), all file systems that have dirty data to be written out need to call sync_filesystem() from their ->remount_fs() method when remounting read-only. As this is now a generically required function rather than an internal only function it should be EXPORT_SYMBOL() so that all file systems can call it. Signed-off-by: Anton Altaparmakov <aia21@cantab.net> Acked-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-09-05 08:16:21 -07:00
Linus Torvalds	b7fece1be8	Merge git://git.kvack.org/~bcrl/aio-fixes Pull aio bugfixes from Ben LaHaise: "Two small fixes" * git://git.kvack.org/~bcrl/aio-fixes: aio: block exit_aio() until all context requests are completed aio: add missing smp_rmb() in read_events_ring	2014-09-04 16:08:55 -07:00
Gu Zheng	6098b45b32	aio: block exit_aio() until all context requests are completed It seems that exit_aio() also needs to wait for all iocbs to complete (like io_destroy), but we missed the wait step in current implemention, so fix it in the same way as we did in io_destroy. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Benjamin LaHaise <bcrl@kvack.org> Cc: stable@vger.kernel.org	2014-09-04 16:54:47 -04:00
Kinglong Mee	027bc41a3e	NFSD: Put export if prepare_creds() fail Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-09-03 17:43:04 -04:00
Kinglong Mee	13c82e8eb5	NFSD: Full checking of authentication name Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-09-03 17:43:03 -04:00
Kinglong Mee	48c348b09c	NFSD: Fix bad using of return value from qword_get Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-09-03 17:43:02 -04:00
Kinglong Mee	15d176c195	NFSD: Fix a memory leak if nfsd4_recdir_load fail Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-09-03 17:43:01 -04:00
Kinglong Mee	c2236f141e	NFSD: Reset creds after mnt_want_write_file() fail Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-09-03 17:43:01 -04:00
Kinglong Mee	8519f994e5	NFSD: Put file after ima_file_check fail in nfsd_open() Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-09-03 17:43:00 -04:00
Linus Torvalds	70c8038dd6	Merge tag 'for-f2fs-3.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs bug fixes from Jaegeuk Kim: "This series includes patches to: - fix recovery routines - fix bugs related to inline_data/xattr - fix when casting the dentry names - handle EIO or ENOMEM correctly - fix memory leak - fix lock coverage" * tag 'for-f2fs-3.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (28 commits) f2fs: reposition unlock_new_inode to prevent accessing invalid inode f2fs: fix wrong casting for dentry name f2fs: simplify by using a literal f2fs: truncate stale block for inline_data f2fs: use macro for code readability f2fs: introduce need_do_checkpoint for readability f2fs: fix incorrect calculation with total/free inode num f2fs: remove rename and use rename2 f2fs: skip if inline_data was converted already f2fs: remove rewrite_node_page f2fs: avoid double lock in truncate_blocks f2fs: prevent checkpoint during roll-forward f2fs: add WARN_ON in f2fs_bug_on f2fs: handle EIO not to break fs consistency f2fs: check s_dirty under cp_mutex f2fs: unlock_page when node page is redirtied out f2fs: introduce f2fs_cp_error for readability f2fs: give a chance to mount again when encountering errors f2fs: trigger release_dirty_inode in f2fs_put_super f2fs: don't skip checkpoint if there is no dirty node pages ...	2014-09-03 10:10:28 -07:00
Trond Myklebust	66f09ca717	nfs: do not start the callback thread until we set rqstp->rq_task This fixes an Oopsable race when starting up the callback server. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-09-02 17:53:30 -04:00
Trond Myklebust	d4e8990299	lockd: Do not start the lockd thread before we've set nlmsvc_rqst->rq_task This fixes an Oopsable race when starting lockd. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-09-02 17:49:17 -04:00
Jeff Moyer	2ff396be60	aio: add missing smp_rmb() in read_events_ring We ran into a case on ppc64 running mariadb where io_getevents would return zeroed out I/O events. After adding instrumentation, it became clear that there was some missing synchronization between reading the tail pointer and the events themselves. This small patch fixes the problem in testing. Thanks to Zach for helping to look into this, and suggesting the fix. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Benjamin LaHaise <bcrl@kvack.org> Cc: stable@vger.kernel.org	2014-09-02 15:20:03 -04:00
Chao Yu	b73e52824c	f2fs: reposition unlock_new_inode to prevent accessing invalid inode As the race condition on the inode cache, following scenario can appear: [Thread a] [Thread b] ->f2fs_mkdir ->f2fs_add_link ->__f2fs_add_link ->init_inode_metadata failed here ->gc_thread_func ->f2fs_gc ->do_garbage_collect ->gc_data_segment ->f2fs_iget ->iget_locked ->wait_on_inode ->unlock_new_inode ->move_data_page ->make_bad_inode ->iput When we fail in create/symlink/mkdir/mknod/tmpfile, the new allocated inode should be set as bad to avoid being accessed by other thread. But in above scenario, it allows f2fs to access the invalid inode before this inode was set as bad. This patch fix the potential problem, and this issue was found by code review. change log from v1: o Add condition judgment in gc_data_segment() suggested by Changman Lee. o use iget_failed to simplify code. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-02 00:22:24 -07:00
Brian Foster	41b9d7263e	xfs: trim eofblocks before collapse range xfs_collapse_file_space() currently writes back the entire file undergoing collapse range to settle things down for the extent shift algorithm. While this prevents changes to the extent list during the collapse operation, the writeback itself is not enough to prevent unnecessary collapse failures. The current shift algorithm uses the extent index to iterate the in-core extent list. If a post-eof delalloc extent persists after the writeback (e.g., a prior zero range op where the end of the range aligns with eof can separate the post-eof blocks such that they are not written back and converted), xfs_bmap_shift_extents() becomes confused over the encoded br_startblock value and fails the collapse. As with the full writeback, this is a temporary fix until the algorithm is improved to cope with a volatile extent list and avoid attempts to shift post-eof extents. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-09-02 12:12:53 +10:00
Dave Chinner	1669a8ca21	xfs: xfs_file_collapse_range is delalloc challenged If we have delalloc extents on a file before we run a collapse range opertaion, we sync the range that we are going to collapse to convert delalloc extents in that region to real extents to simplify the shift operation. However, the shift operation then assumes that the extent list is not going to change as it iterates over the extent list moving things about. Unfortunately, this isn't true because we can't hold the ILOCK over all the operations. We can prevent new IO from modifying the extent list by holding the IOLOCK, but that doesn't prevent writeback from running.... And when writeback runs, it can convert delalloc extents is the range of the file prior to the region being collapsed, and this changes the indexes of all the extents in the file. That causes the collapse range operation to Go Bad. The right fix is to rewrite the extent shift operation not to be dependent on the extent list not changing across the entire operation, but this is a fairly significant piece of work to do. Hence, as a short-term workaround for the problem, sync the entire file before starting a collapse operation to remove all delalloc ranges from the file and so avoid the problem of concurrent writeback changing the extent list. Diagnosed-and-Reported-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-09-02 12:12:53 +10:00
Brian Foster	ca446d880c	xfs: don't log inode unless extent shift makes extent modifications The file collapse mechanism uses xfs_bmap_shift_extents() to collapse all subsequent extents down into the specified, previously punched out, region. This function performs some validation, such as whether a sufficient hole exists in the target region of the collapse, then shifts the remaining exents downward. The exit path of the function currently logs the inode unconditionally. While we must log the inode (and abort) if an error occurs and the transaction is dirty, the initial validation paths can generate errors before the transaction has been dirtied. This creates an unnecessary filesystem shutdown scenario, as the caller will cancel a transaction that has been marked dirty. Modify xfs_bmap_shift_extents() to OR the logflags bits as modifications are made to the inode bmap. Only log the inode in the exit path if logflags has been set. This ensures we only have to cancel a dirty transaction if modifications have been made and prevents an unnecessary filesystem shutdown otherwise. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-09-02 12:12:53 +10:00
Dave Chinner	7d4ea3ce63	xfs: use ranged writeback and invalidation for direct IO Now we are not doing silly things with dirtying buffers beyond EOF and using invalidation correctly, we can finally reduce the ranges of writeback and invalidation used by direct IO to match that of the IO being issued. Bring the writeback and invalidation ranges back to match the generic direct IO code - this will greatly reduce the perturbation of cached data when direct IO and buffered IO are mixed, but still provide the same buffered vs direct IO coherency behaviour we currently have. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-09-02 12:12:53 +10:00
Dave Chinner	834ffca6f7	xfs: don't zero partial page cache pages during O_DIRECT writes Similar to direct IO reads, direct IO writes are using truncate_pagecache_range to invalidate the page cache. This is incorrect due to the sub-block zeroing in the page cache that truncate_pagecache_range() triggers. This patch fixes things by using invalidate_inode_pages2_range instead. It preserves the page cache invalidation, but won't zero any pages. cc: stable@vger.kernel.org Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-09-02 12:12:52 +10:00
Chris Mason	85e584da32	xfs: don't zero partial page cache pages during O_DIRECT writes xfs is using truncate_pagecache_range to invalidate the page cache during DIO reads. This is different from the other filesystems who only invalidate pages during DIO writes. truncate_pagecache_range is meant to be used when we are freeing the underlying data structs from disk, so it will zero any partial ranges in the page. This means a DIO read can zero out part of the page cache page, and it is possible the page will stay in cache. buffered reads will find an up to date page with zeros instead of the data actually on disk. This patch fixes things by using invalidate_inode_pages2_range instead. It preserves the page cache invalidation, but won't zero any pages. [dchinner: catch error and warn if it fails. Comment.] cc: stable@vger.kernel.org Signed-off-by: Chris Mason <clm@fb.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-09-02 12:12:52 +10:00
Dave Chinner	22e757a49c	xfs: don't dirty buffers beyond EOF generic/263 is failing fsx at this point with a page spanning EOF that cannot be invalidated. The operations are: 1190 mapwrite 0x52c00 thru 0x5e569 (0xb96a bytes) 1191 mapread 0x5c000 thru 0x5d636 (0x1637 bytes) 1192 write 0x5b600 thru 0x771ff (0x1bc00 bytes) where 1190 extents EOF from 0x54000 to 0x5e569. When the direct IO write attempts to invalidate the cached page over this range, it fails with -EBUSY and so any attempt to do page invalidation fails. The real question is this: Why can't that page be invalidated after it has been written to disk and cleaned? Well, there's data on the first two buffers in the page (1k block size, 4k page), but the third buffer on the page (i.e. beyond EOF) is failing drop_buffers because it's bh->b_state == 0x3, which is BH_Uptodate \| BH_Dirty. IOWs, there's dirty buffers beyond EOF. Say what? OK, set_buffer_dirty() is called on all buffers from __set_page_buffers_dirty(), regardless of whether the buffer is beyond EOF or not, which means that when we get to ->writepage, we have buffers marked dirty beyond EOF that we need to clean. So, we need to implement our own .set_page_dirty method that doesn't dirty buffers beyond EOF. This is messy because the buffer code is not meant to be shared and it has interesting locking issues on the buffer dirty bits. So just copy and paste it and then modify it to suit what we need. Note: the solutions the other filesystems and generic block code use of marking the buffers clean in ->writepage does not work for XFS. It still leaves dirty buffers beyond EOF and invalidations still fail. Hence rather than play whack-a-mole, this patch simply prevents those buffers from being dirtied in the first place. cc: <stable@kernel.org> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-09-02 12:12:51 +10:00
Linus Torvalds	35e274458c	File locking related bugfixes for v3.17 (pile #3 ) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJUAmbRAAoJEAAOaEEZVoIVah4P/iupUO7Ae5ODMDMog/vOp+SM +sWnyqnEyeMlQlNDoHoef5TPQ28aKEAq1Sg7CsqlK3qZSYSSPhb4KFsGWLZe6D5A 7iWSMKabdnuQ3qBCsb2Y6ZdB8IRAJz81sIAVI8F32NDmSs325wN/coVwfV4g8mQF QSpv78TjwBY0qNhNw06pS/FLV45IaPTDDgnTHRcOLrfHajDdGTdqrKI/L0ES1PFB 0ZUtG3qMPS2XYRyS6ZQ0TZZrl2/HMA5/fOwqKspNKxYxKS+TOf/umKwPPjHBnMHo mfD1XnG64ECkNio9bpg2CkjUqaT8aloNPgDxuP15vEV6bZ5WBLKjGUOY2IvPa198 do8CaAdp2Ql6kE2IyD+G+IkjqcxZ9H8hNH3cBM+3TzvxYqaiZKKZAky5UTau2LvG E5cyWhDPsVBGvAJXEPBf4vhIgzhaSuNox0+73nL4xU+L7bPTDzYIYyhS/InKO0X+ ZwAZn2u62XQmUDI8b+zrgOAHfWB0hHlcIfIsIrDxotM24TPPbJ2k2Dz0hKCZAraR DYDYPJZg+/QyPc8bujL5Hwjh8MogdDt1vxd9B65MwQWRn791LSGbq6VSsUnsMoAa dhG5U+a5eI8oQ2gkMEEK45o2ljcnZ3BSim6SGdmZ6YrNyEdk63xA4GjozK1YhWzG tLkXb4/7zV/dR8VTOQR/ =Wb1t -----END PGP SIGNATURE----- Merge tag 'locks-v3.17-3' of git://git.samba.org/jlayton/linux Pull file locking bugfx from Jeff Layton: "Just a bugfix for a bug that crept in to v3.15. It's in a rather rare error path, and I'm not aware of anyone having hit it, but it's worth fixing for v3.17" * tag 'locks-v3.17-3' of git://git.samba.org/jlayton/linux: locks: pass correct "before" pointer to locks_unlink_lock in generic_add_lease	2014-08-30 21:04:37 -07:00
Al Viro	81b6b06197	fix EBUSY on umount() from MNT_SHRINKABLE We need the parents of victims alive until namespace_unlock() gets to dput() of the (ex-)mountpoints. However, that screws up the "is it busy" checks in case when we have shrinkable mounts that need to be killed. Solution: go ahead and decrement refcounts of parents right in umount_tree(), increment them again just before dropping rwsem in namespace_unlock() (and let the loop in the end of namespace_unlock() finally drop those references for good, as we do now). Parents can't get freed until we drop rwsem - at least one reference is kept until then, both in case when parent is among the victims and when it is not. So they'll still be around when we get to namespace_unlock(). Cc: stable@vger.kernel.org # 3.12+ Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-08-30 18:32:05 -04:00
Al Viro	88b368f27a	get rid of propagate_umount() mistakenly treating slaves as busy. The check in __propagate_umount() ("has somebody explicitly mounted something on that slave?") is done before taking the already doomed victims out of the child lists. Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-08-30 18:31:41 -04:00
Linus Torvalds	10f3291a1d	Merge branch 'akpm' (fixes from Andrew Morton) Merge patches from Andrew Morton: "22 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (22 commits) kexec: purgatory: add clean-up for purgatory directory Documentation/kdump/kdump.txt: add ARM description flush_icache_range: export symbol to fix build errors tools: selftests: fix build issue with make kselftests target ocfs2: quorum: add a log for node not fenced ocfs2: o2net: set tcp user timeout to max value ocfs2: o2net: don't shutdown connection when idle timeout ocfs2: do not write error flag to user structure we cannot copy from/to x86/purgatory: use approprate -m64/-32 build flag for arch/x86/purgatory drivers/rtc/rtc-s5m.c: re-add support for devices without irq specified xattr: fix check for simultaneous glibc header inclusion kexec: remove CONFIG_KEXEC dependency on crypto kexec: create a new config option CONFIG_KEXEC_FILE for new syscall x86,mm: fix pte_special versus pte_numa hugetlb_cgroup: use lockdep_assert_held rather than spin_is_locked mm/zpool: use prefixed module loading zram: fix incorrect stat with failed_reads lib: turn CONFIG_STACKTRACE into an actual option. mm: actually clear pmd_numa before invalidating memblock, memhotplug: fix wrong type in memblock_find_in_range_node(). ...	2014-08-29 16:28:29 -07:00
Junxiao Bi	8c7b638cec	ocfs2: quorum: add a log for node not fenced For debug use, we can see from the log whether the fence decision is made and why it is not fenced. Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-08-29 16:28:17 -07:00

1 2 3 4 5 ...

37608 Commits