linux

iv/linux

Author	SHA1	Message	Date
NeilBrown	64158668ac	NFS: swap IO handling is slightly different for O_DIRECT IO 1/ Taking the i_rwsem for swap IO triggers lockdep warnings regarding possible deadlocks with "fs_reclaim". These deadlocks could, I believe, eventuate if a buffered read on the swapfile was attempted. We don't need coherence with the page cache for a swap file, and buffered writes are forbidden anyway. There is no other need for i_rwsem during direct IO. So never take it for swap_rw() 2/ generic_write_checks() explicitly forbids writes to swap, and performs checks that are not needed for swap. So bypass it for swap_rw(). Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-03-13 12:59:35 -04:00
NeilBrown	4dc73c6791	NFSv4: keep state manager thread active if swap is enabled If we are swapping over NFSv4, we may not be able to allocate memory to start the state-manager thread at the time when we need it. So keep it always running when swap is enabled, and just signal it to start. This requires updating and testing the cl_swapper count on the root rpc_clnt after following all ->cl_parent links. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-03-13 12:59:35 -04:00
NeilBrown	8db55a032a	SUNRPC: improve 'swap' handling: scheduling and PF_MEMALLOC rpc tasks can be marked as RPC_TASK_SWAPPER. This causes GFP_MEMALLOC to be used for some allocations. This is needed in some cases, but not in all where it is currently provided, and in some where it isn't provided. Currently all tasks associated with a rpc_client on which swap is enabled get the flag and hence some GFP_MEMALLOC support. GFP_MEMALLOC is provided for ->buf_alloc() but only swap-writes need it. However xdr_alloc_bvec does not get GFP_MEMALLOC - though it often does need it. xdr_alloc_bvec is called while the XPRT_LOCK is held. If this blocks, then it blocks all other queued tasks. So this allocation needs GFP_MEMALLOC for all requests, not just writes, when the xprt is used for any swap writes. Similarly, if the transport is not connected, that will block all requests including swap writes, so memory allocations should get GFP_MEMALLOC if swap writes are possible. So with this patch: 1/ we ONLY set RPC_TASK_SWAPPER for swap writes. 2/ __rpc_execute() sets PF_MEMALLOC while handling any task with RPC_TASK_SWAPPER set, or when handling any task that holds the XPRT_LOCKED lock on an xprt used for swap. This removes the need for the RPC_IS_SWAPPER() test in ->buf_alloc handlers. 3/ xprt_prepare_transmit() sets PF_MEMALLOC after locking any task to a swapper xprt. __rpc_execute() will clear it. 3/ PF_MEMALLOC is set for all the connect workers. Reviewed-by: Chuck Lever <chuck.lever@oracle.com> (for xprtrdma parts) Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-03-13 12:59:35 -04:00
NeilBrown	89c2be8a95	NFS: discard NFS_RPC_SWAPFLAGS and RPC_TASK_ROOTCREDS NFS_RPC_SWAPFLAGS is only used for READ requests. It sets RPC_TASK_SWAPPER which gives some memory-allocation priority to requests. This is not needed for swap READ - though it is for writes where it is set via a different mechanism. RPC_TASK_ROOTCREDS causes the 'machine' credential to be used. This is not needed as the root credential is saved when the swap file is opened, and this is used for all IO. So NFS_RPC_SWAPFLAGS isn't needed, and as it is the only user of RPC_TASK_ROOTCREDS, that isn't needed either. Remove both. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-03-13 12:59:35 -04:00
NeilBrown	944d95f766	NFS: remove IS_SWAPFILE hack This code is pointless as IS_SWAPFILE is always defined. So remove it. Suggested-by: Mark Hemment <markhemm@googlemail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-03-13 12:59:35 -04:00
Dave Wysochanski	b5fdf66f6e	NFS: Remove remaining dfprintks related to fscache and remove NFSDBG_FSCACHE The fscache cookie APIs including fscache_acquire_cookie() and fscache_relinquish_cookie() now have very good tracing. Thus, there is no real need for dfprintks in the NFS fscache interface. The NFS fscache interface has removed all dfprintks so remove the NFSDBG_FSCACHE defines. Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-03-13 12:59:35 -04:00
Dave Wysochanski	e3f0a7fe69	NFS: Replace dfprintks with tracepoints in fscache read and write page functions Most of fscache and other NFS IO paths are now using tracepoints. Remove the dfprintks in the NFS fscache read/write page functions and replace with tracepoints at the begin and end of the functions. Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-03-13 12:59:35 -04:00
Dave Wysochanski	fc1c5abfca	NFS: Rename fscache read and write pages functions Rename NFS fscache functions in a more consistent fashion to better reflect when we read from and write to fscache. Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-03-13 12:59:35 -04:00
Dave Wysochanski	45f3a70ba6	NFS: Cleanup usage of nfs_inode in fscache interface A number of places in the fscache interface used nfs_inode when inode could be used, simplifying the code. Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-03-13 12:59:35 -04:00
Olga Kornievskaia	b4be2c598b	NFSv4.1 restrict GETATTR fs_location query to the main transport In the presence of trunking transports, it's helpful to make sure that during the migration event, the GETATTR for fs_location attribute happens on the main transport. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-03-13 12:59:34 -04:00
Alexey Khoroshilov	cb8fac6d27	NFS: remove unneeded check in decode_devicenotify_args() [You don't often get email from khoroshilov@ispras.ru. Learn why this is important at http://aka.ms/LearnAboutSenderIdentification.] Overflow check in not needed anymore after we switch to kmalloc_array(). Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Fixes: `a4f743a6bb` ("NFSv4.1: Convert open-coded array allocation calls to kmalloc_array()") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-03-13 12:59:34 -04:00
Trond Myklebust	612896ec5a	NFS: Cache all entries in the readdirplus reply Even if we're not able to cache all the entries in the readdir buffer, let's ensure that we do prime the dcache. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-03-02 08:43:39 -05:00
Trond Myklebust	0adf85b445	NFS: Optimise away the previous cookie field Replace the 'previous cookie' field in struct nfs_entry with the array->last_cookie. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-03-02 08:43:39 -05:00
Trond Myklebust	b0365ccb07	NFS: Fix up forced readdirplus Avoid clearing the entire readdir page cache if we're just doing forced readdirplus for the 'ls -l' heuristic. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-03-02 08:43:39 -05:00
Trond Myklebust	f648022faa	NFS: Convert readdir page cache to use a cookie based index Instead of using a linear index to address the pages, use the cookie of the first entry, since that is what we use to match the page anyway. This allows us to avoid re-reading the entire cache on a seekdir() type of operation. The latter is very common when re-exporting NFS, and is a major performance drain. The change does affect our duplicate cookie detection, since we can no longer rely on the page index as a linear offset for detecting whether we looped backwards. However since we no longer do a linear search through all the pages on each call to nfs_readdir(), this is less of a concern than it was previously. The other downside is that invalidate_mapping_pages() no longer can use the page index to avoid clearing pages that have been read. A subsequent patch will restore the functionality this provides to the 'ls -l' heuristic. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-03-02 08:43:39 -05:00
Trond Myklebust	9332cf14e2	NFS: Clean up page array initialisation/free Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-03-02 08:43:39 -05:00
Trond Myklebust	11d03d0a1e	NFS: Trace effects of the readdirplus heuristic Enable tracking of when the readdirplus heuristic causes a page cache invalidation. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-03-02 08:43:39 -05:00
Trond Myklebust	eace45a18c	NFS: Trace effects of readdirplus on the dcache Trace the effects of readdirplus on attribute and dentry revalidation. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-03-02 08:43:39 -05:00
Trond Myklebust	310e318745	NFS: Add basic readdir tracing Add tracing to track how often the client goes to the server for updated readdir information. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-03-02 08:43:39 -05:00
Trond Myklebust	0b3cc71b5a	NFS: Don't request readdirplus when revalidation was forced If the revalidation was forced, due to the presence of a LOOKUP_EXCL or a LOOKUP_REVAL flag, then readdirplus won't help. It also can't help when we're doing a path component lookup. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-03-02 08:43:39 -05:00
Trond Myklebust	2c2c336506	NFS: Readdirplus can't help lookup for case insensitive filesystems If the filesystem is case insensitive, then readdirplus can't help with cache misses, since it won't return case folded variants of the filename. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-03-02 08:43:39 -05:00
Trond Myklebust	c49c68944f	NFSv4: Ask for a full XDR buffer of readdir goodness Instead of pretending that we know the ratio of directory info vs readdirplus attribute info, just set the 'dircount' field to the same value as the 'maxcount' field. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-03-02 08:43:38 -05:00
Trond Myklebust	ad1e109a41	NFS: Don't ask for readdirplus unless it can help nfs_getattr() If attribute caching is turned off, then use of readdirplus is not going to help stat() performance. Readdirplus also doesn't help if a file is being written to, since we will have to flush those writes in order to sync the mtime/ctime. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-03-02 08:43:38 -05:00
Trond Myklebust	230bc98f7a	NFS: Improve heuristic for readdirplus The heuristic for readdirplus is designed to try to detect 'ls -l' and similar patterns. It does so by looking for cache hit/miss patterns in both the attribute cache and in the dcache of the files in a given directory, and then sets a flag for the readdirplus code to interpret. The problem with this approach is that a single attribute or dcache miss can cause the NFS code to force a refresh of the attributes for the entire set of files contained in the directory. To be able to make a more nuanced decision, let's sample the number of hits and misses in the set of open directory descriptors. That allows us to set thresholds at which we start preferring READDIRPLUS over regular READDIR, or at which we start to force a re-read of the remaining readdir cache using READDIRPLUS. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-03-02 08:43:38 -05:00
Trond Myklebust	9c3f4d988c	NFS: Reduce use of uncached readdir When reading a very large directory, we want to try to keep the page cache up to date if doing so is inexpensive. With the change to allow readdir to continue reading even when the cache is incomplete, we no longer need to fall back to uncached readdir in order to scale to large directories. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-03-02 08:43:38 -05:00
Trond Myklebust	9ff89c25d8	NFS: Simplify nfs_readdir_xdr_to_array() Recent changes to readdir mean that we can cope with partially filled page cache entries, so we no longer need to rely on looping in nfs_readdir_xdr_to_array(). Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-03-02 08:43:38 -05:00
Trond Myklebust	6c34f05b75	NFS: If the cookie verifier changes, we must invalidate the page cache Ensure that if the cookie verifier changes when we use the zero-valued cookie, then we invalidate any cached pages. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-03-02 08:43:38 -05:00
Trond Myklebust	580f236737	NFS: Adjust the amount of readahead performed by NFS readdir The current NFS readdir code will always try to maximise the amount of readahead it performs on the assumption that we can cache anything that isn't immediately read by the process. There are several cases where this assumption breaks down, including when the 'ls -l' heuristic kicks in to try to force use of readdirplus as a batch replacement for lookup/getattr. This patch therefore tries to tone down the amount of readahead we perform, and adjust it to try to match the amount of data being requested by user space. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-03-02 08:43:38 -05:00
Trond Myklebust	c8f0523ba3	NFS: Don't advance the page pointer unless the page is full When we hit the end of the data in the readdir page, we don't want to start filling a new page, unless this one is full. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-03-02 08:43:38 -05:00
Trond Myklebust	728dd0ab37	NFS: Don't re-read the entire page cache to find the next cookie If the page cache entry that was last read gets invalidated for some reason, then make sure we can re-create it on the next call to readdir. This, combined with the cache page validation, allows us to reuse the cached value of page-index on successive calls to nfs_readdir. Credit is due to Benjamin Coddington for showing that the concept works, and that it allows for improved cache sharing between processes even in the case where pages are lost due to LRU or active invalidation. Suggested-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-03-02 08:43:38 -05:00
Trond Myklebust	d09e673f49	NFS: Store the change attribute in the directory page cache Use the change attribute and the first cookie in a directory page cache entry to validate that the page is up to date. Suggested-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-03-02 08:43:33 -05:00
Trond Myklebust	0b2662b7e7	NFS: Calculate page offsets algorithmically Instead of relying on counting the page offsets as we walk through the page cache, switch to calculating them algorithmically. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-02-28 10:11:33 -05:00
Trond Myklebust	281f31b2e5	NFS: Use kzalloc() to avoid initialising the nfs_open_dir_context Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-02-28 10:11:32 -05:00
Trond Myklebust	d1e32ea355	NFS: Initialise the readdir verifier as best we can in nfs_opendir() For the purpose of ensuring that opendir() followed by seekdir() work as correctly as possible, try to initialise the readdir verifier in nfs_opendir(). Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-02-28 10:11:32 -05:00
Trond Myklebust	2eef8a3111	NFS: Trace lookup revalidation failure Enable tracing of lookup revalidation failures. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-02-28 10:11:32 -05:00
Trond Myklebust	64cfca85ba	NFS: Return valid errors from nfs2/3_decode_dirent() Valid return values for decode_dirent() callback functions are: 0: Success -EBADCOOKIE: End of directory -EAGAIN: End of xdr_stream All errors need to map into one of those three values. Fixes: `573c4e1ef5` ("NFS: Simplify ->decode_dirent() calling sequence") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-02-28 10:11:32 -05:00
Trond Myklebust	b38e09b9b6	Revert "NFSv4: use unique client identifiers in network namespaces" This reverts commit `50c790a0b6`. The functionality is believed to be capable of causing regressions in existing setups, so the author has requested that it be reverted. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-02-28 10:09:23 -05:00
Trond Myklebust	6c984083ec	NFS: Use of mapping_set_error() results in spurious errors The use of mapping_set_error() in conjunction with calls to filemap_check_errors() is problematic because every error gets reported as either an EIO or an ENOSPC by filemap_check_errors() in functions such as filemap_write_and_wait() or filemap_write_and_wait_range(). In almost all cases, we prefer to use the more nuanced wb errors. Fixes: `b8946d7bfb` ("NFS: Revalidate the file mapping on all fatal writeback errors") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-02-25 18:50:13 -05:00
Trond Myklebust	84631f84ac	NFS: Clean up NFSv4.2 xattrs Add a helper for the xattr mask so that we can get rid of the inlined ifdefs. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-02-25 18:50:13 -05:00
Trond Myklebust	f1ec501d08	NFS: Remove unnecessary XATTR cache invalidation in nfs_fhget() We should never expect the 'xattr_cache' to be non-null in that case, hence nfs_set_cache_invalid() is just going to optimise it away. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-02-25 18:50:13 -05:00
Trond Myklebust	b622ffe1d9	NFS: NFSv2/v3 clients should never be setting NFS_CAP_XATTR Ensure that we always initialise the 'xattr_support' field in struct nfs_fsinfo, so that nfs_server_set_fsinfo() doesn't declare our NFSv2/v3 client to be capable of supporting the NFSv4.2 xattr protocol by setting the NFS_CAP_XATTR capability. This configuration can cause nfs_do_access() to set access mode bits that are unsupported by the NFSv3 ACCESS call, which may confuse spec-compliant servers. Reported-by: Olga Kornievskaia <kolga@netapp.com> Fixes: `b78ef845c3` ("NFSv4.2: query the server for extended attribute support") Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-02-25 18:50:13 -05:00
Trond Myklebust	41e97b7f8a	NFS: Remove unused flag NFS_INO_REVAL_PAGECACHE Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-02-25 18:50:13 -05:00
Trond Myklebust	88a6099fc3	NFS: Replace last uses of NFS_INO_REVAL_PAGECACHE Now that we have more fine grained attribute revalidation, let's just get rid of NFS_INO_REVAL_PAGECACHE. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-02-25 18:50:13 -05:00
Benjamin Coddington	50c790a0b6	NFSv4: use unique client identifiers in network namespaces In order to differentiate client state, assign a random uuid to the uniquifing portion of the client identifier when a network namespace is created. Containers may still override this value if they wish to maintain stable client identifiers by writing to /sys/fs/nfs/net/client/identifier, either by udev rules or other means. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-02-25 18:50:12 -05:00
Olga Kornievskaia	43245eca6e	NFSv4.1 support for NFS4_RESULT_PRESERVER_UNLINKED In 4.1+, the server is allowed to set a flag NFS4_RESULT_PRESERVE_UNLINKED in reply to the OPEN, that tells the client that it does not need to do a silly rename of an opened file when it's being removed. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-02-25 18:50:12 -05:00
Trond Myklebust	4fb547be35	NFSv4.2/copyoffload: Convert GFP_NOFS to GFP_KERNEL There doesn't seem to be any reason why the copy offload code can't use GFP_KERNEL. It can't get called by direct reclaim. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-02-25 18:50:12 -05:00
Trond Myklebust	61345a42a2	NFSv4/flexfiles: Convert GFP_NOFS to GFP_KERNEL Assume that the higher layers will have set memalloc_nofs_save/restore as appropriate. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-02-25 18:50:12 -05:00
Trond Myklebust	da48f267f9	NFS: Convert GFP_NOFS to GFP_KERNEL Assume that sections that should not re-enter the filesystem are already protected with memalloc_nofs_save/restore call, so relax those GFP_NOFS instances which might be used by other contexts. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-02-25 18:50:12 -05:00
Trond Myklebust	5c60e89e71	NFSv4.2: Fix up an invalid combination of memory allocation flags We should use either GFP_KERNEL or GFP_NOFS, but not both. Also strip GFP_KERNEL_ACCOUNT down to GFP_KERNEL. This memory is shrinkable, so does not need to be limited by kmemcg. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-02-25 18:50:12 -05:00
Trond Myklebust	9c00fd9acb	NFSv4: Charge NFSv4 open state trackers to kmemcg Allow kmemcg to limit the number of NFSv4 delegation, lock and open state trackers. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-02-25 18:50:12 -05:00

1 2 3 4 5 ...

74895 Commits