6556 Commits

Author SHA1 Message Date
Trond Myklebust
eace45a18c NFS: Trace effects of readdirplus on the dcache
Trace the effects of readdirplus on attribute and dentry revalidation.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-02 08:43:39 -05:00
Trond Myklebust
310e318745 NFS: Add basic readdir tracing
Add tracing to track how often the client goes to the server for updated
readdir information.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-02 08:43:39 -05:00
Trond Myklebust
0b3cc71b5a NFS: Don't request readdirplus when revalidation was forced
If the revalidation was forced, due to the presence of a LOOKUP_EXCL or
a LOOKUP_REVAL flag, then readdirplus won't help. It also can't help
when we're doing a path component lookup.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-02 08:43:39 -05:00
Trond Myklebust
2c2c336506 NFS: Readdirplus can't help lookup for case insensitive filesystems
If the filesystem is case insensitive, then readdirplus can't help with
cache misses, since it won't return case folded variants of the filename.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-02 08:43:39 -05:00
Trond Myklebust
c49c68944f NFSv4: Ask for a full XDR buffer of readdir goodness
Instead of pretending that we know the ratio of directory info vs
readdirplus attribute info, just set the 'dircount' field to the same
value as the 'maxcount' field.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-02 08:43:38 -05:00
Trond Myklebust
ad1e109a41 NFS: Don't ask for readdirplus unless it can help nfs_getattr()
If attribute caching is turned off, then use of readdirplus is not going
to help stat() performance.
Readdirplus also doesn't help if a file is being written to, since we
will have to flush those writes in order to sync the mtime/ctime.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-02 08:43:38 -05:00
Trond Myklebust
230bc98f7a NFS: Improve heuristic for readdirplus
The heuristic for readdirplus is designed to try to detect 'ls -l' and
similar patterns. It does so by looking for cache hit/miss patterns in
both the attribute cache and in the dcache of the files in a given
directory, and then sets a flag for the readdirplus code to interpret.

The problem with this approach is that a single attribute or dcache miss
can cause the NFS code to force a refresh of the attributes for the
entire set of files contained in the directory.

To be able to make a more nuanced decision, let's sample the number of
hits and misses in the set of open directory descriptors. That allows us
to set thresholds at which we start preferring READDIRPLUS over regular
READDIR, or at which we start to force a re-read of the remaining
readdir cache using READDIRPLUS.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-02 08:43:38 -05:00
Trond Myklebust
9c3f4d988c NFS: Reduce use of uncached readdir
When reading a very large directory, we want to try to keep the page
cache up to date if doing so is inexpensive. With the change to allow
readdir to continue reading even when the cache is incomplete, we no
longer need to fall back to uncached readdir in order to scale to large
directories.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-02 08:43:38 -05:00
Trond Myklebust
9ff89c25d8 NFS: Simplify nfs_readdir_xdr_to_array()
Recent changes to readdir mean that we can cope with partially filled
page cache entries, so we no longer need to rely on looping in
nfs_readdir_xdr_to_array().

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-02 08:43:38 -05:00
Trond Myklebust
6c34f05b75 NFS: If the cookie verifier changes, we must invalidate the page cache
Ensure that if the cookie verifier changes when we use the zero-valued
cookie, then we invalidate any cached pages.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-02 08:43:38 -05:00
Trond Myklebust
580f236737 NFS: Adjust the amount of readahead performed by NFS readdir
The current NFS readdir code will always try to maximise the amount of
readahead it performs on the assumption that we can cache anything that
isn't immediately read by the process.
There are several cases where this assumption breaks down, including
when the 'ls -l' heuristic kicks in to try to force use of readdirplus
as a batch replacement for lookup/getattr.

This patch therefore tries to tone down the amount of readahead we
perform, and adjust it to try to match the amount of data being
requested by user space.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-02 08:43:38 -05:00
Trond Myklebust
c8f0523ba3 NFS: Don't advance the page pointer unless the page is full
When we hit the end of the data in the readdir page, we don't want to
start filling a new page, unless this one is full.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-02 08:43:38 -05:00
Trond Myklebust
728dd0ab37 NFS: Don't re-read the entire page cache to find the next cookie
If the page cache entry that was last read gets invalidated for some
reason, then make sure we can re-create it on the next call to readdir.
This, combined with the cache page validation, allows us to reuse the
cached value of page-index on successive calls to nfs_readdir.

Credit is due to Benjamin Coddington for showing that the concept works,
and that it allows for improved cache sharing between processes even in
the case where pages are lost due to LRU or active invalidation.

Suggested-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-02 08:43:38 -05:00
Trond Myklebust
d09e673f49 NFS: Store the change attribute in the directory page cache
Use the change attribute and the first cookie in a directory page cache
entry to validate that the page is up to date.

Suggested-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-02 08:43:33 -05:00
Chuck Lever
37902c6313 NFSD: Move svc_serv_ops::svo_function into struct svc_serv
Hoist svo_function back into svc_serv and remove struct
svc_serv_ops, since the struct is now devoid of fields.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-28 10:26:40 -05:00
Chuck Lever
f49169c97f NFSD: Remove svc_serv_ops::svo_module
struct svc_serv_ops is about to be removed.

Neil Brown says:
> I suspect svo_module can go as well - I don't think the thread is
> ever the thing that primarily keeps a module active.

A random sample of kthread_create() callers shows sunrpc is the only
one that manages module reference count in this way.

Suggested-by: Neil Brown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-28 10:26:40 -05:00
Chuck Lever
c7d7ec8f04 SUNRPC: Remove svc_shutdown_net()
Clean up: svc_shutdown_net() now does nothing but call
svc_close_net(). Replace all external call sites.

svc_close_net() is renamed to be the inverse of svc_xprt_create().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-28 10:26:40 -05:00
Chuck Lever
352ad31448 SUNRPC: Rename svc_create_xprt()
Clean up: Use the "svc_xprt_<task>" function naming convention as
is used for other external APIs.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-28 10:26:39 -05:00
Chuck Lever
a9ff2e99e9 SUNRPC: Remove the .svo_enqueue_xprt method
We have never been able to track down and address the underlying
cause of the performance issues with workqueue-based service
support. svo_enqueue_xprt is called multiple times per RPC, so
it adds instruction path length, but always ends up at the same
function: svc_xprt_do_enqueue(). We do not anticipate needing
this flexibility for dynamic nfsd thread management support.

As a micro-optimization, remove .svo_enqueue_xprt because
Spectre/Meltdown makes virtual function calls more costly.

This change essentially reverts commit b9e13cdfac70 ("nfsd/sunrpc:
turn enqueueing a svc_xprt into a svc_serv operation").

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-28 10:26:39 -05:00
Trond Myklebust
0b2662b7e7 NFS: Calculate page offsets algorithmically
Instead of relying on counting the page offsets as we walk through the
page cache, switch to calculating them algorithmically.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-02-28 10:11:33 -05:00
Trond Myklebust
281f31b2e5 NFS: Use kzalloc() to avoid initialising the nfs_open_dir_context
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-02-28 10:11:32 -05:00
Trond Myklebust
d1e32ea355 NFS: Initialise the readdir verifier as best we can in nfs_opendir()
For the purpose of ensuring that opendir() followed by seekdir() work as
correctly as possible, try to initialise the readdir verifier in
nfs_opendir().

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-02-28 10:11:32 -05:00
Trond Myklebust
2eef8a3111 NFS: Trace lookup revalidation failure
Enable tracing of lookup revalidation failures.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-02-28 10:11:32 -05:00
Trond Myklebust
64cfca85ba NFS: Return valid errors from nfs2/3_decode_dirent()
Valid return values for decode_dirent() callback functions are:
 0: Success
 -EBADCOOKIE: End of directory
 -EAGAIN: End of xdr_stream

All errors need to map into one of those three values.

Fixes: 573c4e1ef53a ("NFS: Simplify ->decode_dirent() calling sequence")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-02-28 10:11:32 -05:00
Trond Myklebust
b38e09b9b6 Revert "NFSv4: use unique client identifiers in network namespaces"
This reverts commit 50c790a0b69bdc420f00f30bdf348d6c90194c78.

The functionality is believed to be capable of causing regressions in
existing setups, so the author has requested that it be reverted.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-02-28 10:09:23 -05:00
Trond Myklebust
6c984083ec NFS: Use of mapping_set_error() results in spurious errors
The use of mapping_set_error() in conjunction with calls to
filemap_check_errors() is problematic because every error gets reported
as either an EIO or an ENOSPC by filemap_check_errors() in functions
such as filemap_write_and_wait() or filemap_write_and_wait_range().
In almost all cases, we prefer to use the more nuanced wb errors.

Fixes: b8946d7bfb94 ("NFS: Revalidate the file mapping on all fatal writeback errors")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-02-25 18:50:13 -05:00
Trond Myklebust
84631f84ac NFS: Clean up NFSv4.2 xattrs
Add a helper for the xattr mask so that we can get rid of the inlined
ifdefs.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-02-25 18:50:13 -05:00
Trond Myklebust
f1ec501d08 NFS: Remove unnecessary XATTR cache invalidation in nfs_fhget()
We should never expect the 'xattr_cache' to be non-null in that case,
hence nfs_set_cache_invalid() is just going to optimise it away.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-02-25 18:50:13 -05:00
Trond Myklebust
b622ffe1d9 NFS: NFSv2/v3 clients should never be setting NFS_CAP_XATTR
Ensure that we always initialise the 'xattr_support' field in struct
nfs_fsinfo, so that nfs_server_set_fsinfo() doesn't declare our NFSv2/v3
client to be capable of supporting the NFSv4.2 xattr protocol by setting
the NFS_CAP_XATTR capability.

This configuration can cause nfs_do_access() to set access mode bits
that are unsupported by the NFSv3 ACCESS call, which may confuse
spec-compliant servers.

Reported-by: Olga Kornievskaia <kolga@netapp.com>
Fixes: b78ef845c35d ("NFSv4.2: query the server for extended attribute support")
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-02-25 18:50:13 -05:00
Trond Myklebust
41e97b7f8a NFS: Remove unused flag NFS_INO_REVAL_PAGECACHE
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-02-25 18:50:13 -05:00
Trond Myklebust
88a6099fc3 NFS: Replace last uses of NFS_INO_REVAL_PAGECACHE
Now that we have more fine grained attribute revalidation, let's just
get rid of NFS_INO_REVAL_PAGECACHE.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-02-25 18:50:13 -05:00
Benjamin Coddington
50c790a0b6 NFSv4: use unique client identifiers in network namespaces
In order to differentiate client state, assign a random uuid to the
uniquifing portion of the client identifier when a network namespace is
created.  Containers may still override this value if they wish to maintain
stable client identifiers by writing to /sys/fs/nfs/net/client/identifier,
either by udev rules or other means.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-02-25 18:50:12 -05:00
Olga Kornievskaia
43245eca6e NFSv4.1 support for NFS4_RESULT_PRESERVER_UNLINKED
In 4.1+, the server is allowed to set a flag
NFS4_RESULT_PRESERVE_UNLINKED in reply to the OPEN, that tells
the client that it does not need to do a silly rename of an
opened file when it's being removed.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-02-25 18:50:12 -05:00
Trond Myklebust
4fb547be35 NFSv4.2/copyoffload: Convert GFP_NOFS to GFP_KERNEL
There doesn't seem to be any reason why the copy offload code can't use
GFP_KERNEL. It can't get called by direct reclaim.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-02-25 18:50:12 -05:00
Trond Myklebust
61345a42a2 NFSv4/flexfiles: Convert GFP_NOFS to GFP_KERNEL
Assume that the higher layers will have set memalloc_nofs_save/restore
as appropriate.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-02-25 18:50:12 -05:00
Trond Myklebust
da48f267f9 NFS: Convert GFP_NOFS to GFP_KERNEL
Assume that sections that should not re-enter the filesystem are already
protected with memalloc_nofs_save/restore call, so relax those GFP_NOFS
instances which might be used by other contexts.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-02-25 18:50:12 -05:00
Trond Myklebust
5c60e89e71 NFSv4.2: Fix up an invalid combination of memory allocation flags
We should use either GFP_KERNEL or GFP_NOFS, but not both. Also strip
GFP_KERNEL_ACCOUNT down to GFP_KERNEL. This memory is shrinkable, so
does not need to be limited by kmemcg.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-02-25 18:50:12 -05:00
Trond Myklebust
9c00fd9acb NFSv4: Charge NFSv4 open state trackers to kmemcg
Allow kmemcg to limit the number of NFSv4 delegation, lock and open
state trackers.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-02-25 18:50:12 -05:00
Trond Myklebust
d7867712d8 NFS: Charge open/lock file contexts to kmemcg
Allow kmemcg to limit the number of open/lock file contexts, in the same
way that it limits the parent file descriptors.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-02-25 18:50:12 -05:00
Trond Myklebust
3e17898aca NFSv4: Protect the state recovery thread against direct reclaim
If memory allocation triggers a direct reclaim from the state recovery
thread, then we can deadlock. Use memalloc_nofs_save/restore to ensure
that doesn't happen.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-02-25 18:50:12 -05:00
Xin Xiong
b7f114edd5 NFSv4.2: fix reference count leaks in _nfs42_proc_copy_notify()
[You don't often get email from xiongx18@fudan.edu.cn. Learn why this is important at http://aka.ms/LearnAboutSenderIdentification.]

The reference counting issue happens in two error paths in the
function _nfs42_proc_copy_notify(). In both error paths, the function
simply returns the error code and forgets to balance the refcount of
object `ctx`, bumped by get_nfs_open_context() earlier, which may
cause refcount leaks.

Fix it by balancing refcount of the `ctx` object before the function
returns in both error paths.

Signed-off-by: Xin Xiong <xiongx18@fudan.edu.cn>
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-02-25 18:50:12 -05:00
Matthew Wilcox (Oracle)
8786fde842 Convert NFS from readpages to readahead
NFS is one of the last two users of the deprecated ->readpages aop.
This conversion looks straightforward, but I have only compile-tested
it.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-02-25 15:07:07 -05:00
Tom Rix
98c27f276b NFS: simplify check for freeing cn_resp
nfs42_files_from_same_server() is called to check if freeing
cn_resp is required, just do the free.

Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-02-25 15:07:07 -05:00
Trond Myklebust
d19e0183a8 NFS: Do not report writeback errors in nfs_getattr()
The result of the writeback, whether it is an ENOSPC or an EIO, or
anything else, does not inhibit the NFS client from reporting the
correct file timestamps.

Fixes: 79566ef018f5 ("NFS: Getattr doesn't require data sync semantics")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-02-16 15:15:22 -05:00
Trond Myklebust
e0caaf75d4 NFS: LOOKUP_DIRECTORY is also ok with symlinks
Commit ac795161c936 (NFSv4: Handle case where the lookup of a directory
fails) [1], part of Linux since 5.17-rc2, introduced a regression, where
a symbolic link on an NFS mount to a directory on another NFS does not
resolve(?) the first time it is accessed:

Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Fixes: ac795161c936 ("NFSv4: Handle case where the lookup of a directory fails")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Tested-by: Donald Buczek <buczek@molgen.mpg.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-02-14 14:58:48 -05:00
Trond Myklebust
9d047bf68f NFS: Remove an incorrect revalidation in nfs4_update_changeattr_locked()
In nfs4_update_changeattr_locked(), we don't need to set the
NFS_INO_REVAL_PAGECACHE flag, because we already know the value of the
change attribute, and we're already flagging the size. In fact, this
forces us to revalidate the change attribute a second time for no good
reason.
This extra flag appears to have been introduced as part of the xattr
feature, when update_changeattr_locked() was converted for use by the
xattr code.

Fixes: 1b523ca972ed ("nfs: modify update_changeattr to deal with regular files")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-02-14 14:58:48 -05:00
Yang Li
3d4a39404b NFS: Fix nfs4_proc_get_locations() kernel-doc comment
Add the description of @server and @fhandle, and remove the excess
@inode in nfs4_proc_get_locations() kernel-doc comment to remove
warnings found by running scripts/kernel-doc, which is caused by
using 'make W=1'.

fs/nfs/nfs4proc.c:8219: warning: Function parameter or member 'server'
not described in 'nfs4_proc_get_locations'
fs/nfs/nfs4proc.c:8219: warning: Function parameter or member 'fhandle'
not described in 'nfs4_proc_get_locations'
fs/nfs/nfs4proc.c:8219: warning: Excess function parameter 'inode'
description in 'nfs4_proc_get_locations'

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-02-08 09:14:26 -05:00
Trond Myklebust
468d126dab NFS: Fix initialisation of nfs_client cl_flags field
For some long forgotten reason, the nfs_client cl_flags field is
initialised in nfs_get_client() instead of being initialised at
allocation time. This quirk was harmless until we moved the call to
nfs_create_rpc_client().

Fixes: dd99e9f98fbf ("NFSv4: Initialise connection to the server in nfs4_alloc_client()")
Cc: stable@vger.kernel.org # 4.8.x
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-02-08 09:13:49 -05:00
Trond Myklebust
e1d2699b96 NFS: Avoid duplicate uncached readdir calls on eof
If we've reached the end of the directory, then cache that information
in the context so that we don't need to do an uncached readdir in order
to rediscover that fact.

Fixes: 794092c57f89 ("NFS: Do uncached readdir when we're seeking a cookie in an empty page cache")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-02-02 10:47:33 -05:00
trondmy@kernel.org
ce292d8faf NFS: Don't skip directory entries when doing uncached readdir
Ensure that we initialise desc->cache_entry_index correctly in
uncached_readdir().

Fixes: d1bacf9eb2fd ("NFS: add readdir cache array")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-02-02 10:47:23 -05:00