6416 Commits

Author SHA1 Message Date
Trond Myklebust
4cd27df88a NFS: Remove redundant call to __set_page_dirty_nobuffers
Remove a redundant call in nfs_updatepage(). nfs_writepage_setup() will
have already called nfs_mark_request_dirty() on success.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-10-21 17:11:41 -04:00
Anna Schumaker
5fe1210d25 NFS: Unexport nfs_probe_fsinfo()
All the callers are now in client.c so we can remove the
EXPORT_SYMBOL_GPL() and make it static.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-10-20 18:09:55 -04:00
Anna Schumaker
1301ba603c NFS: Call nfs_probe_server() during a fscontext-reconfigure event
This lets us update the server's attributes when the user does a "mount
-o remount" on the filesystem.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-10-20 18:09:55 -04:00
Anna Schumaker
4d4cf8d2d6 NFS: Replace calls to nfs_probe_fsinfo() with nfs_probe_server()
Clean up. There are a few places where we want to probe the server, but
don't actually care about the fsinfo result. Change these to use
nfs_probe_server(), which handles the fattr allocation for us.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-10-20 18:09:54 -04:00
Anna Schumaker
e5731131fb NFS: Move nfs_probe_destination() into the generic client
And rename it to nfs_probe_server(). I also change it to take the nfs_fh
as an argument so callers can choose what filehandle to probe.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-10-20 18:09:54 -04:00
Anna Schumaker
01dde76e47 NFS: Create an nfs4_server_set_init_caps() function
And call it before doing an FSINFO probe to reset to the baseline
capabilities before probing.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-10-20 18:09:54 -04:00
Chuck Lever
86882c7546 NFS: Remove --> and <-- dprintk call sites
dprintk call sites that display no other information than the
function name can be replaced with use of the trace "function" or
"function_graph" plug-ins.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-10-20 18:09:54 -04:00
Chuck Lever
b40887e10d SUNRPC: Trace calls to .rpc_call_done
Introduce a single tracepoint that can replace simple dprintk call
sites in upper layer "rpc_call_done" callbacks. Example:

   kworker/u24:2-1254  [001]   771.026677: rpc_stats_latency:    task:00000001@00000002 xid=0x16a6f3c0 rpcbindv2 GETPORT backlog=446 rtt=101 execute=555
   kworker/u24:2-1254  [001]   771.026677: rpc_task_call_done:   task:00000001@00000002 flags=ASYNC|DYNAMIC|SOFT|SOFTCONN|SENT runstate=RUNNING|ACTIVE status=0 action=rpcb_getport_done
   kworker/u24:2-1254  [001]   771.026678: rpcb_setport:         task:00000001@00000002 status=0 port=20048

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-10-20 18:09:54 -04:00
Chuck Lever
d9f877433e NFS: Replace dprintk callsites in nfs_readpage(s)
These new events report slightly different information for readpage
and readpages/readahead.

For readpage:
             fsx-1387  [006]   380.761896: nfs_aop_readpage:    fileid=00:28:2 fhandle=0x36fbbe51 version=1752899355910932437 offset=131072
             fsx-1387  [006]   380.761900: nfs_aop_readpage_done: fileid=00:28:2 fhandle=0x36fbbe51 version=1752899355910932437 offset=131072 ret=0

The index of a synchronous single-page read is reported.

For readpages:

             fsx-1387  [006]   380.760847: nfs_aop_readahead:   fileid=00:28:2 fhandle=0x36fbbe51 version=1752899355909932456 nr_pages=3
             fsx-1387  [006]   380.760853: nfs_aop_readahead_done: fileid=00:28:2 fhandle=0x36fbbe51 version=1752899355909932456 nr_pages=3 ret=0

The count of pages requested is reported. nfs_readpages does not
wait for the READ requests to complete.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-10-20 18:09:54 -04:00
Chuck Lever
b4776a341e SUNRPC: Tracepoints should display tk_pid and cl_clid as a fixed-size field
For certain special cases, RPC-related tracepoints record a -1 as
the task ID or the client ID. It's ugly for a trace event to display
4 billion in these cases.

To help keep SUNRPC tracepoints consistent, create a macro that
defines the print format specifiers for tk_pid and cl_clid. At some
point in the future we might try tk_pid with a wider range of values
than 0..64K so this makes it easier to make that change.

RPC tracepoints now look like this:

<...>-1276  [009]   149.720358: rpc_clnt_new:         client=00000005 peer=[192.168.2.55]:20049 program=nfs server=klimt.ib

<...>-1342  [004]   149.921234: rpc_xdr_recvfrom:     task:0000001a@00000005 head=[0xff1242d9ab6dc01c,144] page=0 tail=[(nil),0] len=144
<...>-1342  [004]   149.921235: xprt_release_cong:    task:0000001a@00000005 snd_task:ffffffff cong=256 cwnd=16384
<...>-1342  [004]   149.921235: xprt_put_cong:        task:0000001a@00000005 snd_task:ffffffff cong=0 cwnd=16384

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-10-20 18:09:54 -04:00
Alexey Gladkov
d5f458a979 Fix user namespace leak
Fixes: 61ca2c4afd9d ("NFS: Only reference user namespace from nfs4idmap struct instead of cred")
Signed-off-by: Alexey Gladkov <legion@kernel.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-10-20 18:09:54 -04:00
Trond Myklebust
e591b298d7 NFS: Save some space in the inode
Save some space in the nfs_inode by setting up an anonymous union with
the fields that are peculiar to a specific type of filesystem object.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-10-20 18:09:54 -04:00
Trond Myklebust
6e176d4716 NFSv4: Fixes for nfs4_inode_return_delegation()
We mustn't call nfs_wb_all() on anything other than a regular file.
Furthermore, we can exit early when we don't hold a delegation.

Reported-by: David Wysochanski <dwysocha@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-10-20 18:09:53 -04:00
Trond Myklebust
f0caea8882 NFS: Fix an Oops in pnfs_mark_request_commit()
Olga reports seeing the following Oops when doing O_DIRECT writes to a
pNFS flexfiles server:

Oops: 0000 [#1] SMP PTI
CPU: 1 PID: 234186 Comm: kworker/u8:1 Not tainted 5.15.0-rc4+ #4
Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.13.0-2.module+el8.3.0+7353+9de0a3cc 04/01/2014
Workqueue: nfsiod rpc_async_release [sunrpc]
RIP: 0010:nfs_mark_request_commit+0x12/0x30 [nfs]
Code: ff ff be 03 00 00 00 e8 ac 34 83 eb e9 29 ff ff
ff e8 22 bc d7 eb 66 90 0f 1f 44 00 00 48 85 f6 74 16 48 8b 42 10 48
8b 40 18 <48> 8b 40 18 48 85 c0 74 05 e9 70 fc 15 ec 48 89 d6 e9 68 ed
ff ff
RSP: 0018:ffffa82f0159fe00 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff8f3393141880 RCX: 0000000000000000
RDX: ffffa82f0159fe08 RSI: ffff8f3381252500 RDI: ffff8f3393141880
RBP: ffff8f33ac317c00 R08: 0000000000000000 R09: ffff8f3487724cb0
R10: 0000000000000008 R11: 0000000000000001 R12: 0000000000000001
R13: ffff8f3485bccee0 R14: ffff8f33ac317c10 R15: ffff8f33ac317cd8
FS:  0000000000000000(0000) GS:ffff8f34fbc80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000018 CR3: 0000000122120006 CR4: 0000000000770ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 nfs_direct_write_completion+0x13b/0x250 [nfs]
 rpc_free_task+0x39/0x60 [sunrpc]
 rpc_async_release+0x29/0x40 [sunrpc]
 process_one_work+0x1ce/0x370
 worker_thread+0x30/0x380
 ? process_one_work+0x370/0x370
 kthread+0x11a/0x140
 ? set_kthread_struct+0x40/0x40
 ret_from_fork+0x22/0x30

Reported-by: Olga Kornievskaia <aglo@umich.edu>
Fixes: 9c455a8c1e14 ("NFS/pNFS: Clean up pNFS commit operations")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-10-20 18:09:53 -04:00
Trond Myklebust
133a48abf6 NFS: Fix up commit deadlocks
If O_DIRECT bumps the commit_info rpcs_out field, then that could lead
to fsync() hangs. The fix is to ensure that O_DIRECT calls
nfs_commit_end().

Fixes: 723c921e7dfc ("sched/wait, fs/nfs: Convert wait_on_atomic_t() usage to the new wait_var_event() API")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-10-20 18:09:45 -04:00
Vivek Goyal
15bf32398a security: Return xattr name from security_dentry_init_security()
Right now security_dentry_init_security() only supports single security
label and is used by SELinux only. There are two users of this hook,
namely ceph and nfs.

NFS does not care about xattr name. Ceph hardcodes the xattr name to
security.selinux (XATTR_NAME_SELINUX).

I am making changes to fuse/virtiofs to send security label to virtiofsd
and I need to send xattr name as well. I also hardcoded the name of
xattr to security.selinux.

Stephen Smalley suggested that it probably is a good idea to modify
security_dentry_init_security() to also return name of xattr so that
we can avoid this hardcoding in the callers.

This patch adds a new parameter "const char **xattr_name" to
security_dentry_init_security() and LSM puts the name of xattr
too if caller asked for it (xattr_name != NULL).

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: James Morris <jamorris@linux.microsoft.com>
[PM: fixed typos in the commit description]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-10-20 08:17:08 -04:00
Christoph Hellwig
6e50e781fe nfs/blocklayout: use bdev_nr_bytes instead of open coding it
Use the proper helper to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20211018101130.1838532-19-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 14:43:23 -06:00
Chuck Lever
130e2054d4 SUNRPC: Change return value type of .pc_encode
Returning an undecorated integer is an age-old trope, but it's
not clear (even to previous experts in this code) that the only
valid return values are 1 and 0. These functions do not return
a negative errno, rpc_stat value, or a positive length.

Document there are only two valid return values by having
.pc_encode return only true or false.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2021-10-13 11:34:49 -04:00
Chuck Lever
fda4944114 SUNRPC: Replace the "__be32 *p" parameter to .pc_encode
The passed-in value of the "__be32 *p" parameter is now unused in
every server-side XDR encoder, and can be removed.

Note also that there is a line in each encoder that sets up a local
pointer to a struct xdr_stream. Passing that pointer from the
dispatcher instead saves one line per encoder function.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2021-10-13 11:34:49 -04:00
Trond Myklebust
64a93dbf25 NFS: Fix deadlocks in nfs_scan_commit_list()
Partially revert commit 2ce209c42c01 ("NFS: Wait for requests that are
locked on the commit list"), since it can lead to deadlocks between
commit requests and nfs_join_page_group().
For now we should assume that any locked requests on the commit list are
either about to be removed and committed by another task, or the writes
they describe are about to be retransmitted. In either case, we should
not need to worry.

Fixes: 2ce209c42c01 ("NFS: Wait for requests that are locked on the commit list")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-10-10 11:05:54 +02:00
Chuck Lever
110cb2d2f9 NFS: Instrument i_size_write()
Generate a trace event whenever the NFS client modifies the size of
a file. These new events aid troubleshooting workloads that trigger
races around size updates.

There are four new trace points, all named nfs_size_something so
they are easy to grep for or enable as a group with a single glob.

Size updated on the server:

  kworker/u24:10-194   [010]   369.939174: nfs_size_update:      fileid=00:28:2 fhandle=0x36fbbe51 version=1752899344277980615 cursize=250471 newsize=172083

Server-side size update reported via NFSv3 WCC attributes:

             fsx-1387  [006]   380.760686: nfs_size_wcc:         fileid=00:28:2 fhandle=0x36fbbe51 version=1752899355909932456 cursize=146792 newsize=171216

File has been truncated locally:

             fsx-1387  [007]   369.437421: nfs_size_truncate:    fileid=00:28:2 fhandle=0x36fbbe51 version=1752899231200117272 cursize=215244 newsize=0

File has been extended locally:

             fsx-1387  [007]   369.439213: nfs_size_grow:        fileid=00:28:2 fhandle=0x36fbbe51 version=1752899343704248410 cursize=258048 newsize=262144

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-10-10 11:05:54 +02:00
Chuck Lever
8e09650f5e NFS: Remove unnecessary TRACE_DEFINE_ENUM()s
Clean up: TRACE_DEFINE_ENUM is unnecessary because the target
symbols are all C macros, not enums.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-10-10 11:05:54 +02:00
Baptiste Lepers
a2915fa062 pnfs/flexfiles: Fix misplaced barrier in nfs4_ff_layout_prepare_ds
_nfs4_pnfs_v3/v4_ds_connect do
   some work
   smp_wmb
   ds->ds_clp = clp;

And nfs4_ff_layout_prepare_ds currently does
   smp_rmb
   if(ds->ds_clp)
      ...

This patch places the smp_rmb after the if. This ensures that following
reads only happen once nfs4_ff_layout_prepare_ds has checked that data
has been properly initialized.

Fixes: d67ae825a59d6 ("pnfs/flexfiles: Add the FlexFile Layout Driver")
Signed-off-by: Baptiste Lepers <baptiste.lepers@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-10-03 23:00:50 -04:00
Trond Myklebust
36a10a3c4c NFS: Remove unnecessary page cache invalidations
Remove cache invalidations that are already covered by change attribute
updates.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-10-03 20:49:07 -04:00
Trond Myklebust
b97583b263 NFS: Do not flush the readdir cache in nfs_dentry_iput()
The original premise in commit 83672d392f7b ("NFS: Fix directory caching
problem - with test case and patch.") was that readdirplus was caching
attribute information and replaying it later. This is no longer the
case.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-10-03 20:49:07 -04:00
Trond Myklebust
cec08f452a NFS: Fix dentry verifier races
If the directory changed while we were revalidating the dentry, then
don't update the dentry verifier. There is no value in setting the
verifier to an older value, and we could end up overwriting a more up to
date verifier from a parallel revalidation.

Fixes: efeda80da38d ("NFSv4: Fix revalidation of dentries with delegations")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Tested-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
2021-10-03 20:49:07 -04:00
Trond Myklebust
ff81dfb5d7 NFS: Further optimisations for 'ls -l'
If a user is doing 'ls -l', we have a heuristic in GETATTR that tells
the readdir code to try to use READDIRPLUS in order to refresh the inode
attributes. In certain cirumstances, we also try to invalidate the
remaining directory entries in order to ensure this refresh.

If there are multiple readers of the directory, we probably should avoid
invalidating the page cache, since the heuristic breaks down in that
situation anyway.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Tested-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
2021-10-03 20:49:07 -04:00
Trond Myklebust
2929bc3329 NFS: Fix up nfs_readdir_inode_mapping_valid()
The check for duplicate readdir cookies should only care if the change
attribute is invalid or the data cache is invalid.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Tested-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
2021-10-03 20:49:07 -04:00
Trond Myklebust
a6a361c4ca NFS: Ignore the directory size when marking for revalidation
If we want to revalidate the directory, then just mark the change
attribute as invalid.

Fixes: 13c0b082b6a9 ("NFS: Replace use of NFS_INO_REVAL_PAGECACHE when checking cache validity")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Tested-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
2021-10-03 20:49:06 -04:00
Trond Myklebust
488796ec1e NFS: Don't set NFS_INO_DATA_INVAL_DEFER and NFS_INO_INVALID_DATA
NFS_INO_DATA_INVAL_DEFER and NFS_INO_INVALID_DATA should be considered
mutually exclusive.

Fixes: 1c341b777501 ("NFS: Add deferred cache invalidation for close-to-open consistency violations")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Tested-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
2021-10-03 20:49:06 -04:00
Trond Myklebust
eea413308f NFS: Default change_attr_type to NFS4_CHANGE_TYPE_IS_UNDEFINED
Both NFSv3 and NFSv2 generate their change attribute from the ctime
value that was supplied by the server. However the problem is that there
are plenty of servers out there with ctime resolutions of 1ms or worse.
In a modern performance system, this is insufficient when trying to
decide which is the most recent set of attributes when, for instance, a
READ or GETATTR call races with a WRITE or SETATTR.

For this reason, let's revert to labelling the NFSv2/v3 change
attributes as NFS4_CHANGE_TYPE_IS_UNDEFINED. This will ensure we protect
against such races.

Fixes: 7b24dacf0840 ("NFS: Another inode revalidation improvement")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Tested-by: Chuck Lever <chuck.lever@oracle.com>
2021-10-03 20:49:06 -04:00
Trond Myklebust
a1e7f30a86 NFSv4: Retrieve ACCESS on open if we're not using NFS4_CREATE_EXCLUSIVE
NFS4_CREATE_EXCLUSIVE does not allow the caller to set an access mode,
so for most Linux filesystems, the access call ends up returning no
permissions. However both NFS4_CREATE_EXCLUSIVE4_1 and
NFS4_CREATE_GUARDED allow the client to set the access mode.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-10-03 20:49:06 -04:00
Trond Myklebust
43d20e80e2 NFS: Fix a few more clear_bit() instances that need release semantics
All these bits are being used as bit locks.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-10-03 20:49:06 -04:00
Trond Myklebust
ca05cbae2a NFS: Fix up nfs_ctx_key_to_expire()
If the cached credential exists but doesn't have any expiration callback
then exit early.
Fix up atomicity issues when replacing the credential with a new one
since the existing code could lead to refcount leaks.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-10-03 20:49:05 -04:00
Trond Myklebust
9019fb391d NFS: Label the dentry with a verifier in nfs_rmdir() and nfs_unlink()
After the success of an operation such as rmdir() or unlink(), we expect
to add the dentry back to the dcache as an ordinary negative dentry.
However in NFS, unless it is labelled with the appropriate verifier for
the parent directory state, then nfs_lookup_revalidate will end up
discarding that dentry and forcing a new lookup.

The fix is to ensure that we relabel the dentry appropriately on
success.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-10-03 20:49:05 -04:00
Trond Myklebust
342a67f088 NFS: Label the dentry with a verifier in nfs_link(), nfs_symlink()
After the success of an operation such as link(), or symlink(), we
expect to add the dentry back to the dcache as an ordinary positive
dentry.
However in NFS, unless it is labelled with the appropriate verifier for
the parent directory state, then nfs_lookup_revalidate will end up
discarding that dentry and forcing a new lookup.

The fix is to ensure that we relabel the dentry appropriately on
success.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-10-03 20:49:05 -04:00
Jeff Layton
90f7d7a0d0 locks: remove LOCK_MAND flock lock support
As best I can tell, the logic for these has been broken for a long time
(at least before the move to git), such that they never conflict with
anything. Also, nothing checks for these flags and prevented opens or
read/write behavior on the files. They don't seem to do anything.

Given that, we can rip these symbols out of the kernel, and just make
flock(2) return 0 when LOCK_MAND is set in order to preserve existing
behavior.

Cc: Matthew Wilcox <willy@infradead.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
2021-09-10 16:21:44 -04:00
Linus Torvalds
0961f0c00e NFS Client Updates for Linux 5.15
- New Features:
   - Better client responsiveness when server isn't replying
   - Use refcount_t in sunrpc rpc_client refcount tracking
   - Add srcaddr and dst_port to the sunrpc sysfs info files
   - Add basic support for connection sharing between servers with multiple NICs`
 
 - Bugfixes and Cleanups:
   - Sunrpc tracepoint cleanups
   - Disconnect after ib_post_send() errors to avoid deadlocks
   - Fix for tearing down rpcrdma_reps
   - Fix a potential pNFS layoutget livelock loop
   - pNFS layout barrier fixes
   - Fix a potential memory corruption in rpc_wake_up_queued_task_set_status()
   - Fix reconnection locking
   - Fix return value of get_srcport()
   - Remove rpcrdma_post_sends()
   - Remove pNFS dead code
   - Remove copy size restriction for inter-server copies
   - Overhaul the NFS callback service
   - Clean up sunrpc TCP socket shutdowns
   - Always provide aligned buffers to RPC read layers
 -----BEGIN PGP SIGNATURE-----
 
 iQIyBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAmExP7AACgkQ18tUv7Cl
 QOshTg/zBz7OfrS23CcLLgNidTJ6S7JOuj1DShG+YzsYXT8f9Nl1DadLM7yAEyok
 6JZzC8rXYzJcmYztHZzRyTuzj1+tGGb0u/MrD0bBk42VEel6eOjH/Y9ybn12Gf/E
 aqlcJh8hPx44U8oo5EFjRJsg2h28O06vywqhJz+sTbkqKN4hlAgMOo5ysAB+1thg
 BrTlR84EKBw5QqxPJ1WPmq9tEyGebU9Yrj1p8f0Uf015IeRNeTOXx3NzmdPshphf
 2yJvjumwEzqkcHXTJFDfP6ikIcGPPMNVAOK8DHb+vDGzNsOXW7dDM7GuWA3U8DlU
 ZHvyyb05Wwe6Wwg8xwx90FEXcYZFfZbSKmI9z2uoOuGFzNG07zWzPDzRft+qrOvU
 VMMwP9oEh71+qesmWTvqIbR2RjxqbCYlTcc8mBrD66DROi6jZ2jznraNC85sxG0Q
 b8GE+2SnYr2Q25yehj2xrRlOXyiYNkeeYmIpIquEqH9o7cSyDNJhBWbzIv6x+ith
 O/S06ZVKMc9X1nH5t5121XcHrSTMMVA/67WMyKfKMxWnrADAWPQALG+ttoTcbRu7
 Txew3Jb+hB8+ZdHAqbPf1l1i+7USQl1CRHMw3GRvNjCL2qcjZb1R7eyJRSQQtUyw
 q6SJRGe6Sn1FTUnn96Hv15Zy8VHx+q0cOL/EQVzL1RzJIXYcag==
 =Ad/3
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-5.15-1' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client updates from Anna Schumaker:
 "New Features:
   - Better client responsiveness when server isn't replying
   - Use refcount_t in sunrpc rpc_client refcount tracking
   - Add srcaddr and dst_port to the sunrpc sysfs info files
   - Add basic support for connection sharing between servers with multiple NICs`

  Bugfixes and Cleanups:
   - Sunrpc tracepoint cleanups
   - Disconnect after ib_post_send() errors to avoid deadlocks
   - Fix for tearing down rpcrdma_reps
   - Fix a potential pNFS layoutget livelock loop
   - pNFS layout barrier fixes
   - Fix a potential memory corruption in rpc_wake_up_queued_task_set_status()
   - Fix reconnection locking
   - Fix return value of get_srcport()
   - Remove rpcrdma_post_sends()
   - Remove pNFS dead code
   - Remove copy size restriction for inter-server copies
   - Overhaul the NFS callback service
   - Clean up sunrpc TCP socket shutdowns
   - Always provide aligned buffers to RPC read layers"

* tag 'nfs-for-5.15-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (39 commits)
  NFS: Always provide aligned buffers to the RPC read layers
  NFSv4.1 add network transport when session trunking is detected
  SUNRPC enforce creation of no more than max_connect xprts
  NFSv4 introduce max_connect mount options
  SUNRPC add xps_nunique_destaddr_xprts to xprt_switch_info in sysfs
  SUNRPC keep track of number of transports to unique addresses
  NFSv3: Delete duplicate judgement in nfs3_async_handle_jukebox
  SUNRPC: Tweak TCP socket shutdown in the RPC client
  SUNRPC: Simplify socket shutdown when not reusing TCP ports
  NFSv4.2: remove restriction of copy size for inter-server copy.
  NFS: Clean up the synopsis of callback process_op()
  NFS: Extract the xdr_init_encode/decode() calls from decode_compound
  NFS: Remove unused callback void decoder
  NFS: Add a private local dispatcher for NFSv4 callback operations
  SUNRPC: Eliminate the RQ_AUTHERR flag
  SUNRPC: Set rq_auth_stat in the pg_authenticate() callout
  SUNRPC: Add svc_rqst::rq_auth_stat
  SUNRPC: Add dst_port to the sysfs xprt info file
  SUNRPC: Add srcaddr as a file in sysfs
  sunrpc: Fix return value of get_srcport()
  ...
2021-09-04 10:25:26 -07:00
Linus Torvalds
815409a12c overlayfs update for 5.15
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCYTDKKAAKCRDh3BK/laaZ
 PG9PAQCUF0fdBlCKudwSEt5PV5xemycL9OCAlYCd7d4XbBIe9wEA6sVJL9J+OwV2
 aF0NomiXtJccE+S9+byjVCyqSzQJGQQ=
 =6L2Y
 -----END PGP SIGNATURE-----

Merge tag 'ovl-update-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs

Pull overlayfs update from Miklos Szeredi:

 - Copy up immutable/append/sync/noatime attributes (Amir Goldstein)

 - Improve performance by enabling RCU lookup.

 - Misc fixes and improvements

The reason this touches so many files is that the ->get_acl() method now
gets a "bool rcu" argument.  The ->get_acl() API was updated based on
comments from Al and Linus:

Link: https://lore.kernel.org/linux-fsdevel/CAJfpeguQxpd6Wgc0Jd3ks77zcsAv_bn0q17L3VNnnmPKu11t8A@mail.gmail.com/

* tag 'ovl-update-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ovl: enable RCU'd ->get_acl()
  vfs: add rcu argument to ->get_acl() callback
  ovl: fix BUG_ON() in may_delete() when called from ovl_cleanup()
  ovl: use kvalloc in xattr copy-up
  ovl: update ctime when changing fileattr
  ovl: skip checking lower file's i_writecount on truncate
  ovl: relax lookup error on mismatch origin ftype
  ovl: do not set overlay.opaque for new directories
  ovl: add ovl_allow_offline_changes() helper
  ovl: disable decoding null uuid with redirect_dir
  ovl: consistent behavior for immutable/append-only inodes
  ovl: copy up sync/noatime fileattr flags
  ovl: pass ovl_fs to ovl_check_setxattr()
  fs: add generic helper for filling statx attribute flags
2021-09-02 09:21:27 -07:00
Linus Torvalds
8bda955776 New features:
- Support for server-side disconnect injection via debugfs
 - Protocol definitions for new RPC_AUTH_TLS authentication flavor
 
 Performance improvements:
 - Reduce page allocator traffic in the NFSD splice read actor
 - Reduce CPU utilization in svcrdma's Send completion handler
 
 Notable bug fixes:
 - Stabilize lockd operation when re-exporting NFS mounts
 - Fix the use of %.*s in NFSD tracepoints
 - Fix /proc/sys/fs/nfs/nsm_use_hostnames
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmEqq0AACgkQM2qzM29m
 f5dYig/5AaPN2BWYf4D1VkrAS3+zGS+3IN23WVgpbA54jgfjPEH+Aa00YhEQQa0j
 Y5u/jE5g/tWvenDefq5BmvdRfZMWCVc2JkngctOSflhaREUWK+HgCkH+5DQs6zUM
 rbX7qy0v6wJnEMSlwCKJ2AuZbYw7Bsg2nvOgEbb718/ent3umeoXEK09x3HTWLEp
 eVcMU5uicB5wRRPpROYG792oWzUScQ8kyiRCKJfQDoR7bINhBeVHObAIFMBo1UaH
 x9CMX4RlPYGmoMYUc+AqcOM7hizucHpXqM1r3oVjQ7FyI+pmDLuLL/3OTjtRUX7+
 nYLqNW/PijH9PjFe4BPjGHAUQfKiTIXANAe8VdjQj70D40jYkP+jQ9SPdV+pEgi4
 U4azfK3S+85/bRYYq/1alcLiP1+6dgcL++rVvnKESTH9NRgNoEw2WZHeKxXiYaxU
 p7oOC4XdnYDwcz/3QVWa0sK2kA5IJHzOsCQR7OilD09NAJ+AbJTAp0H3xFXTllzb
 AV2CAEBVZlP+pZYOehuVnKpZPa7YAWx92wRK2anbRUMZN3lF1wWBEOTd6KweIpTx
 l2GJSf3GWBqL1x9PjSet/cBusxYjTA+S1hE7KMrsNPhzbvpIgAZEtSqOfn9apDCV
 uAFIN2DSiHm3Tv0aFSJWo+CMyKkyktuiS8JFKaFdzCp9NtsBM2M=
 =TGkK
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd updates from Chuck Lever:
 "New features:

   - Support for server-side disconnect injection via debugfs

   - Protocol definitions for new RPC_AUTH_TLS authentication flavor

  Performance improvements:

   - Reduce page allocator traffic in the NFSD splice read actor

   - Reduce CPU utilization in svcrdma's Send completion handler

  Notable bug fixes:

   - Stabilize lockd operation when re-exporting NFS mounts

   - Fix the use of %.*s in NFSD tracepoints

   - Fix /proc/sys/fs/nfs/nsm_use_hostnames"

* tag 'nfsd-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (31 commits)
  nfsd: fix crash on LOCKT on reexported NFSv3
  nfs: don't allow reexport reclaims
  lockd: don't attempt blocking locks on nfs reexports
  nfs: don't atempt blocking locks on nfs reexports
  Keep read and write fds with each nlm_file
  lockd: update nlm_lookup_file reexport comment
  nlm: minor refactoring
  nlm: minor nlm_lookup_file argument change
  lockd: lockd server-side shouldn't set fl_ops
  SUNRPC: Add documentation for the fail_sunrpc/ directory
  SUNRPC: Server-side disconnect injection
  SUNRPC: Move client-side disconnect injection
  SUNRPC: Add a /sys/kernel/debug/fail_sunrpc/ directory
  svcrdma: xpt_bc_xprt is already clear in __svc_rdma_free()
  nfsd4: Fix forced-expiry locking
  rpc: fix gss_svc_init cleanup on failure
  SUNRPC: Add RPC_AUTH_TLS protocol numbers
  lockd: change the proc_handler for nsm_use_hostnames
  sysctl: introduce new proc handler proc_dobool
  SUNRPC: Fix a NULL pointer deref in trace_svc_stats_latency()
  ...
2021-08-31 10:57:06 -07:00
Trond Myklebust
8cfb901528 NFS: Always provide aligned buffers to the RPC read layers
Instead of messing around with XDR padding in the RDMA layer, we should
just give the RPC layer an aligned buffer. Try to avoid creating extra
RPC calls by aligning to the smaller value of ALIGN(len, rsize) and
PAGE_SIZE.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-08-30 13:21:38 -04:00
Olga Kornievskaia
2a7a451a90 NFSv4.1 add network transport when session trunking is detected
After trunking is discovered in nfs4_discover_server_trunking(),
add the transport to the old client structure if the allowed limit
of transports has not been reached.

An example: there exists a multi-homed server and client mounts
one server address and some volume and then doest another mount to
a different address of the same server and perhaps a different
volume. Previously, the client checks that this is a session
trunkable servers (same server), and removes the newly created
client structure along with its transport. Now, the client
adds the connection from the 2nd mount into the xprt switch of
the existing client (it leads to having 2 available connections).

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-08-27 16:37:41 -04:00
Olga Kornievskaia
dc48e0abee SUNRPC enforce creation of no more than max_connect xprts
If we are adding new transports via rpc_clnt_test_and_add_xprt()
then check if we've reached the limit. Currently only pnfs path
adds transports via that function but this is done in
preparation when the client would add new transports when
session trunking is detected. A warning is logged if the
limit is reached.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-08-27 16:37:29 -04:00
Olga Kornievskaia
7e134205f6 NFSv4 introduce max_connect mount options
This option will control up to how many xprts can the client
establish to the server with a distinct address (that means
nconnect connections are not counted towards this new limit).
This patch is setting up nfs structures to keeep track of the
max_connect limit (does not enforce it).

The default value is kept at 1 so that no current mounts that
don't want any additional connections would be effected. The
maximum value is set at 16.

Mounts to DS are not limited to default value of 1 but instead
set to the maximum default value of 16 (NFS_MAX_TRANSPORTS).

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-08-27 16:37:17 -04:00
Ye Bin
79d534f8cb NFSv3: Delete duplicate judgement in nfs3_async_handle_jukebox
As eb96d5c97b08 ("SUNRPC handle EKEYEXPIRED in call_refreshresult")
commit handle EKEYEXPIRED in call_refreshresult, so there is only handle
when "task->tk_status" is equal "-EJUKEBOX" in nfs3_async_handle_jukebox.

Signed-off-by: Ye Bin <yebin10@huawei.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-08-27 16:36:21 -04:00
J. Bruce Fields
bb0a55bb71 nfs: don't allow reexport reclaims
In the reexport case, nfsd is currently passing along locks with the
reclaim bit set.  The client sends a new lock request, which is granted
if there's currently no conflict--even if it's possible a conflicting
lock could have been briefly held in the interim.

We don't currently have any way to safely grant reclaim, so for now
let's just deny them all.

I'm doing this by passing the reclaim bit to nfs and letting it fail the
call, with the idea that eventually the client might be able to do
something more forgiving here.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-08-26 15:32:28 -04:00
J. Bruce Fields
f657f8eef3 nfs: don't atempt blocking locks on nfs reexports
NFS implements blocking locks by blocking inside its lock method.  In
the reexport case, this blocks the nfs server thread, which could lead
to deadlocks since an nfs server thread might be required to unlock the
conflicting lock.  It also causes a crash, since the nfs server thread
assumes it can free the lock when its lm_notify lock callback is called.

Ideal would be to make the nfs lock method return without blocking in
this case, but for now it works just not to attempt blocking locks.  The
difference is just that the original client will have to poll (as it
does in the v4.0 case) instead of getting a callback when the lock's
available.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-08-26 15:32:10 -04:00
Jeff Layton
f7e33bdbd6 fs: remove mandatory file locking support
We added CONFIG_MANDATORY_FILE_LOCKING in 2015, and soon after turned it
off in Fedora and RHEL8. Several other distros have followed suit.

I've heard of one problem in all that time: Someone migrated from an
older distro that supported "-o mand" to one that didn't, and the host
had a fstab entry with "mand" in it which broke on reboot. They didn't
actually _use_ mandatory locking so they just removed the mount option
and moved on.

This patch rips out mandatory locking support wholesale from the kernel,
along with the Kconfig option and the Documentation file. It also
changes the mount code to ignore the "mand" mount option instead of
erroring out, and to throw a big, ugly warning.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
2021-08-23 06:15:36 -04:00
Miklos Szeredi
0cad624662 vfs: add rcu argument to ->get_acl() callback
Add a rcu argument to the ->get_acl() callback to allow
get_cached_acl_rcu() to call the ->get_acl() method in the next patch.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-08-18 22:08:24 +02:00
Dai Ngo
ca7d1d1a0b NFSv4.2: remove restriction of copy size for inter-server copy.
Currently inter-server copy is allowed only if the copy size is larger
than (rsize*14) which is the over-head of the mount operation of the
source export. This patch, relying on the delayed unmount feature,
removes this restriction since the mount and unmount overhead is now
not applicable for every inter-server copy.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-08-10 14:18:35 -04:00