579 Commits

Author SHA1 Message Date
Chuck Lever
d0b28aadfd NFSD: Add nfsd4_encode_fattr4_size()
Refactor the encoder for FATTR4_SIZE into a helper. In a subsequent
patch, this helper will be called from a bitmask loop.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:14 -04:00
Chuck Lever
263453d9bb NFSD: Add nfsd4_encode_fattr4_change()
Refactor the encoder for FATTR4_CHANGE into a helper. In a
subsequent patch, this helper will be called from a bitmask loop.

The code is restructured a bit to use the modern xdr_stream flow,
and the encoded cinfo value is made const so that callers of the
encoders can be passed a const cinfo.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:14 -04:00
Chuck Lever
36ed7e6494 NFSD: Add nfsd4_encode_fattr4_fh_expire_type()
Refactor the encoder for FATTR4_FH_EXPIRE_TYPE into a helper. In a
subsequent patch, this helper will be called from a bitmask loop.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:13 -04:00
Chuck Lever
b06cf37545 NFSD: Add nfsd4_encode_fattr4_type()
Refactor the encoder for FATTR4_TYPE into a helper. In a subsequent
patch, this helper will be called from a bitmask loop.

In addition, restructure the code so that byte-swapping is done on
constant values rather than at run time. Run-time swapping can be
costly on some platforms, and "type" is a frequently-requested
attribute.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:13 -04:00
Chuck Lever
c9090e2733 NFSD: Add nfsd4_encode_fattr4_supported_attrs()
Refactor the encoder for FATTR4_SUPPORTED_ATTRS into a helper. In a
subsequent patch, this helper will be called from a bitmask loop.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:13 -04:00
Chuck Lever
8c4422881f NFSD: Add nfsd4_encode_fattr4__false()
Add an encoding helper that encodes a single boolean "false" value.
Attributes that always return "false" can use this helper.

In a subsequent patch, this helper will be called from a bitmask
loop, so it is given a standardized synopsis.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:12 -04:00
Chuck Lever
c88cb4727a NFSD: Add nfsd4_encode_fattr4__true()
Add an encoding helper that encodes a single boolean "true" value.
Attributes that always return "true" can use this helper.

In a subsequent patch, this helper will be called from a bitmask
loop, so it is given a standardized synopsis.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:12 -04:00
Chuck Lever
83ab8678ad NFSD: Add struct nfsd4_fattr_args
I'm about to split nfsd4_encode_fattr() into a number of smaller
functions. Instead of passing a large number of arguments to each of
the smaller functions, create a struct that can gather the common
argument variables into something with a convenient handle on it.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:12 -04:00
Chuck Lever
c3dcb45bcd NFSD: Clean up nfsd4_encode_setattr()
De-duplicate the encoding of bitmap4 results in
nfsd4_encode_setattr().

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:11 -04:00
Chuck Lever
e64301f51b NFSD: Rename nfsd4_encode_bitmap()
For alignment with the specification, the name of NFSD's encoder
function should match the name of the XDR type.

I've also replaced a few "naked integers" with symbolic constants
that better reflect the usage of these values.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:11 -04:00
Dai Ngo
6c41d9a9bd NFSD: handle GETATTR conflict with write delegation
If the GETATTR request on a file that has write delegation in effect
and the request attributes include the change info and size attribute
then the request is handled as below:

Server sends CB_GETATTR to client to get the latest change info and file
size. If these values are the same as the server's cached values then
the GETATTR proceeds as normal.

If either the change info or file size is different from the server's
cached values, or the file was already marked as modified, then:

    . update time_modify and time_metadata into file's metadata
      with current time

    . encode GETATTR as normal except the file size is encoded with
      the value returned from CB_GETATTR

    . mark the file as modified

If the CB_GETATTR fails for any reasons, the delegation is recalled
and NFS4ERR_DELAY is returned for the GETATTR.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:09 -04:00
Chuck Lever
0d32a6bbb8 NFSD: Fix zero NFSv4 READ results when RQ_SPLICE_OK is not set
nfsd4_encode_readv() uses xdr->buf->page_len as a starting point for
the nfsd_iter_read() sink buffer -- page_len is going to be offset
by the parts of the COMPOUND that have already been encoded into
xdr->buf->pages.

However, that value must be captured /before/
xdr_reserve_space_vec() advances page_len by the expected size of
the read payload. Otherwise, the whole front part of the first
page of the payload in the reply will be uninitialized.

Mantas hit this because sec=krb5i forces RQ_SPLICE_OK off, which
invokes the readv part of the nfsd4_encode_read() path. Also,
older Linux NFS clients appear to send shorter READ requests
for files smaller than a page, whereas newer clients just send
page-sized requests and let the server send as many bytes as
are in the file.

Reported-by: Mantas Mikulėnas <grawity@gmail.com>
Closes: https://lore.kernel.org/linux-nfs/f1d0b234-e650-0f6e-0f5d-126b3d51d1eb@gmail.com/
Fixes: 703d75215555 ("NFSD: Hoist rq_vec preparation into nfsd_read() [step two]")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-09-28 10:34:28 -04:00
Chuck Lever
6372e2ee62 NFSD: da_addr_body field missing in some GETDEVICEINFO replies
The XDR specification in RFC 8881 looks like this:

struct device_addr4 {
	layouttype4	da_layout_type;
	opaque		da_addr_body<>;
};

struct GETDEVICEINFO4resok {
	device_addr4	gdir_device_addr;
	bitmap4		gdir_notification;
};

union GETDEVICEINFO4res switch (nfsstat4 gdir_status) {
case NFS4_OK:
	GETDEVICEINFO4resok gdir_resok4;
case NFS4ERR_TOOSMALL:
	count4		gdir_mincount;
default:
	void;
};

Looking at nfsd4_encode_getdeviceinfo() ....

When the client provides a zero gd_maxcount, then the Linux NFS
server implementation encodes the da_layout_type field and then
skips the da_addr_body field completely, proceeding directly to
encode gdir_notification field.

There does not appear to be an option in the specification to skip
encoding da_addr_body. Moreover, Section 18.40.3 says:

> If the client wants to just update or turn off notifications, it
> MAY send a GETDEVICEINFO operation with gdia_maxcount set to zero.
> In that event, if the device ID is valid, the reply's da_addr_body
> field of the gdir_device_addr field will be of zero length.

Since the layout drivers are responsible for encoding the
da_addr_body field, put this fix inside the ->encode_getdeviceinfo
methods.

Fixes: 9cf514ccfacb ("nfsd: implement pNFS operations")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Tom Haynes <loghyr@gmail.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-08-29 17:45:22 -04:00
Chuck Lever
50bce06f0e NFSD: Report zero space limit for write delegations
Replace the -1 (no limit) with a zero (no reserved space).

This prevents certain non-determinant client behavior, such as
silly-renaming a file when the only open reference is a write
delegation. Such a rename can leave unexpected .nfs files in a
directory that is otherwise supposed to be empty.

Note that other server implementations that support write delegation
also set this field to zero.

Suggested-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-08-29 17:45:22 -04:00
Dai Ngo
fd19ca36fd NFSD: handle GETATTR conflict with write delegation
If the GETATTR request on a file that has write delegation in effect and
the request attributes include the change info and size attribute then
the write delegation is recalled. If the delegation is returned within
30ms then the GETATTR is serviced as normal otherwise the NFS4ERR_DELAY
error is returned for the GETATTR.

Add counter for write delegation recall due to conflict GETATTR. This is
used to evaluate the need to implement CB_GETATTR to adoid recalling the
delegation with conflit GETATTR.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-08-29 17:45:22 -04:00
Tavian Barnes
d7dbed457c nfsd: Fix creation time serialization order
In nfsd4_encode_fattr(), TIME_CREATE was being written out after all
other times.  However, they should be written out in an order that
matches the bit flags in bmval1, which in this case are

    #define FATTR4_WORD1_TIME_ACCESS        (1UL << 15)
    #define FATTR4_WORD1_TIME_CREATE        (1UL << 18)
    #define FATTR4_WORD1_TIME_DELTA         (1UL << 19)
    #define FATTR4_WORD1_TIME_METADATA      (1UL << 20)
    #define FATTR4_WORD1_TIME_MODIFY        (1UL << 21)

so TIME_CREATE should come second.

I noticed this on a FreeBSD NFSv4.2 client, which supports creation
times.  On this client, file times were weirdly permuted.  With this
patch applied on the server, times looked normal on the client.

Fixes: e377a3e698fb ("nfsd: Add support for the birth time attribute")
Link: https://unix.stackexchange.com/q/749605/56202
Signed-off-by: Tavian Barnes <tavianator@tavianator.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-27 12:10:47 -04:00
Chuck Lever
262176798b NFSD: Add an nfsd4_encode_nfstime4() helper
Clean up: de-duplicate some common code.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Tom Talpey <tom@talpey.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-17 13:18:06 -04:00
Dai Ngo
58f5d89400 NFSD: add encoding of op_recall flag for write delegation
Modified nfsd4_encode_open to encode the op_recall flag properly
for OPEN result with write delegation granted.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: stable@vger.kernel.org
2023-06-12 12:16:35 -04:00
Chuck Lever
703d752155 NFSD: Hoist rq_vec preparation into nfsd_read() [step two]
Now that the preparation of an rq_vec has been removed from the
generic read path, nfsd_splice_read() no longer needs to reset
rq_next_page.

nfsd4_encode_read() calls nfsd_splice_read() directly. As far as I
can ascertain, resetting rq_next_page for NFSv4 splice reads is
unnecessary because rq_next_page is already set correctly.

Moreover, resetting it might even be incorrect if previous
operations in the COMPOUND have already consumed at least a page of
the send buffer. I would expect that the result would be encoding
the READ payload over previously-encoded results.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-11 16:37:46 -04:00
Chuck Lever
ed4a567a17 NFSD: Update rq_next_page between COMPOUND operations
A GETATTR with a large result can advance xdr->page_ptr without
updating rq_next_page. If a splice READ follows that GETATTR in the
COMPOUND, nfsd_splice_actor can start splicing at the wrong page.

I've also seen READLINK and READDIR leave rq_next_page in an
unmodified state.

There are potentially a myriad of combinations like this, so play it
safe: move the rq_next_page update to nfsd4_encode_operation.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-11 16:37:45 -04:00
Chuck Lever
ba21e20b30 NFSD: Use svcxdr_encode_opaque_pages() in nfsd4_encode_splice_read()
Commit 15b23ef5d348 ("nfsd4: fix corruption of NFSv4 read data")
encountered exactly the same issue: after a splice read, a
filesystem-owned page is left in rq_pages[]; the symptoms are the
same as described there.

If the computed number of pages in nfsd4_encode_splice_read() is not
exactly the same as the actual number of pages that were consumed by
nfsd_splice_actor() (say, because of a bug) then hilarity ensues.

Instead of recomputing the page offset based on the size of the
payload, use rq_next_page, which is already properly updated by
nfsd_splice_actor(), to cause svc_rqst_release_pages() to operate
correctly in every instance.

This is a defensive change since we believe that after commit
27c934dd8832 ("nfsd: don't replace page in rq_pages if it's a
continuation of last page") has been applied, there are no known
opportunities for nfsd_splice_actor() to screw up. So I'm not
marking it for stable backport.

Reported-by: Andy Zlotek <andy.zlotek@oracle.com>
Suggested-by: Calum Mackay <calum.mackay@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-11 16:37:40 -04:00
Chuck Lever
66a21db7db NFSD: Replace encode_cinfo()
De-duplicate "reserve_space; encode_cinfo".

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-05 09:01:44 -04:00
Chuck Lever
adaa7a50d0 NFSD: Add encoders for NFSv4 clientids and verifiers
Deduplicate some common code.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-05 09:01:44 -04:00
Linus Torvalds
7bcff5a396 v6.4/vfs.acl
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZEEhwgAKCRCRxhvAZXjc
 otwgAQDXHnKiPm/d76lITXbxdUNCtvZz+ig26EbOrD+vEszzIQEA81dru0QbCNCt
 ctoZdcsmtKbt2VaYQF1CDOhlnNg5VQM=
 =pER1
 -----END PGP SIGNATURE-----

Merge tag 'v6.4/vfs.acl' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull acl updates from Christian Brauner:
 "After finishing the introduction of the new posix acl api last cycle
  the generic POSIX ACL xattr handlers are still around in the
  filesystems xattr handlers for two reasons:

   (1) Because a few filesystems rely on the ->list() method of the
       generic POSIX ACL xattr handlers in their ->listxattr() inode
       operation.

   (2) POSIX ACLs are only available if IOP_XATTR is raised. The
       IOP_XATTR flag is raised in inode_init_always() based on whether
       the sb->s_xattr pointer is non-NULL. IOW, the registered xattr
       handlers of the filesystem are used to raise IOP_XATTR. Removing
       the generic POSIX ACL xattr handlers from all filesystems would
       risk regressing filesystems that only implement POSIX ACL support
       and no other xattrs (nfs3 comes to mind).

  This contains the work to decouple POSIX ACLs from the IOP_XATTR flag
  as they don't depend on xattr handlers anymore. So it's now possible
  to remove the generic POSIX ACL xattr handlers from the sb->s_xattr
  list of all filesystems. This is a crucial step as the generic POSIX
  ACL xattr handlers aren't used for POSIX ACLs anymore and POSIX ACLs
  don't depend on the xattr infrastructure anymore.

  Adressing problem (1) will require more long-term work. It would be
  best to get rid of the ->list() method of xattr handlers completely at
  some point.

  For erofs, ext{2,4}, f2fs, jffs2, ocfs2, and reiserfs the nop POSIX
  ACL xattr handler is kept around so they can continue to use
  array-based xattr handler indexing.

  This update does simplify the ->listxattr() implementation of all
  these filesystems however"

* tag 'v6.4/vfs.acl' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  acl: don't depend on IOP_XATTR
  ovl: check for ->listxattr() support
  reiserfs: rework priv inode handling
  fs: rename generic posix acl handlers
  reiserfs: rework ->listxattr() implementation
  fs: simplify ->listxattr() implementation
  fs: drop unused posix acl handlers
  xattr: remove unused argument
  xattr: add listxattr helper
  xattr: simplify listxattr helpers
2023-04-24 13:35:23 -07:00
Linus Torvalds
ceeea1b782 nfsd-6.3 fixes:
- Fix a crash and a resource leak in NFSv4 COMPOUND processing
 - Fix issues with AUTH_SYS credential handling
 - Try again to address an NFS/NFSD/SUNRPC build dependency regression
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmQsLHMACgkQM2qzM29m
 f5eNAA/8DokMQLQ+gN8zhQNqmw92sUdW3m41o0/DfETVXyE60sX8uOE7PktSGwfz
 fWMMiQpvmnmw/lbO84XQ9i8E0hjh8cT26l1CJum4VgSFiBJJvIqNxb0Yro43R4Jc
 1wU2AOpC9qzCokdHhHKszDXuOgsz3v5OMJSQz3mG50dlq8/+6KKrCnK6jakyrvxr
 vKcVMsoENhxh2MnJfbsIQ70UM/rF6dmZWzGuBJH51Fkt+0FD9cnxXZKCkv1+D8JN
 5Hr+8rv4I/VqBGDzv9QHoEowZr70e8UUi2UME/jwSArhdxwsfPEqV/qWwHq9Q133
 RW40Gco7/E3JUjpAVTRXVGSB+LwU1EvhWQ9qSpSx5D2CPAHJ9hsOw4I54+Q0vD+j
 2druOpqIITZOvI3K54ZJXa2LK6SpZ8NnncP5YkLWOwR0Wqohy1U8Sm5uOiMs+IJa
 neTxL7f+u3MDQgaDCTuBkI4oKzSDF/ZiMTWh52iPyy9x03SRYXbW6UgqDiySIg0P
 jvvaDFCvKvvL2qEmksMoQbWxSjVj8PqL+qJIxQNIZwHbows6paL+l0rdSPXc+l2O
 97GBlqNPfHt+AjfJvGDscaIcLA+gu+ErzwxG6BLKvB9QcX9/F3A62Nh3txpe5Q1r
 M5NyQwK3vVQcTejMqw34sBqp3EeI5iIJ9CjD/2tN+dUpeHyQld8=
 =bk6S
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.3-5' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:

 - Fix a crash and a resource leak in NFSv4 COMPOUND processing

 - Fix issues with AUTH_SYS credential handling

 - Try again to address an NFS/NFSD/SUNRPC build dependency regression

* tag 'nfsd-6.3-5' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  NFSD: callback request does not use correct credential for AUTH_SYS
  NFS: Remove "select RPCSEC_GSS_KRB5
  sunrpc: only free unix grouplist after RCU settles
  nfsd: call op_release, even when op_func returns an error
  NFSD: Avoid calling OPDESC() with ops->opnum == OP_ILLEGAL
2023-04-04 11:20:55 -07:00
Jeff Layton
15a8b55dbb nfsd: call op_release, even when op_func returns an error
For ops with "trivial" replies, nfsd4_encode_operation will shortcut
most of the encoding work and skip to just marshalling up the status.
One of the things it skips is calling op_release. This could cause a
memory leak in the layoutget codepath if there is an error at an
inopportune time.

Have the compound processing engine always call op_release, even when
op_func sets an error in op->status. With this change, we also need
nfsd4_block_get_device_info_scsi to set the gd_device pointer to NULL
on error to avoid a double free.

Reported-by: Zhi Li <yieli@redhat.com>
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2181403
Fixes: 34b1744c91cc ("nfsd4: define ->op_release for compound ops")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-03-31 17:29:49 -04:00
Chuck Lever
804d8e0a6e NFSD: Avoid calling OPDESC() with ops->opnum == OP_ILLEGAL
OPDESC() simply indexes into nfsd4_ops[] by the op's operation
number, without range checking that value. It assumes callers are
careful to avoid calling it with an out-of-bounds opnum value.

nfsd4_decode_compound() is not so careful, and can invoke OPDESC()
with opnum set to OP_ILLEGAL, which is 10044 -- well beyond the end
of nfsd4_ops[].

Reported-by: Jeff Layton <jlayton@kernel.org>
Fixes: f4f9ef4a1b0a ("nfsd4: opdesc will be useful outside nfs4proc.c")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-03-31 17:28:49 -04:00
Christian Brauner
831be973aa
xattr: remove unused argument
his helpers is really just used to check for user.* xattr support so
don't make it pointlessly generic.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-03-06 09:57:11 +01:00
Jeff Layton
638e3e7d94 nfsd: use the getattr operation to fetch i_version
Now that we can call into vfs_getattr to get the i_version field, use
that facility to fetch it instead of doing it in nfsd4_change_attribute.

Neil also pointed out recently that IS_I_VERSION directory operations
are always logged, and so we only need to mitigate the rollback problem
on regular files. Also, we don't need to factor in the ctime when
reexporting NFS or Ceph.

Set the STATX_CHANGE_COOKIE (and BTIME) bits in the request when we're
dealing with a v4 request. Then, instead of looking at IS_I_VERSION when
generating the change attr, look at the result mask and only use it if
STATX_CHANGE_COOKIE is set.

Change nfsd4_change_attribute to only factor in the ctime if it's a
regular file and the fs doesn't advertise STATX_ATTR_CHANGE_MONOTONIC.

Acked-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
2023-01-26 07:00:06 -05:00
Chuck Lever
7827c81f02 Revert "SUNRPC: Use RMW bitops in single-threaded hot paths"
The premise that "Once an svc thread is scheduled and executing an
RPC, no other processes will touch svc_rqst::rq_flags" is false.
svc_xprt_enqueue() examines the RQ_BUSY flag in scheduled nfsd
threads when determining which thread to wake up next.

Found via KCSAN.

Fixes: 28df0988815f ("SUNRPC: Use RMW bitops in single-threaded hot paths")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-01-06 13:17:12 -05:00
Jeff Layton
cad853374d nfsd: fix handling of readdir in v4root vs. mount upcall timeout
If v4 READDIR operation hits a mountpoint and gets back an error,
then it will include that entry in the reply and set RDATTR_ERROR for it
to the error.

That's fine for "normal" exported filesystems, but on the v4root, we
need to be more careful to only expose the existence of dentries that
lead to exports.

If the mountd upcall times out while checking to see whether a
mountpoint on the v4root is exported, then we have no recourse other
than to fail the whole operation.

Cc: Steve Dickson <steved@redhat.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216777
Reported-by: JianHong Yin <yin-jianhong@163.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: <stable@vger.kernel.org>
2023-01-02 10:45:31 -05:00
Kees Cook
e78e274eb2 NFSD: Avoid clashing function prototypes
When built with Control Flow Integrity, function prototypes between
caller and function declaration must match. These mismatches are visible
at compile time with the new -Wcast-function-type-strict in Clang[1].

There were 97 warnings produced by NFS. For example:

fs/nfsd/nfs4xdr.c:2228:17: warning: cast from '__be32 (*)(struct nfsd4_compoundargs *, struct nfsd4_access *)' (aka 'unsigned int (*)(struct nfsd4_compoundargs *, struct nfsd4_access *)') to 'nfsd4_dec' (aka 'unsigned int (*)(struct nfsd4_compoundargs *, void *)') converts to incompatible function type [-Wcast-function-type-strict]
        [OP_ACCESS]             = (nfsd4_dec)nfsd4_decode_access,
                                  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The enc/dec callbacks were defined as passing "void *" as the second
argument, but were being implicitly cast to a new type. Replace the
argument with union nfsd4_op_u, and perform explicit member selection
in the function body. There are no resulting binary differences.

Changes were made mechanically using the following Coccinelle script,
with minor by-hand fixes for members that didn't already match their
existing argument name:

@find@
identifier func;
type T, opsT;
identifier ops, N;
@@

 opsT ops[] = {
	[N] = (T) func,
 };

@already_void@
identifier find.func;
identifier name;
@@

 func(...,
-void
+union nfsd4_op_u
 *name)
 {
	...
 }

@proto depends on !already_void@
identifier find.func;
type T;
identifier name;
position p;
@@

 func@p(...,
 	T name
 ) {
	...
   }

@script:python get_member@
type_name << proto.T;
member;
@@

coccinelle.member = cocci.make_ident(type_name.split("_", 1)[1].split(' ',1)[0])

@convert@
identifier find.func;
type proto.T;
identifier proto.name;
position proto.p;
identifier get_member.member;
@@

 func@p(...,
-	T name
+	union nfsd4_op_u *u
 ) {
+	T name = &u->member;
	...
   }

@cast@
identifier find.func;
type T, opsT;
identifier ops, N;
@@

 opsT ops[] = {
	[N] =
-	(T)
	func,
 };

Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
Cc: linux-nfs@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-12-10 11:01:13 -05:00
Anna Schumaker
eeadcb7579 NFSD: Simplify READ_PLUS
Chuck had suggested reverting READ_PLUS so it returns a single DATA
segment covering the requested read range. This prepares the server for
a future "sparse read" function so support can easily be added without
needing to rip out the old READ_PLUS code at the same time.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-11-28 12:54:43 -05:00
Chuck Lever
9993a66317 NFSD: Clean up nfs4svc_encode_compoundres()
In today's Linux NFS server implementation, the NFS dispatcher
initializes each XDR result stream, and the NFSv4 .pc_func and
.pc_encode methods all use xdr_stream-based encoding. This keeps
rq_res.len automatically updated. There is no longer a need for
the WARN_ON_ONCE() check in nfs4svc_encode_compoundres().

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-09-26 14:02:48 -04:00
Chuck Lever
3fdc546462 NFSD: Reduce amount of struct nfsd4_compoundargs that needs clearing
Have SunRPC clear everything except for the iops array. Then have
each NFSv4 XDR decoder clear it's own argument before decoding.

Now individual operations may have a large argument struct while not
penalizing the vast majority of operations with a small struct.

And, clearing the argument structure occurs as the argument fields
are initialized, enabling the CPU to do write combining on that
memory. In some cases, clearing is not even necessary because all
of the fields in the argument structure are initialized by the
decoder.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-09-26 14:02:42 -04:00
Anna Schumaker
06981d5606 NFSD: Return nfserr_serverfault if splice_ok but buf->pages have data
This was discussed with Chuck as part of this patch set. Returning
nfserr_resource was decided to not be the best error message here, and
he suggested changing to nfserr_serverfault instead.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Link: https://lore.kernel.org/linux-nfs/20220907195259.926736-1-anna@kernel.org/T/#t
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-09-26 14:02:41 -04:00
Jeff Layton
6106d9119b nfsd: clean up mounted_on_fileid handling
We only need the inode number for this, not a full rack of attributes.
Rename this function make it take a pointer to a u64 instead of
struct kstat, and change it to just request STATX_INO.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
[ cel: renamed get_mounted_on_ino() ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-09-26 14:02:30 -04:00
Chuck Lever
7518a3dc5e NFSD: Fix handling of oversized NFSv4 COMPOUND requests
If an NFS server returns NFS4ERR_RESOURCE on the first operation in
an NFSv4 COMPOUND, there's no way for a client to know where the
problem is and then simplify the compound to make forward progress.

So instead, make NFSD process as many operations in an oversized
COMPOUND as it can and then return NFS4ERR_RESOURCE on the first
operation it did not process.

pynfs NFSv4.0 COMP6 exercises this case, but checks only for the
COMPOUND status code, not whether the server has processed any
of the operations.

pynfs NFSv4.1 SEQ6 and SEQ7 exercise the NFSv4.1 case, which detects
too many operations per COMPOUND by checking against the limits
negotiated when the session was created.

Suggested-by: Bruce Fields <bfields@fieldses.org>
Fixes: 0078117c6d91 ("nfsd: return RESOURCE not GARBAGE_ARGS on too many ops")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-09-26 14:02:30 -04:00
Chuck Lever
80e591ce63 NFSD: Increase NFSD_MAX_OPS_PER_COMPOUND
When attempting an NFSv4 mount, a Solaris NFSv4 client builds a
single large COMPOUND that chains a series of LOOKUPs to get to the
pseudo filesystem root directory that is to be mounted. The Linux
NFS server's current maximum of 16 operations per NFSv4 COMPOUND is
not large enough to ensure that this works for paths that are more
than a few components deep.

Since NFSD_MAX_OPS_PER_COMPOUND is mostly a sanity check, and most
NFSv4 COMPOUNDS are between 3 and 6 operations (thus they do not
trigger any re-allocation of the operation array on the server),
increasing this maximum should result in little to no impact.

The ops array can get large now, so allocate it via vmalloc() to
help ensure memory fragmentation won't cause an allocation failure.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=216383
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-09-26 14:02:25 -04:00
Chuck Lever
1913cdf56c NFSD: Replace boolean fields in struct nfsd4_copy
Clean up: saves 8 bytes, and we can replace check_and_set_stop_copy()
with an atomic bitop.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:16:59 -04:00
Chuck Lever
87689df694 NFSD: Shrink size of struct nfsd4_copy
struct nfsd4_copy is part of struct nfsd4_op, which resides in an
8-element array.

sizeof(struct nfsd4_op):
Before: /* size: 1696, cachelines: 27, members: 5 */
After:  /* size: 672, cachelines: 11, members: 5 */

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:16:58 -04:00
Chuck Lever
09426ef2a6 NFSD: Shrink size of struct nfsd4_copy_notify
struct nfsd4_copy_notify is part of struct nfsd4_op, which resides
in an 8-element array.

sizeof(struct nfsd4_op):
Before: /* size: 2208, cachelines: 35, members: 5 */
After:  /* size: 1696, cachelines: 27, members: 5 */

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:16:58 -04:00
Chuck Lever
bb4d842722 NFSD: nfserrno(-ENOMEM) is nfserr_jukebox
Suggested-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:16:58 -04:00
Chuck Lever
99b002a1fa NFSD: Clean up nfsd4_encode_readlink()
Similar changes to nfsd4_encode_readv(), all bundled into a single
patch.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:16:58 -04:00
Chuck Lever
5e64d85c7d NFSD: Use xdr_pad_size()
Clean up: Use a helper instead of open-coding the calculation of
the XDR pad size.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:16:57 -04:00
Chuck Lever
071ae99fea NFSD: Simplify starting_len
Clean-up: Now that nfsd4_encode_readv() does not have to encode the
EOF or rd_length values, it no longer needs to subtract 8 from
@starting_len.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:16:57 -04:00
Chuck Lever
28d5bc468e NFSD: Optimize nfsd4_encode_readv()
write_bytes_to_xdr_buf() is pretty expensive to use for inserting
an XDR data item that is always 1 XDR_UNIT at an address that is
always XDR word-aligned.

Since both the readv and splice read paths encode EOF and maxcount
values, move both to a common code path.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:16:57 -04:00
Chuck Lever
24c7fb8549 NFSD: Add an nfsd4_read::rd_eof field
Refactor: Make the EOF result available in the entire NFSv4 READ
path.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:16:57 -04:00
Chuck Lever
c738b218a2 NFSD: Clean up SPLICE_OK in nfsd4_encode_read()
Do the test_bit() once -- this reduces the number of locked-bus
operations and makes the function a little easier to read.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:16:57 -04:00
Chuck Lever
ab04de60ae NFSD: Optimize nfsd4_encode_fattr()
write_bytes_to_xdr_buf() is a generic way to place a variable-length
data item in an already-reserved spot in the encoding buffer.

However, it is costly. In nfsd4_encode_fattr(), it is unnecessary
because the data item is fixed in size and the buffer destination
address is always word-aligned.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:16:57 -04:00