License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information in it,
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information.
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier should be applied
to a file was done in a spreadsheet of side by side results from the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging were:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier                            # files
---------------------------------------------------|-------
GPL-2.0                                              11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier                            # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note                        930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier                               # files
------------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note                          270
GPL-2.0+ WITH Linux-syscall-note                         169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)       21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)       17
LGPL-2.1+ WITH Linux-syscall-note                         15
GPL-1.0+ WITH Linux-syscall-note                          14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)       5
LGPL-2.0+ WITH Linux-syscall-note                          4
LGPL-2.1 WITH Linux-syscall-note                           3
((GPL-2.0 WITH Linux-syscall-note) OR MIT)                 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT)                1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there were new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In the initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
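
The header-versus-source distinction mentioned above can be sketched as follows. This is only an illustration, not the script Thomas and Greg used: the function name sketch_spdx_line() is hypothetical, and the two comment styles shown (a // comment for .c files, a /* */ comment for .h files) are the ones the kernel's license rules document describes.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format the SPDX tag line for a file, picking the
 * comment style from the file extension. Returns 0 on success, or -1 if
 * the extension is not handled or the buffer is too small.
 */
int sketch_spdx_line(const char *filename, const char *license,
		     char *out, size_t outlen)
{
	const char *ext = strrchr(filename, '.');
	int n;

	if (!ext)
		return -1;

	if (strcmp(ext, ".c") == 0)
		n = snprintf(out, outlen,
			     "// SPDX-License-Identifier: %s", license);
	else if (strcmp(ext, ".h") == 0)
		n = snprintf(out, outlen,
			     "/* SPDX-License-Identifier: %s */", license);
	else
		return -1;

	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Other file types (Makefiles, assembly, scripts) need their own comment syntax; this sketch simply rejects them rather than guessing.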
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-01 17:07:57 +03:00
// SPDX-License-Identifier: GPL-2.0
2005-04-17 02:20:36 +04:00
/*
 * File operations used by nfsd. Some of these have been ripped from
 * other parts of the kernel because they weren't exported, others
 * are partial duplicates with added or changed functionality.
 *
 * Note that several functions dget() the dentry upon which they want
 * to act, most notably those that create directory entries. Response
 * dentry's are dput()'d if necessary in the release callback.
 * So if you notice code paths that apparently fail to dput() the
 * dentry, don't worry--they have been taken care of.
 *
 * Copyright (C) 1995-1999 Olaf Kirch <okir@monad.swb.de>
 * Zerocopy NFS support (C) 2002 Hirokazu Takahashi <taka@valinux.co.jp>
 */
#include <linux/fs.h>
#include <linux/file.h>
2007-06-04 11:59:47 +04:00
#include <linux/splice.h>
2014-11-07 22:44:26 +03:00
#include <linux/falloc.h>
2005-04-17 02:20:36 +04:00
#include <linux/fcntl.h>
#include <linux/namei.h>
#include <linux/delay.h>
[PATCH] inotify
inotify is intended to correct the deficiencies of dnotify, particularly
its inability to scale and its terrible user interface:
* dnotify requires the opening of one fd per each directory
that you intend to watch. This quickly results in too many
open files and pins removable media, preventing unmount.
* dnotify is directory-based. You only learn about changes to
directories. Sure, a change to a file in a directory affects
the directory, but you are then forced to keep a cache of
stat structures.
* dnotify's interface to user-space is awful. Signals?
inotify provides a more usable, simple, powerful solution to file change
notification:
* inotify's interface is a system call that returns a fd, not SIGIO.
You get a single fd, which is select()-able.
* inotify has an event that says "the filesystem that the item
you were watching is on was unmounted."
* inotify can watch directories or files.
Inotify is currently used by Beagle (a desktop search infrastructure),
Gamin (a FAM replacement), and other projects.
See Documentation/filesystems/inotify.txt.
Signed-off-by: Robert Love <rml@novell.com>
Cc: John McCutchan <ttb@tentacle.dhs.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-13 01:06:03 +04:00
#include <linux/fsnotify.h>
2005-04-17 02:20:36 +04:00
#include <linux/posix_acl_xattr.h>
#include <linux/xattr.h>
2009-12-03 21:30:56 +03:00
#include <linux/jhash.h>
2022-02-14 01:23:58 +03:00
#include <linux/pagemap.h>
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities to include those
headers directly instead of assuming availability. As this conversion
needs to touch a large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the following.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and tries to put the new include such that its order conforms
to its surroundings. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
widely available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build tests were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
7, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-24 11:04:11 +03:00
#include <linux/slab.h>
2016-12-24 22:46:01 +03:00
#include <linux/uaccess.h>
2010-02-17 23:05:11 +03:00
#include <linux/exportfs.h>
#include <linux/writeback.h>
2013-05-02 21:19:10 +04:00
#include <linux/security.h>
2009-12-03 21:30:56 +03:00
#include "xdr3.h"
2006-01-10 07:51:55 +03:00
#ifdef CONFIG_NFSD_V4
2015-12-03 14:59:52 +03:00
#include "../internal.h"
2011-01-05 01:37:15 +03:00
#include "acl.h"
#include "idmap.h"
2021-12-19 04:38:00 +03:00
#include "xdr4.h"
2005-04-17 02:20:36 +04:00
#endif /* CONFIG_NFSD_V4 */
2009-12-03 21:30:56 +03:00
#include "nfsd.h"
#include "vfs.h"
2019-08-18 21:18:49 +03:00
#include "filecache.h"
2015-11-17 14:52:23 +03:00
#include "trace.h"
2005-04-17 02:20:36 +04:00
#define NFSDDBG_FACILITY NFSDDBG_FILEOP
2022-10-18 14:47:55 +03:00
/**
 * nfserrno - Map Linux errnos to NFS errnos
 * @errno: POSIX(-ish) error code to be mapped
 *
 * Returns the appropriate (net-endian) nfserr_* (or nfs_ok if errno is 0). If
 * it's an error we don't expect, log it once and return nfserr_io.
 */
__be32
nfserrno(int errno)
{
	static struct {
		__be32 nfserr;
		int syserr;
	} nfs_errtbl[] = {
		{ nfs_ok, 0 },
		{ nfserr_perm, -EPERM },
		{ nfserr_noent, -ENOENT },
		{ nfserr_io, -EIO },
		{ nfserr_nxio, -ENXIO },
		{ nfserr_fbig, -E2BIG },
		{ nfserr_stale, -EBADF },
		{ nfserr_acces, -EACCES },
		{ nfserr_exist, -EEXIST },
		{ nfserr_xdev, -EXDEV },
		{ nfserr_mlink, -EMLINK },
		{ nfserr_nodev, -ENODEV },
		{ nfserr_notdir, -ENOTDIR },
		{ nfserr_isdir, -EISDIR },
		{ nfserr_inval, -EINVAL },
		{ nfserr_fbig, -EFBIG },
		{ nfserr_nospc, -ENOSPC },
		{ nfserr_rofs, -EROFS },
		{ nfserr_mlink, -EMLINK },
		{ nfserr_nametoolong, -ENAMETOOLONG },
		{ nfserr_notempty, -ENOTEMPTY },
		{ nfserr_dquot, -EDQUOT },
		{ nfserr_stale, -ESTALE },
		{ nfserr_jukebox, -ETIMEDOUT },
		{ nfserr_jukebox, -ERESTARTSYS },
		{ nfserr_jukebox, -EAGAIN },
		{ nfserr_jukebox, -EWOULDBLOCK },
		{ nfserr_jukebox, -ENOMEM },
		{ nfserr_io, -ETXTBSY },
		{ nfserr_notsupp, -EOPNOTSUPP },
		{ nfserr_toosmall, -ETOOSMALL },
		{ nfserr_serverfault, -ESERVERFAULT },
		{ nfserr_serverfault, -ENFILE },
		{ nfserr_io, -EREMOTEIO },
		{ nfserr_stale, -EOPENSTALE },
		{ nfserr_io, -EUCLEAN },
		{ nfserr_perm, -ENOKEY },
		{ nfserr_no_grace, -ENOGRACE },
	};
	int i;

	for (i = 0; i < ARRAY_SIZE(nfs_errtbl); i++) {
		if (nfs_errtbl[i].syserr == errno)
			return nfs_errtbl[i].nfserr;
	}
	WARN_ONCE(1, "nfsd: non-standard errno: %d\n", errno);
	return nfserr_io;
}
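
The table-driven lookup in nfserrno() can be tried outside the kernel with a cut-down userspace sketch. Everything prefixed sketch_ or SKETCH_ below is a hypothetical stand-in: the status values are plain host-endian integers rather than __be32, only a handful of table rows are reproduced, and the WARN_ONCE() on an unknown errno is left out.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical host-endian stand-ins for a few nfserr_* values. */
enum sketch_nfsstat {
	SKETCH_NFS_OK = 0,
	SKETCH_NFSERR_PERM = 1,
	SKETCH_NFSERR_NOENT = 2,
	SKETCH_NFSERR_IO = 5,
	SKETCH_NFSERR_STALE = 70,
	SKETCH_NFSERR_JUKEBOX = 10008,
};

#define SKETCH_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/*
 * Map a negative errno to an NFS status by a linear scan of a small
 * table. Anything not listed falls back to the I/O error status, as
 * the kernel function does.
 */
enum sketch_nfsstat sketch_nfserrno(int err)
{
	static const struct {
		enum sketch_nfsstat nfserr;
		int syserr;
	} tbl[] = {
		{ SKETCH_NFS_OK, 0 },
		{ SKETCH_NFSERR_PERM, -EPERM },
		{ SKETCH_NFSERR_NOENT, -ENOENT },
		{ SKETCH_NFSERR_IO, -EIO },
		{ SKETCH_NFSERR_STALE, -ESTALE },
		{ SKETCH_NFSERR_JUKEBOX, -EAGAIN },
		{ SKETCH_NFSERR_JUKEBOX, -ENOMEM },
	};
	size_t i;

	for (i = 0; i < SKETCH_ARRAY_SIZE(tbl); i++) {
		if (tbl[i].syserr == err)
			return tbl[i].nfserr;
	}
	return SKETCH_NFSERR_IO;
}
```

A linear scan is adequate here because the table is short and the function only runs on error paths; several errnos may map to the same NFS status, and the first matching row wins.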
2005-04-17 02:20:36 +04:00
/*
 * Called from nfsd_lookup and encode_dirent. Check if we have crossed
 * a mount point.
2006-12-13 11:35:25 +03:00
 * Returns -EAGAIN or -ETIMEDOUT leaving *dpp and *expp unchanged,
2005-04-17 02:20:36 +04:00
 * or nfs_ok having possibly changed *dpp and *expp
 */
int
nfsd_cross_mnt(struct svc_rqst *rqstp, struct dentry **dpp,
	       struct svc_export **expp)
{
	struct svc_export *exp = *expp, *exp2 = NULL;
	struct dentry *dentry = *dpp;
2009-04-18 10:42:05 +04:00
	struct path path = { .mnt = mntget(exp->ex_path.mnt),
			     .dentry = dget(dentry) };
2022-12-07 11:43:08 +03:00
	unsigned int follow_flags = 0;
2006-10-20 10:28:58 +04:00
	int err = 0;
2005-04-17 02:20:36 +04:00
2022-12-07 11:43:08 +03:00
	if (exp->ex_flags & NFSEXP_CROSSMOUNT)
		follow_flags = LOOKUP_AUTOMOUNT;
	err = follow_down(&path, follow_flags);
Add a dentry op to allow processes to be held during pathwalk transit
Add a dentry op (d_manage) to permit a filesystem to hold a process and make it
sleep when it tries to transit away from one of that filesystem's directories
during a pathwalk. The operation is keyed off a new dentry flag
(DCACHE_MANAGE_TRANSIT).
The filesystem is allowed to be selective about which processes it holds and
which it permits to continue on or prohibits from transiting from each flagged
directory. This will allow autofs to hold up client processes whilst letting
its userspace daemon through to maintain the directory or the stuff behind it
or mounted upon it.
The ->d_manage() dentry operation:
int (*d_manage)(struct path *path, bool mounting_here);
takes a pointer to the directory about to be transited away from and a flag
indicating whether the transit is undertaken by do_add_mount() or
do_move_mount() skipping through a pile of filesystems mounted on a mountpoint.
It should return 0 if successful and to let the process continue on its way;
-EISDIR to prohibit the caller from skipping to overmounted filesystems or
automounting, and to use this directory; or some other error code to return to
the user.
->d_manage() is called with namespace_sem writelocked if mounting_here is true
and no other locks held, so it may sleep. However, if mounting_here is true,
it may not initiate or wait for a mount or unmount upon the parameter
directory, even if the act is actually performed by userspace.
Within fs/namei.c, follow_managed() is extended to check with d_manage() first
on each managed directory, before transiting away from it or attempting to
automount upon it.
follow_down() is renamed follow_down_one() and should only be used where the
filesystem deliberately intends to avoid management steps (e.g. autofs).
A new follow_down() is added that incorporates the loop done by all other
callers of follow_down() (do_add/move_mount(), autofs and NFSD; whilst AFS, NFS
and CIFS do use it, their use is removed by converting them to use
d_automount()). The new follow_down() calls d_manage() as appropriate. It
also takes an extra parameter to indicate if it is being called from mount code
(with namespace_sem writelocked) which it passes to d_manage(). follow_down()
ignores automount points so that it can be used to mount on them.
__follow_mount_rcu() is made to abort rcu-walk mode if it hits a directory with
DCACHE_MANAGE_TRANSIT set on the basis that we're probably going to have to
sleep. It would be possible to enter d_manage() in rcu-walk mode too, and have
that determine whether to abort or not itself. That would allow the autofs
daemon to continue on in rcu-walk mode.
Note that DCACHE_MANAGE_TRANSIT on a directory should be cleared when it isn't
required as every transit from that directory will cause d_manage() to be
invoked. It can always be set again when necessary.
==========================
WHAT THIS MEANS FOR AUTOFS
==========================
Autofs currently uses the lookup() inode op and the d_revalidate() dentry op to
trigger the automounting of indirect mounts, and both of these can be called
with i_mutex held.
autofs knows that the i_mutex will be held by the caller in lookup(), and so
can drop it before invoking the daemon - but this isn't so for d_revalidate(),
since the lock is only held on _some_ of the code paths that call it. This
means that autofs can't risk dropping i_mutex from its d_revalidate() function
before it calls the daemon.
The bug could manifest itself as, for example, a process that's trying to
validate an automount dentry that gets made to wait because that dentry is
expired and needs cleaning up:
mkdir S ffffffff8014e05a 0 32580 24956
Call Trace:
[<ffffffff885371fd>] :autofs4:autofs4_wait+0x674/0x897
[<ffffffff80127f7d>] avc_has_perm+0x46/0x58
[<ffffffff8009fdcf>] autoremove_wake_function+0x0/0x2e
[<ffffffff88537be6>] :autofs4:autofs4_expire_wait+0x41/0x6b
[<ffffffff88535cfc>] :autofs4:autofs4_revalidate+0x91/0x149
[<ffffffff80036d96>] __lookup_hash+0xa0/0x12f
[<ffffffff80057a2f>] lookup_create+0x46/0x80
[<ffffffff800e6e31>] sys_mkdirat+0x56/0xe4
versus the automount daemon which wants to remove that dentry, but can't
because the normal process is holding the i_mutex lock:
automount D ffffffff8014e05a 0 32581 1 32561
Call Trace:
[<ffffffff80063c3f>] __mutex_lock_slowpath+0x60/0x9b
[<ffffffff8000ccf1>] do_path_lookup+0x2ca/0x2f1
[<ffffffff80063c89>] .text.lock.mutex+0xf/0x14
[<ffffffff800e6d55>] do_rmdir+0x77/0xde
[<ffffffff8005d229>] tracesys+0x71/0xe0
[<ffffffff8005d28d>] tracesys+0xd5/0xe0
which means that the system is deadlocked.
This patch allows autofs to hold up normal processes whilst the daemon goes
ahead and does things to the dentry tree behind the automounter point without
risking a deadlock as almost no locks are held in d_manage() and none in
d_automount().
Signed-off-by: David Howells <dhowells@redhat.com>
Was-Acked-by: Ian Kent <raven@themaw.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-14 21:45:26 +03:00
	if (err < 0)
		goto out;
NFS: don't try to cross a mountpoint when there isn't one there.
Consider the sequence of commands:
mkdir -p /import/nfs /import/bind /import/etc
mount --bind / /import/bind
mount --make-private /import/bind
mount --bind /import/etc /import/bind/etc
exportfs -o rw,no_root_squash,crossmnt,async,no_subtree_check localhost:/
mount -o vers=4 localhost:/ /import/nfs
ls -l /import/nfs/etc
You would not expect this to report a stale file handle.
Yet it does.
The manipulations under /import/bind cause the dentry for
/etc to get the DCACHE_MOUNTED flag set, even though nothing
is mounted on /etc. This causes nfsd to call
nfsd_cross_mnt() even though there is no mountpoint. So an
upcall to mountd for "/etc" is performed.
The 'crossmnt' flag on the export of / causes mountd to
report that /etc is exported as it is a descendant of /. It
assumes the kernel wouldn't ask about something that wasn't
a mountpoint. The filehandle returned identifies the
filesystem and the inode number of /etc.
When this filehandle is presented to rpc.mountd, via
"nfsd.fh", the inode cannot be found associated with any
name in /etc/exports, or with any mountpoint listed by
getmntent(). So rpc.mountd says the filehandle doesn't
exist. Hence ESTALE.
This is fixed by teaching nfsd not to trust DCACHE_MOUNTED
too much. It is just a hint, not a guarantee.
Change nfsd_mountpoint() to return '1' for a certain mountpoint,
'2' for a possible mountpoint, and 0 otherwise.
Then change nfsd_cross_mnt() to check if follow_down()
actually found a mountpoint and, if not, to avoid performing
a lookup if the location is not known to certainly require
an export-point.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-03-15 04:40:44 +03:00
	if (path.mnt == exp->ex_path.mnt && path.dentry == dentry &&
	    nfsd_mountpoint(dentry, exp) == 2) {
		/* This is only a mountpoint in some other namespace */
		path_put(&path);
		goto out;
	}
2005-04-17 02:20:36 +04:00
2009-04-18 10:42:05 +04:00
	exp2 = rqst_exp_get_by_name(rqstp, &path);
2005-04-17 02:20:36 +04:00
	if (IS_ERR(exp2)) {
2009-10-26 04:18:19 +03:00
		err = PTR_ERR(exp2);
		/*
		 * We normally allow NFS clients to continue
		 * "underneath" a mountpoint that is not exported.
		 * The exception is V4ROOT, where no traversal is ever
		 * allowed without an explicit export of the new
		 * directory.
		 */
		if (err == -ENOENT && !(exp->ex_flags & NFSEXP_V4ROOT))
			err = 0;
2009-04-18 10:42:05 +04:00
		path_put(&path);
2005-04-17 02:20:36 +04:00
		goto out;
	}
2009-09-09 23:02:40 +04:00
	if (nfsd_v4client(rqstp) ||
	    (exp->ex_flags & NFSEXP_CROSSMOUNT) || EX_NOHIDE(exp2)) {
2005-04-17 02:20:36 +04:00
		/* successfully crossed mount point */
2009-04-18 10:32:31 +04:00
		/*
2009-04-18 10:42:05 +04:00
		 * This is subtle: path.dentry is *not* on path.mnt
		 * at this point. The only reason we are safe is that
		 * original mnt is pinned down by exp, so we should
		 * put path *before* putting exp
2009-04-18 10:32:31 +04:00
		 */
2009-04-18 10:42:05 +04:00
		*dpp = path.dentry;
		path.dentry = dentry;
2009-04-18 10:32:31 +04:00
		*expp = exp2;
2009-04-18 10:42:05 +04:00
		exp2 = exp;
2005-04-17 02:20:36 +04:00
	}
2009-04-18 10:42:05 +04:00
	path_put(&path);
	exp_put(exp2);
2005-04-17 02:20:36 +04:00
out:
	return err;
}
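
The handling of a failed export lookup in nfsd_cross_mnt() reduces to a single decision, isolated in the sketch below. The function name and the boolean parameter are hypothetical; in the kernel the second input comes from testing NFSEXP_V4ROOT in exp->ex_flags.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical reduction of the IS_ERR(exp2) branch: a missing export
 * (-ENOENT) is tolerated so the client keeps walking "underneath" the
 * unexported mountpoint, unless the parent export is a V4ROOT one.
 * Every other error is passed through unchanged.
 */
int sketch_cross_mnt_lookup_status(int lookup_err, bool parent_is_v4root)
{
	if (lookup_err == -ENOENT && !parent_is_v4root)
		return 0;
	return lookup_err;
}
```

In the tolerated case the caller's dentry and export pointers stay as they were, which is why the kernel function only calls path_put() before jumping to out.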
2009-09-27 04:32:24 +04:00
static void follow_to_parent(struct path *path)
{
	struct dentry *dp;

	while (path->dentry == path->mnt->mnt_root && follow_up(path))
		;
	dp = dget_parent(path->dentry);
	dput(path->dentry);
	path->dentry = dp;
}

static int nfsd_lookup_parent(struct svc_rqst *rqstp, struct dentry *dparent, struct svc_export **exp, struct dentry **dentryp)
{
	struct svc_export *exp2;
	struct path path = { .mnt = mntget((*exp)->ex_path.mnt),
			     .dentry = dget(dparent) };

	follow_to_parent(&path);

	exp2 = rqst_exp_parent(rqstp, &path);
	if (PTR_ERR(exp2) == -ENOENT) {
		*dentryp = dget(dparent);
	} else if (IS_ERR(exp2)) {
		path_put(&path);
		return PTR_ERR(exp2);
	} else {
		*dentryp = dget(path.dentry);
		exp_put(*exp);
		*exp = exp2;
	}
	path_put(&path);
	return 0;
}
2009-10-26 04:33:15 +03:00
/*
 * For nfsd purposes, we treat V4ROOT exports as though there was an
 * export at *every* directory.
2017-03-15 04:40:44 +03:00
 * We return:
 * '1' if this dentry *must* be an export point,
 * '2' if it might be, if there is really a mount here, and
 * '0' if there is no chance of an export point here.
2009-10-26 04:33:15 +03:00
 */
2009-10-26 04:43:01 +03:00
int nfsd_mountpoint(struct dentry *dentry, struct svc_export *exp)
2009-10-26 04:33:15 +03:00
{
2017-03-15 04:40:44 +03:00
	if (!d_inode(dentry))
		return 0;
	if (exp->ex_flags & NFSEXP_V4ROOT)
2009-10-26 04:33:15 +03:00
		return 1;
2011-09-13 03:37:26 +04:00
	if (nfsd4_is_junction(dentry))
		return 1;
2022-12-07 11:43:07 +03:00
	if (d_managed(dentry))
2017-03-15 04:40:44 +03:00
		/*
		 * Might only be a mountpoint in a different namespace,
		 * but we need to check.
		 */
		return 2;
	return 0;
2009-10-26 04:33:15 +03:00
}
2006-10-20 10:28:58 +04:00
__be32
2007-07-17 15:04:47 +04:00
nfsd_lookup_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp,
2007-11-01 23:57:09 +03:00
		   const char *name, unsigned int len,
2007-07-17 15:04:47 +04:00
		   struct svc_export **exp_ret, struct dentry **dentry_ret)
2005-04-17 02:20:36 +04:00
{
	struct svc_export *exp;
	struct dentry *dparent;
	struct dentry *dentry;
2006-10-20 10:28:58 +04:00
	int host_err;
2005-04-17 02:20:36 +04:00
	dprintk("nfsd: nfsd_lookup(fh %s, %.*s)\n", SVCFH_fmt(fhp), len, name);
	dparent = fhp->fh_dentry;
2014-06-10 18:06:44 +04:00
	exp = exp_get(fhp->fh_export);
2005-04-17 02:20:36 +04:00
	/* Lookup the name, but don't follow links */
	if (isdotent(name, len)) {
		if (len == 1)
			dentry = dget(dparent);
2008-02-15 06:38:39 +03:00
		else if (dparent != exp->ex_path.dentry)
2005-04-17 02:20:36 +04:00
			dentry = dget_parent(dparent);
2009-09-27 00:53:01 +04:00
		else if (!EX_NOHIDE(exp) && !nfsd_v4client(rqstp))
2005-04-17 02:20:36 +04:00
			dentry = dget(dparent); /* .. == . just like at / */
		else {
			/* checking mountpoint crossing is very different when stepping up */
2009-09-27 04:32:24 +04:00
			host_err = nfsd_lookup_parent(rqstp, dparent, &exp, &dentry);
			if (host_err)
2005-04-17 02:20:36 +04:00
				goto out_nfserr;
		}
	} else {
2022-07-26 09:45:30 +03:00
		dentry = lookup_one_len_unlocked(name, dparent, len);
2006-10-20 10:28:58 +04:00
		host_err = PTR_ERR(dentry);
2005-04-17 02:20:36 +04:00
		if (IS_ERR(dentry))
			goto out_nfserr;
2009-10-26 04:33:15 +03:00
		if (nfsd_mountpoint(dentry, exp)) {
2022-07-26 09:45:30 +03:00
			host_err = nfsd_cross_mnt(rqstp, &dentry, &exp);
			if (host_err) {
2005-04-17 02:20:36 +04:00
				dput(dentry);
				goto out_nfserr;
			}
		}
	}
2007-07-17 15:04:47 +04:00
	*dentry_ret = dentry;
	*exp_ret = exp;
	return 0;
out_nfserr:
	exp_put(exp);
	return nfserrno(host_err);
}
2022-07-26 09:45:30 +03:00
/**
 * nfsd_lookup - look up a single path component for nfsd
 *
 * @rqstp: the request context
 * @fhp: the file handle of the directory
 * @name: the component name, or %NULL to look up parent
 * @len: length of name to examine
 * @resfh: pointer to pre-initialised filehandle to hold result.
 *
2007-07-17 15:04:47 +04:00
 * Look up one component of a pathname.
 * N.B. After this call _both_ fhp and resfh need an fh_put
 *
 * If the lookup would cross a mountpoint, and the mounted filesystem
 * is exported to the client with NFSEXP_NOHIDE, then the lookup is
 * accepted as it stands and the mounted directory is
 * returned. Otherwise the covered directory is returned.
 * NOTE: this mountpoint crossing is not supported properly by all
 * clients and is explicitly disallowed for NFSv3
2022-07-26 09:45:30 +03:00
 *
2007-07-17 15:04:47 +04:00
 */
__be32
nfsd_lookup(struct svc_rqst *rqstp, struct svc_fh *fhp, const char *name,
2022-07-26 09:45:30 +03:00
	    unsigned int len, struct svc_fh *resfh)
2007-07-17 15:04:47 +04:00
{
	struct svc_export *exp;
	struct dentry *dentry;
	__be32 err;
2011-04-09 19:28:53 +04:00
	err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC);
	if (err)
		return err;
2007-07-17 15:04:47 +04:00
	err = nfsd_lookup_dentry(rqstp, fhp, name, len, &exp, &dentry);
	if (err)
		return err;
2007-07-17 15:04:48 +04:00
	err = check_nfsd_access(exp, rqstp);
	if (err)
		goto out;
2005-04-17 02:20:36 +04:00
	/*
	 * Note: we compose the file handle now, but as the
	 * dentry may be negative, it may need to be updated.
	 */
	err = fh_compose(resfh, exp, dentry, fhp);
2015-03-18 01:25:59 +03:00
	if (!err && d_really_is_negative(dentry))
2005-04-17 02:20:36 +04:00
		err = nfserr_noent;
2007-07-17 15:04:48 +04:00
out:
2005-04-17 02:20:36 +04:00
	dput(dentry);
	exp_put(exp);
	return err;
}
2023-09-11 21:43:57 +03:00
static void
commit_reset_write_verifier(struct nfsd_net *nn, struct svc_rqst *rqstp,
			    int err)
{
	switch (err) {
	case -EAGAIN:
	case -ESTALE:
		/*
		 * Neither of these are the result of a problem with
		 * durable storage, so avoid a write verifier reset.
		 */
		break;
	default:
		nfsd_reset_write_verifier(nn);
		trace_nfsd_writeverf_reset(nn, rqstp, err);
	}
}
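The decision made by commit_reset_write_verifier() reduces to a predicate on the error code. A minimal userspace sketch follows; `should_reset_write_verifier()` is a hypothetical helper that does not exist in the kernel, and it only models which errors lead to a reset, not the reset itself.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical predicate: true when the error suggests a problem with
 * durable storage, so that a write verifier reset would be warranted.
 */
static bool should_reset_write_verifier(int err)
{
	switch (err) {
	case -EAGAIN:
	case -ESTALE:
		/* Not storage failures: leave the verifier alone. */
		return false;
	default:
		return true;
	}
}
```

Under this model an -EIO from a sync path would trigger a reset, while a retryable -EAGAIN would not.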
2010-02-17 23:05:11 +03:00
/*
 * Commit metadata changes to stable storage.
 */
static int
2019-12-18 22:57:23 +03:00
commit_inode_metadata(struct inode *inode)
2010-02-17 23:05:11 +03:00
{
	const struct export_operations *export_ops = inode->i_sb->s_export_op;
2010-10-06 12:48:20 +04:00
	if (export_ops->commit_metadata)
		return export_ops->commit_metadata(inode);
	return sync_inode_metadata(inode, 1);
2010-02-17 23:05:11 +03:00
}
2007-07-17 15:04:47 +04:00
2019-12-18 22:57:23 +03:00
static int
commit_metadata(struct svc_fh *fhp)
{
	struct inode *inode = d_inode(fhp->fh_dentry);
	if (!EX_ISSYNC(fhp->fh_export))
		return 0;
	return commit_inode_metadata(inode);
}
2005-04-17 02:20:36 +04:00
/*
2013-11-18 17:07:30 +04:00
 * Go over the attributes and take care of the small differences between
 * NFS semantics and what Linux expects.
2005-04-17 02:20:36 +04:00
 */
2013-11-18 17:07:30 +04:00
static void
nfsd_sanitize_attrs(struct inode *inode, struct iattr *iap)
2005-04-17 02:20:36 +04:00
{
2022-09-08 05:08:40 +03:00
	/* Ignore mode updates on symlinks */
	if (S_ISLNK(inode->i_mode))
		iap->ia_valid &= ~ATTR_MODE;
knfsd: clear both setuid and setgid whenever a chown is done
Currently, knfsd only clears the setuid bit if the owner of a file is
changed on a SETATTR call, and only clears the setgid bit if the group
is changed. POSIX says this in the spec for chown():
"If the specified file is a regular file, one or more of the
S_IXUSR, S_IXGRP, or S_IXOTH bits of the file mode are set, and the
process does not have appropriate privileges, the set-user-ID
(S_ISUID) and set-group-ID (S_ISGID) bits of the file mode shall
be cleared upon successful return from chown()."
If I'm reading this correctly, then knfsd is doing this wrong. It should
be clearing both the setuid and setgid bit on any SETATTR that changes
the uid or gid. This wasn't really as noticeable before, but now that the
ATTR_KILL_S*ID bits are a no-op for the NFS client, it's more evident.
This patch corrects the nfsd_setattr logic so that this occurs. It also
does a bit of cleanup to the function.
There is also one small behavioral change. If a SETATTR call comes in
that changes the uid/gid and the mode, then we now only clear the setgid
bit if the group execute bit isn't set. The setgid bit without a group
execute bit signifies mandatory locking and we likely don't want to
clear the bit in that case. Since there is no call in POSIX that should
generate a SETATTR call like this, then this should rarely happen, but
it's worth noting.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-04-17 00:28:47 +04:00
	/* sanitize the mode change */
2005-04-17 02:20:36 +04:00
	if (iap->ia_valid & ATTR_MODE) {
		iap->ia_mode &= S_IALLUGO;
2008-04-17 00:28:46 +04:00
		iap->ia_mode |= (inode->i_mode & ~S_IALLUGO);
2008-04-17 00:28:47 +04:00
	}
	/* Revoke setuid/setgid on chown */
Inconsistent setattr behaviour
There is an inconsistency seen in the behaviour of nfs compared to other local
filesystems on linux when changing owner or group of a directory. If the
directory has SUID/SGID flags set, on changing owner or group on the directory,
the flags are stripped off on nfs. These flags are maintained on other
filesystems such as ext3.
To reproduce on a nfs share or local filesystem, run the following commands
mkdir test; chmod +s+g test; chown user1 test; ls -ld test
On the nfs share, the flags are stripped and the output seen is
drwxr-xr-x 2 user1 root 4096 Feb 23 2009 test
On other local filesystems(ex: ext3), the flags are not stripped and the output
seen is
drwsr-sr-x 2 user1 root 4096 Feb 23 13:57 test
chown_common() called from sys_chown() will only strip the flags if the inode is
not a directory.
static int chown_common(struct dentry * dentry, uid_t user, gid_t group)
{
..
if (!S_ISDIR(inode->i_mode))
newattrs.ia_valid |=
ATTR_KILL_SUID | ATTR_KILL_SGID | ATTR_KILL_PRIV;
..
}
See: http://www.opengroup.org/onlinepubs/7990989775/xsh/chown.html
"If the path argument refers to a regular file, the set-user-ID (S_ISUID) and
set-group-ID (S_ISGID) bits of the file mode are cleared upon successful return
from chown(), unless the call is made by a process with appropriate privileges,
in which case it is implementation-dependent whether these bits are altered. If
chown() is successfully invoked on a file that is not a regular file, these
bits may be cleared. These bits are defined in <sys/stat.h>."
The behaviour as it stands does not appear to violate POSIX. However the
actions performed are inconsistent when comparing ext3 and nfs.
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-02-23 19:22:03 +03:00
	if (!S_ISDIR(inode->i_mode) &&
2013-12-11 14:16:36 +04:00
	    ((iap->ia_valid & ATTR_UID) || (iap->ia_valid & ATTR_GID))) {
2008-04-17 00:28:47 +04:00
		iap->ia_valid |= ATTR_KILL_PRIV;
		if (iap->ia_valid & ATTR_MODE) {
			/* we're setting mode too, just clear the s*id bits */
2007-10-18 14:05:19 +04:00
			iap->ia_mode &= ~S_ISUID;
2008-04-17 00:28:47 +04:00
			if (iap->ia_mode & S_IXGRP)
				iap->ia_mode &= ~S_ISGID;
		} else {
			/* set ATTR_KILL_* bits and let VFS handle it */
2023-05-02 16:36:02 +03:00
			iap->ia_valid |= ATTR_KILL_SUID;
			iap->ia_valid |=
				setattr_should_drop_sgid(&nop_mnt_idmap, inode);
2007-10-18 14:05:19 +04:00
		}
2005-04-17 02:20:36 +04:00
	}
2013-11-18 17:07:30 +04:00
}
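The chown branch of nfsd_sanitize_attrs() can be exercised in userspace with the POSIX mode constants. This is a simplified sketch under stated assumptions: `struct attr_model` and `sanitize_chown()` are hypothetical names, the booleans replace the ATTR_* bits in ia_valid, and the setattr_should_drop_sgid() call in the else branch is deliberately left out.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical model of the fields the chown branch reads and writes. */
struct attr_model {
	bool set_mode;  /* models ATTR_MODE in ia_valid */
	bool set_owner; /* models ATTR_UID or ATTR_GID in ia_valid */
	mode_t mode;    /* models ia_mode, meaningful when set_mode is true */
	bool kill_suid; /* models ATTR_KILL_SUID being requested */
};

static void sanitize_chown(bool is_dir, struct attr_model *a)
{
	/* Directories keep their s*id bits; so do requests without a chown. */
	if (is_dir || !a->set_owner)
		return;
	if (a->set_mode) {
		/* Mode is being set too: clear the bits in the new mode. */
		a->mode &= ~(mode_t)S_ISUID;
		/* setgid without group-execute means mandatory locking: keep it. */
		if (a->mode & S_IXGRP)
			a->mode &= ~(mode_t)S_ISGID;
	} else {
		/* Otherwise ask the lower layer to strip setuid. */
		a->kill_suid = true;
	}
}
```

With a requested mode of 06750 on a regular file, both special bits are dropped; with 06740 the setgid bit survives because the group execute bit is absent.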
2017-02-09 22:20:42 +03:00
static __be32
nfsd_get_write_access(struct svc_rqst *rqstp, struct svc_fh *fhp,
		      struct iattr *iap)
{
	struct inode *inode = d_inode(fhp->fh_dentry);
	if (iap->ia_size < inode->i_size) {
		__be32 err;
		err = nfsd_permission(rqstp, fhp->fh_export, fhp->fh_dentry,
				      NFSD_MAY_TRUNC | NFSD_MAY_OWNER_OVERRIDE);
		if (err)
			return err;
	}
2021-08-19 21:56:38 +03:00
	return nfserrno(get_write_access(inode));
2017-02-09 22:20:42 +03:00
}
2022-09-09 01:14:07 +03:00
static int __nfsd_setattr(struct dentry *dentry, struct iattr *iap)
{
	int host_err;
	if (iap->ia_valid & ATTR_SIZE) {
		/*
		 * RFC5661, Section 18.30.4:
		 *   Changing the size of a file with SETATTR indirectly
		 *   changes the time_modify and change attributes.
		 *
		 * (and similar for the older RFCs)
		 */
		struct iattr size_attr = {
			.ia_valid = ATTR_SIZE | ATTR_CTIME | ATTR_MTIME,
			.ia_size = iap->ia_size,
		};
		if (iap->ia_size < 0)
			return -EFBIG;
2023-01-13 14:49:10 +03:00
		host_err = notify_change(&nop_mnt_idmap, dentry, &size_attr, NULL);
2022-09-09 01:14:07 +03:00
		if (host_err)
			return host_err;
		iap->ia_valid &= ~ATTR_SIZE;
		/*
		 * Avoid the additional setattr call below if the only other
		 * attribute that the client sends is the mtime, as we update
		 * it as part of the size change above.
		 */
		if ((iap->ia_valid & ~ATTR_MTIME) == 0)
			return 0;
	}
	if (!iap->ia_valid)
		return 0;
	iap->ia_valid |= ATTR_CTIME;
2023-01-13 14:49:10 +03:00
	return notify_change(&nop_mnt_idmap, dentry, iap, NULL);
2022-09-09 01:14:07 +03:00
}
/**
 * nfsd_setattr - Set various file attributes.
 * @rqstp: controlling RPC transaction
 * @fhp: filehandle of target
 * @attr: attributes to set
 * @guardtime: do not act if ctime.tv_sec does not match this timestamp
 *
 * This call may adjust the contents of @attr (in particular, this
 * call may change the bits in the na_iattr.ia_valid field).
 *
 * Returns nfs_ok on success, otherwise an NFS status code is
 * returned. Caller must release @fhp by calling fh_put in either
 * case.
2013-11-18 17:07:30 +04:00
 */
__be32
2022-07-26 09:45:30 +03:00
nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp,
2024-02-16 04:24:51 +03:00
	     struct nfsd_attrs *attr, const struct timespec64 *guardtime)
2013-11-18 17:07:30 +04:00
{
	struct dentry *dentry;
	struct inode *inode;
2022-07-26 09:45:30 +03:00
	struct iattr *iap = attr->na_iattr;
2013-11-18 17:07:30 +04:00
	int accmode = NFSD_MAY_SATTR;
	umode_t ftype = 0;
	__be32 err;
2024-02-16 04:24:50 +03:00
	int host_err = 0;
2014-02-24 23:59:47 +04:00
	bool get_write_count;
2017-02-21 01:04:42 +03:00
	bool size_change = (iap->ia_valid & ATTR_SIZE);
2022-09-09 01:14:13 +03:00
	int retries;
2013-11-18 17:07:30 +04:00
2018-11-30 11:04:25 +03:00
	if (iap->ia_valid & ATTR_SIZE) {
2013-11-18 17:07:30 +04:00
		accmode |= NFSD_MAY_WRITE | NFSD_MAY_OWNER_OVERRIDE;
		ftype = S_IFREG;
2018-11-30 11:04:25 +03:00
	}
	/*
	 * If utimes(2) and friends are called with times not NULL, we should
	 * not set NFSD_MAY_WRITE bit. Otherwise fh_verify->nfsd_permission
2019-05-27 15:21:32 +03:00
	 * will return EACCES, when the caller's effective UID does not match
2018-11-30 11:04:25 +03:00
	 * the owner of the file, and the caller is not privileged. In this
	 * situation, we should return EPERM (notify_change will return this).
	 */
	if (iap->ia_valid & (ATTR_ATIME | ATTR_MTIME)) {
		accmode |= NFSD_MAY_OWNER_OVERRIDE;
		if (!(iap->ia_valid & (ATTR_ATIME_SET | ATTR_MTIME_SET)))
			accmode |= NFSD_MAY_WRITE;
	}
2013-11-18 17:07:30 +04:00
2014-02-24 23:59:47 +04:00
	/* Callers that do fh_verify should do the fh_want_write: */
	get_write_count = !fhp->fh_dentry;
2013-11-18 17:07:30 +04:00
	/* Get inode */
	err = fh_verify(rqstp, fhp, ftype, accmode);
	if (err)
2017-02-21 01:04:42 +03:00
		return err;
2014-02-24 23:59:47 +04:00
	if (get_write_count) {
		host_err = fh_want_write(fhp);
		if (host_err)
2017-02-21 01:04:42 +03:00
			goto out;
2014-02-24 23:59:47 +04:00
	}
2013-11-18 17:07:30 +04:00
	dentry = fhp->fh_dentry;
2015-03-18 01:25:59 +03:00
	inode = d_inode(dentry);
2013-11-18 17:07:30 +04:00
	nfsd_sanitize_attrs(inode, iap);
	/*
	 * The size case is special, it changes the file in addition to the
2017-02-20 09:21:33 +03:00
	 * attributes, and file systems don't expect it to be mixed with
	 * "random" attribute changes. We thus split out the size change
	 * into a separate call to ->setattr, and do the rest as a separate
	 * setattr call.
2013-11-18 17:07:30 +04:00
	 */
2017-02-21 01:04:42 +03:00
	if (size_change) {
2017-02-09 22:20:42 +03:00
		err = nfsd_get_write_access(rqstp, fhp, iap);
		if (err)
2017-02-21 01:04:42 +03:00
			return err;
2017-02-20 09:21:33 +03:00
	}
2014-09-07 23:15:52 +04:00
2022-07-26 09:45:30 +03:00
	inode_lock(inode);
2024-02-16 04:24:50 +03:00
	err = fh_fill_pre_attrs(fhp);
	if (err)
		goto out_unlock;
2024-02-16 04:24:51 +03:00
	if (guardtime) {
		struct timespec64 ctime = inode_get_ctime(inode);
		if ((u32)guardtime->tv_sec != (u32)ctime.tv_sec ||
		    guardtime->tv_nsec != ctime.tv_nsec) {
			err = nfserr_notsync;
			goto out_fill_attrs;
		}
	}
2022-09-09 01:14:13 +03:00
	for (retries = 1;;) {
2023-05-17 19:26:44 +03:00
		struct iattr attrs;
		/*
		 * notify_change() can alter its iattr argument, making
		 * @iap unsuitable for submission multiple times. Make a
		 * copy for every loop iteration.
		 */
		attrs = *iap;
		host_err = __nfsd_setattr(dentry, &attrs);
2022-09-09 01:14:13 +03:00
		if (host_err != -EAGAIN || !retries--)
			break;
		if (!nfsd_wait_for_delegreturn(rqstp, inode))
			break;
	}
2022-07-26 09:45:30 +03:00
	if (attr->na_seclabel && attr->na_seclabel->len)
		attr->na_labelerr = security_inode_setsecctx(dentry,
			attr->na_seclabel->data, attr->na_seclabel->len);
2022-07-26 09:45:30 +03:00
	if (IS_ENABLED(CONFIG_FS_POSIX_ACL) && attr->na_pacl)
2023-01-13 14:49:20 +03:00
		attr->na_aclerr = set_posix_acl(&nop_mnt_idmap,
2022-09-23 11:29:39 +03:00
						dentry, ACL_TYPE_ACCESS,
2022-07-26 09:45:30 +03:00
						attr->na_pacl);
	if (IS_ENABLED(CONFIG_FS_POSIX_ACL) &&
	    !attr->na_aclerr && attr->na_dpacl && S_ISDIR(inode->i_mode))
2023-01-13 14:49:20 +03:00
		attr->na_aclerr = set_posix_acl(&nop_mnt_idmap,
2022-09-23 11:29:39 +03:00
						dentry, ACL_TYPE_DEFAULT,
2022-07-26 09:45:30 +03:00
						attr->na_dpacl);
2024-02-16 04:24:51 +03:00
out_fill_attrs:
2024-02-18 19:48:10 +03:00
	/*
	 * RFC 1813 Section 3.3.2 does not mandate that an NFS server
	 * returns wcc_data for SETATTR. Some client implementations
	 * depend on receiving wcc_data, however, to sort out partial
	 * updates (eg., the client requested that size and mode be
	 * modified, but the server changed only the file mode).
	 */
2024-02-16 04:24:50 +03:00
	fh_fill_post_attrs(fhp);
out_unlock:
2022-07-26 09:45:30 +03:00
	inode_unlock(inode);
2017-02-09 22:20:42 +03:00
	if (size_change)
		put_write_access(inode);
out:
2017-02-21 01:04:42 +03:00
	if (!host_err)
		host_err = commit_metadata(fhp);
2024-02-16 04:24:50 +03:00
	return err != 0 ? err : nfserrno(host_err);
2005-04-17 02:20:36 +04:00
}
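The guard check in nfsd_setattr() compares the seconds as 32-bit values and the nanoseconds exactly. A minimal userspace sketch of that comparison follows; `struct ts_model` and `guard_matches()` are hypothetical names that stand in for struct timespec64 and the inline test in the function above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct timespec64. */
struct ts_model {
	int64_t tv_sec;
	long tv_nsec;
};

/*
 * True when the guard agrees with the current ctime. Seconds are
 * truncated to 32 bits on both sides before comparing, as in the
 * (u32) casts above; nanoseconds must match exactly.
 */
static bool guard_matches(const struct ts_model *guard,
			  const struct ts_model *ctime)
{
	return (uint32_t)guard->tv_sec == (uint32_t)ctime->tv_sec &&
	       guard->tv_nsec == ctime->tv_nsec;
}
```

One consequence of the truncation is that two timestamps whose seconds differ by an exact multiple of 2^32 compare as equal. In the function above a mismatch yields nfserr_notsync and no attributes are changed.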
2006-01-10 07:51:55 +03:00
#if defined(CONFIG_NFSD_V4)
2012-01-05 01:26:43 +04:00
/*
 * NFS junction information is stored in an extended attribute.
 */
#define NFSD_JUNCTION_XATTR_NAME	XATTR_TRUSTED_PREFIX "junction.nfs"
/**
 * nfsd4_is_junction - Test if an object could be an NFS junction
 *
 * @dentry: object to test
 *
 * Returns 1 if "dentry" appears to contain NFS junction information.
 * Otherwise 0 is returned.
 */
2011-09-13 03:37:26 +04:00
int nfsd4_is_junction(struct dentry *dentry)
{
2015-03-18 01:25:59 +03:00
	struct inode *inode = d_inode(dentry);
2011-09-13 03:37:26 +04:00
	if (inode == NULL)
		return 0;
	if (inode->i_mode & S_IXUGO)
		return 0;
	if (!(inode->i_mode & S_ISVTX))
		return 0;
2023-01-13 14:49:22 +03:00
	if (vfs_getxattr(&nop_mnt_idmap, dentry, NFSD_JUNCTION_XATTR_NAME,
2021-01-21 16:19:28 +03:00
			 NULL, 0) <= 0)
2011-09-13 03:37:26 +04:00
		return 0;
	return 1;
}
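The mode-bit part of the junction test (no execute bits, sticky bit set) can be tried in isolation. This is a minimal sketch only: `mode_looks_like_junction()` and the MODEL_* constants are hypothetical, the constants use the conventional Linux octal values, and the extended-attribute lookup that the kernel function performs afterwards is not modelled.

```c
#include <stdbool.h>

/* Illustrative constants with the conventional Linux octal values. */
#define MODEL_IXUGO 0111u  /* execute bits for user, group and other */
#define MODEL_ISVTX 01000u /* sticky bit */

/*
 * Cheap pre-filter: a candidate has no execute bits and has the
 * sticky bit set. Passing this test is necessary but not sufficient.
 */
static bool mode_looks_like_junction(unsigned int mode)
{
	if (mode & MODEL_IXUGO)
		return false;
	if (!(mode & MODEL_ISVTX))
		return false;
	return true;
}
```

For example, a mode of 01000 passes the filter, while 01755 (sticky but searchable) and 0644 (no sticky bit) do not.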
2013-05-02 21:19:10 +04:00
2021-12-19 04:38:00 +03:00
static struct nfsd4_compound_state *nfsd4_get_cstate(struct svc_rqst *rqstp)
{
	return &((struct nfsd4_compoundres *)rqstp->rq_resp)->cstate;
}
__be32 nfsd4_clone_file_range(struct svc_rqst *rqstp,
		struct nfsd_file *nf_src, u64 src_pos,
		struct nfsd_file *nf_dst, u64 dst_pos,
		u64 count, bool sync)
2015-12-03 14:59:52 +03:00
{
2020-01-06 21:40:32 +03:00
	struct file *src = nf_src->nf_file;
	struct file *dst = nf_dst->nf_file;
2021-12-19 04:38:01 +03:00
	errseq_t since;
2018-10-30 02:41:49 +03:00
	loff_t cloned;
2020-01-06 21:40:33 +03:00
	__be32 ret = 0;
2018-10-30 02:41:49 +03:00
2021-12-19 04:38:01 +03:00
	since = READ_ONCE(dst->f_wb_err);
2018-10-30 02:41:56 +03:00
	cloned = vfs_clone_file_range(src, src_pos, dst, dst_pos, count, 0);
2020-01-06 21:40:33 +03:00
	if (cloned < 0) {
		ret = nfserrno(cloned);
		goto out_err;
	}
	if (count && cloned != count) {
		ret = nfserrno(-EINVAL);
		goto out_err;
	}
2019-11-28 01:05:51 +03:00
	if (sync) {
		loff_t dst_end = count ? dst_pos + count - 1 : LLONG_MAX;
		int status = vfs_fsync_range(dst, dst_pos, dst_end, 0);
2019-12-18 22:57:23 +03:00
2021-12-19 04:38:01 +03:00
		if (!status)
			status = filemap_check_wb_err(dst->f_mapping, since);
2019-12-18 22:57:23 +03:00
		if (!status)
			status = commit_inode_metadata(file_inode(src));
2020-01-06 21:40:33 +03:00
		if (status < 0) {
2021-12-28 22:27:56 +03:00
			struct nfsd_net *nn = net_generic(nf_dst->nf_net,
							  nfsd_net_id);
2021-12-19 04:38:00 +03:00
			trace_nfsd_clone_file_range_err(rqstp,
					&nfsd4_get_cstate(rqstp)->save_fh,
					src_pos,
					&nfsd4_get_cstate(rqstp)->current_fh,
					dst_pos,
					count, status);
2023-09-11 21:43:57 +03:00
			commit_reset_write_verifier(nn, rqstp, status);
2020-01-06 21:40:33 +03:00
			ret = nfserrno(status);
		}
2019-11-28 01:05:51 +03:00
	}
2020-01-06 21:40:33 +03:00
out_err:
	return ret;
2015-12-03 14:59:52 +03:00
}
2016-09-07 22:57:30 +03:00
ssize_t nfsd_copy_file_range(struct file *src, u64 src_pos, struct file *dst,
			     u64 dst_pos, u64 count)
{
2022-06-30 22:58:49 +03:00
	ssize_t ret;
2016-09-07 22:57:30 +03:00
	/*
	 * Limit copy to 4MB to prevent indefinitely blocking an nfsd
	 * thread and client rpc slot. The choice of 4MB is somewhat
	 * arbitrary. We might instead base this on r/wsize, or make it
	 * tunable, or use a time instead of a byte limit, or implement
	 * asynchronous copy. In theory a client could also recognize a
	 * limit like this and pipeline multiple COPY requests.
	 */
	count = min_t(u64, count, 1 << 22);
2022-06-30 22:58:49 +03:00
	ret = vfs_copy_file_range(src, src_pos, dst, dst_pos, count, 0);
	if (ret == -EOPNOTSUPP || ret == -EXDEV)
2022-11-17 23:52:49 +03:00
		ret = vfs_copy_file_range(src, src_pos, dst, dst_pos, count,
					  COPY_FILE_SPLICE);
2022-06-30 22:58:49 +03:00
	return ret;
2016-09-07 22:57:30 +03:00
}
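The effect of the 4MB clamp on a large copy can be worked through with plain arithmetic. The sketch below is illustrative only: `COPY_CHUNK_LIMIT`, `clamp_copy_count()` and `requests_needed()` are hypothetical names, and the second function assumes a caller that keeps reissuing bounded requests until the whole range is covered, which is one of the client strategies the comment above speculates about.

```c
#include <stdint.h>

/* 1 << 22 bytes, the same limit the function above applies. */
#define COPY_CHUNK_LIMIT ((uint64_t)1 << 22)

/* Bytes a single call would attempt for a requested count. */
static uint64_t clamp_copy_count(uint64_t count)
{
	return count < COPY_CHUNK_LIMIT ? count : COPY_CHUNK_LIMIT;
}

/*
 * Number of bounded requests a hypothetical caller would issue to
 * cover `total` bytes, assuming every request copies its full share.
 */
static unsigned int requests_needed(uint64_t total)
{
	unsigned int n = 0;

	while (total > 0) {
		total -= clamp_copy_count(total);
		n++;
	}
	return n;
}
```

A 10MB copy would, under these assumptions, take three requests of 4MB, 4MB and 2MB.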
2014-11-07 22:44:26 +03:00
__be32 nfsd4_vfs_fallocate(struct svc_rqst *rqstp, struct svc_fh *fhp,
			   struct file *file, loff_t offset, loff_t len,
			   int flags)
{
	int error;
	if (!S_ISREG(file_inode(file)->i_mode))
		return nfserr_inval;
	error = vfs_fallocate(file, flags, offset, len);
	if (!error)
		error = commit_metadata(fhp);
	return nfserrno(error);
}
2010-07-06 20:39:12 +04:00
#endif /* defined(CONFIG_NFSD_V4) */
2005-04-17 02:20:36 +04:00
/*
 * Check server access rights to a file system object
 */
struct accessmap {
	u32 access;
	int how;
};
static struct accessmap nfs3_regaccess[] = {
2008-06-16 15:20:29 +04:00
	{ NFS3_ACCESS_READ, NFSD_MAY_READ },
	{ NFS3_ACCESS_EXECUTE, NFSD_MAY_EXEC },
	{ NFS3_ACCESS_MODIFY, NFSD_MAY_WRITE | NFSD_MAY_TRUNC },
	{ NFS3_ACCESS_EXTEND, NFSD_MAY_WRITE },
2005-04-17 02:20:36 +04:00
2020-06-24 01:39:24 +03:00
#ifdef CONFIG_NFSD_V4
	{ NFS4_ACCESS_XAREAD, NFSD_MAY_READ },
	{ NFS4_ACCESS_XAWRITE, NFSD_MAY_WRITE },
	{ NFS4_ACCESS_XALIST, NFSD_MAY_READ },
#endif
2005-04-17 02:20:36 +04:00
	{ 0, 0 }
};
static struct accessmap nfs3_diraccess[] = {
2008-06-16 15:20:29 +04:00
	{ NFS3_ACCESS_READ, NFSD_MAY_READ },
	{ NFS3_ACCESS_LOOKUP, NFSD_MAY_EXEC },
	{ NFS3_ACCESS_MODIFY, NFSD_MAY_EXEC | NFSD_MAY_WRITE | NFSD_MAY_TRUNC },
	{ NFS3_ACCESS_EXTEND, NFSD_MAY_EXEC | NFSD_MAY_WRITE },
	{ NFS3_ACCESS_DELETE, NFSD_MAY_REMOVE },
2005-04-17 02:20:36 +04:00
2020-06-24 01:39:24 +03:00
#ifdef CONFIG_NFSD_V4
	{ NFS4_ACCESS_XAREAD, NFSD_MAY_READ },
	{ NFS4_ACCESS_XAWRITE, NFSD_MAY_WRITE },
	{ NFS4_ACCESS_XALIST, NFSD_MAY_READ },
#endif
2005-04-17 02:20:36 +04:00
	{ 0, 0 }
};
static struct accessmap nfs3_anyaccess[] = {
	/* Some clients - Solaris 2.6 at least, make an access call
	 * to the server to check for access for things like /dev/null
	 * (which really, the server doesn't care about). So
	 * We provide simple access checking for them, looking
	 * mainly at mode bits, and we make sure to ignore read-only
	 * filesystem checks
	 */
2008-06-16 15:20:29 +04:00
	{ NFS3_ACCESS_READ, NFSD_MAY_READ },
	{ NFS3_ACCESS_EXECUTE, NFSD_MAY_EXEC },
	{ NFS3_ACCESS_MODIFY, NFSD_MAY_WRITE | NFSD_MAY_LOCAL_ACCESS },
	{ NFS3_ACCESS_EXTEND, NFSD_MAY_WRITE | NFSD_MAY_LOCAL_ACCESS },
2005-04-17 02:20:36 +04:00
	{ 0, 0 }
};
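The tables above are zero-terminated arrays that pair a protocol access bit with the local permission to check. A minimal userspace sketch of walking such a table follows. Everything in it is an assumption made for illustration: the ACC_* and MAY_* values are placeholders, `evaluate_access()` and `read_only_policy()` are hypothetical, and the permission callback replaces whatever check the server performs per entry; the sketch simply accumulates the bits whose check succeeds.

```c
#include <stdint.h>

struct accessmap_model {
	uint32_t access; /* protocol-level access bit */
	int how;         /* local permission mask to test */
};

/* Placeholder bit values, chosen only for this example. */
enum { ACC_READ = 0x1, ACC_MODIFY = 0x4, ACC_EXTEND = 0x8 };
enum { MAY_READ = 0x1, MAY_WRITE = 0x2, MAY_TRUNC = 0x4 };

static const struct accessmap_model reg_map[] = {
	{ ACC_READ, MAY_READ },
	{ ACC_MODIFY, MAY_WRITE | MAY_TRUNC },
	{ ACC_EXTEND, MAY_WRITE },
	{ 0, 0 } /* terminator: access == 0 ends the walk */
};

typedef int (*perm_fn)(int how);

/* Return the subset of `query` whose mapped permission is allowed. */
static uint32_t evaluate_access(const struct accessmap_model *map,
				uint32_t query, perm_fn allowed)
{
	uint32_t result = 0;

	for (; map->access; map++)
		if ((map->access & query) && allowed(map->how))
			result |= map->access;
	return result;
}

/* Example policy: grant only requests that need nothing beyond read. */
static int read_only_policy(int how)
{
	return (how & ~MAY_READ) == 0;
}
```

Keeping the mapping in data rather than in branches means a new access bit only needs a new table row, which is how the NFSv4 extended-attribute bits are added under CONFIG_NFSD_V4 above.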
2006-10-20 10:28:58 +04:00
__be32
2005-04-17 02:20:36 +04:00
nfsd_access(struct svc_rqst *rqstp, struct svc_fh *fhp, u32 *access, u32 *supported)
{
	struct accessmap *map;
	struct svc_export *export;
	struct dentry *dentry;
	u32 query, result = 0, sresult = 0;
2006-10-20 10:28:58 +04:00
	__be32 error;
2005-04-17 02:20:36 +04:00
2008-06-16 15:20:29 +04:00
	error = fh_verify(rqstp, fhp, 0, NFSD_MAY_NOP);
2005-04-17 02:20:36 +04:00
	if (error)
		goto out;
	export = fhp->fh_export;
	dentry = fhp->fh_dentry;
VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry)
Convert the following where appropriate:
(1) S_ISLNK(dentry->d_inode) to d_is_symlink(dentry).
(2) S_ISREG(dentry->d_inode) to d_is_reg(dentry).
(3) S_ISDIR(dentry->d_inode) to d_is_dir(dentry). This is actually more
complicated than it appears as some calls should be converted to
d_can_lookup() instead. The difference is whether the directory in
question is a real dir with a ->lookup op or whether it's a fake dir with
a ->d_automount op.
In some circumstances, we can subsume checks for dentry->d_inode not being
NULL into this, provided the code isn't in a filesystem that expects
d_inode to be NULL if the dirent really *is* negative (ie. if we're going to
use d_inode() rather than d_backing_inode() to get the inode pointer).
Note that the dentry type field may be set to something other than
DCACHE_MISS_TYPE when d_inode is NULL in the case of unionmount, where the VFS
manages the fall-through from a negative dentry to a lower layer. In such a
case, the dentry type of the negative union dentry is set to the same as the
type of the lower dentry.
However, if you know d_inode is not NULL at the call site, then you can use
the d_is_xxx() functions even in a filesystem.
There is one further complication: a 0,0 chardev dentry may be labelled
DCACHE_WHITEOUT_TYPE rather than DCACHE_SPECIAL_TYPE. Strictly, this was
intended for special directory entry types that don't have attached inodes.
The following perl+coccinelle script was used:
use strict;
my @callers;
open($fd, 'git grep -l \'S_IS[A-Z].*->d_inode\' |') ||
die "Can't grep for S_ISDIR and co. callers";
@callers = <$fd>;
close($fd);
unless (@callers) {
print "No matches\n";
exit(0);
}
my @cocci = (
'@@',
'expression E;',
'@@',
'',
'- S_ISLNK(E->d_inode->i_mode)',
'+ d_is_symlink(E)',
'',
'@@',
'expression E;',
'@@',
'',
'- S_ISDIR(E->d_inode->i_mode)',
'+ d_is_dir(E)',
'',
'@@',
'expression E;',
'@@',
'',
'- S_ISREG(E->d_inode->i_mode)',
'+ d_is_reg(E)' );
my $coccifile = "tmp.sp.cocci";
open($fd, ">$coccifile") || die $coccifile;
print($fd "$_\n") || die $coccifile foreach (@cocci);
close($fd);
foreach my $file (@callers) {
chomp $file;
print "Processing ", $file, "\n";
system("spatch", "--sp-file", $coccifile, $file, "--in-place", "--no-show-diff") == 0 ||
die "spatch failed";
}
[AV: overlayfs parts skipped]
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-01-29 15:02:35 +03:00
	if (d_is_reg(dentry))
2005-04-17 02:20:36 +04:00
		map = nfs3_regaccess;
2015-01-29 15:02:35 +03:00
else if ( d_is_dir ( dentry ) )
2005-04-17 02:20:36 +04:00
map = nfs3_diraccess ;
else
map = nfs3_anyaccess ;
query = * access ;
for ( ; map - > access ; map + + ) {
if ( map - > access & query ) {
2006-10-20 10:28:58 +04:00
__be32 err2 ;
2005-04-17 02:20:36 +04:00
sresult | = map - > access ;
2007-07-17 15:04:48 +04:00
err2 = nfsd_permission ( rqstp , export , dentry , map - > how ) ;
2005-04-17 02:20:36 +04:00
switch ( err2 ) {
case nfs_ok :
result | = map - > access ;
break ;
/* the following error codes just mean the access was not allowed,
* rather than an error occurred */
case nfserr_rofs :
case nfserr_acces :
case nfserr_perm :
/* simply don't "or" in the access bit. */
break ;
default :
error = err2 ;
goto out ;
}
}
}
* access = result ;
if ( supported )
* supported = sresult ;
out :
return error ;
}
2019-08-18 21:18:48 +03:00
int nfsd_open_break_lease ( struct inode * inode , int access )
2011-06-07 19:50:23 +04:00
{
unsigned int mode ;
2005-04-17 02:20:36 +04:00
2011-06-07 19:50:23 +04:00
if ( access & NFSD_MAY_NOT_BREAK_LEASE )
return 0 ;
mode = ( access & NFSD_MAY_WRITE ) ? O_WRONLY : O_RDONLY ;
return break_lease ( inode , mode | O_NONBLOCK ) ;
}
2005-04-17 02:20:36 +04:00
/*
* Open an existing file or directory .
2012-03-19 06:44:49 +04:00
* The may_flags argument indicates the type of open ( read / write / lock )
* and additional flags .
2005-04-17 02:20:36 +04:00
* N . B . After this call fhp needs an fh_put
*/
2023-09-11 21:30:27 +03:00
static int
2019-08-18 21:18:48 +03:00
__nfsd_open ( struct svc_rqst * rqstp , struct svc_fh * fhp , umode_t type ,
2012-03-19 06:44:49 +04:00
int may_flags , struct file * * filp )
2005-04-17 02:20:36 +04:00
{
2012-06-26 21:58:53 +04:00
struct path path ;
2005-04-17 02:20:36 +04:00
struct inode * inode ;
2014-09-03 04:14:06 +04:00
struct file * file ;
2006-10-20 10:28:58 +04:00
int flags = O_RDONLY | O_LARGEFILE ;
2023-09-11 21:30:27 +03:00
int host_err = - EPERM ;
2005-04-17 02:20:36 +04:00
2012-06-26 21:58:53 +04:00
path . mnt = fhp - > fh_export - > ex_path . mnt ;
path . dentry = fhp - > fh_dentry ;
2015-03-18 01:25:59 +03:00
inode = d_inode ( path . dentry ) ;
2005-04-17 02:20:36 +04:00
2012-03-19 06:44:49 +04:00
if ( IS_APPEND ( inode ) & & ( may_flags & NFSD_MAY_WRITE ) )
2005-04-17 02:20:36 +04:00
goto out ;
if ( ! inode - > i_fop )
goto out ;
2012-03-19 06:44:49 +04:00
host_err = nfsd_open_break_lease ( inode , may_flags ) ;
2006-10-20 10:28:58 +04:00
if ( host_err ) /* NOMEM or WOULDBLOCK */
2023-09-11 21:30:27 +03:00
goto out ;
2005-04-17 02:20:36 +04:00
2012-03-19 06:44:49 +04:00
if ( may_flags & NFSD_MAY_WRITE ) {
if ( may_flags & NFSD_MAY_READ )
2006-06-30 12:56:17 +04:00
flags = O_RDWR | O_LARGEFILE ;
else
flags = O_WRONLY | O_LARGEFILE ;
2005-04-17 02:20:36 +04:00
}
2012-03-19 06:44:49 +04:00
2014-09-03 04:14:06 +04:00
file = dentry_open ( & path , flags , current_cred ( ) ) ;
if ( IS_ERR ( file ) ) {
host_err = PTR_ERR ( file ) ;
2023-09-11 21:30:27 +03:00
goto out ;
2014-09-03 04:14:06 +04:00
}
2024-02-15 13:31:00 +03:00
host_err = security_file_post_open ( file , may_flags ) ;
2014-09-03 04:14:06 +04:00
if ( host_err ) {
2015-04-28 16:41:16 +03:00
fput ( file ) ;
2023-09-11 21:30:27 +03:00
goto out ;
2012-03-19 06:44:50 +04:00
}
2014-09-03 04:14:06 +04:00
if ( may_flags & NFSD_MAY_64BIT_COOKIE )
file - > f_mode | = FMODE_64BITHASH ;
else
file - > f_mode | = FMODE_32BITHASH ;
* filp = file ;
2005-04-17 02:20:36 +04:00
out :
2023-09-11 21:30:27 +03:00
return host_err ;
2019-08-18 21:18:48 +03:00
}
__be32
nfsd_open ( struct svc_rqst * rqstp , struct svc_fh * fhp , umode_t type ,
int may_flags , struct file * * filp )
{
__be32 err ;
2023-09-11 21:30:27 +03:00
int host_err ;
2021-12-19 04:37:56 +03:00
bool retried = false ;
2019-08-18 21:18:48 +03:00
/*
* If we get here , then the client has already done an " open " ,
* and ( hopefully ) checked permission - so allow OWNER_OVERRIDE
* in case a chmod has now revoked permission .
*
* Arguably we should also allow the owner override for
* directories , but we never have and it doesn ' t seem to have
* caused anyone a problem . If we were to change this , note
* also that our filldir callbacks would need a variant of
* lookup_one_len that doesn ' t check permissions .
*/
if ( type = = S_IFREG )
may_flags | = NFSD_MAY_OWNER_OVERRIDE ;
2021-12-19 04:37:56 +03:00
retry :
2019-08-18 21:18:48 +03:00
err = fh_verify ( rqstp , fhp , type , may_flags ) ;
2021-12-19 04:37:56 +03:00
if ( ! err ) {
2023-09-11 21:30:27 +03:00
host_err = __nfsd_open ( rqstp , fhp , type , may_flags , filp ) ;
if ( host_err = = - EOPENSTALE & & ! retried ) {
2021-12-19 04:37:56 +03:00
retried = true ;
fh_put ( fhp ) ;
goto retry ;
}
2023-09-11 21:30:27 +03:00
err = nfserrno ( host_err ) ;
2021-12-19 04:37:56 +03:00
}
2019-08-18 21:18:48 +03:00
return err ;
}
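The retry-once handling in nfsd_open() above can be sketched in userspace C. This is a minimal illustration under stated assumptions, not the kernel code: open_with_one_retry(), stale_n_times() and SKETCH_EOPENSTALE are hypothetical names introduced here, and the real function additionally drops the filehandle with fh_put() and re-runs fh_verify() before the second attempt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel-internal EOPENSTALE error code. */
#define SKETCH_EOPENSTALE 518

typedef int (*open_fn)(void *ctx);

/*
 * Call attempt() and, if it reports a stale open, call it exactly one
 * more time. Any other result (success or failure) is returned as-is.
 */
static int open_with_one_retry(open_fn attempt, void *ctx)
{
	bool retried = false;
	int err;

	assert(attempt != NULL);
retry:
	err = attempt(ctx);
	if (err == -SKETCH_EOPENSTALE && !retried) {
		retried = true;
		goto retry;
	}
	return err;
}

/* Test double: reports "stale" while the counter behind ctx is positive. */
static int stale_n_times(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return -SKETCH_EOPENSTALE;
	}
	return 0;
}
```

With a counter of 1 the second attempt succeeds; with a counter of 2 the stale error is passed back to the caller, because only one retry is made.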
2022-03-27 23:46:47 +03:00
/**
* nfsd_open_verified - Open a regular file for the filecache
* @ rqstp : RPC request
* @ fhp : NFS filehandle of the file to open
* @ may_flags : internal permission flags
* @ filp : OUT : open " struct file * "
*
2023-09-11 21:30:27 +03:00
* Returns zero on success , or a negative errno value .
2022-03-27 23:46:47 +03:00
*/
2023-09-11 21:30:27 +03:00
int
2022-03-27 23:46:47 +03:00
nfsd_open_verified ( struct svc_rqst * rqstp , struct svc_fh * fhp , int may_flags ,
struct file * * filp )
2019-08-18 21:18:48 +03:00
{
2023-12-15 23:40:57 +03:00
return __nfsd_open ( rqstp , fhp , S_IFREG , may_flags , filp ) ;
2005-04-17 02:20:36 +04:00
}
/*
2007-06-12 23:22:14 +04:00
* Grab and keep cached pages associated with a file in the svc_rqst
2023-06-24 01:55:12 +03:00
* so that they can be passed to the network sendmsg routines
2007-06-12 23:22:14 +04:00
* directly . They will be released after the sending has completed .
2023-03-18 00:09:20 +03:00
*
* Return values : Number of bytes consumed , or - EIO if there are no
* remaining pages in rqstp - > rq_pages .
2005-04-17 02:20:36 +04:00
*/
static int
2007-06-12 23:22:14 +04:00
nfsd_splice_actor ( struct pipe_inode_info * pipe , struct pipe_buffer * buf ,
struct splice_desc * sd )
2005-04-17 02:20:36 +04:00
{
2007-06-12 23:22:14 +04:00
struct svc_rqst * rqstp = sd - > u . data ;
2022-09-11 00:14:02 +03:00
struct page * page = buf - > page ; // may be a compound one
unsigned offset = buf - > offset ;
2022-11-23 22:14:32 +03:00
struct page * last_page ;
2022-09-11 00:14:02 +03:00
2022-11-23 22:14:32 +03:00
last_page = page + ( offset + sd - > len - 1 ) / PAGE_SIZE ;
2023-03-17 20:13:08 +03:00
for ( page + = offset / PAGE_SIZE ; page < = last_page ; page + + ) {
/*
2023-07-27 19:21:17 +03:00
* Skip page replacement when extending the contents of the
* current page . But note that we may get two zero_pages in a
* row from shmem .
2023-03-17 20:13:08 +03:00
*/
2023-07-27 19:21:17 +03:00
if ( page = = * ( rqstp - > rq_next_page - 1 ) & &
offset_in_page ( rqstp - > rq_res . page_base +
rqstp - > rq_res . page_len ) )
2023-03-17 20:13:08 +03:00
continue ;
2023-03-18 00:09:20 +03:00
if ( unlikely ( ! svc_rqst_replace_page ( rqstp , page ) ) )
return - EIO ;
2023-03-17 20:13:08 +03:00
}
2022-09-11 00:14:02 +03:00
if ( rqstp - > rq_res . page_len = = 0 ) // first call
rqstp - > rq_res . page_base = offset % PAGE_SIZE ;
2021-06-28 23:34:20 +03:00
rqstp - > rq_res . page_len + = sd - > len ;
return sd - > len ;
2005-04-17 02:20:36 +04:00
}
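The page-index arithmetic in nfsd_splice_actor() above (first page from offset / PAGE_SIZE, last page from (offset + len - 1) / PAGE_SIZE) can be checked in isolation. The sketch below is a userspace approximation: spanned_pages() and SKETCH_PAGE_SIZE are hypothetical names, and a 4096-byte page is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size for this sketch; the kernel uses PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Compute the indices (relative to the start of a possibly compound
 * page) of the first and last pages touched by a buffer that starts
 * at byte 'offset' and is 'len' bytes long.
 */
static void spanned_pages(size_t offset, size_t len,
			  size_t *first, size_t *last)
{
	assert(len > 0);
	assert(first != NULL && last != NULL);

	*first = offset / SKETCH_PAGE_SIZE;
	*last = (offset + len - 1) / SKETCH_PAGE_SIZE;
}
```

Subtracting one before dividing keeps a buffer that ends exactly on a page boundary from being counted as touching the following page.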
2007-06-12 23:22:14 +04:00
static int nfsd_direct_splice_actor ( struct pipe_inode_info * pipe ,
struct splice_desc * sd )
{
return __splice_from_pipe ( pipe , sd , nfsd_splice_actor ) ;
}
2019-08-26 20:03:11 +03:00
static u32 nfsd_eof_on_read(struct file *file, loff_t offset, ssize_t len,
			    size_t expected)
{
	if (expected != 0 && len == 0)
		return 1;
	if (offset + len >= i_size_read(file_inode(file)))
		return 1;
	return 0;
}
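The two EOF conditions in nfsd_eof_on_read() can be exercised outside the kernel. In this sketch eof_on_read() is a hypothetical userspace stand-in, and the file_size parameter replaces the kernel's i_size_read(file_inode(file)) call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Report end-of-file when either:
 *  - data was requested (expected != 0) but nothing was read, or
 *  - the read ended at or beyond the current file size.
 */
static bool eof_on_read(int64_t file_size, int64_t offset, int64_t len,
			size_t expected)
{
	assert(offset >= 0 && len >= 0);

	if (expected != 0 && len == 0)
		return true;
	if (offset + len >= file_size)
		return true;
	return false;
}
```

A zero-length request (expected == 0) that reads nothing is only treated as EOF when the offset is already at or past the file size.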
2018-03-28 20:29:11 +03:00
static __be32 nfsd_finish_read ( struct svc_rqst * rqstp , struct svc_fh * fhp ,
struct file * file , loff_t offset ,
2019-08-26 20:03:11 +03:00
unsigned long * count , u32 * eof , ssize_t host_err )
2005-04-17 02:20:36 +04:00
{
2006-10-20 10:28:58 +04:00
if ( host_err > = 0 ) {
2024-01-26 18:39:47 +03:00
struct nfsd_net * nn = net_generic ( SVC_NET ( rqstp ) , nfsd_net_id ) ;
nfsd_stats_io_read_add ( nn , fhp - > fh_export , host_err ) ;
2019-08-26 20:03:11 +03:00
* eof = nfsd_eof_on_read ( file , offset , host_err , * count ) ;
2006-10-20 10:28:58 +04:00
* count = host_err ;
2009-12-18 05:24:21 +03:00
fsnotify_access ( file ) ;
2018-03-28 20:29:11 +03:00
trace_nfsd_read_io_done ( rqstp , fhp , offset , * count ) ;
2014-03-19 01:01:51 +04:00
return 0 ;
2018-03-28 20:29:11 +03:00
} else {
trace_nfsd_read_err ( rqstp , fhp , offset , host_err ) ;
2014-03-19 01:01:51 +04:00
return nfserrno ( host_err ) ;
2018-03-28 20:29:11 +03:00
}
2014-03-19 01:01:51 +04:00
}
2023-05-18 20:46:03 +03:00
/**
* nfsd_splice_read - Perform a VFS read using a splice pipe
* @ rqstp : RPC transaction context
* @ fhp : file handle of file to be read
* @ file : opened struct file of file to be read
* @ offset : starting byte offset
* @ count : IN : requested number of bytes ; OUT : number of bytes read
* @ eof : OUT : set non - zero if operation reached the end of the file
*
* Returns nfs_ok on success , otherwise an nfserr stat value is
* returned .
*/
2018-03-28 20:29:11 +03:00
__be32 nfsd_splice_read ( struct svc_rqst * rqstp , struct svc_fh * fhp ,
2019-08-26 20:03:11 +03:00
struct file * file , loff_t offset , unsigned long * count ,
u32 * eof )
2014-03-19 01:01:51 +04:00
{
struct splice_desc sd = {
. len = 0 ,
. total_len = * count ,
. pos = offset ,
. u . data = rqstp ,
} ;
2019-08-26 20:03:11 +03:00
ssize_t host_err ;
2014-03-19 01:01:51 +04:00
2018-03-28 20:29:11 +03:00
trace_nfsd_read_splice ( rqstp , fhp , offset , * count ) ;
2023-11-22 15:27:02 +03:00
host_err = rw_verify_area ( READ , file , & offset , * count ) ;
if ( ! host_err )
host_err = splice_direct_to_actor ( file , & sd ,
nfsd_direct_splice_actor ) ;
2019-08-26 20:03:11 +03:00
return nfsd_finish_read ( rqstp , fhp , file , offset , count , eof , host_err ) ;
2014-03-19 01:01:51 +04:00
}
2023-05-18 20:45:56 +03:00
/**
 * nfsd_iter_read - Perform a VFS read using an iterator
 * @rqstp: RPC transaction context
 * @fhp: file handle of file to be read
 * @file: opened struct file of file to be read
 * @offset: starting byte offset
 * @count: IN: requested number of bytes; OUT: number of bytes read
 * @base: offset in first page of read buffer
 * @eof: OUT: set non-zero if operation reached the end of the file
 *
 * Some filesystems or situations cannot use nfsd_splice_read. This
 * function is the slightly less-performant fallback for those cases.
 *
 * Returns nfs_ok on success, otherwise an nfserr stat value is
 * returned.
 */
__be32 nfsd_iter_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
		      struct file *file, loff_t offset, unsigned long *count,
		      unsigned int base, u32 *eof)
{
	unsigned long v, total;
	struct iov_iter iter;
	loff_t ppos = offset;
	struct page *page;
	ssize_t host_err;
	v = 0;
	total = *count;
	while (total) {
		page = *(rqstp->rq_next_page++);
		rqstp->rq_vec[v].iov_base = page_address(page) + base;
		rqstp->rq_vec[v].iov_len = min_t(size_t, total, PAGE_SIZE - base);
		total -= rqstp->rq_vec[v].iov_len;
		++v;
		base = 0;
	}
	WARN_ON_ONCE(v > ARRAY_SIZE(rqstp->rq_vec));
	trace_nfsd_read_vector(rqstp, fhp, offset, *count);
	iov_iter_kvec(&iter, ITER_DEST, rqstp->rq_vec, v, *count);
	host_err = vfs_iter_read(file, &iter, &ppos, 0);
	return nfsd_finish_read(rqstp, fhp, file, offset, count, eof, host_err);
}
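The loop in nfsd_iter_read() that spreads the requested byte count over page-sized segments can be sketched in userspace as follows. split_into_pages() and SKETCH_PAGE_SIZE are hypothetical names, a 4096-byte page is assumed, and only the segment lengths are recorded (the kernel code also stores each page's address in rq_vec). Unlike the kernel's WARN_ON_ONCE() after the loop, this sketch checks the bound before each store.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size for this sketch; the kernel uses PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Split 'total' bytes into per-page segment lengths. Only the first
 * segment starts at 'base' within its page; later ones start at 0.
 * Returns the number of segments written to lens[].
 */
static size_t split_into_pages(size_t total, size_t base,
			       size_t *lens, size_t max_segs)
{
	size_t v = 0;

	assert(base < SKETCH_PAGE_SIZE);
	while (total) {
		size_t room = SKETCH_PAGE_SIZE - base;
		size_t len = total < room ? total : room;

		assert(v < max_segs);
		lens[v++] = len;
		total -= len;
		base = 0;
	}
	return v;
}
```

For example, 10000 bytes starting 100 bytes into the first page need three segments: the remainder of the first page, one full page, and a final partial page.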
2009-06-16 03:03:53 +04:00
/*
* Gathered writes : If another process is currently writing to the file ,
* there ' s a high chance this is another nfsd ( triggered by a bulk write
* from a client ' s biod ) . Rather than syncing the file with each write
* request , we sleep for 10 msec .
*
* I don ' t know if this roughly approximates C . Juszak ' s idea of
* gathered writes , but it ' s a nice and simple solution ( IMHO ) , and it
* seems to work : - )
*
* Note : we do this only in the NFSv2 case , since v3 and higher have a
* better tool ( separate unstable writes and commits ) for solving this
* problem .
*/
static int wait_for_concurrent_writes ( struct file * file )
{
2013-01-24 02:07:38 +04:00
struct inode * inode = file_inode ( file ) ;
2009-06-16 03:03:53 +04:00
static ino_t last_ino ;
static dev_t last_dev ;
int err = 0 ;
if ( atomic_read ( & inode - > i_writecount ) > 1
| | ( last_ino = = inode - > i_ino & & last_dev = = inode - > i_sb - > s_dev ) ) {
dprintk ( " nfsd: write defer %d \n " , task_pid_nr ( current ) ) ;
msleep ( 10 ) ;
dprintk ( " nfsd: write resume %d \n " , task_pid_nr ( current ) ) ;
}
if ( inode - > i_state & I_DIRTY ) {
dprintk ( " nfsd: write sync %d \n " , task_pid_nr ( current ) ) ;
2010-03-22 19:32:25 +03:00
err = vfs_fsync ( file , 0 ) ;
2009-06-16 03:03:53 +04:00
}
last_ino = inode - > i_ino ;
last_dev = inode - > i_sb - > s_dev ;
return err ;
}
2015-06-18 17:45:00 +03:00
__be32
2020-01-06 21:40:29 +03:00
nfsd_vfs_write ( struct svc_rqst * rqstp , struct svc_fh * fhp , struct nfsd_file * nf ,
2005-04-17 02:20:36 +04:00
loff_t offset , struct kvec * vec , int vlen ,
2020-01-06 21:40:37 +03:00
unsigned long * cnt , int stable ,
__be32 * verf )
2005-04-17 02:20:36 +04:00
{
2021-12-28 20:41:32 +03:00
struct nfsd_net * nn = net_generic ( SVC_NET ( rqstp ) , nfsd_net_id ) ;
2020-01-06 21:40:29 +03:00
struct file * file = nf - > nf_file ;
2020-12-01 01:03:19 +03:00
struct super_block * sb = file_inode ( file ) - > i_sb ;
2005-04-17 02:20:36 +04:00
struct svc_export * exp ;
2017-05-27 11:16:53 +03:00
struct iov_iter iter ;
2021-12-19 04:38:01 +03:00
errseq_t since ;
2018-03-27 17:53:27 +03:00
__be32 nfserr ;
2006-10-20 10:28:58 +04:00
int host_err ;
2009-06-05 20:35:15 +04:00
int use_wgather ;
2013-03-22 22:18:24 +04:00
loff_t pos = offset ;
2020-12-01 01:03:19 +03:00
unsigned long exp_op_flags = 0 ;
2014-05-12 05:22:47 +04:00
unsigned int pflags = current - > flags ;
2017-07-06 19:58:37 +03:00
rwf_t flags = 0 ;
2020-12-01 01:03:19 +03:00
bool restore_flags = false ;
2014-05-12 05:22:47 +04:00
2018-03-27 17:53:27 +03:00
trace_nfsd_write_opened ( rqstp , fhp , offset , * cnt ) ;
2020-12-01 01:03:19 +03:00
if ( sb - > s_export_op )
exp_op_flags = sb - > s_export_op - > flags ;
if ( test_bit ( RQ_LOCAL , & rqstp - > rq_flags ) & &
! ( exp_op_flags & EXPORT_OP_REMOTE_FS ) ) {
2014-05-12 05:22:47 +04:00
/*
mm/writeback: replace PF_LESS_THROTTLE with PF_LOCAL_THROTTLE
PF_LESS_THROTTLE exists for loop-back nfsd (and a similar need in the
loop block driver and callers of prctl(PR_SET_IO_FLUSHER)), where a
daemon needs to write to one bdi (the final bdi) in order to free up
writes queued to another bdi (the client bdi).
The daemon sets PF_LESS_THROTTLE and gets a larger allowance of dirty
pages, so that it can still dirty pages after other processes have been
throttled. The purpose of this is to avoid deadlocks that happen when
the PF_LESS_THROTTLE process must write for any dirty pages to be freed,
but it is being throttled and cannot write.
This approach was designed when all threads were blocked equally,
independently of which device they were writing to, or how fast it was.
Since that time the writeback algorithm has changed substantially with
different threads getting different allowances based on non-trivial
heuristics. This means the simple "add 25%" heuristic is no longer
reliable.
The important issue is not that the daemon needs a *larger* dirty page
allowance, but that it needs a *private* dirty page allowance, so that
dirty pages for the "client" bdi that it is helping to clear (the bdi
for an NFS filesystem or loop block device etc) do not affect the
throttling of the daemon writing to the "final" bdi.
This patch changes the heuristic so that the task is not throttled when
the bdi it is writing to has a dirty page count below (or equal
to) the free-run threshold for that bdi. This ensures it will always be
able to have some pages in flight, and so will not deadlock.
In a steady-state, it is expected that PF_LOCAL_THROTTLE tasks might
still be throttled by global threshold, but that is acceptable as it is
only the deadlock state that is interesting for this flag.
This approach of "only throttle when target bdi is busy" is consistent
with the other use of PF_LESS_THROTTLE in current_may_throttle(), where
it causes attention to be focussed only on the target bdi.
So this patch
- renames PF_LESS_THROTTLE to PF_LOCAL_THROTTLE,
- removes the 25% bonus that that flag gives, and
- If PF_LOCAL_THROTTLE is set, don't delay at all unless the
global and the local free-run thresholds are exceeded.
Note that previously realtime threads were treated the same as
PF_LESS_THROTTLE threads. This patch does *not* change the behaviour
for real-time threads, so it is now different from the behaviour of nfsd
and loop tasks. I don't know what is wanted for realtime.
[akpm@linux-foundation.org: coding style fixes]
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Chuck Lever <chuck.lever@oracle.com> [nfsd]
Cc: Christoph Hellwig <hch@lst.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Link: http://lkml.kernel.org/r/87ftbf7gs3.fsf@notabene.neil.brown.name
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02 07:48:18 +03:00
* We want throttling in balance_dirty_pages ( )
* and shrink_inactive_list ( ) to only consider
* the backingdev we are writing to , so that nfs to
2014-05-12 05:22:47 +04:00
* localhost doesn ' t cause nfsd to lock up due to all
* the client ' s dirty pages or its congested queue .
*/
2020-06-02 07:48:18 +03:00
current - > flags | = PF_LOCAL_THROTTLE ;
2020-12-01 01:03:19 +03:00
restore_flags = true ;
}
2005-04-17 02:20:36 +04:00
2016-12-31 16:00:21 +03:00
exp = fhp - > fh_export ;
2009-06-05 20:35:15 +04:00
use_wgather = ( rqstp - > rq_vers = = 2 ) & & EX_WGATHER ( exp ) ;
2005-04-17 02:20:36 +04:00
if ( ! EX_ISSYNC ( exp ) )
2016-12-31 15:59:53 +03:00
stable = NFS_UNSTABLE ;
2005-04-17 02:20:36 +04:00
2016-04-07 18:52:04 +03:00
if ( stable & & ! use_wgather )
flags | = RWF_SYNC ;
2022-09-16 03:25:47 +03:00
iov_iter_kvec ( & iter , ITER_SOURCE , vec , vlen , * cnt ) ;
2021-12-19 04:38:01 +03:00
since = READ_ONCE ( file - > f_wb_err ) ;
2021-12-28 22:19:41 +03:00
if ( verf )
2021-12-30 18:22:05 +03:00
nfsd_copy_write_verifier ( verf , nn ) ;
2021-12-28 22:19:41 +03:00
host_err = vfs_iter_write ( file , & iter , & pos , flags ) ;
2020-01-06 21:40:31 +03:00
if ( host_err < 0 ) {
2023-09-11 21:43:57 +03:00
commit_reset_write_verifier ( nn , rqstp , host_err ) ;
2009-06-16 06:07:13 +04:00
goto out_nfserr ;
2020-01-06 21:40:31 +03:00
}
2019-12-17 20:33:33 +03:00
* cnt = host_err ;
2024-01-26 18:39:47 +03:00
nfsd_stats_io_write_add ( nn , exp , * cnt ) ;
2009-12-18 05:24:21 +03:00
fsnotify_modify ( file ) ;
2021-12-19 04:38:01 +03:00
host_err = filemap_check_wb_err ( file - > f_mapping , since ) ;
if ( host_err < 0 )
goto out_nfserr ;
2005-04-17 02:20:36 +04:00
2019-09-02 20:02:58 +03:00
if ( stable & & use_wgather ) {
2016-04-07 18:52:04 +03:00
host_err = wait_for_concurrent_writes ( file ) ;
2023-09-11 21:43:57 +03:00
if ( host_err < 0 )
commit_reset_write_verifier ( nn , rqstp , host_err ) ;
2019-09-02 20:02:58 +03:00
}
2005-04-17 02:20:36 +04:00
2009-06-16 06:07:13 +04:00
out_nfserr :
2018-03-27 17:53:27 +03:00
if ( host_err > = 0 ) {
trace_nfsd_write_io_done ( rqstp , fhp , offset , * cnt ) ;
nfserr = nfs_ok ;
} else {
trace_nfsd_write_err ( rqstp , fhp , offset , host_err ) ;
nfserr = nfserrno ( host_err ) ;
}
2020-12-01 01:03:19 +03:00
if ( restore_flags )
2020-06-02 07:48:18 +03:00
current_restore_flags ( pflags , PF_LOCAL_THROTTLE ) ;
2018-03-27 17:53:27 +03:00
return nfserr ;
2005-04-17 02:20:36 +04:00
}
2023-11-18 01:14:33 +03:00
/**
 * nfsd_read_splice_ok - check if spliced reading is supported
 * @rqstp: RPC transaction context
 *
 * Return values:
 *   %true: nfsd_splice_read() may be used
 *   %false: nfsd_splice_read() must not be used
 *
 * NFS READ normally uses splice to send data in-place. However the
 * data in cache can change after the reply's MIC is computed but
 * before the RPC reply is sent. To prevent the client from
 * rejecting the server-computed MIC in this somewhat rare case, do
 * not use splice with the GSS integrity and privacy services.
 */
bool nfsd_read_splice_ok(struct svc_rqst *rqstp)
{
	switch (svc_auth_flavor(rqstp)) {
	case RPC_AUTH_GSS_KRB5I:
	case RPC_AUTH_GSS_KRB5P:
		return false;
	}
	return true;
}
2023-05-18 20:45:56 +03:00
/**
* nfsd_read - Read data from a file
* @ rqstp : RPC transaction context
* @ fhp : file handle of file to be read
* @ offset : starting byte offset
* @ count : IN : requested number of bytes ; OUT : number of bytes read
* @ eof : OUT : set non - zero if operation reached the end of the file
*
* The caller must verify that there is enough space in @ rqstp . rq_res
* to perform this operation .
*
2014-03-19 01:01:51 +04:00
* N . B . After this call fhp needs an fh_put
2023-05-18 20:45:56 +03:00
*
* Returns nfs_ok on success , otherwise an nfserr stat value is
* returned .
2014-03-19 01:01:51 +04:00
*/
__be32 nfsd_read ( struct svc_rqst * rqstp , struct svc_fh * fhp ,
2023-05-18 20:45:56 +03:00
loff_t offset , unsigned long * count , u32 * eof )
2014-03-19 01:01:51 +04:00
{
2019-08-18 21:18:50 +03:00
struct nfsd_file * nf ;
2014-03-19 01:01:51 +04:00
struct file * file ;
__be32 err ;
2018-03-27 17:53:11 +03:00
trace_nfsd_read_start ( rqstp , fhp , offset , * count ) ;
2022-10-28 17:46:51 +03:00
err = nfsd_file_acquire_gc ( rqstp , fhp , NFSD_MAY_READ , & nf ) ;
2014-03-19 01:01:51 +04:00
if ( err )
return err ;
2019-08-18 21:18:50 +03:00
file = nf - > nf_file ;
2023-11-18 01:14:33 +03:00
if ( file - > f_op - > splice_read & & nfsd_read_splice_ok ( rqstp ) )
2019-08-26 20:03:11 +03:00
err = nfsd_splice_read ( rqstp , fhp , file , offset , count , eof ) ;
2017-05-27 11:16:54 +03:00
else
2023-05-18 20:45:56 +03:00
err = nfsd_iter_read ( rqstp , fhp , file , offset , count , 0 , eof ) ;
2015-11-17 14:52:23 +03:00
2019-08-18 21:18:50 +03:00
nfsd_file_put ( nf ) ;
2018-03-27 17:53:11 +03:00
trace_nfsd_read_done ( rqstp , fhp , offset , * count ) ;
2010-07-28 00:48:54 +04:00
return err ;
}
2005-04-17 02:20:36 +04:00
/*
* Write data to a file .
* The stable flag requests synchronous writes .
* N . B . After this call fhp needs an fh_put
*/
2006-10-20 10:28:58 +04:00
__be32
2016-12-31 16:00:13 +03:00
nfsd_write ( struct svc_rqst * rqstp , struct svc_fh * fhp , loff_t offset ,
2020-01-06 21:40:37 +03:00
struct kvec * vec , int vlen , unsigned long * cnt , int stable ,
__be32 * verf )
2005-04-17 02:20:36 +04:00
{
2019-08-18 21:18:49 +03:00
struct nfsd_file * nf ;
__be32 err ;
2005-04-17 02:20:36 +04:00
2018-03-27 17:53:11 +03:00
trace_nfsd_write_start ( rqstp , fhp , offset , * cnt ) ;
2015-11-17 14:52:23 +03:00
2022-10-28 17:46:51 +03:00
err = nfsd_file_acquire_gc ( rqstp , fhp , NFSD_MAY_WRITE , & nf ) ;
2016-12-31 16:00:13 +03:00
if ( err )
goto out ;
2020-01-06 21:40:29 +03:00
err = nfsd_vfs_write ( rqstp , fhp , nf , offset , vec ,
2020-01-06 21:40:37 +03:00
vlen , cnt , stable , verf ) ;
2019-08-18 21:18:49 +03:00
nfsd_file_put ( nf ) ;
2005-04-17 02:20:36 +04:00
out :
2018-03-27 17:53:11 +03:00
trace_nfsd_write_done ( rqstp , fhp , offset , * cnt ) ;
2005-04-17 02:20:36 +04:00
return err ;
}
NFSD: COMMIT operations must not return NFS?ERR_INVAL
Since, well, forever, the Linux NFS server's nfsd_commit() function
has returned nfserr_inval when the passed-in byte range arguments
were nonsensical.
However, according to RFC 1813 section 3.3.21, NFSv3 COMMIT requests
are permitted to return only the following non-zero status codes:
NFS3ERR_IO
NFS3ERR_STALE
NFS3ERR_BADHANDLE
NFS3ERR_SERVERFAULT
NFS3ERR_INVAL is not included in that list. Likewise, NFS4ERR_INVAL
is not listed in the COMMIT row of Table 6 in RFC 8881.
RFC 7530 does permit COMMIT to return NFS4ERR_INVAL, but does not
specify when it can or should be used.
Instead of dropping or failing a COMMIT request in a byte range that
is not supported, turn it into a valid request by treating one or
both arguments as zero. Offset zero means start-of-file, count zero
means until-end-of-file, so we only ever extend the commit range.
NFS servers are always allowed to commit more and sooner than
requested.
The range check is no longer bounded by NFS_OFFSET_MAX, but rather
by the value that is returned in the maxfilesize field of the NFSv3
FSINFO procedure or the NFSv4 maxfilesize file attribute.
Note that this change results in a new pynfs failure:
CMT4 st_commit.testCommitOverflow : RUNNING
CMT4 st_commit.testCommitOverflow : FAILURE
COMMIT with offset + count overflow should return
NFS4ERR_INVAL, instead got NFS4_OK
IMO the test is not correct as written: RFC 8881 does not allow the
COMMIT operation to return NFS4ERR_INVAL.
Reported-by: Dan Aloni <dan.aloni@vastdata.com>
Cc: stable@vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Bruce Fields <bfields@fieldses.org>
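The clamping described in the commit message above can be sketched in user space roughly as follows. This is a minimal illustration, not the kernel code: `struct commit_range` and `clamp_commit_range()` are hypothetical names, and `maxbytes` stands in for the filesystem's maximum file size (assumed here to be no larger than LLONG_MAX).

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical stand-in for the (start, end) pair passed to the sync call. */
struct commit_range {
	long long start;
	long long end;
};

/*
 * Map a client-supplied (offset, count) pair onto a sync range.
 * An unsupported offset falls back to start-of-file, and a zero or
 * overlong count falls back to end-of-file, so the range only grows.
 * Assumes maxbytes <= LLONG_MAX.
 */
static struct commit_range
clamp_commit_range(uint64_t offset, uint32_t count, uint64_t maxbytes)
{
	struct commit_range r = { .start = 0, .end = LLONG_MAX };

	if (offset < maxbytes) {
		r.start = (long long)offset;
		if (count && (offset + count - 1 < maxbytes))
			r.end = (long long)(offset + count - 1);
	}
	/* The resulting range is never empty or inverted. */
	assert(r.start <= r.end);
	return r;
}
```

Because every fallback widens the range, no input needs to be rejected with an error status, which is the point of the change.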
2022-01-24 23:50:31 +03:00
/**
* nfsd_commit - Commit pending writes to stable storage
* @ rqstp : RPC request being processed
* @ fhp : NFS filehandle
2022-10-28 17:46:38 +03:00
* @ nf : target file
2022-01-24 23:50:31 +03:00
* @ offset : raw offset from beginning of file
* @ count : raw count of bytes to sync
* @ verf : filled in with the server ' s current write verifier
2010-01-30 00:44:25 +03:00
*
2022-01-24 23:50:31 +03:00
* Note : we guarantee that data that lies within the range specified
* by the ' offset ' and ' count ' parameters will be synced . The server
* is permitted to sync data that lies outside this range at the
* same time .
2005-04-17 02:20:36 +04:00
*
* Unfortunately we cannot lock the file to make sure we return full WCC
* data to the client , as locking happens lower down in the filesystem .
2022-01-24 23:50:31 +03:00
*
* Return values :
* An nfsstat value in network byte order .
2005-04-17 02:20:36 +04:00
*/
2006-10-20 10:28:58 +04:00
__be32
2022-10-28 17:46:38 +03:00
nfsd_commit ( struct svc_rqst * rqstp , struct svc_fh * fhp , struct nfsd_file * nf ,
u64 offset , u32 count , __be32 * verf )
2005-04-17 02:20:36 +04:00
{
2022-10-28 17:46:38 +03:00
__be32 err = nfs_ok ;
2022-01-24 23:50:31 +03:00
u64 maxbytes ;
loff_t start , end ;
2021-12-28 22:26:03 +03:00
struct nfsd_net * nn ;
2022-01-24 23:50:31 +03:00
/*
* Convert the client - provided ( offset , count ) range to a
* ( start , end ) range . If the client - provided range falls
* outside the maximum file size of the underlying FS ,
* clamp the sync range appropriately .
*/
start = 0 ;
end = LLONG_MAX ;
maxbytes = ( u64 ) fhp - > fh_dentry - > d_sb - > s_maxbytes ;
if ( offset < maxbytes ) {
start = offset ;
if ( count & & ( offset + count - 1 < maxbytes ) )
end = offset + count - 1 ;
}
2021-12-28 22:26:03 +03:00
nn = net_generic ( nf - > nf_net , nfsd_net_id ) ;
2005-04-17 02:20:36 +04:00
if ( EX_ISSYNC ( fhp - > fh_export ) ) {
2021-12-19 04:38:01 +03:00
errseq_t since = READ_ONCE ( nf - > nf_file - > f_wb_err ) ;
int err2 ;
2010-01-30 00:44:25 +03:00
2022-01-24 23:50:31 +03:00
err2 = vfs_fsync_range ( nf - > nf_file , start , end , 0 ) ;
2019-09-02 20:02:58 +03:00
switch ( err2 ) {
case 0 :
2021-12-30 18:22:05 +03:00
nfsd_copy_write_verifier ( verf , nn ) ;
2021-12-19 04:38:01 +03:00
err2 = filemap_check_wb_err ( nf - > nf_file - > f_mapping ,
since ) ;
2022-06-25 23:52:43 +03:00
err = nfserrno ( err2 ) ;
2019-09-02 20:02:58 +03:00
break ;
case - EINVAL :
2005-04-17 02:20:36 +04:00
err = nfserr_notsupp ;
2019-09-02 20:02:58 +03:00
break ;
default :
2023-09-11 21:43:57 +03:00
commit_reset_write_verifier ( nn , rqstp , err2 ) ;
2022-06-25 23:52:43 +03:00
err = nfserrno ( err2 ) ;
2019-09-02 20:02:58 +03:00
}
2020-01-06 21:40:36 +03:00
} else
2021-12-30 18:22:05 +03:00
nfsd_copy_write_verifier ( verf , nn ) ;
2005-04-17 02:20:36 +04:00
return err ;
}
2022-03-28 23:10:17 +03:00
/**
* nfsd_create_setattr - Set a created file ' s attributes
* @ rqstp : RPC transaction being executed
* @ fhp : NFS filehandle of parent directory
* @ resfhp : NFS filehandle of new object
2022-07-26 09:45:30 +03:00
* @ attrs : requested attributes of new object
2022-03-28 23:10:17 +03:00
*
* Returns nfs_ok on success , or an nfsstat in network byte order .
*/
__be32
nfsd_create_setattr ( struct svc_rqst * rqstp , struct svc_fh * fhp ,
2022-07-26 09:45:30 +03:00
struct svc_fh * resfhp , struct nfsd_attrs * attrs )
2007-12-01 00:55:23 +03:00
{
2022-07-26 09:45:30 +03:00
struct iattr * iap = attrs - > na_iattr ;
2022-03-28 23:10:17 +03:00
__be32 status ;
2007-12-01 00:55:23 +03:00
/*
2022-03-28 23:10:17 +03:00
* Mode has already been set by file creation .
2007-12-01 00:55:23 +03:00
*/
iap - > ia_valid & = ~ ATTR_MODE ;
2022-03-28 23:10:17 +03:00
2007-12-01 00:55:23 +03:00
/*
* Setting uid / gid works only for root . Irix appears to
* send along the gid on create when it tries to implement
* setgid directories via NFS :
*/
2013-02-02 18:53:11 +04:00
if ( ! uid_eq ( current_fsuid ( ) , GLOBAL_ROOT_UID ) )
2007-12-01 00:55:23 +03:00
iap - > ia_valid & = ~ ( ATTR_UID | ATTR_GID ) ;
2022-03-28 23:10:17 +03:00
/*
* Callers expect new file metadata to be committed even
* if the attributes have not changed .
*/
2007-12-01 00:55:23 +03:00
if ( iap - > ia_valid )
2024-02-16 04:24:51 +03:00
status = nfsd_setattr ( rqstp , resfhp , attrs , NULL ) ;
2022-03-28 23:10:17 +03:00
else
status = nfserrno ( commit_metadata ( resfhp ) ) ;
/*
* Transactional filesystems had a chance to commit changes
* for both parent and child simultaneously making the
* following commit_metadata a noop in many cases .
*/
if ( ! status )
status = nfserrno ( commit_metadata ( fhp ) ) ;
/*
* Update the new filehandle to pick up the new attributes .
*/
if ( ! status )
status = fh_update ( resfhp ) ;
return status ;
2007-12-01 00:55:23 +03:00
}
2009-02-10 06:27:51 +03:00
/*
* HPUX clients sometimes create a file with mode 000 and set its size to 0.
* On some filesystems, setting the size to 0 fails the permission check,
* which requires WRITE permission that mode 000 does not grant.
* Ignore the resize (to 0) on a newly created file, since its size is
* already 0 after creation.
*
* Call this only after vfs_create() has been called.
*/
static void
nfsd_check_ignore_resizing ( struct iattr * iap )
{
if ( ( iap - > ia_valid & ATTR_SIZE ) & & ( iap - > ia_size = = 0 ) )
iap - > ia_valid & = ~ ATTR_SIZE ;
}
2016-07-20 23:16:06 +03:00
/* The parent directory should already be locked: */
2006-10-20 10:28:58 +04:00
__be32
2016-07-20 23:16:06 +03:00
nfsd_create_locked ( struct svc_rqst * rqstp , struct svc_fh * fhp ,
2022-09-06 03:42:19 +03:00
struct nfsd_attrs * attrs ,
2022-07-26 09:45:30 +03:00
int type , dev_t rdev , struct svc_fh * resfhp )
2005-04-17 02:20:36 +04:00
{
2016-08-03 22:05:00 +03:00
struct dentry * dentry , * dchild ;
2005-04-17 02:20:36 +04:00
struct inode * dirp ;
2022-07-26 09:45:30 +03:00
struct iattr * iap = attrs - > na_iattr ;
2006-10-20 10:28:58 +04:00
__be32 err ;
int host_err ;
2005-04-17 02:20:36 +04:00
dentry = fhp - > fh_dentry ;
2015-03-18 01:25:59 +03:00
dirp = d_inode ( dentry ) ;
2005-04-17 02:20:36 +04:00
2016-07-20 23:16:06 +03:00
dchild = dget ( resfhp - > fh_dentry ) ;
2016-07-15 06:20:22 +03:00
err = nfsd_permission ( rqstp , fhp - > fh_export , dentry , NFSD_MAY_CREATE ) ;
if ( err )
goto out ;
2005-04-17 02:20:36 +04:00
if ( ! ( iap - > ia_valid & ATTR_MODE ) )
iap - > ia_mode = 0 ;
iap - > ia_mode = ( iap - > ia_mode & S_IALLUGO ) | type ;
2020-06-16 23:43:18 +03:00
if ( ! IS_POSIXACL ( dirp ) )
iap - > ia_mode & = ~ current_umask ( ) ;
2006-11-09 04:44:59 +03:00
err = 0 ;
2005-04-17 02:20:36 +04:00
switch ( type ) {
case S_IFREG :
2023-01-13 14:49:10 +03:00
host_err = vfs_create ( & nop_mnt_idmap , dirp , dchild ,
iap - > ia_mode , true ) ;
2009-02-10 06:27:51 +03:00
if ( ! host_err )
nfsd_check_ignore_resizing ( iap ) ;
2005-04-17 02:20:36 +04:00
break ;
case S_IFDIR :
2023-01-13 14:49:10 +03:00
host_err = vfs_mkdir ( & nop_mnt_idmap , dirp , dchild , iap - > ia_mode ) ;
2018-05-12 00:03:19 +03:00
if ( ! host_err & & unlikely ( d_unhashed ( dchild ) ) ) {
struct dentry * d ;
d = lookup_one_len ( dchild - > d_name . name ,
dchild - > d_parent ,
dchild - > d_name . len ) ;
if ( IS_ERR ( d ) ) {
host_err = PTR_ERR ( d ) ;
break ;
}
if ( unlikely ( d_is_negative ( d ) ) ) {
dput ( d ) ;
err = nfserr_serverfault ;
goto out ;
}
dput ( resfhp - > fh_dentry ) ;
resfhp - > fh_dentry = dget ( d ) ;
err = fh_update ( resfhp ) ;
dput ( dchild ) ;
dchild = d ;
if ( err )
goto out ;
}
2005-04-17 02:20:36 +04:00
break ;
case S_IFCHR :
case S_IFBLK :
case S_IFIFO :
case S_IFSOCK :
2023-01-13 14:49:10 +03:00
host_err = vfs_mknod ( & nop_mnt_idmap , dirp , dchild ,
2021-01-21 16:19:33 +03:00
iap - > ia_mode , rdev ) ;
2005-04-17 02:20:36 +04:00
break ;
2016-07-22 19:03:46 +03:00
default :
printk ( KERN_WARNING " nfsd: bad file type %o in nfsd_create \n " ,
type ) ;
host_err = - EINVAL ;
2005-04-17 02:20:36 +04:00
}
2012-06-12 18:20:33 +04:00
if ( host_err < 0 )
2005-04-17 02:20:36 +04:00
goto out_nfserr ;
2022-07-26 09:45:30 +03:00
err = nfsd_create_setattr ( rqstp , fhp , resfhp , attrs ) ;
2005-04-17 02:20:36 +04:00
out :
2016-08-03 22:05:00 +03:00
dput ( dchild ) ;
2005-04-17 02:20:36 +04:00
return err ;
out_nfserr :
2006-10-20 10:28:58 +04:00
err = nfserrno ( host_err ) ;
2005-04-17 02:20:36 +04:00
goto out ;
}
2016-07-20 23:16:06 +03:00
/*
* Create a filesystem object ( regular , directory , special ) .
* Note that the parent directory is unlocked again before returning .
*
* N . B . Every call to nfsd_create needs an fh_put for _both_ fhp and resfhp
*/
__be32
nfsd_create ( struct svc_rqst * rqstp , struct svc_fh * fhp ,
2022-07-26 09:45:30 +03:00
char * fname , int flen , struct nfsd_attrs * attrs ,
int type , dev_t rdev , struct svc_fh * resfhp )
2016-07-20 23:16:06 +03:00
{
struct dentry * dentry , * dchild = NULL ;
__be32 err ;
int host_err ;
if ( isdotent ( fname , flen ) )
return nfserr_exist ;
2016-07-21 23:00:12 +03:00
err = fh_verify ( rqstp , fhp , S_IFDIR , NFSD_MAY_NOP ) ;
2016-07-20 23:16:06 +03:00
if ( err )
return err ;
dentry = fhp - > fh_dentry ;
host_err = fh_want_write ( fhp ) ;
if ( host_err )
return nfserrno ( host_err ) ;
2022-07-26 09:45:30 +03:00
inode_lock_nested ( dentry - > d_inode , I_MUTEX_PARENT ) ;
2016-07-20 23:16:06 +03:00
dchild = lookup_one_len ( fname , dentry , flen ) ;
host_err = PTR_ERR ( dchild ) ;
2022-07-26 09:45:30 +03:00
if ( IS_ERR ( dchild ) ) {
err = nfserrno ( host_err ) ;
goto out_unlock ;
}
2016-07-20 23:16:06 +03:00
err = fh_compose ( resfhp , fhp - > fh_export , dchild , fhp ) ;
2016-08-10 21:46:27 +03:00
/*
* We unconditionally drop our ref to dchild as fh_compose will have
* already grabbed its own ref for it .
*/
dput ( dchild ) ;
if ( err )
2022-07-26 09:45:30 +03:00
goto out_unlock ;
2023-07-21 17:29:10 +03:00
err = fh_fill_pre_attrs ( fhp ) ;
if ( err ! = nfs_ok )
goto out_unlock ;
2022-09-06 03:42:19 +03:00
err = nfsd_create_locked ( rqstp , fhp , attrs , type , rdev , resfhp ) ;
2022-07-26 09:45:30 +03:00
fh_fill_post_attrs ( fhp ) ;
2022-07-26 09:45:30 +03:00
out_unlock :
2022-07-26 09:45:30 +03:00
inode_unlock ( dentry - > d_inode ) ;
2022-07-26 09:45:30 +03:00
return err ;
2016-07-20 23:16:06 +03:00
}
2005-04-17 02:20:36 +04:00
/*
* Read a symlink . On entry , * lenp must contain the maximum path length that
* fits into the buffer . On return , it contains the true length .
* N . B . After this call fhp needs an fh_put
*/
2006-10-20 10:28:58 +04:00
__be32
2005-04-17 02:20:36 +04:00
nfsd_readlink ( struct svc_rqst * rqstp , struct svc_fh * fhp , char * buf , int * lenp )
{
2006-10-20 10:28:58 +04:00
__be32 err ;
2017-05-27 23:11:23 +03:00
const char * link ;
2012-03-15 16:21:57 +04:00
struct path path ;
2017-05-27 23:11:23 +03:00
DEFINE_DELAYED_CALL ( done ) ;
int len ;
2005-04-17 02:20:36 +04:00
2008-06-16 15:20:29 +04:00
err = fh_verify ( rqstp , fhp , S_IFLNK , NFSD_MAY_NOP ) ;
2017-05-27 23:11:23 +03:00
if ( unlikely ( err ) )
return err ;
2005-04-17 02:20:36 +04:00
2012-03-15 16:21:57 +04:00
path . mnt = fhp - > fh_export - > ex_path . mnt ;
path . dentry = fhp - > fh_dentry ;
2005-04-17 02:20:36 +04:00
2017-05-27 23:11:23 +03:00
if ( unlikely ( ! d_is_symlink ( path . dentry ) ) )
return nfserr_inval ;
2005-04-17 02:20:36 +04:00
2012-03-15 16:21:57 +04:00
touch_atime ( & path ) ;
2005-04-17 02:20:36 +04:00
2017-05-27 23:11:23 +03:00
link = vfs_get_link ( path . dentry , & done ) ;
if ( IS_ERR ( link ) )
return nfserrno ( PTR_ERR ( link ) ) ;
2005-04-17 02:20:36 +04:00
2017-05-27 23:11:23 +03:00
len = strlen ( link ) ;
if ( len < * lenp )
* lenp = len ;
memcpy ( buf , link , * lenp ) ;
do_delayed_call ( & done ) ;
return 0 ;
2005-04-17 02:20:36 +04:00
}
2022-07-26 09:45:30 +03:00
/**
* nfsd_symlink - Create a symlink and look up its inode
* @ rqstp : RPC transaction being executed
* @ fhp : NFS filehandle of parent directory
* @ fname : filename of the new symlink
* @ flen : length of @ fname
* @ path : content of the new symlink ( NUL - terminated )
* @ attrs : requested attributes of new object
* @ resfhp : NFS filehandle of new object
*
2005-04-17 02:20:36 +04:00
* N . B . After this call _both_ fhp and resfhp need an fh_put
2022-07-26 09:45:30 +03:00
*
* Returns nfs_ok on success , or an nfsstat in network byte order .
2005-04-17 02:20:36 +04:00
*/
2006-10-20 10:28:58 +04:00
__be32
2005-04-17 02:20:36 +04:00
nfsd_symlink ( struct svc_rqst * rqstp , struct svc_fh * fhp ,
2022-07-26 09:45:30 +03:00
char * fname , int flen ,
char * path , struct nfsd_attrs * attrs ,
struct svc_fh * resfhp )
2005-04-17 02:20:36 +04:00
{
struct dentry * dentry , * dnew ;
2006-10-20 10:28:58 +04:00
__be32 err , cerr ;
int host_err ;
2005-04-17 02:20:36 +04:00
err = nfserr_noent ;
2014-06-20 19:52:21 +04:00
if ( ! flen | | path [ 0 ] = = ' \0 ' )
2005-04-17 02:20:36 +04:00
goto out ;
err = nfserr_exist ;
if ( isdotent ( fname , flen ) )
goto out ;
2008-06-16 15:20:29 +04:00
err = fh_verify ( rqstp , fhp , S_IFDIR , NFSD_MAY_CREATE ) ;
2005-04-17 02:20:36 +04:00
if ( err )
goto out ;
2012-06-12 18:20:33 +04:00
host_err = fh_want_write ( fhp ) ;
2022-07-26 09:45:30 +03:00
if ( host_err ) {
err = nfserrno ( host_err ) ;
goto out ;
}
2012-06-12 18:20:33 +04:00
2005-04-17 02:20:36 +04:00
dentry = fhp - > fh_dentry ;
2022-07-26 09:45:30 +03:00
inode_lock_nested ( dentry - > d_inode , I_MUTEX_PARENT ) ;
2005-04-17 02:20:36 +04:00
dnew = lookup_one_len ( fname , dentry , flen ) ;
2022-07-26 09:45:30 +03:00
if ( IS_ERR ( dnew ) ) {
err = nfserrno ( PTR_ERR ( dnew ) ) ;
2022-07-26 09:45:30 +03:00
inode_unlock ( dentry - > d_inode ) ;
2022-07-26 09:45:30 +03:00
goto out_drop_write ;
}
2023-07-21 17:29:10 +03:00
err = fh_fill_pre_attrs ( fhp ) ;
if ( err ! = nfs_ok )
goto out_unlock ;
2023-01-13 14:49:10 +03:00
host_err = vfs_symlink ( & nop_mnt_idmap , d_inode ( dentry ) , dnew , path ) ;
2006-10-20 10:28:58 +04:00
err = nfserrno ( host_err ) ;
2022-07-26 09:45:30 +03:00
cerr = fh_compose ( resfhp , fhp - > fh_export , dnew , fhp ) ;
if ( ! err )
nfsd_create_setattr ( rqstp , fhp , resfhp , attrs ) ;
2022-07-26 09:45:30 +03:00
fh_fill_post_attrs ( fhp ) ;
2023-07-21 17:29:10 +03:00
out_unlock :
2022-07-26 09:45:30 +03:00
inode_unlock ( dentry - > d_inode ) ;
2010-02-17 23:05:11 +03:00
if ( ! err )
err = nfserrno ( commit_metadata ( fhp ) ) ;
2005-04-17 02:20:36 +04:00
dput ( dnew ) ;
if ( err = = 0 ) err = cerr ;
2022-07-26 09:45:30 +03:00
out_drop_write :
fh_drop_write ( fhp ) ;
2005-04-17 02:20:36 +04:00
out :
return err ;
}
/*
* Create a hardlink
* N . B . After this call _both_ ffhp and tfhp need an fh_put
*/
2006-10-20 10:28:58 +04:00
__be32
2005-04-17 02:20:36 +04:00
nfsd_link ( struct svc_rqst * rqstp , struct svc_fh * ffhp ,
char * name , int len , struct svc_fh * tfhp )
{
struct dentry * ddir , * dnew , * dold ;
2010-07-20 00:38:24 +04:00
struct inode * dirp ;
2006-10-20 10:28:58 +04:00
__be32 err ;
int host_err ;
2005-04-17 02:20:36 +04:00
2008-06-16 15:20:29 +04:00
err = fh_verify ( rqstp , ffhp , S_IFDIR , NFSD_MAY_CREATE ) ;
2005-04-17 02:20:36 +04:00
if ( err )
goto out ;
2011-08-16 00:59:55 +04:00
err = fh_verify ( rqstp , tfhp , 0 , NFSD_MAY_NOP ) ;
2005-04-17 02:20:36 +04:00
if ( err )
goto out ;
2011-08-16 00:59:55 +04:00
err = nfserr_isdir ;
VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry)
Convert the following where appropriate:
(1) S_ISLNK(dentry->d_inode) to d_is_symlink(dentry).
(2) S_ISREG(dentry->d_inode) to d_is_reg(dentry).
(3) S_ISDIR(dentry->d_inode) to d_is_dir(dentry). This is actually more
complicated than it appears as some calls should be converted to
d_can_lookup() instead. The difference is whether the directory in
question is a real dir with a ->lookup op or whether it's a fake dir with
a ->d_automount op.
In some circumstances, we can subsume checks for dentry->d_inode not being
NULL into this, provided the code isn't in a filesystem that expects
d_inode to be NULL if the dirent really *is* negative (ie. if we're going to
use d_inode() rather than d_backing_inode() to get the inode pointer).
Note that the dentry type field may be set to something other than
DCACHE_MISS_TYPE when d_inode is NULL in the case of unionmount, where the VFS
manages the fall-through from a negative dentry to a lower layer. In such a
case, the dentry type of the negative union dentry is set to the same as the
type of the lower dentry.
However, if you know d_inode is not NULL at the call site, then you can use
the d_is_xxx() functions even in a filesystem.
There is one further complication: a 0,0 chardev dentry may be labelled
DCACHE_WHITEOUT_TYPE rather than DCACHE_SPECIAL_TYPE. Strictly, this was
intended for special directory entry types that don't have attached inodes.
The following perl+coccinelle script was used:
use strict;
my @callers;
open($fd, 'git grep -l \'S_IS[A-Z].*->d_inode\' |') ||
die "Can't grep for S_ISDIR and co. callers";
@callers = <$fd>;
close($fd);
unless (@callers) {
print "No matches\n";
exit(0);
}
my @cocci = (
'@@',
'expression E;',
'@@',
'',
'- S_ISLNK(E->d_inode->i_mode)',
'+ d_is_symlink(E)',
'',
'@@',
'expression E;',
'@@',
'',
'- S_ISDIR(E->d_inode->i_mode)',
'+ d_is_dir(E)',
'',
'@@',
'expression E;',
'@@',
'',
'- S_ISREG(E->d_inode->i_mode)',
'+ d_is_reg(E)' );
my $coccifile = "tmp.sp.cocci";
open($fd, ">$coccifile") || die $coccifile;
print($fd "$_\n") || die $coccifile foreach (@cocci);
close($fd);
foreach my $file (@callers) {
chomp $file;
print "Processing ", $file, "\n";
system("spatch", "--sp-file", $coccifile, $file, "--in-place", "--no-show-diff") == 0 ||
die "spatch failed";
}
[AV: overlayfs parts skipped]
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-01-29 15:02:35 +03:00
if ( d_is_dir ( tfhp - > fh_dentry ) )
2011-08-16 00:59:55 +04:00
goto out ;
2005-04-17 02:20:36 +04:00
err = nfserr_perm ;
if ( ! len )
goto out ;
err = nfserr_exist ;
if ( isdotent ( name , len ) )
goto out ;
2012-06-12 18:20:33 +04:00
host_err = fh_want_write ( tfhp ) ;
if ( host_err ) {
err = nfserrno ( host_err ) ;
goto out ;
}
2005-04-17 02:20:36 +04:00
ddir = ffhp - > fh_dentry ;
2015-03-18 01:25:59 +03:00
dirp = d_inode ( ddir ) ;
2022-07-26 09:45:30 +03:00
inode_lock_nested ( dirp , I_MUTEX_PARENT ) ;
2005-04-17 02:20:36 +04:00
dnew = lookup_one_len ( name , ddir , len ) ;
2022-07-26 09:45:30 +03:00
if ( IS_ERR ( dnew ) ) {
err = nfserrno ( PTR_ERR ( dnew ) ) ;
goto out_unlock ;
}
2005-04-17 02:20:36 +04:00
dold = tfhp - > fh_dentry ;
2011-01-11 21:55:46 +03:00
err = nfserr_noent ;
2015-03-18 01:25:59 +03:00
if ( d_really_is_negative ( dold ) )
2012-06-12 18:20:33 +04:00
goto out_dput ;
2023-07-21 17:29:10 +03:00
err = fh_fill_pre_attrs ( ffhp ) ;
if ( err ! = nfs_ok )
goto out_dput ;
2023-01-13 14:49:10 +03:00
host_err = vfs_link ( dold , & nop_mnt_idmap , dirp , dnew , NULL ) ;
2022-07-26 09:45:30 +03:00
fh_fill_post_attrs ( ffhp ) ;
inode_unlock ( dirp ) ;
2006-10-20 10:28:58 +04:00
if ( ! host_err ) {
2010-02-17 23:05:11 +03:00
err = nfserrno ( commit_metadata ( ffhp ) ) ;
if ( ! err )
err = nfserrno ( commit_metadata ( tfhp ) ) ;
2005-04-17 02:20:36 +04:00
} else {
2006-10-20 10:28:58 +04:00
if ( host_err = = - EXDEV & & rqstp - > rq_vers = = 2 )
2005-04-17 02:20:36 +04:00
err = nfserr_acces ;
else
2006-10-20 10:28:58 +04:00
err = nfserrno ( host_err ) ;
2005-04-17 02:20:36 +04:00
}
dput ( dnew ) ;
2022-07-26 09:45:30 +03:00
out_drop_write :
2012-06-12 18:20:33 +04:00
fh_drop_write ( tfhp ) ;
2005-04-17 02:20:36 +04:00
out :
return err ;
2022-07-26 09:45:30 +03:00
out_dput :
dput ( dnew ) ;
out_unlock :
2022-07-26 09:45:30 +03:00
inode_unlock ( dirp ) ;
2022-07-26 09:45:30 +03:00
goto out_drop_write ;
2005-04-17 02:20:36 +04:00
}
nfsd: close cached files prior to a REMOVE or RENAME that would replace target
It's not uncommon for some workloads to do a bunch of I/O to a file and
delete it just afterward. If knfsd has a cached open file however, then
the file may still be open when the dentry is unlinked. If the
underlying filesystem is nfs, then that could trigger it to do a
sillyrename.
On a REMOVE or RENAME scan the nfsd_file cache for open files that
correspond to the inode, and proactively unhash and put their
references. This should prevent any delete-on-last-close activity from
occurring, solely due to knfsd's open file cache.
This must be done synchronously though so we use the variants that call
flush_delayed_fput. There are deadlock possibilities if you call
flush_delayed_fput while holding locks, however. In the case of
nfsd_rename, we don't even do the lookups of the dentries to be renamed
until we've locked for rename.
Once we've figured out what the target dentry is for a rename, check to
see whether there are cached open files associated with it. If there
are, then unwind all of the locking, close them all, and then reattempt
the rename.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-08-18 21:18:57 +03:00
static void
nfsd_close_cached_files(struct dentry *dentry)
{
	struct inode *inode = d_inode(dentry);

	if (inode && S_ISREG(inode->i_mode))
		nfsd_file_close_inode_sync(inode);
}

static bool
nfsd_has_cached_files(struct dentry *dentry)
{
	bool		ret = false;
	struct inode *inode = d_inode(dentry);

	if (inode && S_ISREG(inode->i_mode))
		ret = nfsd_file_is_cached(inode);
	return ret;
}
2005-04-17 02:20:36 +04:00
/*
 * Rename a file
 * N.B. After this call _both_ ffhp and tfhp need an fh_put
 */
2006-10-20 10:28:58 +04:00
__be32
2005-04-17 02:20:36 +04:00
nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
			    struct svc_fh *tfhp, char *tname, int tlen)
{
	struct dentry	*fdentry, *tdentry, *odentry, *ndentry, *trap;
	struct inode	*fdir, *tdir;
2006-10-20 10:28:58 +04:00
	__be32		err;
	int		host_err;
nfsd: close cached files prior to a REMOVE or RENAME that would replace target
It's not uncommon for some workloads to do a bunch of I/O to a file and
delete it just afterward. If knfsd has a cached open file however, then
the file may still be open when the dentry is unlinked. If the
underlying filesystem is nfs, then that could trigger it to do a
sillyrename.
On a REMOVE or RENAME scan the nfsd_file cache for open files that
correspond to the inode, and proactively unhash and put their
references. This should prevent any delete-on-last-close activity from
occurring, solely due to knfsd's open file cache.
This must be done synchronously though so we use the variants that call
flush_delayed_fput. There are deadlock possibilities if you call
flush_delayed_fput while holding locks, however. In the case of
nfsd_rename, we don't even do the lookups of the dentries to be renamed
until we've locked for rename.
Once we've figured out what the target dentry is for a rename, check to
see whether there are cached open files associated with it. If there
are, then unwind all of the locking, close them all, and then reattempt
the rename.
None of this is really necessary for "typical" filesystems though. It's
mostly of use for NFS, so declare a new export op flag and use that to
determine whether to close the files beforehand.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-12-01 01:03:16 +03:00
	bool		close_cached = false;
2005-04-17 02:20:36 +04:00
2008-06-16 15:20:29 +04:00
	err = fh_verify(rqstp, ffhp, S_IFDIR, NFSD_MAY_REMOVE);
2005-04-17 02:20:36 +04:00
	if (err)
		goto out;
2008-06-16 15:20:29 +04:00
	err = fh_verify(rqstp, tfhp, S_IFDIR, NFSD_MAY_CREATE);
2005-04-17 02:20:36 +04:00
	if (err)
		goto out;
	fdentry = ffhp->fh_dentry;
2015-03-18 01:25:59 +03:00
	fdir = d_inode(fdentry);
2005-04-17 02:20:36 +04:00
	tdentry = tfhp->fh_dentry;
2015-03-18 01:25:59 +03:00
	tdir = d_inode(tdentry);
2005-04-17 02:20:36 +04:00
	err = nfserr_perm;
	if (!flen || isdotent(fname, flen) || !tlen || isdotent(tname, tlen))
		goto out;
2023-10-15 04:34:40 +03:00
	err = (rqstp->rq_vers == 2) ? nfserr_acces : nfserr_xdev;
	if (ffhp->fh_export->ex_path.mnt != tfhp->fh_export->ex_path.mnt)
		goto out;
	if (ffhp->fh_export->ex_path.dentry != tfhp->fh_export->ex_path.dentry)
		goto out;
2019-08-18 21:18:57 +03:00
retry:
2012-06-12 18:20:33 +04:00
	host_err = fh_want_write(ffhp);
	if (host_err) {
		err = nfserrno(host_err);
		goto out;
	}
2005-04-17 02:20:36 +04:00
	trap = lock_rename(tdentry, fdentry);
2023-11-21 04:02:11 +03:00
	if (IS_ERR(trap)) {
		err = (rqstp->rq_vers == 2) ? nfserr_acces : nfserr_xdev;
2024-03-18 19:32:09 +03:00
		goto out_want_write;
2023-11-21 04:02:11 +03:00
	}
2023-07-21 17:29:10 +03:00
	err = fh_fill_pre_attrs(ffhp);
	if (err != nfs_ok)
		goto out_unlock;
	err = fh_fill_pre_attrs(tfhp);
	if (err != nfs_ok)
		goto out_unlock;
2005-04-17 02:20:36 +04:00
	odentry = lookup_one_len(fname, fdentry, flen);
2006-10-20 10:28:58 +04:00
	host_err = PTR_ERR(odentry);
2005-04-17 02:20:36 +04:00
	if (IS_ERR(odentry))
		goto out_nfserr;
2006-10-20 10:28:58 +04:00
	host_err = -ENOENT;
2015-03-18 01:25:59 +03:00
	if (d_really_is_negative(odentry))
2005-04-17 02:20:36 +04:00
		goto out_dput_old;
2006-10-20 10:28:58 +04:00
	host_err = -EINVAL;
2005-04-17 02:20:36 +04:00
	if (odentry == trap)
		goto out_dput_old;
	ndentry = lookup_one_len(tname, tdentry, tlen);
2006-10-20 10:28:58 +04:00
	host_err = PTR_ERR(ndentry);
2005-04-17 02:20:36 +04:00
	if (IS_ERR(ndentry))
		goto out_dput_old;
2006-10-20 10:28:58 +04:00
	host_err = -ENOTEMPTY;
2005-04-17 02:20:36 +04:00
	if (ndentry == trap)
		goto out_dput_new;
2020-12-01 01:03:16 +03:00
	if ((ndentry->d_sb->s_export_op->flags & EXPORT_OP_CLOSE_BEFORE_UNLINK) &&
	    nfsd_has_cached_files(ndentry)) {
		close_cached = true;
2019-08-18 21:18:57 +03:00
		goto out_dput_old;
	} else {
2021-01-21 16:19:32 +03:00
		struct renamedata rd = {
2023-01-13 14:49:10 +03:00
			.old_mnt_idmap	= &nop_mnt_idmap,
2021-01-21 16:19:32 +03:00
			.old_dir	= fdir,
			.old_dentry	= odentry,
2023-01-13 14:49:10 +03:00
			.new_mnt_idmap	= &nop_mnt_idmap,
2021-01-21 16:19:32 +03:00
			.new_dir	= tdir,
			.new_dentry	= ndentry,
		};
2022-09-09 01:14:19 +03:00
		int retries;

		for (retries = 1;;) {
			host_err = vfs_rename(&rd);
			if (host_err != -EAGAIN || !retries--)
				break;
			if (!nfsd_wait_for_delegreturn(rqstp, d_inode(odentry)))
				break;
		}
2019-08-18 21:18:57 +03:00
		if (!host_err) {
			host_err = commit_metadata(tfhp);
			if (!host_err)
				host_err = commit_metadata(ffhp);
		}
2005-04-17 02:20:36 +04:00
	}
out_dput_new:
	dput(ndentry);
out_dput_old:
	dput(odentry);
out_nfserr:
2006-10-20 10:28:58 +04:00
	err = nfserrno(host_err);
2022-07-26 09:45:30 +03:00
2020-12-01 01:03:16 +03:00
	if (!close_cached) {
2021-12-24 22:36:49 +03:00
		fh_fill_post_attrs(ffhp);
		fh_fill_post_attrs(tfhp);
2019-08-18 21:18:57 +03:00
	}
2023-07-21 17:29:10 +03:00
out_unlock:
2005-04-17 02:20:36 +04:00
	unlock_rename(tdentry, fdentry);
2024-03-18 19:32:09 +03:00
out_want_write:
2012-06-12 18:20:33 +04:00
	fh_drop_write(ffhp);
2005-04-17 02:20:36 +04:00
2019-08-18 21:18:57 +03:00
	/*
2023-12-15 04:18:31 +03:00
	 * If the target dentry has cached open files, then we need to
	 * try to close them prior to doing the rename. Final fput
	 * shouldn't be done with locks held however, so we delay it
	 * until this point and then reattempt the whole shebang.
2019-08-18 21:18:57 +03:00
	 */
2020-12-01 01:03:16 +03:00
	if (close_cached) {
		close_cached = false;
2019-08-18 21:18:57 +03:00
		nfsd_close_cached_files(ndentry);
		dput(ndentry);
		goto retry;
	}
2005-04-17 02:20:36 +04:00
out:
	return err;
}
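The unwind-and-retry flow that nfsd_rename() uses for cached open files can be sketched in isolation. The block below is a minimal user-space model, not kernel code: struct fake_target, fake_close_cached() and fake_rename() are hypothetical stand-ins, and a plain counter takes the place of lock_rename()/unlock_rename(). It only illustrates the ordering: detect cached files under the lock, drop the lock, close the files, then start over.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state nfsd_rename() looks at. */
struct fake_target {
	bool has_cached_files;	/* models nfsd_has_cached_files() */
	int  lock_depth;	/* models lock_rename()/unlock_rename() */
	int  closes;		/* number of cache flushes performed */
	int  attempts;		/* number of passes through "retry" */
};

/* Models nfsd_close_cached_files(): must run with no locks held. */
static void fake_close_cached(struct fake_target *t)
{
	assert(t->lock_depth == 0);
	t->has_cached_files = false;
	t->closes++;
}

/* Mirrors the control flow only; returns 0 once the rename pass runs. */
static int fake_rename(struct fake_target *t)
{
	bool close_cached = false;

retry:
	t->attempts++;
	t->lock_depth++;		/* take the rename lock */
	if (t->has_cached_files)
		close_cached = true;	/* skip the rename on this pass */
	/* else: the real code would call vfs_rename() here */
	t->lock_depth--;		/* drop the rename lock */

	if (close_cached) {
		close_cached = false;
		fake_close_cached(t);	/* safe: nothing is locked now */
		goto retry;
	}
	return 0;
}
```

In this model a target with cached files takes two passes and one flush, while a target without them completes on the first pass.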
/*
 * Unlink a file or directory
 * N.B. After this call fhp needs an fh_put
 */
2006-10-20 10:28:58 +04:00
__be32
2005-04-17 02:20:36 +04:00
nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
				char *fname, int flen)
{
	struct dentry	*dentry, *rdentry;
	struct inode	*dirp;
2021-05-14 06:58:29 +03:00
	struct inode	*rinode;
2006-10-20 10:28:58 +04:00
	__be32		err;
	int		host_err;
2005-04-17 02:20:36 +04:00
	err = nfserr_acces;
	if (!flen || isdotent(fname, flen))
		goto out;
2008-06-16 15:20:29 +04:00
	err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_REMOVE);
2005-04-17 02:20:36 +04:00
	if (err)
		goto out;
2012-06-12 18:20:33 +04:00
	host_err = fh_want_write(fhp);
	if (host_err)
		goto out_nfserr;
2005-04-17 02:20:36 +04:00
	dentry = fhp->fh_dentry;
2015-03-18 01:25:59 +03:00
	dirp = d_inode(dentry);
2022-07-26 09:45:30 +03:00
	inode_lock_nested(dirp, I_MUTEX_PARENT);
2005-04-17 02:20:36 +04:00
	rdentry = lookup_one_len(fname, dentry, flen);
2006-10-20 10:28:58 +04:00
	host_err = PTR_ERR(rdentry);
2005-04-17 02:20:36 +04:00
	if (IS_ERR(rdentry))
2022-07-26 09:45:30 +03:00
		goto out_unlock;
2005-04-17 02:20:36 +04:00
2015-03-18 01:25:59 +03:00
	if (d_really_is_negative(rdentry)) {
2005-04-17 02:20:36 +04:00
		dput(rdentry);
2019-04-12 23:26:30 +03:00
		host_err = -ENOENT;
2022-07-26 09:45:30 +03:00
		goto out_unlock;
2005-04-17 02:20:36 +04:00
	}
2021-05-14 06:58:29 +03:00
	rinode = d_inode(rdentry);
2023-07-21 17:29:10 +03:00
	err = fh_fill_pre_attrs(fhp);
	if (err != nfs_ok)
		goto out_unlock;
2005-04-17 02:20:36 +04:00
2023-07-21 17:29:10 +03:00
	ihold(rinode);
2005-04-17 02:20:36 +04:00
	if (!type)
2015-03-18 01:25:59 +03:00
		type = d_inode(rdentry)->i_mode & S_IFMT;
2005-04-17 02:20:36 +04:00
2019-08-18 21:18:57 +03:00
	if (type != S_IFDIR) {
2022-09-09 01:14:25 +03:00
		int retries;
2020-12-01 01:03:16 +03:00
		if (rdentry->d_sb->s_export_op->flags & EXPORT_OP_CLOSE_BEFORE_UNLINK)
			nfsd_close_cached_files(rdentry);
2022-09-09 01:14:25 +03:00
		for (retries = 1;;) {
2023-01-13 14:49:10 +03:00
			host_err = vfs_unlink(&nop_mnt_idmap, dirp, rdentry, NULL);
2022-09-09 01:14:25 +03:00
			if (host_err != -EAGAIN || !retries--)
				break;
			if (!nfsd_wait_for_delegreturn(rqstp, rinode))
				break;
		}
2019-08-18 21:18:57 +03:00
	} else {
2023-01-13 14:49:10 +03:00
		host_err = vfs_rmdir(&nop_mnt_idmap, dirp, rdentry);
2019-08-18 21:18:57 +03:00
	}
2022-07-26 09:45:30 +03:00
	fh_fill_post_attrs(fhp);
2019-08-18 21:18:57 +03:00
2022-07-26 09:45:30 +03:00
	inode_unlock(dirp);
2010-02-17 23:05:11 +03:00
	if (!host_err)
		host_err = commit_metadata(fhp);
2011-01-15 04:00:02 +03:00
	dput(rdentry);
2021-05-14 06:58:29 +03:00
	iput(rinode);    /* truncate the inode here */
2011-01-15 04:00:02 +03:00
2019-04-12 23:26:30 +03:00
out_drop_write:
	fh_drop_write(fhp);
2005-04-17 02:20:36 +04:00
out_nfserr:
2019-11-28 05:56:43 +03:00
	if (host_err == -EBUSY) {
		/* name is mounted-on. There is no perfect
		 * error status.
		 */
		if (nfsd_v4client(rqstp))
			err = nfserr_file_open;
		else
			err = nfserr_acces;
	} else {
		err = nfserrno(host_err);
	}
2006-01-19 04:43:13 +03:00
out:
	return err;
2022-07-26 09:45:30 +03:00
out_unlock:
2022-07-26 09:45:30 +03:00
	inode_unlock(dirp);
2022-07-26 09:45:30 +03:00
	goto out_drop_write;
2005-04-17 02:20:36 +04:00
}
2008-07-31 23:29:12 +04:00
/*
* We do this buffering because we must not call back into the file
* system ' s - > lookup ( ) method from the filldir callback . That may well
* deadlock a number of file systems .
*
* This is based heavily on the implementation of same in XFS .
*/
struct buffered_dirent {
u64 ino ;
loff_t offset ;
int namlen ;
unsigned int d_type ;
char name [ ] ;
} ;
struct readdir_data {
2013-05-15 21:52:59 +04:00
struct dir_context ctx ;
2008-07-31 23:29:12 +04:00
char * dirent ;
size_t used ;
2008-08-24 15:29:52 +04:00
int full ;
2008-07-31 23:29:12 +04:00
} ;
2022-08-16 18:57:56 +03:00
static bool nfsd_buffered_filldir ( struct dir_context * ctx , const char * name ,
2014-10-30 19:37:34 +03:00
int namlen , loff_t offset , u64 ino ,
unsigned int d_type )
2008-07-31 23:29:12 +04:00
{
2014-10-30 19:37:34 +03:00
struct readdir_data * buf =
container_of ( ctx , struct readdir_data , ctx ) ;
2008-07-31 23:29:12 +04:00
struct buffered_dirent * de = ( void * ) ( buf - > dirent + buf - > used ) ;
unsigned int reclen ;
reclen = ALIGN ( sizeof ( struct buffered_dirent ) + namlen , sizeof ( u64 ) ) ;
2008-08-24 15:29:52 +04:00
if ( buf - > used + reclen > PAGE_SIZE ) {
buf - > full = 1 ;
2022-08-16 18:57:56 +03:00
return false ;
2008-08-24 15:29:52 +04:00
}
2008-07-31 23:29:12 +04:00
de - > namlen = namlen ;
de - > offset = offset ;
de - > ino = ino ;
de - > d_type = d_type ;
memcpy ( de - > name , name , namlen ) ;
buf - > used + = reclen ;
2022-08-16 18:57:56 +03:00
return true ;
2008-07-31 23:29:12 +04:00
}
2021-03-05 21:57:40 +03:00
static __be32 nfsd_buffered_readdir ( struct file * file , struct svc_fh * fhp ,
nfsd_filldir_t func , struct readdir_cd * cdp ,
loff_t * offsetp )
2008-07-31 20:16:51 +04:00
{
2008-07-31 23:29:12 +04:00
struct buffered_dirent * de ;
2008-07-31 20:16:51 +04:00
int host_err ;
2008-07-31 23:29:12 +04:00
int size ;
loff_t offset ;
2013-05-23 06:22:04 +04:00
struct readdir_data buf = {
. ctx . actor = nfsd_buffered_filldir ,
. dirent = ( void * ) __get_free_page ( GFP_KERNEL )
} ;
2008-07-31 20:16:51 +04:00
2008-07-31 23:29:12 +04:00
if ( ! buf . dirent )
2009-04-21 02:18:37 +04:00
return nfserrno ( - ENOMEM ) ;
2008-07-31 23:29:12 +04:00
offset = * offsetp ;
2008-07-31 20:16:51 +04:00
2008-07-31 23:29:12 +04:00
while ( 1 ) {
unsigned int reclen ;
Fix nfsd truncation of readdir results
Commit 8d7c4203 "nfsd: fix failure to set eof in readdir in some
situations" introduced a bug: on a directory in an exported ext3
filesystem with dir_index unset, a READDIR will only return about 250
entries, even if the directory was larger.
Bisected it back to this commit; reverting it fixes the problem.
It turns out that in this case ext3 reads a block at a time, then
returns from readdir, which means we can end up with buf.full==0 but
with more entries in the directory still to be read. Before 8d7c4203
(but after c002a6c797 "Optimise NFS readdir hack slightly"), this would
cause us to return the READDIR result immediately, but with the eof bit
unset. That could cause a performance regression (because the client
would need more roundtrips to the server to read the whole directory),
but no loss in correctness, since the cleared eof bit caused the client
to send another readdir. After 8d7c4203, the setting of the eof bit
made this a correctness problem.
So, move nfserr_eof into the loop and remove the buf.full check so that
we loop until buf.used==0. The following seems to do the right thing
and reduces the network traffic since we don't return a READDIR result
until the buffer is full.
Tested on an empty directory & large directory; eof is properly sent and
there are no more short buffers.
Signed-off-by: Doug Nazar <nazard@dragoninc.ca>
Cc: David Woodhouse <David.Woodhouse@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
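The rule this fix establishes (set eof at the top of every pass, clear it whenever a pass yields entries, and stop only when a pass yields nothing) can be illustrated with a small userspace model. Everything here is an assumption made for illustration: `struct fake_dir`, `fake_iterate()`, `struct reply` and `read_dir()` are hypothetical, and `per_call` imitates a filesystem that returns one block of entries per call, as the message describes for ext3 without dir_index.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical directory that hands out at most per_call entries a pass. */
struct fake_dir {
	size_t total;    /* entries in the directory */
	size_t pos;      /* entries consumed so far */
	size_t per_call; /* entries returned by one iterate call */
};

/* Hypothetical reply being assembled for the client. */
struct reply {
	size_t room;  /* entries that still fit in the reply */
	size_t count; /* entries placed in the reply */
	bool eof;     /* end-of-directory indication */
};

/* Produce up to min(per_call, cap) entries; returns how many. */
static size_t fake_iterate(struct fake_dir *d, size_t cap)
{
	size_t left = d->total - d->pos;
	size_t n = left < d->per_call ? left : d->per_call;

	if (n > cap)
		n = cap;
	d->pos += n;
	return n;
}

static void read_dir(struct fake_dir *d, struct reply *r, size_t cap)
{
	for (;;) {
		size_t n;

		r->eof = true; /* cleared below if this pass yields entries */
		n = fake_iterate(d, cap);
		if (n == 0)
			break; /* only an empty pass means end of directory */
		r->eof = false;
		if (n > r->room) {
			/* Reply is full: keep what fits, rewind the rest. */
			d->pos -= n - r->room;
			r->count += r->room;
			r->room = 0;
			break;
		}
		r->count += n;
		r->room -= n;
	}
}
```

With a 1000-entry directory returned 250 entries at a time, a short pass no longer ends the read: the loop keeps going until a pass is empty, and only then is eof left set.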
2008-11-05 14:16:28 +03:00
cdp - > err = nfserr_eof ; /* will be cleared on successful read */
2008-07-31 23:29:12 +04:00
buf . used = 0 ;
2008-08-24 15:29:52 +04:00
buf . full = 0 ;
2008-07-31 23:29:12 +04:00
2013-05-15 21:52:59 +04:00
host_err = iterate_dir ( file , & buf . ctx ) ;
2008-08-24 15:29:52 +04:00
if ( buf . full )
host_err = 0 ;
if ( host_err < 0 )
2008-07-31 23:29:12 +04:00
break ;
size = buf . used ;
if ( ! size )
break ;
de = ( struct buffered_dirent * ) buf . dirent ;
while ( size > 0 ) {
offset = de - > offset ;
if ( func ( cdp , de - > name , de - > namlen , de - > offset ,
de - > ino , de - > d_type ) )
2009-04-21 02:18:37 +04:00
break ;
2008-07-31 23:29:12 +04:00
if ( cdp - > err ! = nfs_ok )
2009-04-21 02:18:37 +04:00
break ;
2008-07-31 23:29:12 +04:00
2021-03-05 21:57:40 +03:00
trace_nfsd_dirent ( fhp , de - > ino , de - > name , de - > namlen ) ;
2008-07-31 23:29:12 +04:00
reclen = ALIGN ( sizeof ( * de ) + de - > namlen ,
sizeof ( u64 ) ) ;
size - = reclen ;
de = ( struct buffered_dirent * ) ( ( char * ) de + reclen ) ;
}
2009-04-21 02:18:37 +04:00
if ( size > 0 ) /* We bailed out early */
break ;
2008-08-17 20:21:18 +04:00
offset = vfs_llseek ( file , 0 , SEEK_CUR ) ;
2008-07-31 23:29:12 +04:00
}
free_page ( ( unsigned long ) ( buf . dirent ) ) ;
2008-07-31 20:16:51 +04:00
if ( host_err )
return nfserrno ( host_err ) ;
2008-07-31 23:29:12 +04:00
* offsetp = offset ;
return cdp - > err ;
2008-07-31 20:16:51 +04:00
}
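The buffering scheme used by the two functions above (variable-length records packed back to back, each rounded up to an 8-byte boundary, then walked with the same length computation) can be tried out in userspace. This is a sketch under assumptions, not kernel code: `BUF_SIZE` stands in for PAGE_SIZE, `ALIGN_UP` for the kernel's ALIGN macro, `malloc()` for `__get_free_page()`, and `struct packed_dirent`, `buf_init()`, `buf_add()` and `buf_walk()` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define BUF_SIZE 4096 /* assumption: stands in for PAGE_SIZE */
#define ALIGN_UP(x, a) (((x) + ((a) - 1)) & ~((a) - 1))

struct packed_dirent {
	uint64_t ino;
	int64_t offset;
	int namlen;
	unsigned int d_type;
	char name[]; /* namlen bytes, not NUL-terminated */
};

struct dirent_buf {
	char *data;  /* BUF_SIZE bytes from malloc() */
	size_t used; /* bytes filled so far */
	bool full;   /* set when an entry did not fit */
};

static bool buf_init(struct dirent_buf *b)
{
	b->data = malloc(BUF_SIZE);
	b->used = 0;
	b->full = false;
	return b->data != NULL;
}

/* Append one record; returns false (and sets full) when out of room. */
static bool buf_add(struct dirent_buf *b, const char *name, int namlen,
		    int64_t offset, uint64_t ino, unsigned int d_type)
{
	size_t reclen = ALIGN_UP(sizeof(struct packed_dirent) + (size_t)namlen,
				 sizeof(uint64_t));
	struct packed_dirent *de;

	if (b->used + reclen > BUF_SIZE) {
		b->full = true;
		return false;
	}
	de = (struct packed_dirent *)(b->data + b->used);
	de->namlen = namlen;
	de->offset = offset;
	de->ino = ino;
	de->d_type = d_type;
	memcpy(de->name, name, (size_t)namlen);
	b->used += reclen;
	return true;
}

/* Walk the records; returns their count and sums the name lengths. */
static size_t buf_walk(const struct dirent_buf *b, size_t *name_bytes)
{
	size_t count = 0, off = 0;

	*name_bytes = 0;
	while (off < b->used) {
		const struct packed_dirent *de =
			(const struct packed_dirent *)(b->data + off);

		*name_bytes += (size_t)de->namlen;
		count++;
		off += ALIGN_UP(sizeof(*de) + (size_t)de->namlen,
				sizeof(uint64_t));
	}
	return count;
}
```

Rounding every record up to `sizeof(uint64_t)` keeps the 64-bit `ino` field of the next record naturally aligned, which is why writer and reader must compute the record length the same way.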
2023-11-20 04:17:11 +03:00
/**
* nfsd_readdir - Read entries from a directory
* @ rqstp : RPC transaction context
* @ fhp : NFS file handle of directory to be read
* @ offsetp : OUT : seek offset of final entry that was read
* @ cdp : OUT : an eof error value
* @ func : entry filler actor
*
* This implementation ignores the NFSv3 / 4 verifier cookie .
*
* NB : normal system calls hold file - > f_pos_lock when calling
* - > iterate_shared and - > llseek , but nfsd_readdir ( ) does not .
* Because the struct file acquired here is not visible to other
* threads , its internal state does not need mutex protection .
*
* Returns nfs_ok on success , otherwise an nfsstat code is
* returned .
2005-04-17 02:20:36 +04:00
*/
2006-10-20 10:28:58 +04:00
__be32
2005-04-17 02:20:36 +04:00
nfsd_readdir ( struct svc_rqst * rqstp , struct svc_fh * fhp , loff_t * offsetp ,
2014-10-30 19:37:34 +03:00
struct readdir_cd * cdp , nfsd_filldir_t func )
2005-04-17 02:20:36 +04:00
{
2006-10-20 10:28:58 +04:00
__be32 err ;
2005-04-17 02:20:36 +04:00
struct file * file ;
loff_t offset = * offsetp ;
2012-03-19 06:44:50 +04:00
int may_flags = NFSD_MAY_READ ;
/* NFSv2 only supports 32 bit cookies */
if ( rqstp - > rq_vers > 2 )
may_flags | = NFSD_MAY_64BIT_COOKIE ;
2005-04-17 02:20:36 +04:00
2012-03-19 06:44:50 +04:00
err = nfsd_open ( rqstp , fhp , S_IFDIR , may_flags , & file ) ;
2005-04-17 02:20:36 +04:00
if ( err )
goto out ;
2012-04-25 23:30:00 +04:00
offset = vfs_llseek ( file , offset , SEEK_SET ) ;
2005-04-17 02:20:36 +04:00
if ( offset < 0 ) {
err = nfserrno ( ( int ) offset ) ;
goto out_close ;
}
2021-03-05 21:57:40 +03:00
err = nfsd_buffered_readdir ( file , fhp , func , cdp , offsetp ) ;
2005-04-17 02:20:36 +04:00
if ( err = = nfserr_eof | | err = = nfserr_toosmall )
err = nfs_ok ; /* can still be found in ->err */
out_close :
2023-12-15 04:18:31 +03:00
nfsd_filp_close ( file ) ;
2005-04-17 02:20:36 +04:00
out :
return err ;
}
2023-12-15 04:18:31 +03:00
/**
* nfsd_filp_close : close a file synchronously
* @ fp : the file to close
*
* nfsd_filp_close ( ) is similar in behaviour to filp_close ( ) .
* The difference is that if this is the final close on the
* file , then that finalisation happens immediately , rather than
* being handed over to a work_queue , as is the case for
* filp_close ( ) .
* When a user - space process closes a file ( even when using
* filp_close ( ) ) the finalisation happens before returning to
* userspace , so it is effectively synchronous . When a kernel thread
* uses filp_close ( ) , on the other hand , the handling is completely
* asynchronous . This means that any cost imposed by that finalisation
* is not imposed on the nfsd thread , and nfsd could potentially
* close files more quickly than the work queue finalises the close ,
* which would lead to unbounded growth in the queue .
*
* In some contexts it is not safe to synchronously wait for
* close finalisation ( see comment for __fput_sync ( ) ) , but nfsd
* does not match those contexts . In particular it does not , at the
* time that this function is called , hold any locks and no finalisation
* of any file , socket , or device driver would have any cause to wait
* for nfsd to make progress .
*/
void nfsd_filp_close ( struct file * fp )
{
get_file ( fp ) ;
filp_close ( fp , NULL ) ;
__fput_sync ( fp ) ;
}
2005-04-17 02:20:36 +04:00
/*
* Get file system stats
* N . B . After this call fhp needs an fh_put
*/
2006-10-20 10:28:58 +04:00
__be32
2008-08-07 21:00:20 +04:00
nfsd_statfs ( struct svc_rqst * rqstp , struct svc_fh * fhp , struct kstatfs * stat , int access )
2005-04-17 02:20:36 +04:00
{
2010-07-07 20:53:11 +04:00
__be32 err ;
err = fh_verify ( rqstp , fhp , 0 , NFSD_MAY_NOP | access ) ;
2010-08-13 17:53:49 +04:00
if ( ! err ) {
struct path path = {
. mnt = fhp - > fh_export - > ex_path . mnt ,
. dentry = fhp - > fh_dentry ,
} ;
if ( vfs_statfs ( & path , stat ) )
err = nfserr_io ;
}
2005-04-17 02:20:36 +04:00
return err ;
}
2007-07-19 12:49:20 +04:00
static int exp_rdonly ( struct svc_rqst * rqstp , struct svc_export * exp )
2007-07-19 12:49:20 +04:00
{
2007-07-19 12:49:20 +04:00
return nfsexp_flags ( rqstp , exp ) & NFSEXP_READONLY ;
2007-07-19 12:49:20 +04:00
}
2020-06-24 01:39:23 +03:00
# ifdef CONFIG_NFSD_V4
/*
* Helper function to translate error numbers . In the case of xattr operations ,
* some error codes need to be translated outside of the standard translations .
*
* ENODATA needs to be translated to nfserr_noxattr .
* E2BIG to nfserr_xattr2big .
*
* Additionally , vfs_listxattr can return - ERANGE . This means that the
* file has too many extended attributes to retrieve inside an
* XATTR_LIST_MAX sized buffer . This is a bug in the xattr implementation :
* filesystems will allow the adding of extended attributes until they hit
* their own internal limit . This limit may be larger than XATTR_LIST_MAX .
* So , at that point , the attributes are present and valid , but can ' t
* be retrieved using listxattr , since the upper level xattr code enforces
* the XATTR_LIST_MAX limit .
*
* This bug means that we need to deal with listxattr returning - ERANGE . The
* best mapping is to return TOOSMALL .
*/
static __be32
nfsd_xattr_errno ( int err )
{
switch ( err ) {
case - ENODATA :
return nfserr_noxattr ;
case - E2BIG :
return nfserr_xattr2big ;
case - ERANGE :
return nfserr_toosmall ;
}
return nfserrno ( err ) ;
}
/*
* Retrieve the specified user extended attribute . To avoid always
* having to allocate the maximum size ( since we are not getting
* a maximum size from the RPC ) , do a probe + alloc . Hold a reader
* lock on i_rwsem to prevent the extended attribute from changing
* size while we ' re doing this .
*/
__be32
nfsd_getxattr ( struct svc_rqst * rqstp , struct svc_fh * fhp , char * name ,
void * * bufp , int * lenp )
{
ssize_t len ;
__be32 err ;
char * buf ;
struct inode * inode ;
struct dentry * dentry ;
err = fh_verify ( rqstp , fhp , 0 , NFSD_MAY_READ ) ;
if ( err )
return err ;
err = nfs_ok ;
dentry = fhp - > fh_dentry ;
inode = d_inode ( dentry ) ;
inode_lock_shared ( inode ) ;
2023-01-13 14:49:22 +03:00
len = vfs_getxattr ( & nop_mnt_idmap , dentry , name , NULL , 0 ) ;
2020-06-24 01:39:23 +03:00
/*
* Zero - length attribute , just return .
*/
if ( len = = 0 ) {
* bufp = NULL ;
* lenp = 0 ;
goto out ;
}
if ( len < 0 ) {
err = nfsd_xattr_errno ( len ) ;
goto out ;
}
if ( len > * lenp ) {
err = nfserr_toosmall ;
goto out ;
}
NFSD: Clean up xattr memory allocation flags
Tetsuo Handa points out:
> Since GFP_KERNEL is "GFP_NOFS | __GFP_FS", usage like
> "GFP_KERNEL | GFP_NOFS" does not make sense.
The original intent was to hold the inode lock while estimating
the buffer requirements for the requested information. Frank van
der Linden, the author of NFSD's xattr code, says:
> ... you need inode_lock to get an atomic view of an xattr. Since
> both nfsd_getxattr and nfsd_listxattr do the standard trick of
> querying the xattr length with a NULL buf argument (just getting
> the length back), allocating the right buffer size, and then
> querying again, they need to hold the inode lock to avoid having
> the xattr changed from under them while doing that.
>
> From that then flows the requirement that GFP_FS could cause
> problems while holding i_rwsem, so I added GFP_NOFS.
However, Dave Chinner states:
> You can do GFP_KERNEL allocations holding the i_rwsem just fine.
> All that it requires is the caller holds a reference to the
> inode ...
Since these code paths acquire a dentry, they do indeed hold a
reference. It is therefore safe to use GFP_KERNEL for these memory
allocations. In particular, that's what this code is already doing;
but now the C source code looks sane too.
At a later time we can revisit in order to remove the inode lock in
favor of simply retrying if the estimated buffer size is too small.
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
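The last paragraph of this commit message mentions a possible lock-free alternative: probe for the length, allocate, read, and simply retry if the value grew in between. A minimal userspace sketch of that idea follows. It is not what nfsd does today, and every name in it is an assumption: `struct attr_source` and `attr_read()` are a hypothetical stand-in for an xattr read that returns the needed length when called with a zero size and `-ERANGE` when the buffer is too small, and `grow_after_probe` simulates a concurrent writer.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical attribute whose value may grow between probe and read. */
struct attr_source {
	const char *value;    /* backing bytes, at least as long as len */
	size_t len;           /* current attribute length */
	int grow_after_probe; /* number of probes followed by a 4-byte grow */
};

/* size == 0 probes the length; a too-small buffer yields -ERANGE. */
static long attr_read(struct attr_source *s, void *buf, size_t size)
{
	if (size == 0) {
		long n = (long)s->len;

		if (s->grow_after_probe > 0) {
			s->grow_after_probe--;
			s->len += 4; /* simulated concurrent update */
		}
		return n;
	}
	if (size < s->len)
		return -ERANGE;
	memcpy(buf, s->value, s->len);
	return (long)s->len;
}

/*
 * Probe, allocate, read; retry up to max_tries times if the value grew.
 * On success returns the length and stores a malloc()ed buffer in *bufp.
 */
static long attr_get_alloc(struct attr_source *s, void **bufp, int max_tries)
{
	*bufp = NULL;
	for (int i = 0; i < max_tries; i++) {
		long len = attr_read(s, NULL, 0);
		void *buf;

		if (len <= 0)
			return len; /* empty attribute or error */
		buf = malloc((size_t)len);
		if (!buf)
			return -ENOMEM;
		len = attr_read(s, buf, (size_t)len);
		if (len > 0) {
			*bufp = buf;
			return len;
		}
		free(buf);
		if (len != -ERANGE)
			return len; /* a real error, do not retry */
	}
	return -ERANGE; /* value kept growing; give up */
}
```

The trade-off in this sketch is a bounded number of wasted allocations when a writer races with the reader, in exchange for not holding a lock across the probe and the read.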
2023-04-20 18:02:33 +03:00
buf = kvmalloc ( len , GFP_KERNEL ) ;
2020-06-24 01:39:23 +03:00
if ( buf = = NULL ) {
err = nfserr_jukebox ;
goto out ;
}
2023-01-13 14:49:22 +03:00
len = vfs_getxattr ( & nop_mnt_idmap , dentry , name , buf , len ) ;
2020-06-24 01:39:23 +03:00
if ( len < = 0 ) {
kvfree ( buf ) ;
buf = NULL ;
err = nfsd_xattr_errno ( len ) ;
}
* lenp = len ;
* bufp = buf ;
out :
inode_unlock_shared ( inode ) ;
return err ;
}
/*
* Retrieve the xattr names . Since we can ' t know how many are
* user extended attributes , we must get all attributes here ,
* and have the XDR encode filter out the " user. " ones .
*
* While this could always just allocate an XATTR_LIST_MAX
* buffer , that ' s a waste , so do a probe + allocate . To
* avoid any changes between the probe and allocate , wrap
* this in inode_lock .
*/
__be32
nfsd_listxattr ( struct svc_rqst * rqstp , struct svc_fh * fhp , char * * bufp ,
int * lenp )
{
ssize_t len ;
__be32 err ;
char * buf ;
struct inode * inode ;
struct dentry * dentry ;
err = fh_verify ( rqstp , fhp , 0 , NFSD_MAY_READ ) ;
if ( err )
return err ;
dentry = fhp - > fh_dentry ;
inode = d_inode ( dentry ) ;
* lenp = 0 ;
inode_lock_shared ( inode ) ;
len = vfs_listxattr ( dentry , NULL , 0 ) ;
if ( len < = 0 ) {
err = nfsd_xattr_errno ( len ) ;
goto out ;
}
if ( len > XATTR_LIST_MAX ) {
err = nfserr_xattr2big ;
goto out ;
}
2023-04-20 18:02:33 +03:00
buf = kvmalloc ( len , GFP_KERNEL ) ;
2020-06-24 01:39:23 +03:00
if ( buf = = NULL ) {
err = nfserr_jukebox ;
goto out ;
}
len = vfs_listxattr ( dentry , buf , len ) ;
if ( len < = 0 ) {
kvfree ( buf ) ;
err = nfsd_xattr_errno ( len ) ;
goto out ;
}
* lenp = len ;
* bufp = buf ;
err = nfs_ok ;
out :
inode_unlock_shared ( inode ) ;
return err ;
}
2022-07-26 09:45:30 +03:00
/**
* nfsd_removexattr - Remove an extended attribute
* @ rqstp : RPC transaction being executed
* @ fhp : NFS filehandle of object with xattr to remove
* @ name : name of xattr to remove ( NUL - terminated )
*
* Pass in a NULL pointer for delegated_inode , and let the client deal
* with NFS4ERR_DELAY ( same as with e . g . setattr and remove ) .
*
* Returns nfs_ok on success , or an nfsstat in network byte order .
2020-06-24 01:39:23 +03:00
*/
__be32
nfsd_removexattr ( struct svc_rqst * rqstp , struct svc_fh * fhp , char * name )
{
2020-09-11 21:47:48 +03:00
__be32 err ;
int ret ;
2020-06-24 01:39:23 +03:00
err = fh_verify ( rqstp , fhp , 0 , NFSD_MAY_WRITE ) ;
if ( err )
return err ;
ret = fh_want_write ( fhp ) ;
if ( ret )
return nfserrno ( ret ) ;
2022-07-26 09:45:30 +03:00
inode_lock ( fhp - > fh_dentry - > d_inode ) ;
2023-07-21 17:29:10 +03:00
err = fh_fill_pre_attrs ( fhp ) ;
if ( err ! = nfs_ok )
goto out_unlock ;
2023-01-13 14:49:22 +03:00
ret = __vfs_removexattr_locked ( & nop_mnt_idmap , fhp - > fh_dentry ,
2021-01-21 16:19:28 +03:00
name , NULL ) ;
2023-07-21 17:29:10 +03:00
err = nfsd_xattr_errno ( ret ) ;
2022-07-26 09:45:30 +03:00
fh_fill_post_attrs ( fhp ) ;
2023-07-21 17:29:10 +03:00
out_unlock :
2022-07-26 09:45:30 +03:00
inode_unlock ( fhp - > fh_dentry - > d_inode ) ;
2020-06-24 01:39:23 +03:00
fh_drop_write ( fhp ) ;
2023-07-21 17:29:10 +03:00
return err ;
2020-06-24 01:39:23 +03:00
}
__be32
nfsd_setxattr ( struct svc_rqst * rqstp , struct svc_fh * fhp , char * name ,
void * buf , u32 len , u32 flags )
{
2020-09-11 21:47:48 +03:00
__be32 err ;
int ret ;
2020-06-24 01:39:23 +03:00
err = fh_verify ( rqstp , fhp , 0 , NFSD_MAY_WRITE ) ;
if ( err )
return err ;
ret = fh_want_write ( fhp ) ;
if ( ret )
return nfserrno ( ret ) ;
2022-07-26 09:45:30 +03:00
inode_lock ( fhp - > fh_dentry - > d_inode ) ;
2023-07-21 17:29:10 +03:00
err = fh_fill_pre_attrs ( fhp ) ;
if ( err ! = nfs_ok )
goto out_unlock ;
ret = __vfs_setxattr_locked ( & nop_mnt_idmap , fhp - > fh_dentry ,
name , buf , len , flags , NULL ) ;
2022-07-26 09:45:30 +03:00
fh_fill_post_attrs ( fhp ) ;
2023-07-21 17:29:10 +03:00
err = nfsd_xattr_errno ( ret ) ;
out_unlock :
2022-07-26 09:45:30 +03:00
inode_unlock ( fhp - > fh_dentry - > d_inode ) ;
2020-06-24 01:39:23 +03:00
fh_drop_write ( fhp ) ;
2023-07-21 17:29:10 +03:00
return err ;
2020-06-24 01:39:23 +03:00
}
# endif
2005-04-17 02:20:36 +04:00
/*
* Check for a user ' s access permissions to this inode .
*/
2006-10-20 10:28:58 +04:00
__be32
2007-07-17 15:04:48 +04:00
nfsd_permission ( struct svc_rqst * rqstp , struct svc_export * exp ,
struct dentry * dentry , int acc )
2005-04-17 02:20:36 +04:00
{
2015-03-18 01:25:59 +03:00
struct inode * inode = d_inode ( dentry ) ;
2005-04-17 02:20:36 +04:00
int err ;
2011-04-10 18:35:12 +04:00
if ( ( acc & NFSD_MAY_MASK ) = = NFSD_MAY_NOP )
2005-04-17 02:20:36 +04:00
return 0 ;
#if 0
dprintk ( " nfsd: permission 0x%x%s%s%s%s%s%s%s mode 0%o%s%s%s \n " ,
acc ,
2008-06-16 15:20:29 +04:00
( acc & NFSD_MAY_READ ) ? " read " : " " ,
( acc & NFSD_MAY_WRITE ) ? " write " : " " ,
( acc & NFSD_MAY_EXEC ) ? " exec " : " " ,
( acc & NFSD_MAY_SATTR ) ? " sattr " : " " ,
( acc & NFSD_MAY_TRUNC ) ? " trunc " : " " ,
( acc & NFSD_MAY_LOCK ) ? " lock " : " " ,
( acc & NFSD_MAY_OWNER_OVERRIDE ) ? " owneroverride " : " " ,
2005-04-17 02:20:36 +04:00
inode - > i_mode ,
IS_IMMUTABLE ( inode ) ? " immut " : " " ,
IS_APPEND ( inode ) ? " append " : " " ,
2008-02-16 01:37:56 +03:00
__mnt_is_readonly ( exp - > ex_path . mnt ) ? " ro " : " " ) ;
2005-04-17 02:20:36 +04:00
dprintk ( " owner %d/%d user %d/%d \n " ,
2008-11-14 02:38:58 +03:00
inode - > i_uid , inode - > i_gid , current_fsuid ( ) , current_fsgid ( ) ) ;
2005-04-17 02:20:36 +04:00
# endif
/* Normally we reject any write/sattr etc access on a read-only file
* system . But if it is IRIX doing check on write - access for a
* device special file , we ignore rofs .
*/
2008-06-16 15:20:29 +04:00
if ( ! ( acc & NFSD_MAY_LOCAL_ACCESS ) )
if ( acc & ( NFSD_MAY_WRITE | NFSD_MAY_SATTR | NFSD_MAY_TRUNC ) ) {
2008-02-16 01:37:56 +03:00
if ( exp_rdonly ( rqstp , exp ) | |
__mnt_is_readonly ( exp - > ex_path . mnt ) )
2005-04-17 02:20:36 +04:00
return nfserr_rofs ;
2008-06-16 15:20:29 +04:00
if ( /* (acc & NFSD_MAY_WRITE) && */ IS_IMMUTABLE ( inode ) )
2005-04-17 02:20:36 +04:00
return nfserr_perm ;
}
2008-06-16 15:20:29 +04:00
if ( ( acc & NFSD_MAY_TRUNC ) & & IS_APPEND ( inode ) )
2005-04-17 02:20:36 +04:00
return nfserr_perm ;
2008-06-16 15:20:29 +04:00
if ( acc & NFSD_MAY_LOCK ) {
2005-04-17 02:20:36 +04:00
/* If we cannot rely on authentication in NLM requests,
* just allow locks , otherwise require read permission , or
* ownership
*/
if ( exp - > ex_flags & NFSEXP_NOAUTHNLM )
return 0 ;
else
2008-06-16 15:20:29 +04:00
acc = NFSD_MAY_READ | NFSD_MAY_OWNER_OVERRIDE ;
2005-04-17 02:20:36 +04:00
}
/*
* The file owner always gets access permission for accesses that
* would normally be checked at open time . This is to make
* file access work even when the client has done a fchmod ( fd , 0 ) .
*
* However , ` cp foo bar ' should fail nevertheless when bar is
* readonly . A sensible way to do this might be to reject all
* attempts to truncate a read - only file , because a creat ( ) call
* always implies file truncation .
* . . . but this isn ' t really fair . A process may reasonably call
* ftruncate on an open file descriptor on a file with perm 000.
* We must trust the client to do permission checking - using " ACCESS "
* with NFSv3 .
*/
2008-06-16 15:20:29 +04:00
if ( ( acc & NFSD_MAY_OWNER_OVERRIDE ) & &
2013-02-02 18:53:11 +04:00
uid_eq ( inode - > i_uid , current_fsuid ( ) ) )
2005-04-17 02:20:36 +04:00
return 0 ;
2008-06-16 15:20:29 +04:00
/* This assumes NFSD_MAY_{READ,WRITE,EXEC} == MAY_{READ,WRITE,EXEC} */
2023-01-13 14:49:22 +03:00
err = inode_permission ( & nop_mnt_idmap , inode ,
2021-01-21 16:19:24 +03:00
acc & ( MAY_READ | MAY_WRITE | MAY_EXEC ) ) ;
2005-04-17 02:20:36 +04:00
/* Allow read access to binaries even when mode 111 */
if ( err = = - EACCES & & S_ISREG ( inode - > i_mode ) & &
2011-08-25 18:48:39 +04:00
( acc = = ( NFSD_MAY_READ | NFSD_MAY_OWNER_OVERRIDE ) | |
acc = = ( NFSD_MAY_READ | NFSD_MAY_READ_IF_EXEC ) ) )
2023-01-13 14:49:22 +03:00
err = inode_permission ( & nop_mnt_idmap , inode , MAY_EXEC ) ;
2005-04-17 02:20:36 +04:00
return err ? nfserrno ( err ) : 0 ;
}