52813 Commits

Author SHA1 Message Date
Hans van Kranenburg
92e222df7b btrfs: alloc_chunk: fix DUP stripe size handling
In case of using DUP, we search for enough unallocated disk space on a
device to hold two stripes.

The devices_info[ndevs-1].max_avail that holds the amount of unallocated
space found is directly assigned to stripe_size, while it's actually
twice the stripe size.

Later on in the code, an unconditional division of stripe_size by
dev_stripes corrects the value, but in the meantime there's a check to
see if the stripe_size does not exceed max_chunk_size. Since during this
check stripe_size is twice the amount as intended, the check will reduce
the stripe_size to max_chunk_size if the actual correct to be used
stripe_size is more than half the amount of max_chunk_size.

The unconditional division later tries to correct stripe_size, but will
actually make sure we can't allocate more than half the max_chunk_size.

Fix this by moving the division by dev_stripes before the max chunk size
check, so it always contains the right value, instead of putting a duct
tape division in further on to get it fixed again.

Since in all other cases than DUP, dev_stripes is 1, this change only
affects DUP.

Other attempts in the past were made to fix this:
* 37db63a400 "Btrfs: fix max chunk size check in chunk allocator" tried
to fix the same problem, but still resulted in part of the code acting
on a wrongly doubled stripe_size value.
* 86db25785a "Btrfs: fix max chunk size on raid5/6" unintentionally
broke this fix again.

The real problem was already introduced with the rest of the code in
73c5de0051.

The user visible result however will be that the max chunk size for DUP
will suddenly double, while it's actually acting according to the limits
in the code again like it was 5 years ago.

Reported-by: Naohiro Aota <naohiro.aota@wdc.com>
Link: https://www.spinics.net/lists/linux-btrfs/msg69752.html
Fixes: 73c5de0051 ("btrfs: quasi-round-robin for chunk allocation")
Fixes: 86db25785a ("Btrfs: fix max chunk size on raid5/6")
Signed-off-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ update comment ]
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-01 16:16:47 +01:00
Nikolay Borisov
765f3cebff btrfs: Handle btrfs_set_extent_delalloc failure in relocate_file_extent_cluster
Essentially duplicate the error handling from the above block which
handles the !PageUptodate(page) case and additionally clear
EXTENT_BOUNDARY.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-01 16:16:12 +01:00
Nikolay Borisov
ac01f26a27 btrfs: handle failure of add_pending_csums
add_pending_csums was added as part of the new data=ordered
implementation in e6dcd2dc9c48 ("Btrfs: New data=ordered
implementation"). Even back then it called the btrfs_csum_file_blocks
which can fail but it never bothered handling the failure. In ENOMEM
situation this could lead to the filesystem failing to write the
checksums for a particular extent and not detect this. On read this
could lead to the filesystem erroring out due to crc mismatch. Fix it by
propagating failure from add_pending_csums and handling them.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-01 16:16:00 +01:00
Jeff Mahoney
a8fd1f7174 btrfs: use kvzalloc to allocate btrfs_fs_info
The srcu_struct in btrfs_fs_info scales in size with NR_CPUS.  On
kernels built with NR_CPUS=8192, this can result in kmalloc failures
that prevent mounting.

There is work in progress to try to resolve this for every user of
srcu_struct but using kvzalloc will work around the failures until
that is complete.

As an example with NR_CPUS=512 on x86_64: the overall size of
subvol_srcu is 3460 bytes, fs_info is 6496.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-01 16:15:36 +01:00
Linus Torvalds
c02be2334e Changes since last update:
- Fix some compiler warnings
 - Fix block rservations for transactions created during log recovery
 - Fix resource leaks when respecifying mount options
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCgAGBQJalEvWAAoJEPh/dxk0SrTrdDcQAKgYcpg8Ip6uKNqc38hm79l+
 RXjHQFbpgKiojBzjgI8+lPNsVBEhSSh65rXB6nEVSS/TFS0ONScRbNKrcH9Selkn
 cH1RsuhKk1NvKWaLMMFJWMTZK5Z6cHLtJU2szqnPdCsv/2EqdS6NylyYSFhljtSl
 xbD8vffjnJ1HU9ijZsSoZkiu0DO1yoXYu7EUQkWKPRSb/el+qYIMKSwFC8qQdxrp
 KGPyYH1CENm2jarKXAgTgqmUmrdJ9ikLHLT0sXQPZ9AbOjOoGlZ9Bn0KmvggQFmq
 TyE0je7L2EWrDRJ6R+lcFKC0fHDgo7ec8Iz/CJlOiExSebbNZgtsn0bbPMfq2Rnz
 8IMYPAV+NBY8RQWumgBN2aOjyjV9EUd+TkeJh5aIubyFOE2GtEmHjlR4p0bG9os9
 yOZJv+5JDF09oN1dLDf/xpEwXsHho6KHDYtVqbKhBWfQiw84sAlW9/NwfQOEugJS
 6RXN3LaExSvpFSc9qcGqsrdGvEuMcNLo+XtTwz9g8DNR0Ztp1bFE64xh4KYlNsx5
 QQj256Hx56R7vZ2/DC73MiT/hOSgfPpqnwOZP+Yc7I3DeO65DLwdCt9m2c0f1Odn
 6xi3ZPXuq+QutJEp3iAfj5XXVmwSVgcub8EmtVmyK2fttiK3+SXbZdvo6JqyuEMX
 ZxwBaZkNuYWikHhj5iAp
 =A2Bo
 -----END PGP SIGNATURE-----

Merge tag 'xfs-4.16-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fixes from Darrick Wong:

 - fix some compiler warnings

 - fix block reservations for transactions created during log recovery

 - fix resource leaks when respecifying mount options

* tag 'xfs-4.16-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: fix potential memory leak in mount option parsing
  xfs: reserve blocks for refcount / rmap log item recovery
  xfs: use memset to initialize xfs_scrub_agfl_info
2018-02-28 11:40:51 -08:00
Chengguang Xu
5b4c845ea4 xfs: fix potential memory leak in mount option parsing
When specifying string type mount option (e.g., logdev)
several times in a mount, current option parsing may
cause memory leak. Hence, call kfree for previous one
in this case.

Signed-off-by: Chengguang Xu <cgxu519@icloud.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-02-26 10:02:13 -08:00
Jan Kara
560e7cb2f3 blockdev: Avoid two active bdev inodes for one device
When blkdev_open() races with device removal and creation it can happen
that unhashed bdev inode gets associated with newly created gendisk
like:

CPU0					CPU1
blkdev_open()
  bdev = bd_acquire()
					del_gendisk()
					  bdev_unhash_inode(bdev);
					remove device
					create new device with the same number
  __blkdev_get()
    disk = get_gendisk()
      - gets reference to gendisk of the new device

Now another blkdev_open() will not find original 'bdev' as it got
unhashed, create a new one and associate it with the same 'disk' at
which point problems start as we have two independent page caches for
one device.

Fix the problem by verifying that the bdev inode didn't get unhashed
before we acquired gendisk reference. That way we make sure gendisk can
get associated only with visible bdev inodes.

Tested-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-02-26 09:48:42 -07:00
Jan Kara
897366537f genhd: Fix use after free in __blkdev_get()
When two blkdev_open() calls race with device removal and recreation,
__blkdev_get() can use looked up gendisk after it is freed:

CPU0				CPU1			CPU2
							del_gendisk(disk);
							  bdev_unhash_inode(inode);
blkdev_open()			blkdev_open()
  bdev = bd_acquire(inode);
    - creates and returns new inode
				  bdev = bd_acquire(inode);
				    - returns the same inode
  __blkdev_get(devt)		  __blkdev_get(devt)
    disk = get_gendisk(devt);
      - got structure of device going away
							<finish device removal>
							<new device gets
							 created under the same
							 device number>
				  disk = get_gendisk(devt);
				    - got new device structure
				  if (!bdev->bd_openers) {
				    does the first open
				  }
    if (!bdev->bd_openers)
      - false
    } else {
      put_disk_and_module(disk)
        - remember this was old device - this was last ref and disk is
          now freed
    }
    disk_unblock_events(disk); -> oops

Fix the problem by making sure we drop reference to disk in
__blkdev_get() only after we are really done with it.

Reported-by: Hou Tao <houtao1@huawei.com>
Tested-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-02-26 09:48:42 -07:00
Jan Kara
9df6c29912 genhd: Add helper put_disk_and_module()
Add a proper counterpart to get_disk_and_module() -
put_disk_and_module(). Currently it is opencoded in several places.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-02-26 09:48:42 -07:00
Jan Kara
d9c10e5b88 direct-io: Fix sleep in atomic due to sync AIO
Commit e864f39569f4 "fs: add RWF_DSYNC aand RWF_SYNC" added additional
way for direct IO to become synchronous and thus trigger fsync from the
IO completion handler. Then commit 9830f4be159b "fs: Use RWF_* flags for
AIO operations" allowed these flags to be set for AIO as well. However
that commit forgot to update the condition checking whether the IO
completion handling should be defered to a workqueue and thus AIO DIO
with RWF_[D]SYNC set will call fsync() from IRQ context resulting in
sleep in atomic.

Fix the problem by checking directly iocb flags (the same way as it is
done in dio_complete()) instead of checking all conditions that could
lead to IO being synchronous.

CC: Christoph Hellwig <hch@lst.de>
CC: Goldwyn Rodrigues <rgoldwyn@suse.com>
CC: stable@vger.kernel.org
Reported-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Fixes: 9830f4be159b29399d107bffb99e0132bc5aedd4
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-02-26 09:05:35 -07:00
Vivek Goyal
d1fe96c0e4 ovl: redirect_dir=nofollow should not follow redirect for opaque lower
redirect_dir=nofollow should not follow a redirect. But in a specific
configuration it can still follow it.  For example try this.

$ mkdir -p lower0 lower1/foo upper work merged
$ touch lower1/foo/lower-file.txt
$ setfattr -n "trusted.overlay.opaque" -v "y" lower1/foo
$ mount -t overlay -o lowerdir=lower1:lower0,workdir=work,upperdir=upper,redirect_dir=on none merged
$ cd merged
$ mv foo foo-renamed
$ umount merged

# mount again. This time with redirect_dir=nofollow
$ mount -t overlay -o lowerdir=lower1:lower0,workdir=work,upperdir=upper,redirect_dir=nofollow none merged
$ ls merged/foo-renamed/
# This lists lower-file.txt, while it should not have.

Basically, we are doing redirect check after we check for d.stop. And
if this is not last lower, and we find an opaque lower, d.stop will be
set.

ovl_lookup_single()
        if (!d->last && ovl_is_opaquedir(this)) {
                d->stop = d->opaque = true;
                goto out;
        }

To fix this, first check redirect is allowed. And after that check if
d.stop has been set or not.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Fixes: 438c84c2f0c7 ("ovl: don't follow redirects if redirect_dir=off")
Cc: <stable@vger.kernel.org> #v4.15
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-02-26 16:55:51 +01:00
Chengguang Xu
18106734b5 ceph: fix dentry leak when failing to init debugfs
When failing from ceph_fs_debugfs_init() in ceph_real_mount(),
there is lack of dput of root_dentry and it causes slab errors,
so change the calling order of ceph_fs_debugfs_init() and
open_root_dentry() and do some cleanups to avoid this issue.

Signed-off-by: Chengguang Xu <cgxu519@icloud.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-02-26 16:20:07 +01:00
Chengguang Xu
937441f3a3 libceph, ceph: avoid memory leak when specifying same option several times
When parsing string option, in order to avoid memory leak we need to
carefully free it first in case of specifying same option several times.

Signed-off-by: Chengguang Xu <cgxu519@icloud.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-02-26 16:19:30 +01:00
Zhi Zhang
6ef0bc6dde ceph: flush dirty caps of unlinked inode ASAP
Client should release unlinked inode from its cache ASAP. But client
can't release inode with dirty caps.

Link: http://tracker.ceph.com/issues/22886
Signed-off-by: Zhi Zhang <zhang.david2011@gmail.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-02-26 16:19:16 +01:00
Fengguang Wu
b5095f24e7 ovl: fix ptr_ret.cocci warnings
fs/overlayfs/export.c:459:10-16: WARNING: PTR_ERR_OR_ZERO can be used

 Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR

Generated by: scripts/coccinelle/api/ptr_ret.cocci

Fixes: 4b91c30a5a19 ("ovl: lookup connected ancestor of dir in inode cache")
CC: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-02-26 12:45:20 +01:00
Linus Torvalds
c89be52426 NFS client bugfixes for Linux 4.16
Hightlights include:
 - Fix a broken cast in nfs4_callback_recallany()
 - Fix an Oops during NFSv4 migration events
 - make struct nlmclnt_fl_close_lock_ops static
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJakuNIAAoJEGcL54qWCgDykYYQAITHRrWP7tQ6aSpZxW5+Un5z
 6K3RRbfxFjHWVyaePCBzRMOtPTA/puqO0ggx9H+2D4+u2GeXhFl7FMdIuLueGKrc
 rh0wzB6+KiHvqK8NT3g4c2VzZbGJ8IWB6jlNaA3ZyHRJcO+Oi3rQhYBNZpVqP6ny
 M1C3yXQTUtA13aOLjeThoAKIJyknwdZcsiMTptJslvSsQ9PL0w6m6jZKrVHu6Rc+
 Hg12FFptaKien/gj2IUYJb6Z2Mz3arJu1Y7cm1P/zH/NBs37ynMUsrb9AvPbzvRm
 PvPRT4ugNOlTgDTaIT2JHwP2bhlp2JF+Tdzq7WYE3ek2CEPUD49jv07MlgTHxy/w
 +tcp/322ZCxvKLjHTeEWqGn4T0TZ1TdPrd4dIJsjox9Ffy72Z3rOjvLKt5UrRV0i
 8IhiE3/ruHFpB75Yfi7ABEIH8aEwwmchQTf5bth0ZKoZdaEPHmy3xnJkxa+wlD1V
 Hp6KoqMNlDXEFx2Ih/SD6j50MFKszq6+cjUk2D8iclLnelXhu9iddFQ+PFNfsxVZ
 WSo4AWZoPbtbLnn9Ez9dkdsJILKv86LbEvYxLX6/LnxLzzX70E34tbRQa+drVeR3
 Na6czRpld85juKWgiFkzNx+zD4TaBAxUMbQH2Gbwngiz5RoT6SnkkdkAK3EW7nhR
 pIJgZGiq/NG3NLiCxIgs
 =l2jv
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-4.16-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:

 - fix a broken cast in nfs4_callback_recallany()

 - fix an Oops during NFSv4 migration events

 - make struct nlmclnt_fl_close_lock_ops static

* tag 'nfs-for-4.16-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFS: make struct nlmclnt_fl_close_lock_ops static
  nfs: system crashes after NFS4ERR_MOVED recovery
  NFSv4: Fix broken cast in nfs4_callback_recallany()
2018-02-25 13:43:18 -08:00
Will Deacon
8cc07c808c fs: dcache: Use READ_ONCE when accessing i_dir_seq
i_dir_seq is subject to concurrent modification by a cmpxchg or
store-release operation, so ensure that the relaxed access in
d_alloc_parallel uses READ_ONCE.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-02-25 12:51:10 -05:00
Will Deacon
015555fd4d fs: dcache: Avoid livelock between d_alloc_parallel and __d_add
If d_alloc_parallel runs concurrently with __d_add, it is possible for
d_alloc_parallel to continuously retry whilst i_dir_seq has been
incremented to an odd value by __d_add:

CPU0:
__d_add
	n = start_dir_add(dir);
		cmpxchg(&dir->i_dir_seq, n, n + 1) == n

CPU1:
d_alloc_parallel
retry:
	seq = smp_load_acquire(&parent->d_inode->i_dir_seq) & ~1;
	hlist_bl_lock(b);
		bit_spin_lock(0, (unsigned long *)b); // Always succeeds

CPU0:
	__d_lookup_done(dentry)
		hlist_bl_lock
			bit_spin_lock(0, (unsigned long *)b); // Never succeeds

CPU1:
	if (unlikely(parent->d_inode->i_dir_seq != seq)) {
		hlist_bl_unlock(b);
		goto retry;
	}

Since the simple bit_spin_lock used to implement hlist_bl_lock does not
provide any fairness guarantees, then CPU1 can starve CPU0 of the lock
and prevent it from reaching end_dir_add(dir), therefore CPU1 cannot
exit its retry loop because the sequence number always has the bottom
bit set.

This patch resolves the livelock by not taking hlist_bl_lock in
d_alloc_parallel if the sequence counter is odd, since any subsequent
masked comparison with i_dir_seq will fail anyway.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Reported-by: Naresh Madhusudana <naresh.madhusudana@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-02-25 12:51:09 -05:00
Al Viro
3b82140963 lock_parent() needs to recheck if dentry got __dentry_kill'ed under it
In case when dentry passed to lock_parent() is protected from freeing only
by the fact that it's on a shrink list and trylock of parent fails, we
could get hit by __dentry_kill() (and subsequent dentry_kill(parent))
between unlocking dentry and locking presumed parent.  We need to recheck
that dentry is alive once we lock both it and parent *and* postpone
rcu_read_unlock() until after that point.  Otherwise we could return
a pointer to struct dentry that already is rcu-scheduled for freeing, with
->d_lock held on it; caller's subsequent attempt to unlock it can end
up with memory corruption.

Cc: stable@vger.kernel.org # 3.12+, counting backports
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-02-23 20:47:17 -05:00
Linus Torvalds
bae6cfe8a3 Merge branch 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull siginfo fix from Eric Biederman:
 "This fixes a build error that only shows up on blackfin"

* 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  fs/signalfd: fix build error for BUS_MCEERR_AR
2018-02-22 17:04:06 -08:00
Darrick J. Wong
b31c2bdcd8 xfs: reserve blocks for refcount / rmap log item recovery
During log recovery, the per-AG reservations aren't yet set up, so log
recovery has to reserve enough blocks to handle all possible btree
splits.

Reported-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-02-22 14:41:25 -08:00
Eric Sandeen
86516eff3b xfs: use memset to initialize xfs_scrub_agfl_info
Apparently different gcc versions have competing and
incompatible notions of how to initialize at declaration,
so just give up and fall back to the time-tested memset().

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-02-22 14:41:25 -08:00
Randy Dunlap
9026e820cb fs/signalfd: fix build error for BUS_MCEERR_AR
Fix build error in fs/signalfd.c by using same method that is used in
kernel/signal.c: separate blocks for different signal si_code values.

./fs/signalfd.c: error: 'BUS_MCEERR_AR' undeclared (first use in this function)

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2018-02-22 15:00:07 -06:00
Luck, Tony
bef3efbeb8 efivarfs: Limit the rate for non-root to read files
Each read from a file in efivarfs results in two calls to EFI
(one to get the file size, another to get the actual data).

On X86 these EFI calls result in broadcast system management
interrupts (SMI) which affect performance of the whole system.
A malicious user can loop performing reads from efivarfs bringing
the system to its knees.

Linus suggested per-user rate limit to solve this.

So we add a ratelimit structure to "user_struct" and initialize
it for the root user for no limit. When allocating user_struct for
other users we set the limit to 100 per second. This could be used
for other places that want to limit the rate of some detrimental
user action.

In efivarfs if the limit is exceeded when reading, we take an
interruptible nap for 50ms and check the rate limit again.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-22 10:21:02 -08:00
Colin Ian King
1b72040645 NFS: make struct nlmclnt_fl_close_lock_ops static
The structure nlmclnt_fl_close_lock_ops s local to the source and does
not need to be in global scope, so make it static.

Cleans up sparse warning:
fs/nfs/nfs3proc.c:876:33: warning: symbol 'nlmclnt_fl_close_lock_ops' was not
declared. Should it be static?

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2018-02-22 12:23:01 -05:00
Bill.Baker@oracle.com
ad86f605c5 nfs: system crashes after NFS4ERR_MOVED recovery
nfs4_update_server unconditionally releases the nfs_client for the
source server. If migration fails, this can cause the source server's
nfs_client struct to be left with a low reference count, resulting in
use-after-free.  Also, adjust reference count handling for ELOOP.

NFS: state manager: migration failed on NFSv4 server nfsvmu10 with error 6
WARNING: CPU: 16 PID: 17960 at fs/nfs/client.c:281 nfs_put_client+0xfa/0x110 [nfs]()
	nfs_put_client+0xfa/0x110 [nfs]
	nfs4_run_state_manager+0x30/0x40 [nfsv4]
	kthread+0xd8/0xf0

BUG: unable to handle kernel NULL pointer dereference at 00000000000002a8
	nfs4_xdr_enc_write+0x6b/0x160 [nfsv4]
	rpcauth_wrap_req+0xac/0xf0 [sunrpc]
	call_transmit+0x18c/0x2c0 [sunrpc]
	__rpc_execute+0xa6/0x490 [sunrpc]
	rpc_async_schedule+0x15/0x20 [sunrpc]
	process_one_work+0x160/0x470
	worker_thread+0x112/0x540
	? rescuer_thread+0x3f0/0x3f0
	kthread+0xd8/0xf0

This bug was introduced by 32e62b7c ("NFS: Add nfs4_update_server"),
but the fix applies cleanly to 52442f9b ("NFS4: Avoid migration loops")

Reported-by: Helen Chao <helen.chao@oracle.com>
Fixes: 52442f9b11b7 ("NFS4: Avoid migration loops")
Signed-off-by: Bill Baker <bill.baker@oracle.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2018-02-22 12:17:42 -05:00
Trond Myklebust
6d243a2356 NFSv4: Fix broken cast in nfs4_callback_recallany()
Passing a pointer to a unsigned integer to test_bit() is broken.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2018-02-21 16:35:50 -05:00
Linus Torvalds
da370f1d63 for-4.16-rc1-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAlqG8poACgkQxWXV+ddt
 WDuHSA//eC+69XpHwohI6pcPQ7Jbr9UCj1L/Gt0U96YSzijGW4Hv3OQEWLIRBu4c
 nZbzQYtUunpguLYwfXgUUgXRHBTo2Y5bXZNmF2MtL7JcPOLhLh4h/IcGY7eRd2Vq
 qvv2bqr3yAcQo7s6z5U/D8ulohzHQTxG7Jaq/BkVxQqhvu+vdu/9T8ikAWnmSTjw
 lONu8soR5QO7tewxz23Cguw/t1bWe1aMXG9Ykd4avyhQHtgzNE+l82i4DYUhK2CM
 x8M5/CxnDLPe73IJuA2INCUtpPvR4Qufi5Nz6EN3BrJNCGBkmg18sPIvWlH6LsVh
 bsm4Lwz/piq+hkDq2GG+Z79uiGAfCVUWAsnm7yYHwpVyMvwHKlfrcVSAuRZixw5E
 /NZ0JEkEOtvzpv4inZFYbAgD+oKfvYvwj9BW5BXfu2aH6hJBImfAeMSd1aHB3uZI
 kGgy52k2v2P3WKQOFUbmW417P05DvvGmRvRmU+tSFpB+lXAZqRzoiVIuFm0xwhf1
 1SmnYgnSYzPmzIRXAMsSYQeK/8NXDdMZMutaw/AYwX+QBEdIAErf6MWcjI6XZRyG
 g8Gr8JcpwSa+H5/LKN5uswfXxfSAsqVHnZhbOVrjyGX0wyR4KJg3ag3KsHd9SCxb
 LDEjPSYEDn9yfmw6pK2Q6J26FGYiKpuUXaNiYVNymGe6162IiBM=
 =VeA/
 -----END PGP SIGNATURE-----

Merge tag 'for-4.16-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
 "We have a few assorted fixes, some of them show up during fstests so I
  gave them more testing"

* tag 'for-4.16-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: Fix use-after-free when cleaning up fs_devs with a single stale device
  Btrfs: fix null pointer dereference when replacing missing device
  btrfs: remove spurious WARN_ON(ref->count < 0) in find_parent_nodes
  btrfs: Ignore errors from btrfs_qgroup_trace_extent_post
  Btrfs: fix unexpected -EEXIST when creating new inode
  Btrfs: fix use-after-free on root->orphan_block_rsv
  Btrfs: fix btrfs_evict_inode to handle abnormal inodes correctly
  Btrfs: fix extent state leak from tree log
  Btrfs: fix crash due to not cleaning up tree log block's dirty bits
  Btrfs: fix deadlock in run_delalloc_nocow
2018-02-16 09:26:18 -08:00
Amir Goldstein
7168179fcf ovl: check ERR_PTR() return value from ovl_lookup_real()
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 061701540349 ("ovl: lookup indexed ancestor of lower dir")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-02-16 15:53:20 +01:00
Amir Goldstein
2ca3c148a0 ovl: check lower ancestry on encode of lower dir file handle
This change relaxes copy up on encode of merge dir with lower layer > 1
and handles the case of encoding a merge dir with lower layer 1, where an
ancestor is a non-indexed merge dir. In that case, decode of the lower
file handle will not have been possible if the non-indexed ancestor is
redirected before or after encode.

Before encoding a non-upper directory file handle from real layer N, we
need to check if it will be possible to reconnect an overlay dentry from
the real lower decoded dentry. This is done by following the overlay
ancestry up to a "layer N connected" ancestor and verifying that all
parents along the way are "layer N connectable". If an ancestor that is
NOT "layer N connectable" is found, we need to copy up an ancestor, which
is "layer N connectable", thus making that ancestor "layer N connected".
For example:

 layer 1: /a
 layer 2: /a/b/c

The overlay dentry /a is NOT "layer 2 connectable", because if dir /a is
copied up and renamed, upper dir /a will be indexed by lower dir /a from
layer 1. The dir /a from layer 2 will never be indexed, so the algorithm
in ovl_lookup_real_ancestor() (*) will not be able to lookup a connected
overlay dentry from the connected lower dentry /a/b/c.

To avoid this problem on decode time, we need to copy up an ancestor of
/a/b/c, which is "layer 2 connectable", on encode time. That ancestor is
/a/b. After copy up (and index) of /a/b, it will become "layer 2 connected"
and when the time comes to decode the file handle from lower dentry /a/b/c,
ovl_lookup_real_ancestor() will find the indexed ancestor /a/b and decoding
a connected overlay dentry will be accomplished.

(*) the algorithm in ovl_lookup_real_ancestor() can be improved to lookup
an entry /a in the lower layers above layer N and find the indexed dir /a
from layer 1. If that improvement is made, then the check for "layer N
connected" will need to verify there are no redirects in lower layers above
layer N. In the example above, /a will be "layer 2 connectable". However,
if layer 2 dir /a is a target of a layer 1 redirect, then /a will NOT be
"layer 2 connectable":

 layer 1: /A (redirect = /a)
 layer 2: /a/b/c

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-02-16 15:53:20 +01:00
Amir Goldstein
764baba801 ovl: hash non-dir by lower inode for fsnotify
Commit 31747eda41ef ("ovl: hash directory inodes for fsnotify")
fixed an issue of inotify watch on directory that stops getting
events after dropping dentry caches.

A similar issue exists for non-dir non-upper files, for example:

$ mkdir -p lower upper work merged
$ touch lower/foo
$ mount -t overlay -o
lowerdir=lower,workdir=work,upperdir=upper none merged
$ inotifywait merged/foo &
$ echo 2 > /proc/sys/vm/drop_caches
$ cat merged/foo

inotifywait doesn't get the OPEN event, because ovl_lookup() called
from 'cat' allocates a new overlay inode and does not reuse the
watched inode.

Fix this by hashing non-dir overlay inodes by lower real inode in
the following cases that were not hashed before this change:
 - A non-upper overlay mount
 - A lower non-hardlink when index=off

A helper ovl_hash_bylower() was added to put all the logic and
documentation about which real inode an overlay inode is hashed by
into one place.

The issue dates back to initial version of overlayfs, but this
patch depends on ovl_inode code that was introduced in kernel v4.13.

Cc: <stable@vger.kernel.org> #v4.13
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-02-16 15:53:20 +01:00
Linus Torvalds
e525de3ab0 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Misc fixes all across the map:

   - /proc/kcore vsyscall related fixes
   - LTO fix
   - build warning fix
   - CPU hotplug fix
   - Kconfig NR_CPUS cleanups
   - cpu_has() cleanups/robustification
   - .gitignore fix
   - memory-failure unmapping fix
   - UV platform fix"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm, mm/hwpoison: Don't unconditionally unmap kernel 1:1 pages
  x86/error_inject: Make just_return_func() globally visible
  x86/platform/UV: Fix GAM Range Table entries less than 1GB
  x86/build: Add arch/x86/tools/insn_decoder_test to .gitignore
  x86/smpboot: Fix uncore_pci_remove() indexing bug when hot-removing a physical CPU
  x86/mm/kcore: Add vsyscall page to /proc/kcore conditionally
  vfs/proc/kcore, x86/mm/kcore: Fix SMAP fault when dumping vsyscall user page
  x86/Kconfig: Further simplify the NR_CPUS config
  x86/Kconfig: Simplify NR_CPUS config
  x86/MCE: Fix build warning introduced by "x86: do not use print_symbol()"
  x86/cpufeature: Update _static_cpu_has() to use all named variables
  x86/cpufeature: Reindent _static_cpu_has()
2018-02-14 17:31:51 -08:00
Linus Torvalds
6556677a80 Fix regressions in patch Implement iomap for block_map
This tag is meant for pulling a patch called gfs2: Fixes to
 "Implement iomap for block_map". The patch fixes some
 regressions we recently discovered in commit 3974320ca6.
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJahFiJAAoJENeLYdPf93o7JKYH/irlIZM7NPHhiOcot1lXG6HL
 x1fV9u6Rjw7QimctgM6ks1lu/R7hamNvOCAPz7TFXIo0grWes2qOcZa7tdWqkpZK
 TGmSIv+NfrI9NzB3PwleImClfHR8SOgIh/ZlvHQWu9JvKkPlZ3Ik0mZCXbzUFn0I
 Q5ebe+yvaaGeU3QUzsdBgTWuYRE0uQfIylyTz7f8wc9PDp2zB2l01CCCbat/VEWe
 Jy1HlXSiQsmR0N5ypm5d3AszXJ0zbHfjQzKpNACP59WrRjnKvxsBan7En5pQBFnP
 lhLWClqxgtXlvmSb4Takw+Cu9aS2zCYizQ8eqecX5FKQp1Vufoxs48EqRnq55IY=
 =vJqP
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-4.16.rc1.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 fix from Bob Peterson:
 "Fix regressions in the gfs2 iomap for block_map implementation we
  recently discovered in commit 3974320ca6"

* tag 'gfs2-4.16.rc1.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: Fixes to "Implement iomap for block_map"
2018-02-14 10:14:59 -08:00
Andreas Gruenbacher
49edd5bf42 gfs2: Fixes to "Implement iomap for block_map"
It turns out that commit 3974320ca6 "Implement iomap for block_map"
introduced a few bugs that trigger occasional failures with xfstest
generic/476:

In gfs2_iomap_begin, we jump to do_alloc when we determine that we are
beyond the end of the allocated metadata (height > ip->i_height).
There, we can end up calling hole_size with a metapath that doesn't
match the current metadata tree, which doesn't make sense.  After
untangling the code at do_alloc, fix this by checking if the block we
are looking for is within the range of allocated metadata.

In addition, add a BUG() in case gfs2_iomap_begin is accidentally called
for reading stuffed files: this is handled separately.  Make sure we
don't truncate iomap->length for reads beyond the end of the file; in
that case, the entire range counts as a hole.

Finally, revert to taking a bitmap write lock when doing allocations.
It's unclear why that change didn't lead to any failures during testing.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-02-13 13:38:10 -07:00
Jia Zhang
595dd46ebf vfs/proc/kcore, x86/mm/kcore: Fix SMAP fault when dumping vsyscall user page
Commit:

  df04abfd181a ("fs/proc/kcore.c: Add bounce buffer for ktext data")

... introduced a bounce buffer to work around CONFIG_HARDENED_USERCOPY=y.
However, accessing the vsyscall user page will cause an SMAP fault.

Replace memcpy() with copy_from_user() to fix this bug works, but adding
a common way to handle this sort of user page may be useful for future.

Currently, only vsyscall page requires KCORE_USER.

Signed-off-by: Jia Zhang <zhang.jia@linux.alibaba.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: jolsa@redhat.com
Link: http://lkml.kernel.org/r/1518446694-21124-2-git-send-email-zhang.jia@linux.alibaba.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-13 09:15:58 +01:00
Linus Torvalds
a9a08845e9 vfs: do bulk POLL* -> EPOLL* replacement
This is the mindless scripted replacement of kernel use of POLL*
variables as described by Al, done by this script:

    for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
        L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
        for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
    done

with de-mangling cleanups yet to come.

NOTE! On almost all architectures, the EPOLL* constants have the same
values as the POLL* constants do.  But they keyword here is "almost".
For various bad reasons they aren't the same, and epoll() doesn't
actually work quite correctly in some cases due to this on Sparc et al.

The next patch from Al will sort out the final differences, and we
should be all done.

Scripted-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-11 14:34:03 -08:00
Linus Torvalds
ee5daa1361 Merge branch 'work.poll2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull more poll annotation updates from Al Viro:
 "This is preparation to solving the problems you've mentioned in the
  original poll series.

  After this series, the kernel is ready for running

      for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
            L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
            for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
      done

  as a for bulk search-and-replace.

  After that, the kernel is ready to apply the patch to unify
  {de,}mangle_poll(), and then get rid of kernel-side POLL... uses
  entirely, and we should be all done with that stuff.

  Basically, that's what you suggested wrt KPOLL..., except that we can
  use EPOLL... instead - they already are arch-independent (and equal to
  what is currently kernel-side POLL...).

  After the preparations (in this series) switch to returning EPOLL...
  from ->poll() instances is completely mechanical and kernel-side
  POLL... can go away. The last step (killing kernel-side POLL... and
  unifying {de,}mangle_poll() has to be done after the
  search-and-replace job, since we need userland-side POLL... for
  unified {de,}mangle_poll(), thus the cherry-pick at the last step.

  After that we will have:

   - POLL{IN,OUT,...} *not* in __poll_t, so any stray instances of
     ->poll() still using those will be caught by sparse.

   - eventpoll.c and select.c warning-free wrt __poll_t

   - no more kernel-side definitions of POLL... - userland ones are
     visible through the entire kernel (and used pretty much only for
     mangle/demangle)

   - same behavior as after the first series (i.e. sparc et.al. epoll(2)
     working correctly)"

* 'work.poll2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  annotate ep_scan_ready_list()
  ep_send_events_proc(): return result via esed->res
  preparation to switching ->poll() to returning EPOLL...
  add EPOLLNVAL, annotate EPOLL... and event_poll->event
  use linux/poll.h instead of asm/poll.h
  xen: fix poll misannotation
  smc: missing poll annotations
2018-02-11 13:57:19 -08:00
Linus Torvalds
878e66d06f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc vfs fixes from Al Viro.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  seq_file: fix incomplete reset on read from zero offset
  kernfs: fix regression in kernfs_fop_write caused by wrong type
2018-02-09 19:22:17 -08:00
Linus Torvalds
a28348322f 4.16 minor SMB3 fixes
-----BEGIN PGP SIGNATURE-----
 
 iQGcBAABAgAGBQJafSqsAAoJEIosvXAHck9RHmEMAJyzkwc503WOl9/ZyagcaDli
 4mJEplVgxL6ZcgmaPZrZ1qaZvHd0JWq5bDbPeuuNv+wyqIu14DYVHivaORswfI7y
 Q0p0gslWf+hyS637CcmBajgEZbgAZIAkUktC+KPa7lZcUFDEvgYwHnQNuK3yvhBR
 zRrWeiumWn4l25ahc8GBA5nZ7tDM5xkLpv8DfI0ycCbm5E+Bqnf23m13hTMT7Mt3
 4hBc6iEdi+/IcRkwf5BHEO94hNeWSb4oERLIWxXXkZ3XTSlYtJteV/pdIoJfhHnr
 Th453VUwPfkRVVw3h4feZaIKM6kGPStGg1435+6lBpgTWQgNImd/Kcg3d181U3rs
 /+iORX2KLwwl6orVQnX5IBiUpnB2+ePpRGjMAGedIPSztMVInGInxxT1UZQtMCIg
 fJ6PQ1eH/OlY7WiY16+3YBYvtWPPqJc98P7gyfDocne7ZoT0XkoQ+2YejaNzI2Sz
 8Qkw6Y8gLSQ8tC2duV14evlLmynbB1qRL9n99iD06w==
 =Ps35
 -----END PGP SIGNATURE-----

Merge tag '4.16-minor-rc-SMB3-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
 "There are a couple additional security fixes that are still being
  tested that are not in this set."

* tag '4.16-minor-rc-SMB3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  Add missing structs and defines from recent SMB3.1.1 documentation
  address lock imbalance warnings in smbdirect.c
  cifs: silence compiler warnings showing up with gcc-8.0.0
  Add some missing debug fields in server and tcon structs
2018-02-09 14:42:57 -08:00
Linus Torvalds
f1517df870 This request is late, apologies.
But it's also a fairly small update this time around.  Some cleanup,
 RDMA fixes, overlayfs fixes, and a fix for an NFSv4 state bug.
 
 The bigger deal for nfsd this time around is Jeff Layton's
 already-merged i_version patches.  This series has a minor conflict with
 that one, and the resolution should be obvious.  (Stephen Rothwell has
 been carrying it in linux-next for what it's worth.)
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJafNVvAAoJECebzXlCjuG+yZUP/2SctFtkW638z9frLcIVt5M6
 x5hluw5jtFrVqq/KoMwi7rVaMzhdvcgwwfaLciqrPCOmcMKlOqiWslyCV0wZVCZS
 jabkOeinKVAyPTlESesNyArWKBWaB8QaYDwbkQ5Y76U9Ma5gwSghS1wc8vrNduZY
 2StieESOiOs9LljXf5SqCC5nN9s7gs4qtCK7aZ3JIt4661Lh39LqyO5zxLnc78eL
 USnJKHjTSreY2Vd1/TdNWyZhiim43wdrB+jpy6IoocTqyhYalkCz1iYdJn1arqtP
 iIddPpczKxkHekFVj7/Kfa+ATFtdXIpivOBhhOT0oY8HukTd58bh/oUMrFt4BSuP
 MQst0R9h1sanBE18XBPlXuIK51sm3AjjOGaQycl/Mzes+dMRgIP/KspAcnwwXHqG
 gyZsF3VzliFTc9s0SyiAz2AxNTUnjd+LV3E0DUeivURa6V3pc+sFlQzi8PRxRaep
 0gmhYcZsfwdDKZ/kbQyQdSWN48NxOLFke4fYjmoUtoyILa0NAHEqafeJkR5EiRTm
 tZsL9H/3THEGWygYlXGGBo/J4w5jE3uL/8KkfeuZefzSo0Ujqu0pBALMTnGFLKRx
 Mpw7JEqfUwqIVZ0Qh6q9yIcjr89qWv96UpBqRRIkFX5zOPN7B1BH8C89g8qy3Hyt
 gm/5BTw4FPE0uAM9Nhsd
 =icEX
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-4.16' of git://linux-nfs.org/~bfields/linux

Pull nfsd update from Bruce Fields:
 "A fairly small update this time around. Some cleanup, RDMA fixes,
  overlayfs fixes, and a fix for an NFSv4 state bug.

  The bigger deal for nfsd this time around was Jeff Layton's
  already-merged i_version patches"

* tag 'nfsd-4.16' of git://linux-nfs.org/~bfields/linux:
  svcrdma: Fix Read chunk round-up
  NFSD: hide unused svcxdr_dupstr()
  nfsd: store stat times in fill_pre_wcc() instead of inode times
  nfsd: encode stat->mtime for getattr instead of inode->i_mtime
  nfsd: return RESOURCE not GARBAGE_ARGS on too many ops
  nfsd4: don't set lock stateid's sc_type to CLOSED
  nfsd: Detect unhashed stids in nfsd4_verify_open_stid()
  sunrpc: remove dead code in svc_sock_setbufsize
  svcrdma: Post Receives in the Receive completion handler
  nfsd4: permit layoutget of executable-only files
  lockd: convert nlm_rqst.a_count from atomic_t to refcount_t
  lockd: convert nlm_lockowner.count from atomic_t to refcount_t
  lockd: convert nsm_handle.sm_count from atomic_t to refcount_t
2018-02-08 15:18:32 -08:00
Linus Torvalds
a0f79386a4 Mostly cleanups, but three bug fixes:
1. don't pass garbage return codes back up the call chain (Mike Marshall)
 
  2. fix stale inode test (Martin Brandenburg)
 
  3. fix off-by-one errors (Xiongfeng Wang)
 
 Also: add Martin as a reviewer in the Maintainers file.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJaejneAAoJEM9EDqnrzg2+XhoQAIDF112mOwLwqDPmr4ty0g6/
 gBcoHOrRFlYWPlS5aubjoZ3jFX2fAeNuHzYS4LIuqVKUdsC+oTKQ2URJ7KKpvLiK
 6zOaz2Y4GLns2sa1ZUKli6nEBbPi6uwoF54FNbwt3b+97wpmJwlnXm9ztyt5REKA
 zOHvLgJAcfGNZEJ7gyB1zjwllu4JeD0A4MoN4vJCtkKLAaNClywu4+V0jwZB+SSN
 8QjDXNqkcD31ahWhQ/CaU4zXlxOOV+4ZR7/p5IKT693hEhV+ikTvmXy8g0+bksxj
 L+FHmQMTO+GqCS5FxuBQd3v1IP5FkoHEmAwvr3C5aMlRAaVJ9eVVIZaC9CpOJBRB
 S/CiaG2Mw8vx8VGOm8O93Z+xDi9tCYP8x4i7b5r62h0T9wSyHJSkSIUd6VIkCV9Q
 c92bX/N3wHBvCPT+RC898plni5HsFpzs3vSs8hiaAICgp64sC8pIqVlZOAdMtJd8
 RL4la/Fited/T+3BpaCTkmnvNk8Ktax7wHYsCt4gSyHN8WRvkzowgC5kV6S30Qlh
 zfoXG0K50FcU8T5r3i8slvUHmsiyYxYwJIk/z1iDgXI7y4IIR6FGDxQmw5TxgNS7
 +veTo6FCxon6QshtpAOeELCau7qNXhtlDdGqqm4+gDfMWoCn0Jem/LzdA2gPXCOr
 iCDwHLiu6WXt7ZHTrgln
 =xrih
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux

Pull orangefs updates from Mike Marshall:
 "Mostly cleanups, but three bug fixes:

   - don't pass garbage return codes back up the call chain (Mike
     Marshall)

   - fix stale inode test (Martin Brandenburg)

   - fix off-by-one errors (Xiongfeng Wang)

  Also add Martin as a reviewer in the Maintainers file"

* tag 'for-linus-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
  orangefs: reverse sense of is-inode-stale test in d_revalidate
  orangefs: simplify orangefs_inode_is_stale
  Orangefs: don't propogate whacky error codes
  orangefs: use correct string length
  orangefs: make orangefs_make_bad_inode static
  orangefs: remove ORANGEFS_KERNEL_DEBUG
  orangefs: remove gossip_ldebug and gossip_lerr
  orangefs: make orangefs_client_debug_init static
  MAINTAINERS: update orangefs list and add myself as reviewer
2018-02-08 12:20:41 -08:00
Linus Torvalds
81153336eb AFS development
-----BEGIN PGP SIGNATURE-----
 
 iQIVAwUAWnx0Mvu3V2unywtrAQI3ng//Xdv2rxVjv4znzekb/EkE9QIakH3ET3wt
 hBewQjaGkOWhZKgyE7DnhCMh7y6OrX/oVNtjPU8H7EEHDHVs+nyoGoDu282jlppr
 qO7yMbxZwDtpja7O9hVtIViFZSqlEey/RCq1KKRUl/HDmyyOmAvOZHCpyowUqcYD
 KqJs9Z2/onkP43rwmoKIQPEeKHxRfAs6pTiAG7fUPYC4d6aSskiN5K65N0g4dx4F
 G6pDC/mIJWx2qeeI//CzSxnqhzWAhkozOs9UtvquSrIoNcYMSOQRHGne50n7OqkK
 rZCttm4gSlrEU11cPDNExjKU4z8UM3tmVdudntC8wbng5PFCHTR7JB5nZu1bEjqw
 TpIjb302QnUefzu1AGge03ZnysqDKKBAxKKwD1gYBHaj7Y2CrqP4lo+6QA4ePYTv
 qD7nRZCiQ8rF3PJOYJ7xe944Jziktf6PhnOXyxOSNCv3IT90YD7meOR3MldMjny/
 hM2ahYqfWXjLAjH20Q+B8z7ab9GDdVsBTl06w/ZX+RMrg5CNdDaYe0nfG/tS7H3A
 oD7xIjUwWjqxMBqtXNUe/3GAOnU+ilEiKjq8gmNkBSjRlpO6SMxi02jOp66HwnRs
 tD5qG3Bn2F3hdvEtwcKcS0cVWX511lLF5vkhlBhSbs/XkS+BXULr3vDsl5XclwAw
 /07q8HsHlnM=
 =fSB4
 -----END PGP SIGNATURE-----

Merge tag 'afs-next-20180208' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

Pull afs updates from David Howells:
 "Four fixes:

   - add a missing put

   - two fixes to reset the address iteration cursor correctly

   - fix setting up the fileserver iteration cursor.

  Two cleanups:

   - remove some dead code

   - rearrange a function to be more logically laid out

  And one new feature:

   - Support AFS dynamic root.

     With this one should be able to do, say:

        mkdir /afs
        mount -t afs none /afs -o dyn

     to create a dynamic root and then, provided you have keyutils
     installed, do:

        ls /afs/grand.central.org

     and:

        ls /afs/umich.edu

     to list the root volumes of both those organisations' AFS cells
     without requiring any other setup (the kernel upcall to a program
     in the keyutils package to do DNS access as does NFS)"

* tag 'afs-next-20180208' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  afs: Support the AFS dynamic root
  afs: Rearrange afs_select_fileserver() a little
  afs: Remove unused code
  afs: Fix server list handling
  afs: Need to clear responded flag in addr cursor
  afs: Fix missing cursor clearance
  afs: Add missing afs_put_cell()
2018-02-08 12:12:04 -08:00
Linus Torvalds
9e95dae76b Things have been very quiet on the rbd side, as work continues on the
big ticket items slated for the next merge window.
 
 On the CephFS side we have a large number of cap handling improvements,
 a fix for our long-standing abuse of ->journal_info in ceph_readpages()
 and yet another dentry pointer management patch.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJafGqnAAoJEEp/3jgCEfOLjNcH/R6G/xyytDMfxaN+D8DBqCPF
 IaQM7RtgYJeRzDIXYYCkDEBPYqLcD2fjHLzFotFNLcgLdeUcSOyfg7NuCOWWq7o2
 t4z6Ekyish3GWZLUmlSdPcToQ+xIlMRshU8ZmzCHTCzx8XjO+CAnCADp5dh8OKZx
 mCpRX16sXdc6ozE1hsGKIkUoNrkdj8d3+HseZ2Uxb/4FZBNgH3cmmg7c5y6M+sp6
 wT4NEES3baqq2v5cVfw7T+d4MNgRm4/JC1aBy1JBkQlmVFNGteQTT7yzo0X1AfJ+
 +kcR10ddg0gD4WGYhL+iZlQCfwyMp7vouHQbgTOgt+rDCitjDy5r1BAamtxnZjM=
 =ctaD
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-4.16-rc1' of git://github.com/ceph/ceph-client

Pull ceph updates from Ilya Dryomov:
 "Things have been very quiet on the rbd side, as work continues on the
  big ticket items slated for the next merge window.

  On the CephFS side we have a large number of cap handling
  improvements, a fix for our long-standing abuse of ->journal_info in
  ceph_readpages() and yet another dentry pointer management patch"

* tag 'ceph-for-4.16-rc1' of git://github.com/ceph/ceph-client:
  ceph: improving efficiency of syncfs
  libceph: check kstrndup() return value
  ceph: try to allocate enough memory for reserved caps
  ceph: fix race of queuing delayed caps
  ceph: delete unreachable code in ceph_check_caps()
  ceph: limit rate of cap import/export error messages
  ceph: fix incorrect snaprealm when adding caps
  ceph: fix un-balanced fsc->writeback_count update
  ceph: track read contexts in ceph_file_info
  ceph: avoid dereferencing invalid pointer during cached readdir
  ceph: use atomic_t for ceph_inode_info::i_shared_gen
  ceph: cleanup traceless reply handling for rename
  ceph: voluntarily drop Fx cap for readdir request
  ceph: properly drop caps for setattr request
  ceph: voluntarily drop Lx cap for link/rename requests
  ceph: voluntarily drop Ax cap for requests that create new inode
  rbd: whitelist RBD_FEATURE_OPERATIONS feature bit
  rbd: don't NULL out ->obj_request in rbd_img_obj_parent_read_full()
  rbd: use kmem_cache_zalloc() in rbd_img_request_create()
  rbd: obj_request->completion is unused
2018-02-08 11:38:59 -08:00
Nicolas Pitre
a8c6db00bf cramfs: better MTD dependency expression
Commit b9f5fb1800d8 ("cramfs: fix MTD dependency") did what it says.

Since commit 9059a3493efe ("kconfig: fix relational operators for bool
and tristate symbols") it is possible to do it slightly better though.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-08 11:37:31 -08:00
Arnd Bergmann
2285ae760d NFSD: hide unused svcxdr_dupstr()
There is now only one caller left for svcxdr_dupstr() and this is inside
of an #ifdef, so we can get a warning when the option is disabled:

fs/nfsd/nfs4xdr.c:241:1: error: 'svcxdr_dupstr' defined but not used [-Werror=unused-function]

This changes the remaining caller to use a nicer IS_ENABLED() check,
which lets the compiler drop the unused code silently.

Fixes: e40d99e6183e ("NFSD: Clean up symlink argument XDR decoders")
Suggested-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-02-08 13:40:17 -05:00
Amir Goldstein
39ca1bf624 nfsd: store stat times in fill_pre_wcc() instead of inode times
The time values in stat and inode may differ for overlayfs and stat time
values are the correct ones to use. This is also consistent with the fact
that fill_post_wcc() also stores stat time values.

This means introducing a stat call that could fail, where previously we
were just copying values out of the inode.  To be conservative about
changing behavior, we fall back to copying values out of the inode in
the error case.  It might be better just to clear fh_pre_saved (though
note the BUG_ON in set_change_info).

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-02-08 13:40:17 -05:00
Amir Goldstein
76c479480b nfsd: encode stat->mtime for getattr instead of inode->i_mtime
The values of stat->mtime and inode->i_mtime may differ for overlayfs
and stat->mtime is the correct value to use when encoding getattr.
This is also consistent with the fact that other attr times are also
encoded from stat values.

Both callers of lease_get_mtime() already have the value of stat->mtime,
so the only needed change is that lease_get_mtime() will not overwrite
this value with inode->i_mtime in case the inode does not have an
exclusive lease.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-02-08 13:40:16 -05:00
J. Bruce Fields
0078117c6d nfsd: return RESOURCE not GARBAGE_ARGS on too many ops
A client that sends more than a hundred ops in a single compound
currently gets an rpc-level GARBAGE_ARGS error.

It would be more helpful to return NFS4ERR_RESOURCE, since that gives
the client a better idea how to recover (for example by splitting up the
compound into smaller compounds).

This is all a bit academic since we've never actually seen a reason for
clients to send such long compounds, but we may as well fix it.

While we're there, just use NFSD4_MAX_OPS_PER_COMPOUND == 16, the
constant we already use in the 4.1 case, instead of hard-coding 100.
Chances anyone actually uses even 16 ops per compound are small enough
that I think there's a neglible risk or any regression.

This fixes pynfs test COMP6.

Reported-by: "Lu, Xinyu" <luxy.fnst@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-02-08 13:40:16 -05:00
Linus Torvalds
6fbac201f9 iversion.h related cleanup for v4.16
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJae0mSAAoJEAAOaEEZVoIVs98P+wSbwfgLeyTufmrRYrD9kxfh
 EQXfuvnJqPzRHLJIUXfwzTN3IV9RZ1434ci31lZvQE3PKrgb90QuBLiR6OIKULef
 UqpYRmjsg7BfFBdAnyUR8xSmmeN94PjXQk7tG+YQn096HJVZ6cG5qCA8RjJ9dFoq
 2haDcOfDU+3e8mbtrrF4doP6jGrVwV+okqRsshFBclQv62Kk3m7L5AjQINyZpTM5
 ZKX5JIMOAmlJcHsz/2J1qLAIRQKsvEUbRLV43bzp3E03PuVFPhig3dVtpGPUe+Yi
 OW0JX49hIoTCrQ4KZk6uweLG7ZpaSoppXggEi2ERNCUkCf3nhejLlScfye+yLx7f
 sItgPkOYU0VVF70Y72XH1DbOekZr/XCLZdEEUNCS/P68hnyK0gBNC9zPGetlxMMi
 wjjQ9Qe45vD2JFlrvhHrdUdCnxnE05zC9ckBrmM94uRwIfDR0WVgo6pfebfRkAJd
 Wp4/PfbaySY7vk4oyaXlNxcDIH2NvWwYkioI/K9rRGbB2KjTdXonQojBy+rT0LeS
 f3mufyZYyCxdwu3Wf8WO36H23L+4fseMthKIIPA0aL4wasB9LgD8gDnkyKx28DT4
 S32tdK4UALC8SAVsPr+vSaMVzKOZmuNHac+XB2i+5lHl8G/n4M2a+JFTeR4CnKJ/
 9LsBEBL5Oj7ZXL7lfFIO
 =iEKM
 -----END PGP SIGNATURE-----

Merge tag 'iversion-v4.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux

Pull inode->i_version cleanup from Jeff Layton:
 "Goffredo went ahead and sent a patch to rename this function, and
  reverse its sense, as we discussed last week.

  The patch is very straightforward and I figure it's probably best to
  go ahead and merge this to get the API as settled as possible"

* tag 'iversion-v4.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
  iversion: Rename make inode_cmp_iversion{+raw} to inode_eq_iversion{+raw}
2018-02-07 14:25:22 -08:00
Linus Torvalds
fe803f8628 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull UDF and ext2 fixlets from Jan Kara:
 "A UDF fix and an ext2 cleanup"

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  ext2: drop unneeded newline
  udf: Sanitize nanoseconds for time stamps
2018-02-07 14:23:06 -08:00