57072 Commits

Author SHA1 Message Date
Al Viro
f5c0c26d90 new helper: security_sb_eat_lsm_opts()
combination of alloc_secdata(), security_sb_copy_data(),
security_sb_parse_opt_str() and free_secdata().

Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-12-21 11:46:00 -05:00
Al Viro
c039bc3c24 LSM: lift extracting and parsing LSM options into the caller of ->sb_remount()
This paves the way for retaining the LSM options from a common filesystem
mount context during a mount parameter parsing phase to be instituted prior
to actual mount/reconfiguration actions.

Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-12-21 11:45:41 -05:00
Al Viro
6be8750b4c LSM: lift parsing LSM options into the caller of ->sb_kern_mount()
This paves the way for retaining the LSM options from a common filesystem
mount context during a mount parameter parsing phase to be instituted prior
to actual mount/reconfiguration actions.

Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-12-21 11:45:30 -05:00
Eric Sandeen
3cc31fa65d iomap: don't search past page end in iomap_is_partially_uptodate
iomap_is_partially_uptodate() is intended to check wither blocks within
the selected range of a not-uptodate page are uptodate; if the range we
care about is up to date, it's an optimization.

However, the iomap implementation continues to check all blocks up to
from+count, which is beyond the page, and can even be well beyond the
iop->uptodate bitmap.

I think the worst that will happen is that we may eventually find a zero
bit and return "not partially uptodate" when it would have otherwise
returned true, and skip the optimization.  Still, it's clearly an invalid
memory access that must be fixed.

So: fix this by limiting the search to within the page as is done in the
non-iomap variant, block_is_partially_uptodate().

Zorro noticed thiswhen KASAN went off for 512 byte blocks on a 64k
page system:

 BUG: KASAN: slab-out-of-bounds in iomap_is_partially_uptodate+0x1a0/0x1e0
 Read of size 8 at addr ffff800120c3a318 by task fsstress/22337

Reported-by: Zorro Lang <zlang@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-12-21 08:42:50 -08:00
Linus Torvalds
f57b620a89 This pull request contains fixes for both UBI and UBIFS:
- Kconfig dependency fixes for our new auth feature
 - Fix for selecting the right compressor when creating a fs
 - Bugfix for a bug in UBIFS's O_TMPFILE implementation
 - Refcounting fixes for UBI
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAlwb+MAWHHJpY2hhcmRA
 c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wSIjD/9zyAQUaMXJzI/ct2SXXikyTgRi
 hmLbUhuIEZD/c3tDQMgb3XqL7yYkOt/QnfeMXpi/Fy9wAdKcOU9QCcL6TPfR9B+B
 v22VdDcsU5pp2Hcinm2Yz6v2OyYwchb2YShfMkiZSf0CKAhcsKZhaeqhug/Zq+w5
 xZ6fMPvu8HhinBavq9OgSpPx892vlF1ypXHNeQNxwslz/gWKso47jtmXpz4LDX7+
 s4+p+lagVJ0MHQ3fVhBvEplWr9zPKHvX8uGohADLsDp5bAqRyGNxBwqgGm6J6H7B
 UHuqUjUf19hUZEauwKugmPTM3Y0TkZjV472qsp9omx8w/XozVqfb4YgMBBE3Qwuy
 ZS7GcaGxf54WJbOJFQv7pS3DUmfOjQBnU7YAR7phMBDeOUfxRXWkuzIB+RkED+0d
 7JjTGWQItdbTfXl2QWfSr/1rj45LNKmVPMAI2pKUBnC9dD9pktCcB90vTcrg2vjE
 3U9GbqOVORZvHh2bcR0Y8uDUBOp4v7ybCc+OOTNVWuWQpckp79E/7dkVLbpoI12R
 xEgMEaCSXlwPUxQ/FrxKBWhOZprPXNMo4xw547SDBxMq0IKUniA+YoUxtN3FxhQ8
 OiLkfcQ+5bove9qYfARaRC/hk5lpgMWebAGou/CkfTIPJsw3ZdlmL6HH6ag5i6fS
 fBHsVBdtv0ADSLXylg==
 =nZML
 -----END PGP SIGNATURE-----

Merge tag 'upstream-4.20-rc7' of git://git.infradead.org/linux-ubifs

Pull UBI/UBIFS fixes from Richard Weinberger:

 - Kconfig dependency fixes for our new auth feature

 - Fix for selecting the right compressor when creating a fs

 - Bugfix for a bug in UBIFS's O_TMPFILE implementation

 - Refcounting fixes for UBI

* tag 'upstream-4.20-rc7' of git://git.infradead.org/linux-ubifs:
  ubifs: Handle re-linking of inodes correctly while recovery
  ubi: Do not drop UBI device reference before using
  ubi: Put MTD device after it is not used
  ubifs: Fix default compression selection in ubifs
  ubifs: Fix memory leak on error condition
  ubifs: auth: Add CONFIG_KEYS dependency
  ubifs: CONFIG_UBIFS_FS_AUTHENTICATION should depend on UBIFS_FS
  ubifs: replay: Fix high stack usage
2018-12-20 14:17:24 -08:00
David Howells
43f5e655ef vfs: Separate changing mount flags full remount
Separate just the changing of mount flags (MS_REMOUNT|MS_BIND) from full
remount because the mount data will get parsed with the new fs_context
stuff prior to doing a remount - and this causes the syscall to fail under
some circumstances.

To quote Eric's explanation:

  [...] mount(..., MS_REMOUNT|MS_BIND, ...) now validates the mount options
  string, which breaks systemd unit files with ProtectControlGroups=yes
  (e.g.  systemd-networkd.service) when systemd does the following to
  change a cgroup (v1) mount to read-only:

    mount(NULL, "/run/systemd/unit-root/sys/fs/cgroup/systemd", NULL,
	  MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND, NULL)

  ... when the kernel has CONFIG_CGROUPS=y but no cgroup subsystems
  enabled, since in that case the error "cgroup1: Need name or subsystem
  set" is hit when the mount options string is empty.

  Probably it doesn't make sense to validate the mount options string at
  all in the MS_REMOUNT|MS_BIND case, though maybe you had something else
  in mind.

This is also worthwhile doing because we will need to add a mount_setattr()
syscall to take over the remount-bind function.

Reported-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: David Howells <dhowells@redhat.com>
2018-12-20 16:32:56 +00:00
David Howells
e262e32d6b vfs: Suppress MS_* flag defs within the kernel unless explicitly enabled
Only the mount namespace code that implements mount(2) should be using the
MS_* flags.  Suppress them inside the kernel unless uapi/linux/mount.h is
included.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: David Howells <dhowells@redhat.com>
2018-12-20 16:32:56 +00:00
Dave Chinner
a837eca241 iomap: Revert "fs/iomap.c: get/put the page in iomap_page_create/release()"
This reverts commit 61c6de667263184125d5ca75e894fcad632b0dd3.

The reverted commit added page reference counting to iomap page
structures that are used to track block size < page size state. This
was supposed to align the code with page migration page accounting
assumptions, but what it has done instead is break XFS filesystems.
Every fstests run I've done on sub-page block size XFS filesystems
has since picking up this commit 2 days ago has failed with bad page
state errors such as:

# ./run_check.sh "-m rmapbt=1,reflink=1 -i sparse=1 -b size=1k" "generic/038"
....
SECTION       -- xfs
FSTYP         -- xfs (debug)
PLATFORM      -- Linux/x86_64 test1 4.20.0-rc6-dgc+
MKFS_OPTIONS  -- -f -m rmapbt=1,reflink=1 -i sparse=1 -b size=1k /dev/sdc
MOUNT_OPTIONS -- /dev/sdc /mnt/scratch

generic/038 454s ...
 run fstests generic/038 at 2018-12-20 18:43:05
 XFS (sdc): Unmounting Filesystem
 XFS (sdc): Mounting V5 Filesystem
 XFS (sdc): Ending clean mount
 BUG: Bad page state in process kswapd0  pfn:3a7fa
 page:ffffea0000ccbeb0 count:0 mapcount:0 mapping:ffff88800d9b6360 index:0x1
 flags: 0xfffffc0000000()
 raw: 000fffffc0000000 dead000000000100 dead000000000200 ffff88800d9b6360
 raw: 0000000000000001 0000000000000000 00000000ffffffff
 page dumped because: non-NULL mapping
 CPU: 0 PID: 676 Comm: kswapd0 Not tainted 4.20.0-rc6-dgc+ #915
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014
 Call Trace:
  dump_stack+0x67/0x90
  bad_page.cold.116+0x8a/0xbd
  free_pcppages_bulk+0x4bf/0x6a0
  free_unref_page_list+0x10f/0x1f0
  shrink_page_list+0x49d/0xf50
  shrink_inactive_list+0x19d/0x3b0
  shrink_node_memcg.constprop.77+0x398/0x690
  ? shrink_slab.constprop.81+0x278/0x3f0
  shrink_node+0x7a/0x2f0
  kswapd+0x34b/0x6d0
  ? node_reclaim+0x240/0x240
  kthread+0x11f/0x140
  ? __kthread_bind_mask+0x60/0x60
  ret_from_fork+0x24/0x30
 Disabling lock debugging due to kernel taint
....

The failures are from anyway that frees pages and empties the
per-cpu page magazines, so it's not a predictable failure or an easy
to debug failure.

generic/038 is a reliable reproducer of this problem - it has a 9 in
10 failure rate on one of my test machines. Failure on other
machines have been at random points in fstests runs but every run
has ended up tripping this problem. Hence generic/038 was used to
bisect the failure because it was the most reliable failure.

It is too close to the 4.20 release (not to mention holidays) to
try to diagnose, fix and test the underlying cause of the problem,
so reverting the commit is the only option we have right now. The
revert has been tested against a current tot 4.20-rc7+ kernel across
multiple machines running sub-page block size XFs filesystems and
none of the bad page state failures have been seen.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Cc: Piotr Jaroszynski <pjaroszynski@nvidia.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Brian Foster <bfoster@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-20 07:22:51 -08:00
Darrick J. Wong
86d163dbfe xfs: stringify scrub types in ftrace output
Use __print_symbolic to print the scrub type in ftrace output.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
2018-12-19 14:02:01 -08:00
Darrick J. Wong
c494213f30 xfs: stringify btree cursor types in ftrace output
Use __print_symbolic to print the btree type in ftrace output.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
2018-12-19 14:02:01 -08:00
Darrick J. Wong
0357d21a6c xfs: move XFS_INODE_FORMAT_STR mappings to libxfs
Move XFS_INODE_FORMAT_STR to libxfs so that we don't forget to keep it
updated, and add necessary TRACE_DEFINE_ENUM.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
2018-12-19 14:02:01 -08:00
Darrick J. Wong
05c753c4cf xfs: move XFS_AG_BTREE_CMP_FORMAT_STR mappings to libxfs
Move XFS_AG_BTREE_CMP_FORMAT_STR to libxfs so that we don't forget to
keep it updated, and TRACE_DEFINE_ENUM the values while we're at it.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
2018-12-19 14:02:01 -08:00
Darrick J. Wong
85f8dff00a xfs: fix symbolic enum printing in ftrace output
ftrace's __print_symbolic() has a (very poorly documented) requirement
that any enum values used in the symbol to string translation table be
wrapped in a TRACE_DEFINE_ENUM so that the enum value can be encoded in
the ftrace ring buffer.  Fix this unsatisfied requirement.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
2018-12-19 14:02:01 -08:00
Darrick J. Wong
7af8150f99 xfs: fix function pointer type in ftrace format
Use %pS instead of %pF in ftrace strings so that we record the actual
function address instead of the function descriptor.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
2018-12-19 14:02:00 -08:00
Theodore Ts'o
18f2c4fceb ext4: check for shutdown and r/o file system in ext4_write_inode()
If the file system has been shut down or is read-only, then
ext4_write_inode() needs to bail out early.

Also use jbd2_complete_transaction() instead of ext4_force_commit() so
we only force a commit if it is needed.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2018-12-19 14:36:58 -05:00
Theodore Ts'o
fde872682e ext4: force inode writes when nfsd calls commit_metadata()
Some time back, nfsd switched from calling vfs_fsync() to using a new
commit_metadata() hook in export_operations().  If the file system did
not provide a commit_metadata() hook, it fell back to using
sync_inode_metadata().  Unfortunately doesn't work on all file
systems.  In particular, it doesn't work on ext4 due to how the inode
gets journalled --- the VFS writeback code will not always call
ext4_write_inode().

So we need to provide our own ext4_nfs_commit_metdata() method which
calls ext4_write_inode() directly.

Google-Bug-Id: 121195940
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2018-12-19 14:07:58 -05:00
NeilBrown
a52458b48a NFS/NFSD/SUNRPC: replace generic creds with 'struct cred'.
SUNRPC has two sorts of credentials, both of which appear as
"struct rpc_cred".
There are "generic credentials" which are supplied by clients
such as NFS and passed in 'struct rpc_message' to indicate
which user should be used to authorize the request, and there
are low-level credentials such as AUTH_NULL, AUTH_UNIX, AUTH_GSS
which describe the credential to be sent over the wires.

This patch replaces all the generic credentials by 'struct cred'
pointers - the credential structure used throughout Linux.

For machine credentials, there is a special 'struct cred *' pointer
which is statically allocated and recognized where needed as
having a special meaning.  A look-up of a low-level cred will
map this to a machine credential.

Signed-off-by: NeilBrown <neilb@suse.com>
Acked-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-12-19 13:52:46 -05:00
NeilBrown
684f39b4cf NFS: struct nfs_open_dir_context: convert rpc_cred pointer to cred.
Use the common 'struct cred' to pass credentials for readdir.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-12-19 13:52:46 -05:00
NeilBrown
b68572e07c NFS: change access cache to use 'struct cred'.
Rather than keying the access cache with 'struct rpc_cred',
use 'struct cred'.  Then use cred_fscmp() to compare
credentials rather than comparing the raw pointer.

A benefit of this approach is that in the common case we avoid the
rpc_lookup_cred_nonblock() call which can be slow when the cred cache is large.
This also keeps many fewer items pinned in the rpc cred cache, so the
cred cache is less likely to get large.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-12-19 13:52:45 -05:00
NeilBrown
ddf529eeed NFS: move credential expiry tracking out of SUNRPC into NFS.
NFS needs to know when a credential is about to expire so that
it can modify write-back behaviour to finish the write inside the
expiry time.
It currently uses functions in SUNRPC code which make use of a
fairly complex callback scheme and flags in the generic credientials.

As I am working to discard the generic credentials, this has to change.

This patch moves the logic into NFS, in part by finding and caching
the low-level credential in the open_context.  We then make direct
cred-api calls on that.

This makes the code much simpler and removes a dependency on generic
rpc credentials.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-12-19 13:52:45 -05:00
NeilBrown
5e16923b43 NFS/SUNRPC: don't lookup machine credential until rpcauth_bindcred().
When NFS creates a machine credential, it is a "generic" credential,
not tied to any auth protocol, and is really just a container for
the princpal name.
This doesn't get linked to a genuine credential until rpcauth_bindcred()
is called.
The lookup always succeeds, so various places that test if the machine
credential is NULL, are pointless.

As a step towards getting rid of generic credentials, this patch gets
rid of generic machine credentials.  The nfs_client and rpc_client
just hold a pointer to a constant principal name.
When a machine credential is wanted, a special static 'struct rpc_cred'
pointer is used. rpcauth_bindcred() recognizes this, finds the
principal from the client, and binds the correct credential.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-12-19 13:52:45 -05:00
NeilBrown
f15e1e8bc6 NFSv4: don't require lock for get_renew_cred or get_machine_cred
This lock is no longer necessary.

If nfs4_get_renew_cred() needs to hunt through the open-state
creds for a user cred, it still takes the lock to stablize
the rbtree, but otherwise there are no races.

Note that this completely removes the lock from nfs4_renew_state().
It appears that the original need for the locking here was removed
long ago, and there is no longer anything to protect.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-12-19 13:52:45 -05:00
NeilBrown
a534ecb013 NFSv4: add cl_root_cred for use when machine cred is not available.
NFSv4 state management tries a root credential when no machine
credential is available, as can happen with kerberos.
It does this by replacing the cl_machine_cred with a root credential.
This means that any user of the machine credential needs to take
a lock while getting a reference to the machine credential, which is
a little cumbersome.

So introduce an explicit cl_root_cred, and never free either
credential until client shutdown.  This means that no locking
is needed to reference these credentials.  Future patches
will make use of this.

This is only a temporary addition.  both cl_machine_cred and
cl_root_cred will disappear later in the series.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-12-19 13:52:45 -05:00
NeilBrown
8276c902bb SUNRPC: remove uid and gid from struct auth_cred
Use cred->fsuid and cred->fsgid instead.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-12-19 13:52:45 -05:00
NeilBrown
fc0664fd9b SUNRPC: remove groupinfo from struct auth_cred.
We can use cred->groupinfo (from the 'struct cred') instead.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-12-19 13:52:45 -05:00
NeilBrown
97f68c6b02 SUNRPC: add 'struct cred *' to auth_cred and rpc_cred
The SUNRPC credential framework was put together before
Linux has 'struct cred'.  Now that we have it, it makes sense to
use it.
This first step just includes a suitable 'struct cred *' pointer
in every 'struct auth_cred' and almost every 'struct rpc_cred'.

The rpc_cred used for auth_null has a NULL 'struct cred *' as nothing
else really makes sense.

For rpc_cred, the pointer is reference counted.
For auth_cred it isn't.  struct auth_cred are either allocated on
the stack, in which case the thread owns a reference to the auth,
or are part of 'struct generic_cred' in which case gc_base owns the
reference, and "acred" shares it.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-12-19 13:52:44 -05:00
Pavel Tikhomirov
ac0aa5e843 nfs: fix comment to nfs_generic_pg_test which does the opposite
Please see comment to filelayout_pg_test for reference.

To: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Anna Schumaker <anna.schumaker@netapp.com>
Cc: linux-nfs@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-12-19 13:52:44 -05:00
Olga Kornievskaia
069d5bf5ec NFSv4: cleanup remove unused nfs4_xdev_fs_type
commit e8f25e6d6d19 "NFS: Remove the NFS v4 xdev mount function"
removed the last use of this.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-12-19 13:52:44 -05:00
Theodore Ts'o
8a363970d1 ext4: avoid declaring fs inconsistent due to invalid file handles
If we receive a file handle, either from NFS or open_by_handle_at(2),
and it points at an inode which has not been initialized, and the file
system has metadata checksums enabled, we shouldn't try to get the
inode, discover the checksum is invalid, and then declare the file
system as being inconsistent.

This can be reproduced by creating a test file system via "mke2fs -t
ext4 -O metadata_csum /tmp/foo.img 8M", mounting it, cd'ing into that
directory, and then running the following program.

#define _GNU_SOURCE
#include <fcntl.h>

struct handle {
	struct file_handle fh;
	unsigned char fid[MAX_HANDLE_SZ];
};

int main(int argc, char **argv)
{
	struct handle h = {{8, 1 }, { 12, }};

	open_by_handle_at(AT_FDCWD, &h.fh, O_RDONLY);
	return 0;
}

Google-Bug-Id: 120690101
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2018-12-19 12:29:13 -05:00
Theodore Ts'o
a805622a75 ext4: include terminating u32 in size of xattr entries when expanding inodes
In ext4_expand_extra_isize_ea(), we calculate the total size of the
xattr header, plus the xattr entries so we know how much of the
beginning part of the xattrs to move when expanding the inode extra
size.  We need to include the terminating u32 at the end of the xattr
entries, or else if there is uninitialized, non-zero bytes after the
xattr entries and before the xattr values, the list of xattr entries
won't be properly terminated.

Reported-by: Steve Graham <stgraham2000@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2018-12-19 12:28:13 -05:00
Ronnie Sahlberg
271b9c0c80 smb3: Fix rmdir compounding regression to strict servers
Some servers require that the setinfo matches the exact size,
and in this case compounding changes introduced by
commit c2e0fe3f5aae ("cifs: make rmdir() use compounding")
caused us to send 8 bytes (padded length) instead of 1 byte
(the size of the structure).  See MS-FSCC section 2.4.11.

Fixing this when we send a SET_INFO command for delete file
disposition, then ends up as an iov of a single byte but this
causes problems with SMB3 and encryption.

To avoid this, instead of creating a one byte iov for the disposition value
and then appending an additional iov with a 7 byte padding we now handle
this as a single 8 byte iov containing both the disposition byte as well as
the padding in one single buffer.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Acked-by: Paulo Alcantara <palcantara@suse.de>
2018-12-19 07:55:32 -06:00
Todd Kjos
80cd795630 binder: fix use-after-free due to ksys_close() during fdget()
44d8047f1d8 ("binder: use standard functions to allocate fds")
exposed a pre-existing issue in the binder driver.

fdget() is used in ksys_ioctl() as a performance optimization.
One of the rules associated with fdget() is that ksys_close() must
not be called between the fdget() and the fdput(). There is a case
where this requirement is not met in the binder driver which results
in the reference count dropping to 0 when the device is still in
use. This can result in use-after-free or other issues.

If userpace has passed a file-descriptor for the binder driver using
a BINDER_TYPE_FDA object, then kys_close() is called on it when
handling a binder_ioctl(BC_FREE_BUFFER) command. This violates
the assumptions for using fdget().

The problem is fixed by deferring the close using task_work_add(). A
new variant of __close_fd() was created that returns a struct file
with a reference. The fput() is deferred instead of using ksys_close().

Fixes: 44d8047f1d87a ("binder: use standard functions to allocate fds")
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Todd Kjos <tkjos@google.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-19 09:40:13 +01:00
Boris Brezillon
f366d3854e Core changes:
- Parse the 4BAIT SFDP section
 - Add a bunch of SPI NOR entries to the flash_info table
 - Add the concept of SFDP fixups and use it to fix a bug on MX25L25635F
 - A bunch of minor cleanups/comestic changes
 -----BEGIN PGP SIGNATURE-----
 
 iQJQBAABCgA6FiEEKmCqpbOU668PNA69Ze02AX4ItwAFAlwVQUYcHGJvcmlzLmJy
 ZXppbGxvbkBib290bGluLmNvbQAKCRBl7TYBfgi3ACIYD/sG0+vRIKK8+NgNUHYy
 nzKICKvdnBrm2RWi+6va5n2pYggyNy1VhWEjmqWLupjxn7NGkjZiBilhfj8Iv6YN
 HScNy7FLHM6pxTKpsZKQLLGKvaUXODgvZwiw3L6T5T3JaJ5nlpE5g8jQy8sCzfjK
 pwKdrOw17caZgoY0bMe2ppCObIDLd+mY+WSHbo6tb4/fohpTX1l9QZYHjfgHU9vP
 CG0z3sU0JCNGXsbQMngfeuyXFjJ4OKdnklbVTeZl673AYtQMBhQEIGNVkVefuBP3
 p8hU0CWRn0Yikc1HGTENvYCnQ7ju3z+16rnLxy3A5CPHhCDrTgUmM8HabYbh+0Si
 0Y1wXpEOZ0OZQ7uMs2Q8SK0GLyxqdxkE0ibHgb7K/aLb+yg8oB7DB4Uenb06CiaQ
 KAZWGgWZlSondX+/GI7YcQozslFFCfixAw6H0kCpQW0/2piXNqsN5BOROEqjueXW
 xeMG0DNnzrQ6/vB9ukLESSB/YvVwfUvt6GSjqMSDdXwx6zyKSvHJ+chCxlK46+Hm
 zIVcvNT3mpwcuVnQOZZeCaiIrDUAySEq/8Ztp9O5/CfkzdQyMWxDPoY9A0HjL2p3
 FmRN7aAB9jJcdHc2tLwcKPRepjliIUMLf0NXdTSizzQz8WJqULZRBN0VWW4sCbLc
 +tTisYjX8fYUz9+kHUcQ4XY16g==
 =JXhP
 -----END PGP SIGNATURE-----

Merge tag 'spi-nor/for-4.21' of git://git.infradead.org/linux-mtd into mtd/next

Core changes:
- Parse the 4BAIT SFDP section
- Add a bunch of SPI NOR entries to the flash_info table
- Add the concept of SFDP fixups and use it to fix a bug on MX25L25635F
- A bunch of minor cleanups/comestic changes
2018-12-18 20:00:52 +01:00
Boris Brezillon
ccec4a4a4f Merge tag 'nand/for-4.21' of git://git.infradead.org/linux-mtd into mtd/next
NAND core changes:
- kernel-doc miscellaneous fixes.
- Third batch of fixes/cleanup to the raw NAND core impacting various
  controller drivers (ams-delta, marvell, fsmc, denali, tegra, vf610):
  * Stopping to pass mtd_info objects to internal functions
  * Reorganizing code to avoid forward declarations
  * Dropping useless test in nand_legacy_set_defaults()
  * Moving nand_exec_op() to internal.h
  * Adding nand_[de]select_target() helpers
  * Passing the CS line to be selected in struct nand_operation
  * Making ->select_chip() optional when ->exec_op() is implemented
  * Deprecating the ->select_chip() hook
  * Moving the ->exec_op() method to nand_controller_ops
  * Moving ->setup_data_interface() to nand_controller_ops
  * Deprecating the dummy_controller field
  * Fixing JEDEC detection
  * Providing a helper for polling GPIO R/B pin

Raw NAND chip drivers changes:
- Macronix:
  * Flagging 1.8V AC chips with a broken GET_FEATURES(TIMINGS)

Raw NAND controllers drivers changes:
- Ams-delta:
  * Fixing the error path
  * SPDX tag added
  * May be compiled with COMPILE_TEST=y
  * Conversion to ->exec_op() interface
  * Dropping .IOADDR_R/W use
  * Use GPIO API for data I/O
- Denali:
  * Removing denali_reset_banks()
  * Removing ->dev_ready() hook
  * Including <linux/bits.h> instead of <linux/bitops.h>
  * Changes to comply with the above fixes/cleanup done in the core.
- FSMC:
  * Adding an SPDX tag to replace the license text
  * Making conversion from chip to fsmc consistent
  * Fixing unchecked return value in fsmc_read_page_hwecc
  * Changes to comply with the above fixes/cleanup done in the core.
- Marvell:
  * Preventing timeouts on a loaded machine (fix)
  * Changes to comply with the above fixes/cleanup done in the core.
- OMAP2:
  * Pass the parent of pdev to dma_request_chan() (fix)
- R852:
  * Use generic DMA API
- sh_flctl:
  * Converting to SPDX identifiers
- Sunxi:
  * Write pageprog related opcodes to the right register: WCMD_SET (fix)
- Tegra:
  * Stop implementing ->select_chip()
- VF610:
  * Adding an SPDX tag to replace the license text
  * Changes to comply with the above fixes/cleanup done in the core.
- Various trivial/spelling/coding style fixes.

SPI-NAND drivers changes:
- Removing the depreacated mt29f_spinand driver from staging.
- Adding support for:
  * Toshiba TC58CVG2S0H
  * GigaDevice GD5FxGQ4xA
  * Winbond W25N01GV
2018-12-18 19:59:16 +01:00
Nick Bowler
a9d25bde1e xfs: Fix x32 ioctls when cmd numbers differ from ia32.
Several ioctl structs change size between native 32-bit (ia32) and x32
applications, because x32 follows the native 64-bit (amd64) integer
alignment rules and uses 64-bit time_t.  In these instances, the ioctl
number changes so userspace simply gets -ENOTTY.  This scenario can be
handled by simply adding more cases.

Looking at the different ioctls implemented here:

- All the ones marked 'No size or alignment issue on any arch' should
  presumably all be fine.

- All the ones under BROKEN_X86_ALIGNMENT are different under integer
  alignment rules.  Since x32 matches amd64 here, we just need both
  sets of cases handled.

- XFS_IOC_SWAPEXT has both integer alignment differences and time_t
  differences.  Since x32 matches amd64 here, we need to add a case
  which calls the native implementation.

- The remaining ioctls have neither 64-bit integers nor time_t, so
  x32 matches ia32 here and no change is required at this level.  The
  bulkstat ioctl implementations have some pointer chasing which is
  handled separately.

Signed-off-by: Nick Bowler <nbowler@draconx.ca>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-12-18 10:55:21 -08:00
Nick Bowler
7ca860e3c1 xfs: Fix bulkstat compat ioctls on x32 userspace.
The bulkstat family of ioctls are problematic on x32, because there is
a mixup of native 32-bit and 64-bit conventions.  The xfs_fsop_bulkreq
struct contains pointers and 32-bit integers so that matches the native
32-bit layout, and that means the ioctl implementation goes into the
regular compat path on x32.

However, the 'ubuffer' member of that struct in turn refers to either
struct xfs_inogrp or xfs_bstat (or an array of these).  On x32, those
structures match the native 64-bit layout.  The compat implementation
writes out the 32-bit version of these structures.  This is not the
expected format for x32 userspace, causing problems.

Fortunately the functions which actually output these xfs_inogrp and
xfs_bstat structures have an easy way to select which output format
is required, so we just need a little tweak to select the right format
on x32.

Signed-off-by: Nick Bowler <nbowler@draconx.ca>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-12-18 10:55:20 -08:00
Nick Bowler
c456d64449 xfs: Align compat attrlist_by_handle with native implementation.
While inspecting the ioctl implementations, I noticed that the compat
implementation of XFS_IOC_ATTRLIST_BY_HANDLE does not do exactly the
same thing as the native implementation.  Specifically, the "cursor"
does not appear to be written out to userspace on the compat path,
like it is on the native path.

This adjusts the compat implementation to copy out the cursor just
like the native implementation does.  The attrlist cursor does not
require any special compat handling.  This fixes xfstests xfs/269
on both IA-32 and x32 userspace, when running on an amd64 kernel.

Signed-off-by: Nick Bowler <nbowler@draconx.ca>
Fixes: 0facef7fb053b ("xfs: in _attrlist_by_handle, copy the cursor back to userspace")
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-12-18 10:55:20 -08:00
Javier Barrio
41c4f85cda quota: Lock s_umount in exclusive mode for Q_XQUOTA{ON,OFF} quotactls.
Commit 1fa5efe3622db58cb8c7b9a50665e9eb9a6c7e97 (ext4: Use generic helpers for quotaon
and quotaoff) made possible to call quotactl(Q_XQUOTAON/OFF) on ext4 filesystems
with sysfile quota support. This leads to calling dquot_enable/disable without s_umount
held in excl. mode, because quotactl_cmd_onoff checks only for Q_QUOTAON/OFF.

The following WARN_ON_ONCE triggers (in this case for dquot_enable, ext4, latest Linus' tree):

[  117.807056] EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: quota,prjquota

[...]

[  155.036847] WARNING: CPU: 0 PID: 2343 at fs/quota/dquot.c:2469 dquot_enable+0x34/0xb9
[  155.036851] Modules linked in: quota_v2 quota_tree ipv6 af_packet joydev mousedev psmouse serio_raw pcspkr i2c_piix4 intel_agp intel_gtt e1000 ttm drm_kms_helper drm agpgart fb_sys_fops syscopyarea sysfillrect sysimgblt i2c_core input_leds kvm_intel kvm irqbypass qemu_fw_cfg floppy evdev parport_pc parport button crc32c_generic dm_mod ata_generic pata_acpi ata_piix libata loop ext4 crc16 mbcache jbd2 usb_storage usbcore sd_mod scsi_mod
[  155.036901] CPU: 0 PID: 2343 Comm: qctl Not tainted 4.20.0-rc6-00025-gf5d582777bcb #9
[  155.036903] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[  155.036911] RIP: 0010:dquot_enable+0x34/0xb9
[  155.036915] Code: 41 56 41 55 41 54 55 53 4c 8b 6f 28 74 02 0f 0b 4d 8d 7d 70 49 89 fc 89 cb 41 89 d6 89 f5 4c 89 ff e8 23 09 ea ff 85 c0 74 0a <0f> 0b 4c 89 ff e8 8b 09 ea ff 85 db 74 6a 41 8b b5 f8 00 00 00 0f
[  155.036918] RSP: 0018:ffffb09b00493e08 EFLAGS: 00010202
[  155.036922] RAX: 0000000000000001 RBX: 0000000000000008 RCX: 0000000000000008
[  155.036924] RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffff9781b67cd870
[  155.036926] RBP: 0000000000000002 R08: 0000000000000000 R09: 61c8864680b583eb
[  155.036929] R10: ffffb09b00493e48 R11: ffffffffff7ce7d4 R12: ffff9781b7ee8d78
[  155.036932] R13: ffff9781b67cd800 R14: 0000000000000004 R15: ffff9781b67cd870
[  155.036936] FS:  00007fd813250b88(0000) GS:ffff9781ba000000(0000) knlGS:0000000000000000
[  155.036939] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  155.036942] CR2: 00007fd812ff61d6 CR3: 000000007c882000 CR4: 00000000000006b0
[  155.036951] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  155.036953] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  155.036955] Call Trace:
[  155.037004]  dquot_quota_enable+0x8b/0xd0
[  155.037011]  kernel_quotactl+0x628/0x74e
[  155.037027]  ? do_mprotect_pkey+0x2a6/0x2cd
[  155.037034]  __x64_sys_quotactl+0x1a/0x1d
[  155.037041]  do_syscall_64+0x55/0xe4
[  155.037078]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  155.037105] RIP: 0033:0x7fd812fe1198
[  155.037109] Code: 02 77 0d 48 89 c1 48 c1 e9 3f 75 04 48 8b 04 24 48 83 c4 50 5b c3 48 83 ec 08 49 89 ca 48 63 d2 48 63 ff b8 b3 00 00 00 0f 05 <48> 89 c7 e8 c1 eb ff ff 5a c3 48 63 ff b8 bb 00 00 00 0f 05 48 89
[  155.037112] RSP: 002b:00007ffe8cd7b050 EFLAGS: 00000206 ORIG_RAX: 00000000000000b3
[  155.037116] RAX: ffffffffffffffda RBX: 00007ffe8cd7b148 RCX: 00007fd812fe1198
[  155.037119] RDX: 0000000000000000 RSI: 00007ffe8cd7cea9 RDI: 0000000000580102
[  155.037121] RBP: 00007ffe8cd7b0f0 R08: 000055fc8eba8a9d R09: 0000000000000000
[  155.037124] R10: 00007ffe8cd7b074 R11: 0000000000000206 R12: 00007ffe8cd7b168
[  155.037126] R13: 000055fc8eba8897 R14: 0000000000000000 R15: 0000000000000000
[  155.037131] ---[ end trace 210f864257175c51 ]---

and then the syscall proceeds without s_umount locking.

This patch locks the superblock ->s_umount sem. in exclusive mode for all Q_XQUOTAON/OFF
quotactls too in addition to Q_QUOTAON/OFF.

AFAICT, other than ext4, only xfs and ocfs2 are affected by this change.
The VFS will now call in xfs_quota_* functions with s_umount held, which wasn't the case
before. This looks good to me but I can not say for sure. Ext4 and ocfs2 where already
beeing called with s_umount exclusive via quota_quotaon/off which is basically the same.

Signed-off-by: Javier Barrio <javier.barrio.mart@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2018-12-18 18:29:15 +01:00
Bob Peterson
bc0205612b gfs2: take jdata unstuff into account in do_grow
Before this patch, function do_grow would not reserve enough journal
blocks in the transaction to unstuff jdata files while growing them.
This patch adds the logic to add one more block if the file to grow
is jdata.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-12-18 10:49:02 -06:00
Jens Axboe
875736bb3f aio: abstract out io_event filler helper
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-18 08:29:59 -07:00
Jens Axboe
88a6f18b95 aio: split out iocb copy from io_submit_one()
In preparation of handing in iocbs in a different fashion as well. Also
make it clear that the iocb being passed in isn't modified, by marking
it const throughout.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-18 08:29:59 -07:00
Jens Axboe
71ebc6fef0 aio: use iocb_put() instead of open coding it
Replace the percpu_ref_put() + kmem_cache_free() with a call to
iocb_put() instead.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-18 08:29:58 -07:00
Jens Axboe
a79d40e9b0 aio: only use blk plugs for > 2 depth submissions
Plugging is meant to optimize submission of a string of IOs, if we don't
have more than 2 being submitted, don't bother setting up a plug.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-18 08:29:58 -07:00
Jens Axboe
2bc4ca9bb6 aio: don't zero entire aio_kiocb aio_get_req()
It's 192 bytes, fairly substantial. Most items don't need to be cleared,
especially not upfront. Clear the ones we do need to clear, and leave
the other ones for setup when the iocb is prepared and submitted.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-18 08:29:58 -07:00
Christoph Hellwig
432c79978c aio: separate out ring reservation from req allocation
This is in preparation for certain types of IO not needing a ring
reserveration.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-18 08:29:58 -07:00
Jens Axboe
bc9bff6162 aio: use assigned completion handler
We know this is a read/write request, but in preparation for
having different kinds of those, ensure that we call the assigned
handler instead of assuming it's aio_complete_rq().

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-18 08:29:58 -07:00
Jens Axboe
4b92543282 Merge branch 'for-4.21/block' into for-4.21/aio
* for-4.21/block: (351 commits)
  blk-mq: enable IO poll if .nr_queues of type poll > 0
  blk-mq: change blk_mq_queue_busy() to blk_mq_queue_inflight()
  blk-mq: skip zero-queue maps in blk_mq_map_swqueue
  block: fix blk-iolatency accounting underflow
  blk-mq: fix dispatch from sw queue
  block: mq-deadline: Fix write completion handling
  nvme-pci: don't share queue maps
  blk-mq: only dispatch to non-defauly queue maps if they have queues
  blk-mq: export hctx->type in debugfs instead of sysfs
  blk-mq: fix allocation for queue mapping table
  blk-wbt: export internal state via debugfs
  blk-mq-debugfs: support rq_qos
  block: update sysfs documentation
  block: loop: check error using IS_ERR instead of IS_ERR_OR_NULL in loop_add()
  aoe: add __exit annotation
  block: clear REQ_HIPRI if polling is not supported
  blk-mq: replace and kill blk_mq_request_issue_directly
  blk-mq: issue directly with bypass 'false' in blk_mq_sched_insert_requests
  blk-mq: refactor the code of issue request directly
  block: remove the bio_integrity_advance export
  ...
2018-12-18 08:29:53 -07:00
Arnd Bergmann
d651d1607f vfs: replace current_kernel_time64 with ktime equivalent
current_time is the last remaining caller of current_kernel_time64(),
which is a wrapper around ktime_get_coarse_real_ts64().  This calls the
latter directly for consistency with the rest of the kernel that is moving
to the ktime_get_ family of time accessors, as now documented in
Documentation/core-api/timekeeping.rst.

An open questions is whether we may want to actually call the more
accurate ktime_get_real_ts64() for file systems that save high-resolution
timestamps in their on-disk format.  This would add a small overhead to
each update of the inode stamps but lead to inode timestamps to actually
have a usable resolution better than one jiffy (1 to 10 milliseconds
normally).  Experiments on a variety of hardware platforms show a typical
time of around 100 CPU cycles to read the cycle counter and calculate the
accurate time from that.  On old platforms without a cycle counter, this
can be signiciantly higher, up to several microseconds to access a
hardware clock, but those have become very rare by now.

I traced the original addition of the current_kernel_time() call to set
the nanosecond fields back to linux-2.5.48, where Andi Kleen added a patch
with subject "nanosecond stat timefields".  Andi explains that the
motivation was to introduce as little overhead as possible back then.  At
this time, reading the clock hardware was also more expensive when most
architectures did not have a cycle counter.

One side effect of having more accurate inode timestamp would be having to
write out the inode every time that mtime/ctime/atime get touched on most
systems, whereas many file systems today only write it when the timestamps
have changed, i.e.  at most once per jiffy unless something else changes
as well.  That change would certainly be noticed in some workloads, which
is enough reason to not do it without a good reason, regardless of the
cost of reading the time.

One thing we could still consider however would be to round the timestamps
from current_time() to multiples of NSEC_PER_JIFFY, e.g.  full
milliseconds rather than having six or seven meaningless but confusing
digits at the end of the timestamp.

Link: http://lkml.kernel.org/r/20180726130820.4174359-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2018-12-18 16:13:05 +01:00
Al Viro
26cb5a328c exofs_mount(): fix leaks on failure exits
... and don't abuse mount_nodev(), while we are at it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: David Howells <dhowells@redhat.com>
2018-12-17 18:36:33 -05:00
Andrea Gelmini
52042d8e82 btrfs: Fix typos in comments and strings
The typos accumulate over time so once in a while time they get fixed in
a large patch.

Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-12-17 14:51:50 +01:00