52621 Commits

Author SHA1 Message Date
David Howells
45df846273 afs: Fix server list handling
Fix server list handling in the following ways:

 (1) In afs_alloc_volume(), remove duplicate server list build code.  This
     was already done by afs_alloc_server_list() which afs_alloc_volume()
     previously called.  This just results in twice as many VL RPCs.

 (2) In afs_deliver_vl_get_entry_by_name_u(), use the number of server
     records indicated by ->nServers in the UVLDB record returned by the
     VL.GetEntryByNameU RPC call rather than scanning all NMAXNSERVERS
     slots.  Unused slots may contain garbage.

 (3) In afs_alloc_server_list(), don't stop converting a UVLDB record into
     a server list just because we can't look up one of the servers.  Just
     skip that server and go on to the next.  If we can't look up any of
     the servers then we'll fail at the end.

Without this patch, an attempt to view the umich.edu root cell using
something like "ls /afs/umich.edu" on a dynamic root (future patch) mount
or an autocell mount will result in ENOMEDIUM.  The failure is due to kafs
not stopping after nServers'worth of records have been read, but then
trying to access a server with a garbage UUID and getting an error, which
aborts the server list build.

Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation")
Reported-by: Jonathan Billings <jsbillings@jsbillings.org>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: stable@vger.kernel.org
2018-02-06 14:36:54 +00:00
David Howells
8305e579c6 afs: Need to clear responded flag in addr cursor
In afs_select_fileserver(), we need to clear the ->responded flag in the
address list when reusing it.  We should also clear it in
afs_select_current_fileserver().

To this end, just memset() the object before initialising it.

Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: stable@vger.kernel.org
2018-02-06 14:36:54 +00:00
David Howells
fe4d774c84 afs: Fix missing cursor clearance
afs_select_fileserver() ends the address cursor it is using in the case in
which we get some sort of network error and run out of addresses to iterate
through, before it jumps to try the next server.  This also needs to be
done when the server aborts with some sort of error that means we should
try the next server.

Fix this by:

 (1) Move the iterate_address afs_end_cursor() call to the next_server
     case.

 (2) End the cursor in the failed case.

 (3) Make afs_end_cursor() clear the ->begun flag and ->addr pointer in the
     address cursor.

 (4) Make afs_end_cursor() able to be called on an already cleared cursor.

Without this, something like the following oops may occur:

	AFS: Assertion failed
	18446612134397189888 == 0 is false
	0xffff88007c279f00 == 0x0 is false
	------------[ cut here ]------------
	kernel BUG at fs/afs/rotate.c:360!
	RIP: 0010:afs_select_fileserver+0x79b/0xa30 [kafs]
	Call Trace:
	 afs_statfs+0xcc/0x180 [kafs]
	 ? p9_client_statfs+0x9e/0x110 [9pnet]
	 ? _cond_resched+0x19/0x40
	 statfs_by_dentry+0x6d/0x90
	 vfs_statfs+0x1b/0xc0
	 user_statfs+0x4b/0x80
	 SYSC_statfs+0x15/0x30
	 SyS_statfs+0xe/0x10
	 entry_SYSCALL_64_fastpath+0x20/0x83

Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: stable@vger.kernel.org
2018-02-06 14:36:54 +00:00
David Howells
e44150157f afs: Add missing afs_put_cell()
afs_alloc_volume() needs to release the cell ref it obtained in the case of
an error.  Fix this by adding an afs_put_cell() call into the error path.

This can triggered when a lookup for a cell in a dynamic root or an
autocell mount returns an error whilst trying to look up the server (such
as ENOMEDIUM).  This results in an assertion failure oops when the module
is unloaded due to outstanding refs on a cell record.

Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: stable@vger.kernel.org
2018-02-06 14:22:03 +00:00
J. Bruce Fields
2502072058 nfsd4: don't set lock stateid's sc_type to CLOSED
There's no point I can see to

	stp->st_stid.sc_type = NFS4_CLOSED_STID;

given release_lock_stateid immediately sets sc_type to 0.

That set of sc_type to 0 should be enough to prevent it being used where
we don't want it to be; NFS4_CLOSED_STID should only be needed for
actual open stateid's that are actually closed.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-02-05 17:13:17 -05:00
Trond Myklebust
4f1764172a nfsd: Detect unhashed stids in nfsd4_verify_open_stid()
The state of the stid is guaranteed by 2 locks:
- The nfs4_client 'cl_lock' spinlock
- The nfs4_ol_stateid 'st_mutex' mutex

so it is quite possible for the stid to be unhashed after lookup,
but before calling nfsd4_lock_ol_stateid(). So we do need to check
for a zero value for 'sc_type' in nfsd4_verify_open_stid().

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Tested-by: Checuk Lever <chuck.lever@oracle.com>
Cc: stable@vger.kernel.org
Fixes: 659aefb68eca "nfsd: Ensure we don't recognise lock stateids..."
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-02-05 17:13:16 -05:00
Linus Torvalds
e237f98a9c Changes since last update:
- Print scrub build status in the xfs build info.
  - Explicitly call out the remaining two scenarios where we don't
    support
    reflink and never have.
  - Remove EXPERIMENTAL tag from reverse mapping btree!
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCgAGBQJaeJjKAAoJEPh/dxk0SrTrZP8P/RT0bcKc1PkmonX6rZBYa9OB
 Mz5X7TpVRsXtZPtGSNM3IBIubjIVEZ/f3s5CZefN08uV8s+AFBjEAdHmeAiGtT/X
 qakQyvsBJ3mEyVsMyzuI7eu4TU3/5Xad7kSp9TFPnXfW8z09Z4GygyGVJPRqpKRQ
 liFzh8BIVgS/IFcpTL+6wKEHdAHEuyz6u/78ylgCtLMuiNiMY1mYv/+U2f7dEV3u
 yiRY4oHGQfOiw1aXy3EO2WUdSKcAQwIJIEsLOllYQRe3f5W2milflFCJF9RoEEuE
 OLmur4PBwFWpTfLVl1BqGa6rr/nhaY1y7Lyy3mVrmv0QiHlnNM/BQ5UKICZJdx5O
 8Ai4ZyaJ5Q/nQxA6USOBHSlkeexMOH82i7gJCCfPtYqW1l0QjStLcoTYjWXa/0u9
 ULEkdnocNm/HSCIGocFrd6dzOKR8TxJDVh3DxIFo8VjTj/XI57+ePfbZT7J+0vuB
 elhKcho87xKHeF1RQfsVdgh+518GGAXp5zZjAJ3P/6GpxuB9sa+ShEEtR7OzSf0K
 sfkXw3P/tH9ladBxWvMC6Gx0tSUSUTAUeYSbfOC1wRio7iI7sf8Gl8SkU65y4RdE
 ZhQp8M4i2+vt9JS/E/mbAVxKIn1iF7L9ZiWlycJXyuqFf7bv1uBXG+tTE7lM7nJA
 YjSmXBWN5j6kxQeUR0NE
 =U54J
 -----END PGP SIGNATURE-----

Merge tag 'xfs-4.16-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull more xfs updates from Darrick Wong:
 "As promised, here's a (much smaller) second pull request for the
  second week of the merge cycle. This time around we have a couple
  patches shutting off unsupported fs configurations, and a couple of
  cleanups.

  Last, we turn off EXPERIMENTAL for the reverse mapping btree, since
  the primary downstream user of that information (online fsck) is now
  upstream and I haven't seen any major failures in a few kernel
  releases.

  Summary:

   - Print scrub build status in the xfs build info.

   - Explicitly call out the remaining two scenarios where we don't
     support reflink and never have.

   - Remove EXPERIMENTAL tag from reverse mapping btree!"

* tag 'xfs-4.16-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: remove experimental tag for reverse mapping
  xfs: don't allow reflink + realtime filesystems
  xfs: don't allow DAX on reflink filesystems
  xfs: add scrub to XFS_BUILD_OPTIONS
  xfs: fix u32 type usage in sb validation function
2018-02-05 13:35:56 -08:00
Linus Torvalds
139351f1f9 Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs updates from Miklos Szeredi:
 "This work from Amir adds NFS export capability to overlayfs. NFS
  exporting an overlay filesystem is a challange because we want to keep
  track of any copy-up of a file or directory between encoding the file
  handle and decoding it.

  This is achieved by indexing copied up objects by lower layer file
  handle. The index is already used for hard links, this patchset
  extends the use to NFS file handle decoding"

* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: (51 commits)
  ovl: check ERR_PTR() return value from ovl_encode_fh()
  ovl: fix regression in fsnotify of overlay merge dir
  ovl: wire up NFS export operations
  ovl: lookup indexed ancestor of lower dir
  ovl: lookup connected ancestor of dir in inode cache
  ovl: hash non-indexed dir by upper inode for NFS export
  ovl: decode pure lower dir file handles
  ovl: decode indexed dir file handles
  ovl: decode lower file handles of unlinked but open files
  ovl: decode indexed non-dir file handles
  ovl: decode lower non-dir file handles
  ovl: encode lower file handles
  ovl: copy up before encoding non-connectable dir file handle
  ovl: encode non-indexed upper file handles
  ovl: decode connected upper dir file handles
  ovl: decode pure upper file handles
  ovl: encode pure upper file handles
  ovl: document NFS export
  vfs: factor out helpers d_instantiate_anon() and d_alloc_anon()
  ovl: store 'has_upper' and 'opaque' as bit flags
  ...
2018-02-05 13:05:20 -08:00
Nikolay Borisov
fd649f10c3 btrfs: Fix use-after-free when cleaning up fs_devs with a single stale device
Commit 4fde46f0cc71 ("Btrfs: free the stale device") introduced
btrfs_free_stale_device which iterates the device lists for all
registered btrfs filesystems and deletes those devices which aren't
mounted. In a btrfs_devices structure has only 1 device attached to it
and it is unused then btrfs_free_stale_devices will proceed to also free
the btrfs_fs_devices struct itself. Currently this leads to a use after
free since list_for_each_entry will try to perform a check on the
already freed memory to see if it has to terminate the loop.

The fix is to use 'break' when we know we are freeing the current
fs_devs.

Fixes: 4fde46f0cc71 ("Btrfs: free the stale device")
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-02-05 17:15:14 +01:00
Amir Goldstein
9b6faee074 ovl: check ERR_PTR() return value from ovl_encode_fh()
Another fix for an issue reported by 0-day robot.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 8ed5eec9d6c4 ("ovl: encode pure upper file handles")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-02-05 09:50:29 +01:00
Amir Goldstein
2aed489d16 ovl: fix regression in fsnotify of overlay merge dir
A re-factoring patch in NFS export series has passed the wrong argument
to ovl_get_inode() causing a regression in the very recent fix to
fsnotify of overlay merge dir.

The regression has caused merge directory inodes to be hashed by upper
instead of lower real inode, when NFS export and directory indexing is
disabled. That caused an inotify watch to become obsolete after directory
copy up and drop caches.

LTP test inotify07 was improved to catch this regression.
The regression also caused multiple redirect dirs to same origin not to
be detected on lookup with NFS export disabled. An xfstest was added to
cover this case.

Fixes: 0aceb53e73be ("ovl: do not pass overlay dentry to ovl_get_inode()")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-02-05 09:50:29 +01:00
Linus Torvalds
3462ac5703 Refactor support for encrypted symlinks to move common code to fscrypt.
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAlp2R3AACgkQ8vlZVpUN
 gaOIdAgApEdlFR2Gf93z2hMj5HxVL5rjkuPJVtVkKu0eH2HMQJyxNmjymrRfuFmM
 8W1CrEvVKi5Aj6r8q4KHIdVV247Ya0SVEhLwKM0LX4CvlZUXmwgCmZ/MPDTXA1eq
 C4vPVuJAuSNGNVYDlDs3+NiMHINGNVnBVQQFSPBP9P+iNWPD7o486712qaF8maVn
 RbfbQ2rWtOIRdlAOD1U5WqgQku59lOsmHk2pc0+X4LHCZFpMoaO80JVjENPAw+BF
 daRt6TX+WljMyx6DRIaszqau876CJhe/tqlZcCLOkpXZP0jJS13yodp26dVQmjCh
 w8YdiY7uHK2D+S/8eyj7h7DIwzu3vg==
 =ZjQP
 -----END PGP SIGNATURE-----

Merge tag 'fscrypt_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt

Pull fscrypt updates from Ted Ts'o:
 "Refactor support for encrypted symlinks to move common code to fscrypt"

Ted also points out about the merge:
 "This makes the f2fs symlink code use the fscrypt_encrypt_symlink()
  from the fscrypt tree. This will end up dropping the kzalloc() ->
  f2fs_kzalloc() change, which means the fscrypt-specific allocation
  won't get tested by f2fs's kmalloc error injection system; which is
  fine"

* tag 'fscrypt_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt: (26 commits)
  fscrypt: fix build with pre-4.6 gcc versions
  fscrypt: remove 'ci' parameter from fscrypt_put_encryption_info()
  fscrypt: document symlink length restriction
  fscrypt: fix up fscrypt_fname_encrypted_size() for internal use
  fscrypt: define fscrypt_fname_alloc_buffer() to be for presented names
  fscrypt: calculate NUL-padding length in one place only
  fscrypt: move fscrypt_symlink_data to fscrypt_private.h
  fscrypt: remove fscrypt_fname_usr_to_disk()
  ubifs: switch to fscrypt_get_symlink()
  ubifs: switch to fscrypt ->symlink() helper functions
  ubifs: free the encrypted symlink target
  f2fs: switch to fscrypt_get_symlink()
  f2fs: switch to fscrypt ->symlink() helper functions
  ext4: switch to fscrypt_get_symlink()
  ext4: switch to fscrypt ->symlink() helper functions
  fscrypt: new helper function - fscrypt_get_symlink()
  fscrypt: new helper functions for ->symlink()
  fscrypt: trim down fscrypt.h includes
  fscrypt: move fscrypt_is_dot_dotdot() to fs/crypto/fname.c
  fscrypt: move fscrypt_valid_enc_modes() to fscrypt_private.h
  ...
2018-02-04 10:43:12 -08:00
Linus Torvalds
617aebe6a9 Currently, hardened usercopy performs dynamic bounds checking on slab
cache objects. This is good, but still leaves a lot of kernel memory
 available to be copied to/from userspace in the face of bugs. To further
 restrict what memory is available for copying, this creates a way to
 whitelist specific areas of a given slab cache object for copying to/from
 userspace, allowing much finer granularity of access control. Slab caches
 that are never exposed to userspace can declare no whitelist for their
 objects, thereby keeping them unavailable to userspace via dynamic copy
 operations. (Note, an implicit form of whitelisting is the use of constant
 sizes in usercopy operations and get_user()/put_user(); these bypass all
 hardened usercopy checks since these sizes cannot change at runtime.)
 
 This new check is WARN-by-default, so any mistakes can be found over the
 next several releases without breaking anyone's system.
 
 The series has roughly the following sections:
 - remove %p and improve reporting with offset
 - prepare infrastructure and whitelist kmalloc
 - update VFS subsystem with whitelists
 - update SCSI subsystem with whitelists
 - update network subsystem with whitelists
 - update process memory with whitelists
 - update per-architecture thread_struct with whitelists
 - update KVM with whitelists and fix ioctl bug
 - mark all other allocations as not whitelisted
 - update lkdtm for more sensible test overage
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 Comment: Kees Cook <kees@outflux.net>
 
 iQIcBAABCgAGBQJabvleAAoJEIly9N/cbcAmO1kQAJnjVPutnLSbnUteZxtsv7W4
 43Cggvokfxr6l08Yh3hUowNxZVKjhF9uwMVgRRg9Nl5WdYCN+vCQbHz+ZdzGJXKq
 cGqdKWgexMKX+aBdNDrK7BphUeD46sH7JWR+a/lDV/BgPxBCm9i5ZZCgXbPP89AZ
 NpLBji7gz49wMsnm/x135xtNlZ3dG0oKETzi7MiR+NtKtUGvoIszSKy5JdPZ4m8q
 9fnXmHqmwM6uQFuzDJPt1o+D1fusTuYnjI7EgyrJRRhQ+BB3qEFZApXnKNDRS9Dm
 uB7jtcwefJCjlZVCf2+PWTOEifH2WFZXLPFlC8f44jK6iRW2Nc+wVRisJ3vSNBG1
 gaRUe/FSge68eyfQj5OFiwM/2099MNkKdZ0fSOjEBeubQpiFChjgWgcOXa5Bhlrr
 C4CIhFV2qg/tOuHDAF+Q5S96oZkaTy5qcEEwhBSW15ySDUaRWFSrtboNt6ZVOhug
 d8JJvDCQWoNu1IQozcbv6xW/Rk7miy8c0INZ4q33YUvIZpH862+vgDWfTJ73Zy9H
 jR/8eG6t3kFHKS1vWdKZzOX1bEcnd02CGElFnFYUEewKoV7ZeeLsYX7zodyUAKyi
 Yp5CImsDbWWTsptBg6h9nt2TseXTxYCt2bbmpJcqzsqSCUwOQNQ4/YpuzLeG0ihc
 JgOmUnQNJWCTwUUw5AS1
 =tzmJ
 -----END PGP SIGNATURE-----

Merge tag 'usercopy-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull hardened usercopy whitelisting from Kees Cook:
 "Currently, hardened usercopy performs dynamic bounds checking on slab
  cache objects. This is good, but still leaves a lot of kernel memory
  available to be copied to/from userspace in the face of bugs.

  To further restrict what memory is available for copying, this creates
  a way to whitelist specific areas of a given slab cache object for
  copying to/from userspace, allowing much finer granularity of access
  control.

  Slab caches that are never exposed to userspace can declare no
  whitelist for their objects, thereby keeping them unavailable to
  userspace via dynamic copy operations. (Note, an implicit form of
  whitelisting is the use of constant sizes in usercopy operations and
  get_user()/put_user(); these bypass all hardened usercopy checks since
  these sizes cannot change at runtime.)

  This new check is WARN-by-default, so any mistakes can be found over
  the next several releases without breaking anyone's system.

  The series has roughly the following sections:
   - remove %p and improve reporting with offset
   - prepare infrastructure and whitelist kmalloc
   - update VFS subsystem with whitelists
   - update SCSI subsystem with whitelists
   - update network subsystem with whitelists
   - update process memory with whitelists
   - update per-architecture thread_struct with whitelists
   - update KVM with whitelists and fix ioctl bug
   - mark all other allocations as not whitelisted
   - update lkdtm for more sensible test overage"

* tag 'usercopy-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (38 commits)
  lkdtm: Update usercopy tests for whitelisting
  usercopy: Restrict non-usercopy caches to size 0
  kvm: x86: fix KVM_XEN_HVM_CONFIG ioctl
  kvm: whitelist struct kvm_vcpu_arch
  arm: Implement thread_struct whitelist for hardened usercopy
  arm64: Implement thread_struct whitelist for hardened usercopy
  x86: Implement thread_struct whitelist for hardened usercopy
  fork: Provide usercopy whitelisting for task_struct
  fork: Define usercopy region in thread_stack slab caches
  fork: Define usercopy region in mm_struct slab caches
  net: Restrict unwhitelisted proto caches to size 0
  sctp: Copy struct sctp_sock.autoclose to userspace using put_user()
  sctp: Define usercopy region in SCTP proto slab cache
  caif: Define usercopy region in caif proto slab cache
  ip: Define usercopy region in IP proto slab cache
  net: Define usercopy region in struct proto slab cache
  scsi: Define usercopy region in scsi_sense_cache slab cache
  cifs: Define usercopy region in cifs_request slab cache
  vxfs: Define usercopy region in vxfs_inode slab cache
  ufs: Define usercopy region in ufs_inode_cache slab cache
  ...
2018-02-03 16:25:42 -08:00
Linus Torvalds
0771ad44a2 - clean up hardirq header usage (Yang Shi)
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 Comment: Kees Cook <kees@outflux.net>
 
 iQIcBAABCgAGBQJabv25AAoJEIly9N/cbcAmGMUP/jscOunFi8fScVbECFt61pMw
 wTrexpPCfJTPcdfIlT+blOieqg6qygWSwbrBiXft8/BC9kxdvYC8VdHXGkg5BdbB
 y3Bj4vgBzO6meOFaHTWOzTFo0bas0ZpJKXx98s1yXQ9Le7RIRm1BjZWxY/Q/41eF
 e4PeCTf1gTi+JV2Gm6GzkbJfjGD/HReKM2c/JjyLmwSf4X2j+QKS/daJiwdFMG67
 Afz1FsOReECQRGGLEchBue0+cGu9uUk7N6Ppiyt9vbfRirpxBwBl+rEuw4vfY5pt
 dpPBzMaYVodGL9bOzJVgEAOiCcyi3iMy3axDdnS7RuC6RchAa4DsuTAaD3eeabTQ
 EDTbc/WOL7uib4+DN9uoJ09UfMhIoFzSgGSxlywOJWN8TH9giyQ2PL9VF/LLB1gi
 lfpV/CLpuOPSvE6zPEEDg3HEoQP0+A/LPNaZR8j1g+I4gkZXxifp+tUJ2rjSCxPS
 a5LP4VJXv2rMR4OVy29L65LUvpFd8EmhD31mVr8agZ0isnefY8nFod7KTH2aiTfz
 vn05VQQpqSt+8MPftqv5vW7d/wYmsFIsTpjtFjrl+TQzcKKRjf4hXyyTwZVjtxOw
 pXpR1pUNkkAIc8AkOdTxyOXEYaIGke8gnFqMvqrRiri82uXscTYS57Ac8oQY9ry9
 dOrg7uPz6WetdgOYra+4
 =C8SU
 -----END PGP SIGNATURE-----

Merge tag 'pstore-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull pstore update from Kees Cook:
 "Only a header cleanup this release; nice and quiet. :)

   - clean up hardirq header usage (Yang Shi)"

* tag 'pstore-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  fs: pstore: remove unused hardirq.h
2018-02-03 13:55:01 -08:00
Linus Torvalds
23aedc4b9b Only miscellaneous cleanups and bug fixes for ext4 this cycle.
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAlp16xMACgkQ8vlZVpUN
 gaP1IAf8C48AKVnqy6ftFphzV1CdeGHDwJLL63lChs97fNr1mxo5TZE/6vdYB55j
 k7C7huQ582cEiGWQJ0U4/+En0hF85zkAk5mTfnSao5BqxLr9ANsAocwBUNBXdFSp
 B7IyMo4Dct7NCkwfmKLPRcEqZ49vwyv99TqM/9wUkgUStkTjPT7bhHgarB6VPbhp
 BxoXVnFYgU0sZN0y71IBt8ngWqCK6j7fjw3gsl37oEenG3/h3SO0H9ih1FrysX8S
 VOwwLJq6vfAgEwQvZACnBwWKDYsZpH7akNp9WGeDMByo28t514RNRjIi0mvLHEZa
 h72I8Sb3bwHO9MJNvHFe/0b1Say4vw==
 =dxAX
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
 "Only miscellaneous cleanups and bug fixes for ext4 this cycle"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: create ext4_kset dynamically
  ext4: create ext4_feat kobject dynamically
  ext4: release kobject/kset even when init/register fail
  ext4: fix incorrect indentation of if statement
  ext4: correct documentation for grpid mount option
  ext4: use 'sbi' instead of 'EXT4_SB(sb)'
  ext4: save error to disk in __ext4_grp_locked_error()
  jbd2: fix sphinx kernel-doc build warnings
  ext4: fix a race in the ext4 shutdown path
  mbcache: make sure c_entry_count is not decremented past zero
  ext4: no need flush workqueue before destroying it
  ext4: fixed alignment and minor code cleanup in ext4.h
  ext4: fix ENOSPC handling in DAX page fault handler
  dax: pass detailed error code from dax_iomap_fault()
  mbcache: revert "fs/mbcache.c: make count_objects() more robust"
  mbcache: initialize entry->e_referenced in mb_cache_entry_create()
  ext4: fix up remaining files with SPDX cleanups
2018-02-03 13:49:22 -08:00
Linus Torvalds
6ec4de89b4 Andreas Gruenbacher wrote two additional patches that we would like
merged in this time. Both are regressions:
 
 1. The first fixes another kernel build dependency problem.
 2. The second fixes a performance regression in glock dumps.
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJadIS1AAoJENeLYdPf93o7i24H/3orp2uf/0EQFRB3WF7vxuhB
 aFyymb35V5+pkoSOqBRpV8plQR3oNxeQX1uo+a08n5UzW7VHQBApS5m5to5w03dI
 MRZvDUs84weKwjUm+ndhqOgjoUZuTIQ6+/A6bRDu+24AftqwNE5vHrTBvDdZ94zN
 WxCy847aHd21TQ7nKIsLVp7wlllmRuxp1D+VEc7Vmn18eNrGp4TDavP5lq/4YR92
 Zsj1AfhJK1GuAY9AJGMT3ZiFL6Mdg9oj7qSyJ2HjT7q/QJE+odwI8uUPs4HKpiko
 VPBPhTrfgDE2nD4gAYIR41Aog8s8JnLgGK+0P7CqVxB37rq89BSYvApaHQE8yTg=
 =4Ha2
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-4.16.fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull GFS2 fixes from Bob Peterson:
 "Andreas Gruenbacher wrote two additional patches that we would like
  merged in this time. Both are regressions:

   - fix another kernel build dependency problem

   - fix a performance regression in glock dumps"

* tag 'gfs2-4.16.fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: Glock dump performance regression fix
  gfs2: Fix the crc32c dependency
2018-02-03 13:14:41 -08:00
Ross Zwisler
ee95f4059a Merge branch 'for-4.16/nfit' into libnvdimm-for-next 2018-02-03 00:26:26 -07:00
Filipe Manana
627e08738e Btrfs: fix null pointer dereference when replacing missing device
When we are replacing a missing device we mount the filesystem with the
degraded mode option in which case we are allowed to have a btrfs device
structure without a backing device member (its bdev member is NULL) and
therefore we can't dereference that member. Commit 38b5f68e9811
("btrfs: drop btrfs_device::can_discard to query directly") started to
dereference that member when discarding extents, resulting in a null
pointer dereference:

 [ 3145.322257] BTRFS warning (device sdf): devid 2 uuid 4d922414-58eb-4880-8fed-9c3840f6c5d5 is missing
 [ 3145.364116] BTRFS info (device sdf): dev_replace from <missing disk> (devid 2) to /dev/sdg started
 [ 3145.413489] BUG: unable to handle kernel NULL pointer dereference at 00000000000000e0
 [ 3145.415085] IP: btrfs_discard_extent+0x6a/0xf8 [btrfs]
 [ 3145.415085] PGD 0 P4D 0
 [ 3145.415085] Oops: 0000 [#1] PREEMPT SMP PTI
 [ 3145.415085] Modules linked in: ppdev ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper evdev psmouse parport_pc serio_raw i2c_piix4 i2
 [ 3145.415085] CPU: 0 PID: 11989 Comm: btrfs Tainted: G        W        4.15.0-rc9-btrfs-next-55+ #1
 [ 3145.415085] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
 [ 3145.415085] RIP: 0010:btrfs_discard_extent+0x6a/0xf8 [btrfs]
 [ 3145.415085] RSP: 0018:ffffc90004813c60 EFLAGS: 00010293
 [ 3145.415085] RAX: ffff88020d39cc00 RBX: ffff88020c4ea2a0 RCX: 0000000000000002
 [ 3145.415085] RDX: 0000000000000000 RSI: ffff88020c4ea240 RDI: 0000000000000000
 [ 3145.415085] RBP: 0000000000000000 R08: 0000000000004000 R09: 0000000000000000
 [ 3145.415085] R10: ffffc90004813ae8 R11: 0000000000000000 R12: 0000000000000000
 [ 3145.415085] R13: ffff88020c418000 R14: 0000000000000000 R15: 0000000000000000
 [ 3145.415085] FS:  00007f565681f8c0(0000) GS:ffff88023fc00000(0000) knlGS:0000000000000000
 [ 3145.415085] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 [ 3145.415085] CR2: 00000000000000e0 CR3: 000000020d208006 CR4: 00000000001606f0
 [ 3145.415085] Call Trace:
 [ 3145.415085]  btrfs_finish_extent_commit+0x9a/0x1be [btrfs]
 [ 3145.415085]  btrfs_commit_transaction+0x649/0x7a0 [btrfs]
 [ 3145.415085]  ? start_transaction+0x2b0/0x3b3 [btrfs]
 [ 3145.415085]  btrfs_dev_replace_start+0x274/0x30c [btrfs]
 [ 3145.415085]  btrfs_dev_replace_by_ioctl+0x45/0x59 [btrfs]
 [ 3145.415085]  btrfs_ioctl+0x1a91/0x1d62 [btrfs]
 [ 3145.415085]  ? lock_acquire+0x16a/0x1af
 [ 3145.415085]  ? vfs_ioctl+0x1b/0x28
 [ 3145.415085]  ? trace_hardirqs_on_caller+0x14c/0x1a6
 [ 3145.415085]  vfs_ioctl+0x1b/0x28
 [ 3145.415085]  do_vfs_ioctl+0x5a9/0x5e0
 [ 3145.415085]  ? _raw_spin_unlock_irq+0x34/0x46
 [ 3145.415085]  ? entry_SYSCALL_64_fastpath+0x5/0x8b
 [ 3145.415085]  ? trace_hardirqs_on_caller+0x14c/0x1a6
 [ 3145.415085]  SyS_ioctl+0x52/0x76
 [ 3145.415085]  entry_SYSCALL_64_fastpath+0x1e/0x8b
 [ 3145.415085] RIP: 0033:0x7f56558b3c47
 [ 3145.415085] RSP: 002b:00007ffdcfac4c58 EFLAGS: 00000202
 [ 3145.415085] Code: be 02 00 00 00 4c 89 ef e8 b9 e7 03 00 85 c0 89 c5 75 75 48 8b 44 24 08 45 31 f6 48 8d 58 60 eb 52 48 8b 03 48 8b b8 a0 00 00 00 <48> 8b 87 e0 00
 [ 3145.415085] RIP: btrfs_discard_extent+0x6a/0xf8 [btrfs] RSP: ffffc90004813c60
 [ 3145.415085] CR2: 00000000000000e0
 [ 3145.458185] ---[ end trace 06302e7ac31902bf ]---

This is trivially reproduced by running the test btrfs/027 from fstests
like this:

  $ MOUNT_OPTIONS="-o discard" ./check btrfs/027

Fix this by skipping devices without a backing device before attempting
to discard.

Fixes: 38b5f68e9811 ("btrfs: drop btrfs_device::can_discard to query directly")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-02-02 16:25:44 +01:00
Zygo Blaxell
c8195a7b1a btrfs: remove spurious WARN_ON(ref->count < 0) in find_parent_nodes
Until v4.14, this warning was very infrequent:

	WARNING: CPU: 3 PID: 18172 at fs/btrfs/backref.c:1391 find_parent_nodes+0xc41/0x14e0
	Modules linked in: [...]
	CPU: 3 PID: 18172 Comm: bees Tainted: G      D W    L  4.11.9-zb64+ #1
	Hardware name: System manufacturer System Product Name/M5A78L-M/USB3, BIOS 2101    12/02/2014
	Call Trace:
	 dump_stack+0x85/0xc2
	 __warn+0xd1/0xf0
	 warn_slowpath_null+0x1d/0x20
	 find_parent_nodes+0xc41/0x14e0
	 __btrfs_find_all_roots+0xad/0x120
	 ? extent_same_check_offsets+0x70/0x70
	 iterate_extent_inodes+0x168/0x300
	 iterate_inodes_from_logical+0x87/0xb0
	 ? iterate_inodes_from_logical+0x87/0xb0
	 ? extent_same_check_offsets+0x70/0x70
	 btrfs_ioctl+0x8ac/0x2820
	 ? lock_acquire+0xc2/0x200
	 do_vfs_ioctl+0x91/0x700
	 ? __fget+0x112/0x200
	 SyS_ioctl+0x79/0x90
	 entry_SYSCALL_64_fastpath+0x23/0xc6
	 ? trace_hardirqs_off_caller+0x1f/0x140

Starting with v4.14 (specifically 86d5f9944252 ("btrfs: convert prelimary
reference tracking to use rbtrees")) the WARN_ON occurs three orders of
magnitude more frequently--almost once per second while running workloads
like bees.

Replace the WARN_ON() with a comment rationale for its removal.
The rationale is paraphrased from an explanation by Edmund Nadolski
<enadolski@suse.de> on the linux-btrfs mailing list.

Fixes: 8da6d5815c59 ("Btrfs: added btrfs_find_all_roots()")
Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Reviewed-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-02-02 16:25:33 +01:00
Nikolay Borisov
952bd3db0d btrfs: Ignore errors from btrfs_qgroup_trace_extent_post
Running generic/019 with qgroups on the scratch device enabled is almost
guaranteed to trigger the BUG_ON in btrfs_free_tree_block. It's supposed
to trigger only on -ENOMEM, in reality, however, it's possible to get
-EIO from btrfs_qgroup_trace_extent_post. This function just finds the
roots of the extent being tracked and sets the qrecord->old_roots list.
If this operation fails nothing critical happens except the quota
accounting can be considered wrong. In such case just set the
INCONSISTENT flag for the quota and print a warning, rather than killing
off the system. Additionally, it's possible to trigger a BUG_ON in
btrfs_truncate_inode_items as well.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
[ error message adjustments ]
Signed-off-by: David Sterba <dsterba@suse.com>
2018-02-02 16:25:14 +01:00
Liu Bo
900c998168 Btrfs: fix unexpected -EEXIST when creating new inode
The highest objectid, which is assigned to new inode, is decided at
the time of initializing fs roots.  However, in cases where log replay
gets processed, the btree which fs root owns might be changed, so we
have to search it again for the highest objectid, otherwise creating
new inode would end up with -EEXIST.

cc: <stable@vger.kernel.org> v4.4-rc6+
Fixes: f32e48e92596 ("Btrfs: Initialize btrfs_root->highest_objectid when loading tree root and subvolume roots")
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-02-02 16:24:53 +01:00
Liu Bo
1a932ef4e4 Btrfs: fix use-after-free on root->orphan_block_rsv
I got these from running generic/475,

WARNING: CPU: 0 PID: 26384 at fs/btrfs/inode.c:3326 btrfs_orphan_commit_root+0x1ac/0x2b0 [btrfs]
BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
IP: btrfs_block_rsv_release+0x1c/0x70 [btrfs]
Call Trace:
  btrfs_orphan_release_metadata+0x9f/0x200 [btrfs]
  btrfs_orphan_del+0x10d/0x170 [btrfs]
  btrfs_setattr+0x500/0x640 [btrfs]
  notify_change+0x7ae/0x870
  do_truncate+0xca/0x130
  vfs_truncate+0x2ee/0x3d0
  do_sys_truncate+0xaf/0xf0
  SyS_truncate+0xe/0x10
  entry_SYSCALL_64_fastpath+0x1f/0x96

The race is between btrfs_orphan_commit_root and btrfs_orphan_del,
        t1                                        t2
btrfs_orphan_commit_root                     btrfs_orphan_del
   spin_lock
   check (&root->orphan_inodes)
   root->orphan_block_rsv = NULL;
   spin_unlock
                                             atomic_dec(&root->orphan_inodes);
                                             access root->orphan_block_rsv

Accessing root->orphan_block_rsv must be done before decreasing
root->orphan_inodes.

cc: <stable@vger.kernel.org> v3.12+
Fixes: 703c88e03524 ("Btrfs: fix tracking of orphan inode count")
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-02-02 16:24:40 +01:00
Liu Bo
e8f1bc1493 Btrfs: fix btrfs_evict_inode to handle abnormal inodes correctly
This regression is introduced in
commit 3d48d9810de4 ("btrfs: Handle uninitialised inode eviction").

There are two problems,

a) it is ->destroy_inode() that does the final free on inode, not
   ->evict_inode(),
b) clear_inode() must be called before ->evict_inode() returns.

This could end up hitting BUG_ON(inode->i_state != (I_FREEING | I_CLEAR));
in evict() because I_CLEAR is set in clear_inode().

Fixes: commit 3d48d9810de4 ("btrfs: Handle uninitialised inode eviction")
Cc: <stable@vger.kernel.org> # v4.7-rc6+
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-02-02 16:24:35 +01:00
Liu Bo
55237a5f24 Btrfs: fix extent state leak from tree log
It's possible that btrfs_sync_log() bails out after one of the two
btrfs_write_marked_extents() which convert extent state's state bit into
EXTENT_NEED_WAIT from EXTENT_DIRTY/EXTENT_NEW, however only EXTENT_DIRTY
and EXTENT_NEW are searched by free_log_tree() so that those extent states
with EXTENT_NEED_WAIT lead to memory leak.

cc: <stable@vger.kernel.org>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-02-02 16:24:30 +01:00
Liu Bo
1846430c24 Btrfs: fix crash due to not cleaning up tree log block's dirty bits
In cases that the whole fs flips into readonly status due to failures in
critical sections, then log tree's blocks are still dirty, and this leads
to a crash during umount time, the crash is about use-after-free,

umount
 -> close_ctree
    -> stop workers
    -> iput(btree_inode)
       -> iput_final
          -> write_inode_now
	     -> ...
	       -> queue job on stop'd workers

cc: <stable@vger.kernel.org> v3.12+
Fixes: 681ae50917df ("Btrfs: cleanup reserved space when freeing tree log on error")
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-02-02 16:24:24 +01:00
Liu Bo
e89166990f Btrfs: fix deadlock in run_delalloc_nocow
@cur_offset is not set back to what it should be (@cow_start) if
btrfs_next_leaf() returns something wrong, and the range [cow_start,
cur_offset) remains locked forever.

cc: <stable@vger.kernel.org>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-02-02 16:24:19 +01:00
Darrick J. Wong
76883f7988 xfs: remove experimental tag for reverse mapping
Reverse mapping has had a while to soak, so remove the experimental tag.
Now that we've landed space metadata cross-referencing in scrub, the
feature actually has a purpose.

Reject rmap filesystems with an rt device until the code to support it
is actually implemented.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
2018-02-01 21:07:26 -08:00
Darrick J. Wong
c14632ddac xfs: don't allow reflink + realtime filesystems
We don't support realtime filesystems with reflink either, so fail
those mounts.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
2018-02-01 21:06:16 -08:00
Darrick J. Wong
b6e03c10bf xfs: don't allow DAX on reflink filesystems
Now that reflink is no longer experimental, reject attempts to mount
with DAX until that whole mess gets sorted out.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-02-01 21:06:15 -08:00
Eric Sandeen
494370ccaa xfs: add scrub to XFS_BUILD_OPTIONS
Advertise this config option along with the others.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-02-01 21:06:15 -08:00
Linus Torvalds
ab486bc9a5 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk
Pull printk updates from Petr Mladek:

 - Add a console_msg_format command line option:

     The value "default" keeps the old "[time stamp] text\n" format. The
     value "syslog" allows to see the syslog-like "<log
     level>[timestamp] text" format.

     This feature was requested by people doing regression tests, for
     example, 0day robot. They want to have both filtered and full logs
     at hands.

 - Reduce the risk of softlockup:

     Pass the console owner in a busy loop.

     This is a new approach to the old problem. It was first proposed by
     Steven Rostedt on Kernel Summit 2017. It marks a context in which
     the console_lock owner calls console drivers and could not sleep.
     On the other side, printk() callers could detect this state and use
     a busy wait instead of a simple console_trylock(). Finally, the
     console_lock owner checks if there is a busy waiter at the end of
     the special context and eventually passes the console_lock to the
     waiter.

     The hand-off works surprisingly well and helps in many situations.
     Well, there is still a possibility of the softlockup, for example,
     when the flood of messages stops and the last owner still has too
     much to flush.

     There is increasing number of people having problems with
     printk-related softlockups. We might eventually need to get better
     solution. Anyway, this looks like a good start and promising
     direction.

 - Do not allow to schedule in console_unlock() called from printk():

     This reverts an older controversial commit. The reschedule helped
     to avoid softlockups. But it also slowed down the console output.
     This patch is obsoleted by the new console waiter logic described
     above. In fact, the reschedule made the hand-off less effective.

 - Deprecate "%pf" and "%pF" format specifier:

     It was needed on ia64, ppc64 and parisc64 to dereference function
     descriptors and show the real function address. It is done
     transparently by "%ps" and "pS" format specifier now.

     Sergey Senozhatsky found that all the function descriptors were in
     a special elf section and could be easily detected.

 - Remove printk_symbol() API:

     It has been obsoleted by "%pS" format specifier, and this change
     helped to remove few continuous lines and a less intuitive old API.

 - Remove redundant memsets:

     Sergey removed unnecessary memset when processing printk.devkmsg
     command line option.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk: (27 commits)
  printk: drop redundant devkmsg_log_str memsets
  printk: Never set console_may_schedule in console_trylock()
  printk: Hide console waiter logic into helpers
  printk: Add console owner and waiter logic to load balance console writes
  kallsyms: remove print_symbol() function
  checkpatch: add pF/pf deprecation warning
  symbol lookup: introduce dereference_symbol_descriptor()
  parisc64: Add .opd based function descriptor dereference
  powerpc64: Add .opd based function descriptor dereference
  ia64: Add .opd based function descriptor dereference
  sections: split dereference_function_descriptor()
  openrisc: Fix conflicting types for _exext and _stext
  lib: do not use print_symbol()
  irq debug: do not use print_symbol()
  sysfs: do not use print_symbol()
  drivers: do not use print_symbol()
  x86: do not use print_symbol()
  unicore32: do not use print_symbol()
  sh: do not use print_symbol()
  mn10300: do not use print_symbol()
  ...
2018-02-01 13:36:15 -08:00
Al Viro
d85e2aa2e3 annotate ep_scan_ready_list()
make it always return __poll_t and have its callbacks do the same

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-02-01 16:30:06 -05:00
Al Viro
d7ebbe46f4 ep_send_events_proc(): return result via esed->res
preparations for not mixing __poll_t and int in ep_scan_ready_list()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-02-01 16:29:49 -05:00
Al Viro
cfe39442ab use linux/poll.h instead of asm/poll.h
The only place that has any business including asm/poll.h
is linux/poll.h.  Fortunately, asm/poll.h had only been
included in 3 places beyond that one, and all of them
are trivial to switch to using linux/poll.h.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-02-01 16:23:11 -05:00
Linus Torvalds
8e44e6600c Merge branch 'KASAN-read_word_at_a_time'
Merge KASAN word-at-a-time fixups from Andrey Ryabinin.

The word-at-a-time optimizations have caused headaches for KASAN, since
the whole point is that we access byte streams in bigger chunks, and
KASAN can be unhappy about the potential extra access at the end of the
string.

We used to have a horrible hack in dcache, and then people got
complaints from the strscpy() case.  This fixes it all up properly, by
adding an explicit helper for the "access byte stream one word at a
time" case.

* emailed patches from Andrey Ryabinin <aryabinin@virtuozzo.com>:
  fs: dcache: Revert "manually unpoison dname after allocation to shut up kasan's reports"
  fs/dcache: Use read_word_at_a_time() in dentry_string_cmp()
  lib/strscpy: Shut up KASAN false-positives in strscpy()
  compiler.h: Add read_word_at_a_time() function.
  compiler.h, kasan: Avoid duplicating __read_once_size_nocheck()
2018-02-01 12:20:53 -08:00
Andrey Ryabinin
babcbbc7c4 fs: dcache: Revert "manually unpoison dname after allocation to shut up kasan's reports"
This reverts commit df4c0e36f1b1782b0611a77c52cc240e5c4752dd.

It's no longer needed since dentry_string_cmp() now uses
read_word_at_a_time() to avoid kasan's reports.

Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-01 12:20:21 -08:00
Andrey Ryabinin
bfe7aa6c39 fs/dcache: Use read_word_at_a_time() in dentry_string_cmp()
dentry_string_cmp() performs the word-at-a-time reads from 'cs' and may
read slightly more than it was requested in kmallac().  Normally this
would make KASAN to report out-of-bounds access, but this was
workarounded by commit df4c0e36f1b1 ("fs: dcache: manually unpoison
dname after allocation to shut up kasan's reports").

This workaround is not perfect, since it allows out-of-bounds access to
dentry's name for all the code, not just in dentry_string_cmp().

So it would be better to use read_word_at_a_time() instead and revert
commit df4c0e36f1b1.

Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-01 12:20:21 -08:00
Andreas Gruenbacher
7ac07fdaf8 gfs2: Glock dump performance regression fix
Restore an optimization removed in commit 7f19449553 "Fix debugfs glocks
dump": keep the glock hash table iterator active while the glock dump
file is held open.  This avoids having to rescan the hash table from the
start for each read, with quadratically rising runtime.

In addition, use rhastable_walk_peek for resuming a glock dump at the
current position: when a glock doesn't fit in the provided buffer
anymore, the next read must revisit the same glock.

Finally, also restart the dump from the first entry when we notice that
the hash table has been resized in gfs2_glock_seq_start.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-02-01 11:27:11 -07:00
Andreas Gruenbacher
dcb2cd55cf gfs2: Fix the crc32c dependency
Depend on LIBCRC32C which uses the crypto API to select the appropriate
crc32c implementation.  With the CRYPTO and CRYPTO_CRC32C dependencies,
gfs2 would still need to use the crypto API directly like ext4 and btrfs
do, which isn't necessary.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-02-01 11:25:31 -07:00
Linus Torvalds
47fcc0360c Driver Core updates for 4.16-rc1
Here is the set of "big" driver core patches for 4.16-rc1.
 
 The majority of the work here is in the firmware subsystem, with reworks
 to try to attempt to make the code easier to handle in the long run, but
 no functional change.  There's also some tree-wide sysfs attribute
 fixups with lots of acks from the various subsystem maintainers, as well
 as a handful of other normal fixes and changes.
 
 And finally, some license cleanups for the driver core and sysfs code.
 
 All have been in linux-next for a while with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWnLvPw8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ynNzACgkzjPoBytJWbpWFt6SR6L33/u4kEAnRFvVCGL
 s6ygQPQhZIjKk2Lxa2hC
 =Zihy
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here is the set of "big" driver core patches for 4.16-rc1.

  The majority of the work here is in the firmware subsystem, with
  reworks to try to attempt to make the code easier to handle in the
  long run, but no functional change. There's also some tree-wide sysfs
  attribute fixups with lots of acks from the various subsystem
  maintainers, as well as a handful of other normal fixes and changes.

  And finally, some license cleanups for the driver core and sysfs code.

  All have been in linux-next for a while with no reported issues"

* tag 'driver-core-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (48 commits)
  device property: Define type of PROPERTY_ENRTY_*() macros
  device property: Reuse property_entry_free_data()
  device property: Move property_entry_free_data() upper
  firmware: Fix up docs referring to FIRMWARE_IN_KERNEL
  firmware: Drop FIRMWARE_IN_KERNEL Kconfig option
  USB: serial: keyspan: Drop firmware Kconfig options
  sysfs: remove DEBUG defines
  sysfs: use SPDX identifiers
  drivers: base: add coredump driver ops
  sysfs: add attribute specification for /sysfs/devices/.../coredump
  test_firmware: fix missing unlock on error in config_num_requests_store()
  test_firmware: make local symbol test_fw_config static
  sysfs: turn WARN() into pr_warn()
  firmware: Fix a typo in fallback-mechanisms.rst
  treewide: Use DEVICE_ATTR_WO
  treewide: Use DEVICE_ATTR_RO
  treewide: Use DEVICE_ATTR_RW
  sysfs.h: Use octal permissions
  component: add debugfs support
  bus: simple-pm-bus: convert bool SIMPLE_PM_BUS to tristate
  ...
2018-02-01 10:00:28 -08:00
Linus Torvalds
5d8515bc23 Staging/IIO patches for 4.16-rc1
Here is the big Staging and IIO driver patches for 4.16-rc1.
 
 There is the normal amount of new IIO drivers added, like all releases.
 
 The networking IPX and the ncpfs filesystem are moved into the staging
 tree, as they are on their way out of the kernel due to lack of use
 anymore.
 
 The visorbus subsystem finall has started moving out of the staging tree
 to the "real" part of the kernel, and the most and fsl-mc codebases are
 almost ready to move out, that will probably happen for 4.17-rc1 if all
 goes well.
 
 Other than that, there is a bunch of license header cleanups in the
 tree, along with the normal amount of coding style churn that we all
 know and love for this codebase.  I also got frustrated at the
 Meltdown/Spectre mess and took it out on the dgnc tty driver, deleting
 huge chunks of it that were never even being used.
 
 Full details of everything is in the shortlog.
 
 All of these patches have been in linux-next for a while with no
 reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWnLxoA8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+yk4vgCgjeMlwhtar65DIticIRj626EFxiQAnjGmH8Kd
 d9Xz2Piq8X47uSsC/6AE
 =xxMT
 -----END PGP SIGNATURE-----

Merge tag 'staging-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging/IIO updates from Greg KH:
 "Here is the big Staging and IIO driver patches for 4.16-rc1.

  There is the normal amount of new IIO drivers added, like all
  releases.

  The networking IPX and the ncpfs filesystem are moved into the staging
  tree, as they are on their way out of the kernel due to lack of use
  anymore.

  The visorbus subsystem finall has started moving out of the staging
  tree to the "real" part of the kernel, and the most and fsl-mc
  codebases are almost ready to move out, that will probably happen for
  4.17-rc1 if all goes well.

  Other than that, there is a bunch of license header cleanups in the
  tree, along with the normal amount of coding style churn that we all
  know and love for this codebase. I also got frustrated at the
  Meltdown/Spectre mess and took it out on the dgnc tty driver, deleting
  huge chunks of it that were never even being used.

  Full details of everything is in the shortlog.

  All of these patches have been in linux-next for a while with no
  reported issues"

* tag 'staging-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (627 commits)
  staging: rtlwifi: remove redundant initialization of 'cfg_cmd'
  staging: rtl8723bs: remove a couple of redundant initializations
  staging: comedi: reformat lines to 80 chars or less
  staging: lustre: separate a connection destroy from free struct kib_conn
  Staging: rtl8723bs: Use !x instead of NULL comparison
  Staging: rtl8723bs: Remove dead code
  Staging: rtl8723bs: Change names to conform to the kernel code
  staging: ccree: Fix missing blank line after declaration
  staging: rtl8188eu: remove redundant initialization of 'pwrcfgcmd'
  staging: rtlwifi: remove unused RTLHALMAC_ST and RTLPHYDM_ST
  staging: fbtft: remove unused FB_TFT_SSD1325 kconfig
  staging: comedi: dt2811: remove redundant initialization of 'ns'
  staging: wilc1000: fix alignments to match open parenthesis
  staging: wilc1000: removed unnecessary defined enums typedef
  staging: wilc1000: remove unnecessary use of parentheses
  staging: rtl8192u: remove redundant initialization of 'timeout'
  staging: sm750fb: fix CamelCase for dispSet var
  staging: lustre: lnet/selftest: fix compile error on UP build
  staging: rtl8723bs: hal_com_phycfg: Remove unneeded semicolons
  staging: rts5208: Fix "seg_no" calculation in reset_ms_card()
  ...
2018-02-01 09:51:57 -08:00
Eric Biggers
0b1dfa4cc6 fscrypt: fix build with pre-4.6 gcc versions
gcc versions prior to 4.6 require an extra level of braces when using a
designated initializer for a member in an anonymous struct or union.
This caused a compile error with the 'struct qstr' initialization in
__fscrypt_encrypt_symlink().

Fix it by using QSTR_INIT().

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Fixes: 76e81d6d5048 ("fscrypt: new helper functions for ->symlink()")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-02-01 10:51:18 -05:00
Goffredo Baroncelli
c472c07bfe iversion: Rename make inode_cmp_iversion{+raw} to inode_eq_iversion{+raw}
The function inode_cmp_iversion{+raw} is counter-intuitive, because it
returns true when the counters are different and false when these are equal.

Rename it to inode_eq_iversion{+raw}, which will returns true when
the counters are equal and false otherwise.

Signed-off-by: Goffredo Baroncelli <kreijack@inwind.it>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
2018-02-01 08:15:25 -05:00
Darrick J. Wong
131fa58d39 xfs: fix u32 type usage in sb validation function
Don't use u32, use uint32_t, because this won't work in xfsprogs.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
2018-01-31 20:39:20 -08:00
Linus Torvalds
255442c938 Documentation updates for 4.16. New stuff includes refcount_t
documentation, errseq documentation, kernel-doc support for nested
 structure definitions, the removal of lots of crufty kernel-doc support for
 unused formats, SPDX tag documentation, the beginnings of a manual for
 subsystem maintainers, and lots of fixes and updates.
 
 As usual, some of the changesets reach outside of Documentation/ to effect
 kerneldoc comment fixes.  It also adds the new LICENSES directory, of which
 Thomas promises I do not need to be the maintainer.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJab11TAAoJEI3ONVYwIuV6i1UP/1LgGPHW9Ygq5qaLFbReZd/u
 Mx/orrhHX0PdkbCCE+CbL8Vm1m4UKFDTBdlpk3s542zxeeG0ZBXuTnvq4Kyk+cTN
 p4/vsIEzk/Ih13/glGE5MlV+EjiEK+8hK69TIUj7bAyuHmpzofjRz9/1M6RLDGDC
 HY6UI58AXG0yOQWMWCGRMYpQAFUGij2equ7Doe1ugXRq14dx7V4RsOhI140iRk7t
 bquAq1rS2fXniiuPFmLBUe4dWW28isVa/Vl/aXcaWQDKMyT0OLhjOMW36wWKqtPi
 WdVCpHv1NLZNyZZr9S3kvfOwW+BUqpEzfVwssyBLW4h0tsnIx0U0HVhSTY8/TvFZ
 QD9yCSana4LB/e5CHXIX5lBHbjHxf+rETXqVV4MgwDaMvM3mCo4X6WUTJDmZADo6
 vQISEKeb4su5uWAbc9T9xwRSLhZnFVdJ/QuYdNQ5+EpFJYLhzQ9eBvEz6JstSIXL
 p9ASBiPNY3ulpVZ8q0JOHJRBhq5mHJH6Dy8achzbILy2l/ZI4b8lJ53mw9II04cp
 puF96E6HpvuZ8Tgjjrg9U3ZdxXNrUgc/tjk2ZDkyTglk1XF2jKSq2tiNSZ3oLrJm
 XqJPnpCeyJM5UDvwkIBzgC41WEHwe8uvoNbUnc4X7UJSZegFzcSLQXf5qaprHS5k
 XeQ7sbd+S+jzVVjFi0W5
 =Z15Z
 -----END PGP SIGNATURE-----

Merge tag 'docs-4.16' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
 "Documentation updates for 4.16.

  New stuff includes refcount_t documentation, errseq documentation,
  kernel-doc support for nested structure definitions, the removal of
  lots of crufty kernel-doc support for unused formats, SPDX tag
  documentation, the beginnings of a manual for subsystem maintainers,
  and lots of fixes and updates.

  As usual, some of the changesets reach outside of Documentation/ to
  effect kerneldoc comment fixes. It also adds the new LICENSES
  directory, of which Thomas promises I do not need to be the
  maintainer"

* tag 'docs-4.16' of git://git.lwn.net/linux: (65 commits)
  linux-next: docs-rst: Fix typos in kfigure.py
  linux-next: DOC: HWPOISON: Fix path to debugfs in hwpoison.txt
  Documentation: Fix misconversion of #if
  docs: add index entry for networking/msg_zerocopy
  Documentation: security/credentials.rst: explain need to sort group_list
  LICENSES: Add MPL-1.1 license
  LICENSES: Add the GPL 1.0 license
  LICENSES: Add Linux syscall note exception
  LICENSES: Add the MIT license
  LICENSES: Add the BSD-3-clause "Clear" license
  LICENSES: Add the BSD 3-clause "New" or "Revised" License
  LICENSES: Add the BSD 2-clause "Simplified" license
  LICENSES: Add the LGPL-2.1 license
  LICENSES: Add the LGPL 2.0 license
  LICENSES: Add the GPL 2.0 license
  Documentation: Add license-rules.rst to describe how to properly identify file licenses
  scripts: kernel_doc: better handle show warnings logic
  fs/*/Kconfig: drop links to 404-compliant http://acl.bestbits.at
  doc: md: Fix a file name to md-fault.c in fault-injection.txt
  errseq: Add to documentation tree
  ...
2018-01-31 19:25:25 -08:00
Linus Torvalds
dc1efc3cfa Merge branch 'work.dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull dcache updates from Al Viro:
 "Neil Brown's d_move()/d_path() race fix"

* 'work.dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  VFS: close race between getcwd() and d_move()
2018-01-31 19:15:23 -08:00
Linus Torvalds
73da9e1a9f Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton:

 - misc fixes

 - ocfs2 updates

 - most of MM

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (118 commits)
  mm: remove PG_highmem description
  tools, vm: new option to specify kpageflags file
  mm/swap.c: make functions and their kernel-doc agree
  mm, memory_hotplug: fix memmap initialization
  mm: correct comments regarding do_fault_around()
  mm: numa: do not trap faults on shared data section pages.
  hugetlb, mbind: fall back to default policy if vma is NULL
  hugetlb, mempolicy: fix the mbind hugetlb migration
  mm, hugetlb: further simplify hugetlb allocation API
  mm, hugetlb: get rid of surplus page accounting tricks
  mm, hugetlb: do not rely on overcommit limit during migration
  mm, hugetlb: integrate giga hugetlb more naturally to the allocation path
  mm, hugetlb: unify core page allocation accounting and initialization
  mm/memcontrol.c: try harder to decrease [memory,memsw].limit_in_bytes
  mm/memcontrol.c: make local symbol static
  mm/hmm: fix uninitialized use of 'entry' in hmm_vma_walk_pmd()
  include/linux/mmzone.h: fix explanation of lower bits in the SPARSEMEM mem_map pointer
  mm/compaction.c: fix comment for try_to_compact_pages()
  mm/page_ext.c: make page_ext_init a noop when CONFIG_PAGE_EXTENSION but nothing uses it
  zsmalloc: use U suffix for negative literals being shifted
  ...
2018-01-31 18:46:22 -08:00
Eric Biggers
284cd241a1 userfaultfd: convert to use anon_inode_getfd()
Nothing actually calls userfaultfd_file_create() besides the
userfaultfd() system call itself.  So simplify things by folding it into
the system call and using anon_inode_getfd() instead of
anon_inode_getfile().  Do the same in resolve_userfault_fork() as well.

This removes over 50 lines with no change in functionality.

Link: http://lkml.kernel.org/r/20171229212403.22800-1-ebiggers3@gmail.com
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31 17:18:39 -08:00
Marc-André Lureau
ff62a34210 hugetlb: implement memfd sealing
Implements memfd sealing, similar to shmem:
 - WRITE: deny fallocate(PUNCH_HOLE). mmap() write is denied in
   memfd_add_seals(). write() doesn't exist for hugetlbfs.
 - SHRINK: added similar check as shmem_setattr()
 - GROW: added similar check as shmem_setattr() & shmem_fallocate()

Except write() operation that doesn't exist with hugetlbfs, that should
make sealing as close as it can be to shmem support.

Link: http://lkml.kernel.org/r/20171107122800.25517-5-marcandre.lureau@redhat.com
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31 17:18:39 -08:00
Marc-André Lureau
da14c1e524 hugetlb: expose hugetlbfs_inode_info in header
hugetlbfs inode information will need to be accessed by code in
mm/shmem.c for file sealing operations.  Move inode information
definition from .c file to header for needed access.

Link: http://lkml.kernel.org/r/20171107122800.25517-4-marcandre.lureau@redhat.com
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31 17:18:39 -08:00