69521 Commits

Author SHA1 Message Date
Jens Axboe
728f13e730 io-wq: remove nr_process accounting
We're now just using fork like we would from userspace, so there's no
need to try and impose extra restrictions or accounting on the user
side of things. That's already being done for us. That also means we
don't have to pass in the user_struct anymore, that's correctly inherited
through ->creds on fork.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-23 20:33:26 -07:00
Jens Axboe
1c0aa1fae1 io_uring: flag new native workers with IORING_FEAT_NATIVE_WORKERS
A few reasons to do this:

- The naming of the manager and worker have changed. That's a user visible
  change, so makes sense to flag it.

- Opening certain files that use ->signal (like /proc/self or /dev/tty)
  now works, and the flag tells the application upfront that this is the
  case.

- Related to the above, using signalfd will now work as well.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-23 20:32:11 -07:00
Jens Axboe
2587890b5e Revert "proc: don't allow async path resolution of /proc/self components"
This reverts commit 8d4c3e76e3be11a64df95ddee52e99092d42fc19.

No longer needed, as the io-wq worker threads have the right identity.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-23 20:32:11 -07:00
Jens Axboe
9e8d9e829c Revert "proc: don't allow async path resolution of /proc/thread-self components"
This reverts commit 0d4370cfe36b7f1719123b621a4ec4d9c7a25f89.

No longer needed, as the io-wq worker threads have the right identity.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-23 20:32:11 -07:00
Pavel Begunkov
e5547d2c5e io_uring: fix locked_free_list caches_free()
Don't forget to zero locked_free_nr, it's not a disaster but makes it
attempting to flush it with extra locking when there is nothing in the
list. Also, don't traverse a potentially long list freeing requests
under spinlock, splice the list and do it afterwards.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-23 19:18:54 -07:00
Jens Axboe
7c977a58dc io_uring: don't attempt IO reissue from the ring exit path
If we're exiting the ring, just let the IO fail with -EAGAIN as nobody
will care anyway. It's not the right context to reissue from.

Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-23 19:18:13 -07:00
Jens Axboe
37d1e2e364 io_uring: move SQPOLL thread io-wq forked worker
Don't use a kthread for SQPOLL, use a forked worker just like the io-wq
workers. With that done, we can drop the various context grabbing we do
for SQPOLL, it already has everything it needs.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-23 16:44:42 -07:00
Linus Torvalds
f6e1e1d1e1 Changes in gfs2:
* Log space and revoke accounting rework to fix some failed asserts.
 * Local resource group glock sharing for better local performance.
 * Add support for version 1802 filesystems: trusted xattr support and
   '-o rgrplvb' mounts by default.
 * Actually synchronize on the inode glock's FREEING bit during withdraw
   ("gfs2: fix glock confusion in function signal_our_withdraw").
 * Fix parallel recovery of multiple journals ("gfs2: keep bios separate
   for each journal").
 * Various other bug fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAmA1TmwUHGFncnVlbmJh
 QHJlZGhhdC5jb20ACgkQ1b+f6wMTZTpDZhAArnFj5AhWMI2+DD5o05EILdgDSpwh
 JWYT1pfRqR1OZrs7ZZ7tGZB4H6oytYfJ+4mg9Kk7CE7oJKcBh695IPZoIWv8+BCC
 WIgQGJytCFp4tuDNw11HZ0ahgW4zXPyJTt6jidZ5jVkux31JrUS7fVqSsD2vIPqA
 iQMcJIH+NLTlYbNt4d5T/ngaoRcx7m18RWkcxf6Y+/DBnnwIe4ZDpZmkWVykuncv
 OFSvXK8vKyLWGnvH/MIsywfYeU5rj/0AIu66JhVILQ4v5kGYIigwY3quXP2SoITM
 Z0+N5Gj/N4OWSscRS86zyqhnRucrjDkNP2+oGSzJWgtSXE/KplyfInAmQWzhIPRM
 n7T0boTp+gOTzGq7ELCzj44KICLG76WgDwaR2bLHuQ2/ppVrHNltZqncP2iwynN6
 glfST/eHBUBu1qTYLaOAfkUBlhpKDXu0YPcXX7lH6M0JqyvkRUFfuBAU9dic9D9K
 zsxplHGJrZnE9QFWWbS3aOviPlSHaXfkZF0Xv7QCLyuPRhu+e/qfcAoeVhxSd4+e
 I0grs/TxM61jyju9SmqnM7P+8qYS55naYH1V+6iNCU5dax8MvdxNZuneBQIa07U+
 Y84JPQvTBZDUE0gZ8fUzZtnYS7RqyiG7BL+T4W5Ph7LgxXbgQD7CWerYpg7fBm/j
 HEpjKqrS96zfTyk=
 =45VG
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-for-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 updates from Andreas Gruenbacher:

 - Log space and revoke accounting rework to fix some failed asserts.

 - Local resource group glock sharing for better local performance.

 - Add support for version 1802 filesystems: trusted xattr support and
   '-o rgrplvb' mounts by default.

 - Actually synchronize on the inode glock's FREEING bit during withdraw
   ("gfs2: fix glock confusion in function signal_our_withdraw").

 - Fix parallel recovery of multiple journals ("gfs2: keep bios separate
   for each journal").

 - Various other bug fixes.

* tag 'gfs2-for-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (49 commits)
  gfs2: Don't get stuck with I/O plugged in gfs2_ail1_flush
  gfs2: Per-revoke accounting in transactions
  gfs2: Rework the log space allocation logic
  gfs2: Minor calc_reserved cleanup
  gfs2: Use resource group glock sharing
  gfs2: Allow node-wide exclusive glock sharing
  gfs2: Add local resource group locking
  gfs2: Add per-reservation reserved block accounting
  gfs2: Rename rs_{free -> requested} and rd_{reserved -> requested}
  gfs2: Check for active reservation in gfs2_release
  gfs2: Don't search for unreserved space twice
  gfs2: Only pass reservation down to gfs2_rbm_find
  gfs2: Also reflect single-block allocations in rgd->rd_extfail_pt
  gfs2: Recursive gfs2_quota_hold in gfs2_iomap_end
  gfs2: Add trusted xattr support
  gfs2: Enable rgrplvb for sb_fs_format 1802
  gfs2: Don't skip dlm unlock if glock has an lvb
  gfs2: Lock imbalance on error path in gfs2_recover_one
  gfs2: Move function gfs2_ail_empty_tr
  gfs2: Get rid of current_tail()
  ...
2021-02-23 14:04:04 -08:00
Linus Torvalds
7d6beb71da idmapped-mounts-v5.12
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCYCegywAKCRCRxhvAZXjc
 ouJ6AQDlf+7jCQlQdeKKoN9QDFfMzG1ooemat36EpRRTONaGuAD8D9A4sUsG4+5f
 4IU5Lj9oY4DEmF8HenbWK2ZHsesL2Qg=
 =yPaw
 -----END PGP SIGNATURE-----

Merge tag 'idmapped-mounts-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux

Pull idmapped mounts from Christian Brauner:
 "This introduces idmapped mounts which has been in the making for some
  time. Simply put, different mounts can expose the same file or
  directory with different ownership. This initial implementation comes
  with ports for fat, ext4 and with Christoph's port for xfs with more
  filesystems being actively worked on by independent people and
  maintainers.

  Idmapping mounts handle a wide range of long standing use-cases. Here
  are just a few:

   - Idmapped mounts make it possible to easily share files between
     multiple users or multiple machines especially in complex
     scenarios. For example, idmapped mounts will be used in the
     implementation of portable home directories in
     systemd-homed.service(8) where they allow users to move their home
     directory to an external storage device and use it on multiple
     computers where they are assigned different uids and gids. This
     effectively makes it possible to assign random uids and gids at
     login time.

   - It is possible to share files from the host with unprivileged
     containers without having to change ownership permanently through
     chown(2).

   - It is possible to idmap a container's rootfs and without having to
     mangle every file. For example, Chromebooks use it to share the
     user's Download folder with their unprivileged containers in their
     Linux subsystem.

   - It is possible to share files between containers with
     non-overlapping idmappings.

   - Filesystem that lack a proper concept of ownership such as fat can
     use idmapped mounts to implement discretionary access (DAC)
     permission checking.

   - They allow users to efficiently changing ownership on a per-mount
     basis without having to (recursively) chown(2) all files. In
     contrast to chown (2) changing ownership of large sets of files is
     instantenous with idmapped mounts. This is especially useful when
     ownership of a whole root filesystem of a virtual machine or
     container is changed. With idmapped mounts a single syscall
     mount_setattr syscall will be sufficient to change the ownership of
     all files.

   - Idmapped mounts always take the current ownership into account as
     idmappings specify what a given uid or gid is supposed to be mapped
     to. This contrasts with the chown(2) syscall which cannot by itself
     take the current ownership of the files it changes into account. It
     simply changes the ownership to the specified uid and gid. This is
     especially problematic when recursively chown(2)ing a large set of
     files which is commong with the aforementioned portable home
     directory and container and vm scenario.

   - Idmapped mounts allow to change ownership locally, restricting it
     to specific mounts, and temporarily as the ownership changes only
     apply as long as the mount exists.

  Several userspace projects have either already put up patches and
  pull-requests for this feature or will do so should you decide to pull
  this:

   - systemd: In a wide variety of scenarios but especially right away
     in their implementation of portable home directories.

         https://systemd.io/HOME_DIRECTORY/

   - container runtimes: containerd, runC, LXD:To share data between
     host and unprivileged containers, unprivileged and privileged
     containers, etc. The pull request for idmapped mounts support in
     containerd, the default Kubernetes runtime is already up for quite
     a while now: https://github.com/containerd/containerd/pull/4734

   - The virtio-fs developers and several users have expressed interest
     in using this feature with virtual machines once virtio-fs is
     ported.

   - ChromeOS: Sharing host-directories with unprivileged containers.

  I've tightly synced with all those projects and all of those listed
  here have also expressed their need/desire for this feature on the
  mailing list. For more info on how people use this there's a bunch of
  talks about this too. Here's just two recent ones:

      https://www.cncf.io/wp-content/uploads/2020/12/Rootless-Containers-in-Gitpod.pdf
      https://fosdem.org/2021/schedule/event/containers_idmap/

  This comes with an extensive xfstests suite covering both ext4 and
  xfs:

      https://git.kernel.org/brauner/xfstests-dev/h/idmapped_mounts

  It covers truncation, creation, opening, xattrs, vfscaps, setid
  execution, setgid inheritance and more both with idmapped and
  non-idmapped mounts. It already helped to discover an unrelated xfs
  setgid inheritance bug which has since been fixed in mainline. It will
  be sent for inclusion with the xfstests project should you decide to
  merge this.

  In order to support per-mount idmappings vfsmounts are marked with
  user namespaces. The idmapping of the user namespace will be used to
  map the ids of vfs objects when they are accessed through that mount.
  By default all vfsmounts are marked with the initial user namespace.
  The initial user namespace is used to indicate that a mount is not
  idmapped. All operations behave as before and this is verified in the
  testsuite.

  Based on prior discussions we want to attach the whole user namespace
  and not just a dedicated idmapping struct. This allows us to reuse all
  the helpers that already exist for dealing with idmappings instead of
  introducing a whole new range of helpers. In addition, if we decide in
  the future that we are confident enough to enable unprivileged users
  to setup idmapped mounts the permission checking can take into account
  whether the caller is privileged in the user namespace the mount is
  currently marked with.

  The user namespace the mount will be marked with can be specified by
  passing a file descriptor refering to the user namespace as an
  argument to the new mount_setattr() syscall together with the new
  MOUNT_ATTR_IDMAP flag. The system call follows the openat2() pattern
  of extensibility.

  The following conditions must be met in order to create an idmapped
  mount:

   - The caller must currently have the CAP_SYS_ADMIN capability in the
     user namespace the underlying filesystem has been mounted in.

   - The underlying filesystem must support idmapped mounts.

   - The mount must not already be idmapped. This also implies that the
     idmapping of a mount cannot be altered once it has been idmapped.

   - The mount must be a detached/anonymous mount, i.e. it must have
     been created by calling open_tree() with the OPEN_TREE_CLONE flag
     and it must not already have been visible in the filesystem.

  The last two points guarantee easier semantics for userspace and the
  kernel and make the implementation significantly simpler.

  By default vfsmounts are marked with the initial user namespace and no
  behavioral or performance changes are observed.

  The manpage with a detailed description can be found here:

      1d7b902e28

  In order to support idmapped mounts, filesystems need to be changed
  and mark themselves with the FS_ALLOW_IDMAP flag in fs_flags. The
  patches to convert individual filesystem are not very large or
  complicated overall as can be seen from the included fat, ext4, and
  xfs ports. Patches for other filesystems are actively worked on and
  will be sent out separately. The xfstestsuite can be used to verify
  that port has been done correctly.

  The mount_setattr() syscall is motivated independent of the idmapped
  mounts patches and it's been around since July 2019. One of the most
  valuable features of the new mount api is the ability to perform
  mounts based on file descriptors only.

  Together with the lookup restrictions available in the openat2()
  RESOLVE_* flag namespace which we added in v5.6 this is the first time
  we are close to hardened and race-free (e.g. symlinks) mounting and
  path resolution.

  While userspace has started porting to the new mount api to mount
  proper filesystems and create new bind-mounts it is currently not
  possible to change mount options of an already existing bind mount in
  the new mount api since the mount_setattr() syscall is missing.

  With the addition of the mount_setattr() syscall we remove this last
  restriction and userspace can now fully port to the new mount api,
  covering every use-case the old mount api could. We also add the
  crucial ability to recursively change mount options for a whole mount
  tree, both removing and adding mount options at the same time. This
  syscall has been requested multiple times by various people and
  projects.

  There is a simple tool available at

      https://github.com/brauner/mount-idmapped

  that allows to create idmapped mounts so people can play with this
  patch series. I'll add support for the regular mount binary should you
  decide to pull this in the following weeks:

  Here's an example to a simple idmapped mount of another user's home
  directory:

	u1001@f2-vm:/$ sudo ./mount --idmap both:1000:1001:1 /home/ubuntu/ /mnt

	u1001@f2-vm:/$ ls -al /home/ubuntu/
	total 28
	drwxr-xr-x 2 ubuntu ubuntu 4096 Oct 28 22:07 .
	drwxr-xr-x 4 root   root   4096 Oct 28 04:00 ..
	-rw------- 1 ubuntu ubuntu 3154 Oct 28 22:12 .bash_history
	-rw-r--r-- 1 ubuntu ubuntu  220 Feb 25  2020 .bash_logout
	-rw-r--r-- 1 ubuntu ubuntu 3771 Feb 25  2020 .bashrc
	-rw-r--r-- 1 ubuntu ubuntu  807 Feb 25  2020 .profile
	-rw-r--r-- 1 ubuntu ubuntu    0 Oct 16 16:11 .sudo_as_admin_successful
	-rw------- 1 ubuntu ubuntu 1144 Oct 28 00:43 .viminfo

	u1001@f2-vm:/$ ls -al /mnt/
	total 28
	drwxr-xr-x  2 u1001 u1001 4096 Oct 28 22:07 .
	drwxr-xr-x 29 root  root  4096 Oct 28 22:01 ..
	-rw-------  1 u1001 u1001 3154 Oct 28 22:12 .bash_history
	-rw-r--r--  1 u1001 u1001  220 Feb 25  2020 .bash_logout
	-rw-r--r--  1 u1001 u1001 3771 Feb 25  2020 .bashrc
	-rw-r--r--  1 u1001 u1001  807 Feb 25  2020 .profile
	-rw-r--r--  1 u1001 u1001    0 Oct 16 16:11 .sudo_as_admin_successful
	-rw-------  1 u1001 u1001 1144 Oct 28 00:43 .viminfo

	u1001@f2-vm:/$ touch /mnt/my-file

	u1001@f2-vm:/$ setfacl -m u:1001:rwx /mnt/my-file

	u1001@f2-vm:/$ sudo setcap -n 1001 cap_net_raw+ep /mnt/my-file

	u1001@f2-vm:/$ ls -al /mnt/my-file
	-rw-rwxr--+ 1 u1001 u1001 0 Oct 28 22:14 /mnt/my-file

	u1001@f2-vm:/$ ls -al /home/ubuntu/my-file
	-rw-rwxr--+ 1 ubuntu ubuntu 0 Oct 28 22:14 /home/ubuntu/my-file

	u1001@f2-vm:/$ getfacl /mnt/my-file
	getfacl: Removing leading '/' from absolute path names
	# file: mnt/my-file
	# owner: u1001
	# group: u1001
	user::rw-
	user:u1001:rwx
	group::rw-
	mask::rwx
	other::r--

	u1001@f2-vm:/$ getfacl /home/ubuntu/my-file
	getfacl: Removing leading '/' from absolute path names
	# file: home/ubuntu/my-file
	# owner: ubuntu
	# group: ubuntu
	user::rw-
	user:ubuntu:rwx
	group::rw-
	mask::rwx
	other::r--"

* tag 'idmapped-mounts-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: (41 commits)
  xfs: remove the possibly unused mp variable in xfs_file_compat_ioctl
  xfs: support idmapped mounts
  ext4: support idmapped mounts
  fat: handle idmapped mounts
  tests: add mount_setattr() selftests
  fs: introduce MOUNT_ATTR_IDMAP
  fs: add mount_setattr()
  fs: add attr_flags_to_mnt_flags helper
  fs: split out functions to hold writers
  namespace: only take read lock in do_reconfigure_mnt()
  mount: make {lock,unlock}_mount_hash() static
  namespace: take lock_mount_hash() directly when changing flags
  nfs: do not export idmapped mounts
  overlayfs: do not mount on top of idmapped mounts
  ecryptfs: do not mount on top of idmapped mounts
  ima: handle idmapped mounts
  apparmor: handle idmapped mounts
  fs: make helpers idmap mount aware
  exec: handle idmapped mounts
  would_dump: handle idmapped mounts
  ...
2021-02-23 13:39:45 -08:00
Bob Peterson
17d7768408 gfs2: Don't get stuck with I/O plugged in gfs2_ail1_flush
In gfs2_ail1_flush, we're using I/O plugging to give the block layer a
better chance of merging I/O requests.  If we're too aggressive here, we
can end up waiting on I/O to complete while still plugged.  Fix that in
a way similar to writeback_sb_inodes, except that we can't use
blk_flush_plug because blk_flush_plug_list is not exported.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-02-23 19:01:42 +01:00
Andreas Gruenbacher
803074ad77 Merge branches 'rgrp-glock-sharing' and 'gfs2-revoke' from https://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git
Merge the resource group glock sharing feature and the revoke accounting rework.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-02-23 18:54:22 +01:00
Tetsuo Handa
9c7d83ae6b pstore: Fix warning in pstore_kill_sb()
syzbot is hitting WARN_ON(pstore_sb != sb) at pstore_kill_sb() [1], for the
assumption that pstore_sb != NULL is wrong because pstore_fill_super() will
not assign pstore_sb = sb when new_inode() for d_make_root() returned NULL
(due to memory allocation fault injection).

Since mount_single() calls pstore_kill_sb() when pstore_fill_super()
failed, pstore_kill_sb() needs to be aware of such failure path.

[1] https://syzkaller.appspot.com/bug?id=6abacb8da5137cb47a416f2bef95719ed60508a0

Reported-by: syzbot <syzbot+d0cf0ad6513e9a1da5df@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210214031307.57903-1-penguin-kernel@I-love.SAKURA.ne.jp
2021-02-23 09:27:20 -08:00
Al Viro
6f24784f00 whack-a-mole: don't open-code iminor/imajor
several instances creeped back into the tree...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-02-23 10:25:29 -05:00
Al Viro
9652c73246 9p: fix misuse of sscanf() in v9fs_stat2inode()
1) sscanf() return value needs to be checked, damnit
2) sscanf() is perfectly capable of checking for fixed prefix,
no need for that %13s + strncmp with constant string.
3) st->extension is a valid string; no need for voodoo with
str*cpy() there.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-02-23 10:25:28 -05:00
Steve French
f1a08655cc cifs: minor simplification to smb2_is_network_name_deleted
Trivial change to clarify code in smb2_is_network_name_deleted

Suggested-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-02-23 04:16:41 -06:00
Rohith Surabattula
9e550b0852 TCON Reconnect during STATUS_NETWORK_NAME_DELETED
When server returns error STATUS_NETWORK_NAME_DELETED, TCON
must be marked for reconnect. So, subsequent IO does the tree
connect again.

Signed-off-by: Rohith Surabattula <rohiths@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-02-23 04:16:00 -06:00
Steve French
23bda5e651 cifs: cleanup a few le16 vs. le32 uses in cifsacl.c
Cleanup some minor sparse warnings in cifsacl.c

Signed-off-by: Steve French <stfrench@microsoft.com>
2021-02-22 21:20:44 -06:00
Shyam Prasad N
bc3e9dd9d1 cifs: Change SIDs in ACEs while transferring file ownership.
With cifsacl, when a file/dir ownership is transferred (chown/chgrp),
the ACEs in the DACL for that file will need to replace the old owner
SIDs with the new owner SID.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-02-22 21:20:44 -06:00
Shyam Prasad N
f506550889 cifs: Retain old ACEs when converting between mode bits and ACL.
When cifsacl mount option is used, retain the ACEs which
should not be modified during chmod. Following is the approach taken:

1. Retain all explicit (non-inherited) ACEs, unless the SID is one
of owner/group/everyone/authenticated-users. We're going to set new
ACEs for these SIDs anyways.
2. At the end of the list of explicit ACEs, place the new list of
ACEs obtained by necessary conversion/encoding.
3. Once the converted/encoded ACEs are set, copy all the remaining
ACEs (inherited) into the new ACL.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-02-22 21:20:44 -06:00
Shyam Prasad N
c12ead71e8 cifs: Fix cifsacl ACE mask for group and others.
A two line fix which I made while testing my prev fix with
cifsacl mode conversions seem to have gone missing in the final fix
that was submitted. This is that fix.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-02-22 21:20:44 -06:00
Steve French
40f077a02b cifs: clarify hostname vs ip address in /proc/fs/cifs/DebugData
/proc/fs/cifs/DebugData called the ip address for server sessions
"Name" which is confusing since it is not a hostname. Change
this field name to "Address" and for the list of servers add
new field "Hostname" which is populated from the hostname used
to connect to the server.  See below. And also don't print
[NONE] when the interface list is empty as it is not clear
what 'NONE' referred to.

Servers:
1) ConnectionId: 0x1 Hostname: localhost
Number of credits: 389 Dialect 0x311
TCP status: 1 Instance: 1
Local Users To Server: 1 SecMode: 0x1 Req On Wire: 0
In Send: 0 In MaxReq Wait: 0

	Sessions:
	1) Address: 127.0.0.1
...

Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-02-22 21:20:44 -06:00
Steve French
b438fcf128 cifs: change confusing field serverName (to ip_addr)
ses->serverName is not the server name, but the string form
of the ip address of the server.  Change the name to ip_addr
to avoid confusion (and fix the array length to match
maximum length of ipv6 address).

Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-02-22 21:20:43 -06:00
Linus Torvalds
e913a8cdc2 Fixes around VM_FPNMAP and follow_pfn
- replace mm/frame_vector.c by get_user_pages in misc/habana and
   drm/exynos drivers, then move that into media as it's sole user
 - close race in generic_access_phys
 - s390 pci ioctl fix of this series landed in 5.11 already
 - properly revoke iomem mappings (/dev/mem, pci files)
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEb4nG6jLu8Y5XI+PfTA9ye/CYqnEFAmAzgywACgkQTA9ye/CY
 qnFPbA//RUHB5bD7vwnEglfJhonKSi/Vt3dNQwUI+pCFK8muWvvPyTkGXKjjT2dI
 uAOY2F23wymtIexV3fNLgnMez7kMcupOLkdxJic4GiO+HJn1jnkshdX7/dGtUW7O
 G3yfnf/D27i912tT3j6PN7dVnasAYYtndCgImM027Zigzn4ibY+02tnzd5XTj1F8
 yq8Swx88oqF8v10HxfpF3RLShqT3S17mFmd9dTv0GkZX497Pe75O44XcXzkD33Bj
 wasH2Tz8gMEQx6TNAGlJe13dzDHReh2cG0z2r+6PTA6KnaMMxbEIImHNuhWOmHb/
 nf8Jpu9uMOLzB+3hG3TzISTDBhAgPfoJ8Ov40VJCWMtCVBnyMyPJr28Oobb8Dj3V
 SXvjSVlLeobOLt+E9vAS+Rmas07LCGBdNP9sexxV7S/sveSQ5W+FptaQW03EghwA
 nBYEUC68WqpX99lJCFPmv5zmy5xkecjpU6mLHZljtV1ORzktqWZdVhmC8njHMAMY
 Hi/emnPxEX1FpOD38rr7F9KUUSsy4t/ZaCgVaLcxCcbglCHXSHC41R09p9TBRSJo
 G6Lksjyj4aa+UL5dZDAtLY0shg0bv2u93dGQNaDAC+uzj6D0ErBBzDK570zBKjp/
 75+nqezJlD0d7I6rOl6FwiEYeSrYXJxYEveKVUr8CnH6sfeBlwo=
 =lQoR
 -----END PGP SIGNATURE-----

Merge tag 'topic/iomem-mmap-vs-gup-2021-02-22' of git://anongit.freedesktop.org/drm/drm

Pull follow_pfn() updates from Daniel Vetter:
 "Fixes around VM_FPNMAP and follow_pfn:

   - replace mm/frame_vector.c by get_user_pages in misc/habana and
     drm/exynos drivers, then move that into media as it's sole user

   - close race in generic_access_phys

   - s390 pci ioctl fix of this series landed in 5.11 already

   - properly revoke iomem mappings (/dev/mem, pci files)"

* tag 'topic/iomem-mmap-vs-gup-2021-02-22' of git://anongit.freedesktop.org/drm/drm:
  PCI: Revoke mappings like devmem
  PCI: Also set up legacy files only after sysfs init
  sysfs: Support zapping of binary attr mmaps
  resource: Move devmem revoke code to resource framework
  /dev/mem: Only set filp->f_mapping
  PCI: Obey iomem restrictions for procfs mmap
  mm: Close race in generic_access_phys
  media: videobuf2: Move frame_vector into media subsystem
  mm/frame-vector: Use FOLL_LONGTERM
  misc/habana: Use FOLL_LONGTERM for userptr
  misc/habana: Stop using frame_vector helpers
  drm/exynos: Use FOLL_LONGTERM for g2d cmdlists
  drm/exynos: Stop using frame_vector helpers
2021-02-22 17:45:02 -08:00
Linus Torvalds
4b5f9254e4 kconfig for kcmp syscall
drm userspaces uses this, systemd uses this, makes sense to pull it
 out from the checkpoint-restore bundle. Kees reviewed this from
 security pov and is happy with the final version.
 
 LWN coverage: https://lwn.net/Articles/845448/
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEb4nG6jLu8Y5XI+PfTA9ye/CYqnEFAmAzaXIACgkQTA9ye/CY
 qnH5FQ//eL/7a/PDICuCRIN2p2aQwHoe9d12q+01RolAgce6F9mR9SFiKGSCR+t7
 daw4G/BaGxSYzvz1IqWbXDMhN87jAXV/IGs9k4OkSIcbnDmMY78EKMZB1c1t7AZo
 zmeAuQvmTAcBogTwC6IE9N1JwhH3fmudq4p8zZ4zLojJNSPjrwCvF/xQI/Yaw52V
 CTfni8mrjYJ+pZ1qn9XP3IceAFEEI27ubZj2DJU+5xpRJAdIAobo0XbVOf8XQ0uc
 /BRLyXFS66EDsY1wWHT6y6UXDNZgbLic0olC1aielaBJh+Wq6bQHgephxpasU5y7
 cZX7XTX2N1q8j8NmgzWLYRgERqtXv0CPHKdimTs8SaUcPDGhxcnwPR6hmdQEC+i6
 IjntWMERjfuyD+s6qVuc7s8WS7+Ry9OxgdVskHASqGpBvsSliXN1o02Am6WUuGsB
 HZxTjCe967FyL4LGU0YjobMTUUSWfYQkOBKABlvYUySNZ0ZHnSygHIWiWjC6b89A
 KmXiHJoocNfDlKwX6bf3OWQ+dGGFu2wo5wYzldIiqYJVidp50xdOosdRE1R6WwuG
 IOLCdNKdqDgtig+90/fFZ06liXZvqUdDafWgUs/g6lLquFrcq5aAIiSdR6PcPKB0
 MwfWcCglLtYrxgDHvNaqnW18yRQq2TGbe+A65aXzLPp45pKP8Hk=
 =uiSj
 -----END PGP SIGNATURE-----

Merge tag 'topic/kcmp-kconfig-2021-02-22' of git://anongit.freedesktop.org/drm/drm

Pull kcmp kconfig update from Daniel Vetter:
 "Make the kcmp syscall available independently of checkpoint/restore.

  drm userspaces uses this, systemd uses this, so makes sense to pull it
  out from the checkpoint-restore bundle.

  Kees reviewed this from security pov and is happy with the final
  version"

Link: https://lwn.net/Articles/845448/

* tag 'topic/kcmp-kconfig-2021-02-22' of git://anongit.freedesktop.org/drm/drm:
  kcmp: Support selection of SYS_kcmp without CHECKPOINT_RESTORE
2021-02-22 17:15:30 -08:00
Linus Torvalds
7c70f3a748 Optimization:
- Cork the socket while there are queued replies
 
 Fixes:
 
 - DRC shutdown ordering
 - svc_rdma_accept() lockdep splat
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmAsA80ACgkQM2qzM29m
 f5erXA/+MrR3ZtwK2eaTITu13TzzTrMURbp/n0wCCW/Ls1YMb6bn9ggtBwu2W5Cn
 Vb0RO9OLcmoI6CjqPh0CTUvvZspMYOAX4W1jQecKt2ml075APdlqUcv9YWPUQqVJ
 qTg8HxDymvHvY3I3FcBxhzofmGzF8AOmQZJw9uI5Wt/ivBfqGWcAGlxyRmB3mdsm
 cJRK0Sy7QMn2LefMcpMEeSbPA049/NZNRp6fcXnpPQFer42thoosYsNhTlAJfCXC
 C5S0z3/T6rpuJucV9la/WkpUA0YhWbPEHWNdAB5tzSqmoEo4LpzJzjv7uyQU4oue
 QlmChIz9qasgTI/BnCkBIzPD99S4UQcXjX0BnNinkQ77e6+b/vdAR+T+NLHJdkAf
 +7Xz6T9aZNaz2R49CjYl6/kG0rlNkjUzyURRYs/9zEBhogMPH/N4T7Z2M+ljCkeb
 tc3OaFDXZ2rfr7EKBGsfnEKINM1gpYipzILkr8GSHUMZLzOB/64upKySaJVjCGXj
 7Sf1w+vJUWwYc+FqFvbaR4ybr01VIfdsecpn1TtY870zG1JzimzAHVZk1/xC9+CX
 J+lVOXbjawDl1Et3V3fWq6Y7mhAWves/NKPcbSug9sFc4qRHEmPbAq/RRtlsjQcn
 foMr5R8qd8OwEamVypZ2nIFxq4q3b742AS8lZhaK+DyZKq3oLac=
 =+R4U
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull more nfsd updates from Chuck Lever:
 "Here are a few additional NFSD commits for the merge window:

 Optimization:
   - Cork the socket while there are queued replies

  Fixes:
   - DRC shutdown ordering
   - svc_rdma_accept() lockdep splat"

* tag 'nfsd-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  SUNRPC: Further clean up svc_tcp_sendmsg()
  SUNRPC: Remove redundant socket flags from svc_tcp_sendmsg()
  SUNRPC: Use TCP_CORK to optimise send performance on the server
  svcrdma: Hold private mutex while invoking rdma_accept()
  nfsd: register pernet ops last, unregister first
2021-02-22 13:29:55 -08:00
Linus Torvalds
20bf195e93 With netfs helper library and fscache rework delayed, just a few cap
handling improvements to avoid grabbing mmap_lock in some code paths
 and deal with capsnaps better and a mount option cleanup.
 -----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmAzuGwTHGlkcnlvbW92
 QGdtYWlsLmNvbQAKCRBKf944AhHzixPqB/9kQxU8IkCF0wOm+dm0tBW3PjYxBFuz
 HryHU6WJHDbX9/enH6PgMj6ZpRwxgzDq8xUpmRKVeaPflej9PnfQyH/On+vQWRUX
 WyWyBx0QqbrKYvYK0cCjHzVC5kbtBA8C/1OSSs5EkJIh518RBMkeru9pYL7+TI5x
 zeQVXzOJB2Bz7y8Odd2RjlkAkix/J1m0LIggRaoWrTygz93PKXfjzhDpa4KC4WZj
 W6LjnYPpYjo34poKx/3N3ZSgGP+Y3F7ZDeNfSnPB2WKs7vzcYUCpWXBSHnHTz+lK
 H2O5GdmxQ6BFp4SZvYtf5e78igH/m/QmzAYGW2EmmKttOcyrb2282snb
 =8MQu
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-5.12-rc1' of git://github.com/ceph/ceph-client

Pull ceph updates from Ilya Dryomov:
 "With netfs helper library and fscache rework delayed, just a few cap
  handling improvements to avoid grabbing mmap_lock in some code paths
  and deal with capsnaps better and a mount option cleanup"

* tag 'ceph-for-5.12-rc1' of git://github.com/ceph/ceph-client:
  ceph: defer flushing the capsnap if the Fb is used
  libceph: remove osdtimeout option entirely
  libceph: deprecate [no]cephx_require_signatures options
  ceph: allow queueing cap/snap handling after putting cap references
  ceph: clean up inode work queueing
  ceph: fix flush_snap logic after putting caps
2021-02-22 13:27:51 -08:00
Linus Torvalds
9fe1904626 \n
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmAzo34ACgkQnJ2qBz9k
 QNmsUwf8DFq8Uu2PI2BFOzHkEG6F3y+/KPpja9k08q3A1NSM28RYBaFeWc9wGImZ
 jtu3k1+8TiK51OkYGxa5LeIKpaMZrylEGXhdYTyfBJiJSHrjApWiq1jsCvtxk/xt
 3pjI9+OItwmZVo/INYAWS8+QdweX87PkaZtKi0//pqgFdnsjMCKDUxkCIB3IEigk
 I7orTiBpTSgP3iwcuRhchyyCFjIeoW+L2nbNuv8CYjXo9WIAF5ypQx+r1T2f1Ieu
 Vt9u41gwRUYfn3a5YdKMJZgAkcv7a4QYP4+tbSnD9Wl3jtorCBgTC6EDUyGNWqdr
 lqRIJ0jp1ET387J/YAGCGFsdz1AIjw==
 =YTNN
 -----END PGP SIGNATURE-----

Merge tag 'fs_for_v5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull isofs, udf, and quota updates from Jan Kara:
 "Several udf, isofs, and quota fixes"

* tag 'fs_for_v5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  parser: Fix kernel-doc markups
  udf: handle large user and group ID
  isofs: handle large user and group ID
  parser: add unsigned int parser
  udf: fix silent AED tagLocation corruption
  isofs: release buffer head before return
  quota: Fix memory leak when handling corrupted quota file
2021-02-22 13:25:37 -08:00
Linus Torvalds
db99038542 \n
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmAzolwACgkQnJ2qBz9k
 QNnwAQgAhw7PYZgRGnhm/VEDBD1EiPqNIhV3+EuUcNHlrNERx0jPN3bcoXmJD7FE
 PCCwbsYtQyqjYFipuzvBnUur5s7CxrwyDhvE8bgYdOB43Gy94awwvwF+JbMnBaPj
 gZSvArKD7ISAUpt560jtH5KedNAZnDkPITePME2GQsOpZ9SHHjsJEhSheTaHk0t1
 03Odx6gK5CcRvRD4KQYTa/hvZH95dVHSdakgFODAUoyfR65KMLhBihNOVHZsEVEZ
 S99j0YBY15nxS8ygo+Iz3Y3KOzy9G1xRAzk3wSeDGzhNRfzYP/IIZWWY/KWowmvH
 afx0pa0KiYjgqDpDjsyYPOJ2Ul4cPA==
 =gXlh
 -----END PGP SIGNATURE-----

Merge tag 'fsnotify_for_v5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull fsnotify update from Jan Kara:
 "Make inotify groups be charged against appropriate memcgs"

* tag 'fsnotify_for_v5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  inotify, memcg: account inotify instances to kmemcg
2021-02-22 13:23:29 -08:00
Linus Torvalds
d61c6a58ae \n
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmAzoWUACgkQnJ2qBz9k
 QNnFgQgAlng0JOzeCQvLpwweqFl0FCxYbOsZXC1xDyvfX3TiA6A6oiOR4tx3uhQN
 cOQmJXaiMn4oCXjD1j6WZwGfy23yx0XchaoFK9jy2IqodaB/zUjkiWYYqt0G3XIX
 ud35mxjLAGS12BCD0c+vHy2RMsUFl5ep+5aBHRHZJJhCcYbl7e5ctXZ3xB1Q0mgI
 r639gD8JhH3ICdu9W0NaMvqOrVhJFNmhSGATKL/N96+oKub2x2ycYE4L2OXegxy3
 mnFf26LjA8jt7K+KfHloTvkC6D4HVnnvKFvKiIbGKafiWhAE7q57ZO6BPCMajGue
 3UHIhWGmwKXRU72+nW6N+089GbcO/g==
 =1e+z
 -----END PGP SIGNATURE-----

Merge tag 'lazytime_for_v5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull lazytime updates from Jan Kara:
 "Cleanups of the lazytime handling in the writeback code making rules
  for calling ->dirty_inode() filesystem handlers saner"

* tag 'lazytime_for_v5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  ext4: simplify i_state checks in __ext4_update_other_inode_time()
  gfs2: don't worry about I_DIRTY_TIME in gfs2_fsync()
  fs: improve comments for writeback_single_inode()
  fs: drop redundant check from __writeback_single_inode()
  fs: clean up __mark_inode_dirty() a bit
  fs: pass only I_DIRTY_INODE flags to ->dirty_inode
  fs: don't call ->dirty_inode for lazytime timestamp updates
  fat: only specify I_DIRTY_TIME when needed in fat_update_time()
  fs: only specify I_DIRTY_TIME when needed in generic_update_time()
  fs: correctly document the inode dirty flags
2021-02-22 13:17:39 -08:00
Linus Torvalds
c63dca9e23 Description for this pull request:
- Improve file deletion performance with dirsync mount option.
 - fix shift-out-of-bounds in exfat_fill_super() generated by syzkaller.
 -----BEGIN PGP SIGNATURE-----
 
 iQJMBAABCgA2FiEE6NzKS6Uv/XAAGHgyZwv7A1FEIQgFAmAzA7MYHG5hbWphZS5q
 ZW9uQHNhbXN1bmcuY29tAAoJEGcL+wNRRCEIYKMP/1i4uV1KebeQTkIiNlPMae+H
 hq2TgkrOLp7gKY+G8VoSrYIG9U5fEGh+bo6UwIXQpGDrnFb9AWAE0YoyCUV0sTGS
 6CQmmjl1Cq4gwmBs2yjkbmGa54d2NRExZc2zMewDxnWRQjIvqJftxh1qowjwN13j
 arfjvWGJ61Je/S1jQ0vS0+nC5f4Mcy+60JyAQB4gxX5327o1ZVohs16jYTTZuS4+
 PsARwM+q3wYQ/THeTf01E/a77IfsS49EtR1aTo8/9fCfQCxKdYT0wkJQQ/J0yhTL
 l6LMp1PNHV3iePRPZWAF15n8eB2dXGfcaathgtecY+9arKJVDnAoALt91gxN1bxE
 ZZZWkOlviN4FmdQvKNtVaBhdAxNqybe1P0yLvH1Vm2nY0oyY+qgQ79A/xNWcBvhg
 Q6Y1fV08KLWRPsafoZVntb3pd/DT1ekeedrFuqPe3c6kTziFn1GmUhb+6DGCEoFs
 2sbuxATOJCdNL4GHLt0CRNQYU2Dm6g0yuQ3qodu4wcco9yUgiNXe2sM6NpMXaTH7
 XUlgsaxl8zO5TKrVfre93gIP48HRf2/gZu8ejXs2rUEIzhSy3rsIMyt2DhTUe9Db
 CW2SMGQADG9Aiod4gRaunvMrTE3S7NMXLsOG5K6eThar8fOnqQ/F0NHi5DUWuFdw
 17G5gwimXbZxW0GhTvWH
 =kzJV
 -----END PGP SIGNATURE-----

Merge tag 'exfat-for-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat

Pull exfat updates from Namjae Jeon:

 - improve file deletion performance with dirsync mount option

 - fix shift-out-of-bounds in exfat_fill_super() reported by syzkaller

* tag 'exfat-for-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
  exfat: improve performance of exfat_free_cluster when using dirsync mount option
  exfat: fix shift-out-of-bounds in exfat_fill_super()
2021-02-22 13:15:50 -08:00
Linus Torvalds
0f3d950ddd zonefs changes for 5.12-rc1
Two patches in this pull request:
 * A fix that did not make it in time for 5.11, to correct the file size
   initialization of full sequential zone, from Shin'ichiro
 * Add file operation tracepoints to help with debugging, from Johannes
 
 Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSRPv8tYSvhwAzJdzjdoc3SxdoYdgUCYDLicAAKCRDdoc3SxdoY
 dpN4AQCh1EbZ3NvHRJ4bMWnNQ3OsdkTWYix7ur3C/5Ft7oQbKQEAge6cUgIjEkrD
 u3znsZSYE/RM+LxrhE1RquTwERkSeQk=
 =StDj
 -----END PGP SIGNATURE-----

Merge tag 'zonefs-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs

Pull zonefs updates from Damien Le Moal:
 "Two changes:

   - A fix that did not make it in time for 5.11, to correct the file
     size initialization of full sequential zone, from Shin'ichiro

   - Add file operation tracepoints to help with debugging, from
     Johannes"

* tag 'zonefs-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
  zonefs: Fix file size of zones in full condition
  zonefs: add tracepoints for file operations
2021-02-22 13:13:51 -08:00
Linus Torvalds
250a25e7a1 Merge branch 'work.audit' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull RCU-safe common_lsm_audit() from Al Viro:
 "Make common_lsm_audit() non-blocking and usable from RCU pathwalk
  context.

  We don't really need to grab/drop dentry in there - rcu_read_lock() is
  enough. There's a couple of followups using that to simplify the
  logics in selinux, but those hadn't soaked in -next yet, so they'll
  have to go in next window"

* 'work.audit' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  make dump_common_audit_data() safe to be called from RCU pathwalk
  new helper: d_find_alias_rcu()
2021-02-22 13:05:30 -08:00
Linus Torvalds
205f92d7f2 Merge branch 'work.d_name' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull d_name whack-a-mole from Al Viro:
 "A bunch of places that play with ->d_name in printks instead of using
  proper formats..."

* 'work.d_name' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  orangefs_file_mmap(): use %pD
  cifs_debug: use %pd instead of messing with ->d_name
  erofs: use %pd instead of messing with ->d_name
  cramfs: use %pD instead of messing with file_dentry()->d_name
2021-02-22 13:03:30 -08:00
Andreas Gruenbacher
2129b42888 gfs2: Per-revoke accounting in transactions
In the log, revokes are stored as a revoke descriptor (struct
gfs2_log_descriptor), followed by zero or more additional revoke blocks
(struct gfs2_meta_header).  On filesystems with a blocksize of 4k, the
revoke descriptor contains up to 503 revokes, and the metadata blocks
contain up to 509 revokes each.  We've so far been reserving space for
revokes in transactions in block granularity, so a lot more space than
necessary was being allocated and then released again.

This patch switches to assigning revokes to transactions individually
instead.  Initially, space for the revoke descriptor is reserved and
handed out to transactions.  When more revokes than that are reserved,
additional revoke blocks are added.  When the log is flushed, the space
for the additional revoke blocks is released, but we keep the space for
the revoke descriptor block allocated.

Transactions may still reserve more revokes than they will actually need
in the end, but now we won't overshoot the target as much, and by only
returning the space for excess revokes at log flush time, we further
reduce the amount of contention between processes.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-02-22 21:16:23 +01:00
Andreas Gruenbacher
fe3e397668 gfs2: Rework the log space allocation logic
The current log space allocation logic is hard to understand or extend.
The principle it that when the log is flushed, we may or may not have a
transaction active that has space allocated in the log.  To deal with
that, we set aside a magical number of blocks to be used in case we
don't have an active transaction.  It isn't clear that the pool will
always be big enough.  In addition, we can't return unused log space at
the end of a transaction, so the number of blocks allocated must exactly
match the number of blocks used.

Simplify this as follows:
 * When transactions are allocated or merged, always reserve enough
   blocks to flush the transaction (err on the safe side).
 * In gfs2_log_flush, return any allocated blocks that haven't been used.
 * Maintain a pool of spare blocks big enough to do one log flush, as
   before.
 * In gfs2_log_flush, when we have no active transaction, allocate a
   suitable number of blocks.  For that, use the spare pool when
   called from logd, and leave the pool alone otherwise.  This means
   that when the log is almost full, logd will still be able to do one
   more log flush, which will result in more log space becoming
   available.

This will make the log space allocator code easier to work with in
the future.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-02-22 21:16:22 +01:00
Andreas Gruenbacher
71b219f4e5 gfs2: Minor calc_reserved cleanup
No functional change.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-02-22 21:16:22 +01:00
Linus Torvalds
0e63a5c6ba It has been a relatively quiet cycle in docsland.
- As promised, the minimum Sphinx version to build the docs is now 1.7,
    and we have dropped support for Python 2 entirely.  That allowed the
    removal of a bunch of compatibility code.
 
  - A set of treewide warning fixups from Mauro that I applied after it
    became clear nobody else was going to deal with them.
 
  - The automarkup mechanism can now create cross-references from relative
    paths to RST files.
 
  - More translations, typo fixes, and warning fixes.
 
 No conflicts with any other tree as far as I know.
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmAq4EUPHGNvcmJldEBs
 d24ubmV0AAoJEBdDWhNsDH5YTIAH/1I5MlVQwuvNKjwCAEdmltQgHv6SmXSpDkrp
 SGuviWVXxqz8dTyo7C2R12qE/7nP8zGAmclNdX78ynl5qWaj05lQsjBgMYSoQO/F
 +akyLQSL8/8SQrtDPPBcboPuIz9DzkX51kkQthvCf0puJi0ScKVHO9Sk9SKUgDoK
 cnCE9VwpGL7YX/ee2wt91UYREijgJ9P7eQ6rqKvUZ5Itu9ikfu9vQU41GR9tOXDK
 MQK+k38pWdl8wRgTgA0pkVhMf1G732bxTTicvFHXcyqmCkh7++m2+ysT8O+SBBMX
 e5BbP0yysSqThjwFHOW5PWM1AWD5iVz+pnwJwEaJ4K76tJJOw9M=
 =bcDk
 -----END PGP SIGNATURE-----

Merge tag 'docs-5.12' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
 "It has been a relatively quiet cycle in docsland.

   - As promised, the minimum Sphinx version to build the docs is now
     1.7, and we have dropped support for Python 2 entirely. That
     allowed the removal of a bunch of compatibility code.

   - A set of treewide warning fixups from Mauro that I applied after it
     became clear nobody else was going to deal with them.

   - The automarkup mechanism can now create cross-references from
     relative paths to RST files.

   - More translations, typo fixes, and warning fixes"

* tag 'docs-5.12' of git://git.lwn.net/linux: (75 commits)
  docs: kernel-hacking: be more civil
  docs: Remove the Microsoft rhetoric
  Documentation/admin-guide: kernel-parameters: Update nohlt section
  doc/admin-guide: fix spelling mistake: "perfomance" -> "performance"
  docs: Document cross-referencing using relative path
  docs: Enable usage of relative paths to docs on automarkup
  docs: thermal: fix spelling mistakes
  Documentation: admin-guide: Update kvm/xen config option
  docs: Make syscalls' helpers naming consistent
  coding-style.rst: Avoid comma statements
  Documentation: /proc/loadavg: add 3 more field descriptions
  Documentation/submitting-patches: Add blurb about backtraces in commit messages
  Docs: drop Python 2 support
  Move our minimum Sphinx version to 1.7
  Documentation: input: define ABS_PRESSURE/ABS_MT_PRESSURE resolution as grams
  scripts/kernel-doc: add internal hyperlink to DOC: sections
  Update Documentation/admin-guide/sysctl/fs.rst
  docs: Update DTB format references
  docs: zh_CN: add iio index.rst translation
  docs/zh_CN: add iio ep93xx_adc.rst translation
  ...
2021-02-22 10:57:46 -08:00
Johannes Thumshirn
6e37d24599 btrfs: zoned: fix deadlock on log sync
Lockdep with fstests test case btrfs/041 detected a unsafe locking
scenario when we allocate the log node on a zoned filesystem.

btrfs/041
 ============================================
 WARNING: possible recursive locking detected
 5.11.0-rc7+ #939 Not tainted
 --------------------------------------------
 xfs_io/698 is trying to acquire lock:
 ffff88810cd673a0 (&root->log_mutex){+.+.}-{3:3}, at: btrfs_sync_log+0x3d1/0xee0 [btrfs]

 but task is already holding lock:
 ffff88810b0fc3a0 (&root->log_mutex){+.+.}-{3:3}, at: btrfs_sync_log+0x313/0xee0 [btrfs]

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(&root->log_mutex);
   lock(&root->log_mutex);

  *** DEADLOCK ***

  May be due to missing lock nesting notation

 2 locks held by xfs_io/698:
  #0: ffff88810cd66620 (sb_internal){.+.+}-{0:0}, at: btrfs_sync_file+0x2c3/0x570 [btrfs]
  #1: ffff88810b0fc3a0 (&root->log_mutex){+.+.}-{3:3}, at: btrfs_sync_log+0x313/0xee0 [btrfs]

 stack backtrace:
 CPU: 0 PID: 698 Comm: xfs_io Not tainted 5.11.0-rc7+ #939
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4-rebuilt.opensuse.org 04/01/2014
 Call Trace:
  dump_stack+0x77/0x97
  __lock_acquire.cold+0xb9/0x32a
  lock_acquire+0xb5/0x400
  ? btrfs_sync_log+0x3d1/0xee0 [btrfs]
  __mutex_lock+0x7b/0x8d0
  ? btrfs_sync_log+0x3d1/0xee0 [btrfs]
  ? btrfs_sync_log+0x3d1/0xee0 [btrfs]
  ? find_first_extent_bit+0x9f/0x100 [btrfs]
  ? __mutex_unlock_slowpath+0x35/0x270
  btrfs_sync_log+0x3d1/0xee0 [btrfs]
  btrfs_sync_file+0x3a8/0x570 [btrfs]
  __x64_sys_fsync+0x34/0x60
  do_syscall_64+0x33/0x40
  entry_SYSCALL_64_after_hwframe+0x44/0xa9

This happens, because we are taking the ->log_mutex albeit it has already
been locked.

Also while at it, fix the bogus unlock of the tree_log_mutex in the error
handling.

Fixes: 3ddebf27fcd3 ("btrfs: zoned: reorder log node allocation on zoned filesystem")
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-02-22 18:08:48 +01:00
Josef Bacik
95c85fba1f btrfs: avoid double put of block group when emptying cluster
It's wrong calling btrfs_put_block_group in
__btrfs_return_cluster_to_free_space if the block group passed is
different than the block group the cluster represents. As this means the
cluster doesn't have a reference to the passed block group. This results
in double put and a use-after-free bug.

Fix this by simply bailing if the block group we passed in does not
match the block group on the cluster.

Fixes: fa9c0d795f7b ("Btrfs: rework allocation clustering")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
2021-02-22 18:07:45 +01:00
Filipe Manana
3660d0bcdb btrfs: fix stale data exposure after cloning a hole with NO_HOLES enabled
When using the NO_HOLES feature, if we clone a file range that spans only
a hole into a range that is at or beyond the current i_size of the
destination file, we end up not setting the full sync runtime flag on the
inode. As a result, if we then fsync the destination file and have a power
failure, after log replay we can end up exposing stale data instead of
having a hole for that range.

The conditions for this to happen are the following:

1) We have a file with a size of, for example, 1280K;

2) There is a written (non-prealloc) extent for the file range from 1024K
   to 1280K with a length of 256K;

3) This particular file extent layout is durably persisted, so that the
   existing superblock persisted on disk points to a subvolume root where
   the file has that exact file extent layout and state;

4) The file is truncated to a smaller size, to an offset lower than the
   start offset of its last extent, for example to 800K. The truncate sets
   the full sync runtime flag on the inode;

6) Fsync the file to log it and clear the full sync runtime flag;

7) Clone a region that covers only a hole (implicit hole due to NO_HOLES)
   into the file with a destination offset that starts at or beyond the
   256K file extent item we had - for example to offset 1024K;

8) Since the clone operation does not find extents in the source range,
   we end up in the if branch at the bottom of btrfs_clone() where we
   punch a hole for the file range starting at offset 1024K by calling
   btrfs_replace_file_extents(). There we end up not setting the full
   sync flag on the inode, because we don't know we are being called in
   a clone context (and not fallocate's punch hole operation), and
   neither do we create an extent map to represent a hole because the
   requested range is beyond eof;

9) A further fsync to the file will be a fast fsync, since the clone
   operation did not set the full sync flag, and therefore it relies on
   modified extent maps to correctly log the file layout. But since
   it does not find any extent map marking the range from 1024K (the
   previous eof) to the new eof, it does not log a file extent item
   for that range representing the hole;

10) After a power failure no hole for the range starting at 1024K is
   punched and we end up exposing stale data from the old 256K extent.

Turning this into exact steps:

  $ mkfs.btrfs -f -O no-holes /dev/sdi
  $ mount /dev/sdi /mnt

  # Create our test file with 3 extents of 256K and a 256K hole at offset
  # 256K. The file has a size of 1280K.
  $ xfs_io -f -s \
              -c "pwrite -S 0xab -b 256K 0 256K" \
              -c "pwrite -S 0xcd -b 256K 512K 256K" \
              -c "pwrite -S 0xef -b 256K 768K 256K" \
              -c "pwrite -S 0x73 -b 256K 1024K 256K" \
              /mnt/sdi/foobar

  # Make sure it's durably persisted. We want the last committed super
  # block to point to this particular file extent layout.
  sync

  # Now truncate our file to a smaller size, falling within a position of
  # the second extent. This sets the full sync runtime flag on the inode.
  # Then fsync the file to log it and clear the full sync flag from the
  # inode. The third extent is no longer part of the file and therefore
  # it is not logged.
  $ xfs_io -c "truncate 800K" -c "fsync" /mnt/foobar

  # Now do a clone operation that only clones the hole and sets back the
  # file size to match the size it had before the truncate operation
  # (1280K).
  $ xfs_io \
        -c "reflink /mnt/foobar 256K 1024K 256K" \
        -c "fsync" \
        /mnt/foobar

  # File data before power failure:
  $ od -A d -t x1 /mnt/foobar
  0000000 ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab
  *
  0262144 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  *
  0524288 cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd
  *
  0786432 ef ef ef ef ef ef ef ef ef ef ef ef ef ef ef ef
  *
  0819200 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  *
  1310720

  <power fail>

  # Mount the fs again to replay the log tree.
  $ mount /dev/sdi /mnt

  # File data after power failure:
  $ od -A d -t x1 /mnt/foobar
  0000000 ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab
  *
  0262144 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  *
  0524288 cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd
  *
  0786432 ef ef ef ef ef ef ef ef ef ef ef ef ef ef ef ef
  *
  0819200 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  *
  1048576 73 73 73 73 73 73 73 73 73 73 73 73 73 73 73 73
  *
  1310720

The range from 1024K to 1280K should correspond to a hole but instead it
points to stale data, to the 256K extent that should not exist after the
truncate operation.

The issue does not exists when not using NO_HOLES, because for that case
we use file extent items to represent holes, these are found and copied
during the loop that iterates over extents at btrfs_clone(), and that
causes btrfs_replace_file_extents() to be called with a non-NULL
extent_info argument and therefore set the full sync runtime flag on the
inode.

So fix this by making the code that deals with a trailing hole during
cloning, at btrfs_clone(), to set the full sync flag on the inode, if the
range starts at or beyond the current i_size.

A test case for fstests will follow soon.

Backporting notes: for kernel 5.4 the change goes to ioctl.c into
btrfs_clone before the last call to btrfs_punch_hole_range.

CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-02-22 18:07:45 +01:00
Josef Bacik
1119a72e22 btrfs: tree-checker: do not error out if extent ref hash doesn't match
The tree checker checks the extent ref hash at read and write time to
make sure we do not corrupt the file system.  Generally extent
references go inline, but if we have enough of them we need to make an
item, which looks like

key.objectid	= <bytenr>
key.type	= <BTRFS_EXTENT_DATA_REF_KEY|BTRFS_TREE_BLOCK_REF_KEY>
key.offset	= hash(tree, owner, offset)

However if key.offset collide with an unrelated extent reference we'll
simply key.offset++ until we get something that doesn't collide.
Obviously this doesn't match at tree checker time, and thus we error
while writing out the transaction.  This is relatively easy to
reproduce, simply do something like the following

  xfs_io -f -c "pwrite 0 1M" file
  offset=2

  for i in {0..10000}
  do
	  xfs_io -c "reflink file 0 ${offset}M 1M" file
	  offset=$(( offset + 2 ))
  done

  xfs_io -c "reflink file 0 17999258914816 1M" file
  xfs_io -c "reflink file 0 35998517829632 1M" file
  xfs_io -c "reflink file 0 53752752058368 1M" file

  btrfs filesystem sync

And the sync will error out because we'll abort the transaction.  The
magic values above are used because they generate hash collisions with
the first file in the main subvol.

The fix for this is to remove the hash value check from tree checker, as
we have no idea which offset ours should belong to.

Reported-by: Tuomas Lähdekorpi <tuomas.lahdekorpi@gmail.com>
Fixes: 0785a9aacf9d ("btrfs: tree-checker: Add EXTENT_DATA_REF check")
CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ add comment]
Signed-off-by: David Sterba <dsterba@suse.com>
2021-02-22 18:07:44 +01:00
Filipe Manana
dd0734f2a8 btrfs: fix race between swap file activation and snapshot creation
When creating a snapshot we check if the current number of swap files, in
the root, is non-zero, and if it is, we error out and warn that we can not
create the snapshot because there are active swap files.

However this is racy because when a task started activation of a swap
file, another task might have started already snapshot creation and might
have seen the counter for the number of swap files as zero. This means
that after the swap file is activated we may end up with a snapshot of the
same root successfully created, and therefore when the first write to the
swap file happens it has to fall back into COW mode, which should never
happen for active swap files.

Basically what can happen is:

1) Task A starts snapshot creation and enters ioctl.c:create_snapshot().
   There it sees that root->nr_swapfiles has a value of 0 so it continues;

2) Task B enters btrfs_swap_activate(). It is not aware that another task
   started snapshot creation but it did not finish yet. It increments
   root->nr_swapfiles from 0 to 1;

3) Task B checks that the file meets all requirements to be an active
   swap file - it has NOCOW set, there are no snapshots for the inode's
   root at the moment, no file holes, no reflinked extents, etc;

4) Task B returns success and now the file is an active swap file;

5) Task A commits the transaction to create the snapshot and finishes.
   The swap file's extents are now shared between the original root and
   the snapshot;

6) A write into an extent of the swap file is attempted - there is a
   snapshot of the file's root, so we fall back to COW mode and therefore
   the physical location of the extent changes on disk.

So fix this by taking the snapshot lock during swap file activation before
locking the extent range, as that is the order in which we lock these
during buffered writes.

Fixes: ed46ff3d42378 ("Btrfs: support swap files")
CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-02-22 18:07:35 +01:00
Filipe Manana
195a49eaf6 btrfs: fix race between writes to swap files and scrub
When we active a swap file, at btrfs_swap_activate(), we acquire the
exclusive operation lock to prevent the physical location of the swap
file extents to be changed by operations such as balance and device
replace/resize/remove. We also call there can_nocow_extent() which,
among other things, checks if the block group of a swap file extent is
currently RO, and if it is we can not use the extent, since a write
into it would result in COWing the extent.

However we have no protection against a scrub operation running after we
activate the swap file, which can result in the swap file extents to be
COWed while the scrub is running and operating on the respective block
group, because scrub turns a block group into RO before it processes it
and then back again to RW mode after processing it. That means an attempt
to write into a swap file extent while scrub is processing the respective
block group, will result in COWing the extent, changing its physical
location on disk.

Fix this by making sure that block groups that have extents that are used
by active swap files can not be turned into RO mode, therefore making it
not possible for a scrub to turn them into RO mode. When a scrub finds a
block group that can not be turned to RO due to the existence of extents
used by swap files, it proceeds to the next block group and logs a warning
message that mentions the block group was skipped due to active swap
files - this is the same approach we currently use for balance.

Fixes: ed46ff3d42378 ("Btrfs: support swap files")
CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-02-22 18:07:15 +01:00
Filipe Manana
20903032cd btrfs: avoid checking for RO block group twice during nocow writeback
During the nocow writeback path, we currently iterate the rbtree of block
groups twice: once for checking if the target block group is RO with the
call to btrfs_extent_readonly()), and once again for getting a nocow
reference on the block group with a call to btrfs_inc_nocow_writers().

Since btrfs_inc_nocow_writers() already returns false when the target
block group is RO, remove the call to btrfs_extent_readonly(). Not only
we avoid searching the blocks group rbtree twice, it also helps reduce
contention on the lock that protects it (specially since it is a spin
lock and not a read-write lock). That may make a noticeable difference
on very large filesystems, with thousands of allocated block groups.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-02-22 17:15:55 +01:00
Nikolay Borisov
3c17916510 btrfs: fix race between extent freeing/allocation when using bitmaps
During allocation the allocator will try to allocate an extent using
cluster policy. Once the current cluster is exhausted it will remove the
entry under btrfs_free_cluster::lock and subsequently acquire
btrfs_free_space_ctl::tree_lock to dispose of the already-deleted entry
and adjust btrfs_free_space_ctl::total_bitmap. This poses a problem
because there exists a race condition between removing the entry under
one lock and doing the necessary accounting holding a different lock
since extent freeing only uses the 2nd lock. This can result in the
following situation:

T1:                                    T2:
btrfs_alloc_from_cluster               insert_into_bitmap <holds tree_lock>
 if (entry->bytes == 0)                   if (block_group && !list_empty(&block_group->cluster_list)) {
    rb_erase(entry)

 spin_unlock(&cluster->lock);
   (total_bitmaps is still 4)           spin_lock(&cluster->lock);
                                         <doesn't find entry in cluster->root>
 spin_lock(&ctl->tree_lock);             <goes to new_bitmap label, adds
<blocked since T2 holds tree_lock>       <a new entry and calls add_new_bitmap>
					    recalculate_thresholds  <crashes,
                                              due to total_bitmaps
					      becoming 5 and triggering
					      an ASSERT>

To fix this ensure that once depleted, the cluster entry is deleted when
both cluster lock and tree locks are held in the allocator (T1), this
ensures that even if there is a race with a concurrent
insert_into_bitmap call it will correctly find the entry in the cluster
and add the new space to it.

CC: <stable@vger.kernel.org> # 4.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-02-22 17:15:31 +01:00
Qu Wenruo
04d4ba4c90 btrfs: make check_compressed_csum() to be subpage compatible
Currently check_compressed_csum() completely relies on sectorsize ==
PAGE_SIZE to do checksum verification for compressed extents.

To make it subpage compatible, this patch will:
- Do extra calculation for the csum range
  Since we have multiple sectors inside a page, we need to only hash
  the range we want, not the full page anymore.

- Do sector-by-sector hash inside the page

With this patch and previous conversion on
btrfs_submit_compressed_read(), now we can read subpage compressed
extents properly, and do proper csum verification.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-02-22 17:15:27 +01:00
Qu Wenruo
be6a13613f btrfs: make btrfs_submit_compressed_read() subpage compatible
For compressed read, we always submit page read using page size.  This
doesn't work well with subpage, as for subpage one page can contain
several sectors.  Such submission will read range out of what we want,
and cause problems.

Thankfully to make it subpage compatible, we only need to change how the
last page of the compressed extent is read.

Instead of always adding a full page to the compressed read bio, if we're
at the last page, calculate the size using compressed length, so that we
only add part of the range into the compressed read bio.

Since we are here, also change the PAGE_SIZE used in
lookup_extent_mapping() to sectorsize.
This modification won't cause any functional change, as
lookup_extent_mapping() can handle the case where the search range is
larger than found extent range.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-02-22 17:15:25 +01:00
Ira Weiny
d70cef0d46 btrfs: fix raid6 qstripe kmap
When a qstripe is required an extra page is allocated and mapped.  There
were 3 problems:

1) There is no corresponding call of kunmap() for the qstripe page.
2) There is no reason to map the qstripe page more than once if the
   number of bits set in rbio->dbitmap is greater than one.
3) There is no reason to map the parity page and unmap it each time
   through the loop.

The page memory can continue to be reused with a single mapping on each
iteration by raid6_call.gen_syndrome() without remapping.  So map the
page for the duration of the loop.

Similarly, improve the algorithm by mapping the parity page just 1 time.

Fixes: 5a6ac9eacb49 ("Btrfs, raid56: support parity scrub on raid56")
CC: stable@vger.kernel.org # 4.4.x: c17af96554a8: btrfs: raid56: simplify tracking of Q stripe presence
CC: stable@vger.kernel.org # 4.4.x
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-02-22 17:15:21 +01:00
Pavel Begunkov
8e5c66c485 io_uring: clear request count when freeing caches
BUG: KASAN: double-free or invalid-free in io_req_caches_free.constprop.0+0x3ce/0x530 fs/io_uring.c:8709

Workqueue: events_unbound io_ring_exit_work
Call Trace:
 [...]
 __cache_free mm/slab.c:3424 [inline]
 kmem_cache_free_bulk+0x4b/0x1b0 mm/slab.c:3744
 io_req_caches_free.constprop.0+0x3ce/0x530 fs/io_uring.c:8709
 io_ring_ctx_free fs/io_uring.c:8764 [inline]
 io_ring_exit_work+0x518/0x6b0 fs/io_uring.c:8846
 process_one_work+0x98d/0x1600 kernel/workqueue.c:2275
 worker_thread+0x64c/0x1120 kernel/workqueue.c:2421
 kthread+0x3b1/0x4a0 kernel/kthread.c:292
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294

Freed by task 11900:
 [...]
 kmem_cache_free_bulk+0x4b/0x1b0 mm/slab.c:3744
 io_req_caches_free.constprop.0+0x3ce/0x530 fs/io_uring.c:8709
 io_uring_flush+0x483/0x6e0 fs/io_uring.c:9237
 filp_close+0xb4/0x170 fs/open.c:1286
 close_files fs/file.c:403 [inline]
 put_files_struct fs/file.c:418 [inline]
 put_files_struct+0x1d0/0x350 fs/file.c:415
 exit_files+0x7e/0xa0 fs/file.c:435
 do_exit+0xc27/0x2ae0 kernel/exit.c:820
 do_group_exit+0x125/0x310 kernel/exit.c:922
 [...]

io_req_caches_free() doesn't zero submit_state->free_reqs, so io_uring
considers just freed requests to be good and sound and will reuse or
double free them. Zero the counter.

Reported-by: syzbot+30b4936dcdb3aafa4fb4@syzkaller.appspotmail.com
Fixes: 41be53e94fb04 ("io_uring: kill cached requests from exiting task closing the ring")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-22 06:31:31 -07:00
Hyeongseok Kim
f728760aa9 exfat: improve performance of exfat_free_cluster when using dirsync mount option
There are stressful update of cluster allocation bitmap when using
dirsync mount option which is doing sync buffer on every cluster bit
clearing. This could result in performance degradation when deleting
big size file.
Fix to update only when the bitmap buffer index is changed would make
less disk access, improving performance especially for truncate operation.

Testing with Samsung 256GB sdcard, mounted with dirsync option
(mount -t exfat /dev/block/mmcblk0p1 /temp/mount -o dirsync)

Remove 4GB file, blktrace result.
[Before] : 39 secs.
Total (blktrace):
 Reads Queued:      0,        0KiB   Writes Queued:      32775,    16387KiB
 Read Dispatches:   0,        0KiB   Write Dispatches:   32775,    16387KiB
 Reads Requeued:    0                Writes Requeued:        0
 Reads Completed:   0,        0KiB   Writes Completed:   32775,    16387KiB
 Read Merges:       0,        0KiB   Write Merges:           0,        0KiB
 IO unplugs:        2                Timer unplugs:          0

[After] : 1 sec.
Total (blktrace):
 Reads Queued:      0,        0KiB   Writes Queued:         13,        6KiB
 Read Dispatches:   0,        0KiB   Write Dispatches:      13,        6KiB
 Reads Requeued:    0                Writes Requeued:        0
 Reads Completed:   0,        0KiB   Writes Completed:      13,        6KiB
 Read Merges:       0,        0KiB   Write Merges:           0,        0KiB
 IO unplugs:        1                Timer unplugs:          0

Signed-off-by: Hyeongseok Kim <hyeongseok@gmail.com>
Acked-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
2021-02-22 09:55:14 +09:00