71184 Commits

Author SHA1 Message Date
Roman Gushchin
29264d92a0 writeback, cgroup: switch to rcu_work API in inode_switch_wbs()
Inode's wb switching requires two steps divided by an RCU grace period.
It's currently implemented as an RCU callback inode_switch_wbs_rcu_fn(),
which schedules inode_switch_wbs_work_fn() as a work.

Switching to the rcu_work API allows to do the same in a cleaner and
slightly shorter form.

Link: https://lkml.kernel.org/r/20210608230225.2078447-5-guro@fb.com
Signed-off-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Dennis Zhou <dennis@kernel.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Jan Kara <jack@suse.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-29 10:53:48 -07:00
Roman Gushchin
8826ee4fe7 writeback, cgroup: increment isw_nr_in_flight before grabbing an inode
isw_nr_in_flight is used to determine whether the inode switch queue
should be flushed from the umount path.  Currently it's increased after
grabbing an inode and even scheduling the switch work.  It means the
umount path can walk past cleanup_offline_cgwb() with active inode
references, which can result in a "Busy inodes after unmount." message and
use-after-free issues (with inode->i_sb which gets freed).

Fix it by incrementing isw_nr_in_flight before doing anything with the
inode and decrementing in the case when switching wasn't scheduled.

The problem hasn't yet been seen in the real life and was discovered by
Jan Kara by looking into the code.

Link: https://lkml.kernel.org/r/20210608230225.2078447-4-guro@fb.com
Signed-off-by: Roman Gushchin <guro@fb.com>
Suggested-by: Jan Kara <jack@suse.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-29 10:53:47 -07:00
Roman Gushchin
592fa00218 writeback, cgroup: add smp_mb() to cgroup_writeback_umount()
A full memory barrier is required between clearing SB_ACTIVE flag in
generic_shutdown_super() and checking isw_nr_in_flight in
cgroup_writeback_umount(), otherwise a new switch operation might be
scheduled after atomic_read(&isw_nr_in_flight) returned 0.  This would
result in a non-flushed isw_wq, and a potential crash.

The problem hasn't yet been seen in the real life and was discovered by
Jan Kara by looking into the code.

Link: https://lkml.kernel.org/r/20210608230225.2078447-3-guro@fb.com
Signed-off-by: Roman Gushchin <guro@fb.com>
Suggested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Jan Kara <jack@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-29 10:53:47 -07:00
Roman Gushchin
4ade5867b4 writeback, cgroup: do not switch inodes with I_WILL_FREE flag
Patch series "cgroup, blkcg: prevent dirty inodes to pin dying memory cgroups", v9.

When an inode is getting dirty for the first time it's associated with a
wb structure (see __inode_attach_wb()).  It can later be switched to
another wb (if e.g.  some other cgroup is writing a lot of data to the
same inode), but otherwise stays attached to the original wb until being
reclaimed.

The problem is that the wb structure holds a reference to the original
memory and blkcg cgroups.  So if an inode has been dirty once and later is
actively used in read-only mode, it has a good chance to pin down the
original memory and blkcg cgroups forever.  This is often the case with
services bringing data for other services, e.g.  updating some rpm
packages.

In the real life it becomes a problem due to a large size of the memcg
structure, which can easily be 1000x larger than an inode.  Also a really
large number of dying cgroups can raise different scalability issues, e.g.
making the memory reclaim costly and less effective.

To solve the problem inodes should be eventually detached from the
corresponding writeback structure.  It's inefficient to do it after every
writeback completion.  Instead it can be done whenever the original memory
cgroup is offlined and writeback structure is getting killed.  Scanning
over a (potentially long) list of inodes and detach them from the
writeback structure can take quite some time.  To avoid scanning all
inodes, attached inodes are kept on a new list (b_attached).  To make it
less noticeable to a user, the scanning and switching is performed from a
work context.

Big thanks to Jan Kara, Dennis Zhou, Hillf Danton and Tejun Heo for their
ideas and contribution to this patchset.

This patch (of 8):

If an inode's state has I_WILL_FREE flag set, the inode will be freed
soon, so there is no point in trying to switch the inode to a different
cgwb.

I_WILL_FREE was ignored since the introduction of the inode switching, so
it looks like it doesn't lead to any noticeable issues for a user.  This
is why the patch is not intended for a stable backport.

Link: https://lkml.kernel.org/r/20210608230225.2078447-1-guro@fb.com
Link: https://lkml.kernel.org/r/20210608230225.2078447-2-guro@fb.com
Signed-off-by: Roman Gushchin <guro@fb.com>
Suggested-by: Jan Kara <jack@suse.cz>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Dennis Zhou <dennis@kernel.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Jan Kara <jack@suse.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-29 10:53:47 -07:00
Jan Kara
1a14e3779d dax: fix ENOMEM handling in grab_mapping_entry()
grab_mapping_entry() has a bug in handling of ENOMEM condition.  Suppose
we have a PMD entry at index i which we are downgrading to a PTE entry.
grab_mapping_entry() will set pmd_downgrade to true, lock the entry, clear
the entry in xarray, and decrement mapping->nrpages.  The it will call:

	entry = dax_make_entry(pfn_to_pfn_t(0), flags);
	dax_lock_entry(xas, entry);

which inserts new PTE entry into xarray.  However this may fail allocating
the new node.  We handle this by:

	if (xas_nomem(xas, mapping_gfp_mask(mapping) & ~__GFP_HIGHMEM))
		goto retry;

however pmd_downgrade stays set to true even though 'entry' returned from
get_unlocked_entry() will be NULL now.  And we will go again through the
downgrade branch.  This is mostly harmless except that mapping->nrpages is
decremented again and we temporarily have an invalid entry stored in
xarray.  Fix the problem by setting pmd_downgrade to false each time we
lookup the entry we work with so that it matches the entry we found.

Link: https://lkml.kernel.org/r/20210622160015.18004-1-jack@suse.cz
Fixes: b15cd800682f ("dax: Convert page fault handlers to XArray")
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-29 10:53:47 -07:00
Colin Ian King
7ed6d4e418 ocfs2: remove redundant initialization of variable ret
The variable ret is being initialized with a value that is never read, the
assignment is redundant and can be removed.

Addresses-Coverity: ("Unused value")
Link: https://lkml.kernel.org/r/20210613135148.74658-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-29 10:53:46 -07:00
Chen Huang
f0f798db05 ocfs2: replace simple_strtoull() with kstrtoull()
simple_strtoull() is deprecated in some situation since it does not check
for the range overflow, use kstrtoull() instead.

Link: https://lkml.kernel.org/r/20210526092020.554341-3-chenhuang5@huawei.com
Signed-off-by: Chen Huang <chenhuang5@huawei.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-29 10:53:46 -07:00
Wan Jiabing
01f0139913 ocfs2: remove repeated uptodate check for buffer
In commit 60f91826ca62 ("buffer: Avoid setting buffer bits that are
already set"), function set_buffer_##name was added a test_bit() to check
buffer, which is the same as function buffer_##name.  The
!buffer_uptodate(bh) here is a repeated check.  Remove it.

Link: https://lkml.kernel.org/r/20210425025702.13628-1-wanjiabing@vivo.com
Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-29 10:53:46 -07:00
Colin Ian King
ca49b6d856 ocfs2: remove redundant assignment to pointer queue
The pointer queue is being initialized with a value that is never read and
it is being updated later with a new value.  The initialization is
redundant and can be removed.

Addresses-Coverity: ("Unused value")
Link: https://lkml.kernel.org/r/20210513113957.57539-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-29 10:53:46 -07:00
Dan Carpenter
54e948c60c ocfs2: fix snprintf() checking
The snprintf() function returns the number of bytes which would have been
printed if the buffer was large enough.  In other words it can return ">=
remain" but this code assumes it returns "== remain".

The run time impact of this bug is not very severe.  The next iteration
through the loop would trigger a WARN() when we pass a negative limit to
snprintf().  We would then return success instead of -E2BIG.

The kernel implementation of snprintf() will never return negatives so
there is no need to check and I have deleted that dead code.

Link: https://lkml.kernel.org/r/20210511135350.GV1955@kadam
Fixes: a860f6eb4c6a ("ocfs2: sysfile interfaces for online file check")
Fixes: 74ae4e104dfc ("ocfs2: Create stack glue sysfs files.")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-29 10:53:46 -07:00
Yang Yingliang
74ef829e41 ocfs2: remove unnecessary INIT_LIST_HEAD()
The list_head o2hb_node_events is initialized statically.  It is
unnecessary to initialize by INIT_LIST_HEAD().

Link: https://lkml.kernel.org/r/20210511115847.3817395-1-yangyingliang@huawei.com
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reported-by: Hulk Robot <hulkci@huawei.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-29 10:53:46 -07:00
Vincent Whitchurch
10dde05b89 squashfs: add option to panic on errors
Add an errors=panic mount option to make squashfs trigger a panic when
errors are encountered, similar to several other filesystems.  This allows
a kernel dump to be saved using which the corruption can be analysed and
debugged.

Inspired by a pre-fs_context patch by Anton Eliasson.

Link: https://lkml.kernel.org/r/20210527125019.14511-1-vincent.whitchurch@axis.com
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-29 10:53:46 -07:00
Desmond Cheong Zhi Xi
d98e4d9541 ntfs: fix validity check for file name attribute
When checking the file name attribute, we want to ensure that it fits
within the bounds of ATTR_RECORD.  To do this, we should check that (attr
record + file name offset + file name length) < (attr record + attr record
length).

However, the original check did not include the file name offset in the
calculation.  This means that corrupted on-disk metadata might not caught
by the incorrect file name check, and lead to an invalid memory access.

An example can be seen in the crash report of a memory corruption error
found by Syzbot:
https://syzkaller.appspot.com/bug?id=a1a1e379b225812688566745c3e2f7242bffc246

Adding the file name offset to the validity check fixes this error and
passes the Syzbot reproducer test.

Link: https://lkml.kernel.org/r/20210614050540.289494-1-desmondcheongzx@gmail.com
Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>
Reported-by: syzbot+213ac8bb98f7f4420840@syzkaller.appspotmail.com
Tested-by: syzbot+213ac8bb98f7f4420840@syzkaller.appspotmail.com
Acked-by: Anton Altaparmakov <anton@tuxera.com>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-29 10:53:45 -07:00
Andreas Gruenbacher
7a607a41cd gfs2: Clean up gfs2_unstuff_dinode
Split __gfs2_unstuff_inode off from gfs2_unstuff_dinode and clean up the
code a little.  All remaining callers now pass NULL as the page argument
of gfs2_unstuff_dinode, so remove that argument.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-06-29 10:56:51 +02:00
Andreas Gruenbacher
64090cbe4b gfs2: Unstuff before locking page in gfs2_page_mkwrite
In gfs2_page_mkwrite, unstuff inodes before locking the page.  That
way, we won't have to pass in the locked page to gfs2_unstuff_inode,
and gfs2_unstuff_inode can look up and lock the page itself.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-06-29 10:56:51 +02:00
Andreas Gruenbacher
0fc3bcd6b6 gfs2: Clean up the error handling in gfs2_page_mkwrite
We're setting an error number so that block_page_mkwrite_return
translates it into the corresponding VM_FAULT_* code in several places,
but this is getting confusing, so set the VM_FAULT_* codes directly
instead.  (No change in functionality.)

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-06-29 10:56:51 +02:00
Linus Torvalds
c54b245d01 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull user namespace rlimit handling update from Eric Biederman:
 "This is the work mainly by Alexey Gladkov to limit rlimits to the
  rlimits of the user that created a user namespace, and to allow users
  to have stricter limits on the resources created within a user
  namespace."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  cred: add missing return error code when set_cred_ucounts() failed
  ucounts: Silence warning in dec_rlimit_ucounts
  ucounts: Set ucount_max to the largest positive value the type can hold
  kselftests: Add test to check for rlimit changes in different user namespaces
  Reimplement RLIMIT_MEMLOCK on top of ucounts
  Reimplement RLIMIT_SIGPENDING on top of ucounts
  Reimplement RLIMIT_MSGQUEUE on top of ucounts
  Reimplement RLIMIT_NPROC on top of ucounts
  Use atomic_t for ucounts reference counting
  Add a reference to ucounts for each cred
  Increase size of ucounts to atomic_long_t
2021-06-28 20:39:26 -07:00
Linus Torvalds
8ec035ac4a fallthrough fixes for Clang for 5.14-rc1
Hi Linus,
 
 Please, pull the following patches that fix many fall-through warnings
 when building with Clang 12.0.0 and this[1] change reverted. Notice
 that in order to enable -Wimplicit-fallthrough for Clang, such change[1]
 is meant to be reverted at some point. So, these patches help to move
 in that direction.
 
 Thanks!
 
 [1] commit e2079e93f562c ("kbuild: Do not enable -Wimplicit-fallthrough for clang for now")
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEkmRahXBSurMIg1YvRwW0y0cG2zEFAmDaNe8ACgkQRwW0y0cG
 2zFfGA/9G1A/Hrf261/P9olyYe2TRBwLnO1tUDREm3qtJ2JdKpf+7EM3VDm+Ue/A
 qhNmwp5G7nmp7Nqq8MfbdFjeo/rPS67voXiOfO8b0pU+E4XlOc+B1BXL0BWtnP7b
 xvuauklQU6dmCp2u44vsxdBIO6ooR0uQh+7/+1la+mPyEk9mlooQ4lyFcpfA53yt
 zxEGrx0tZBrDXghEI1CkHxOaJaX3qhw4EUYvxe8n2L7Dgx+o2djL/G4/SRYH/xoq
 MZa8TLyCuR3J0Ph4TfDONhMmf8ZLn+j70xBhewcVfZ1JfvGSVw4DQNN44KZCDnrK
 tGsBo5VFksjbmX83LmT8UlqB1rTP4nVQtRmtOPvbQA9kd19yy+Y64Y58FcGU2FHl
 PWt3rQJ1JzBo3TtzQoz7HSJCt9QTil4U7hFbNtcp5BbWQfUPkRgpWcL3FOchZbZ6
 FnLMqHanw2lrKMzZEoyHvg6G7BT67k3rrFgtd/xGSn8ohtfKXaZBYa9PKrQ0LwuG
 o8tQtIX1owj4rbdI1t6Ob4X/tT6Y7DzH8nsF+TsJQ4XeSCD2rURUcYltBMIlEr16
 DFj7iWKIrrX80/JRsBXu7a9h8nn5YptxV12SGRq/Cu/2jfRwjDye4IzsCyqMf67n
 oEN6YC1XYaEUmKXTnI8Z0CxY0qwSTcNjeH5Ci9jWepinsqD3Jxw=
 =Kt2q
 -----END PGP SIGNATURE-----

Merge tag 'fallthrough-fixes-clang-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux

Pull fallthrough fixes from Gustavo Silva:
 "Fix many fall-through warnings when building with Clang 12.0.0 and
  '-Wimplicit-fallthrough' so that we at some point will be able to
  enable that warning by default"

* tag 'fallthrough-fixes-clang-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux: (26 commits)
  rxrpc: Fix fall-through warnings for Clang
  drm/nouveau/clk: Fix fall-through warnings for Clang
  drm/nouveau/therm: Fix fall-through warnings for Clang
  drm/nouveau: Fix fall-through warnings for Clang
  xfs: Fix fall-through warnings for Clang
  xfrm: Fix fall-through warnings for Clang
  tipc: Fix fall-through warnings for Clang
  sctp: Fix fall-through warnings for Clang
  rds: Fix fall-through warnings for Clang
  net/packet: Fix fall-through warnings for Clang
  net: netrom: Fix fall-through warnings for Clang
  ide: Fix fall-through warnings for Clang
  hwmon: (max6621) Fix fall-through warnings for Clang
  hwmon: (corsair-cpro) Fix fall-through warnings for Clang
  firewire: core: Fix fall-through warnings for Clang
  braille_console: Fix fall-through warnings for Clang
  ipv4: Fix fall-through warnings for Clang
  qlcnic: Fix fall-through warnings for Clang
  bnxt_en: Fix fall-through warnings for Clang
  netxen_nic: Fix fall-through warnings for Clang
  ...
2021-06-28 20:03:38 -07:00
Linus Torvalds
07bdc0746a pstore updates for v5.14-rc1
Use normal block device I/O path for pstore/blk. (Christoph Hellwig,
 Kees Cook, Pu Lehui)
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmDaKjAWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJj5HEACHiIyjZoWDGtpdaV/GumREeKoH
 UaeqCDZHNjC0AsMGSPuyifAvUHplFH9LPEHsQuvob9kSyuns2ZMDzEeWVIQzDSYI
 yFAaxjqP+H9mwR0QLWZFyOv/RySPahMX8cQJyRe9mt3rBWMrK/dVI51GOG69qad7
 Eb1Z4GCXz05EToWdqUK5DHlTNirbZdyAfeqdWR9Ir4zrK47SFLe4SDf7mzoySCXc
 8U2yAJkaSQGwX+aVh0pgPM+nkYmf26O4iqW+G3blTKap4nxwefUJHxtYkSaiBQPH
 rqNVo9hnk4LnXxJQudPmuVWMahK4OszeSpApmcok3jCZ5+/2ZPPxFYiLE4kGquTh
 ulMVlECDMcc3WC7+HLYmbbg78l1K4fTTdSlcg9uXN0c6M4X+0WgLvifCJyVADQkF
 twwGu06bszqCjbg+dCrabI/e/nDltcc6/T9jVFYmEwckVu0OVP9Y8DndELxngqi1
 NTm+h7ArCQQtO+vm7Eeg08fqZBj4jGCXLGyPB0qI/k+YMbhq3YrGX4uoajpFUz5s
 2k1M/JwOhqPPUiZJqGCxXKkXKDbrmiE2aLRoiSsSwlxL/8bqCCaXi0sc58pWUGci
 93RWEWF6fZnfD4oLnzmWrQpYJHpzckoqzn8DmPQx9ztRjixN3RX/fcpyRdaaYPq9
 btlK81xRoPVyqmH+7A==
 =ul9O
 -----END PGP SIGNATURE-----

Merge tag 'pstore-v5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull pstore updates from Kees Cook:
 "Use normal block device I/O path for pstore/blk. (Christoph Hellwig,
  Kees Cook, Pu Lehui)"

* tag 'pstore-v5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  pstore/blk: Include zone in pstore_device_info
  pstore/blk: Fix kerndoc and redundancy on blkdev param
  pstore/blk: Use the normal block device I/O path
  pstore/blk: Move verify_size() macro out of function
  pstore/blk: Improve failure reporting
2021-06-28 19:57:00 -07:00
Linus Torvalds
122fa8c588 for-5.14-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmDZ4TwACgkQxWXV+ddt
 WDvWCQ/8Dgnk+FBC25JOkqgu29VZtvhfWkY1poDRuG+tca6VeMMnDbPgnTQFyeS1
 38F4uNNi/F5UdFuLz3RK0jYgGFKXTp+sFjavFuXeJQpFxe7VSu7JrilZPaA1Dti8
 E8Dp42ilrHDikDbZaT8JB9GSnR7a8tHnIs0RfZSIkHsd+rPs7QPtM0TTzEZyLHqH
 2uYoVyd5EvclvM5JLVGxRZ3lTU64zfZlJg+TnoAkBpilqUHqpD+x5cEoNYbdhbAb
 j3sF11h/zEa/wmU5w5LRd4Qvl3JygCrnAo+6VAxB/u0yzJnH+UwOEJdDDeUpB/9k
 2F/Zy69CUQ7DdXM+Es4TOfAyQ9fpPLt8Z96GIBrdD5BxWbam4pyU5xH4cDPNpsHo
 zRCepdU1zwD6z3cfEYKmUAx89ewC8SE8XlUOWiGun4pBKdi3tgwcrytTnu+02JND
 mEkP4vTWG2bU+S0Si0u/aAKHcFvOwiY9iHM9tmblVvvlSFYrhFAclsytihPwu9NQ
 d9FRQMo9JZbQZXqaWpcmd8eXACz9+5AulIhofpuZLciyhvWpL+CQ+xGNnzJ1DnTH
 ct0m+ByFb33bTpAnblkgCMQa9xuwlM57NxvIclRaDPXWipqyZReih9fbF1TkHbXQ
 0dkrKe8cHn9w+DI1Hs1Hu1zdD7WJJxNMY2x9MowMU9gDVNBbbVs=
 =htVu
 -----END PGP SIGNATURE-----

Merge tag 'for-5.14-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs updates from David Sterba:
 "A normal mix of improvements, core changes and features that user have
  been missing or complaining about.

  User visible changes:

   - new sysfs exports:
      - add sysfs knob to limit scrub IO bandwidth per device
      - device stats are also available in
           /sys/fs/btrfs/FSID/devinfo/DEVID/error_stats

   - support cancellable resize and device delete ioctls

   - change how the empty value is interpreted when setting a property,
     so far we have only 'btrfs.compression' and we need to distinguish
     a reset to defaults and setting "do not compress", in general the
     empty value will always mean 'reset to defaults' for any other
     property, for compression it's either 'no' or 'none' to forbid
     compression

  Performance improvements:

   - no need for full sync when truncation does not touch extents,
     reported run time change is -12%

   - avoid unnecessary logging of xattrs during fast fsyncs (+17%
     throughput, -17% runtime on xattr stress workload)

  Core:

   - preemptive flushing improvements and fixes
      - adjust clamping logic on multi-threaded workloads to avoid
        flushing too soon
      - take into account global block reserve, may help on almost full
        filesystems
      - continue flushing when there are enough pending delalloc and
        ordered bytes

   - simplify logic around conditional transaction commit, a workaround
     used in the past for throttling that's been superseded by ticket
     reservations that manage the throttling in a better way

   - subpage blocksize preparation:
      - submit read time repair only for each corrupted sector
      - scrub repair now works with sectors and not pages
      - free space cache (v1) works with sectors and not pages
      - more fine grained bio tracking for extents
      - subpage support in page callbacks, extent callbacks, end io
        callbacks

   - simplify transaction abort logic and always abort and don't check
     various potentially unreliable stats tracked by the transaction

   - exclusive operations can do more checks when started and allow eg.
     cancellation of the same running operation

   - ensure relocation never runs while we have send operations running,
     e.g. when zoned background auto reclaim starts

  Fixes:

   - zoned: more sanity checks of write pointer

   - improve error handling in delayed inodes

   - send:
      - fix invalid path for unlink operations after parent
        orphanization
      - fix crash when memory allocations trigger reclaim

   - skip compression of we have only one page (can't make things
     better)

   - empty value of a property newly means reset to default

  Other:

   - lots of cleanups, comment updates, yearly typo fixing

   - disable build on platforms having page size 256K"

* tag 'for-5.14-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (101 commits)
  btrfs: remove unused btrfs_fs_info::total_pinned
  btrfs: rip out btrfs_space_info::total_bytes_pinned
  btrfs: rip the first_ticket_bytes logic from fail_all_tickets
  btrfs: remove FLUSH_DELAYED_REFS from data ENOSPC flushing
  btrfs: rip out may_commit_transaction
  btrfs: send: fix crash when memory allocations trigger reclaim
  btrfs: ensure relocation never runs while we have send operations running
  btrfs: shorten integrity checker extent data mount option
  btrfs: switch mount option bits to enums and use wider type
  btrfs: props: change how empty value is interpreted
  btrfs: compression: don't try to compress if we don't have enough pages
  btrfs: fix unbalanced unlock in qgroup_account_snapshot()
  btrfs: sysfs: export dev stats in devinfo directory
  btrfs: fix typos in comments
  btrfs: remove a stale comment for btrfs_decompress_bio()
  btrfs: send: use list_move_tail instead of list_del/list_add_tail
  btrfs: disable build on platforms having page size 256K
  btrfs: send: fix invalid path for unlink operations after parent orphanization
  btrfs: inline wait_current_trans_commit_start in its caller
  btrfs: sink wait_for_unblock parameter to async commit
  ...
2021-06-28 16:28:41 -07:00
Linus Torvalds
7aed4d57b1 Changes since last update:
- fix wrong error code overwritten due to sb checksum feature;
 
  - 2 minor cleanups;
 
  - update Chao's email address.
 -----BEGIN PGP SIGNATURE-----
 
 iIcEABYIAC8WIQThPAmQN9sSA0DVxtI5NzHcH7XmBAUCYNnk0xEceGlhbmdAa2Vy
 bmVsLm9yZwAKCRA5NzHcH7XmBGRsAQDpMdAyTjX+r9YDIC/9SpMUNfzlU8wxMKwg
 OrMn1mjK/gD+J+kKkJsuE4I2zuWlU5BSDHfDRxlEnRIhQN3cpo+bXw0=
 =mRUf
 -----END PGP SIGNATURE-----

Merge tag 'erofs-for-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs

Pull erofs updates from Gao Xiang:
 "No noticable change available for this cycle. Just a bugfix related to
  sb chksum feature, two minor cleanups and Chao's email address update:

   - fix wrong error code overwritten due to sb checksum feature

   - two minor cleanups

   - update Chao's email address"

* tag 'erofs-for-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  MAINTAINERS: erofs: update my email address
  erofs: clean up file headers & footers
  erofs: remove the occupied parameter from z_erofs_pagevec_enqueue()
  erofs: fix error return code in erofs_read_superblock()
2021-06-28 16:24:18 -07:00
Linus Torvalds
a58e203530 fscrypt updates for 5.14
A couple bug fixes for fs/crypto/:
 
 - Fix handling of major dirhash values that happen to be 0.
 
 - Fix cases where keys were derived differently on big endian systems
   than on little endian systems (affecting some newer features only).
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCYNn+KhQcZWJpZ2dlcnNA
 Z29vZ2xlLmNvbQAKCRDzXCl4vpKOK595AP4hhu5pLLjPv+Okep+k+RTze5MzH9rH
 aXJK2T8J4TwGBgD/Qj+AjgLIJwjxk8mx3FliMsOjBxOIYiIpjHVNZect9AI=
 =+6JI
 -----END PGP SIGNATURE-----

Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt

Pull fscrypt updates from Eric Biggers:
 "A couple bug fixes for fs/crypto/:

   - Fix handling of major dirhash values that happen to be 0.

   - Fix cases where keys were derived differently on big endian systems
     than on little endian systems (affecting some newer features only)"

* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
  fscrypt: fix derivation of SipHash keys on big endian CPUs
  fscrypt: don't ignore minor_hash when hash is 0
2021-06-28 16:20:48 -07:00
Tanner Love
a358f40600 once: implement DO_ONCE_LITE for non-fast-path "do once" functionality
Certain uses of "do once" functionality reside outside of fast path,
and so do not require jump label patching via static keys, making
existing DO_ONCE undesirable in such cases.

Replace uses of __section(".data.once") with DO_ONCE_LITE(_IF)?

This patch changes the return values of xfs_printk_once, printk_once,
and printk_deferred_once. Before, they returned whether the print was
performed, but now, they always return true. This is okay because the
return values of the following macros are entirely ignored throughout
the kernel:
- xfs_printk_once
- xfs_warn_once
- xfs_notice_once
- xfs_info_once
- printk_once
- pr_emerg_once
- pr_alert_once
- pr_crit_once
- pr_err_once
- pr_warn_once
- pr_notice_once
- pr_info_once
- pr_devel_once
- pr_debug_once
- printk_deferred_once
- orc_warn

Changes
v3:
  - Expand commit message to explain why changing return values of
    xfs_printk_once, printk_once, printk_deferred_once is benign
v2:
  - Fix i386 build warnings

Signed-off-by: Tanner Love <tannerlove@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Mahesh Bandewar <maheshb@google.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-28 15:54:57 -07:00
Linus Torvalds
54a728dc5e Scheduler udpates for this cycle:
- Changes to core scheduling facilities:
 
     - Add "Core Scheduling" via CONFIG_SCHED_CORE=y, which enables
       coordinated scheduling across SMT siblings. This is a much
       requested feature for cloud computing platforms, to allow
       the flexible utilization of SMT siblings, without exposing
       untrusted domains to information leaks & side channels, plus
       to ensure more deterministic computing performance on SMT
       systems used by heterogenous workloads.
 
       There's new prctls to set core scheduling groups, which
       allows more flexible management of workloads that can share
       siblings.
 
     - Fix task->state access anti-patterns that may result in missed
       wakeups and rename it to ->__state in the process to catch new
       abuses.
 
  - Load-balancing changes:
 
      - Tweak newidle_balance for fair-sched, to improve
        'memcache'-like workloads.
 
      - "Age" (decay) average idle time, to better track & improve workloads
        such as 'tbench'.
 
      - Fix & improve energy-aware (EAS) balancing logic & metrics.
 
      - Fix & improve the uclamp metrics.
 
      - Fix task migration (taskset) corner case on !CONFIG_CPUSET.
 
      - Fix RT and deadline utilization tracking across policy changes
 
      - Introduce a "burstable" CFS controller via cgroups, which allows
        bursty CPU-bound workloads to borrow a bit against their future
        quota to improve overall latencies & batching. Can be tweaked
        via /sys/fs/cgroup/cpu/<X>/cpu.cfs_burst_us.
 
      - Rework assymetric topology/capacity detection & handling.
 
  - Scheduler statistics & tooling:
 
      - Disable delayacct by default, but add a sysctl to enable
        it at runtime if tooling needs it. Use static keys and
        other optimizations to make it more palatable.
 
      - Use sched_clock() in delayacct, instead of ktime_get_ns().
 
  - Misc cleanups and fixes.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmDZcPoRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1g3yw//WfhIqy7Psa9d/MBMjQDRGbTuO4+w22Dj
 vmWFU44Q4KJxQHWeIgUlrK+dzvYWvNmflUs2CUUOiDVzxFTHMIyBtL4qCBUbx4Ns
 vKAcB9wsWZge2o3WzZqpProRhdoRaSKw8egUr2q7rACVBkckY7eGP/OjWxXU8BdA
 b7D0LPWwuIBFfN4pFYeCDLn32Dqr9s6Chyj+ZecabdG7EE6Gu+f1diVcxy7JE/mc
 4WWL0D1RqdgpGrBEuMJIxPYekdrZiuy4jtEbztz5gbTBteN1cj3BLfqn0Pc/e6rO
 Vyuc5mXCAmzRVi18z6g6bsVl+IA/nrbErENB2OHOhOYtqiZxqGTd4GPWZszMyY17
 5AsEO5+5pcaBsy4gyp09qURggBu9zhJnMVmOI3rIHZkmkhwzc6uUJlyhDCTiFWOz
 3ZF3LjbZEyCKodMD8qMHbs3axIBpIfZqjzkvSKyFnvfXEGVytVse7NUuWtQ36u92
 GnURxVeYY1TDVXvE1Y8owNKMxknKQ6YRlypP7Dtbeo/qG6hShp0xmS7qDLDi0ybZ
 ZlK+bDECiVoDf3nvJo+8v5M82IJ3CBt4UYldeRJsa1YCK/FsbK8tp91fkEfnXVue
 +U6LPX0AmMpXacR5HaZfb3uBIKRw/QMdP/7RFtBPhpV6jqCrEmuqHnpPQiEVtxwO
 UmG7bt94Trk=
 =3VDr
 -----END PGP SIGNATURE-----

Merge tag 'sched-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler udpates from Ingo Molnar:

 - Changes to core scheduling facilities:

    - Add "Core Scheduling" via CONFIG_SCHED_CORE=y, which enables
      coordinated scheduling across SMT siblings. This is a much
      requested feature for cloud computing platforms, to allow the
      flexible utilization of SMT siblings, without exposing untrusted
      domains to information leaks & side channels, plus to ensure more
      deterministic computing performance on SMT systems used by
      heterogenous workloads.

      There are new prctls to set core scheduling groups, which allows
      more flexible management of workloads that can share siblings.

    - Fix task->state access anti-patterns that may result in missed
      wakeups and rename it to ->__state in the process to catch new
      abuses.

 - Load-balancing changes:

    - Tweak newidle_balance for fair-sched, to improve 'memcache'-like
      workloads.

    - "Age" (decay) average idle time, to better track & improve
      workloads such as 'tbench'.

    - Fix & improve energy-aware (EAS) balancing logic & metrics.

    - Fix & improve the uclamp metrics.

    - Fix task migration (taskset) corner case on !CONFIG_CPUSET.

    - Fix RT and deadline utilization tracking across policy changes

    - Introduce a "burstable" CFS controller via cgroups, which allows
      bursty CPU-bound workloads to borrow a bit against their future
      quota to improve overall latencies & batching. Can be tweaked via
      /sys/fs/cgroup/cpu/<X>/cpu.cfs_burst_us.

    - Rework assymetric topology/capacity detection & handling.

 - Scheduler statistics & tooling:

    - Disable delayacct by default, but add a sysctl to enable it at
      runtime if tooling needs it. Use static keys and other
      optimizations to make it more palatable.

    - Use sched_clock() in delayacct, instead of ktime_get_ns().

 - Misc cleanups and fixes.

* tag 'sched-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (72 commits)
  sched/doc: Update the CPU capacity asymmetry bits
  sched/topology: Rework CPU capacity asymmetry detection
  sched/core: Introduce SD_ASYM_CPUCAPACITY_FULL sched_domain flag
  psi: Fix race between psi_trigger_create/destroy
  sched/fair: Introduce the burstable CFS controller
  sched/uclamp: Fix uclamp_tg_restrict()
  sched/rt: Fix Deadline utilization tracking during policy change
  sched/rt: Fix RT utilization tracking during policy change
  sched: Change task_struct::state
  sched,arch: Remove unused TASK_STATE offsets
  sched,timer: Use __set_current_state()
  sched: Add get_current_state()
  sched,perf,kvm: Fix preemption condition
  sched: Introduce task_is_running()
  sched: Unbreak wakeups
  sched/fair: Age the average idle time
  sched/cpufreq: Consider reduced CPU capacity in energy calculation
  sched/fair: Take thermal pressure into account while estimating energy
  thermal/cpufreq_cooling: Update offline CPUs per-cpu thermal_pressure
  sched/fair: Return early from update_tg_cfs_load() if delta == 0
  ...
2021-06-28 12:14:19 -07:00
Muchun Song
8b0ed8443a writeback: fix obtain a reference to a freeing memcg css
The caller of wb_get_create() should pin the memcg, because
wb_get_create() relies on this guarantee. The rcu read lock
only can guarantee that the memcg css returned by css_from_id()
cannot be released, but the reference of the memcg can be zero.

  rcu_read_lock()
  memcg_css = css_from_id()
  wb_get_create(memcg_css)
      cgwb_create(memcg_css)
          // css_get can change the ref counter from 0 back to 1
          css_get(memcg_css)
  rcu_read_unlock()

Fix it by holding a reference to the css before calling
wb_get_create(). This is not a problem I encountered in the
real world. Just the result of a code review.

Fixes: 682aa8e1a6a1 ("writeback: implement unlocked_inode_to_wb transaction and use it for stat updates")
Link: https://lore.kernel.org/r/20210402091145.80635-1-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
2021-06-28 18:35:02 +02:00
Andreas Gruenbacher
5d49d3508b gfs2: Fix error handling in init_statfs
On an error path, init_statfs calls iput(pn) after pn has already been put.
Fix that by setting pn to NULL after the initial iput.

Fixes: 97fd734ba17e ("gfs2: lookup local statfs inodes prior to journal recovery")
Cc: stable@vger.kernel.org # v5.10+
Reported-by: Jing Xiangfeng <jingxiangfeng@huawei.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-06-28 14:30:00 +02:00
Andreas Gruenbacher
d3c51c55cb gfs2: Fix underflow in gfs2_page_mkwrite
On filesystems with a block size smaller than PAGE_SIZE and non-empty
files smaller then PAGE_SIZE, gfs2_page_mkwrite could end up allocating
excess blocks beyond the end of the file, similar to fallocate.  This
doesn't make sense; fix it.

Reported-by: Bob Peterson <rpeterso@redhat.com>
Fixes: 184b4e60853d ("gfs2: Fix end-of-file handling in gfs2_page_mkwrite")
Cc: stable@vger.kernel.org # v5.5+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-06-28 14:13:38 +02:00
Baokun Li
38a618dbf4 gfs2: Use list_move_tail instead of list_del/list_add_tail
Using list_move_tail() instead of list_del() + list_add_tail().

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-06-28 14:13:38 +02:00
Andreas Gruenbacher
0f1616f6df gfs2: Fix do_gfs2_set_flags description
Commit 88b631cbfbeb ("gfs2: convert to fileattr") changed the argument list
without updating the description.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-06-28 14:13:38 +02:00
Steve French
0fa757b5d3 smb3: prevent races updating CurrentMid
There was one place where we weren't locking CurrentMid, and although
likely to be safe since even without the lock since it is during
negotiate protocol, it is more consistent to lock it in this last remaining
place, and avoids confusing Coverity warning.

Addresses-Coverity: 1486665 ("Data race condition")
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-06-25 14:02:26 -05:00
Linus Torvalds
7ce32ac6fb Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "24 patches, based on 4a09d388f2ab382f217a764e6a152b3f614246f6.

  Subsystems affected by this patch series: mm (thp, vmalloc, hugetlb,
  memory-failure, and pagealloc), nilfs2, kthread, MAINTAINERS, and
  mailmap"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (24 commits)
  mailmap: add Marek's other e-mail address and identity without diacritics
  MAINTAINERS: fix Marek's identity again
  mm/page_alloc: do bulk array bounds check after checking populated elements
  mm/page_alloc: __alloc_pages_bulk(): do bounds check before accessing array
  mm/hwpoison: do not lock page again when me_huge_page() successfully recovers
  mm,hwpoison: return -EHWPOISON to denote that the page has already been poisoned
  mm/memory-failure: use a mutex to avoid memory_failure() races
  mm, futex: fix shared futex pgoff on shmem huge page
  kthread: prevent deadlock when kthread_mod_delayed_work() races with kthread_cancel_delayed_work_sync()
  kthread_worker: split code for canceling the delayed work timer
  mm/vmalloc: unbreak kasan vmalloc support
  KVM: s390: prepare for hugepage vmalloc
  mm/vmalloc: add vmalloc_no_huge
  nilfs2: fix memory leak in nilfs_sysfs_delete_device_group
  mm/thp: another PVMW_SYNC fix in page_vma_mapped_walk()
  mm/thp: fix page_vma_mapped_walk() if THP mapped by ptes
  mm: page_vma_mapped_walk(): get vma_address_end() earlier
  mm: page_vma_mapped_walk(): use goto instead of while (1)
  mm: page_vma_mapped_walk(): add a level of indentation
  mm: page_vma_mapped_walk(): crossing page table boundary
  ...
2021-06-25 11:05:03 -07:00
Linus Torvalds
edf54d9d0a Two -rc1 regression fixes: one in the auth code affecting old clusters
and one in the filesystem for proper propagation of MDS request errors.
 Also included a locking fix for async creates, marked for stable.
 -----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmDV440THGlkcnlvbW92
 QGdtYWlsLmNvbQAKCRBKf944AhHzi3CXB/0aA0Ka+weQtIxxX3zl1thsE1APxoKe
 va77EfTJZbN12UHKAJ6sJUpXCLFc5hVJETw7w3qyz22VvJIPUQWd+h4w4eTXJ4QK
 Fab6+HT0/p0NxZ29rxa1bkHnrRAD30cpNd6WXcAeMJ3ZKvZfPtPnIWXSmCbJYGLV
 xhwx8y6kzjE60B60bjcQzuSpsMQkq0OpdXYdyxq3RysCjTCyDfpGuFnDHSv3aklm
 d6tyv2nUDM/oF/CEFZrTeaLrIZsYxxkpJHKkm7Xy70bUv8IMW97CKJSFjKYucyYd
 iV7VbtIKPq3sbGrmkaWm4nET5Z0C+m+JD2AhR17ylbdQy91hKaGrbnpw
 =RTBT
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-5.13-rc8' of https://github.com/ceph/ceph-client

Pull ceph fixes from Ilya Dryomov:
 "Two regression fixes from the merge window: one in the auth code
  affecting old clusters and one in the filesystem for proper
  propagation of MDS request errors.

  Also included a locking fix for async creates, marked for stable"

* tag 'ceph-for-5.13-rc8' of https://github.com/ceph/ceph-client:
  libceph: set global_id as soon as we get an auth ticket
  libceph: don't pass result into ac->ops->handle_reply()
  ceph: fix error handling in ceph_atomic_open and ceph_lookup
  ceph: must hold snap_rwsem when filling inode for async create
2021-06-25 09:50:30 -07:00
Linus Torvalds
9e736cf7d6 netfslib fixes
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAmDQ9Z4ACgkQ+7dXa6fL
 C2sVfg/9H3OlL4Vwv5CERgb5kbDoALAh5epYMFyvNVy9l5iTvYekxG3qna6ee0+G
 pTRHSU5tZUCUslpafv8LUz1RE/iM127y+BP+qq2joG9jkT4q3QfeV4oDr+dgZWCL
 obp+rjQSTKkaGh3eqjDx7gCSp5mqQI/M8MXe1VOXxUzAnsf2nH4iJLv/A9NN17xW
 l7sQfZJGHD1BPqsxxFiSr+UCkCLuLDybUDYL6+PZcFu0jSON0h4yEtIwsAA7a1Zw
 TvgUpequrbyTORI3sHJ8eIixWosh4yLTJ4pDqs8qDqq3Wm5eTZjss0wxEaVfpaTu
 cg/CcNUBiHJ/6Q8r+JiuPbEnfnQ9woL8951/CNi+cOGhTy9LNoG70orSZKKynmW5
 QmpYBK5BkM57EPj7DYZJRI1Bwy41pJapFj8tXbMjObU+ZyaXMysmBDanIRK/cLoy
 fr4Sz+1D8yJPQ0GDgC4051CxrhynOEnRo8JbESg8CD4PnqFeM7EoCh48H2oVvR4N
 9v2xuaRyBvi2KTmKSktRe+s1DS80acVMUYP33nT2zAthvL91VVgMY3Hz2/QrlAp8
 h1hREME8aRcN4LrBIgp/ET150hUh44U2A07PqnYYe0653MH7aHHFfk134ZuTmbub
 Pfc1MHtWgPAWN4c140ILBRTidJeShszoJGgpD6tflkj10VI2s34=
 =2c6l
 -----END PGP SIGNATURE-----

Merge tag 'netfs-fixes-20210621' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

Pull netfs fixes from David Howells:
 "This contains patches to fix netfs_write_begin() and afs_write_end()
  in the following ways:

  (1) In netfs_write_begin(), extract the decision about whether to skip
      a page out to its own helper and have that clear around the region
      to be written, but not clear that region. This requires the
      filesystem to patch it up afterwards if the hole doesn't get
      completely filled.

  (2) Use offset_in_thp() in (1) rather than manually calculating the
      offset into the page.

  (3) Due to (1), afs_write_end() now needs to handle short data write
      into the page by generic_perform_write(). I've adopted an
      analogous approach to ceph of just returning 0 in this case and
      letting the caller go round again.

  It also adds a note that (in the future) the len parameter may extend
  beyond the page allocated. This is because the page allocation is
  deferred to write_begin() and that gets to decide what size of THP to
  allocate."

Jeff Layton points out:
 "The netfs fix in particular fixes a data corruption bug in cephfs"

* tag 'netfs-fixes-20210621' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  netfs: fix test for whether we can skip read when writing beyond EOF
  afs: Fix afs_write_end() to handle short writes
2021-06-25 09:41:29 -07:00
Pavel Skripkin
8fd0c1b064 nilfs2: fix memory leak in nilfs_sysfs_delete_device_group
My local syzbot instance hit memory leak in nilfs2.  The problem was in
missing kobject_put() in nilfs_sysfs_delete_device_group().

kobject_del() does not call kobject_cleanup() for passed kobject and it
leads to leaking duped kobject name if kobject_put() was not called.

Fail log:

  BUG: memory leak
  unreferenced object 0xffff8880596171e0 (size 8):
  comm "syz-executor379", pid 8381, jiffies 4294980258 (age 21.100s)
  hex dump (first 8 bytes):
    6c 6f 6f 70 30 00 00 00                          loop0...
  backtrace:
     kstrdup+0x36/0x70 mm/util.c:60
     kstrdup_const+0x53/0x80 mm/util.c:83
     kvasprintf_const+0x108/0x190 lib/kasprintf.c:48
     kobject_set_name_vargs+0x56/0x150 lib/kobject.c:289
     kobject_add_varg lib/kobject.c:384 [inline]
     kobject_init_and_add+0xc9/0x160 lib/kobject.c:473
     nilfs_sysfs_create_device_group+0x150/0x800 fs/nilfs2/sysfs.c:999
     init_nilfs+0xe26/0x12b0 fs/nilfs2/the_nilfs.c:637

Link: https://lkml.kernel.org/r/20210612140559.20022-1-paskripkin@gmail.com
Fixes: da7141fb78db ("nilfs2: add /sys/fs/nilfs2/<device> group")
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Michael L. Semon <mlsemon35@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-24 19:40:53 -07:00
Steve French
0060a4f28a cifs: fix missing spinlock around update to ses->status
In the other places where we update ses->status we protect the
updates via GlobalMid_Lock. So to be consistent add the same
locking around it in cifs_put_smb_ses where it was missing.

Addresses-Coverity: 1268904 ("Data race condition")
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-06-24 16:09:10 -05:00
Christoph Hellwig
0384264ea8 block: pass a gendisk to bdev_disk_changed
bdev_disk_changed can only operate on whole devices.  Make that clear
by passing a gendisk instead of the struct block_device.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210624123240.441814-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-24 12:01:06 -06:00
Christoph Hellwig
630161cfdf block: move bdev_disk_changed
Move bdev_disk_changed to block/partitions/core.c, together with the
rest of the partition scanning code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210624123240.441814-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-24 12:01:06 -06:00
Zhang Yi
acc6100d3f fs: remove bdev_try_to_free_page callback
After remove the unique user of sop->bdev_try_to_free_page() callback,
we could remove the callback and the corresponding blkdev_releasepage()
at all.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210610112440.3438139-9-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-24 10:55:42 -04:00
Zhang Yi
3b672e3aed ext4: remove bdev_try_to_free_page() callback
After we introduce a jbd2 shrinker to release checkpointed buffer's
journal head, we could free buffer without bdev_try_to_free_page()
under memory pressure. So this patch remove the whole
bdev_try_to_free_page() callback directly. It also remove many
use-after-free issues relate to it together.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210610112440.3438139-8-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-24 10:55:41 -04:00
Zhang Yi
dbf2bab793 jbd2: simplify journal_clean_one_cp_list()
Now that __try_to_free_cp_buf() remove checkpointed buffer or transaction
when the buffer is not 'busy', which is only called by
journal_clean_one_cp_list(). This patch simplify this function by remove
__try_to_free_cp_buf() and invoke __cp_buffer_busy() directly.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210610112440.3438139-7-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-24 10:55:39 -04:00
Zhang Yi
4ba3fcdde7 jbd2,ext4: add a shrinker to release checkpointed buffers
Current metadata buffer release logic in bdev_try_to_free_page() have
a lot of use-after-free issues when umount filesystem concurrently, and
it is difficult to fix directly because ext4 is the only user of
s_op->bdev_try_to_free_page callback and we may have to add more special
refcount or lock that is only used by ext4 into the common vfs layer,
which is unacceptable.

One better solution is remove the bdev_try_to_free_page callback, but
the real problem is we cannot easily release journal_head on the
checkpointed buffer, so try_to_free_buffers() cannot release buffers and
page under memory pressure, which is more likely to trigger
out-of-memory. So we cannot remove the callback directly before we find
another way to release journal_head.

This patch introduce a shrinker to free journal_head on the checkpointed
transaction. After the journal_head got freed, try_to_free_buffers()
could free buffer properly.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Suggested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210610112440.3438139-6-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-24 10:54:49 -04:00
Zhang Yi
214eb5a4d8 jbd2: remove redundant buffer io error checks
Now that __jbd2_journal_remove_checkpoint() can detect buffer io error
and mark journal checkpoint error, then we abort the journal later
before updating log tail to ensure the filesystem works consistently.
So we could remove other redundant buffer io error checkes.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210610112440.3438139-5-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-24 10:33:50 -04:00
Zhang Yi
235d68069c jbd2: don't abort the journal when freeing buffers
Now that we can be sure the journal is aborted once a buffer has failed
to be written back to disk, we can remove the journal abort logic in
jbd2_journal_try_to_free_buffers() which was introduced in
commit c044f3d8360d ("jbd2: abort journal if free a async write error
metadata buffer"), because it may cost and propably is not safe.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210610112440.3438139-4-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-24 10:33:50 -04:00
Zhang Yi
fcf37549ae jbd2: ensure abort the journal if detect IO error when writing original buffer back
Although we merged c044f3d8360 ("jbd2: abort journal if free a async
write error metadata buffer"), there is a race between
jbd2_journal_try_to_free_buffers() and jbd2_journal_destroy(), so the
jbd2_log_do_checkpoint() may still fail to detect the buffer write
io error flag which may lead to filesystem inconsistency.

jbd2_journal_try_to_free_buffers()     ext4_put_super()
                                        jbd2_journal_destroy()
  __jbd2_journal_remove_checkpoint()
  detect buffer write error              jbd2_log_do_checkpoint()
                                         jbd2_cleanup_journal_tail()
                                           <--- lead to inconsistency
  jbd2_journal_abort()

Fix this issue by introducing a new atomic flag which only have one
JBD2_CHECKPOINT_IO_ERROR bit now, and set it in
__jbd2_journal_remove_checkpoint() when freeing a checkpoint buffer
which has write_io_error flag. Then jbd2_journal_destroy() will detect
this mark and abort the journal to prevent updating log tail.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210610112440.3438139-3-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-24 10:33:49 -04:00
Zhang Yi
1866cba842 jbd2: remove the out label in __jbd2_journal_remove_checkpoint()
The 'out' lable just return the 'ret' value and seems not required, so
remove this label and switch to return appropriate value immediately.
This patch also do some minor cleanup, no logical change.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210610112440.3438139-2-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-24 10:33:06 -04:00
yangerkun
0caaefbaf2 ext4: no need to verify new add extent block
ext4_ext_grow_indepth will add a new extent block which has init the
expected content. We can mark this buffer as verified so to stop a
useless check in __read_extent_tree_block.

Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210609075545.1442160-1-yangerkun@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-24 10:29:13 -04:00
yangerkun
d07621d9b9 jbd2: clean up misleading comments for jbd2_fc_release_bufs
This comments was for jbd2_fc_wait_bufs, not for jbd2_fc_release_bufs.
Remove this misleading comments.

Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210608141236.459441-1-yangerkun@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-24 10:27:32 -04:00
Josh Triplett
b1489186cc ext4: add check to prevent attempting to resize an fs with sparse_super2
The in-kernel ext4 resize code doesn't support filesystem with the
sparse_super2 feature. It fails with errors like this and doesn't finish
the resize:
EXT4-fs (loop0): resizing filesystem from 16640 to 7864320 blocks
EXT4-fs warning (device loop0): verify_reserved_gdb:760: reserved GDT 2 missing grp 1 (32770)
EXT4-fs warning (device loop0): ext4_resize_fs:2111: error (-22) occurred during file system resize
EXT4-fs (loop0): resized filesystem to 2097152

To reproduce:
mkfs.ext4 -b 4096 -I 256 -J size=32 -E resize=$((256*1024*1024)) -O sparse_super2 ext4.img 65M
truncate -s 30G ext4.img
mount ext4.img /mnt
python3 -c 'import fcntl, os, struct ; fd = os.open("/mnt", os.O_RDONLY | os.O_DIRECTORY) ; fcntl.ioctl(fd, 0x40086610, struct.pack("Q", 30 * 1024 * 1024 * 1024 // 4096), False) ; os.close(fd)'
dmesg | tail
e2fsck ext4.img

The userspace resize2fs tool has a check for this case: it checks if the
filesystem has sparse_super2 set and if the kernel provides
/sys/fs/ext4/features/sparse_super2. However, the former check requires
manually reading and parsing the filesystem superblock.

Detect this case in ext4_resize_begin and error out early with a clear
error message.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Link: https://lore.kernel.org/r/74b8ae78405270211943cd7393e65586c5faeed1.1623093259.git.josh@joshtriplett.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-24 10:22:36 -04:00
Josh Triplett
e9f9f61d0c ext4: consolidate checks for resize of bigalloc into ext4_resize_begin
Two different places checked for attempts to resize a filesystem with
the bigalloc feature. Move the check into ext4_resize_begin, which both
places already call.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Link: https://lore.kernel.org/r/bee03303d999225ecb3bfa5be8576b2f4c6edbe6.1623093259.git.josh@joshtriplett.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-24 10:22:36 -04:00
Ritesh Harjani
310c097c2b ext4: remove duplicate definition of ext4_xattr_ibody_inline_set()
ext4_xattr_ibody_inline_set() & ext4_xattr_ibody_set() have the exact
same definition.  Hence remove ext4_xattr_ibody_inline_set() and all
its call references. Convert the callers of it to call
ext4_xattr_ibody_set() instead.

[ Modified to preserve ext4_xattr_ibody_set() and remove
  ext4_xattr_ibody_inline_set() instead. -- TYT ]

Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/fd566b799bbbbe9b668eb5eecde5b5e319e3694f.1622685482.git.riteshh@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-24 10:09:39 -04:00