Commit Graph

1089419 Commits

Author SHA1 Message Date
Christoph Hellwig
40349d0e16 drbd: remove assign_p_sizes_qlim
Fold each branch into its only caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Link: https://lore.kernel.org/r/20220415045258.199825-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17 19:49:58 -06:00
Christoph Hellwig
968786b9ef target: fix discard alignment on partitions
Use the proper bdev_discard_alignment helper that accounts for partition
offsets.

Fixes: c66ac9db8d ("[SCSI] target: Add LIO target core v4.0.0-rc6")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220415045258.199825-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17 19:49:58 -06:00
Christoph Hellwig
817e8b51eb target: pass a block_device to target_configure_unmap_from_queue
The SCSI target drivers is a consumer of the block layer and shoul
d generally work on struct block_device.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220415045258.199825-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17 19:49:58 -06:00
Christoph Hellwig
179d8609d8 target: remove an incorrect unmap zeroes data deduction
For block devices, the SCSI target drivers implements UNMAP as calls to
blkdev_issue_discard, which does not guarantee zeroing just because
Write Zeroes is supported.

Note that this does not affect the file backed path which uses
fallocate to punch holes.

Fixes: 2237498f0b ("target/iblock: Convert WRITE_SAME to blkdev_issue_zeroout")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220415045258.199825-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17 19:49:58 -06:00
Jan Kara
075a53b78b bfq: Make sure bfqg for which we are queueing requests is online
Bios queued into BFQ IO scheduler can be associated with a cgroup that
was already offlined. This may then cause insertion of this bfq_group
into a service tree. But this bfq_group will get freed as soon as last
bio associated with it is completed leading to use after free issues for
service tree users. Fix the problem by making sure we always operate on
online bfq_group. If the bfq_group associated with the bio is not
online, we pick the first online parent.

CC: stable@vger.kernel.org
Fixes: e21b7a0b98 ("block, bfq: add full hierarchical scheduling and cgroups support")
Tested-by: "yukuai (C)" <yukuai3@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220401102752.8599-9-jack@suse.cz
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17 19:34:32 -06:00
Jan Kara
4e54a2493e bfq: Get rid of __bio_blkcg() usage
BFQ usage of __bio_blkcg() is a relict from the past. Furthermore if bio
would not be associated with any blkcg, the usage of __bio_blkcg() in
BFQ is prone to races with the task being migrated between cgroups as
__bio_blkcg() calls at different places could return different blkcgs.

Convert BFQ to the new situation where bio->bi_blkg is initialized in
bio_set_dev() and thus practically always valid. This allows us to save
blkcg_gq lookup and noticeably simplify the code.

CC: stable@vger.kernel.org
Fixes: 0fe061b9f0 ("blkcg: fix ref count issue with bio_blkcg() using task_css")
Tested-by: "yukuai (C)" <yukuai3@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220401102752.8599-8-jack@suse.cz
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17 19:34:29 -06:00
Jan Kara
09f8718680 bfq: Track whether bfq_group is still online
Track whether bfq_group is still online. We cannot rely on
blkcg_gq->online because that gets cleared only after all policies are
offlined and we need something that gets updated already under
bfqd->lock when we are cleaning up our bfq_group to be able to guarantee
that when we see online bfq_group, it will stay online while we are
holding bfqd->lock lock.

CC: stable@vger.kernel.org
Tested-by: "yukuai (C)" <yukuai3@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220401102752.8599-7-jack@suse.cz
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17 19:34:27 -06:00
Jan Kara
5f550ede5e bfq: Remove pointless bfq_init_rq() calls
We call bfq_init_rq() from request merging functions where requests we
get should have already gone through bfq_init_rq() during insert and
anyway we want to do anything only if the request is already tracked by
BFQ. So replace calls to bfq_init_rq() with RQ_BFQQ() instead to simply
skip requests untracked by BFQ. We move bfq_init_rq() call in
bfq_insert_request() a bit earlier to cover request merging and thus
can transfer FIFO position in case of a merge.

CC: stable@vger.kernel.org
Tested-by: "yukuai (C)" <yukuai3@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220401102752.8599-6-jack@suse.cz
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17 19:34:25 -06:00
Jan Kara
fc84e1f941 bfq: Drop pointless unlock-lock pair
In bfq_insert_request() we unlock bfqd->lock only to call
trace_block_rq_insert() and then lock bfqd->lock again. This is really
pointless since tracing is disabled if we really care about performance
and even if the tracepoint is enabled, it is a quick call.

CC: stable@vger.kernel.org
Tested-by: "yukuai (C)" <yukuai3@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220401102752.8599-5-jack@suse.cz
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17 19:34:22 -06:00
Jan Kara
ea591cd4eb bfq: Update cgroup information before merging bio
When the process is migrated to a different cgroup (or in case of
writeback just starts submitting bios associated with a different
cgroup) bfq_merge_bio() can operate with stale cgroup information in
bic. Thus the bio can be merged to a request from a different cgroup or
it can result in merging of bfqqs for different cgroups or bfqqs of
already dead cgroups and causing possible use-after-free issues. Fix the
problem by updating cgroup information in bfq_merge_bio().

CC: stable@vger.kernel.org
Fixes: e21b7a0b98 ("block, bfq: add full hierarchical scheduling and cgroups support")
Tested-by: "yukuai (C)" <yukuai3@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220401102752.8599-4-jack@suse.cz
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17 19:34:20 -06:00
Jan Kara
3bc5e683c6 bfq: Split shared queues on move between cgroups
When bfqq is shared by multiple processes it can happen that one of the
processes gets moved to a different cgroup (or just starts submitting IO
for different cgroup). In case that happens we need to split the merged
bfqq as otherwise we will have IO for multiple cgroups in one bfqq and
we will just account IO time to wrong entities etc.

Similarly if the bfqq is scheduled to merge with another bfqq but the
merge didn't happen yet, cancel the merge as it need not be valid
anymore.

CC: stable@vger.kernel.org
Fixes: e21b7a0b98 ("block, bfq: add full hierarchical scheduling and cgroups support")
Tested-by: "yukuai (C)" <yukuai3@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220401102752.8599-3-jack@suse.cz
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17 19:34:16 -06:00
Jan Kara
c1cee4ab36 bfq: Avoid merging queues with different parents
It can happen that the parent of a bfqq changes between the moment we
decide two queues are worth to merge (and set bic->stable_merge_bfqq)
and the moment bfq_setup_merge() is called. This can happen e.g. because
the process submitted IO for a different cgroup and thus bfqq got
reparented. It can even happen that the bfqq we are merging with has
parent cgroup that is already offline and going to be destroyed in which
case the merge can lead to use-after-free issues such as:

BUG: KASAN: use-after-free in __bfq_deactivate_entity+0x9cb/0xa50
Read of size 8 at addr ffff88800693c0c0 by task runc:[2:INIT]/10544

CPU: 0 PID: 10544 Comm: runc:[2:INIT] Tainted: G            E     5.15.2-0.g5fb85fd-default #1 openSUSE Tumbleweed (unreleased) f1f3b891c72369aebecd2e43e4641a6358867c70
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a-rebuilt.opensuse.org 04/01/2014
Call Trace:
 <IRQ>
 dump_stack_lvl+0x46/0x5a
 print_address_description.constprop.0+0x1f/0x140
 ? __bfq_deactivate_entity+0x9cb/0xa50
 kasan_report.cold+0x7f/0x11b
 ? __bfq_deactivate_entity+0x9cb/0xa50
 __bfq_deactivate_entity+0x9cb/0xa50
 ? update_curr+0x32f/0x5d0
 bfq_deactivate_entity+0xa0/0x1d0
 bfq_del_bfqq_busy+0x28a/0x420
 ? resched_curr+0x116/0x1d0
 ? bfq_requeue_bfqq+0x70/0x70
 ? check_preempt_wakeup+0x52b/0xbc0
 __bfq_bfqq_expire+0x1a2/0x270
 bfq_bfqq_expire+0xd16/0x2160
 ? try_to_wake_up+0x4ee/0x1260
 ? bfq_end_wr_async_queues+0xe0/0xe0
 ? _raw_write_unlock_bh+0x60/0x60
 ? _raw_spin_lock_irq+0x81/0xe0
 bfq_idle_slice_timer+0x109/0x280
 ? bfq_dispatch_request+0x4870/0x4870
 __hrtimer_run_queues+0x37d/0x700
 ? enqueue_hrtimer+0x1b0/0x1b0
 ? kvm_clock_get_cycles+0xd/0x10
 ? ktime_get_update_offsets_now+0x6f/0x280
 hrtimer_interrupt+0x2c8/0x740

Fix the problem by checking that the parent of the two bfqqs we are
merging in bfq_setup_merge() is the same.

Link: https://lore.kernel.org/linux-block/20211125172809.GC19572@quack2.suse.cz/
CC: stable@vger.kernel.org
Fixes: 430a67f9d6 ("block, bfq: merge bursts of newly-created queues")
Tested-by: "yukuai (C)" <yukuai3@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220401102752.8599-2-jack@suse.cz
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17 19:34:11 -06:00
Jan Kara
70456e5210 bfq: Avoid false marking of bic as stably merged
bfq_setup_cooperator() can mark bic as stably merged even though it
decides to not merge its bfqqs (when bfq_setup_merge() returns NULL).
Make sure to mark bic as stably merged only if we are really going to
merge bfqqs.

CC: stable@vger.kernel.org
Tested-by: "yukuai (C)" <yukuai3@huawei.com>
Fixes: 430a67f9d6 ("block, bfq: merge bursts of newly-created queues")
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220401102752.8599-1-jack@suse.cz
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17 19:34:02 -06:00
Christoph Hellwig
852ad96cb0 pktcdvd: stop using bio_reset
Just initialize the bios on-demand.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220406061228.410163-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17 19:30:47 -06:00
Christoph Hellwig
066ff57101 block: turn bio_kmalloc into a simple kmalloc wrapper
Remove the magic autofree semantics and require the callers to explicitly
call bio_init to initialize the bio.

This allows bio_free to catch accidental bio_put calls on bio_init()ed
bios as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Coly Li <colyli@suse.de>
Acked-by: Mike Snitzer <snitzer@kernel.org>
Link: https://lore.kernel.org/r/20220406061228.410163-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17 19:30:41 -06:00
Christoph Hellwig
7655db8093 target/pscsi: remove pscsi_get_bio
Remove pscsi_get_bio and simplify the code flow in the only caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220406061228.410163-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17 19:29:41 -06:00
Christoph Hellwig
46a2d4ccc4 squashfs: always use bio_kmalloc in squashfs_bio_read
If a plain kmalloc that is not backed by a mempool is safe here for a
large read (and the actual page allocations), it must also be for a
small one, so simplify the code a bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Phillip Lougher <phillip@squashfs.org.uk>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220406061228.410163-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17 19:29:41 -06:00
Christoph Hellwig
f9e69aa9cc btrfs: simplify ->flush_bio handling
Use and embedded bios that is initialized when used instead of
bio_kmalloc plus bio_reset.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220406061228.410163-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17 19:29:41 -06:00
Mike Snitzer
b53f3dcd70 block: allow use of per-cpu bio alloc cache by block drivers
Refine per-cpu bio alloc cache interfaces so that DM and other block
drivers can properly create and use the cache:

DM uses bioset_init_from_src() to do its final bioset initialization,
so must update bioset_init_from_src() to set BIOSET_PERCPU_CACHE if
%src bioset has a cache.

Also move bio_clear_polled() to include/linux/bio.h to allow users
outside of block core.

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220324203526.62306-3-snitzer@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17 16:49:40 -06:00
Mike Snitzer
0df71650c0 block: allow using the per-cpu bio cache from bio_alloc_bioset
Replace the BIO_PERCPU_CACHE bio-internal flag with a REQ_ALLOC_CACHE
one that can be passed to bio_alloc / bio_alloc_bioset, and implement
the percpu cache allocation logic in a helper called from
bio_alloc_bioset.  This allows any bio_alloc_bioset user to use the
percpu caches instead of having the functionality tied to struct kiocb.

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
[hch: refactored a bit]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220324203526.62306-2-snitzer@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17 16:49:40 -06:00
Linus Torvalds
b2d229d4dd Linux 5.18-rc3 2022-04-17 13:57:31 -07:00
Linus Torvalds
a1901b464e xen: branch for v5.18-rc3
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCYlvMdwAKCRCAXGG7T9hj
 vueqAQCeJiUt0Adhs7ACqzvTsBc1TGXD44J6AAfwedMgtdgtvAD+LvWXLhTcgiCb
 DT03AIpI1Z/40QgPYuJ3o4yAZN7eUg4=
 =dV8F
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-5.18-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixlet from Juergen Gross:
 "A single cleanup patch for the Xen balloon driver"

* tag 'for-linus-5.18-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/balloon: don't use PV mode extra memory for zone device allocations
2022-04-17 10:29:10 -07:00
Linus Torvalds
3a69a44278 Two x86 fixes related to TSX:
- Use either MSR_TSX_FORCE_ABORT or MSR_IA32_TSX_CTRL to disable TSX to
     cover all CPUs which allow to disable it.
 
   - Disable TSX development mode at boot so that a microcode update which
     provides TSX development mode does not suddenly make the system
     vulnerable to TSX Asynchronous Abort.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmJb5LYTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoVVbD/9cxZWkFctCiymedUZqLabkfpYSki65
 MngdpCPzCNaaIdlp44lwCido5+gJsY9unXdm3OAUzLjv6SsxxpDr5njz1/C6TM1l
 XmWjlkLEbG2QDPd1Ybd/lpYQORBmiukyo8v8x0yFT7ZzwvSddoDZAbeUtkQBrIin
 sDTeExsewKzL2X5qXhttrHLHu1PYgurn4ThIrrG+eg2e4FNk6UUFUS3TOyMvzJDg
 NWJ7N5pGy9YkR7CISq1q+qdnH55pGaUrgonDi2qBTt3EaH0fQtZP2ZtIOYr3O4nI
 YCx6isrIiGUB6kSygofxmk4B+22CaUJXd2OcUxMZ/Th/a2aCK+35BtGVPXQGi6nU
 d7m+ZWB7dShOiejFygS59ty+5L5kliKXYZfUASsq1CLoXH8K1xUwBMkbY5FQ2WH1
 Ue4KUvjguNqsgSRAfeHdOi6B36oot0Xf9JO013Wm3V/r9hsGPtSOjWwFuVvT/euw
 a9iFtruATxDssBxH/l0djCKnwwm5yuOt1OpyizcIMFnlCgRD06h/6zgAvsJK7c8d
 dh6lC4D2mXP1e2wtEyZelve1tmRJ/FeReyG2V5FNU7m1mWYGm1rJZ4AEvnbrzcbC
 ePwFva0lPu8GVKG6HRgHfR8PjuQ7TFmKPKytT7fboIqQpTIY+1Q75wYD4eXkSu8Q
 /ltzXQz/8lz7bA==
 =UQaW
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2022-04-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
 "Two x86 fixes related to TSX:

   - Use either MSR_TSX_FORCE_ABORT or MSR_IA32_TSX_CTRL to disable TSX
     to cover all CPUs which allow to disable it.

   - Disable TSX development mode at boot so that a microcode update
     which provides TSX development mode does not suddenly make the
     system vulnerable to TSX Asynchronous Abort"

* tag 'x86-urgent-2022-04-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/tsx: Disable TSX development mode at boot
  x86/tsx: Use MSR_TSX_CTRL to clear CPUID bits
2022-04-17 09:55:59 -07:00
Linus Torvalds
fbb9c58e56 A small set of fixes for the timers core:
- Fix the warning condition in __run_timers() which does not take into
     account, that a CPU base (especially the deferrable base) has never a
     timer armed on it and therefore the next_expiry value can become stale.
 
   - Replace a WARN_ON() in the NOHZ code with a WARN_ON_ONCE() to prevent
     endless spam in dmesg.
 
   - Remove the double star from a comment which is not meant to be in
     kernel-doc format.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmJb44gTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoXptD/4tKI9W4q0CAZnXw/G9cJdUK2u2VQGb
 QfB0DMABuDLMlHUSO92T49IcfjQUR7hz36kHXY0OxwALaEB8ftap6yZLfI1864mZ
 RFatWcTl71ZYY0SSio0rMnbDawTeN9N69WLPR+A6fDH0ahBqTwpeJoa1zj5w99LF
 kFoEtI5vEBtf+m94PljBTclxOKTXmRRCzEZRpJ6L4Ze9SHR7aMKtBgp84M7a3IAw
 IzXq0Kw0MRIm4P26uOpsVzi2KOq6+F1O1zy4L8cHVeU+Y6HC1qWLPXsyhipb0liE
 6STFs1L7ANX+dpMW8kmorXc3RcN4xCuCY4Sb17W6Hqiq3syQ+Ho2G0CTheL0yOpM
 GVqf7SVes3EGAJwWyKaFmhpgWGKcS93p18XGUrf5V+rVIU+ggQcc2vM9E8FiM03d
 TA6T9+1u0TvtPlmd/KQ3OPLMEtu6/3R0fOLEXnv+Cb/PYIYMvPpb9cu6LjJkNqGv
 6R+3dWORI7mYLZD+vS3Y5mrmB5OECGXTPNJ+aBr46PfdvSVLxvmOhFlNR7eQKHZ8
 wFwOC6n4wkD5Y1VOVGEn0GVGWVkZXdlQzZOQGqBhJPjsfPfOaGacyLqCcsZH4V6w
 kitKI8z7A7evLozZMQqKZskwLG1BTQ13YeElb5jJMpG3aTG7lSmPk+iGgLFMDWkc
 Z2k4YD0Uw+92Tw==
 =SffK
 -----END PGP SIGNATURE-----

Merge tag 'timers-urgent-2022-04-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fixes from Thomas Gleixner:
 "A small set of fixes for the timers core:

   - Fix the warning condition in __run_timers() which does not take
     into account that a CPU base (especially the deferrable base) never
     has a timer armed on it and therefore the next_expiry value can
     become stale.

   - Replace a WARN_ON() in the NOHZ code with a WARN_ON_ONCE() to
     prevent endless spam in dmesg.

   - Remove the double star from a comment which is not meant to be in
     kernel-doc format"

* tag 'timers-urgent-2022-04-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tick/sched: Fix non-kernel-doc comment
  tick/nohz: Use WARN_ON_ONCE() to prevent console saturation
  timers: Fix warning condition in __run_timers()
2022-04-17 09:53:01 -07:00
Linus Torvalds
0e59732ed6 Two fixes for the SMP core:
- Make the warning condition in flush_smp_call_function_queue() correct,
    which checks a just emptied list head for being empty instead of
    validating that there was no pending entry on the offlined CPU at all.
 
  - The @cpu member of struct cpuhp_cpu_state is initialized when the CPU
    hotplug thread for the upcoming CPU is created. That's too late because
    the creation of the thread can fail and then the following rollback
    operates on CPU0. Get rid of the CPU member and hand the CPU number to
    the involved functions directly.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmJb4skTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoUYaD/9SWzwEqwKICAhWWSMtRg6cvtVTAKGL
 xWXNb5ZxLA8MdG9CYu7WaUF7mU3E8o7kF/dItvkdj0/9mg/IqcVnP1CEOfp0ioHc
 1cbAZShORfFlOowO1WR9Xe3MSZB2mXPHsjh3FWPeGTDxnSByMDXROMFyotxji+7q
 g1ZsyVbq3+hSVTOhnhpxrh8MS3qcCAsgYtHKRnPjOE/tqbNXSmbhmeuN7QBKLVp6
 AD6DaWj8jGtZ8yzbpm0Ve3ZsjH+1SdZAzm4yhh3FHMKsSOwB8yuNwq1oNbbEjxWu
 mTg+LCBBMGYybSTa9sUpDG5Xfc3/dyWnzkGnmp7M8bfElrI3x4d+sMdVhfFRjEXB
 xlXmNqRoExgZwifyoEJbmSbG2SgLiWqyLqgpUvLS/yHud6IurFbu36TrByYRMMsG
 CyaiLMdWIlBibo+xGkVgnLyc2io98KeoLJoc35HM7VYyn9oNI4hUU9OJDIkG5D2i
 lyS+qMTFInFSOm7L5u/SUTYe4I0sXYJ6uEXxhR/Gi3ITXb7aLOJyxhNtq//u1dmg
 6vZHgs8HZWylG3LxVn1QERWotVlPuNPRilsUGCVMsCHjOFnoPy+eM3ANtR2x/uaE
 XpQ58jnhkhPXZfmnsjDEILkWB67J0/9FujbcPvixH8SRvUgBB7xRRbuo9zXPchhw
 VDKy1BCIcZ93gA==
 =XNLP
 -----END PGP SIGNATURE-----

Merge tag 'smp-urgent-2022-04-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull SMP fixes from Thomas Gleixner:
 "Two fixes for the SMP core:

   - Make the warning condition in flush_smp_call_function_queue()
     correct, which checked a just emptied list head for being empty
     instead of validating that there was no pending entry on the
     offlined CPU at all.

   - The @cpu member of struct cpuhp_cpu_state is initialized when the
     CPU hotplug thread for the upcoming CPU is created. That's too late
     because the creation of the thread can fail and then the following
     rollback operates on CPU0. Get rid of the CPU member and hand the
     CPU number to the involved functions directly"

* tag 'smp-urgent-2022-04-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  cpu/hotplug: Remove the 'cpu' member of cpuhp_cpu_state
  smp: Fix offline cpu check in flush_smp_call_function_queue()
2022-04-17 09:46:15 -07:00
Linus Torvalds
7e1777f5ec A single fix for the interrupt affinity spreading logic to take into
account that there can be an imbalance between present and possible CPUs,
 which causes already assigned bits to be overwritten.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmJb4cETHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoWrDD/wOd9X4zQz6cZci2Xl091IhMlS2/sBm
 afY98mTK56rX/BEgybMhzXZtBOSdDL2UDnBYznuYiPWof09+mqJGp4cLJAtVvK7E
 cDoLdpqmFuQzO8Ka1lUgdgwR+d32pFmaBASbl/0COdl8qgNskQp+jD6BaztNxYbZ
 HjkN2chMXqRlCOkKCthNajPEB3thmW2H0gDiZT/VEGYkQfp+HdoH7QhpIXt8HkWd
 lb1g1bpDBUwsGrVJ2PlI+XvwmYdW/WOyebIq2D1XBhSS1RXVTt5xfXh/ZfhULzgn
 2bnhvXU7XwDZRJNte1r8EFHSufXZUwhZIa9DRKHoy1fxNaxufRy0VJLEqcRHbcBO
 xGIvXFfjmn+vmO6hwC1/DeP7Dz6WiTYoxTwJVpvgURqcEdr1HvnSy1CdMEOsAMUa
 ZQDJzudV47a2qj80MRO1TSovW2gtDPdSWEUp5oOx2nYUZbqME146ubglZEcXLzq7
 Y7BeMaO1VzXkfyk6ImtrBYY+qrXaT1ZQvCOGPrOltnlU0cBLkQMSUpjTstIcaOAp
 iIvPLjVR1fDdgEBRck+ZAsM9snefz160wEbx23PzGuh2BenLpQfKZHIYhDO8wo2l
 4uUmnp7VY0FS5I38f+rWNvqhFfJV1oda0dQfprP+s3Usz5oSegl764as5hjGcu+y
 0Z/mCvaPODGctg==
 =wSdp
 -----END PGP SIGNATURE-----

Merge tag 'irq-urgent-2022-04-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fix from Thomas Gleixner:
 "A single fix for the interrupt affinity spreading logic to take into
  account that there can be an imbalance between present and possible
  CPUs, which causes already assigned bits to be overwritten"

* tag 'irq-urgent-2022-04-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq/affinity: Consider that CPUs on nodes can be unbalanced
2022-04-17 09:42:03 -07:00
Linus Torvalds
9a921a6ff7 Power Supply Fixes for 5.18 cycle
Regression fix for the 5.18 cycle:
 * Fix a regression with battery data failing to load from DT
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE72YNB0Y/i3JqeVQT2O7X88g7+poFAmJWoKMACgkQ2O7X88g7
 +ppmlBAAjfWLNNjsx6oMp71CCVm0KyubJZSYO9dGgCCutQhASUMAmfAnAg6V3rr1
 fABm0UYfT3uMwkWw3vth8o6GMAYAiu4nZ1s/dLzObDzZVz4HHl/rEBYE5iEUZ47B
 CXPYL/kRXoJZ0jAn7aNHvA6Bb/qEKRPCcrooae73927fT35Pk1dYqJ9rxpWRDR8b
 f46Am26xuSlC6mzaJUVQwVvg3pwjJT5lCdBIbQ6FSqvVLDpSg/vdJtXlJHtya1ec
 l6j6I9+JB4YtBLOtmaGxLLt9cpow6SRI8yTzpm5qF5wIIuFpzbx5c/sdTwVeMJ+Y
 OM5LHwOyusYzQN90nse6xMzUpTNZ7Rq1wYQXH4QtvD2Xm9LrMN4arAw7s25PScQE
 DBF8qVnxVA7EseHN8H4NNrgWGEcg+eByEa9m20uUN0H5mkKa5Qrn/Yk8/qOcZ/ge
 oCPkvOnX6wALbmtcoaSgS7GQLcIW2GwN6HA3scQNV/8oDica+gX0hMk2ZThwMBGi
 ygPjcc2qUnTD3QliXi8Rh85iTfnVHnntsMjkQRdCrxcCY1jV0zIAMp1f7I22ToQG
 qD/CUvsmZc0x/dx8O8U3LgZRW3lyXG/VCgM0Axbh41HKpEVi3ayMmTkiPR0+dMpe
 CRh75/1iEdwt0p0m/LIkCPC1VwK1loPmTlyS/ycAabP3W/SDHjI=
 =S4f6
 -----END PGP SIGNATURE-----

Merge tag 'for-v5.18-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply

Pull power supply fixes from Sebastian Reichel:

 - Fix a regression with battery data failing to load from DT

* tag 'for-v5.18-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply:
  power: supply: Reset err after not finding static battery
  power: supply: samsung-sdi-battery: Add missing charge restart voltages
2022-04-17 09:36:27 -07:00
Linus Torvalds
bd0c7d755b Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
 "Regular set of fixes for drivers and the dev-interface"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: ismt: Fix undefined behavior due to shift overflowing the constant
  i2c: dev: Force case user pointers in compat_i2cdev_ioctl()
  i2c: dev: check return value when calling dev_set_name()
  i2c: qcom-geni: Use dev_err_probe() for GPI DMA error
  i2c: imx: Implement errata ERR007805 or e7805 bus frequency limit
  i2c: pasemi: Wait for write xfers to finish
2022-04-17 09:31:27 -07:00
Linus Torvalds
a2c29ccd94 Devicetree fixes for v5.18, part 2:
- Fix scalar property schemas with array constraints
 
 - Fix 'enum' lists with duplicate entries
 
 - Fix incomplete if/then/else schemas
 
 - Add Renesas RZ/V2L SoC support to Mali Bifrost binding
 
 - Maintainers update for Marvell irqchip
 -----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCgAuFiEEktVUI4SxYhzZyEuo+vtdtY28YcMFAmJbQjQQHHJvYmhAa2Vy
 bmVsLm9yZwAKCRD6+121jbxhw4z4D/0W1W4LaydU5+Yf1jjbZQlDtMFQU+F7YxPP
 rNtv5lCfp9MQ+9Ar9Z2g1k8cu3bZ+umZox5EF8aauAQZLSLbZceuBnfdJ0yDuQHP
 ypwcRSNDGClf9S7ldVXP3yJmdh76Ot+jCTZRiWNYY6JNek/5PU9nqo852P0wMNPs
 r3Q8kNuRDVclmUAxS2c2HUDlkMPQAnL2rxbNk31/nyma3qrRR4RPk7naxFvkrBs9
 MrzEoHRqv2u7HeFzbO6VxOzRi7MyIvD0KYaPJwb9Gm47jY3RjG/WTTGviZuSGoII
 P8Y9AYrSp8VvCtF5n8BJ35i8Yag/omvCiyzMouXc5yDfNfQBCi8BClcJI90eNCFM
 lSsU/9k3PhT4bY2MnH7Y8ZrPfe95LBGmLzODk8+GZzfoBnvOtK+BnDR3AG/3VKWK
 RDQJ3RTy0ssq+b7JgWA9XAHFmNzmUY7QgV9z2OjtHYXCiXRmyijN5JLGanCvTQ7V
 LVkToRw/MBIL9kECGjDD64660AS1FvP1568X92vwfaekPgSFcjD8TukuswfObuIn
 +853NX7Gl1yJDbcoQXVpvpdXXg35XMNjUBnrp3QCfP08hkqNgVo7ZcuEVQor9nxJ
 O8VMyWUVRVuWQ8tcnWyRPQUnSX44lIPChIF2kjCOKXES1tFUIhsV6+/vJ55qdq0V
 ifzUslHLWA==
 =X/5S
 -----END PGP SIGNATURE-----

Merge tag 'devicetree-fixes-for-5.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull devicetree fixes from Rob Herring:

 - Fix scalar property schemas with array constraints

 - Fix 'enum' lists with duplicate entries

 - Fix incomplete if/then/else schemas

 - Add Renesas RZ/V2L SoC support to Mali Bifrost binding

 - Maintainers update for Marvell irqchip

* tag 'devicetree-fixes-for-5.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  dt-bindings: display: panel-timing: Define a single type for properties
  dt-bindings: Fix array constraints on scalar properties
  dt-bindings: gpu: mali-bifrost: Document RZ/V2L SoC
  dt-bindings: net: snps: remove duplicate name
  dt-bindings: Fix 'enum' lists with duplicate entries
  dt-bindings: irqchip: mrvl,intc: refresh maintainers
  dt-bindings: Fix incomplete if/then/else schemas
  dt-bindings: power: renesas,apmu: Fix cpus property limits
  dt-bindings: extcon: maxim,max77843: fix ports type
2022-04-16 17:07:50 -07:00
Linus Torvalds
de6e933668 gpio fixes for v5.18-rc3
- fix the set/get_multiple() callbacks in gpio-sim
 - use correct format characters in gpiolib-acpi
 - use an unsigned type for pins in gpiolib-acpi
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEFp3rbAvDxGAT0sefEacuoBRx13IFAmJbI2UACgkQEacuoBRx
 13Iu1hAAmPpHwe64eECqWdCR0b6L/XZg4gihDaXa4EObL91gBNQL/SR+CnhCGmjm
 8PJwoKSUIHyUGI9qQ83Na4jtDrcNV+9H2EgdgA7PHMDX6P5kS6NuaxzvJ7UPi+xB
 tBVFgqGwx25JAE71/22bmH3CDWTdPDXm9FSTPItQiOXweBBzJKNGQegJYzo3Z1M7
 3jLwkvU0LUI7YvjZHRG8vSp2MV3PGWUsDnaBOQ+ajhBQLnEgTudQSsh9Z2e/zZVR
 MJ26FG5pkzbJ+fY4ern9S84PR/IDcPPMcjO63VHO8AY0gR78aNNBuOj7TTOdsbk6
 Rs+z9LPFsNsJBNxqD6iWXMKK35ox6jThqja31w3v9C41uf1sfMFidaUqrBX+k6uN
 d8oX2VcvsXVtnh9KqKsAijpQQxkZbrfkfh/O4xy/IOZv/r6WHUUtzae/DelfmNhu
 eM+dVfaPywxQ+HSiqWjklgz6vKzAxNEJJbklFpbwfyJPeWucpD5HGhyWftI4QzsP
 1kx0HxcQ20sE5Euiw8yqcR0ZaBc+Sdu7TU8kt2NbMxpBDf58Rs6bd19MsD/p9+Ge
 JF09kNTh3iWTbtpcK7yww1TccbuzvKUVImsUWWnjLIk8e3Ar0XP3LGWCs0WgqYEf
 z39CQJ8iVN2/xOtsm0WLUDnce6FATqc2O75sLB3qkf1TaMYQj/s=
 =Yt3p
 -----END PGP SIGNATURE-----

Merge tag 'gpio-fixes-for-v5.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux

Pull gpio fixes from Bartosz Golaszewski:
 "A single fix for gpio-sim and two patches for GPIO ACPI pulled from
  Andy:

   - fix the set/get_multiple() callbacks in gpio-sim

   - use correct format characters in gpiolib-acpi

   - use an unsigned type for pins in gpiolib-acpi"

* tag 'gpio-fixes-for-v5.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  gpio: sim: fix setting and getting multiple lines
  gpiolib: acpi: Convert type for pin to be unsigned
  gpiolib: acpi: use correct format characters
2022-04-16 17:01:43 -07:00
Linus Torvalds
70a0cec818 ARM: SoC fixes for 5.18, part 2
There are a number of SoC bugfixes that came in since the merge window,
 and more of them are already pending. This batch includes
 
  - A boot time regression fix for davinci that triggered on
    multi_v5_defconfig when booting any platform
 
  - Defconfig updates to address removed features, changed symbol
    names or dependencies, for gemini, ux500, and pxa
 
  - Email address changes for Krzysztof Kozlowski
 
  - Build warning fixes for ep93xx and iop32x
 
  - Devicetree warning fixes across many platforms
 
  - Minor bugfixes for the reset controller, memory controller
    and SCMI firmware subsystems plus the versatile-express board
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmJbNdgACgkQmmx57+YA
 GNlqag/+MyNA0d4VWqxv/5KScfM1TB/oF+G55BwkoDQRGAsfon8ocZHx7dnGk+k8
 lVOYrgx1FOwBLpYmJ34SVKNznNV1x7cJB6XwwK8vDj1SievjScz8E5fx1rdO5Ayu
 YQFlrLjOqSXucObQgbviHACc5uv7RB1bKYKESN/idklbY9TgNS5TIEHZxeldDkxY
 bSSu52RSdvklf5XjYAMLph0hEmhY9N090C3ftBP5WTaHVDuniquS2ubSRxyomVia
 WQsRFi7haXZrXFw7B20dz/nrq89yibBxHqiOAvvC09Ce2woo5sSvwxeRstls4IVt
 bXwQNg7EsezZvZ+MSnNlHk6kPLG51ECm1dB3cCk++N23NLbd34GYzbK/TwbRBzyw
 jeBrsLD5lzENBNBG5mfAlpDMq7HoPLRshEV+5FIGcQZtDKHZnA3c2ARHNFfAikma
 3ozasK6BzRsnSQIUwWaoli9w3pj79/DOvdEoSdCVTk+RQ5Fm1aWoZXtiPin/yvsa
 MOMkJOwdo42+kAi79PRVfR2JRPCC/P1JcmKykvn7Tb3AphkZBdRGjll6ZYdzt2hR
 tynfPiBxXT+r61lgPM5Fs3NBZSZ2IPDePlYs5W2fHCIhof9XQrziPmHmM+OiXj2a
 JwXLX6ymLFgtFRgK2ChtRgzxjHCyrk7pRGneWHxQlM7yqeliepg=
 =Y3N8
 -----END PGP SIGNATURE-----

Merge tag 'soc-fixes-5.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull ARM SoC fixes from Arnd Bergmann:
 "There are a number of SoC bugfixes that came in since the merge
  window, and more of them are already pending.

  This batch includes:

   - A boot time regression fix for davinci that triggered on
     multi_v5_defconfig when booting any platform

   - Defconfig updates to address removed features, changed symbol names
     or dependencies, for gemini, ux500, and pxa

   - Email address changes for Krzysztof Kozlowski

   - Build warning fixes for ep93xx and iop32x

   - Devicetree warning fixes across many platforms

   - Minor bugfixes for the reset controller, memory controller and SCMI
     firmware subsystems plus the versatile-express board"

* tag 'soc-fixes-5.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (34 commits)
  ARM: config: Update Gemini defconfig
  arm64: dts: qcom/sdm845-shift-axolotl: Fix boolean properties with values
  ARM: dts: align SPI NOR node name with dtschema
  ARM: dts: Fix more boolean properties with values
  arm/arm64: dts: qcom: Fix boolean properties with values
  arm64: dts: imx: Fix imx8*-var-som touchscreen property sizes
  arm: dts: imx: Fix boolean properties with values
  arm64: dts: tegra: Fix boolean properties with values
  arm: dts: at91: Fix boolean properties with values
  arm: configs: imote2: Drop defconfig as board support dropped.
  ep93xx: clock: Don't use plain integer as NULL pointer
  ep93xx: clock: Fix UAF in ep93xx_clk_register_gate()
  ARM: vexpress/spc: Fix all the kernel-doc build warnings
  ARM: vexpress/spc: Fix kernel-doc build warning for ve_spc_cpu_in_wfi
  ARM: config: u8500: Re-enable AB8500 battery charging
  ARM: config: u8500: Add some common hardware
  memory: fsl_ifc: populate child nodes of buses and mfd devices
  ARM: config: Refresh U8500 defconfig
  firmware: arm_scmi: Fix sparse warnings in OPTEE transport driver
  firmware: arm_scmi: Replace zero-length array with flexible-array member
  ...
2022-04-16 16:51:39 -07:00
Linus Torvalds
92edbe32e3 Random number generator fixes for Linux 5.18-rc3.
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEq5lC5tSkz8NBJiCnSfxwEqXeA64FAmJavk0ACgkQSfxwEqXe
 A640KA/9FHngPjK8AOuFJJRfdNYu5CWcn4PD/xmYGhP1QUuKOfIF6rUGrl7w4jCh
 +mKICOt2Rt2ZqSscS1ccpddiwAcoURrra6pW8Oo5IiuHlPBWR/jow9FzeLZpP4II
 5mcWSZX4/MJFs9o6T+25FE9tPjsVcEi1hgYEU0ecXtXYK+mIUbPobF1pWlI+C61F
 8sYQ4GrpJHLWreou7SYfzbI3siaXmewie8aYgrqvU8Bt7U2UVTA0j8VxeX7+r87/
 xZ+/n+E3WOiEt2h0UyOB6+5bIL0t6qJI/plc8kQN/R5UHSoMrT06MdwrThI4exI3
 YDDf488aKiYPdeQt3kFXH4o1PVK9R072Z+ZK63jfUxGOQs1DXI2DxCSqgO3fPfIR
 v9uy0kG6zWG+C8RuX3VV12gIL6/XOFRx7UHVwSnt+A1Li/DPEdNoYlKesSGOKuel
 vbXBmed2z5v02KMd1Y9LtrioLco8JDFYD0OEtbEaGjv7Kt+EcVtapo1e7N2VSmk6
 IKxp7soEorJSR13rR5vyeF+yOZmxn6BOePc7m48O0Wqx76DHlyXiMPNTJUrHIC2z
 XfD8+P1mHA2Iz71T7YI9Dmzw9uIVZuUteiEpvc0o/uy8z3YzOUftnqQBsAKwbxz5
 HKrIdkocQNbXK46GKpjK6OY8BCwOeRM+z/XdJwYDek7G1+1ULpg=
 =2JCC
 -----END PGP SIGNATURE-----

Merge tag 'random-5.18-rc3-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random

Pull random number generator fixes from Jason Donenfeld:

 - Per your suggestion, random reads now won't fail if there's a page
   fault after some non-zero amount of data has been read, which makes
   the behavior consistent with all other reads in the kernel.

 - Rather than an inconsistent mix of random_get_entropy() returning an
   unsigned long or a cycles_t, now it just returns an unsigned long.

 - A memcpy() was replaced with an memmove(), because the addresses are
   sometimes overlapping. In practice the destination is always before
   the source, so not really an issue, but better to be correct than
   not.

* tag 'random-5.18-rc3-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
  random: use memmove instead of memcpy for remaining 32 bytes
  random: make random_get_entropy() return an unsigned long
  random: allow partial reads if later user copies fail
2022-04-16 16:42:53 -07:00
Linus Torvalds
90ea17a9e2 SCSI fixes on 20220416
13 fixes, all in drivers.  The most extensive changes are in the iscsi
 series (affecting drivers qedi, cxgbi and bnx2i), the next most is
 scsi_debug, but that's just a simple revert and then minor updates to
 pm80xx.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCYlseaSYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishRlWAP9ygp0e
 i9eU3ZXsiVJbi/b1UrQwBj1z2oO579J4f286cwEA5ko+q8eAzvj3jxkQarBv79tt
 RvYEBYVBXc5igl3VnuI=
 =aO3T
 -----END PGP SIGNATURE-----

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "13 fixes, all in drivers.

  The most extensive changes are in the iscsi series (affecting drivers
  qedi, cxgbi and bnx2i), the next most is scsi_debug, but that's just a
  simple revert and then minor updates to pm80xx"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: iscsi: MAINTAINERS: Add Mike Christie as co-maintainer
  scsi: qedi: Fix failed disconnect handling
  scsi: iscsi: Fix NOP handling during conn recovery
  scsi: iscsi: Merge suspend fields
  scsi: iscsi: Fix unbound endpoint error handling
  scsi: iscsi: Fix conn cleanup and stop race during iscsid restart
  scsi: iscsi: Fix endpoint reuse regression
  scsi: iscsi: Release endpoint ID when its freed
  scsi: iscsi: Fix offload conn cleanup when iscsid restarts
  scsi: iscsi: Move iscsi_ep_disconnect()
  scsi: pm80xx: Enable upper inbound, outbound queues
  scsi: pm80xx: Mask and unmask upper interrupt vectors 32-63
  Revert "scsi: scsi_debug: Address races following module load"
2022-04-16 13:38:26 -07:00
Bartosz Golaszewski
0ebb4fbe31 intel-gpio for v5.18-2
* Couple of fixes related to handling unsigned value of the pin from ACPI
 
 The following is an automated git shortlog grouped by driver:
 
 gpiolib:
  -  acpi: Convert type for pin to be unsigned
  -  acpi: use correct format characters
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEqaflIX74DDDzMJJtb7wzTHR8rCgFAmJUN7cACgkQb7wzTHR8
 rCjeMQ/+KZHhBgPHE2Rl928bI/IQYuhcGjFmsLUt/KRFFsjVpzFQro9QI0IFRava
 /8j6cUy8mTV6v5cxScljusS7++j8K3C6kazsXr2ieGPKpaGDHtKs93QjbC52jF/T
 x/CI7NbnkvJHPWmpIYSuc/vGHHCHAAsnI62OHAIyEARRYBbd5VAkyrvnchH2skaK
 bHH0+1La4l2fBkmfA6fpnwLVnRNXEeRTUdQdGk/14Snv1zGNeciGR2/wbg0YtEtf
 CvEPi9U7d7RJCjBiege9kcb8R4ABOngypL1C9r7XvtFy17glxkc+6nrJx3NgW3bY
 Eksi4naML2oe4R/PyZ740wztJCUKdR6oV/OmyRz5FhTA4+ZWRz3pL3o2Rv+DTZ5g
 AcugQZCMe8mNGerFLPxBHpD1s0eD5Gxrnq7CeEpQ0wpBqBj90iZPAvE2mMTghqjH
 Vm9OsLX/X8/vAPLeMLgHy9tDwl5FRqkM7BxV3qBp14MBpPfzDmQc3VS5SGcw/car
 eZI7mQH85iSpnYjkqfDAwQtr3Jbxt8OuEnADWPiJknuTwGigQm9MLh37cW9xih0r
 BQWsHMO9ephWhDSwpKVl+7fbQpIUd2TQFFBOnzIW69ALevD16FkEdnn08HEI2Kd6
 mEB60/aA04PNyS6EtbD4ZbCNDjdB3S7EITPPrVPeGycD2JAp5sA=
 =i+RD
 -----END PGP SIGNATURE-----

Merge tag 'intel-gpio-v5.18-2' of gitolite.kernel.org:pub/scm/linux/kernel/git/andy/linux-gpio-intel into gpio/for-current

intel-gpio for v5.18-2

* Couple of fixes related to handling unsigned value of the pin from ACPI

gpiolib:
 -  acpi: Convert type for pin to be unsigned
 -  acpi: use correct format characters
2022-04-16 21:57:00 +02:00
Linus Torvalds
b00868396d dma-mapping fixes for Linux 5.18
- avoid a double memory copy for swiotlb (Chao Gao)
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmJaWd0LHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYM8MhAAsFPr3ebQD5iEGDuDTY0MPRFUgQzIWXymEgsy5c5t
 maJkJMA19ouOnQiLxNu8T750fo5ywuFHHDp9WJHjplA69KXJIWczdbBNVkHFzfkc
 oG2hzNsouVdLXz0speaFpvcuRFscWH8HQ0AqUO4/PrQmJudOEF8ekEu/8IloYULs
 np1IKrIlIpGk5JEzgIHGcIqxmeclYtETQ1XkNyDOOUZAlsEcQY5pTNgujq++a1uL
 /zku2SOg2EPdeEXFHlH3N7AwsDq/s7So4qLrMUoeXzC8qO9n60kuYKhnMd0nz0H0
 49a4YFqUrBVI8yW4EW6MBMLkZnK4f0ATjdn7SYToh1gVGrCOypUWPjjBtnOStqC8
 DqjByjQra0xf2re4ED/oPtRtQLKSC5seht4HDThOIG62A8uuKySk2NUFQ2u3d8mp
 KOz/v/xg/3BBjjxidhiQtuYv2WxaI5ilfNl1bpThm23Dw4Mqi00f77swUWxJlESj
 qsjKgdE3We9GlkihyPf4/M5PtPphrWOHoUMBH8IAe10o4DuI29aJnXrerCEaKLGr
 RIA2ALiNeVnoUKhSLPDB86pcTZufoEiVtAGRDEADjgU2ic+Ytw312Aswh1PFznzH
 r/6X4tcKoWzjmzpBkCrxm+RYfqipoNb+sbgiaI7O6z/AiBQUR0itc72Tug1lR0xq
 2KI=
 =D8o8
 -----END PGP SIGNATURE-----

Merge tag 'dma-mapping-5.18-2' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping fix from Christoph Hellwig:

 - avoid a double memory copy for swiotlb (Chao Gao)

* tag 'dma-mapping-5.18-2' of git://git.infradead.org/users/hch/dma-mapping:
  dma-direct: avoid redundant memory sync for swiotlb
2022-04-16 11:20:21 -07:00
Jason A. Donenfeld
35a33ff380 random: use memmove instead of memcpy for remaining 32 bytes
In order to immediately overwrite the old key on the stack, before
servicing a userspace request for bytes, we use the remaining 32 bytes
of block 0 as the key. This means moving indices 8,9,a,b,c,d,e,f ->
4,5,6,7,8,9,a,b. Since 4 < 8, for the kernel implementations of
memcpy(), this doesn't actually appear to be a problem in practice. But
relying on that characteristic seems a bit brittle. So let's change that
to a proper memmove(), which is the by-the-books way of handling
overlapping memory copies.

Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-04-16 12:53:31 +02:00
Linus Torvalds
59250f8a7f Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "14 patches.

  Subsystems affected by this patch series: MAINTAINERS, binfmt, and
  mm (tmpfs, secretmem, kasan, kfence, pagealloc, zram, compaction,
  hugetlb, vmalloc, and kmemleak)"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm: kmemleak: take a full lowmem check in kmemleak_*_phys()
  mm/vmalloc: fix spinning drain_vmap_work after reading from /proc/vmcore
  revert "fs/binfmt_elf: use PT_LOAD p_align values for static PIE"
  revert "fs/binfmt_elf: fix PT_LOAD p_align values for loaders"
  hugetlb: do not demote poisoned hugetlb pages
  mm: compaction: fix compiler warning when CONFIG_COMPACTION=n
  mm: fix unexpected zeroed page mapping with zram swap
  mm, page_alloc: fix build_zonerefs_node()
  mm, kfence: support kmem_dump_obj() for KFENCE objects
  kasan: fix hw tags enablement when KUNIT tests are disabled
  irq_work: use kasan_record_aux_stack_noalloc() record callstack
  mm/secretmem: fix panic when growing a memfd_secret
  tmpfs: fix regressions from wider use of ZERO_PAGE
  MAINTAINERS: Broadcom internal lists aren't maintainers
2022-04-15 15:57:18 -07:00
Linus Torvalds
ce673f630c - Fix memory corruption in DM integrity target when tag_size is less
than digest size.
 
 - Fix DM multipath's historical-service-time path selector to not use
   sched_clock() and ktime_get_ns(); only use ktime_get_ns().
 
 - Fix dm_io->orig_bio NULL pointer dereference in dm_zone_map_bio()
   due to 5.18 changes that overlooked DM zone's use of ->orig_bio
 
 - Fix for regression that broke the use of dm_accept_partial_bio() for
   "abnormal" IO (e.g. WRITE ZEROES) that does not need duplicate bios
 
 - Fix DM's issuing of empty flush bio so that it's size is 0.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEJfWUX4UqZ4x1O2wixSPxCi2dA1oFAmJZ17MACgkQxSPxCi2d
 A1pC9Qf+NXc9u0Amf7GeORpQqAWPpogdMma3TIO3F1WaG2U5AspwaJpeZueVfmCg
 z5IV4UcPtH5nJQ4BXre7r0yRqKuAX3b1EaxSApetwvs3AZJpqnFnoyFx/Bv9aU74
 ZTtniNwrC+Z/TgZKk9hz+pdtEzq9p9shPDNnI32nYCMTxOFdH3ep+dBSs4/zaByb
 3zSxFIEIaElP0rQJEGofgzlInIvzKOpKhWqnMBMILmoVkctdgiOpB817E/BkaMgS
 hfRDjTZjgdwpXzJJd0MMUlde0MJfV7eo6pMw7e0nbO1I2E1aOtcY+XrlMJTTc0nW
 py07slHJIzcoI9kBioDq7askbNO0oQ==
 =V+AB
 -----END PGP SIGNATURE-----

Merge tag 'for-5.18/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mike Snitzer:

 - Fix memory corruption in DM integrity target when tag_size is less
   than digest size.

 - Fix DM multipath's historical-service-time path selector to not use
   sched_clock() and ktime_get_ns(); only use ktime_get_ns().

 - Fix dm_io->orig_bio NULL pointer dereference in dm_zone_map_bio() due
   to 5.18 changes that overlooked DM zone's use of ->orig_bio

 - Fix for regression that broke the use of dm_accept_partial_bio() for
   "abnormal" IO (e.g. WRITE ZEROES) that does not need duplicate bios

 - Fix DM's issuing of empty flush bio so that it's size is 0.

* tag 'for-5.18/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm: fix bio length of empty flush
  dm: allow dm_accept_partial_bio() for dm_io without duplicate bios
  dm zone: fix NULL pointer dereference in dm_zone_map_bio
  dm mpath: only use ktime_get_ns() in historical selector
  dm integrity: fix memory corruption when tag_size is less than digest size
2022-04-15 15:20:59 -07:00
Patrick Wang
23c2d497de mm: kmemleak: take a full lowmem check in kmemleak_*_phys()
The kmemleak_*_phys() apis do not check the address for lowmem's min
boundary, while the caller may pass an address below lowmem, which will
trigger an oops:

  # echo scan > /sys/kernel/debug/kmemleak
  Unable to handle kernel paging request at virtual address ff5fffffffe00000
  Oops [#1]
  Modules linked in:
  CPU: 2 PID: 134 Comm: bash Not tainted 5.18.0-rc1-next-20220407 #33
  Hardware name: riscv-virtio,qemu (DT)
  epc : scan_block+0x74/0x15c
   ra : scan_block+0x72/0x15c
  epc : ffffffff801e5806 ra : ffffffff801e5804 sp : ff200000104abc30
   gp : ffffffff815cd4e8 tp : ff60000004cfa340 t0 : 0000000000000200
   t1 : 00aaaaaac23954cc t2 : 00000000000003ff s0 : ff200000104abc90
   s1 : ffffffff81b0ff28 a0 : 0000000000000000 a1 : ff5fffffffe01000
   a2 : ffffffff81b0ff28 a3 : 0000000000000002 a4 : 0000000000000001
   a5 : 0000000000000000 a6 : ff200000104abd7c a7 : 0000000000000005
   s2 : ff5fffffffe00ff9 s3 : ffffffff815cd998 s4 : ffffffff815d0e90
   s5 : ffffffff81b0ff28 s6 : 0000000000000020 s7 : ffffffff815d0eb0
   s8 : ffffffffffffffff s9 : ff5fffffffe00000 s10: ff5fffffffe01000
   s11: 0000000000000022 t3 : 00ffffffaa17db4c t4 : 000000000000000f
   t5 : 0000000000000001 t6 : 0000000000000000
  status: 0000000000000100 badaddr: ff5fffffffe00000 cause: 000000000000000d
    scan_gray_list+0x12e/0x1a6
    kmemleak_scan+0x2aa/0x57e
    kmemleak_write+0x32a/0x40c
    full_proxy_write+0x56/0x82
    vfs_write+0xa6/0x2a6
    ksys_write+0x6c/0xe2
    sys_write+0x22/0x2a
    ret_from_syscall+0x0/0x2

The callers may not quite know the actual address they pass(e.g. from
devicetree).  So the kmemleak_*_phys() apis should guarantee the address
they finally use is in lowmem range, so check the address for lowmem's
min boundary.

Link: https://lkml.kernel.org/r/20220413122925.33856-1-patrick.wang.shcn@gmail.com
Signed-off-by: Patrick Wang <patrick.wang.shcn@gmail.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-15 14:49:56 -07:00
Omar Sandoval
c12cd77cb0 mm/vmalloc: fix spinning drain_vmap_work after reading from /proc/vmcore
Commit 3ee48b6af4 ("mm, x86: Saving vmcore with non-lazy freeing of
vmas") introduced set_iounmap_nonlazy(), which sets vmap_lazy_nr to
lazy_max_pages() + 1, ensuring that any future vunmaps() immediately
purge the vmap areas instead of doing it lazily.

Commit 690467c81b ("mm/vmalloc: Move draining areas out of caller
context") moved the purging from the vunmap() caller to a worker thread.
Unfortunately, set_iounmap_nonlazy() can cause the worker thread to spin
(possibly forever).  For example, consider the following scenario:

 1. Thread reads from /proc/vmcore. This eventually calls
    __copy_oldmem_page() -> set_iounmap_nonlazy(), which sets
    vmap_lazy_nr to lazy_max_pages() + 1.

 2. Then it calls free_vmap_area_noflush() (via iounmap()), which adds 2
    pages (one page plus the guard page) to the purge list and
    vmap_lazy_nr. vmap_lazy_nr is now lazy_max_pages() + 3, so the
    drain_vmap_work is scheduled.

 3. Thread returns from the kernel and is scheduled out.

 4. Worker thread is scheduled in and calls drain_vmap_area_work(). It
    frees the 2 pages on the purge list. vmap_lazy_nr is now
    lazy_max_pages() + 1.

 5. This is still over the threshold, so it tries to purge areas again,
    but doesn't find anything.

 6. Repeat 5.

If the system is running with only one CPU (which is typicial for kdump)
and preemption is disabled, then this will never make forward progress:
there aren't any more pages to purge, so it hangs.  If there is more
than one CPU or preemption is enabled, then the worker thread will spin
forever in the background.  (Note that if there were already pages to be
purged at the time that set_iounmap_nonlazy() was called, this bug is
avoided.)

This can be reproduced with anything that reads from /proc/vmcore
multiple times.  E.g., vmcore-dmesg /proc/vmcore.

It turns out that improvements to vmap() over the years have obsoleted
the need for this "optimization".  I benchmarked `dd if=/proc/vmcore
of=/dev/null` with 4k and 1M read sizes on a system with a 32GB vmcore.
The test was run on 5.17, 5.18-rc1 with a fix that avoided the hang, and
5.18-rc1 with set_iounmap_nonlazy() removed entirely:

    |5.17  |5.18+fix|5.18+removal
  4k|40.86s|  40.09s|      26.73s
  1M|24.47s|  23.98s|      21.84s

The removal was the fastest (by a wide margin with 4k reads).  This
patch removes set_iounmap_nonlazy().

Link: https://lkml.kernel.org/r/52f819991051f9b865e9ce25605509bfdbacadcd.1649277321.git.osandov@fb.com
Fixes: 690467c81b  ("mm/vmalloc: Move draining areas out of caller context")
Signed-off-by: Omar Sandoval <osandov@fb.com>
Acked-by: Chris Down <chris@chrisdown.name>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-15 14:49:56 -07:00
Andrew Morton
aeb7923733 revert "fs/binfmt_elf: use PT_LOAD p_align values for static PIE"
Despite Mike's attempted fix (925346c129), regressions reports
continue:

  https://lore.kernel.org/lkml/cb5b81bd-9882-e5dc-cd22-54bdbaaefbbc@leemhuis.info/
  https://bugzilla.kernel.org/show_bug.cgi?id=215720
  https://lkml.kernel.org/r/b685f3d0-da34-531d-1aa9-479accd3e21b@leemhuis.info

So revert this patch.

Fixes: 9630f0d60f ("fs/binfmt_elf: use PT_LOAD p_align values for static PIE")
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Chris Kennelly <ckennelly@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Fangrui Song <maskray@google.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Sandeep Patil <sspatil@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thorsten Leemhuis <regressions@leemhuis.info>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-15 14:49:56 -07:00
Andrew Morton
354e923df0 revert "fs/binfmt_elf: fix PT_LOAD p_align values for loaders"
Commit 925346c129 ("fs/binfmt_elf: fix PT_LOAD p_align values for
loaders") was an attempt to fix regressions due to 9630f0d60f
("fs/binfmt_elf: use PT_LOAD p_align values for static PIE").

But regressionss continue to be reported:

  https://lore.kernel.org/lkml/cb5b81bd-9882-e5dc-cd22-54bdbaaefbbc@leemhuis.info/
  https://bugzilla.kernel.org/show_bug.cgi?id=215720
  https://lkml.kernel.org/r/b685f3d0-da34-531d-1aa9-479accd3e21b@leemhuis.info

This patch reverts the fix, so the original can also be reverted.

Fixes: 925346c129 ("fs/binfmt_elf: fix PT_LOAD p_align values for loaders")
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Chris Kennelly <ckennelly@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sandeep Patil <sspatil@google.com>
Cc: Fangrui Song <maskray@google.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thorsten Leemhuis <regressions@leemhuis.info>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-15 14:49:56 -07:00
Mike Kravetz
5a317412ef hugetlb: do not demote poisoned hugetlb pages
It is possible for poisoned hugetlb pages to reside on the free lists.
The huge page allocation routines which dequeue entries from the free
lists make a point of avoiding poisoned pages.  There is no such check
and avoidance in the demote code path.

If a hugetlb page on the is on a free list, poison will only be set in
the head page rather then the page with the actual error.  If such a
page is demoted, then the poison flag may follow the wrong page.  A page
without error could have poison set, and a page with poison could not
have the flag set.

Check for poison before attempting to demote a hugetlb page.  Also,
return -EBUSY to the caller if only poisoned pages are on the free list.

Link: https://lkml.kernel.org/r/20220307215707.50916-1-mike.kravetz@oracle.com
Fixes: 8531fc6f52 ("hugetlb: add hugetlb demote page support")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-15 14:49:55 -07:00
Charan Teja Kalla
31ca72fa75 mm: compaction: fix compiler warning when CONFIG_COMPACTION=n
The below warning is reported when CONFIG_COMPACTION=n:

   mm/compaction.c:56:27: warning: 'HPAGE_FRAG_CHECK_INTERVAL_MSEC' defined but not used [-Wunused-const-variable=]
      56 | static const unsigned int HPAGE_FRAG_CHECK_INTERVAL_MSEC = 500;
         |                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fix it by moving 'HPAGE_FRAG_CHECK_INTERVAL_MSEC' under
CONFIG_COMPACTION defconfig.

Also since this is just a 'static const int' type, use #define for it.

Link: https://lkml.kernel.org/r/1647608518-20924-1-git-send-email-quic_charante@quicinc.com
Signed-off-by: Charan Teja Kalla <quic_charante@quicinc.com>
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Nitin Gupta <nigupta@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-15 14:49:55 -07:00
Minchan Kim
e914d8f003 mm: fix unexpected zeroed page mapping with zram swap
Two processes under CLONE_VM cloning, user process can be corrupted by
seeing zeroed page unexpectedly.

      CPU A                        CPU B

  do_swap_page                do_swap_page
  SWP_SYNCHRONOUS_IO path     SWP_SYNCHRONOUS_IO path
  swap_readpage valid data
    swap_slot_free_notify
      delete zram entry
                              swap_readpage zeroed(invalid) data
                              pte_lock
                              map the *zero data* to userspace
                              pte_unlock
  pte_lock
  if (!pte_same)
    goto out_nomap;
  pte_unlock
  return and next refault will
  read zeroed data

The swap_slot_free_notify is bogus for CLONE_VM case since it doesn't
increase the refcount of swap slot at copy_mm so it couldn't catch up
whether it's safe or not to discard data from backing device.  In the
case, only the lock it could rely on to synchronize swap slot freeing is
page table lock.  Thus, this patch gets rid of the swap_slot_free_notify
function.  With this patch, CPU A will see correct data.

      CPU A                        CPU B

  do_swap_page                do_swap_page
  SWP_SYNCHRONOUS_IO path     SWP_SYNCHRONOUS_IO path
                              swap_readpage original data
                              pte_lock
                              map the original data
                              swap_free
                                swap_range_free
                                  bd_disk->fops->swap_slot_free_notify
  swap_readpage read zeroed data
                              pte_unlock
  pte_lock
  if (!pte_same)
    goto out_nomap;
  pte_unlock
  return
  on next refault will see mapped data by CPU B

The concern of the patch would increase memory consumption since it
could keep wasted memory with compressed form in zram as well as
uncompressed form in address space.  However, most of cases of zram uses
no readahead and do_swap_page is followed by swap_free so it will free
the compressed form from in zram quickly.

Link: https://lkml.kernel.org/r/YjTVVxIAsnKAXjTd@google.com
Fixes: 0bcac06f27 ("mm, swap: skip swapcache for swapin of synchronous device")
Reported-by: Ivan Babrou <ivan@cloudflare.com>
Tested-by: Ivan Babrou <ivan@cloudflare.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: David Hildenbrand <david@redhat.com>
Cc: <stable@vger.kernel.org>	[4.14+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-15 14:49:55 -07:00
Juergen Gross
e553f62f10 mm, page_alloc: fix build_zonerefs_node()
Since commit 6aa303defb ("mm, vmscan: only allocate and reclaim from
zones with pages managed by the buddy allocator") only zones with free
memory are included in a built zonelist.  This is problematic when e.g.
all memory of a zone has been ballooned out when zonelists are being
rebuilt.

The decision whether to rebuild the zonelists when onlining new memory
is done based on populated_zone() returning 0 for the zone the memory
will be added to.  The new zone is added to the zonelists only, if it
has free memory pages (managed_zone() returns a non-zero value) after
the memory has been onlined.  This implies, that onlining memory will
always free the added pages to the allocator immediately, but this is
not true in all cases: when e.g. running as a Xen guest the onlined new
memory will be added only to the ballooned memory list, it will be freed
only when the guest is being ballooned up afterwards.

Another problem with using managed_zone() for the decision whether a
zone is being added to the zonelists is, that a zone with all memory
used will in fact be removed from all zonelists in case the zonelists
happen to be rebuilt.

Use populated_zone() when building a zonelist as it has been done before
that commit.

There was a report that QubesOS (based on Xen) is hitting this problem.
Xen has switched to use the zone device functionality in kernel 5.9 and
QubesOS wants to use memory hotplugging for guests in order to be able
to start a guest with minimal memory and expand it as needed.  This was
the report leading to the patch.

Link: https://lkml.kernel.org/r/20220407120637.9035-1-jgross@suse.com
Fixes: 6aa303defb ("mm, vmscan: only allocate and reclaim from zones with pages managed by the buddy allocator")
Signed-off-by: Juergen Gross <jgross@suse.com>
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-15 14:49:55 -07:00
Marco Elver
2dfe63e61c mm, kfence: support kmem_dump_obj() for KFENCE objects
Calling kmem_obj_info() via kmem_dump_obj() on KFENCE objects has been
producing garbage data due to the object not actually being maintained
by SLAB or SLUB.

Fix this by implementing __kfence_obj_info() that copies relevant
information to struct kmem_obj_info when the object was allocated by
KFENCE; this is called by a common kmem_obj_info(), which also calls the
slab/slub/slob specific variant now called __kmem_obj_info().

For completeness, kmem_dump_obj() now displays if the object was
allocated by KFENCE.

Link: https://lore.kernel.org/all/20220323090520.GG16885@xsang-OptiPlex-9020/
Link: https://lkml.kernel.org/r/20220406131558.3558585-1-elver@google.com
Fixes: b89fb5ef0c ("mm, kfence: insert KFENCE hooks for SLUB")
Fixes: d3fb45f370 ("mm, kfence: insert KFENCE hooks for SLAB")
Signed-off-by: Marco Elver <elver@google.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>	[slab]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-15 14:49:55 -07:00
Vincenzo Frascino
b1add418d4 kasan: fix hw tags enablement when KUNIT tests are disabled
Kasan enables hw tags via kasan_enable_tagging() which based on the mode
passed via kernel command line selects the correct hw backend.
kasan_enable_tagging() is meant to be invoked indirectly via the cpu
features framework of the architectures that support these backends.
Currently the invocation of this function is guarded by
CONFIG_KASAN_KUNIT_TEST which allows the enablement of the correct backend
only when KUNIT tests are enabled in the kernel.

This inconsistency was introduced in commit:

  ed6d74446c ("kasan: test: support async (again) and asymm modes for HW_TAGS")

... and prevents to enable MTE on arm64 when KUNIT tests for kasan hw_tags are
disabled.

Fix the issue making sure that the CONFIG_KASAN_KUNIT_TEST guard does not
prevent the correct invocation of kasan_enable_tagging().

Link: https://lkml.kernel.org/r/20220408124323.10028-1-vincenzo.frascino@arm.com
Fixes: ed6d74446c ("kasan: test: support async (again) and asymm modes for HW_TAGS")
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-15 14:49:55 -07:00
Zqiang
25934fcfb9 irq_work: use kasan_record_aux_stack_noalloc() record callstack
On PREEMPT_RT kernel and KASAN is enabled.  the kasan_record_aux_stack()
may call alloc_pages(), and the rt-spinlock will be acquired, if currently
in atomic context, will trigger warning:

  BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
  in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 239, name: bootlogd
  Preemption disabled at:
  [<ffffffffbab1a531>] rt_mutex_slowunlock+0xa1/0x4e0
  CPU: 3 PID: 239 Comm: bootlogd Tainted: G        W 5.17.1-rt17-yocto-preempt-rt+ #105
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
  Call Trace:
     __might_resched.cold+0x13b/0x173
     rt_spin_lock+0x5b/0xf0
     get_page_from_freelist+0x20c/0x1610
     __alloc_pages+0x25e/0x5e0
     __stack_depot_save+0x3c0/0x4a0
     kasan_save_stack+0x3a/0x50
     __kasan_record_aux_stack+0xb6/0xc0
     kasan_record_aux_stack+0xe/0x10
     irq_work_queue_on+0x6a/0x1c0
     pull_rt_task+0x631/0x6b0
     do_balance_callbacks+0x56/0x80
     __balance_callbacks+0x63/0x90
     rt_mutex_setprio+0x349/0x880
     rt_mutex_slowunlock+0x22a/0x4e0
     rt_spin_unlock+0x49/0x80
     uart_write+0x186/0x2b0
     do_output_char+0x2e9/0x3a0
     n_tty_write+0x306/0x800
     file_tty_write.isra.0+0x2af/0x450
     tty_write+0x22/0x30
     new_sync_write+0x27c/0x3a0
     vfs_write+0x3f7/0x5d0
     ksys_write+0xd9/0x180
     __x64_sys_write+0x43/0x50
     do_syscall_64+0x44/0x90
     entry_SYSCALL_64_after_hwframe+0x44/0xae

Fix it by using kasan_record_aux_stack_noalloc() to avoid the call to
alloc_pages().

Link: https://lkml.kernel.org/r/20220402142555.2699582-1-qiang1.zhang@intel.com
Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-15 14:49:55 -07:00
Axel Rasmussen
f9b141f936 mm/secretmem: fix panic when growing a memfd_secret
When one tries to grow an existing memfd_secret with ftruncate, one gets
a panic [1].  For example, doing the following reliably induces the
panic:

    fd = memfd_secret();

    ftruncate(fd, 10);
    ptr = mmap(NULL, 10, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    strcpy(ptr, "123456789");

    munmap(ptr, 10);
    ftruncate(fd, 20);

The basic reason for this is, when we grow with ftruncate, we call down
into simple_setattr, and then truncate_inode_pages_range, and eventually
we try to zero part of the memory.  The normal truncation code does this
via the direct map (i.e., it calls page_address() and hands that to
memset()).

For memfd_secret though, we specifically don't map our pages via the
direct map (i.e.  we call set_direct_map_invalid_noflush() on every
fault).  So the address returned by page_address() isn't useful, and
when we try to memset() with it we panic.

This patch avoids the panic by implementing a custom setattr for
memfd_secret, which detects resizes specifically (setting the size for
the first time works just fine, since there are no existing pages to try
to zero), and rejects them with EINVAL.

One could argue growing should be supported, but I think that will
require a significantly more lengthy change.  So, I propose a minimal
fix for the benefit of stable kernels, and then perhaps to extend
memfd_secret to support growing in a separate patch.

[1]:

  BUG: unable to handle page fault for address: ffffa0a889277028
  #PF: supervisor write access in kernel mode
  #PF: error_code(0x0002) - not-present page
  PGD afa01067 P4D afa01067 PUD 83f909067 PMD 83f8bf067 PTE 800ffffef6d88060
  Oops: 0002 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI
  CPU: 0 PID: 281 Comm: repro Not tainted 5.17.0-dbg-DEV #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
  RIP: 0010:memset_erms+0x9/0x10
  Code: c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 f3 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 <f3> aa 4c 89 c8 c3 90 49 89 fa 40 0f b6 ce 48 b8 01 01 01 01 01 01
  RSP: 0018:ffffb932c09afbf0 EFLAGS: 00010246
  RAX: 0000000000000000 RBX: ffffda63c4249dc0 RCX: 0000000000000fd8
  RDX: 0000000000000fd8 RSI: 0000000000000000 RDI: ffffa0a889277028
  RBP: ffffb932c09afc00 R08: 0000000000001000 R09: ffffa0a889277028
  R10: 0000000000020023 R11: 0000000000000000 R12: ffffda63c4249dc0
  R13: ffffa0a890d70d98 R14: 0000000000000028 R15: 0000000000000fd8
  FS:  00007f7294899580(0000) GS:ffffa0af9bc00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: ffffa0a889277028 CR3: 0000000107ef6006 CR4: 0000000000370ef0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   ? zero_user_segments+0x82/0x190
   truncate_inode_partial_folio+0xd4/0x2a0
   truncate_inode_pages_range+0x380/0x830
   truncate_setsize+0x63/0x80
   simple_setattr+0x37/0x60
   notify_change+0x3d8/0x4d0
   do_sys_ftruncate+0x162/0x1d0
   __x64_sys_ftruncate+0x1c/0x20
   do_syscall_64+0x44/0xa0
   entry_SYSCALL_64_after_hwframe+0x44/0xae
  Modules linked in: xhci_pci xhci_hcd virtio_net net_failover failover virtio_blk virtio_balloon uhci_hcd ohci_pci ohci_hcd evdev ehci_pci ehci_hcd 9pnet_virtio 9p netfs 9pnet
  CR2: ffffa0a889277028

[lkp@intel.com: secretmem_iops can be static]
  Signed-off-by: kernel test robot <lkp@intel.com>
[axelrasmussen@google.com: return EINVAL]

Link: https://lkml.kernel.org/r/20220324210909.1843814-1-axelrasmussen@google.com
Link: https://lkml.kernel.org/r/20220412193023.279320-1-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: kernel test robot <lkp@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-15 14:49:54 -07:00