IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
After recent changes to the retrying and failure counting in
migrate_pages_batch(), it was found that it's unnecessary to count
retrying and failure for normal, large, and THP folios separately.
Because we don't use retrying and failure number of large folios directly.
So, in this patch, we simplified retrying and failure counting of large
folios via counting retrying and failure of normal and large folios
together. This results in the reduced line number.
Previously, in migrate_pages_batch we need to track whether the source
folio is large/THP before splitting. So is_large is used to cache
folio_test_large() result. Now, we don't need that variable any more
because we don't count retrying and failure of large folios (only counting
that of THP folios). So, in this patch, is_large is removed to simplify
the code.
This is just code cleanup, no functionality changes are expected.
Link: https://lkml.kernel.org/r/20230510031829.11513-1-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Xin Hao <xhao@linux.alibaba.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The format string in __add_pages and __remove_pages has a typo and prints
e.g., "Misaligned __add_pages start: 0xfc605 end: #fc609" instead of
"Misaligned __add_pages start: 0xfc605 end: 0xfc609" Fix "#%lx" => "%#lx"
Link: https://lkml.kernel.org/r/20230510090758.3537242-1-rick.wertenbroek@gmail.com
Signed-off-by: Rick Wertenbroek <rick.wertenbroek@gmail.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
page_endio() is not used anymore. Remove it.
Link: https://lkml.kernel.org/r/20230510124716.73655-1-p.raghav@samsung.com
Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Check the write offset end bounds before using it as the offset into the
pivot array. This avoids a possible out-of-bounds access on the pivot
array if the write extends to the last slot in the node, in which case the
node maximum should be used as the end pivot.
akpm: this doesn't affect any current callers, but new users of mapletree
may encounter this problem if backported into earlier kernels, so let's
fix it in -stable kernels in case of this.
Link: https://lkml.kernel.org/r/20230506024752.2550-1-zhangpeng.00@bytedance.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
All other instances of gup_huge_pXd() perform the unshare check, so update
the PGD-specific function to do so as well.
While checking pgd_write() might seem unusual, this function already
performs such a check via pgd_access_permitted() so this is in line with
the existing implementation.
David said:
: This change makes unshare handling across all GUP-fast variants
: consistent, which is desirable as GUP-fast is complicated enough
: already even when consistent.
:
: This function was the only one I seemed to have missed (or left out and
: forgot why -- maybe because it's really dead code for now). The COW
: selftest would identify the problem, so far there was no report.
: Either the selftest wasn't run on corresponding architectures with that
: hugetlb size, or that code is still dead code and unused by
: architectures.
:
: the original commit(s) that added unsharing explain why we care about
: these checks:
:
: a7f226604170acd6 ("mm/gup: trigger FAULT_FLAG_UNSHARE when R/O-pinning a possibly shared anonymous page")
: 84209e87c6963f92 ("mm/gup: reliable R/O long-term pinning in COW mappings")
Link: https://lkml.kernel.org/r/cb971ac8dd315df97058ea69442ecc007b9a364a.1683381545.git.lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Set the 'empty' bool directly from the result of the function that
determines its value instead of adding additional logic.
Link: https://lkml.kernel.org/r/20230126215125.4069751-13-kbusch@meta.com
Fixes: 2d55c16c0c54 ("dmapool: create/destroy cleanup")
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Tony Battersby <tonyb@cybernetics.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Calling hugetlb_set_vma_policy() later avoids setting the vma policy
and then dropping it on a page cache hit.
Link: https://lkml.kernel.org/r/20230502235622.3652586-1-ackerleytng@google.com
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Erdem Aktas <erdemaktas@google.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Vishal Annapurve <vannapurve@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Test cachestat on a newly created file, /dev/ files, /proc/ files and a
directory. Also test on a shmem file (which can also be tested with
huge pages since tmpfs supports huge pages).
[colin.i.king@gmail.com: fix spelling mistake "trucate" -> "truncate"]
Link: https://lkml.kernel.org/r/20230505110855.2493457-1-colin.i.king@gmail.com
[mpe@ellerman.id.au: avoid excessive stack allocation]
Link: https://lkml.kernel.org/r/877ctfa6yv.fsf@mail.lhotse
Link: https://lkml.kernel.org/r/20230503013608.2431726-4-nphamcs@gmail.com
Signed-off-by: Nhat Pham <nphamcs@gmail.com>
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Brian Foster <bfoster@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
cachestat is previously only wired in for x86 (and architectures using
the generic unistd.h table):
https://lore.kernel.org/lkml/20230503013608.2431726-1-nphamcs@gmail.com/
This patch wires cachestat in for all the other architectures.
[nphamcs@gmail.com: wire up cachestat for arm64]
Link: https://lkml.kernel.org/r/20230511092843.3896327-1-nphamcs@gmail.com
Link: https://lkml.kernel.org/r/20230510195806.2902878-1-nphamcs@gmail.com
Signed-off-by: Nhat Pham <nphamcs@gmail.com>
Tested-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc]
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Heiko Carstens <hca@linux.ibm.com> [s390]
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Chris Zankel <chris@zankel.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Helge Deller <deller@gmx.de>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
There is currently no good way to query the page cache state of large file
sets and directory trees. There is mincore(), but it scales poorly: the
kernel writes out a lot of bitmap data that userspace has to aggregate,
when the user really doesn not care about per-page information in that
case. The user also needs to mmap and unmap each file as it goes along,
which can be quite slow as well.
Some use cases where this information could come in handy:
* Allowing database to decide whether to perform an index scan or
direct table queries based on the in-memory cache state of the
index.
* Visibility into the writeback algorithm, for performance issues
diagnostic.
* Workload-aware writeback pacing: estimating IO fulfilled by page
cache (and IO to be done) within a range of a file, allowing for
more frequent syncing when and where there is IO capacity, and
batching when there is not.
* Computing memory usage of large files/directory trees, analogous to
the du tool for disk usage.
More information about these use cases could be found in the following
thread:
https://lore.kernel.org/lkml/20230315170934.GA97793@cmpxchg.org/
This patch implements a new syscall that queries cache state of a file and
summarizes the number of cached pages, number of dirty pages, number of
pages marked for writeback, number of (recently) evicted pages, etc. in a
given range. Currently, the syscall is only wired in for x86
architecture.
NAME
cachestat - query the page cache statistics of a file.
SYNOPSIS
#include <sys/mman.h>
struct cachestat_range {
__u64 off;
__u64 len;
};
struct cachestat {
__u64 nr_cache;
__u64 nr_dirty;
__u64 nr_writeback;
__u64 nr_evicted;
__u64 nr_recently_evicted;
};
int cachestat(unsigned int fd, struct cachestat_range *cstat_range,
struct cachestat *cstat, unsigned int flags);
DESCRIPTION
cachestat() queries the number of cached pages, number of dirty
pages, number of pages marked for writeback, number of evicted
pages, number of recently evicted pages, in the bytes range given by
`off` and `len`.
An evicted page is a page that is previously in the page cache but
has been evicted since. A page is recently evicted if its last
eviction was recent enough that its reentry to the cache would
indicate that it is actively being used by the system, and that
there is memory pressure on the system.
These values are returned in a cachestat struct, whose address is
given by the `cstat` argument.
The `off` and `len` arguments must be non-negative integers. If
`len` > 0, the queried range is [`off`, `off` + `len`]. If `len` ==
0, we will query in the range from `off` to the end of the file.
The `flags` argument is unused for now, but is included for future
extensibility. User should pass 0 (i.e no flag specified).
Currently, hugetlbfs is not supported.
Because the status of a page can change after cachestat() checks it
but before it returns to the application, the returned values may
contain stale information.
RETURN VALUE
On success, cachestat returns 0. On error, -1 is returned, and errno
is set to indicate the error.
ERRORS
EFAULT cstat or cstat_args points to an invalid address.
EINVAL invalid flags.
EBADF invalid file descriptor.
EOPNOTSUPP file descriptor is of a hugetlbfs file
[nphamcs@gmail.com: replace rounddown logic with the existing helper]
Link: https://lkml.kernel.org/r/20230504022044.3675469-1-nphamcs@gmail.com
Link: https://lkml.kernel.org/r/20230503013608.2431726-3-nphamcs@gmail.com
Signed-off-by: Nhat Pham <nphamcs@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Brian Foster <bfoster@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "cachestat: a new syscall for page cache state of files",
v13.
There is currently no good way to query the page cache statistics of large
files and directory trees. There is mincore(), but it scales poorly: the
kernel writes out a lot of bitmap data that userspace has to aggregate,
when the user really does not care about per-page information in that
case. The user also needs to mmap and unmap each file as it goes along,
which can be quite slow as well.
Some use cases where this information could come in handy:
* Allowing database to decide whether to perform an index scan or direct
table queries based on the in-memory cache state of the index.
* Visibility into the writeback algorithm, for performance issues
diagnostic.
* Workload-aware writeback pacing: estimating IO fulfilled by page cache
(and IO to be done) within a range of a file, allowing for more
frequent syncing when and where there is IO capacity, and batching
when there is not.
* Computing memory usage of large files/directory trees, analogous to
the du tool for disk usage.
More information about these use cases could be found in this thread:
https://lore.kernel.org/lkml/20230315170934.GA97793@cmpxchg.org/
This series of patches introduces a new system call, cachestat, that
summarizes the page cache statistics (number of cached pages, dirty pages,
pages marked for writeback, evicted pages etc.) of a file, in a specified
range of bytes. It also include a selftest suite that tests some typical
usage. Currently, the syscall is only wired in for x86 architecture.
This interface is inspired by past discussion and concerns with fincore,
which has a similar design (and as a result, issues) as mincore. Relevant
links:
https://lkml.indiana.edu/hypermail/linux/kernel/1302.1/04207.htmlhttps://lkml.indiana.edu/hypermail/linux/kernel/1302.1/04209.html
I have also developed a small tool that computes the memory usage of files
and directories, analogous to the du utility. User can choose between
mincore or cachestat (with cachestat exporting more information than
mincore). To compare the performance of these two options, I benchmarked
the tool on the root directory of a Meta's server machine, each for five
runs:
Using cachestat
real -- Median: 33.377s, Average: 33.475s, Standard Deviation: 0.3602
user -- Median: 4.08s, Average: 4.1078s, Standard Deviation: 0.0742
sys -- Median: 28.823s, Average: 28.8866s, Standard Deviation: 0.2689
Using mincore:
real -- Median: 102.352s, Average: 102.3442s, Standard Deviation: 0.2059
user -- Median: 10.149s, Average: 10.1482s, Standard Deviation: 0.0162
sys -- Median: 91.186s, Average: 91.2084s, Standard Deviation: 0.2046
I also ran both syscalls on a 2TB sparse file:
Using cachestat:
real 0m0.009s
user 0m0.000s
sys 0m0.009s
Using mincore:
real 0m37.510s
user 0m2.934s
sys 0m34.558s
Very large files like this are the pathological case for mincore. In
fact, to compute the stats for a single 2TB file, mincore takes as long as
cachestat takes to compute the stats for the entire tree! This could
easily happen inadvertently when we run it on subdirectories. Mincore is
clearly not suitable for a general-purpose command line tool.
Regarding security concerns, cachestat() should not pose any additional
issues. The caller already has read permission to the file itself (since
they need an fd to that file to call cachestat). This means that the
caller can access the underlying data in its entirety, which is a much
greater source of information (and as a result, a much greater security
risk) than the cache status itself.
The latest API change (in v13 of the patch series) is suggested by Jens
Axboe. It allows for 64-bit length argument, even on 32-bit architecture
(which is previously not possible due to the limit on the number of
syscall arguments). Furthermore, it eliminates the need for compatibility
handling - every user can use the same ABI.
This patch (of 4):
In preparation for computing recently evicted pages in cachestat, refactor
workingset_refault and lru_gen_refault to expose a helper function that
would test if an evicted page is recently evicted.
[penguin-kernel@I-love.SAKURA.ne.jp: add missing rcu_read_unlock() in lru_gen_refault()]
Link: https://lkml.kernel.org/r/610781bc-cf11-fc89-a46f-87cb8235d439@I-love.SAKURA.ne.jp
Link: https://lkml.kernel.org/r/20230503013608.2431726-1-nphamcs@gmail.com
Link: https://lkml.kernel.org/r/20230503013608.2431726-2-nphamcs@gmail.com
Signed-off-by: Nhat Pham <nphamcs@gmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Brian Foster <bfoster@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Before commit 29ef680ae7c2 ("memcg, oom: move out_of_memory back to the
charge path"), all memcg oom killers were delayed to page fault path. And
the explicit wakeup is used in this case:
thread A:
...
if (locked) { // complete oom-kill, hold the lock
mem_cgroup_oom_unlock(memcg);
...
}
...
thread B:
...
if (locked && !memcg->oom_kill_disable) {
...
} else {
schedule(); // can't acquire the lock
...
}
...
The reason is that thread A kicks off the OOM-killer, which leads to
wakeups from the uncharges of the exiting task. But thread B is not
guaranteed to see them if it enters the OOM path after the OOM kills but
before thread A releases the lock.
Now only oom_kill_disable case is handled from the #PF path. In that case
it is userspace to trigger the wake up not the #PF path itself. All
potential paths to free some charges are responsible to call
memcg_oom_recover() , so the explicit wakeup is not needed in the
mem_cgroup_oom_synchronize() path which doesn't release any memory itself.
Link: https://lkml.kernel.org/r/20230419030739.115845-2-haifeng.xu@shopee.com
Signed-off-by: Haifeng Xu <haifeng.xu@shopee.com>
Suggested-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mem_cgroup_oom_synchronize() is only used when the memcg oom handling is
handed over to the edge of the #PF path. Since commit 29ef680ae7c2
("memcg, oom: move out_of_memory back to the charge path") this is the
case only when the kernel memcg oom killer is disabled
(current->memcg_in_oom is only set if memcg->oom_kill_disable). Therefore
a check for oom_kill_disable in mem_cgroup_oom_synchronize() is not
required.
Link: https://lkml.kernel.org/r/20230419030739.115845-1-haifeng.xu@shopee.com
Signed-off-by: Haifeng Xu <haifeng.xu@shopee.com>
Suggested-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Previous patches removed the only caller of cgroup_rstat_flush_atomic().
Remove the function and simplify the code.
Link: https://lkml.kernel.org/r/20230421174020.2994750-6-yosryahmed@google.com
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Previous patches removed all callers of mem_cgroup_flush_stats_atomic().
Remove the function and simplify the code.
Link: https://lkml.kernel.org/r/20230421174020.2994750-5-yosryahmed@google.com
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Currently, we approximate the root usage by adding the memcg stats for
anon, file, and conditionally swap (for memsw). To read the memcg stats
we need to invoke an rstat flush. rstat flushes can be expensive, they
scale with the number of cpus and cgroups on the system.
mem_cgroup_usage() is called by memcg_events()->mem_cgroup_threshold()
with irqs disabled, so such an expensive operation with irqs disabled can
cause problems.
Instead, approximate the root usage from global state. This is not 100%
accurate, but the root usage has always been ill-defined anyway.
Link: https://lkml.kernel.org/r/20230421174020.2994750-4-yosryahmed@google.com
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The previous patch moved the wb_over_bg_thresh()->mem_cgroup_wb_stats()
code path in wb_writeback() outside the lock section. We no longer need
to flush the stats atomically. Flush the stats non-atomically.
Link: https://lkml.kernel.org/r/20230421174020.2994750-3-yosryahmed@google.com
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "cgroup: eliminate atomic rstat flushing", v5.
A previous patch series [1] changed most atomic rstat flushing contexts to
become non-atomic. This was done to avoid an expensive operation that
scales with # cgroups and # cpus to happen with irqs disabled and
scheduling not permitted. There were two remaining atomic flushing
contexts after that series. This series tries to eliminate them as well,
eliminating atomic rstat flushing completely.
The two remaining atomic flushing contexts are:
(a) wb_over_bg_thresh()->mem_cgroup_wb_stats()
(b) mem_cgroup_threshold()->mem_cgroup_usage()
For (a), flushing needs to be atomic as wb_writeback() calls
wb_over_bg_thresh() with a spinlock held. However, it seems like the call
to wb_over_bg_thresh() doesn't need to be protected by that spinlock, so
this series proposes a refactoring that moves the call outside the lock
criticial section and makes the stats flushing in mem_cgroup_wb_stats()
non-atomic.
For (b), flushing needs to be atomic as mem_cgroup_threshold() is called
with irqs disabled. We only flush the stats when calculating the root
usage, as it is approximated as the sum of some memcg stats (file, anon,
and optionally swap) instead of the conventional page counter. This
series proposes changing this calculation to use the global stats instead,
eliminating the need for a memcg stat flush.
After these 2 contexts are eliminated, we no longer need
mem_cgroup_flush_stats_atomic() or cgroup_rstat_flush_atomic(). We can
remove them and simplify the code.
[1] https://lore.kernel.org/linux-mm/20230330191801.1967435-1-yosryahmed@google.com/
This patch (of 5):
wb_over_bg_thresh() calls mem_cgroup_wb_stats() which invokes an rstat
flush, which can be expensive on large systems. Currently,
wb_writeback() calls wb_over_bg_thresh() within a lock section, so we
have to do the rstat flush atomically. On systems with a lot of
cpus and/or cgroups, this can cause us to disable irqs for a long time,
potentially causing problems.
Move the call to wb_over_bg_thresh() outside the lock section in
preparation to make the rstat flush in mem_cgroup_wb_stats() non-atomic.
The list_empty(&wb->work_list) check should be okay outside the lock
section of wb->list_lock as it is protected by a separate lock
(wb->work_lock), and wb_over_bg_thresh() doesn't seem like it is
modifying any of wb->b_* lists the wb->list_lock is protecting.
Also, the loop seems to be already releasing and reacquring the
lock, so this refactoring looks safe.
Link: https://lkml.kernel.org/r/20230421174020.2994750-1-yosryahmed@google.com
Link: https://lkml.kernel.org/r/20230421174020.2994750-2-yosryahmed@google.com
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
__pageblock_pfn_to_page() currently performs both pfn_valid check and
pfn_to_online_page(). The former one is redundant because the latter is a
stronger check. Drop pfn_valid().
Link: https://lkml.kernel.org/r/c3868b58c6714c09a43440d7d02c7b4eed6e03f6.1682342634.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
For the /proc/sys/vm/compact_memory file, the admin-guide states: When 1
is written to the file, all zones are compacted such that free memory is
available in contiguous blocks where possible. This can be important for
example in the allocation of huge pages although processes will also
directly compact memory as required
But it was not strictly followed, writing any value would cause all zones
to be compacted.
It has been slightly optimized to comply with the admin-guide. Enforce
the 1 on the unlikely chance that the sysctl handler is ever extended to
do something different.
Commit ef4984384172 ("mm/compaction: remove unused variable
sysctl_compact_memory") has also been optimized a bit here, as the
declaration in the external header file has been eliminated, and
sysctl_compact_memory also needs to be verified.
[akpm@linux-foundation.org: add __read_mostly, per Mel]
Link: https://lkml.kernel.org/r/tencent_DFF54DB2A60F3333F97D3F6B5441519B050A@qq.com
Signed-off-by: Wen Yang <wenyang.linux@foxmail.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: William Lam <william.lam@bytedance.com>
Cc: Pintu Kumar <pintu@codeaurora.org>
Cc: Fu Wei <wefu@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "memcg: OOM log improvements", v2.
This short patch series brings back some cgroup v1 stats in OOM logs
that were unnecessarily changed before. It also makes memcg OOM logs
less reliant on printk() internals.
This patch (of 2):
Commit c8713d0b2312 ("mm: memcontrol: dump memory.stat during cgroup OOM")
made sure we dump all the stats in memory.stat during a cgroup OOM, but it
also introduced a slight behavioral change. The code used to print the
non-hierarchical v1 cgroup stats for the entire cgroup subtree, now it
only prints the v2 cgroup stats for the cgroup under OOM.
For cgroup v1 users, this introduces a few problems:
(a) The non-hierarchical stats of the memcg under OOM are no longer
shown.
(b) A couple of v1-only stats (e.g. pgpgin, pgpgout) are no longer
shown.
(c) We show the list of cgroup v2 stats, even in cgroup v1. This list
of stats is not tracked with v1 in mind. While most of the stats seem
to be working on v1, there may be some stats that are not fully or
correctly tracked.
Although OOM log is not set in stone, we should not change it for no
reason. When upgrading the kernel version to a version including commit
c8713d0b2312 ("mm: memcontrol: dump memory.stat during cgroup OOM"), these
behavioral changes are noticed in cgroup v1.
The fix is simple. Commit c8713d0b2312 ("mm: memcontrol: dump memory.stat
during cgroup OOM") separated stats formatting from stats display for v2,
to reuse the stats formatting in the OOM logs. Do the same for v1.
Move the v2 specific formatting from memory_stat_format() to
memcg_stat_format(), add memcg1_stat_format() for v1, and make
memory_stat_format() select between them based on cgroup version. Since
memory_stat_show() now works for both v1 & v2, drop memcg_stat_show().
Link: https://lkml.kernel.org/r/20230428132406.2540811-1-yosryahmed@google.com
Link: https://lkml.kernel.org/r/20230428132406.2540811-3-yosryahmed@google.com
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Michal Hocko <mhocko@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Currently, we format all the memcg stats into a buffer in
mem_cgroup_print_oom_meminfo() and use pr_info() to dump it to the logs.
However, this buffer is large in size. Although it is currently working
as intended, ther is a dependency between the memcg stats buffer and the
printk record size limit.
If we add more stats in the future and the buffer becomes larger than the
printk record size limit, or if the prink record size limit is reduced,
the logs may be truncated.
It is safer to use seq_buf_do_printk(), which will automatically break up
the buffer at line breaks and issue small printk() calls.
Refactor the code to move the seq_buf from memory_stat_format() to its
callers, and use seq_buf_do_printk() to print the seq_buf in
mem_cgroup_print_oom_meminfo().
Link: https://lkml.kernel.org/r/20230428132406.2540811-2-yosryahmed@google.com
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Acked-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The MIGRATE_SYNC_LIGHT mode is intended to block for things that will
finish quickly but not for things that will take a long time. Exactly how
long is too long is not well defined, but waits of tens of milliseconds is
likely non-ideal.
When putting a Chromebook under memory pressure (opening over 90 tabs on a
4GB machine) it was fairly easy to see delays waiting for some locks in
the kcompactd code path of > 100 ms. While the laptop wasn't amazingly
usable in this state, it was still limping along and this state isn't
something artificial. Sometimes we simply end up with a lot of memory
pressure.
Putting the same Chromebook under memory pressure while it was running
Android apps (though not stressing them) showed a much worse result (NOTE:
this was on a older kernel but the codepaths here are similar). Android
apps on ChromeOS currently run from a 128K-block, zlib-compressed,
loopback-mounted squashfs disk. If we get a page fault from something
backed by the squashfs filesystem we could end up holding a folio lock
while reading enough from disk to decompress 128K (and then decompressing
it using the somewhat slow zlib algorithms). That reading goes through
the ext4 subsystem (because it's a loopback mount) before eventually
ending up in the block subsystem. This extra jaunt adds extra overhead.
Without much work I could see cases where we ended up blocked on a folio
lock for over a second. With more extreme memory pressure I could see up
to 25 seconds.
We considered adding a timeout in the case of MIGRATE_SYNC_LIGHT for the
two locks that were seen to be slow [1] and that generated much
discussion. After discussion, it was decided that we should avoid waiting
for the two locks during MIGRATE_SYNC_LIGHT if they were being held for
IO. We'll continue with the unbounded wait for the more full SYNC modes.
With this change, I couldn't see any slow waits on these locks with my
previous testcases.
NOTE: The reason I stated digging into this originally isn't because some
benchmark had gone awry, but because we've received in-the-field crash
reports where we have a hung task waiting on the page lock (which is the
equivalent code path on old kernels). While the root cause of those
crashes is likely unrelated and won't be fixed by this patch, analyzing
those crash reports did point out these very long waits seemed like
something good to fix. With this patch we should no longer hang waiting
on these locks, but presumably the system will still be in a bad shape and
hang somewhere else.
[1] https://lore.kernel.org/r/20230421151135.v2.1.I2b71e11264c5c214bc59744b9e13e4c353bc5714@changeid
Link: https://lkml.kernel.org/r/20230428135414.v3.1.Ia86ccac02a303154a0b8bc60567e7a95d34c96d3@changeid
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Gao Xiang <hsiangkao@linux.alibaba.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Gao Xiang <hsiangkao@linux.alibaba.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
A memcg pointer in the percpu stock can be accessed by drain_all_stock()
from another cpu in a lockless way. In theory it might lead to an issue,
similar to the one which has been discovered with stock->cached_objcg,
where the pointer was zeroed between the check for being NULL and
dereferencing. In this case the issue is unlikely a real problem, but to
make it bulletproof and similar to stock->cached_objcg, let's annotate all
accesses to stock->cached with READ_ONCE()/WTRITE_ONCE().
Link: https://lkml.kernel.org/r/20230502160839.361544-2-roman.gushchin@linux.dev
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
- Prevent a bogus setting for the number of HT siblings, which is caused
by the CPUID evaluation trainwreck of X86. That recomputes the value
for each CPU, so the last CPU "wins". That can cause completely bogus
sibling values.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmRzBxMTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYodfbEACp+oRBD2zIcmf8kpakMxIbH2CkDAxl
AgqGHYVKAvsqKPQZhYmOFEsk+0isMzssuVaub9gMeLRmpMQxUqv7K20pluvWmuQm
h7MTYEAvCZpBU2BGB1mixp57tQ/gpLSB4uaJU2Q5hh9po6uImvalIdqThf7ArxgA
mm3BhtbFObLGwRaV2981zdsEs8Cjs9RusnAOrX5LbBBb6ibcnrqWy7DAhySMA0zr
7RfaGyG/T9z6/BgX54FBQ7aDNU/RkZ8YYQoMj0kAFXdM3V+sa3oPfMRQpACPHKm4
kwskJfWVdMq06kgOD7N9+WrOfBAgIsmzasT6VhTuGAGNo4CiEOthLqXDHwf4NLhN
hj0XReHVyBiQ1KuI2lGNJ71c30KRkh7SCdGNiEFtnlpteXnS4bRnzWtfMf/+2SCC
WOziRRYRzuAvvoTwoQS1BYJrDAB9f5jOX8FTYbC12rEk/RPOB6t4iFNLUzid7u1H
yPYphoftXlQlli7EajVc5pSZCftMHRCzWernptiPFU5tMvyagYppcg25JY54rquC
CTJQqoa/HPAEZhBjgZElwO9QBgu+VyJUq3R/Z+LbkQuz5z+PVDfqeDuDWhaqbcz1
WGtdGdNo4bqVhwqgwyHjgKCFe05mF6tPuotnYZE/VSkXUCvLkjGEVg8HQ7WiPhJd
IYDqxkIhX5h1Mg==
=l43p
-----END PGP SIGNATURE-----
Merge tag 'x86-urgent-2023-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu fix from Thomas Gleixner:
"A single fix for x86:
- Prevent a bogus setting for the number of HT siblings, which is
caused by the CPUID evaluation trainwreck of X86. That recomputes
the value for each CPU, so the last CPU "wins". That can cause
completely bogus sibling values"
* tag 'x86-urgent-2023-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/topology: Fix erroneous smp_num_siblings on Intel Hybrid platforms
- Make the CHA discovery based on MSR readout to work around broken
discovery tables in some SPR firmwares.
- Prevent saving PEBS configuration which has software bits set that
cause a crash when restored into the relevant MSR.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmRzBlETHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYofUsEAC6VMmscn6MDmCpO71+o80IhFUeN/Pa
/Qu4kKCQmeT/gonMG+bKCYSNobgdi4a7KNDRZypTjzMORTDmS3lT4aTbzy3ry+eI
7rUTolAo3HiY0ZwPhU/evrGDf0O2d/19nd9pXiLZFinH9WsMf7mKsXhAJvlq6SKu
SliGCQlB7+0bicdYgBkJqWPUbKY7QvcCcXiPx1Ujxg7CbMTcuj7MLzu/TVYmwZYH
Je/k+vBnhl54PF9nGiqNYZGBPh4cBgkpHEFGFrvwplBRBdwHrjapNxhIRcA2N66u
mr+f7Cl6StOP16Bn4dzG9nHLVTDz5NVgsbtPStVTcOoIzfOhch7mOL+tu/pafXdt
zRL7U6usfbQqt1rcBBtPoeFlEifQwHoz3ZGFSyIZn5IRfXtXlDJHyPxlRYMB5/uz
WjR10GzE0tZFLetvW4PdlMHcwwrOZNU+crvADmCk4KzzVDip9QLbomiGfxHVHI2J
Fjstd6ZaNkuDHRy/r76YfwRDDzQWPvtsaWRLzD4aFQQxYdHHZOabBittK9UvENvu
MjLRO+1ymN52Ap6TdcK9ZMNpb8ol/Xn5aYivNRlHsUjJ8RxJj94sElIlNHLI2yoa
hICdMcGSCd7H+FIkXdgT2O8OBxEsf139xSrG2oSTOZFCjkR1BSxtyqNY47MFVSiA
KBtbnJMkde48pg==
=Z3GU
-----END PGP SIGNATURE-----
Merge tag 'perf-urgent-2023-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
"A small set of perf fixes:
- Make the MSR-readout based CHA discovery work around broken
discovery tables in some SPR firmwares.
- Prevent saving PEBS configuration which has software bits set that
cause a crash when restored into the relevant MSR"
* tag 'perf-urgent-2023-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/uncore: Correct the number of CHAs on SPR
perf/x86/intel: Save/restore cpuc->active_pebs_data_cfg when using guest PEBS
- Ensure that the stack pointer on x86 is aligned again so that the
unwinder does not read past the end of the stack
- Discard .note.gnu.property section which has a pointlessly different
alignment than the other note sections. That confuses tooling of all
sorts including readelf, libbpf and pahole.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmRzBW0THHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoczHD/4vCopMPBFsT5w5HhSTQSX5Kglx3UQL
g+qa0Z/uJKLpMgGkBzWBQGXHH/UqyY4sk5T3DDOkVQjdSfD+zmJu4R44wAonEU+y
SDW2MeBciyp1WtjwPNWNY9QgDUQj7Cr2dSLTVqBR9akladTcj7EKHApq0pgaqsbi
MnmfXzPyH17PRfyW8FbsBjdQvhpkYZSZntQ0cU+W5VtsUtZvg2w4I8IU76ri9L0E
OYBbk2zV8RabGVJBv79fnm4fXb78+Zkp/xvs5sTBRKRS+3T5FNNJjo8tNp1AtxdF
eJMlOeY7PjSFyAL6+/+/uRqYNCnH4Rw2MlMqJJIhzKYCw7BFo0FAhi+mro63Z85b
byGuriv1g7suOgOCK8IzTl7e1nTDP4YCZ0cMDDvcRceIiqR6Rrk3C/B5cBVz0Bng
iFYFUJSlLT4G+tGZwupV2UUGsCwS5+vVCG/FVWes8Mz4g4zr2Dmrh6YMLzrcDshE
eAY2yKMypD/JD9cJa4KOVJuFARaSDJDiBpsO1BjRC6MxLFaw1biB0mE2hkwn3LJe
fLVvU/sWnIQ+dQoy7+GBzm7Pi5SmKBuLDsgOt/ocbCwfUwUQMGf+I7bNg+08UOOS
dNzM6/agdd8rLZTS/Ma7EvnA367ZwVOb5COVMqwxAoKZbPkvgz1xvRGkFQCHYapx
BwMILF9FIM0kvg==
=TPwY
-----END PGP SIGNATURE-----
Merge tag 'objtool-urgent-2023-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull unwinder fixes from Thomas Gleixner:
"A set of unwinder and tooling fixes:
- Ensure that the stack pointer on x86 is aligned again so that the
unwinder does not read past the end of the stack
- Discard .note.gnu.property section which has a pointlessly
different alignment than the other note sections. That confuses
tooling of all sorts including readelf, libbpf and pahole"
* tag 'objtool-urgent-2023-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/show_trace_log_lvl: Ensure stack pointer is aligned, again
vmlinux.lds.h: Discard .note.gnu.property section
- Prevent that the allocation path wakes up kswapd. That's a long
standing issue due to the GFP_ATOMIC allocation flag. As debug objects
can be invoked from pretty much any context waking kswapd can end up
in arbitrary lock chains versus the waitqueue lock.
- Correct the explicit lockdep wait-type violation in
debug_object_fill_pool().
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmRzCBQTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoa8FD/sFaHGSVtNTYgkV75umETMWbx+nR0Sp
Y/i62MswIWU/DWmD9IKaBxlHpBByHgopBAozDnUix6RfQvf8V/GSU6PWa9HAR2QH
rYwQCN/2/e8yQNAFv+9AiYGzPU3fRI/z7rYgfhhiWoLjivMFUCXypjBG0BAiCBxC
pYKZDMhBeySIUjtEL6xjcflA8XXKuLUPGy1WeKBxRgJeNvM0GlbifNXoy0JaXBso
NK+1FOG7zm05r2RqZjN0rAVRrrdgA4JYygpYC8YmzePoFQVXLeUnlbjjW9uYX+hz
MoLuVeF+rKk9NHNu3NoD4kFgrNp3NXAAAzH1MJwIADy9THtsyWAeEgyUkkie9aiX
Oa8eSjpJQjUv5h+VRKpMhh2RAAAhCYDuX/QC2FLImLy+GRF3dMhsAmuYgKXN2kHa
CFkM84vStMiMVxKhwtLpxVE7VOrxzXxbqMO65kMrCXYxK1SfKtEZr8FrORvUjU7G
MmH+D9sB034nkCBU+oGMsMYAAzB4rLp5Cw9qqvwWLfJvWLcUoPxjgUV6hLR6mNXx
6+2133Tf68Fz4TgyEDN9XhQ7QEsKKGTTDMJ5JYolnrRe54sUJSsX+44khrbocSde
WcEfcwhR+mjDDx0eVB2oT9bedxMf639mqPNn//EqJkzS4s+sECC8OiHbdvL3ArUq
S92nrMxvyMB42Q==
=7B4m
-----END PGP SIGNATURE-----
Merge tag 'core-debugobjects-2023-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull debugobjects fixes from Thomas Gleixner:
"Two fixes for debugobjects:
- Prevent the allocation path from waking up kswapd.
That's a long standing issue due to the GFP_ATOMIC allocation flag.
As debug objects can be invoked from pretty much any context waking
kswapd can end up in arbitrary lock chains versus the waitqueue
lock
- Correct the explicit lockdep wait-type violation in
debug_object_fill_pool()"
* tag 'core-debugobjects-2023-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
debugobjects: Don't wake up kswapd from fill_pool()
debugobjects,locking: Annotate debug_object_fill_pool() wait type violation
- Prevent loss of state in the MIPS GIC interrupt controller.
- Disable pseudo NMIs on Mediatek based Chromebooks as they have
firmware issues which cause instantenous chrashes and freezes
wen pseudo NMIs are used.
- Fix the error handling path in the MBIGEN driver and a defined but not
used warning in the meson-gpio interrupt chip driver.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmRzBAQTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoT/7D/9CDePULfT8t58VwpS/ZXGB1pAYhyWX
FyQnxGeFz+H0NEylhB5LhqkmCFq60IZ+fBIYS9LmBU9GSJIvVXnp62SOPDmFmMwj
fAj6y5JfpOxVZNb8SJ+JGVvwm2henRvOgeYKB4R/APk37dJcWzPruv/E64J7z0BC
IeqFdq2cvkTnEDgE3Fnt6kZ2/iS8gd+Vtp/O/+pzqbt3u3vcogygSpNE8WE8Y+WE
fF2+97EZx4oQRwIltNjaLBeW63ycQzx2+UCMy/84QYsTfi/wlquGcKWrBYwx+CDf
XqrYMK6dMXW22o3VyrMxM6Jrd4raU7KsxWWOpqhClNabQjNbNgFdnMH56lJPFoqx
84tr2+RnxL1bGHQDiiQtCgBfRe0BGm82qUlQkKDXwcmBIDNm2rfeE1xGG6fHsylk
lPT3dQBR3DvG5EkxKIe3BF6ZPlwhmCe76zsU1Dcf2tVoDZhN66Ck+/7VboTS1Ibb
d/iR5gId0NUc3KFGQCPzo4roY2p+sp/PqZNm+U5aLZbfmFHsDPCQoh/N17y3Trh/
A4UuVwIy1JklosXvozb040M/rPIPZoei12nEQARYCR2VxafLO3d3OuE9Kp5tzOwb
jCD4Img53VzYcf5KTcj0FgHcPNL18+dRZ3/vn1X27BlPWKRjfTTVOcQbs6cyZuo4
TmjaMYwX2vR7tQ==
=OHtk
-----END PGP SIGNATURE-----
Merge tag 'irq-urgent-2023-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
"A set of fixes for interrupt chip drivers:
- Prevent loss of state in the MIPS GIC interrupt controller
- Disable pseudo NMIs on Mediatek based Chromebooks as they have
firmware issues which cause instantenous chrashes and freezes wen
pseudo NMIs are used
- Fix the error handling path in the MBIGEN driver and a defined but
not used warning in the meson-gpio interrupt chip driver"
* tag 'irq-urgent-2023-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/mbigen: Unify the error handling in mbigen_of_create_domain()
irqchip/meson-gpio: Mark OF related data as maybe unused
irqchip/mips-gic: Use raw spinlock for gic_lock
irqchip/mips-gic: Don't touch vl_map if a local interrupt is not routable
irqchip/gic-v3: Disable pseudo NMIs on Mediatek devices w/ firmware issues
dt-bindings: interrupt-controller: arm,gic-v3: Add quirk for Mediatek SoCs w/ broken FW
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCZHGVRAAKCRCAXGG7T9hj
vqtJAQDizKasLE7tSnfs/FrZ/4xPaDLe3bQifMx2C1dtYCjRcAD/ciZSa1L0WzZP
dpEZnlYRzsR3bwLktQEMQFOvlbh1SwE=
=K860
-----END PGP SIGNATURE-----
Merge tag 'for-linus-6.4-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
- a double free fix in the Xen pvcalls backend driver
- a fix for a regression causing the MSI related sysfs entries to not
being created in Xen PV guests
- a fix in the Xen blkfront driver for handling insane input data
better
* tag 'for-linus-6.4-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
x86/pci/xen: populate MSI sysfs entries
xen/pvcalls-back: fix double frees with pvcalls_new_active_socket()
xen/blkfront: Only check REQ_FUA for writes
Here are some small driver fixes for 6.4-rc4. They are just two
different types:
- binder fixes and reverts for reported problems and regressions in
the binder "driver".
- coresight driver fixes for reported problems.
All of these have been in linux-next for over a week with no reported
problems.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZHG/lQ8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ylPvACgx7Rq0HOFp4k984U/Z+cGzX2wSPIAn2W+6SKx
0Wgv1hIeXQb/p4LO+U2j
=LyzR
-----END PGP SIGNATURE-----
Merge tag 'char-misc-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc fixes from Greg KH:
"Here are some small driver fixes for 6.4-rc4. They are just two
different types:
- binder fixes and reverts for reported problems and regressions in
the binder "driver".
- coresight driver fixes for reported problems.
All of these have been in linux-next for over a week with no reported
problems"
* tag 'char-misc-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
binder: fix UAF of alloc->vma in race with munmap()
binder: add lockless binder_alloc_(set|get)_vma()
Revert "android: binder: stop saving a pointer to the VMA"
Revert "binder_alloc: add missing mmap_lock calls when using the VMA"
binder: fix UAF caused by faulty buffer cleanup
coresight: perf: Release Coresight path when alloc trace id failed
coresight: Fix signedness bug in tmc_etr_buf_insert_barrier_packet()
- Stop trusting capacity data before the "media ready" indication
- Add missing HDM decoder capability enable for the cold-plug case
- Fix a debug message induced crash
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQSbo+XnGs+rwLz9XGXfioYZHlFsZwUCZHFLXgAKCRDfioYZHlFs
Z2XxAQDaJl6CVwalMooLpnDN15eiGH2CFDRec9FK62ZH+m27CgD/Zx5QMZUp6YBC
b9A3Tmx1MzIekO1CYRq9vi4ybXMCAAg=
=cmO2
-----END PGP SIGNATURE-----
Merge tag 'cxl-fixes-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull compute express link fixes from Dan Williams:
"The 'media ready' series prevents the driver from acting on bad
capacity information, and it moves some checks earlier in the init
sequence which impacts topics in the queue for 6.5.
Additional hotplug testing uncovered a missing enable for memory
decode. A debug crash fix is also included.
Summary:
- Stop trusting capacity data before the "media ready" indication
- Add missing HDM decoder capability enable for the cold-plug case
- Fix a debug message induced crash"
* tag 'cxl-fixes-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl: Explicitly initialize resources when media is not ready
cxl/port: Fix NULL pointer access in devm_cxl_add_port()
cxl: Move cxl_await_media_ready() to before capacity info retrieval
cxl: Wait Memory_Info_Valid before access memory related info
cxl/port: Enable the HDM decoder capability for switch ports
There have not been a lot of fixes for for the soc tree in 6.4, but
these have been sitting here for too long.
For the devicetree side, there is one minor warning fix for vexpress,
the rest all all for the the NXP i.MX platforms: SoC specific bugfixes
for the iMX8 clocks and its USB-3.0 gadget device, as well as board
specific fixes for regulators and the phy on some of the i.MX boards.
The microchip risc-v and arm32 maintainers now also add a shared
maintainer file entry for the arm64 parts.
The remaining fixes are all for firmware drivers, addressing mistakes in
the optee, scmi and ff-a firmware driver implementation, mostly in the
error handling code, incorrect use of the alloc_workqueue() interface in
SCMI, and compatibility with corner cases of the firmware implementation.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmRxHzQACgkQYKtH/8kJ
UidW5g//W9PQIG70mWZr1uDIluUyyEnSp4gsuPHgDSSrQS7paSud3E8FF7ofmiKQ
57Tuv+y9vVJtmzfWlR+L1VybPrOSGAsuzqag9/inwcfew7yPlOU9Z9Dse1v2/ClA
GWuMsWjcQmCBgmluSMBupt6ZWbz3cZaLcJAuGmIYzUbeL/Cx45EnMnjc3ESpa6Ca
ow3t/uoAk+KSgjMLyf2bec8BUiueRCJrqQmYHAg4utvehfTUl98epinRbtoi0VDR
5HYlYvYLi0u0em14Nx1CaNVgvM13C5/LOoJNFlDcV+x2LCW9aPCqiRi5vfalOGwj
r0NiA3cPIizJc4oiiK3+PqT5PGfBlkjJx/C6LfCZRsRnu6XegXo/prCNELOJPesD
ecRU9NXAjWup5PLSRPNRz3ekI4yySbM6JnoNEe+I7+djrSkmtjGBggwX6Is+fSY0
KkwhKEml6pI9w9zXh0f63R8QvKo91SUPIe/PgV65p4yeRf5JJgxeJRDk/dyzDfj8
rViloBfTndS762mI7l46xEs3wWY0+CcxRoh53VFe50pqHbrIMY21pjarMEhMSI9V
dFnkXcIIJwkt8TcoLOPMD0xSb0g3C2bEnAK253EeNtBkCdp7eLGYmUdTle+cOjyw
0wafZSrbtBStLby+9F4zxYbrbArqxH5GrKanuX6jyiLtYefAa2U=
=XeTU
-----END PGP SIGNATURE-----
Merge tag 'arm-fixes-6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
"There have not been a lot of fixes for for the soc tree in 6.4, but
these have been sitting here for too long.
For the devicetree side, there is one minor warning fix for vexpress,
the rest all all for the the NXP i.MX platforms: SoC specific bugfixes
for the iMX8 clocks and its USB-3.0 gadget device, as well as board
specific fixes for regulators and the phy on some of the i.MX boards.
The microchip risc-v and arm32 maintainers now also add a shared
maintainer file entry for the arm64 parts.
The remaining fixes are all for firmware drivers, addressing mistakes
in the optee, scmi and ff-a firmware driver implementation, mostly in
the error handling code, incorrect use of the alloc_workqueue()
interface in SCMI, and compatibility with corner cases of the firmware
implementation"
* tag 'arm-fixes-6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
MAINTAINERS: update arm64 Microchip entries
arm64: dts: imx8: fix USB 3.0 Gadget Failure in QM & QXPB0 at super speed
dt-binding: cdns,usb3: Fix cdns,on-chip-buff-size type
arm64: dts: colibri-imx8x: delete adc1 and dsp
arm64: dts: colibri-imx8x: fix iris pinctrl configuration
arm64: dts: colibri-imx8x: move pinctrl property from SoM to eval board
arm64: dts: colibri-imx8x: fix eval board pin configuration
arm64: dts: imx8mp: Fix video clock parents
ARM: dts: imx6qdl-mba6: Add missing pvcie-supply regulator
ARM: dts: imx6ull-dhcor: Set and limit the mode for PMIC buck 1, 2 and 3
arm64: dts: imx8mn-var-som: fix PHY detection bug by adding deassert delay
arm64: dts: imx8mn: Fix video clock parents
firmware: arm_ffa: Set reserved/MBZ fields to zero in the memory descriptors
firmware: arm_ffa: Fix FFA device names for logical partitions
firmware: arm_ffa: Fix usage of partition info get count flag
firmware: arm_ffa: Check if ffa_driver remove is present before executing
arm64: dts: arm: add missing cache properties
ARM: dts: vexpress: add missing cache properties
firmware: arm_scmi: Fix incorrect alloc_workqueue() invocation
optee: fix uninited async notif value
- Test for and return error for invalid pfns through the pin pages
interface. (Yan Zhao)
-----BEGIN PGP SIGNATURE-----
iQJPBAABCAA5FiEEQvbATlQL0amee4qQI5ubbjuwiyIFAmRxEkgbHGFsZXgud2ls
bGlhbXNvbkByZWRoYXQuY29tAAoJECObm247sIsi7OwP/RLQRC9fWJwJYAg+owh5
ExpvVvNUH+2wGKH63rjUNdfIClOG1ScvTKb6InDJYpiOuJw05T/9PAZQNUTEbFn3
/hQe7SRtpeylRAL5mG+q0hP8ZSuACVwB3Zi10kggfdEsEd47+CyhciJfPSbMtNU5
onyK19z8MBXtEy64r5rMISRTuaY5zBBzQ5PrOMDsucKoNr8d0EOionIuisDQws9Z
n5lSziCA9ie7EJAo94PYR4hVTEGBXWFWJp4T8RXPKJWw93cYvidIoPsspIW+tV+5
MXkxBDULm9y0fX9y1d04RCK2LRIRefmRx9Oa5gWKK7EFU0rrW3Uh2bScQiDj9jn8
qTsbQ2vC2/NlHrNIiQx817cIbPosGp8TeAWdbg3elM6aUTeMD0KslNo9Rj7CXxi9
GWPZYb4O4BiggR5evw9e3dC8iKrP76wuUKvXOtwXpBQ9ScbMnhpUVZQNe21K+wtu
hjBYgI+SdqSMgulbBuXRJ2u/hSAA0ybGprR3oyfA8ekZBP6tbG88CRmEvrEvV3IQ
+dMasXL08m4zvjT6Ka1BD+RPfq7r+VoX66g0+IeVCS3/LG45lYCl9wxfU2NLbWnR
bufIVqOtyZ+HQIugtUI4DxiBjXyOsbdmYaiDHKnyhDjWwdk0nIvBAU/VG5yCjgBL
nt1um8xoXemjzuYZfl4s6Y6g
=0s58
-----END PGP SIGNATURE-----
Merge tag 'vfio-v6.4-rc4' of https://github.com/awilliam/linux-vfio
Pull VFIO fix from Alex Williamson:
- Test for and return error for invalid pfns through the pin pages
interface (Yan Zhao)
* tag 'vfio-v6.4-rc4' of https://github.com/awilliam/linux-vfio:
vfio/type1: check pfn valid before converting to struct page
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmRw6C8QHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpm0aD/9h4AAwmFAhVYxcr2Lk4AXwTz1sMDnQ58P3
/OhwvogCsAV9K4venjuDYPL0IdHOq+Txh2S51DnhqGlJlU9xRxPMbTjCToUMqfCw
VjUo0aF9XfQdqPWR2/zh9Zi1YwZd0rC+KBRRvlEdWszoze0H3L8rbxBNllSzdeKl
hECLNpq8z45nzPAYateISs7Cv9OvGuqkaPbjCdHqGQ60Tdt4QIcnnPUUPfntcZHI
tjHnAc3uN4WyrEmMPlTd6MtVV6VCQKAKz65L9bvvNlESverXNv923p8TPzEdb8tR
utSD995Wxh8rBOC3+WA5+d1pYI+C1cUADmgE6QN/8jqu04No4l5Ob3+8ImPmrnGW
a0VSX/BKt63QnXgFFYfn5yBMVMkjtoFtD87WAlLK50fYJTDn+ouVzqhcTsBJ9f/5
xzyFvof8vVp83/Xlkwv/NumDTcyauBobiN6mI1mTBk0hakmxqNgb5xEDvbf5Q/M0
jpp/9nBqmnFJTZRhRVu/G1RdeqDMDN1+/ws3uc3v+K+aeDMGx44l6g3N7f7kAPor
Xk/Cky4x0KirxvpgJzhvS2mOSXbpnAKqYGK2+nIFS00dtbvUX9bERJoxTZbyihC+
AkcPpO92UvxR7nsjgQY7R/euzqD4HKEYExzImT9c3Iug9z3Mtot0FncY6Jb8CEHW
Vkh6wxsg6A==
=MvS9
-----END PGP SIGNATURE-----
Merge tag 'block-6.4-2023-05-26' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
"A few fixes for the storage side of things:
- Fix bio caching condition for passthrough IO (Anuj)
- end-of-device check fix for zero sized devices (Christoph)
- Update Paolo's email address
- NVMe pull request via Keith with a single quirk addition
- Fix regression in how wbt enablement is done (Yu)
- Fix race in active queue accounting (Tian)"
* tag 'block-6.4-2023-05-26' of git://git.kernel.dk/linux:
NVMe: Add MAXIO 1602 to bogus nid list.
block: make bio_check_eod work for zero sized devices
block: fix bio-cache for passthru IO
block, bfq: update Paolo's address in maintainer list
blk-mq: fix race condition in active queue accounting
blk-wbt: fix that wbt can't be disabled by default
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmRw6B0QHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpmudD/96eBoqSNKXwMYnByOpWTst9JdMCaIuqV50
LXJlyht+ti3jJR+K/DrFZS7o5SSPwx2nFEdpsIYmusrowusHdzrHDyX1xoVXWhwF
Gt1xVF8d8MxJ+/JkS+qSp9d1VrCApXPBruINwBHYHGjanmBUWPkkal6Dn9XIbSPg
dU6nMOIUZQdKpx/ThVjIYp+pcOo4fajVIpOJV6x+yFsF2/ySgL6bsVX0AE1XV5O4
jH3K2JtI+xkWXVfBHytPd9oK/UbyUgjVj01Ykxo6bBjjVAX0Yy2a8L8aSayK5o0J
KC1/wX2U493xhjsjGeHCmxNoJzK5yOmdLFsALyPnbLlEtdaXVCK3YPlroxgGu2hJ
UZzmZMYFEMmAXHcc3rXY6KTWHjP9idL0kx+8ncs8BoibLPJeVM3jAIbljaVeN+fU
yvQMN5TWEWypwzHtSCSm/JQzR1NxF42vw/cPoKZ1yub7B9lWtexzZqjH69S6oiuy
SL1HWSYd+LTP5m70uHCKVziONDANMKUBSUVzn63Uwpl/15PuZkx3RAe08k8LO8Eh
S0Z82lvH2qFpNGA0EiGxQviOExH0Ym76lonMLMGEdENv6eDn1mUP4hejHyCJ0Dai
+6cCiO0p+Pala4qxGevHrFn8WQ01+4ffAj6aTgyQnzcR7Cv1nmDJ+yCG69FlQj4D
JnANAsG/EA==
=Ixp+
-----END PGP SIGNATURE-----
Merge tag 'io_uring-6.4-2023-05-26' of git://git.kernel.dk/linux
Pull io_uring fix from Jens Axboe:
"Just a single fix for the conditional schedule with the SQPOLL thread,
dropping the uring_lock if we do need to reschedule"
* tag 'io_uring-6.4-2023-05-26' of git://git.kernel.dk/linux:
io_uring: unlock sqd->lock before sq thread release CPU
Fix a regression introduced inadvertently during the 6.3 cycle by a
commit making the Intel int340x thermal driver use sysfs_emit_at()
instead of scnprintf() (Srinivas Pandruvada).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmRw3GoSHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxd/8P/3/0nPD8CaK5Oa6fBQFwiR8unCTfQSh7
7Tdut0HV47W9m8gj58S3okCXT4iR3Sv9M8wWYbH81oSXrrRKCS1BOVSK65jvHq8Q
3qbnft0xOBaYzcP0zlKQ6D0YkJKTjS8c6uqdiv0xMyvYd7kdbcPACaW+9HutIp9o
k92sSWo/sbxi4aKr9BjoGULwT2AcwbW+4gchrokHh01SFqIDDWFV/lR/hjPkwMkG
HX8LjjOboy1bzy8Vdam5Q41MNKY3jdKz2qabd9kgBgXVd/rTH9D1cfYud07Hdi7A
qpkvQSaPA08jt+Kh+rAZWj4GUU058EEm2hKFg0oqc4oRMxgRcB9MlRu61XTjzhDY
XrPFboOHHD1XWU8hCV0yh3QmjxyVh3njaoSPeosPJLtZ7RFCp90T0SHxSKYCvo7C
d0TEW1JDdVWvoqZl2JB8rjSixKIZX1neKLDjuZc9Fdgx0Y2N70eR7SZTiu3wohSO
muYDHyeJ6SV9DwKWPrdVQGvjNfmMdFay7Z7WdaugTFVI475rLRF05iZyijrie3b0
ntvy4LtqEB4Lj3mdl5I6j7b72YcHMzZCuwl7DHMWwoPo4AsePKYxslpJI/wrf0CD
oUqhz8H/5ftA2VMdWjaIDB10u1pS1qVtNgLahguVc2HM5cCk7TPVw3ZjDlenXofw
GTGRPMJnGtxP
=Qxds
-----END PGP SIGNATURE-----
Merge tag 'thermal-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull thermal control fix from Rafael Wysocki:
"Fix a regression introduced inadvertently during the 6.3 cycle by a
commit making the Intel int340x thermal driver use sysfs_emit_at()
instead of scnprintf() (Srinivas Pandruvada)"
* tag 'thermal-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: intel: int340x: Add new line for UUID display
Fix 3 issues related to the ->fast_switch callback in the AMD
P-state cpufreq driver (Gautham R. Shenoy and Wyes Karny).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmRw2+cSHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxW68P/2W9mcDxlepkfN9KEUgZ6ianmoRdHqSU
KpL8uB4Snavd0QQYnRUbNPeDht5cAMZT1cEzmcdkGAzGz2SpD/44LH0T8zFO24aM
SAdlo6j72VWiRmR5POvEZbKPRczCWFPlpqKkQyh227Msb5tGayo7fRa1VpRqo/AH
YwYWaW5Ur5XfU0c8E5K+P3Rj2dz+RsFzZ4chb5yd14ae2Qr8f6EnOVte0ja0bFUF
uPneqJ27U4MoDW1s/bkud32lJTeGLi8BwgIIyP6L7a8m5QdWxv+pTZMsa/NW7a+v
3+3GrXoLZp6xeBGdt0z6/04BwpcRUh6ZpVlloGeYoSMpnQpGleqeJaVepWZCAKDm
K0Gc7nLTLQUMO877oOz17cDLdDgXbIDmngw9ubEW0N7g5yOGzSCTfQyYYItN0dky
mnTMXU5TM2LktH3V1y34HTTZTZU5TDZkLzoFNAgHqmlFhEqEdaN6l5AJJ6rPt9IM
ZwJNpLz3R4rQ+XwGofebQGSlwpQVAzUrigiHJ2VJAbFxWqw4iohYkgD+1e8B/VzG
e5DygskN1FFTJ7oMXlPSDWqulCRghOillqtUWocdHJAbnkglkLpexxqUAsoNahDt
qA553sM5/lp29uNlpuz6AbqHscjOto94aC3uK3kxNUGsDy6CRcbn5VVIN5ceDVk+
SnzKHfnh65vv
=6XJv
-----END PGP SIGNATURE-----
Merge tag 'pm-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"Fix three issues related to the ->fast_switch callback in the AMD
P-state cpufreq driver (Gautham R. Shenoy and Wyes Karny)"
* tag 'pm-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: amd-pstate: Update policy->cur in amd_pstate_adjust_perf()
cpufreq: amd-pstate: Remove fast_switch_possible flag from active driver
cpufreq: amd-pstate: Add ->fast_switch() callback
When media is not ready do not assume that the capacity information from
the identify command is valid, i.e. ->total_bytes
->partition_align_bytes ->{volatile,persistent}_only_bytes. Explicitly
zero out the capacity resources and exit early.
Given zero-init of those fields this patch is functionally equivalent to
the prior state, but it improves readability and robustness going
forward.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/168506118166.3004974.13523455340007852589.stgit@djiang5-mobl3
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
- fix incorrect output in in-tree gpio tools
- fix a shell coding issue in gpio-sim selftests
- correctly set the permissions for debugfs attributes exposed by gpio-mockup
- fix chip name and pin count in gpio-f7188x for one of the supported models
- fix numberspace pollution when using dynamically and statically allocated
GPIOs together
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEFp3rbAvDxGAT0sefEacuoBRx13IFAmRws3YACgkQEacuoBRx
13JHlhAAkyUNjtvkcTdO9buB2AXB3e/JH31MgSEEzoAi9kr54n70D+mtmSz5iX0X
4a1KjZrd5Gd7nSGA3AHll7jxL1Rfa4vJsTID/khfhIPB0arGl7brZ0J2fItRUGqz
oxAvpH3OpSWI+QWQMzp/NEZV0F5rfWBme93w6OqYItzGNr4LfP+3x49zPDUngOjp
d3EsDzQbgxezYvrmWjPvZuZvk3RYud+dfU8bjvtnI6TJZqIxzePmcqEAFgqsnDJF
VLEJQFPGpLaX7U+ZepcAQAs2s5ggk6Mb0JdGKp1w9hQ9w91TClBUgUURG6hzdLmZ
w6/X3iGaFGaHugA4DmWhYd4FWE7H58dWazw/RVK4RM2ZGZ9cT8R8OVQlDsBhesyG
tOTY4RJlU1DgMt66nlUvf8r3ECdm2ayvFq7qSN9xRDQ0W4Ti3+QU2+/re3s8jevR
VYAo1BHtVai28+lCMjopK+fYBe8G4DqsGx9Uh3y0RxYIFvMKdo4EQF9Ku3yTEcq3
3FNFO00KU5ktKgnx7z6bnO7+F3OvHUeSRh44U8O6pm/odfdkUn7CHjFVi2dM6yO5
763UKwMZq9rDT2owdwljolIRECb8IAN0siBhhHYYW4oAFTD2w2vTQ/y2KYrxbntv
CptwauQMM9vjzg9bxMP+4R5YVZhZhtJ4I1oa2kQx7eXIPBy/6OM=
=R4dV
-----END PGP SIGNATURE-----
Merge tag 'gpio-fixes-for-v6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
- fix incorrect output in in-tree gpio tools
- fix a shell coding issue in gpio-sim selftests
- correctly set the permissions for debugfs attributes exposed by
gpio-mockup
- fix chip name and pin count in gpio-f7188x for one of the supported
models
- fix numberspace pollution when using dynamically and statically
allocated GPIOs together
* tag 'gpio-fixes-for-v6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio-f7188x: fix chip name and pin count on Nuvoton chip
gpiolib: fix allocation of mixed dynamic/static GPIOs
gpio: mockup: Fix mode of debugfs files
selftests: gpio: gpio-sim: Fix BUG: test FAILED due to recent change
tools: gpio: fix debounce_period_us output of lsgpio
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmRwqRsACgkQxWXV+ddt
WDuCPQ//T8JVY6usnGF/Fw/3zbtDNvrdQLDfp3HovIg7gmLIBda0bT05w4Q46FUU
l4BV0bHyTUNWPlXUmrrSmt8HipRe2z4Wjwc16azdLmSs5zf0FO1LbsCKDmM8Ncid
LTi2jzyyb3E44ZzC/i7RCaBt+vYRb2ZmtZ/glh3K4H0GgTAYl1GxZoAoYgBnvmlG
nvmlWWDaM2cRKaUREm75il37LKLIlW5jvdUFQrqwWNgUH72ay5/7SZxHywlk8x6b
qwhhp+s6bMUNzi6CqE2SLnESjI9yl0l/0gLebhDXVulo0BiCrti+YLpueP4eQs1B
yYXX3PvHOXhoN4tUQ4yDF9G57To4Gw1aiQOnWOOLcbyGG1ZgyekpoRRXh6r74LKt
FDyWT+u/xd78by1km3VzqmvKtqHnRFNMYfP+MMDIhyhy5prKCWeVo7bC+2FP+89o
kv9+0Z0w0lkLycFfLaewZkEv0/WY8GMuT7kptHQ2Ao6ulAvG+j97sgVBFGXJjeCr
B1OAGdeTF79IV139bCxPA62cat87Zrh15mZN+y7U32Vs2JkOqbT0LTQGKoVs/TCI
AyHCDb8oOfGiebibnEDrDNtubz7NFCq4ntZRmuv5FJ+l2d1wl6ZvsI+DoYP7Zide
DLR7ZtPs1Yvm27xDjs+fVmMx4nuNGikEbPZPxJro1CjLVzCEt7k=
=elHB
-----END PGP SIGNATURE-----
Merge tag 'for-6.4-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- handle memory allocation error in checksumming helper (reported by
syzbot)
- fix lockdep splat when aborting a transaction, add NOFS protection
around invalidate_inode_pages2 that could allocate with GFP_KERNEL
- reduce chances to hit an ENOSPC during scrub with RAID56 profiles
* tag 'for-6.4-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: use nofs when cleaning up aborted transactions
btrfs: handle memory allocation failure in btrfs_csum_one_bio
btrfs: scrub: try harder to mark RAID56 block groups read-only
core:
- fix drmm_mutex_init lock class
mgag200:
- fix gamma lut initialisation
pl111:
- fix FB depth on IMPD-1 framebuffer
amdgpu:
- Fix missing BO unlocking in KIQ error path
- Avoid spurious secure display error messages
- SMU13 fix
- Fix an OD regression
- GPU reset display IRQ warning fix
- MST fix
radeon:
- Fix a DP regression
i915:
- PIPEDMC disabling fix for bigjoiner config
panel:
- fix aya neo air plus quirk
sched:
- remove redundant NULL check
qaic:
- fix NNC message corruption
- Grab ch_lock during QAIC_ATTACH_SLICE_BO
- Flush the transfer list again
- Validate if BO is sliced before slicing
- Validate user data before grabbing any lock
- initialize ret variable to 0
- silence some uninitialized variable warnings
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmRwSkgACgkQDHTzWXnE
hr7biw//U0n2qc0rBv+G2qdCu7zF6kuJZ83gStBRiybTm0qynUJA+wppwSyqUEDg
EQqoIy0hMiKpcAxFQuXpLA3tk2F7UGHxEV2aaKEMhwrU+HBCDrpUqVCZjhe6DQbT
bwj+SMvPCfrHhGfPbiGw7Vsw/qPYbC5ulrsn2th9vvwDuxL1wwtYIpJzA/xhV7eO
sWjUKYVtuXSrI/oqZLzoZU6YCjf7G2wShkkSHTna/zjmHdWbYEiyqZ6huL70LJBD
fZWI5w9HM8ABMMrE4W95TubMZOwDvsG7mb6G0WnS7P4Lq6Aab5CT1I6IkBB86pYz
kSW1VMUpPIE9+E7m3OBL66L/i6xCmBZ5nEHqjsK9DYvpHX/L2IG26SdE/e6Ai29Y
aMLxi4hM45FNfAOIB9Z36JJ9gaYCXf2hL/KAmrWovVlpT1vLkH/6DZ3nWNiNuhvT
5PKKTM4w3zOHzv0R/xrb3dfg+0idSPlEd4+0bsjM5STAMc1pyp6PhEHtEI2HqFwM
vkJ+RqK/lVXnDXyjgZnjbGsDaUQ3Rcw8/G3l+79j8mJcfeRYknGw/698BGPDBcpG
9hMxxIk8ttXYqhwmTCGTy1D/xf3lPOLmdTTACokvOEyd6VxijaeI2apbV0HYVnca
mPvSdxpKs+xx6nACSuxOG02E6Z4k85/kdNZeCmHEV9KGJACLIpw=
=IDNm
-----END PGP SIGNATURE-----
Merge tag 'drm-fixes-2023-05-26' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"This week's collection is pretty spread out, accel/qaic has a bunch of
fixes, amdgpu, then lots of single fixes across a bunch of places.
core:
- fix drmm_mutex_init lock class
mgag200:
- fix gamma lut initialisation
pl111:
- fix FB depth on IMPD-1 framebuffer
amdgpu:
- Fix missing BO unlocking in KIQ error path
- Avoid spurious secure display error messages
- SMU13 fix
- Fix an OD regression
- GPU reset display IRQ warning fix
- MST fix
radeon:
- Fix a DP regression
i915:
- PIPEDMC disabling fix for bigjoiner config
panel:
- fix aya neo air plus quirk
sched:
- remove redundant NULL check
qaic:
- fix NNC message corruption
- Grab ch_lock during QAIC_ATTACH_SLICE_BO
- Flush the transfer list again
- Validate if BO is sliced before slicing
- Validate user data before grabbing any lock
- initialize ret variable to 0
- silence some uninitialized variable warnings"
* tag 'drm-fixes-2023-05-26' of git://anongit.freedesktop.org/drm/drm:
drm/amd/display: Have Payload Properly Created After Resume
drm/amd/display: Fix warning in disabling vblank irq
drm/amd/pm: Fix output of pp_od_clk_voltage
drm/amd/pm: add missing NotifyPowerSource message mapping for SMU13.0.7
drm/radeon: reintroduce radeon_dp_work_func content
drm/amdgpu: don't enable secure display on incompatible platforms
drm:amd:amdgpu: Fix missing buffer object unlock in failure path
accel/qaic: Fix NNC message corruption
accel/qaic: Grab ch_lock during QAIC_ATTACH_SLICE_BO
accel/qaic: Flush the transfer list again
accel/qaic: Validate if BO is sliced before slicing
accel/qaic: Validate user data before grabbing any lock
accel/qaic: initialize ret variable to 0
drm/i915: Fix PIPEDMC disabling for a bigjoiner configuration
drm: fix drmm_mutex_init()
drm/sched: Remove redundant check
drm: panel-orientation-quirks: Change Air's quirk to support Air Plus
accel/qaic: silence some uninitialized variable warnings
drm/pl111: Fix FB depth on IMPD-1 framebuffer
drm/mgag200: Fix gamma lut not initialized.
I tried to streamline our user memory copy code fairly aggressively in
commit adfcf4231b8c ("x86: don't use REP_GOOD or ERMS for user memory
copies"), in order to then be able to clean up the code and inline the
modern FSRM case in commit 577e6a7fd50d ("x86: inline the 'rep movs' in
user copies for the FSRM case").
We had reports [1] of that causing regressions earlier with blogbench,
but that turned out to be a horrible benchmark for that case, and not a
sufficient reason for re-instating "rep movsb" on older machines.
However, now Eric Dumazet reported [2] a regression in performance that
seems to be a rather more real benchmark, where due to the removal of
"rep movs" a TCP stream over a 100Gbps network no longer reaches line
speed.
And it turns out that with the simplified the calling convention for the
non-FSRM case in commit 427fda2c8a49 ("x86: improve on the non-rep
'copy_user' function"), re-introducing the ERMS case is actually fairly
simple.
Of course, that "fairly simple" is glossing over several missteps due to
having to fight our assembler alternative code. This code really wanted
to rewrite a conditional branch to have two different targets, but that
made objtool sufficiently unhappy that this instead just ended up doing
a choice between "jump to the unrolled loop, or use 'rep movsb'
directly".
Let's see if somebody finds a case where the kernel memory copies also
care (see commit 68674f94ffc9: "x86: don't use REP_GOOD or ERMS for
small memory copies"). But Eric does argue that the user copies are
special because networking tries to copy up to 32KB at a time, if
order-3 pages allocations are possible.
In-kernel memory copies are typically small, unless they are the special
"copy pages at a time" kind that still use "rep movs".
Link: https://lore.kernel.org/lkml/202305041446.71d46724-yujie.liu@intel.com/ [1]
Link: https://lore.kernel.org/lkml/CANn89iKUbyrJ=r2+_kK+sb2ZSSHifFZ7QkPLDpAtkJ8v4WUumA@mail.gmail.com/ [2]
Reported-and-tested-by: Eric Dumazet <edumazet@google.com>
Fixes: adfcf4231b8c ("x86: don't use REP_GOOD or ERMS for user memory copies")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull NVMe fix from Keith:
"nvme fixes for 6.4
One nvme quirk (Tatsuki)"
* tag 'nvme-6.4-2023-05-26' of git://git.infradead.org/nvme:
NVMe: Add MAXIO 1602 to bogus nid list.