IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Similarly to util_avg and util_sum, don't sync load_sum with the low
bound of load_avg but only ensure that load_sum stays in the correct range.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Link: https://lkml.kernel.org/r/20220111134659.24961-5-vincent.guittot@linaro.org
Similarly to util_avg and util_sum, don't sync runnable_sum with the low
bound of runnable_avg but only ensure that runnable_sum stays in the
correct range.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Link: https://lkml.kernel.org/r/20220111134659.24961-4-vincent.guittot@linaro.org
Rick reported performance regressions in bugzilla because of cpu frequency
being lower than before:
https://bugzilla.kernel.org/show_bug.cgi?id=215045
He bisected the problem to:
commit 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")
This commit forces util_sum to be synced with the new util_avg after
removing the contribution of a task and before the next periodic sync. By
doing so util_sum is rounded to its lower bound and might lost up to
LOAD_AVG_MAX-1 of accumulated contribution which has not yet been
reflected in util_avg.
update_tg_cfs_util() is not the only place where we round util_sum and
lost some accumulated contributions that are not already reflected in
util_avg. Modify update_tg_cfs_util() and detach_entity_load_avg() to not
sync util_sum with the new util_avg. Instead of always setting util_sum to
the low bound of util_avg, which can significantly lower the utilization,
we propagate the difference. In addition, we also check that cfs's util_sum
always stays above the lower bound for a given util_avg as it has been
observed that sched_entity's util_sum is sometimes above cfs one.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Link: https://lkml.kernel.org/r/20220111134659.24961-3-vincent.guittot@linaro.org
Rick reported performance regressions in bugzilla because of cpu frequency
being lower than before:
https://bugzilla.kernel.org/show_bug.cgi?id=215045
He bisected the problem to:
commit 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")
This commit forces util_sum to be synced with the new util_avg after
removing the contribution of a task and before the next periodic sync. By
doing so util_sum is rounded to its lower bound and might lost up to
LOAD_AVG_MAX-1 of accumulated contribution which has not yet been
reflected in util_avg.
Instead of always setting util_sum to the low bound of util_avg, which can
significantly lower the utilization of root cfs_rq after propagating the
change down into the hierarchy, we revert the change of util_sum and
propagate the difference.
In addition, we also check that cfs's util_sum always stays above the
lower bound for a given util_avg as it has been observed that
sched_entity's util_sum is sometimes above cfs one.
Fixes: 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")
Reported-by: Rick Yiu <rickyiu@google.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Link: https://lkml.kernel.org/r/20220111134659.24961-2-vincent.guittot@linaro.org
With write operation on psi files replacing old trigger with a new one,
the lifetime of its waitqueue is totally arbitrary. Overwriting an
existing trigger causes its waitqueue to be freed and pending poll()
will stumble on trigger->event_wait which was destroyed.
Fix this by disallowing to redefine an existing psi trigger. If a write
operation is used on a file descriptor with an already existing psi
trigger, the operation will fail with EBUSY error.
Also bypass a check for psi_disabled in the psi_trigger_destroy as the
flag can be flipped after the trigger is created, leading to a memory
leak.
Fixes: 0e94682b73bf ("psi: introduce psi monitor")
Reported-by: syzbot+cdb5dd11c97cc532efad@syzkaller.appspotmail.com
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Analyzed-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220111232309.1786347-1-surenb@google.com
Time readers that cannot take locks (due to NMI etc..) currently make
use of perf_event::shadow_ctx_time, which, for that event gives:
time' = now + (time - timestamp)
or, alternatively arranged:
time' = time + (now - timestamp)
IOW, the progression of time since the last time the shadow_ctx_time
was updated.
There's problems with this:
A) the shadow_ctx_time is per-event, even though the ctx_time it
reflects is obviously per context. The direct concequence of this
is that the context needs to iterate all events all the time to
keep the shadow_ctx_time in sync.
B) even with the prior point, the context itself might not be active
meaning its time should not advance to begin with.
C) shadow_ctx_time isn't consistently updated when ctx_time is
There are 3 users of this stuff, that suffer differently from this:
- calc_timer_values()
- perf_output_read()
- perf_event_update_userpage() /* A */
- perf_event_read_local() /* A,B */
In particular, perf_output_read() doesn't suffer at all, because it's
sample driven and hence only relevant when the event is actually
running.
This same was supposed to be true for perf_event_update_userpage(),
after all self-monitoring implies the context is active *HOWEVER*, as
per commit f79256532682 ("perf/core: fix userpage->time_enabled of
inactive events") this goes wrong when combined with counter
overcommit, in that case those events that do not get scheduled when
the context becomes active (task events typically) miss out on the
EVENT_TIME update and ENABLED time is inflated (for a little while)
with the time the context was inactive. Once the event gets rotated
in, this gets corrected, leading to a non-monotonic timeflow.
perf_event_read_local() made things even worse, it can request time at
any point, suffering all the problems perf_event_update_userpage()
does and more. Because while perf_event_update_userpage() is limited
by the context being active, perf_event_read_local() users have no
such constraint.
Therefore, completely overhaul things and do away with
perf_event::shadow_ctx_time. Instead have regular context time updates
keep track of this offset directly and provide perf_event_time_now()
to complement perf_event_time().
perf_event_time_now() will, in adition to being context wide, also
take into account if the context is active. For inactive context, it
will not advance time.
This latter property means the cgroup perf_cgroup_info context needs
to grow addition state to track this.
Additionally, since all this is strictly per-cpu, we can use barrier()
to order context activity vs context time.
Fixes: 7d9285e82db5 ("perf/bpf: Extend the perf_event_read_local() interface, a.k.a. "bpf: perf event change needed for subsequent bpf helpers"")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Song Liu <song@kernel.org>
Tested-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/r/YcB06DasOBtU0b00@hirez.programming.kicks-ass.net
Pull module updates from Luis Chamberlain:
"The biggest change here is in-kernel support for module decompression.
This change is being made to help support LSMs like LoadPin as
otherwise it loses link between the source of kernel module on the
disk and binary blob that is being loaded into the kernel.
kmod decompression is still done by userspace even with this is done,
both because there are no measurable gains in not doing so and as it
adds a secondary extra check for validating the module before loading
it into the kernel.
The rest of the changes are minor, the only other change worth
mentionin there is Jessica Yu is now bowing out of maintenance of
modules as she's taking a break from work.
While there were other changes posted for modules, those have not yet
received much review of testing so I'm not yet comfortable in merging
any of those changes yet."
* 'modules-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
module: fix signature check failures when using in-kernel decompression
kernel: Fix spelling mistake "compresser" -> "compressor"
MAINTAINERS: add mailing lists for kmod and modules
module.h: allow #define strings to work with MODULE_IMPORT_NS
module: add in-kernel support for decompressing
MAINTAINERS: Remove myself as modules maintainer
module: Remove outdated comment
Pull signal/exit/ptrace updates from Eric Biederman:
"This set of changes deletes some dead code, makes a lot of cleanups
which hopefully make the code easier to follow, and fixes bugs found
along the way.
The end-game which I have not yet reached yet is for fatal signals
that generate coredumps to be short-circuit deliverable from
complete_signal, for force_siginfo_to_task not to require changing
userspace configured signal delivery state, and for the ptrace stops
to always happen in locations where we can guarantee on all
architectures that the all of the registers are saved and available on
the stack.
Removal of profile_task_ext, profile_munmap, and profile_handoff_task
are the big successes for dead code removal this round.
A bunch of small bug fixes are included, as most of the issues
reported were small enough that they would not affect bisection so I
simply added the fixes and did not fold the fixes into the changes
they were fixing.
There was a bug that broke coredumps piped to systemd-coredump. I
dropped the change that caused that bug and replaced it entirely with
something much more restrained. Unfortunately that required some
rebasing.
Some successes after this set of changes: There are few enough calls
to do_exit to audit in a reasonable amount of time. The lifetime of
struct kthread now matches the lifetime of struct task, and the
pointer to struct kthread is no longer stored in set_child_tid. The
flag SIGNAL_GROUP_COREDUMP is removed. The field group_exit_task is
removed. Issues where task->exit_code was examined with
signal->group_exit_code should been examined were fixed.
There are several loosely related changes included because I am
cleaning up and if I don't include them they will probably get lost.
The original postings of these changes can be found at:
https://lkml.kernel.org/r/87a6ha4zsd.fsf@email.froward.int.ebiederm.org
https://lkml.kernel.org/r/87bl1kunjj.fsf@email.froward.int.ebiederm.org
https://lkml.kernel.org/r/87r19opkx1.fsf_-_@email.froward.int.ebiederm.org
I trimmed back the last set of changes to only the obviously correct
once. Simply because there was less time for review than I had hoped"
* 'signal-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (44 commits)
ptrace/m68k: Stop open coding ptrace_report_syscall
ptrace: Remove unused regs argument from ptrace_report_syscall
ptrace: Remove second setting of PT_SEIZED in ptrace_attach
taskstats: Cleanup the use of task->exit_code
exit: Use the correct exit_code in /proc/<pid>/stat
exit: Fix the exit_code for wait_task_zombie
exit: Coredumps reach do_group_exit
exit: Remove profile_handoff_task
exit: Remove profile_task_exit & profile_munmap
signal: clean up kernel-doc comments
signal: Remove the helper signal_group_exit
signal: Rename group_exit_task group_exec_task
coredump: Stop setting signal->group_exit_task
signal: Remove SIGNAL_GROUP_COREDUMP
signal: During coredumps set SIGNAL_GROUP_EXIT in zap_process
signal: Make coredump handling explicit in complete_signal
signal: Have prepare_signal detect coredumps using signal->core_state
signal: Have the oom killer detect coredumps using signal->core_state
exit: Move force_uaccess back into do_exit
exit: Guarantee make_task_dead leaks the tsk when calling do_task_exit
...
-----BEGIN PGP SIGNATURE-----
iQFHBAABCAAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmHhw7oTHHdlaS5saXVA
a2VybmVsLm9yZwAKCRB2FHBfkEGgXrjSB/979LV4Dn1PMcFYsSdlFEMeHcjzJdw/
kFnLPXMaPJyfg6QPuf83jxzw9uxw8fcePMdVq/FFBtmVV9fJMAv62B8jaGS1p58c
WnAg+7zsTN+xEoJn+tskSSon8BNMWVrl41zP3K4Ged+5j8UEBk62GB8Orz1qkpwL
fTh3/+xAvczJeD4zZb1dAm4WnmcQJ4vhg45p07jX6owvnwQAikMFl45aSW54I5o8
vAxGzFgdsZ2NtExnRNKh3b3DozA8JUE89KckBSZnDtq4rH8Fyy6Wij56Hc6v6Cml
SUohiNbHX7hsNwit/lxL8wuF97IiA0pQSABobEg3rxfTghTUep51LlaN
=/m4A
-----END PGP SIGNATURE-----
Merge tag 'hyperv-next-signed-20220114' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv updates from Wei Liu:
- More patches for Hyper-V isolation VM support (Tianyu Lan)
- Bug fixes and clean-up patches from various people
* tag 'hyperv-next-signed-20220114' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
scsi: storvsc: Fix storvsc_queuecommand() memory leak
x86/hyperv: Properly deal with empty cpumasks in hyperv_flush_tlb_multi()
Drivers: hv: vmbus: Initialize request offers message for Isolation VM
scsi: storvsc: Fix unsigned comparison to zero
swiotlb: Add CONFIG_HAS_IOMEM check around swiotlb_mem_remap()
x86/hyperv: Fix definition of hv_ghcb_pg variable
Drivers: hv: Fix definition of hypercall input & output arg variables
net: netvsc: Add Isolation VM support for netvsc driver
scsi: storvsc: Add Isolation VM support for storvsc driver
hyper-v: Enable swiotlb bounce buffer for Isolation VM
x86/hyper-v: Add hyperv Isolation VM check in the cc_platform_has()
swiotlb: Add swiotlb bounce buffer remap function for HV IVM
New:
- The Real Time Linux Analysis (RTLA) tool is added to the tools directory.
- Can safely filter on user space pointers with: field.ustring ~ "match-string"
- eprobes can now be filtered like any other event.
- trace_marker(_raw) now uses stream_open() to allow multiple threads to safely
write to it. Note, this could possibly break existing user space, but we will
not know until we hear about it, and then can revert the change if need be.
- New field in events to display when bottom halfs are disabled.
- Sorting of the ftrace functions are now done at compile time instead of
at bootup.
Infrastructure changes to support future efforts:
- Added __rel_loc type for trace events. Similar to __data_loc but the offset
to the dynamic data is based off of the location of the descriptor and not
the beginning of the event. Needed for user defined events.
- Some simplification of event trigger code.
- Make synthetic events process its callback better to not hinder other
event callbacks that are registered. Needed for user defined events.
And other small fixes and clean ups.
-
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYeGvcxQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qrZtAP9ICjJxX54MTErElhhUL/NFLV7wqhJi
OIAgmp6jGVRqPAD+JxQtBnGH+3XMd71ioQkTfQ1rp+jBz2ERBj2DmELUAg0=
=zmda
-----END PGP SIGNATURE-----
Merge tag 'trace-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"New:
- The Real Time Linux Analysis (RTLA) tool is added to the tools
directory.
- Can safely filter on user space pointers with: field.ustring ~
"match-string"
- eprobes can now be filtered like any other event.
- trace_marker(_raw) now uses stream_open() to allow multiple threads
to safely write to it. Note, this could possibly break existing
user space, but we will not know until we hear about it, and then
can revert the change if need be.
- New field in events to display when bottom halfs are disabled.
- Sorting of the ftrace functions are now done at compile time
instead of at bootup.
Infrastructure changes to support future efforts:
- Added __rel_loc type for trace events. Similar to __data_loc but
the offset to the dynamic data is based off of the location of the
descriptor and not the beginning of the event. Needed for user
defined events.
- Some simplification of event trigger code.
- Make synthetic events process its callback better to not hinder
other event callbacks that are registered. Needed for user defined
events.
And other small fixes and cleanups"
* tag 'trace-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (50 commits)
tracing: Add ustring operation to filtering string pointers
rtla: Add rtla timerlat hist documentation
rtla: Add rtla timerlat top documentation
rtla: Add rtla timerlat documentation
rtla: Add rtla osnoise hist documentation
rtla: Add rtla osnoise top documentation
rtla: Add rtla osnoise man page
rtla: Add Documentation
rtla/timerlat: Add timerlat hist mode
rtla: Add timerlat tool and timelart top mode
rtla/osnoise: Add the hist mode
rtla/osnoise: Add osnoise top mode
rtla: Add osnoise tool
rtla: Helper functions for rtla
rtla: Real-Time Linux Analysis tool
tracing/osnoise: Properly unhook events if start_per_cpu_kthreads() fails
tracing: Remove duplicate warnings when calling trace_create_file()
tracing/kprobes: 'nmissed' not showed correctly for kretprobe
tracing: Add test for user space strings when filtering on string pointers
tracing: Have syscall trace events use trace_event_buffer_lock_reserve()
...
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmHhbrwACgkQUqAMR0iA
lPJkAg//ZSvlc/dB7LdAiVmc2JBnDC49yxn7OguCW6t5T8RmVuaurPa/ZNDcZomw
tbT7QSjndg0LHAGiWRrjGzQZ89D7/cbTzOxz821F0LLFDjiEMbUB+tCYznHud1cP
5+83QMwAkpME+RAj471SAHlhGAGJ3YKgFBbUP9sHSbtQqXzLGHBl4MluMh1wpVPq
K1lyVG8TaOCXhWtpuO8MVrz7Q52qqi/85fHZKbQq9Tm5x+nDxVfGw24QcrkQB/Il
4B2T5UP9rzbHQYemc0hascLMJc1mxcFMtK8WKH48cPEL+kQarvsRn6fvVlCxdNUe
ZgQfhixDQPQyC0/CaN7qM9EOpKbwoY8+Lc36kq06YMAJj/NPa4LE/1j2DyRvaein
lJ2vy/E/TdaR71pApL02P5BeU4XAmhm10D/Qpj0pgRuERYVayNE/7wqzyrae7mqm
8zgpUDftGS6BXnwmRWmgFYI5bZWVfzwGjs+JjKIDXPhWHZInpOCkHGCehvabd+4U
02jwCAmKmpkhAxslm2579CZ0+VFK52z5mUwrWwsxhflUD1q3dNj7CAtPKYLqw5Xe
iz2rrulDlwVll5pdbRf18A/+I2YSdz8pDwCpI6WKggompd6r+du3Y/GTGTBhoRcD
L9Q/rENx1gpJgdZYL5zefKlrjMSQhdwp3CRk34DEronTTkty5mc=
=fTdn
-----END PGP SIGNATURE-----
Merge tag 'livepatching-for-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching
Pull livepatching updates from Petr Mladek:
- Correctly handle kobjects when a livepatch init fails
- Avoid CPU hogging when searching for many livepatched symbols
- Add livepatch API page into documentation
* tag 'livepatching-for-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching:
livepatch: Avoid CPU hogging with cond_resched
livepatch: Fix missing unlock on error in klp_enable_patch()
livepatch: Fix kobject refcount bug on klp_init_patch_early failure path
Documentation: livepatch: Add livepatch API page
Merge misc updates from Andrew Morton:
"146 patches.
Subsystems affected by this patch series: kthread, ia64, scripts,
ntfs, squashfs, ocfs2, vfs, and mm (slab-generic, slab, kmemleak,
dax, kasan, debug, pagecache, gup, shmem, frontswap, memremap,
memcg, selftests, pagemap, dma, vmalloc, memory-failure, hugetlb,
userfaultfd, vmscan, mempolicy, oom-kill, hugetlbfs, migration, thp,
ksm, page-poison, percpu, rmap, zswap, zram, cleanups, hmm, and
damon)"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (146 commits)
mm/damon: hide kernel pointer from tracepoint event
mm/damon/vaddr: hide kernel pointer from damon_va_three_regions() failure log
mm/damon/vaddr: use pr_debug() for damon_va_three_regions() failure logging
mm/damon/dbgfs: remove an unnecessary variable
mm/damon: move the implementation of damon_insert_region to damon.h
mm/damon: add access checking for hugetlb pages
Docs/admin-guide/mm/damon/usage: update for schemes statistics
mm/damon/dbgfs: support all DAMOS stats
Docs/admin-guide/mm/damon/reclaim: document statistics parameters
mm/damon/reclaim: provide reclamation statistics
mm/damon/schemes: account how many times quota limit has exceeded
mm/damon/schemes: account scheme actions that successfully applied
mm/damon: remove a mistakenly added comment for a future feature
Docs/admin-guide/mm/damon/usage: update for kdamond_pid and (mk|rm)_contexts
Docs/admin-guide/mm/damon/usage: mention tracepoint at the beginning
Docs/admin-guide/mm/damon/usage: remove redundant information
Docs/admin-guide/mm/damon/usage: update for scheme quotas and watermarks
mm/damon: convert macro functions to static inline functions
mm/damon: modify damon_rand() macro to static inline function
mm/damon: move damon_rand() definition into damon.h
...
cpumask_first() is a more effective analogue of 'next' version if n == -1
(which means start == 0). This patch replaces 'next' with 'first' where
things look trivial.
There's no cpumask_first_zero() function, so create it.
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Currently three dma atomic pools are initialized as long as the relevant
kernel codes are built in. While in kdump kernel of x86_64, this is not
right when trying to create atomic_pool_dma, because there's no managed
pages in DMA zone. In the case, DMA zone only has low 1M memory
presented and locked down by memblock allocator. So no pages are added
into buddy of DMA zone. Please check commit f1d4d47c5851 ("x86/setup:
Always reserve the first 1M of RAM").
Then in kdump kernel of x86_64, it always prints below failure message:
DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations
swapper/0: page allocation failure: order:5, mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.13.0-0.rc5.20210611git929d931f2b40.42.fc35.x86_64 #1
Hardware name: Dell Inc. PowerEdge R910/0P658H, BIOS 2.12.0 06/04/2018
Call Trace:
dump_stack+0x7f/0xa1
warn_alloc.cold+0x72/0xd6
__alloc_pages_slowpath.constprop.0+0xf29/0xf50
__alloc_pages+0x24d/0x2c0
alloc_page_interleave+0x13/0xb0
atomic_pool_expand+0x118/0x210
__dma_atomic_pool_init+0x45/0x93
dma_atomic_pool_init+0xdb/0x176
do_one_initcall+0x67/0x320
kernel_init_freeable+0x290/0x2dc
kernel_init+0xa/0x111
ret_from_fork+0x22/0x30
Mem-Info:
......
DMA: failed to allocate 128 KiB GFP_KERNEL|GFP_DMA pool for atomic allocation
DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations
Here, let's check if DMA zone has managed pages, then create
atomic_pool_dma if yes. Otherwise just skip it.
Link: https://lkml.kernel.org/r/20211223094435.248523-3-bhe@redhat.com
Fixes: 6f599d84231f ("x86/kdump: Always reserve the low 1M when the crashkernel option is specified")
Signed-off-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: John Donnelly <john.p.donnelly@oracle.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: David Rientjes <rientjes@google.com>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For embedded systems with low total memory, having to run applications
with relatively large memory requirements, 10% max limitation for
watermark_scale_factor poses an issue of triggering direct reclaim every
time such application is started. This results in slow application
startup times and bad end-user experience.
By increasing watermark_scale_factor max limit we allow vendors more
flexibility to choose the right level of kswapd aggressiveness for their
device and workload requirements.
Link: https://lkml.kernel.org/r/20211124193604.2758863-1-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Lukas Middendorf <kernel@tuxforce.de>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Zhang Yi <yi.zhang@huawei.com>
Cc: Fengfei Xi <xi.fengfei@h3c.com>
Cc: Mike Rapoport <rppt@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The patch to add anonymous vma names causes a build failure in some
configurations:
include/linux/mm_types.h: In function 'is_same_vma_anon_name':
include/linux/mm_types.h:924:37: error: implicit declaration of function 'strcmp' [-Werror=implicit-function-declaration]
924 | return name && vma_name && !strcmp(name, vma_name);
| ^~~~~~
include/linux/mm_types.h:22:1: note: 'strcmp' is defined in header '<string.h>'; did you forget to '#include <string.h>'?
This should not really be part of linux/mm_types.h in the first place,
as that header is meant to only contain structure defintions and need a
minimum set of indirect includes itself.
While the header clearly includes more than it should at this point,
let's not make it worse by including string.h as well, which would pull
in the expensive (compile-speed wise) fortify-string logic.
Move the new functions into a separate header that only needs to be
included in a couple of locations.
Link: https://lkml.kernel.org/r/20211207125710.2503446-1-arnd@kernel.org
Fixes: "mm: add a field to store names for private anonymous memory"
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Colin Cross <ccross@google.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In many userspace applications, and especially in VM based applications
like Android uses heavily, there are multiple different allocators in
use. At a minimum there is libc malloc and the stack, and in many cases
there are libc malloc, the stack, direct syscalls to mmap anonymous
memory, and multiple VM heaps (one for small objects, one for big
objects, etc.). Each of these layers usually has its own tools to
inspect its usage; malloc by compiling a debug version, the VM through
heap inspection tools, and for direct syscalls there is usually no way
to track them.
On Android we heavily use a set of tools that use an extended version of
the logic covered in Documentation/vm/pagemap.txt to walk all pages
mapped in userspace and slice their usage by process, shared (COW) vs.
unique mappings, backing, etc. This can account for real physical
memory usage even in cases like fork without exec (which Android uses
heavily to share as many private COW pages as possible between
processes), Kernel SamePage Merging, and clean zero pages. It produces
a measurement of the pages that only exist in that process (USS, for
unique), and a measurement of the physical memory usage of that process
with the cost of shared pages being evenly split between processes that
share them (PSS).
If all anonymous memory is indistinguishable then figuring out the real
physical memory usage (PSS) of each heap requires either a pagemap
walking tool that can understand the heap debugging of every layer, or
for every layer's heap debugging tools to implement the pagemap walking
logic, in which case it is hard to get a consistent view of memory
across the whole system.
Tracking the information in userspace leads to all sorts of problems.
It either needs to be stored inside the process, which means every
process has to have an API to export its current heap information upon
request, or it has to be stored externally in a filesystem that somebody
needs to clean up on crashes. It needs to be readable while the process
is still running, so it has to have some sort of synchronization with
every layer of userspace. Efficiently tracking the ranges requires
reimplementing something like the kernel vma trees, and linking to it
from every layer of userspace. It requires more memory, more syscalls,
more runtime cost, and more complexity to separately track regions that
the kernel is already tracking.
This patch adds a field to /proc/pid/maps and /proc/pid/smaps to show a
userspace-provided name for anonymous vmas. The names of named
anonymous vmas are shown in /proc/pid/maps and /proc/pid/smaps as
[anon:<name>].
Userspace can set the name for a region of memory by calling
prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, start, len, (unsigned long)name)
Setting the name to NULL clears it. The name length limit is 80 bytes
including NUL-terminator and is checked to contain only printable ascii
characters (including space), except '[',']','\','$' and '`'.
Ascii strings are being used to have a descriptive identifiers for vmas,
which can be understood by the users reading /proc/pid/maps or
/proc/pid/smaps. Names can be standardized for a given system and they
can include some variable parts such as the name of the allocator or a
library, tid of the thread using it, etc.
The name is stored in a pointer in the shared union in vm_area_struct
that points to a null terminated string. Anonymous vmas with the same
name (equivalent strings) and are otherwise mergeable will be merged.
The name pointers are not shared between vmas even if they contain the
same name. The name pointer is stored in a union with fields that are
only used on file-backed mappings, so it does not increase memory usage.
CONFIG_ANON_VMA_NAME kernel configuration is introduced to enable this
feature. It keeps the feature disabled by default to prevent any
additional memory overhead and to avoid confusing procfs parsers on
systems which are not ready to support named anonymous vmas.
The patch is based on the original patch developed by Colin Cross, more
specifically on its latest version [1] posted upstream by Sumit Semwal.
It used a userspace pointer to store vma names. In that design, name
pointers could be shared between vmas. However during the last
upstreaming attempt, Kees Cook raised concerns [2] about this approach
and suggested to copy the name into kernel memory space, perform
validity checks [3] and store as a string referenced from
vm_area_struct.
One big concern is about fork() performance which would need to strdup
anonymous vma names. Dave Hansen suggested experimenting with
worst-case scenario of forking a process with 64k vmas having longest
possible names [4]. I ran this experiment on an ARM64 Android device
and recorded a worst-case regression of almost 40% when forking such a
process.
This regression is addressed in the followup patch which replaces the
pointer to a name with a refcounted structure that allows sharing the
name pointer between vmas of the same name. Instead of duplicating the
string during fork() or when splitting a vma it increments the refcount.
[1] https://lore.kernel.org/linux-mm/20200901161459.11772-4-sumit.semwal@linaro.org/
[2] https://lore.kernel.org/linux-mm/202009031031.D32EF57ED@keescook/
[3] https://lore.kernel.org/linux-mm/202009031022.3834F692@keescook/
[4] https://lore.kernel.org/linux-mm/5d0358ab-8c47-2f5f-8e43-23b89d6a8e95@intel.com/
Changes for prctl(2) manual page (in the options section):
PR_SET_VMA
Sets an attribute specified in arg2 for virtual memory areas
starting from the address specified in arg3 and spanning the
size specified in arg4. arg5 specifies the value of the attribute
to be set. Note that assigning an attribute to a virtual memory
area might prevent it from being merged with adjacent virtual
memory areas due to the difference in that attribute's value.
Currently, arg2 must be one of:
PR_SET_VMA_ANON_NAME
Set a name for anonymous virtual memory areas. arg5 should
be a pointer to a null-terminated string containing the
name. The name length including null byte cannot exceed
80 bytes. If arg5 is NULL, the name of the appropriate
anonymous virtual memory areas will be reset. The name
can contain only printable ascii characters (including
space), except '[',']','\','$' and '`'.
This feature is available only if the kernel is built with
the CONFIG_ANON_VMA_NAME option enabled.
[surenb@google.com: docs: proc.rst: /proc/PID/maps: fix malformed table]
Link: https://lkml.kernel.org/r/20211123185928.2513763-1-surenb@google.com
[surenb: rebased over v5.15-rc6, replaced userpointer with a kernel copy,
added input sanitization and CONFIG_ANON_VMA_NAME config. The bulk of the
work here was done by Colin Cross, therefore, with his permission, keeping
him as the author]
Link: https://lkml.kernel.org/r/20211019215511.3771969-2-surenb@google.com
Signed-off-by: Colin Cross <ccross@google.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jan Glauber <jan.glauber@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rob Landley <rob@landley.net>
Cc: "Serge E. Hallyn" <serge.hallyn@ubuntu.com>
Cc: Shaohua Li <shli@fusionio.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a new helper function kthread_run_on_cpu(), which includes
kthread_create_on_cpu/wake_up_process().
In some cases, use kthread_run_on_cpu() directly instead of
kthread_create_on_node/kthread_bind/wake_up_process() or
kthread_create_on_cpu/wake_up_process() or
kthreadd_create/kthread_bind/wake_up_process() to simplify the code.
[akpm@linux-foundation.org: export kthread_create_on_cpu to modules]
Link: https://lkml.kernel.org/r/20211022025711.3673-2-caihuoqing@baidu.com
Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Cc: Bernard Metzler <bmt@zurich.ibm.com>
Cc: Cai Huoqing <caihuoqing@baidu.com>
Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
task_pt_regs() can return NULL on powerpc for kernel threads. This is
then used in __bpf_get_stack() to check for user mode, resulting in a
kernel oops. Guard against this by checking return value of
task_pt_regs() before trying to obtain the call chain.
Fixes: fa28dcb82a38f8 ("bpf: Introduce helper bpf_get_task_stack()")
Cc: stable@vger.kernel.org # v5.9+
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d5ef83c361cc255494afd15ff1b4fb02a36e1dcf.1641468127.git.naveen.n.rao@linux.vnet.ibm.com
The new flag MODULE_INIT_COMPRESSED_FILE unintentionally trips check in
module_sig_check(). The check was supposed to catch case when version
info or magic was removed from a signed module, making signature
invalid, but it was coded too broadly and was catching this new flag as
well.
Change the check to only test the 2 particular flags affecting signature
validity.
Fixes: b1ae6dc41eaa ("module: add in-kernel support for decompressing")
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Since referencing user space pointers is special, if the user wants to
filter on a field that is a pointer to user space, then they need to
specify it.
Add a ".ustring" attribute to the field name for filters to state that the
field is pointing to user space such that the kernel can take the
appropriate action to read that pointer.
Link: https://lore.kernel.org/all/yt9d8rvmt2jq.fsf@linux.ibm.com/
Fixes: 77360f9bbc7e ("tracing: Add test for user space strings when filtering on string pointers")
Tested-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
If start_per_cpu_kthreads() called from osnoise_workload_start() returns
error, event hooks are left in broken state: unhook_irq_events() called
but unhook_thread_events() and unhook_softirq_events() not called, and
trace_osnoise_callback_enabled flag not cleared.
On the next tracer enable, hooks get not installed due to
trace_osnoise_callback_enabled flag.
And on the further tracer disable an attempt to remove non-installed
hooks happened, hitting a WARN_ON_ONCE() in tracepoint_remove_func().
Fix the error path by adding the missing part of cleanup.
While at this, introduce osnoise_unhook_events() to avoid code
duplication between this error path and normal tracer disable.
Link: https://lkml.kernel.org/r/20220109153459.3701773-1-nikita.yushchenko@virtuozzo.com
Cc: stable@vger.kernel.org
Fixes: bce29ac9ce0b ("trace: Add osnoise tracer")
Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Nikita Yushchenko <nikita.yushchenko@virtuozzo.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Since the same warning message is already printed in the
trace_create_file() function, there is no need to print it again.
Link: https://lkml.kernel.org/r/20220109162232.361747-1-ytcoode@gmail.com
Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The 'nmissed' column of the 'kprobe_profile' file for kretprobe is
not showed correctly, kretprobe can be skipped by two reasons,
shortage of kretprobe_instance which is counted by tk->rp.nmissed,
and kprobe itself is missed by some reason, so to show the sum.
Link: https://lkml.kernel.org/r/20220107150242.5019-1-xyz.sun.ok@gmail.com
Cc: stable@vger.kernel.org
Fixes: 4a846b443b4e ("tracing/kprobes: Cleanup kprobe tracer code")
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Xiangyang Zhang <xyz.sun.ok@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Pingfan reported that the following causes a fault:
echo "filename ~ \"cpu\"" > events/syscalls/sys_enter_openat/filter
echo 1 > events/syscalls/sys_enter_at/enable
The reason is that trace event filter treats the user space pointer
defined by "filename" as a normal pointer to compare against the "cpu"
string. The following bug happened:
kvm-03-guest16 login: [72198.026181] BUG: unable to handle page fault for address: 00007fffaae8ef60
#PF: supervisor read access in kernel mode
#PF: error_code(0x0001) - permissions violation
PGD 80000001008b7067 P4D 80000001008b7067 PUD 2393f1067 PMD 2393ec067 PTE 8000000108f47867
Oops: 0001 [#1] PREEMPT SMP PTI
CPU: 1 PID: 1 Comm: systemd Kdump: loaded Not tainted 5.14.0-32.el9.x86_64 #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
RIP: 0010:strlen+0x0/0x20
Code: 48 89 f9 74 09 48 83 c1 01 80 39 00 75 f7 31 d2 44 0f b6 04 16 44 88 04 11
48 83 c2 01 45 84 c0 75 ee c3 0f 1f 80 00 00 00 00 <80> 3f 00 74 10 48 89 f8
48 83 c0 01 80 38 00 75 f7 48 29 f8 c3 31
RSP: 0018:ffffb5b900013e48 EFLAGS: 00010246
RAX: 0000000000000018 RBX: ffff8fc1c49ede00 RCX: 0000000000000000
RDX: 0000000000000020 RSI: ffff8fc1c02d601c RDI: 00007fffaae8ef60
RBP: 00007fffaae8ef60 R08: 0005034f4ddb8ea4 R09: 0000000000000000
R10: ffff8fc1c02d601c R11: 0000000000000000 R12: ffff8fc1c8a6e380
R13: 0000000000000000 R14: ffff8fc1c02d6010 R15: ffff8fc1c00453c0
FS: 00007fa86123db40(0000) GS:ffff8fc2ffd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fffaae8ef60 CR3: 0000000102880001 CR4: 00000000007706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
filter_pred_pchar+0x18/0x40
filter_match_preds+0x31/0x70
ftrace_syscall_enter+0x27a/0x2c0
syscall_trace_enter.constprop.0+0x1aa/0x1d0
do_syscall_64+0x16/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7fa861d88664
The above happened because the kernel tried to access user space directly
and triggered a "supervisor read access in kernel mode" fault. Worse yet,
the memory could not even be loaded yet, and a SEGFAULT could happen as
well. This could be true for kernel space accessing as well.
To be even more robust, test both kernel and user space strings. If the
string fails to read, then simply have the filter fail.
Note, TASK_SIZE is used to determine if the pointer is user or kernel space
and the appropriate strncpy_from_kernel/user_nofault() function is used to
copy the memory. For some architectures, the compare to TASK_SIZE may always
pick user space or kernel space. If it gets it wrong, the only thing is that
the filter will fail to match. In the future, this needs to be fixed to have
the event denote which should be used. But failing a filter is much better
than panicing the machine, and that can be solved later.
Link: https://lore.kernel.org/all/20220107044951.22080-1-kernelfans@gmail.com/
Link: https://lkml.kernel.org/r/20220110115532.536088fd@gandalf.local.home
Cc: stable@vger.kernel.org
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Reported-by: Pingfan Liu <kernelfans@gmail.com>
Tested-by: Pingfan Liu <kernelfans@gmail.com>
Fixes: 87a342f5db69d ("tracing/filters: Support filtering for char * strings")
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Currently, the syscall trace events call trace_buffer_lock_reserve()
directly, which means that it misses out on some of the filtering
optimizations provided by the helper function
trace_event_buffer_lock_reserve(). Have the syscall trace events call that
instead, as it was missed when adding the update to use the temp buffer
when filtering.
Link: https://lkml.kernel.org/r/20220107225839.823118570@goodmis.org
Cc: stable@vger.kernel.org
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: 0fc1b09ff1ff4 ("tracing: Use temp buffer when filtering events")
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Now that ftrace function pointers are sorted at compile time, add a test
that makes sure they are sorted at run time. This test is only run if it is
configured in.
Link: https://lkml.kernel.org/r/20211206151858.4d21a24d@gandalf.local.home
Cc: Yinan Liu <yinan@linux.alibaba.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
When the kernel starts, the initialization of ftrace takes
up a portion of the time (approximately 6~8ms) to sort mcount
addresses. We can save this time by moving mcount-sorting to
compile time.
Link: https://lkml.kernel.org/r/20211212113358.34208-2-yinan@linux.alibaba.com
Signed-off-by: Yinan Liu <yinan@linux.alibaba.com>
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
kstrndup() is a memory allocation-related function, it returns NULL when
some internal memory errors happen. It is better to check the return
value of it so to catch the memory error in time.
Link: https://lkml.kernel.org/r/tencent_4D6E270731456EB88712ED7F13883C334906@qq.com
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: a42e3c4de964 ("tracing/probe: Add immediate string parameter support")
Signed-off-by: Xiaoke Wang <xkernel.wang@foxmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
kstrdup() returns NULL when some internal memory errors happen, it is
better to check the return value of it so to catch the memory error in
time.
Link: https://lkml.kernel.org/r/tencent_3C2E330722056D7891D2C83F29C802734B06@qq.com
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: 33ea4b24277b ("perf/core: Implement the 'perf_uprobe' PMU")
Signed-off-by: Xiaoke Wang <xkernel.wang@foxmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Disabling only bottom halves via local_bh_disable() disables also
preemption but this remains invisible to tracing. On a CONFIG_PREEMPT
kernel one might wonder why there is no scheduling happening despite the
N flag in the trace. The reason might be the a rcu_read_lock_bh()
section.
Add a 'b' to the tracing output if in task context with disabled bottom
halves.
Link: https://lkml.kernel.org/r/YbcbtdtC/bjCKo57@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Treewide cleanup and consolidation of MSI interrupt handling in
preparation for further changes in this area which are necessary to:
- address existing shortcomings in the VFIO area
- support the upcoming Interrupt Message Store functionality which
decouples the message store from the PCI config/MMIO space
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmHf+SETHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYobzGD/wNEFl5qQo5mNZ9thP6JSJFOItm7zMc
2QgzCYOqNwAv4jL6Dqo+EHtbShYqDyWzKdKccgqNjmdIqgW8q7/fubN1OPzRsClV
CZG997AsXDGXYlQcE3tXZjkeCWnWEE2AGLnygSkFV1K/r9ALAtFfTBJAWB+UD+Zc
1P8Kxo0q0Jg+DQAMAA5bWfSSjo/Pmpr/1AFjY7+GA8BBeJJgWOyW7H1S+GYEWVOE
RaQP81Sbd6x1JkopxkNqSJ/lbNJfnPJxi2higB56Y0OYn5CuSarYbZUM7oQ2V61t
jN7pcEEvTpjLd6SJ93ry8WOcJVMTbccCklVfD0AfEwwGUGw2VM6fSyNrZfnrosUN
tGBEO8eflBJzGTAwSkz1EhiGKna4o1NBDWpr0sH2iUiZC5G6V2hUDbM+0PQJhDa8
bICwguZElcUUPOprwjS0HXhymnxghTmNHyoEP1yxGoKLTrwIqkH/9KGustWkcBmM
hNtOCwQNqxcOHg/r3MN0KxttTASgoXgNnmFliAWA7XwseRpLWc95XPQFa5sptRhc
EzwumEz17EW1iI5/NyZQcY+jcZ9BdgCqgZ9ECjZkyN4U+9G6iACUkxVaHUUs77jl
a0ISSEHEvJisFOsOMYyFfeWkpIKGIKP/bpLOJEJ6kAdrUWFvlRGF3qlav3JldXQl
ypFjPapDeB5guw==
=vKzd
-----END PGP SIGNATURE-----
Merge tag 'irq-msi-2022-01-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull MSI irq updates from Thomas Gleixner:
"Rework of the MSI interrupt infrastructure.
This is a treewide cleanup and consolidation of MSI interrupt handling
in preparation for further changes in this area which are necessary
to:
- address existing shortcomings in the VFIO area
- support the upcoming Interrupt Message Store functionality which
decouples the message store from the PCI config/MMIO space"
* tag 'irq-msi-2022-01-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (94 commits)
genirq/msi: Populate sysfs entry only once
PCI/MSI: Unbreak pci_irq_get_affinity()
genirq/msi: Convert storage to xarray
genirq/msi: Simplify sysfs handling
genirq/msi: Add abuse prevention comment to msi header
genirq/msi: Mop up old interfaces
genirq/msi: Convert to new functions
genirq/msi: Make interrupt allocation less convoluted
platform-msi: Simplify platform device MSI code
platform-msi: Let core code handle MSI descriptors
bus: fsl-mc-msi: Simplify MSI descriptor handling
soc: ti: ti_sci_inta_msi: Remove ti_sci_inta_msi_domain_free_irqs()
soc: ti: ti_sci_inta_msi: Rework MSI descriptor allocation
NTB/msi: Convert to msi_on_each_desc()
PCI: hv: Rework MSI handling
powerpc/mpic_u3msi: Use msi_for_each-desc()
powerpc/fsl_msi: Use msi_for_each_desc()
powerpc/pasemi/msi: Convert to msi_on_each_dec()
powerpc/cell/axon_msi: Convert to msi_on_each_desc()
powerpc/4xx/hsta: Rework MSI handling
...
Core:
- Make the clocksource watchdog more robust by better validation checks
of the measurement.
Drivers:
- New drivers for MStar and SSD20xd SOCs
- The usual cleanups and improvements all over the place
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmHf+n0THHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoQ2lD/9WCp+fGTmOt5zb8dOyuyLFjDljStPZ
zNi4d4Iu3gcBIRcjACtbSI2rAPK5gQyM38c9nlmtFv3zihfmz5bQkMTQ1N7O84Nu
c1iEuTW69l/ZvykSJWApsGIY8zgA41efoLYzhg/dCpQGE2fINiRDyU5ZxbJXmwMW
ipjBCf3F9/WLWoTgvl3cTayd/l+7fnpeM6w9MfujHLyCXCwz484KW/7UIMkTCcxF
b7Y3bTLxP4a/iT/ltFDqvLUjUuJWdmCh6gihcEL+9PD/h6KmQnND+p9KB7tbMRy/
DUOBTCi5gY66RQeGRJPVe+Cx/Wi+8vCiyfXUuSoQGqE39HVYOUzMwWOjOncjLad4
fXSzzCIKRwsB3qKw+2GnDeEx1hIw1/K88V2tA+OgQjdWIginOClzy0jb0dkBRbo5
H1U6mPxb+CTKAl1hXAkfDDCenLTiiGBFbvJUydiJYMcFEZYM166e/jA53xIKHNAz
WEphVRAPA269uIxYBXJU7pA6M5bYqbHhhmrxyWOBbhhZGGj3x685PA1wioeNayMp
SMA7s7kZaOBDuTtjRY/dFDkd/27HKWDkxjZCbbslRRKKO0Zz7qixzspV5LETnABO
NzR5TcNimCyvfKEzSG1PFmzx9P/cnspyLvWj560xL0Z9x1MnsHtiUpibJ8a/Gb45
riPKWGedog8BgQ==
=7vCU
-----END PGP SIGNATURE-----
Merge tag 'timers-core-2022-01-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
"Updates for the time(r) subsystem:
Core:
- Make the clocksource watchdog more robust by better validation
checks of the measurement.
Drivers:
- New drivers for MStar and SSD20xd SOCs
- The usual cleanups and improvements all over the place"
* tag 'timers-core-2022-01-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
dt-bindings: timer: Add Mstar MSC313e timer devicetree bindings documentation
clocksource/drivers/msc313e: Add support for ssd20xd-based platforms
clocksource/drivers: Add MStar MSC313e timer support
clocksource/drivers/pistachio: Fix -Wunused-but-set-variable warning
clocksource/drivers/timer-imx-sysctr: Set cpumask to cpu_possible_mask
clocksource/drivers/imx-sysctr: Mark two variable with __ro_after_init
clocksource/drivers/renesas,ostm: Make RENESAS_OSTM symbol visible
clocksource/drivers/renesas-ostm: Add RZ/G2L OSTM support
dt-bindings: timer: renesas: ostm: Document Renesas RZ/G2L OSTM
clocksource/drivers/exynos_mct: Fix silly typo resulting in checkpatch warning
clocksource: Reduce the default clocksource_watchdog() retries to 2
clocksource: Avoid accidental unstable marking of clocksources
dt-bindings: timer: tpm-timer: Add imx8ulp compatible string
reset: Add of_reset_control_get_optional_exclusive()
clocksource/drivers/exynos_mct: Refactor resources allocation
dt-bindings: timer: remove rockchip,rk3066-timer compatible string from rockchip,rk-timer.yaml
dt-bindings: timer: cadence_ttc: Add power-domains
Core:
- Provide a new interface for affinity hints to provide a separation
between hint and actual affinity change which has become a hidden
property of the current interface
- Fix up the in tree usage of the affinity hint interfaces
Drivers:
- No new irqchip drivers!
- Fix GICv3 redistributor table reservation with RT across kexec
- Fix GICv4.1 redistributor view of the VPE table across kexec
- Add support for extra interrupts on spear-shirq
- Make obtaining some interrupts optional for the Renesas drivers
- Various cleanups and bug fixes
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmHf9v0THHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoRK6D/9bQmyITmJ4KLn0HZ1DsvkuR/GB7I8v
yTF99FxIi/F0jlJ7+87Hdm68cfYPMahpiHqSlsf/QE2kkuWYDJmMaPUao14XMdG6
jxrJ1OZtZXeDXyAWkB/gjmiuqyW/e/Myndg0UNUrJ66GqKfxfxtz1/4GfLjgDpIu
TfZQdojvo6T7NTVnU8aAkgKUhM2jL/HxPiR3VUJ+VneSfwKLHzr3+lTY9zkSvJ8s
ATqqGn6+GugJmDWaCI13IJcmBhPU/Gvs+Eqnwz7Xez/6wJftYvJh7vGec3ixS9pw
skjPDnwuHcPl+h0mYMv7ySN7WuqTr0iqYIepdvLUfq6D1WjnHvF5XNcV4W7EzPJN
B/pBosJ97ZAiHgrWsb35/S3bJ0mnB3Ib4WOOIcnRM36JUdNZrnKJntCsyrrmUsYA
s6J1og9Ut7it+F9OFvsuZ2pUv25U8BlzhgfJen8Z0fzV1/2f5LQN0gQGVxqVpwkg
3Cmd5Rmy5h2vlcKKHklLxIP24+UMIb2WyhsDiZ/qYH3zSFFnQPUJ6fvmZIxN/fPx
exU5O8kgsXSwauXWHJJBb+qhKNcUNvUwKGHNMAvM9mh1xytU6ZowjTqqOlCfBWlg
dRXT2xI0ex7liXek6yXa4lN1tabIdnvmYTmueUoFiOCqbUPBO8LTutjdehsUMa4d
xV0a8WEzuk9Q/A==
=myJA
-----END PGP SIGNATURE-----
Merge tag 'irq-core-2022-01-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
"Updates for the interrupt subsystem:
Core:
- Provide a new interface for affinity hints to provide a separation
between hint and actual affinity change which has become a hidden
property of the current interface
- Fix up the in tree usage of the affinity hint interfaces
Drivers:
- No new irqchip drivers!
- Fix GICv3 redistributor table reservation with RT across kexec
- Fix GICv4.1 redistributor view of the VPE table across kexec
- Add support for extra interrupts on spear-shirq
- Make obtaining some interrupts optional for the Renesas drivers
- Various cleanups and bug fixes"
* tag 'irq-core-2022-01-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
irqchip/renesas-intc-irqpin: Use platform_get_irq_optional() to get the interrupt
irqchip/renesas-irqc: Use platform_get_irq_optional() to get the interrupt
irqchip/gic-v4: Disable redistributors' view of the VPE table at boot time
irqchip/ingenic-tcu: Use correctly sized arguments for bit field
irqchip/gic-v2m: Add const to of_device_id
irqchip/imx-gpcv2: Mark imx_gpcv2_instance with __ro_after_init
irqchip/spear-shirq: Add support for IRQ 0..6
irqchip/gic-v3-its: Limit memreserve cpuhp state lifetime
irqchip/gic-v3-its: Postpone LPI pending table freeing and memreserve
irqchip/gic-v3-its: Give the percpu rdist struct its own flags field
net/mlx4: Use irq_update_affinity_hint()
net/mlx5: Use irq_set_affinity_and_hint()
hinic: Use irq_set_affinity_and_hint()
scsi: lpfc: Use irq_set_affinity()
mailbox: Use irq_update_affinity_hint()
ixgbe: Use irq_update_affinity_hint()
be2net: Use irq_update_affinity_hint()
enic: Use irq_update_affinity_hint()
RDMA/irdma: Use irq_update_affinity_hint()
scsi: mpt3sas: Use irq_set_affinity_and_hint()
...
There is a spelling mistake in a pr_err error message. Fix it.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
"Cleanup of the perf/kvm interaction."
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmHdvbkACgkQEsHwGGHe
VUrX7w/9FwKUm0WlGcQIAOSdWk85N2qAVH3brYcQHNpTCVe68TOqTCrxCDrGgyUq
2XnCOim99MUlnsVU6QRZqF4yJ8S1tGrc0COJ/qR4SGntucu0oYuDe2aMVq+mWUD7
/IThA0oMRfhki9WwAyUuyCrXzk4blZdlrXyYIRMJGl9xeGNy3cvUtU8f68Kiy22E
OcmQ/o9Etsr38dueAMU1KYEmgSTvG47rS8nfyRUu3QpJHbyLmRXH32PQrm3tduxS
Bw3gMAH5vqq1UDZJ8ZvsPsO0vFX7dtnKEwEKz4qdtRWk9gi8oLGHIwIXC+VtNqpf
mCmX33Jw8uFz9h3JhE84J0j/CgsWHoU6MOs0MOch4Tb69/BfCjQnw1enImhejG8q
YEIDjJf/vgRNaw9PYshiTHT+EJTe9inT3S4eK/ynLRDUEslAqyWZZm7bUE/XrEDi
yRyGIxry/hNZVvRkXT9QBw32fpgnIH2NAMPLEjJSGCRxT89Tfqz0aRDfacCuHTTh
P8pAeiDuy/6RkDlQckOZJWOFFh2IHsykX2l3IJcHqVRqt4ob9b+SZB5qoH/Mv9qb
MSAqdFUupYZFC+6XuPAeX5/Mo+wSkP+pYYSbWNxjUa0yNiYecOjE7/8T2SB2y6Mx
lk2L0ypsZUYSmpHSfvOdPmf6ucj19/5B4+VCX6PQfcNJTnvvhTE=
=tU5G
-----END PGP SIGNATURE-----
Merge tag 'perf_core_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Borislav Petkov:
"Cleanup of the perf/kvm interaction."
* tag 'perf_core_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Drop guest callback (un)register stubs
KVM: arm64: Drop perf.c and fold its tiny bits of code into arm.c
KVM: arm64: Hide kvm_arm_pmu_available behind CONFIG_HW_PERF_EVENTS=y
KVM: arm64: Convert to the generic perf callbacks
KVM: x86: Move Intel Processor Trace interrupt handler to vmx.c
KVM: Move x86's perf guest info callbacks to generic KVM
KVM: x86: More precisely identify NMI from guest when handling PMI
KVM: x86: Drop current_vcpu for kvm_running_vcpu + kvm_arch_vcpu variable
perf/core: Use static_call to optimize perf_guest_info_callbacks
perf: Force architectures to opt-in to guest callbacks
perf: Add wrappers for invoking guest callbacks
perf/core: Rework guest callbacks to prepare for static_call support
perf: Drop dead and useless guest "support" from arm, csky, nds32 and riscv
perf: Stop pretending that perf can handle multiple guest callbacks
KVM: x86: Register Processor Trace interrupt hook iff PT enabled in guest
KVM: x86: Register perf callbacks after calling vendor's hardware_setup()
perf: Protect perf_guest_cbs with RCU
The commit 1f1562fcd04a ("cgroup/cpuset: Don't let child cpusets
restrict parent in default hierarchy") inteded to relax the check only
on the default hierarchy (or v2 mode) but it dropped the check in v1
too.
This patch returns and separates the legacy-only validations so that
they can be considered only in the v1 mode, which should enforce the old
constraints for the sake of compatibility.
Fixes: 1f1562fcd04a ("cgroup/cpuset: Don't let child cpusets restrict parent in default hierarchy")
Suggested-by: Waiman Long <longman@redhat.com>
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Here is the set of changes for the driver core for 5.17-rc1.
Lots of little things here, including:
- kobj_type cleanups
- auxiliary_bus documentation updates
- auxiliary_device conversions for some drivers (relevant
subsystems all have provided acks for these)
- kernfs lock contention reduction for some workloads
- other tiny cleanups and changes.
All of these have been in linux-next for a while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYd7deA8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ym8ngCgw0ANwrRPE5b1dthEmfU2f8Knk5kAn0pHQv6R
VRZJypgNfU/Pt0ykstZD
=CO9J
-----END PGP SIGNATURE-----
Merge tag 'driver-core-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the set of changes for the driver core for 5.17-rc1.
Lots of little things here, including:
- kobj_type cleanups
- auxiliary_bus documentation updates
- auxiliary_device conversions for some drivers (relevant subsystems
all have provided acks for these)
- kernfs lock contention reduction for some workloads
- other tiny cleanups and changes.
All of these have been in linux-next for a while with no reported
issues"
* tag 'driver-core-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (43 commits)
kobject documentation: remove default_attrs information
drivers/firmware: Add missing platform_device_put() in sysfb_create_simplefb
debugfs: lockdown: Allow reading debugfs files that are not world readable
driver core: Make bus notifiers in right order in really_probe()
driver core: Move driver_sysfs_remove() after driver_sysfs_add()
firmware: edd: remove empty default_attrs array
firmware: dmi-sysfs: use default_groups in kobj_type
qemu_fw_cfg: use default_groups in kobj_type
firmware: memmap: use default_groups in kobj_type
sh: sq: use default_groups in kobj_type
headers/uninline: Uninline single-use function: kobject_has_children()
devtmpfs: mount with noexec and nosuid
driver core: Simplify async probe test code by using ktime_ms_delta()
nilfs2: use default_groups in kobj_type
kobject: remove kset from struct kset_uevent_ops callbacks
driver core: make kobj_type constant.
driver core: platform: document registration-failure requirement
vdpa/mlx5: Use auxiliary_device driver data helpers
net/mlx5e: Use auxiliary_device driver data helpers
soundwire: intel: Use auxiliary_device driver data helpers
...
- refactor the dma-direct coherent allocator
- turn an macro into an inline in scatterlist.h (Logan Gunthorpe)
-----BEGIN PGP SIGNATURE-----
iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmHcfpILHGhjaEBsc3Qu
ZGUACgkQD55TZVIEUYP0MA/+N8Vs2ZgemWunnTyZxCOlia+bq9QPXOMlM+GLLYwQ
pMizhYXKmiM6HHkLDpVOtzOQTBztqBmWzt7a5z/xlwtS3SIT4ktS1MkrPwW56l3F
Ml8u1OLv3rTXF/6K9PHU1XvXBIdkFkAnoVLPn1k6V9C8MiEUzo43Nx01p5ooNq/u
+cptbN119BKQ19EAyOWM5QC9hhQoUNRTeWYd9AQjxTMC6mLPF1VKD1KjUM7gwNB5
TxqGReYTJNNXbEsN+s6JvkOPId8Ps9JpwSys6Kx9lcXYqNchTCpo+HApvV1hcQrm
hWBY2u94TPRuBIIp6xe3JBAPhbRTwF5gLEOwJxS+UTj4vcioKUdYnRxEt+JfD5AA
XjlFyAbCwOPw4qXe00+EefQpuSim1nkUb0FXUJC/zRRJjnjmPxxsnZEnLaxlmyOd
FlYwyY1x/TqiFr6ZfVr67msnPFy4tRRyXa/W9zcXKwPgI6/3nbYjkpbnK/RocZx+
4vo8LK0zaHEstrCcCbAwRZk5YgOHRFzrnDhRYAifqJqzlvKqEeF9S8hIRSypcIyG
GD4ViVK4RJQwtKG3XfG8GPhKLTKoL1NQgbNjszSgc1wIt5E+CJ5PUK2TuBVc9I85
xScxWeGVfgV7a8hDZ+fyc07VV+1El7Bk3qN88Uassu1HhMbcOJeVJ2dV7u/VJRf5
vIQ=
=YkF5
-----END PGP SIGNATURE-----
Merge tag 'dma-mapping-5.17' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping updates from Christoph Hellwig:
- refactor the dma-direct coherent allocator
- turn an macro into an inline in scatterlist.h (Logan Gunthorpe)
* tag 'dma-mapping-5.17' of git://git.infradead.org/users/hch/dma-mapping:
lib/scatterlist: cleanup macros into static inline functions
dma-direct: add a dma_direct_use_pool helper
dma-direct: factor the swiotlb code out of __dma_direct_alloc_pages
dma-direct: drop two CONFIG_DMA_RESTRICTED_POOL conditionals
dma-direct: warn if there is no pool for force unencrypted allocations
dma-direct: fail allocations that can't be made coherent
dma-direct: refactor the !coherent checks in dma_direct_alloc
dma-direct: factor out a helper for DMA_ATTR_NO_KERNEL_MAPPING allocations
dma-direct: clean up the remapping checks in dma_direct_alloc
dma-direct: always leak memory that can't be re-encrypted
dma-direct: don't call dma_set_decrypted for remapped allocations
dma-direct: factor out dma_set_{de,en}crypted helpers
Current scheme of having userspace decompress kernel modules before
loading them into the kernel runs afoul of LoadPin security policy, as
it loses link between the source of kernel module on the disk and binary
blob that is being loaded into the kernel. To solve this issue let's
implement decompression in kernel, so that we can pass a file descriptor
of compressed module file into finit_module() which will keep LoadPin
happy.
To let userspace know what compression/decompression scheme kernel
supports it will create /sys/module/compression attribute. kmod can read
this attribute and decide if it can pass compressed file to
finit_module(). New MODULE_INIT_COMPRESSED_DATA flag indicates that the
kernel should attempt to decompress the data read from file descriptor
prior to trying load the module.
To simplify things kernel will only implement single decompression
method matching compression method selected when generating modules.
This patch implements gzip and xz; more can be added later,
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Since commit e513cc1c07e2 ("module: Remove stop_machine from module
unloading") this comment is no longer correct. Remove it.
Signed-off-by: Yu Chen <chen.yu@easystack.cn>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
"Lots of cleanups and preparation; highlights:
- futex: Cleanup and remove runtime futex_cmpxchg detection
- rtmutex: Some fixes for the PREEMPT_RT locking infrastructure
- kcsan: Share owner_on_cpu() between mutex,rtmutex and rwsem and
annotate the racy owner->on_cpu access *once*.
- atomic64: Dead-Code-Elemination"
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmHdvssACgkQEsHwGGHe
VUrbBg//VQvz5BwddIJDj9utt5AvSixNcTF5mJyFKCSIqO0S4J8nCNcvJjZ2bs4S
w1YmInFbp0WFGUhaIZiw0e6KWJUoINTng4MfHDZosS1doT2of53ZaQqXs3i81jDz
87w8ADVHL0x4+BNjdsIwbcuPSDTmJFoyFOdeXTIl9hv9ZULT8m4Mt+LJuUHNZ+vF
rS1jyseVPWkcm5y+Yie0rhip+ygzbfbt0ArsLfRcrBJsKr6oxLxV2DDF+2djXuuP
d2OgGT7VkbgAhoKpzVXUiHsT6ppR5Mn5TLSa4EZ4bPPCUFldOhKuCAImF3T6yVIa
44iX5vQN9v5VHBy6ocPbdOIBuYBYVGCMurh1t7pbpB6G+mmSxMiyta5MY37POwjv
K2JT9mC2A6a4d17gue5FT3mnJMBB4eHwVaDfAwCZs/5rRNuoTz4aY5Xy04Mq0ltI
39uarwBd5hwSugBWg44AS5E9h52E654FQ7g6iS4NtUvJuuaXBTl43EcZWx2+mnPL
zY+iOMVMgg33VIVcm/mlf/6zWL0LXPmILUiA1fp4Q9/n8u1EuOOyeA/GsC9Pl3wO
HY3KpYJA5eQpIk/JEnzKm5ZE3pCrUdH6VDC/SB4owQtafQG6OxyQVP1Gj7KYxZsD
NqqpJ4nkKooc5f5DqVEN8wrjyYsnVxEfriEG09OoR6wI3MqyUA4=
=vrYy
-----END PGP SIGNATURE-----
Merge tag 'locking_core_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Borislav Petkov:
"Lots of cleanups and preparation. Highlights:
- futex: Cleanup and remove runtime futex_cmpxchg detection
- rtmutex: Some fixes for the PREEMPT_RT locking infrastructure
- kcsan: Share owner_on_cpu() between mutex,rtmutex and rwsem and
annotate the racy owner->on_cpu access *once*.
- atomic64: Dead-Code-Elemination"
[ Description above by Peter Zijlstra ]
* tag 'locking_core_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/atomic: atomic64: Remove unusable atomic ops
futex: Fix additional regressions
locking: Allow to include asm/spinlock_types.h from linux/spinlock_types_raw.h
x86/mm: Include spinlock_t definition in pgtable.
locking: Mark racy reads of owner->on_cpu
locking: Make owner_on_cpu() into <linux/sched.h>
lockdep/selftests: Adapt ww-tests for PREEMPT_RT
lockdep/selftests: Skip the softirq related tests on PREEMPT_RT
lockdep/selftests: Unbalanced migrate_disable() & rcu_read_lock().
lockdep/selftests: Avoid using local_lock_{acquire|release}().
lockdep: Remove softirq accounting on PREEMPT_RT.
locking/rtmutex: Add rt_mutex_lock_nest_lock() and rt_mutex_lock_killable().
locking/rtmutex: Squash self-deadlock check for ww_rt_mutex.
locking: Remove rt_rwlock_is_contended().
sched: Trigger warning if ->migration_disabled counter underflows.
futex: Fix sparc32/m68k/nds32 build regression
futex: Remove futex_cmpxchg detection
futex: Ensure futex_atomic_cmpxchg_inatomic() is present
kernel/locking: Use a pointer in ww_mutex_trylock().
"Mostly minor things this time; some highlights:
- core-sched: Add 'Forced Idle' accounting; this allows to track how
much CPU time is 'lost' due to core scheduling constraints.
- psi: Fix for MEM_FULL; a task running reclaim would be counted as a
runnable task and prevent MEM_FULL from being reported.
- cpuacct: Long standing fixes for some cgroup accounting issues.
- rt: Bandwidth timer could, under unusual circumstances, be failed to
armed, leading to indefinite throttling."
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmHdvGkACgkQEsHwGGHe
VUq3tQ/9GdaCpbo+WgtM20vo3FqzoRCWAtZZRLWm87g9G7FKE6tD1JCZ+cXn63jR
wz4nuTMGg0lHkrmMiHoeTWoRo7Brw3vPdKTbFBxRaPS3gi3qyz8gaDHSKzAHTJSx
L3j5XaTLcZnXwXV0MOphbK8ZD2W0f9PJZJjwYy1HFUrXh1AFT0WaMXL3aXuaZr8M
jYZoB8r5qXsTBgzNZR8unq5bSUXgvoDAqupFU8gvQWYvNFV4NGK9WFQLlznG1ZhE
aE7oHRbpCnb4avbv9xIm/QgLEHeCVTb/4kLBPk57nrW+aXTHX4ZTHuFtFs0nfDHS
yHSgie3hthr5lFQ/c2G4a5bi5EfPcyURmgNHpWrs2zWWtWzVtqy1WAQ//m8twd14
9cMeefQzttPUbOjykj5QNCJPqkkGgKlblz3p9j8NwUBYUBtBIejsEP0UFPoVgZuL
DjeGhPuGGeTqkVEhLD/pb9kSzUsi1ptTJtnzT9EvtBOi+EpnZnFC6jB98qcuRT19
jhlXwlFNH+SNnMrCniTjLhQK5gVEbvzbU86/nj9CHWDTNdu6DFeJv1S+ZBsjRHUe
f8dV9+laXdLK5QJKAeAubq8ciMvacW8pTf/5PJfaFCJHHDs8rgmx/Ip6TxCZzVEG
XEhNqOmMNnvbkj+9a1yk6SyD9QkVmitZrvRiqeoGayQMjsphT3E=
=H0vR
-----END PGP SIGNATURE-----
Merge tag 'sched_core_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Borislav Petkov:
"Mostly minor things this time; some highlights:
- core-sched: Add 'Forced Idle' accounting; this allows to track how
much CPU time is 'lost' due to core scheduling constraints.
- psi: Fix for MEM_FULL; a task running reclaim would be counted as a
runnable task and prevent MEM_FULL from being reported.
- cpuacct: Long standing fixes for some cgroup accounting issues.
- rt: Bandwidth timer could, under unusual circumstances, be failed
to armed, leading to indefinite throttling."
[ Description above by Peter Zijlstra ]
* tag 'sched_core_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Replace CFS internal cpu_util() with cpu_util_cfs()
sched/fair: Cleanup task_util and capacity type
sched/rt: Try to restart rt period timer when rt runtime exceeded
sched/fair: Document the slow path and fast path in select_task_rq_fair
sched/fair: Fix per-CPU kthread and wakee stacking for asym CPU capacity
sched/fair: Fix detection of per-CPU kthreads waking a task
sched/cpuacct: Make user/system times in cpuacct.stat more precise
sched/cpuacct: Fix user/system in shown cpuacct.usage*
cpuacct: Convert BUG_ON() to WARN_ON_ONCE()
cputime, cpuacct: Include guest time in user time in cpuacct.stat
psi: Fix PSI_MEM_FULL state when tasks are in memstall and doing reclaim
sched/core: Forced idle accounting
psi: Add a missing SPDX license header
psi: Remove repeated verbose comment