IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
All pm_power_off_prepare() users were converted to sys-off handler API.
Remove the obsolete global callback variable.
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add platform-level registration helpers that will ease transition of the
arch/platform power-off callbacks to the new sys-off based API, allowing
us to remove the global pm_power_off variable in the future.
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add kernel_can_power_off() helper that replaces open-coded checks of
the global pm_power_off variable. This is a necessary step towards
supporting chained power-off handlers.
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add weak stub for the global pm_power_off callback variable. This will
allow us to remove pm_power_off definitions from arch/ code and transition
to the new sys-off based API that will replace the global variable.
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add do_kernel_power_off() helper that will remove open-coded pm_power_off
invocations from the architecture code. This is the first step on the way
to remove the global pm_power_off variable, which will allow us to
implement consistent power-off chaining support.
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Wrap legacy power-off callbacks into sys-off handlers in order to
support co-existence of both legacy and new callbacks while we're
in process of upgrading legacy callbacks to the new API.
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
In order to support power-off chaining we need to get rid of the global
pm_* variables, replacing them with the new kernel API functions that
support chaining.
Introduce new generic sys-off handler API that brings the following
features:
1. Power-off and restart handlers are registered using same API function
that supports chaining, hence all power-off and restart modes will
support chaining using this unified function.
2. Prevents notifier priority collisions by disallowing registration of
multiple handlers at the non-default priority level.
3. Supports passing opaque user argument to callback, which allows us to
remove global variables from drivers.
This patch adds support of the following sys-off modes:
- SYS_OFF_MODE_POWER_OFF_PREPARE that replaces global pm_power_off_prepare
variable and provides chaining support for power-off-prepare handlers.
- SYS_OFF_MODE_POWER_OFF that replaces global pm_power_off variable and
provides chaining support for power-off handlers.
- SYS_OFF_MODE_RESTART that provides a better restart API, removing a need
from drivers to have a global scratch variable by utilizing the opaque
callback argument.
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add variant of blocking/atomic_notifier_chain_register() functions that
allow registration of a notifier only if it has unique priority, otherwise
-EBUSY error code is returned by the new functions.
Reviewed-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add atomic_notifier_call_chain_is_empty() that returns true if given
atomic call chain is empty.
The first user of this new notifier API function will be the kernel
power-off core code that will support power-off call chains. The core
code will need to check whether there is a power-off handler registered
at all in order to decide whether to halt machine or power it off.
Reviewed-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Make cgroup_debug static since it's only used in cgroup.c
Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
random32.c has two random number generators in it: one that is meant to
be used deterministically, with some predefined seed, and one that does
the same exact thing as random.c, except does it poorly. The first one
has some use cases. The second one no longer does and can be replaced
with calls to random.c's proper random number generator.
The relatively recent siphash-based bad random32.c code was added in
response to concerns that the prior random32.c was too deterministic.
Out of fears that random.c was (at the time) too slow, this code was
anonymously contributed. Then out of that emerged a kind of shadow
entropy gathering system, with its own tentacles throughout various net
code, added willy nilly.
Stop👏making👏bespoke👏random👏number👏generators👏.
Fortunately, recent advances in random.c mean that we can stop playing
with this sketchiness, and just use get_random_u32(), which is now fast
enough. In micro benchmarks using RDPMC, I'm seeing the same median
cycle count between the two functions, with the mean being _slightly_
higher due to batches refilling (which we can optimize further need be).
However, when doing *real* benchmarks of the net functions that actually
use these random numbers, the mean cycles actually *decreased* slightly
(with the median still staying the same), likely because the additional
prandom code means icache misses and complexity, whereas random.c is
generally already being used by something else nearby.
The biggest benefit of this is that there are many users of prandom who
probably should be using cryptographically secure random numbers. This
makes all of those accidental cases become secure by just flipping a
switch. Later on, we can do a tree-wide cleanup to remove the static
inline wrapper functions that this commit adds.
There are also some low-ish hanging fruits for making this even faster
in the future: a get_random_u16() function for use in the networking
stack will give a 2x performance boost there, using SIMD for ChaCha20
will let us compute 4 or 8 or 16 blocks of output in parallel, instead
of just one, giving us large buffers for cheap, and introducing a
get_random_*_bh() function that assumes irqs are already disabled will
shave off a few cycles for ordinary calls. These are things we can chip
away at down the road.
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Not calling the function for dummy contexts will cause the context to
not be reset. During the next syscall, this will cause an error in
__audit_syscall_entry:
WARN_ON(context->context != AUDIT_CTX_UNUSED);
WARN_ON(context->name_count);
if (context->context != AUDIT_CTX_UNUSED || context->name_count) {
audit_panic("unrecoverable error in audit_syscall_entry()");
return;
}
These problematic dummy contexts are created via the following call
chain:
exit_to_user_mode_prepare
-> arch_do_signal_or_restart
-> get_signal
-> task_work_run
-> tctx_task_work
-> io_req_task_submit
-> io_issue_sqe
-> audit_uring_entry
Cc: stable@vger.kernel.org
Fixes: 5bd2182d58e9 ("audit,io_uring,io-wq: add some basic audit support to io_uring")
Signed-off-by: Julian Orth <ju.orth@gmail.com>
[PM: subject line tweaks]
Signed-off-by: Paul Moore <paul@paul-moore.com>
swiotlb_find_slots() skips slots according to io tlb aligned mask
calculated from min aligned mask and original physical address
offset. This affects max mapping size. The mapping size can't
achieve the IO_TLB_SEGSIZE * IO_TLB_SIZE when original offset is
non-zero. This will cause system boot up failure in Hyper-V
Isolation VM where swiotlb force is enabled. Scsi layer use return
value of dma_max_mapping_size() to set max segment size and it
finally calls swiotlb_max_mapping_size(). Hyper-V storage driver
sets min align mask to 4k - 1. Scsi layer may pass 256k length of
request buffer with 0~4k offset and Hyper-V storage driver can't
get swiotlb bounce buffer via DMA API. Swiotlb_find_slots() can't
find 256k length bounce buffer with offset. Make swiotlb_max_mapping
_size() take min align mask into account.
Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Use the newly added suite_{init,exit} support for suite-wide init and
cleanup. This avoids the unsupported method by which the test used to do
suite-wide init and cleanup (avoiding issues such as missing TAP
headers, and possible future conflicts).
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
in the middle of the arguments. This reordering broke BPF programs which
relied on the old argument list. While tracepoints are not considered
stable ABI, it's not trivial to make BPF cope with such a change, but it's
being worked on. For now restore the original argument order and move the
new argument to the end of the argument list.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmKAxKQTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoThID/4jp/8GiKsf1jPgKkU39Yw7qAePzObQ
V9K2XLxSwH27D+UpmOPODnckzHJMtX0M4Z+sGMgGSPe/IvOVj+NEmUiQGU29sDwg
T7If2FSHMutPCB9QL26kxjmebU+SdllRwrJylOA1ZNduunczxKlpATJ5vneCC/Qt
D5VpB3XlwT31pd9UdoW/kV5uQK6bFR7qREWXhONZ+HyzsKJdV0vGe2ZX6U7ek2/d
XJxETE1eXlsMr+2VY5lkxhr596uPJgDAM9g+OknO/Lal/I7WoUchDN2giItzn6RY
XWxPK85mE59MwTa6PQCJcO8A7r2KcHfGrbFVjA9h1jhREtsZigb9ZemDgQ+s8goT
znIIlTO2l7ed2VDMU/mt3zZuS0rMshn/8Axk+AN3N6gKffV6F4q0BpZUUccGe+FM
tfQ34YGmMKx6uuyHPPZCQd1buJuDuXNyZF7XFO3uxv9BGt3x42aswAbx1zYIV+ZR
Uj/Vnojoc1aBdffVSUL0he+vjutYixx4gb8nh0ZFa5FTe70XDvPGTUTTOSW6BOq0
yiFOWtG8MbziVBDE2iKmfUMT+dPQd0+PW8szk8J9yOJyOnTu9y6KkyWl2JRllSxT
Qv7icnMN5P1xqN/c4P+8Iq0CrVItyxMJ0Ouc29tsNPHYkzsBo4c0XAn94mib1O17
zyJYW0F9UVHOSg==
=6Bvx
-----END PGP SIGNATURE-----
Merge tag 'sched-urgent-2022-05-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Thomas Gleixner:
"The recent expansion of the sched switch tracepoint inserted a new
argument in the middle of the arguments. This reordering broke BPF
programs which relied on the old argument list.
While tracepoints are not considered stable ABI, it's not trivial to
make BPF cope with such a change, but it's being worked on. For now
restore the original argument order and move the new argument to the
end of the argument list"
* tag 'sched-urgent-2022-05-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/tracing: Append prev_state to tp args instead
interrupt code. The consolidation of the interrupt handler invocation code
added an unconditional warning when generic_handle_domain_irq() is invoked
from outside hard interrupt context. That's overbroad as the requirement
for invoking these handlers in hard interrupt context is only required for
certain interrupt types. The subsequently called code already contains a
warning which triggers conditionally for interrupt chips which indicate
this requirement in their properties. Remove the overbroad one.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmKAwyATHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYod4+D/9KrvIKGNSRKThw1zx4w1FzeOtRjhiT
HdwiNENKUhClWQHTzfv1xHOEv1hFVTuuz5tP2zIfzkKrDe1/dijZY9P/QvdEhp+s
idRzxaclWlxsxv4K8zqD1i/0klQ91YBA5aADgn4t1vY4WdWtJpbkFW8tndoAUAZR
THrFBGrvBdhjsSwK5VVfZcwNNeIh0lGG83vE8zPnzI7fbNxuAa1pI9bAigSa9jIT
zYcMm+mmC7eIdjeLD/Vx5Rujn3/MOLfmAfv9TwNIH2heQo6RwtINt0mzuDqKibOh
ly6T1Ol12WQuOLy5dYHglvogAzhJP49RbsQHCxU9S7BaWqcVfHuN88WhU/JXgfHn
UGdE3ppJpNHk/IqGSUyilDUzXgR9YH3j+XOYNnG2PidDWl5aPwuU1h9L7wdJnDZy
5Ou6JVmQjYc2+A6YeCZsNl+FdyvWpH+Gc/oGi09Saf1kCFuAVW11mkhRFHawWfHW
SZRpbSWxE+v0QFDd6T+IajSEwifw4+Ua8yjxRUU1dpsTcxHdFxGBlFFIebeYXlzJ
Xx2fASyCdlMzlEj7qegU2Y67yn0+yQjziZLaOCMtDtbWFO9APV447lEb5FcImqgi
XTT2HHw5sPZpLLoCED2zRoAsrh+aK9rJyH9pWEoRYvxVgmO613Qkw8GVJSmm8mO+
tZraqHFkoTuxRg==
=pJj8
-----END PGP SIGNATURE-----
Merge tag 'irq-urgent-2022-05-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fix from Thomas Gleixner:
"A single fix for a recent (introduced in 5.16) regression in the core
interrupt code.
The consolidation of the interrupt handler invocation code added an
unconditional warning when generic_handle_domain_irq() is invoked from
outside hard interrupt context. That's overbroad as the requirement
for invoking these handlers in hard interrupt context is only required
for certain interrupt types. The subsequently called code already
contains a warning which triggers conditionally for interrupt chips
which indicate this requirement in their properties.
Remove the overbroad one"
* tag 'irq-urgent-2022-05-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq: Remove WARN_ON_ONCE() in generic_handle_domain_irq()
The IRQ simulator uses irq_work to trigger an interrupt. Without the
IRQ_WORK_HARD_IRQ flag the irq_work will be performed in thread context
on PREEMPT_RT. This causes locking errors later in handle_simple_irq()
which expects to be invoked with disabled interrupts.
Triggering individual interrupts in hardirq context should not lead to
unexpected high latencies since this is also what the hardware
controller does. Also it is used as a simulator so...
Use IRQ_WORK_INIT_HARD() to carry out the irq_work in hardirq context on
PREEMPT_RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/YnuZBoEVMGwKkLm+@linutronix.de
With debugobjects enabled the timer hint for freeing of active timers
embedded inside delayed works is always the same, i.e. the hint is
delayed_work_timer_fn, even though the function the delayed work is going
to run can be wildly different depending on what work was queued. Enabling
workqueue debugobjects doesn't help either because the delayed work isn't
considered active until it is actually queued to run on a workqueue. If the
work is freed while the timer is pending the work isn't considered active
so there is no information from workqueue debugobjects.
Special case delayed works in the timer debugobjects hint logic so that the
delayed work function is returned instead of the delayed_work_timer_fn.
This will help to understand which delayed work was pending that got
freed.
Apply the same treatment for kthread_delayed_work because it follows the
same pattern.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20220511201951.42408-1-swboyd@chromium.org
Instead of having uninitialized versions of arguments as separate
bpf_arg_types (eg ARG_PTR_TO_UNINIT_MEM as the uninitialized version
of ARG_PTR_TO_MEM), we can instead use MEM_UNINIT as a bpf_type_flag
modifier to denote that the argument is uninitialized.
Doing so cleans up some of the logic in the verifier. We no longer
need to do two checks against an argument type (eg "if
(base_type(arg_type) == ARG_PTR_TO_MEM || base_type(arg_type) ==
ARG_PTR_TO_UNINIT_MEM)"), since uninitialized and initialized
versions of the same argument type will now share the same base type.
In the near future, MEM_UNINIT will be used by dynptr helper functions
as well.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20220509224257.3222614-2-joannelkoong@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The addition of random_get_entropy_fallback() provides access to
whichever time source has the highest frequency, which is useful for
gathering entropy on platforms without available cycle counters. It's
not necessarily as good as being able to quickly access a cycle counter
that the CPU has, but it's still something, even when it falls back to
being jiffies-based.
In the event that a given arch does not define get_cycles(), falling
back to the get_cycles() default implementation that returns 0 is really
not the best we can do. Instead, at least calling
random_get_entropy_fallback() would be preferable, because that always
needs to return _something_, even falling back to jiffies eventually.
It's not as though random_get_entropy_fallback() is super high precision
or guaranteed to be entropic, but basically anything that's not zero all
the time is better than returning zero all the time.
Finally, since random_get_entropy_fallback() is used during extremely
early boot when randomizing freelists in mm_init(), it can be called
before timekeeping has been initialized. In that case there really is
nothing we can do; jiffies hasn't even started ticking yet. So just give
up and return 0.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Theodore Ts'o <tytso@mit.edu>
An inclusion of cache.h in printk.h was added in 2014 in commit
c28aa1f0a847 ("printk/cache: mark printk_once test variable
__read_mostly") in order to bring in the definition of __read_mostly. The
usage of __read_mostly was later removed in commit 3ec25826ae33 ("printk:
Tie printk_once / printk_deferred_once into .data.once for reset") which
made the inclusion of cache.h unnecessary, so remove it.
We have a small amount of code that depended on the inclusion of cache.h
from printk.h; fix that code to include the appropriate header.
This fixes a circular inclusion on arm64 (linux/printk.h -> linux/cache.h
-> asm/cache.h -> linux/kasan-enabled.h -> linux/static_key.h ->
linux/jump_label.h -> linux/bug.h -> asm/bug.h -> linux/printk.h) that
would otherwise be introduced by the next patch.
Build tested using {allyesconfig,defconfig} x {arm64,x86_64}.
Link: https://linux-review.googlesource.com/id/I8fd51f72c9ef1f2d6afd3b2cbc875aa4792c1fba
Link: https://lkml.kernel.org/r/20220427195820.1716975-1-pcc@google.com
Signed-off-by: Peter Collingbourne <pcc@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The combination of jit blinding and pointers to bpf subprogs causes:
[ 36.989548] BUG: unable to handle page fault for address: 0000000100000001
[ 36.990342] #PF: supervisor instruction fetch in kernel mode
[ 36.990968] #PF: error_code(0x0010) - not-present page
[ 36.994859] RIP: 0010:0x100000001
[ 36.995209] Code: Unable to access opcode bytes at RIP 0xffffffd7.
[ 37.004091] Call Trace:
[ 37.004351] <TASK>
[ 37.004576] ? bpf_loop+0x4d/0x70
[ 37.004932] ? bpf_prog_3899083f75e4c5de_F+0xe3/0x13b
The jit blinding logic didn't recognize that ld_imm64 with an address
of bpf subprogram is a special instruction and proceeded to randomize it.
By itself it wouldn't have been an issue, but jit_subprogs() logic
relies on two step process to JIT all subprogs and then JIT them
again when addresses of all subprogs are known.
Blinding process in the first JIT phase caused second JIT to miss
adjustment of special ld_imm64.
Fix this issue by ignoring special ld_imm64 instructions that don't have
user controlled constants and shouldn't be blinded.
Fixes: 69c087ba6225 ("bpf: Add bpf_for_each_map_elem() helper")
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220513011025.13344-1-alexei.starovoitov@gmail.com
nslabs can shrink when allocations or the remap don't succeed, so make
sure to use it for all sizing. For that remove the bytes value that
can get stale and replace it with local calculations and a boolean to
indicate if the originally requested size could not be allocated.
Fixes: 6424e31b1c05 ("swiotlb: remove swiotlb_init_with_tbl and swiotlb_init_late_with_tbl")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
default_nslabs should only be used to initialize nslabs, after that we
need to use the local variable that can shrink when allocations or the
remap don't succeed.
Fixes: 6424e31b1c05 ("swiotlb: remove swiotlb_init_with_tbl and swiotlb_init_late_with_tbl")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
For historical reasons the switlb code paniced when the metadata could
not be allocated, but just printed a warning when the actual main
swiotlb buffer could not be allocated. Restore this somewhat unexpected
behavior as changing it caused a boot failure on the Microchip RISC-V
PolarFire SoC Icicle kit.
Fixes: 6424e31b1c05 ("swiotlb: remove swiotlb_init_with_tbl and swiotlb_init_late_with_tbl")
Reported-by: Conor Dooley <Conor.Dooley@microchip.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Tested-by: Conor Dooley <Conor.Dooley@microchip.com>
Earlier the PREEMPT_RT patch had a PREEMPT_RT_FULL and PREEMPT_RT_BASE
Kconfig option. The latter was a subset of the functionality that was
enabled with PREEMPT_RT_FULL and was mainly useful for debugging.
During the merging efforts the two Kconfig options were abandoned in the
v5.4.3-rt1 release and since then there is only PREEMPT_RT which enables
the full features set (as PREEMPT_RT_FULL did in earlier releases).
Replace the PREEMPT_RT_FULL reference with PREEMPT_RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
Link: https://lore.kernel.org/r/YnvWUvq1vpqCfCU7@linutronix.de
Pointer buf is being assigned a value that is not being read, buf is being
re-assigned in the next starement. The assignment is redundant and can be
removed.
Cleans up clang scan build warning:
kernel/relay.c:443:8: warning: Although the value stored to 'buf' is
used in the enclosing expression, the value is never actually read
from 'buf' [deadcode.DeadStores]
Link: https://lkml.kernel.org/r/20220508212152.58753-1-colin.i.king@gmail.com
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
At the end of get_last_crashkernel(), the judgement of ck_cmdline is
obviously unnecessary and causes redundance, let's clean it up.
Link: https://lkml.kernel.org/r/20220506104116.259323-1-sensor1010@163.com
Signed-off-by: lizhe <sensor1010@163.com>
Acked-by: Baoquan He <bhe@redhat.com>
Acked-by: Philipp Rudo <prudo@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
No conflicts.
Build issue in drivers/net/ethernet/sfc/ptp.c
54fccfdd7c66 ("sfc: efx_default_channel_type APIs can be static")
49e6123c65da ("net: sfc: fix memory leak due to ptp channel")
https://lore.kernel.org/all/20220510130556.52598fe2@canb.auug.org.au/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull cgroup fix from Tejun Heo:
"Waiman's fix for a cgroup2 cpuset bug where it could miss nodes which
were hot-added"
* 'for-5.18-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup/cpuset: Remove cpus_allowed/mems_allowed setup in cpuset_init_smp()
Now check_exported_symbol() always succeeds.
Merge it into find_exported_symbol_in_search() to make the code concise.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Currently, !fsa->gplok && syms->license == GPL_ONLY) is checked after
bsearch() succeeds.
It is meaningless to do the binary search in the GPL symbol table when
fsa->gplok is false because we know find_exported_symbol_in_section()
will fail anyway.
This check should be done before bsearch().
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
There is no need to use an opaque pointer for check_exported_symbol()
or find_exported_symbol_in_section.
Pass (struct find_symbol_arg *) explicitly.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
The error log for inherit_taint() doesn't really help to find the
symbol which violates GPL rules.
For example,
if a module has 300 symbol and includes 50 disallowed symbols,
the log only shows the content below and we have no idea what symbol is.
AAA: module using GPL-only symbols uses symbols from proprietary module BBB.
It's hard for user who doesn't really know how the symbol was parsing.
This patch add symbol name to tell the offending symbols explicitly.
AAA: module using GPL-only symbols uses symbols SSS from proprietary module BBB.
Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Currently, only the initial module that tainted the kernel is
recorded e.g. when an out-of-tree module is loaded.
The purpose of this patch is to allow the kernel to maintain a record of
each unloaded module that taints the kernel. So, in addition to
displaying a list of linked modules (see print_modules()) e.g. in the
event of a detected bad page, unloaded modules that carried a taint/or
taints are displayed too. A tainted module unload count is maintained.
The number of tracked modules is not fixed. This feature is disabled by
default.
Signed-off-by: Aaron Tomlin <atomlin@redhat.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
No functional change.
This patch migrates module_assert_mutex_or_preempt() to internal.h.
So, the aforementiond function can be used outside of main/or core
module code yet will remain restricted for internal use only.
Signed-off-by: Aaron Tomlin <atomlin@redhat.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
No functional change.
The purpose of this patch is to modify module_flags_taint() to accept
a module's taints bitmap as a parameter and modifies all users
accordingly. Furthermore, it is now possible to access a given
module's taint flags data outside of non-essential code yet does
remain for internal use only.
This is in preparation for module unload taint tracking support.
Signed-off-by: Aaron Tomlin <atomlin@redhat.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Hardware core level testing features require near simultaneous execution
of WRMSR instructions on all threads of a core to initiate a test.
Provide a customized cut down version of stop_machine_cpuslocked() that
just operates on the threads of a single core.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20220506225410.1652287-4-tony.luck@intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
The cnt value in the 'cnt >= BPF_MAX_TRAMP_PROGS' check does not
include BPF_TRAMP_MODIFY_RETURN bpf programs, so the number of
the attached BPF_TRAMP_MODIFY_RETURN bpf programs in a trampoline
can exceed BPF_MAX_TRAMP_PROGS.
When this happens, the assignment '*progs++ = aux->prog' in
bpf_trampoline_get_progs() will cause progs array overflow as the
progs field in the bpf_tramp_progs struct can only hold at most
BPF_MAX_TRAMP_PROGS bpf programs.
Fixes: 88fd9e5352fe ("bpf: Refactor trampoline update code")
Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Link: https://lore.kernel.org/r/20220430130803.210624-1-ytcoode@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add new ebpf helpers bpf_map_lookup_percpu_elem.
The implementation method is relatively simple, refer to the implementation
method of map_lookup_elem of percpu map, increase the parameters of cpu, and
obtain it according to the specified cpu.
Signed-off-by: Feng Zhou <zhoufeng.zf@bytedance.com>
Link: https://lore.kernel.org/r/20220511093854.411-2-zhoufeng.zf@bytedance.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Commit fa2c3254d7cf (sched/tracing: Don't re-read p->state when emitting
sched_switch event, 2022-01-20) added a new prev_state argument to the
sched_switch tracepoint, before the prev task_struct pointer.
This reordering of arguments broke BPF programs that use the raw
tracepoint (e.g. tp_btf programs). The type of the second argument has
changed and existing programs that assume a task_struct* argument
(e.g. for bpf_task_storage access) will now fail to verify.
If we instead append the new argument to the end, all existing programs
would continue to work and can conditionally extract the prev_state
argument on supported kernel versions.
Fixes: fa2c3254d7cf (sched/tracing: Don't re-read p->state when emitting sched_switch event, 2022-01-20)
Signed-off-by: Delyan Kratunov <delyank@fb.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lkml.kernel.org/r/c8a6930dfdd58a4a5755fc01732675472979732b.camel@fb.com
Currently both expedited and regular grace period stall warnings use
a single timeout value that with units of seconds. However, recent
Android use cases problem require a sub-100-millisecond expedited RCU CPU
stall warning. Given that expedited RCU grace periods normally complete
in far less than a single millisecond, especially for small systems,
this is not unreasonable.
Therefore introduce the CONFIG_RCU_EXP_CPU_STALL_TIMEOUT kernel
configuration that defaults to 20 msec on Android and remains the same
as that of the non-expedited stall warnings otherwise. It also can be
changed in run-time via: /sys/.../parameters/rcu_exp_cpu_stall_timeout.
[ paulmck: Default of zero to use CONFIG_RCU_STALL_TIMEOUT. ]
Signed-off-by: Uladzislau Rezki <uladzislau.rezki@sony.com>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
We observed the error "cacheline tracking ENOMEM, dma-debug disabled"
during a light system load (copying some files). The reason for this error
is that the dma_active_cacheline radix tree uses GFP_NOWAIT allocation -
so it can't access the emergency memory reserves and it fails as soon as
anybody reaches the watermark.
This patch changes GFP_NOWAIT to GFP_ATOMIC, so that it can access the
emergency memory reserves.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
When dma_direct_alloc_pages encounters a highmem page it just gives up
currently. But what we really should do is to try memory using the
page allocator instead - without this platforms with a global highmem
CMA pool will fail all dma_alloc_pages allocations.
Fixes: efa70f2fdc84 ("dma-mapping: add a new dma_alloc_pages API")
Reported-by: Mark O'Neill <mao@tumblingdice.co.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
In commit e458716a92b57 ("PM: EM: Mark inefficiencies in CPUFreq"),
cpufreq_cpu_get() is called without a cpufreq_cpu_put(), permanently
increasing the reference counts of the policy struct.
Decrement the reference count once the policy struct is not used
anymore.
Fixes: e458716a92b57 ("PM: EM: Mark inefficiencies in CPUFreq")
Tested-by: Cristian Marussi <cristian.marussi@arm.com>
Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
Reviewed-by: Vincent Donnefort <vincent.donnefort@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The change to call update_rq_clock() before activate_task()
commit 840d719604b0 ("sched/deadline: Update rq_clock of later_rq
when pushing a task") is no longer needed since commit f4904815f97a
("sched/deadline: Fix double accounting of rq/running bw in push & pull")
removed the add_running_bw() before the activate_task().
So we remove some comments that are no longer needed and update
rq clock in activate_task().
Signed-off-by: Hao Jia <jiahao.os@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Link: https://lore.kernel.org/r/20220430085843.62939-3-jiahao.os@bytedance.com
When we use raw_spin_rq_lock() to acquire the rq lock and have to
update the rq clock while holding the lock, the kernel may issue
a WARN_DOUBLE_CLOCK warning.
Since we directly use raw_spin_rq_lock() to acquire rq lock instead of
rq_lock(), there is no corresponding change to rq->clock_update_flags.
In particular, we have obtained the rq lock of other CPUs, the
rq->clock_update_flags of this CPU may be RQCF_UPDATED at this time, and
then calling update_rq_clock() will trigger the WARN_DOUBLE_CLOCK warning.
So we need to clear RQCF_UPDATED of rq->clock_update_flags to avoid
the WARN_DOUBLE_CLOCK warning.
For the sched_rt_period_timer() and migrate_task_rq_dl() cases
we simply replace raw_spin_rq_lock()/raw_spin_rq_unlock() with
rq_lock()/rq_unlock().
For the {pull,push}_{rt,dl}_task() cases, we add the
double_rq_clock_clear_update() function to clear RQCF_UPDATED of
rq->clock_update_flags, and call double_rq_clock_clear_update()
before double_lock_balance()/double_rq_lock() returns to avoid the
WARN_DOUBLE_CLOCK warning.
Some call trace reports:
Call Trace 1:
<IRQ>
sched_rt_period_timer+0x10f/0x3a0
? enqueue_top_rt_rq+0x110/0x110
__hrtimer_run_queues+0x1a9/0x490
hrtimer_interrupt+0x10b/0x240
__sysvec_apic_timer_interrupt+0x8a/0x250
sysvec_apic_timer_interrupt+0x9a/0xd0
</IRQ>
<TASK>
asm_sysvec_apic_timer_interrupt+0x12/0x20
Call Trace 2:
<TASK>
activate_task+0x8b/0x110
push_rt_task.part.108+0x241/0x2c0
push_rt_tasks+0x15/0x30
finish_task_switch+0xaa/0x2e0
? __switch_to+0x134/0x420
__schedule+0x343/0x8e0
? hrtimer_start_range_ns+0x101/0x340
schedule+0x4e/0xb0
do_nanosleep+0x8e/0x160
hrtimer_nanosleep+0x89/0x120
? hrtimer_init_sleeper+0x90/0x90
__x64_sys_nanosleep+0x96/0xd0
do_syscall_64+0x34/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
Call Trace 3:
<TASK>
deactivate_task+0x93/0xe0
pull_rt_task+0x33e/0x400
balance_rt+0x7e/0x90
__schedule+0x62f/0x8e0
do_task_dead+0x3f/0x50
do_exit+0x7b8/0xbb0
do_group_exit+0x2d/0x90
get_signal+0x9df/0x9e0
? preempt_count_add+0x56/0xa0
? __remove_hrtimer+0x35/0x70
arch_do_signal_or_restart+0x36/0x720
? nanosleep_copyout+0x39/0x50
? do_nanosleep+0x131/0x160
? audit_filter_inodes+0xf5/0x120
exit_to_user_mode_prepare+0x10f/0x1e0
syscall_exit_to_user_mode+0x17/0x30
do_syscall_64+0x40/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
Call Trace 4:
update_rq_clock+0x128/0x1a0
migrate_task_rq_dl+0xec/0x310
set_task_cpu+0x84/0x1e4
try_to_wake_up+0x1d8/0x5c0
wake_up_process+0x1c/0x30
hrtimer_wakeup+0x24/0x3c
__hrtimer_run_queues+0x114/0x270
hrtimer_interrupt+0xe8/0x244
arch_timer_handler_phys+0x30/0x50
handle_percpu_devid_irq+0x88/0x140
generic_handle_domain_irq+0x40/0x60
gic_handle_irq+0x48/0xe0
call_on_irq_stack+0x2c/0x60
do_interrupt_handler+0x80/0x84
Steps to reproduce:
1. Enable CONFIG_SCHED_DEBUG when compiling the kernel
2. echo 1 > /sys/kernel/debug/clear_warn_once
echo "WARN_DOUBLE_CLOCK" > /sys/kernel/debug/sched/features
echo "NO_RT_PUSH_IPI" > /sys/kernel/debug/sched/features
3. Run some rt/dl tasks that periodically work and sleep, e.g.
Create 2*n rt or dl (90% running) tasks via rt-app (on a system
with n CPUs), and Dietmar Eggemann reports Call Trace 4 when running
on PREEMPT_RT kernel.
Signed-off-by: Hao Jia <jiahao.os@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Link: https://lore.kernel.org/r/20220430085843.62939-2-jiahao.os@bytedance.com