IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Add atomic_notifier_call_chain_is_empty() that returns true if given
atomic call chain is empty.
The first user of this new notifier API function will be the kernel
power-off core code that will support power-off call chains. The core
code will need to check whether there is a power-off handler registered
at all in order to decide whether to halt machine or power it off.
Reviewed-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Make cgroup_debug static since it's only used in cgroup.c
Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
random32.c has two random number generators in it: one that is meant to
be used deterministically, with some predefined seed, and one that does
the same exact thing as random.c, except does it poorly. The first one
has some use cases. The second one no longer does and can be replaced
with calls to random.c's proper random number generator.
The relatively recent siphash-based bad random32.c code was added in
response to concerns that the prior random32.c was too deterministic.
Out of fears that random.c was (at the time) too slow, this code was
anonymously contributed. Then out of that emerged a kind of shadow
entropy gathering system, with its own tentacles throughout various net
code, added willy nilly.
Stop👏making👏bespoke👏random👏number👏generators👏.
Fortunately, recent advances in random.c mean that we can stop playing
with this sketchiness, and just use get_random_u32(), which is now fast
enough. In micro benchmarks using RDPMC, I'm seeing the same median
cycle count between the two functions, with the mean being _slightly_
higher due to batches refilling (which we can optimize further need be).
However, when doing *real* benchmarks of the net functions that actually
use these random numbers, the mean cycles actually *decreased* slightly
(with the median still staying the same), likely because the additional
prandom code means icache misses and complexity, whereas random.c is
generally already being used by something else nearby.
The biggest benefit of this is that there are many users of prandom who
probably should be using cryptographically secure random numbers. This
makes all of those accidental cases become secure by just flipping a
switch. Later on, we can do a tree-wide cleanup to remove the static
inline wrapper functions that this commit adds.
There are also some low-ish hanging fruits for making this even faster
in the future: a get_random_u16() function for use in the networking
stack will give a 2x performance boost there, using SIMD for ChaCha20
will let us compute 4 or 8 or 16 blocks of output in parallel, instead
of just one, giving us large buffers for cheap, and introducing a
get_random_*_bh() function that assumes irqs are already disabled will
shave off a few cycles for ordinary calls. These are things we can chip
away at down the road.
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Not calling the function for dummy contexts will cause the context to
not be reset. During the next syscall, this will cause an error in
__audit_syscall_entry:
WARN_ON(context->context != AUDIT_CTX_UNUSED);
WARN_ON(context->name_count);
if (context->context != AUDIT_CTX_UNUSED || context->name_count) {
audit_panic("unrecoverable error in audit_syscall_entry()");
return;
}
These problematic dummy contexts are created via the following call
chain:
exit_to_user_mode_prepare
-> arch_do_signal_or_restart
-> get_signal
-> task_work_run
-> tctx_task_work
-> io_req_task_submit
-> io_issue_sqe
-> audit_uring_entry
Cc: stable@vger.kernel.org
Fixes: 5bd2182d58e9 ("audit,io_uring,io-wq: add some basic audit support to io_uring")
Signed-off-by: Julian Orth <ju.orth@gmail.com>
[PM: subject line tweaks]
Signed-off-by: Paul Moore <paul@paul-moore.com>
swiotlb_find_slots() skips slots according to io tlb aligned mask
calculated from min aligned mask and original physical address
offset. This affects max mapping size. The mapping size can't
achieve the IO_TLB_SEGSIZE * IO_TLB_SIZE when original offset is
non-zero. This will cause system boot up failure in Hyper-V
Isolation VM where swiotlb force is enabled. Scsi layer use return
value of dma_max_mapping_size() to set max segment size and it
finally calls swiotlb_max_mapping_size(). Hyper-V storage driver
sets min align mask to 4k - 1. Scsi layer may pass 256k length of
request buffer with 0~4k offset and Hyper-V storage driver can't
get swiotlb bounce buffer via DMA API. Swiotlb_find_slots() can't
find 256k length bounce buffer with offset. Make swiotlb_max_mapping
_size() take min align mask into account.
Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Use the newly added suite_{init,exit} support for suite-wide init and
cleanup. This avoids the unsupported method by which the test used to do
suite-wide init and cleanup (avoiding issues such as missing TAP
headers, and possible future conflicts).
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
in the middle of the arguments. This reordering broke BPF programs which
relied on the old argument list. While tracepoints are not considered
stable ABI, it's not trivial to make BPF cope with such a change, but it's
being worked on. For now restore the original argument order and move the
new argument to the end of the argument list.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmKAxKQTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoThID/4jp/8GiKsf1jPgKkU39Yw7qAePzObQ
V9K2XLxSwH27D+UpmOPODnckzHJMtX0M4Z+sGMgGSPe/IvOVj+NEmUiQGU29sDwg
T7If2FSHMutPCB9QL26kxjmebU+SdllRwrJylOA1ZNduunczxKlpATJ5vneCC/Qt
D5VpB3XlwT31pd9UdoW/kV5uQK6bFR7qREWXhONZ+HyzsKJdV0vGe2ZX6U7ek2/d
XJxETE1eXlsMr+2VY5lkxhr596uPJgDAM9g+OknO/Lal/I7WoUchDN2giItzn6RY
XWxPK85mE59MwTa6PQCJcO8A7r2KcHfGrbFVjA9h1jhREtsZigb9ZemDgQ+s8goT
znIIlTO2l7ed2VDMU/mt3zZuS0rMshn/8Axk+AN3N6gKffV6F4q0BpZUUccGe+FM
tfQ34YGmMKx6uuyHPPZCQd1buJuDuXNyZF7XFO3uxv9BGt3x42aswAbx1zYIV+ZR
Uj/Vnojoc1aBdffVSUL0he+vjutYixx4gb8nh0ZFa5FTe70XDvPGTUTTOSW6BOq0
yiFOWtG8MbziVBDE2iKmfUMT+dPQd0+PW8szk8J9yOJyOnTu9y6KkyWl2JRllSxT
Qv7icnMN5P1xqN/c4P+8Iq0CrVItyxMJ0Ouc29tsNPHYkzsBo4c0XAn94mib1O17
zyJYW0F9UVHOSg==
=6Bvx
-----END PGP SIGNATURE-----
Merge tag 'sched-urgent-2022-05-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Thomas Gleixner:
"The recent expansion of the sched switch tracepoint inserted a new
argument in the middle of the arguments. This reordering broke BPF
programs which relied on the old argument list.
While tracepoints are not considered stable ABI, it's not trivial to
make BPF cope with such a change, but it's being worked on. For now
restore the original argument order and move the new argument to the
end of the argument list"
* tag 'sched-urgent-2022-05-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/tracing: Append prev_state to tp args instead
interrupt code. The consolidation of the interrupt handler invocation code
added an unconditional warning when generic_handle_domain_irq() is invoked
from outside hard interrupt context. That's overbroad as the requirement
for invoking these handlers in hard interrupt context is only required for
certain interrupt types. The subsequently called code already contains a
warning which triggers conditionally for interrupt chips which indicate
this requirement in their properties. Remove the overbroad one.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmKAwyATHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYod4+D/9KrvIKGNSRKThw1zx4w1FzeOtRjhiT
HdwiNENKUhClWQHTzfv1xHOEv1hFVTuuz5tP2zIfzkKrDe1/dijZY9P/QvdEhp+s
idRzxaclWlxsxv4K8zqD1i/0klQ91YBA5aADgn4t1vY4WdWtJpbkFW8tndoAUAZR
THrFBGrvBdhjsSwK5VVfZcwNNeIh0lGG83vE8zPnzI7fbNxuAa1pI9bAigSa9jIT
zYcMm+mmC7eIdjeLD/Vx5Rujn3/MOLfmAfv9TwNIH2heQo6RwtINt0mzuDqKibOh
ly6T1Ol12WQuOLy5dYHglvogAzhJP49RbsQHCxU9S7BaWqcVfHuN88WhU/JXgfHn
UGdE3ppJpNHk/IqGSUyilDUzXgR9YH3j+XOYNnG2PidDWl5aPwuU1h9L7wdJnDZy
5Ou6JVmQjYc2+A6YeCZsNl+FdyvWpH+Gc/oGi09Saf1kCFuAVW11mkhRFHawWfHW
SZRpbSWxE+v0QFDd6T+IajSEwifw4+Ua8yjxRUU1dpsTcxHdFxGBlFFIebeYXlzJ
Xx2fASyCdlMzlEj7qegU2Y67yn0+yQjziZLaOCMtDtbWFO9APV447lEb5FcImqgi
XTT2HHw5sPZpLLoCED2zRoAsrh+aK9rJyH9pWEoRYvxVgmO613Qkw8GVJSmm8mO+
tZraqHFkoTuxRg==
=pJj8
-----END PGP SIGNATURE-----
Merge tag 'irq-urgent-2022-05-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fix from Thomas Gleixner:
"A single fix for a recent (introduced in 5.16) regression in the core
interrupt code.
The consolidation of the interrupt handler invocation code added an
unconditional warning when generic_handle_domain_irq() is invoked from
outside hard interrupt context. That's overbroad as the requirement
for invoking these handlers in hard interrupt context is only required
for certain interrupt types. The subsequently called code already
contains a warning which triggers conditionally for interrupt chips
which indicate this requirement in their properties.
Remove the overbroad one"
* tag 'irq-urgent-2022-05-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq: Remove WARN_ON_ONCE() in generic_handle_domain_irq()
The IRQ simulator uses irq_work to trigger an interrupt. Without the
IRQ_WORK_HARD_IRQ flag the irq_work will be performed in thread context
on PREEMPT_RT. This causes locking errors later in handle_simple_irq()
which expects to be invoked with disabled interrupts.
Triggering individual interrupts in hardirq context should not lead to
unexpected high latencies since this is also what the hardware
controller does. Also it is used as a simulator so...
Use IRQ_WORK_INIT_HARD() to carry out the irq_work in hardirq context on
PREEMPT_RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/YnuZBoEVMGwKkLm+@linutronix.de
With debugobjects enabled the timer hint for freeing of active timers
embedded inside delayed works is always the same, i.e. the hint is
delayed_work_timer_fn, even though the function the delayed work is going
to run can be wildly different depending on what work was queued. Enabling
workqueue debugobjects doesn't help either because the delayed work isn't
considered active until it is actually queued to run on a workqueue. If the
work is freed while the timer is pending the work isn't considered active
so there is no information from workqueue debugobjects.
Special case delayed works in the timer debugobjects hint logic so that the
delayed work function is returned instead of the delayed_work_timer_fn.
This will help to understand which delayed work was pending that got
freed.
Apply the same treatment for kthread_delayed_work because it follows the
same pattern.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20220511201951.42408-1-swboyd@chromium.org
Instead of having uninitialized versions of arguments as separate
bpf_arg_types (eg ARG_PTR_TO_UNINIT_MEM as the uninitialized version
of ARG_PTR_TO_MEM), we can instead use MEM_UNINIT as a bpf_type_flag
modifier to denote that the argument is uninitialized.
Doing so cleans up some of the logic in the verifier. We no longer
need to do two checks against an argument type (eg "if
(base_type(arg_type) == ARG_PTR_TO_MEM || base_type(arg_type) ==
ARG_PTR_TO_UNINIT_MEM)"), since uninitialized and initialized
versions of the same argument type will now share the same base type.
In the near future, MEM_UNINIT will be used by dynptr helper functions
as well.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20220509224257.3222614-2-joannelkoong@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The addition of random_get_entropy_fallback() provides access to
whichever time source has the highest frequency, which is useful for
gathering entropy on platforms without available cycle counters. It's
not necessarily as good as being able to quickly access a cycle counter
that the CPU has, but it's still something, even when it falls back to
being jiffies-based.
In the event that a given arch does not define get_cycles(), falling
back to the get_cycles() default implementation that returns 0 is really
not the best we can do. Instead, at least calling
random_get_entropy_fallback() would be preferable, because that always
needs to return _something_, even falling back to jiffies eventually.
It's not as though random_get_entropy_fallback() is super high precision
or guaranteed to be entropic, but basically anything that's not zero all
the time is better than returning zero all the time.
Finally, since random_get_entropy_fallback() is used during extremely
early boot when randomizing freelists in mm_init(), it can be called
before timekeeping has been initialized. In that case there really is
nothing we can do; jiffies hasn't even started ticking yet. So just give
up and return 0.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Theodore Ts'o <tytso@mit.edu>
An inclusion of cache.h in printk.h was added in 2014 in commit
c28aa1f0a847 ("printk/cache: mark printk_once test variable
__read_mostly") in order to bring in the definition of __read_mostly. The
usage of __read_mostly was later removed in commit 3ec25826ae33 ("printk:
Tie printk_once / printk_deferred_once into .data.once for reset") which
made the inclusion of cache.h unnecessary, so remove it.
We have a small amount of code that depended on the inclusion of cache.h
from printk.h; fix that code to include the appropriate header.
This fixes a circular inclusion on arm64 (linux/printk.h -> linux/cache.h
-> asm/cache.h -> linux/kasan-enabled.h -> linux/static_key.h ->
linux/jump_label.h -> linux/bug.h -> asm/bug.h -> linux/printk.h) that
would otherwise be introduced by the next patch.
Build tested using {allyesconfig,defconfig} x {arm64,x86_64}.
Link: https://linux-review.googlesource.com/id/I8fd51f72c9ef1f2d6afd3b2cbc875aa4792c1fba
Link: https://lkml.kernel.org/r/20220427195820.1716975-1-pcc@google.com
Signed-off-by: Peter Collingbourne <pcc@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The combination of jit blinding and pointers to bpf subprogs causes:
[ 36.989548] BUG: unable to handle page fault for address: 0000000100000001
[ 36.990342] #PF: supervisor instruction fetch in kernel mode
[ 36.990968] #PF: error_code(0x0010) - not-present page
[ 36.994859] RIP: 0010:0x100000001
[ 36.995209] Code: Unable to access opcode bytes at RIP 0xffffffd7.
[ 37.004091] Call Trace:
[ 37.004351] <TASK>
[ 37.004576] ? bpf_loop+0x4d/0x70
[ 37.004932] ? bpf_prog_3899083f75e4c5de_F+0xe3/0x13b
The jit blinding logic didn't recognize that ld_imm64 with an address
of bpf subprogram is a special instruction and proceeded to randomize it.
By itself it wouldn't have been an issue, but jit_subprogs() logic
relies on two step process to JIT all subprogs and then JIT them
again when addresses of all subprogs are known.
Blinding process in the first JIT phase caused second JIT to miss
adjustment of special ld_imm64.
Fix this issue by ignoring special ld_imm64 instructions that don't have
user controlled constants and shouldn't be blinded.
Fixes: 69c087ba6225 ("bpf: Add bpf_for_each_map_elem() helper")
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220513011025.13344-1-alexei.starovoitov@gmail.com
nslabs can shrink when allocations or the remap don't succeed, so make
sure to use it for all sizing. For that remove the bytes value that
can get stale and replace it with local calculations and a boolean to
indicate if the originally requested size could not be allocated.
Fixes: 6424e31b1c05 ("swiotlb: remove swiotlb_init_with_tbl and swiotlb_init_late_with_tbl")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
default_nslabs should only be used to initialize nslabs, after that we
need to use the local variable that can shrink when allocations or the
remap don't succeed.
Fixes: 6424e31b1c05 ("swiotlb: remove swiotlb_init_with_tbl and swiotlb_init_late_with_tbl")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
For historical reasons the switlb code paniced when the metadata could
not be allocated, but just printed a warning when the actual main
swiotlb buffer could not be allocated. Restore this somewhat unexpected
behavior as changing it caused a boot failure on the Microchip RISC-V
PolarFire SoC Icicle kit.
Fixes: 6424e31b1c05 ("swiotlb: remove swiotlb_init_with_tbl and swiotlb_init_late_with_tbl")
Reported-by: Conor Dooley <Conor.Dooley@microchip.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Tested-by: Conor Dooley <Conor.Dooley@microchip.com>
Earlier the PREEMPT_RT patch had a PREEMPT_RT_FULL and PREEMPT_RT_BASE
Kconfig option. The latter was a subset of the functionality that was
enabled with PREEMPT_RT_FULL and was mainly useful for debugging.
During the merging efforts the two Kconfig options were abandoned in the
v5.4.3-rt1 release and since then there is only PREEMPT_RT which enables
the full features set (as PREEMPT_RT_FULL did in earlier releases).
Replace the PREEMPT_RT_FULL reference with PREEMPT_RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
Link: https://lore.kernel.org/r/YnvWUvq1vpqCfCU7@linutronix.de
Pointer buf is being assigned a value that is not being read, buf is being
re-assigned in the next starement. The assignment is redundant and can be
removed.
Cleans up clang scan build warning:
kernel/relay.c:443:8: warning: Although the value stored to 'buf' is
used in the enclosing expression, the value is never actually read
from 'buf' [deadcode.DeadStores]
Link: https://lkml.kernel.org/r/20220508212152.58753-1-colin.i.king@gmail.com
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
At the end of get_last_crashkernel(), the judgement of ck_cmdline is
obviously unnecessary and causes redundance, let's clean it up.
Link: https://lkml.kernel.org/r/20220506104116.259323-1-sensor1010@163.com
Signed-off-by: lizhe <sensor1010@163.com>
Acked-by: Baoquan He <bhe@redhat.com>
Acked-by: Philipp Rudo <prudo@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
No conflicts.
Build issue in drivers/net/ethernet/sfc/ptp.c
54fccfdd7c66 ("sfc: efx_default_channel_type APIs can be static")
49e6123c65da ("net: sfc: fix memory leak due to ptp channel")
https://lore.kernel.org/all/20220510130556.52598fe2@canb.auug.org.au/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull cgroup fix from Tejun Heo:
"Waiman's fix for a cgroup2 cpuset bug where it could miss nodes which
were hot-added"
* 'for-5.18-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup/cpuset: Remove cpus_allowed/mems_allowed setup in cpuset_init_smp()
Now check_exported_symbol() always succeeds.
Merge it into find_exported_symbol_in_search() to make the code concise.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Currently, !fsa->gplok && syms->license == GPL_ONLY) is checked after
bsearch() succeeds.
It is meaningless to do the binary search in the GPL symbol table when
fsa->gplok is false because we know find_exported_symbol_in_section()
will fail anyway.
This check should be done before bsearch().
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
There is no need to use an opaque pointer for check_exported_symbol()
or find_exported_symbol_in_section.
Pass (struct find_symbol_arg *) explicitly.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
The error log for inherit_taint() doesn't really help to find the
symbol which violates GPL rules.
For example,
if a module has 300 symbol and includes 50 disallowed symbols,
the log only shows the content below and we have no idea what symbol is.
AAA: module using GPL-only symbols uses symbols from proprietary module BBB.
It's hard for user who doesn't really know how the symbol was parsing.
This patch add symbol name to tell the offending symbols explicitly.
AAA: module using GPL-only symbols uses symbols SSS from proprietary module BBB.
Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Currently, only the initial module that tainted the kernel is
recorded e.g. when an out-of-tree module is loaded.
The purpose of this patch is to allow the kernel to maintain a record of
each unloaded module that taints the kernel. So, in addition to
displaying a list of linked modules (see print_modules()) e.g. in the
event of a detected bad page, unloaded modules that carried a taint/or
taints are displayed too. A tainted module unload count is maintained.
The number of tracked modules is not fixed. This feature is disabled by
default.
Signed-off-by: Aaron Tomlin <atomlin@redhat.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
No functional change.
This patch migrates module_assert_mutex_or_preempt() to internal.h.
So, the aforementiond function can be used outside of main/or core
module code yet will remain restricted for internal use only.
Signed-off-by: Aaron Tomlin <atomlin@redhat.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
No functional change.
The purpose of this patch is to modify module_flags_taint() to accept
a module's taints bitmap as a parameter and modifies all users
accordingly. Furthermore, it is now possible to access a given
module's taint flags data outside of non-essential code yet does
remain for internal use only.
This is in preparation for module unload taint tracking support.
Signed-off-by: Aaron Tomlin <atomlin@redhat.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Hardware core level testing features require near simultaneous execution
of WRMSR instructions on all threads of a core to initiate a test.
Provide a customized cut down version of stop_machine_cpuslocked() that
just operates on the threads of a single core.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20220506225410.1652287-4-tony.luck@intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
The cnt value in the 'cnt >= BPF_MAX_TRAMP_PROGS' check does not
include BPF_TRAMP_MODIFY_RETURN bpf programs, so the number of
the attached BPF_TRAMP_MODIFY_RETURN bpf programs in a trampoline
can exceed BPF_MAX_TRAMP_PROGS.
When this happens, the assignment '*progs++ = aux->prog' in
bpf_trampoline_get_progs() will cause progs array overflow as the
progs field in the bpf_tramp_progs struct can only hold at most
BPF_MAX_TRAMP_PROGS bpf programs.
Fixes: 88fd9e5352fe ("bpf: Refactor trampoline update code")
Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Link: https://lore.kernel.org/r/20220430130803.210624-1-ytcoode@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add new ebpf helpers bpf_map_lookup_percpu_elem.
The implementation method is relatively simple, refer to the implementation
method of map_lookup_elem of percpu map, increase the parameters of cpu, and
obtain it according to the specified cpu.
Signed-off-by: Feng Zhou <zhoufeng.zf@bytedance.com>
Link: https://lore.kernel.org/r/20220511093854.411-2-zhoufeng.zf@bytedance.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Commit fa2c3254d7cf (sched/tracing: Don't re-read p->state when emitting
sched_switch event, 2022-01-20) added a new prev_state argument to the
sched_switch tracepoint, before the prev task_struct pointer.
This reordering of arguments broke BPF programs that use the raw
tracepoint (e.g. tp_btf programs). The type of the second argument has
changed and existing programs that assume a task_struct* argument
(e.g. for bpf_task_storage access) will now fail to verify.
If we instead append the new argument to the end, all existing programs
would continue to work and can conditionally extract the prev_state
argument on supported kernel versions.
Fixes: fa2c3254d7cf (sched/tracing: Don't re-read p->state when emitting sched_switch event, 2022-01-20)
Signed-off-by: Delyan Kratunov <delyank@fb.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lkml.kernel.org/r/c8a6930dfdd58a4a5755fc01732675472979732b.camel@fb.com
Currently ptrace_stop() / do_signal_stop() rely on the special states
TASK_TRACED and TASK_STOPPED resp. to keep unique state. That is, this
state exists only in task->__state and nowhere else.
There's two spots of bother with this:
- PREEMPT_RT has task->saved_state which complicates matters,
meaning task_is_{traced,stopped}() needs to check an additional
variable.
- An alternative freezer implementation that itself relies on a
special TASK state would loose TASK_TRACED/TASK_STOPPED and will
result in misbehaviour.
As such, add additional state to task->jobctl to track this state
outside of task->__state.
NOTE: this doesn't actually fix anything yet, just adds extra state.
--EWB
* didn't add a unnecessary newline in signal.h
* Update t->jobctl in signal_wake_up and ptrace_signal_wake_up
instead of in signal_wake_up_state. This prevents the clearing
of TASK_STOPPED and TASK_TRACED from getting lost.
* Added warnings if JOBCTL_STOPPED or JOBCTL_TRACED are not cleared
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220421150654.757693825@infradead.org
Tested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lkml.kernel.org/r/20220505182645.497868-12-ebiederm@xmission.com
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Stop playing with tsk->__state to remove TASK_WAKEKILL while a ptrace
command is executing.
Instead remove TASK_WAKEKILL from the definition of TASK_TRACED, and
implement a new jobctl flag TASK_PTRACE_FROZEN. This new flag is set
in jobctl_freeze_task and cleared when ptrace_stop is awoken or in
jobctl_unfreeze_task (when ptrace_stop remains asleep).
In signal_wake_up add __TASK_TRACED to state along with TASK_WAKEKILL
when the wake up is for a fatal signal. Skip adding __TASK_TRACED
when TASK_PTRACE_FROZEN is not set. This has the same effect as
changing TASK_TRACED to __TASK_TRACED as all of the wake_ups that use
TASK_KILLABLE go through signal_wake_up.
Handle a ptrace_stop being called with a pending fatal signal.
Previously it would have been handled by schedule simply failing to
sleep. As TASK_WAKEKILL is no longer part of TASK_TRACED schedule
will sleep with a fatal_signal_pending. The code in signal_wake_up
guarantees that the code will be awaked by any fatal signal that
codes after TASK_TRACED is set.
Previously the __state value of __TASK_TRACED was changed to
TASK_RUNNING when woken up or back to TASK_TRACED when the code was
left in ptrace_stop. Now when woken up ptrace_stop now clears
JOBCTL_PTRACE_FROZEN and when left sleeping ptrace_unfreezed_traced
clears JOBCTL_PTRACE_FROZEN.
Tested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lkml.kernel.org/r/20220505182645.497868-10-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Long ago and far away there was a BUG_ON at the start of ptrace_stop
that did "BUG_ON(!(current->ptrace & PT_PTRACED));" [1]. The BUG_ON
had never triggered but examination of the code showed that the BUG_ON
could actually trigger. To complement removing the BUG_ON an attempt
to better handle the race was added.
The code detected the tracer had gone away and did not call
do_notify_parent_cldstop. The code also attempted to prevent
ptrace_report_syscall from sending spurious SIGTRAPs when the tracer
went away.
The code to detect when the tracer had gone away before sending a
signal to tracer was a legitimate fix and continues to work to this
date.
The code to prevent sending spurious SIGTRAPs is a failure. At the
time and until today the code only catches it when the tracer goes
away after siglock is dropped and before read_lock is acquired. If
the tracer goes away after read_lock is dropped a spurious SIGTRAP can
still be sent to the tracee. The tracer going away after read_lock
is dropped is the far likelier case as it is the bigger window.
Given that the attempt to prevent the generation of a SIGTRAP was a
failure and continues to be a failure remove the code that attempts to
do that. This simplifies the code in ptrace_stop and makes
ptrace_stop much easier to reason about.
To successfully deal with the tracer going away, all of the tracer's
instrumentation of the child would need to be removed, and reliably
detecting when the tracer has set a signal to continue with would need
to be implemented.
[1] 66519f549ae5 ("[PATCH] fix ptracer death race yielding bogus BUG_ON")
History-Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Tested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lkml.kernel.org/r/20220505182645.497868-9-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
After ptrace_freeze_traced succeeds it is known that the tracee
has a __state value of __TASK_TRACED and that no __ptrace_unlink will
happen because the tracer is waiting for the tracee, and the tracee is
in ptrace_stop.
The function ptrace_freeze_traced can succeed at any point after
ptrace_stop has set TASK_TRACED and dropped siglock. The read_lock on
tasklist_lock only excludes ptrace_attach.
This means that the !current->ptrace which executes under a read_lock
of tasklist_lock will never see a ptrace_freeze_trace as the tracer
must have gone away before the tasklist_lock was taken and
ptrace_attach can not occur until the read_lock is dropped. As
ptrace_freeze_traced depends upon ptrace_attach running before it can
run that excludes ptrace_freeze_traced until __state is set to
TASK_RUNNING. This means that task_is_traced will fail in
ptrace_freeze_attach and ptrace_freeze_attached will fail.
On the current->ptrace branch of ptrace_stop which will be reached any
time after ptrace_freeze_traced has succeed it is known that __state
is __TASK_TRACED and schedule() will be called with that state.
Use a WARN_ON_ONCE to document that wait_task_inactive(TASK_TRACED)
should never fail. Remove the stale comment about may_ptrace_stop.
Strictly speaking this is not true because if PREEMPT_RT is enabled
wait_task_inactive can fail because __state can be changed. I don't
see this as a problem as the ptrace code is currently broken on
PREMPT_RT, and this is one of the issues. Failing and warning when
the assumptions of the code are broken is good.
Tested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lkml.kernel.org/r/20220505182645.497868-8-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
The current implementation of PTRACE_KILL is buggy and has been for
many years as it assumes it's target has stopped in ptrace_stop. At a
quick skim it looks like this assumption has existed since ptrace
support was added in linux v1.0.
While PTRACE_KILL has been deprecated we can not remove it as
a quick search with google code search reveals many existing
programs calling it.
When the ptracee is not stopped at ptrace_stop some fields would be
set that are ignored except in ptrace_stop. Making the userspace
visible behavior of PTRACE_KILL a noop in those case.
As the usual rules are not obeyed it is not clear what the
consequences are of calling PTRACE_KILL on a running process.
Presumably userspace does not do this as it achieves nothing.
Replace the implementation of PTRACE_KILL with a simple
send_sig_info(SIGKILL) followed by a return 0. This changes the
observable user space behavior only in that PTRACE_KILL on a process
not stopped in ptrace_stop will also kill it. As that has always
been the intent of the code this seems like a reasonable change.
Cc: stable@vger.kernel.org
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lkml.kernel.org/r/20220505182645.497868-7-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
The last remaining implementation of arch_ptrace_attach is ia64's
ptrace_attach_sync_user_rbs which was added at the end of 2007 in
commit aa91a2e90044 ("[IA64] Synchronize RBS on PTRACE_ATTACH").
Reading the comments and examining the code ptrace_attach_sync_user_rbs
has the sole purpose of saving registers to the stack when ptrace_attach
changes TASK_STOPPED to TASK_TRACED. In all other cases arch_ptrace_stop
takes care of the register saving.
In commit d79fdd6d96f4 ("ptrace: Clean transitions between TASK_STOPPED and TRACED")
modified ptrace_attach to wake up the thread and enter ptrace_stop normally even
when the thread starts out stopped.
This makes ptrace_attach_sync_user_rbs completely unnecessary. So just
remove it.
I read through the code to verify that ptrace_attach_sync_user_rbs is
unnecessary. What I found is that the code is quite dead.
Reading ptrace_attach_sync_user_rbs it is easy to see that the it does
nothing unless __state == TASK_STOPPED.
Calling arch_ptrace_attach (aka ptrace_attach_sync_user_rbs) after
ptrace_traceme it is easy to see that because we are talking about the
current process the value of __state is TASK_RUNNING. Which means
ptrace_attach_sync_user_rbs does nothing.
The only other call of arch_ptrace_attach (aka
ptrace_attach_sync_user_rbs) is after ptrace_attach.
If the task is running (and PTRACE_SEIZE is not specified), a SIGSTOP
is sent which results in do_signal_stop setting JOBCTL_TRAP_STOP on
the target task (as it is ptraced) and the target task stopping
in ptrace_stop with __state == TASK_TRACED.
If the task was already stopped then ptrace_attach sets
JOBCTL_TRAPPING and JOBCTL_TRAP_STOP, wakes it out of __TASK_STOPPED,
and waits until the JOBCTL_TRAPPING_BIT is clear. At which point
the task stops in ptrace_stop.
In both cases there are a couple of funning excpetions such as if the
traced task receiveds a SIGCONT, or is set a fatal signal.
However in all of those cases the tracee never stops in __state
TASK_STOPPED. Which is a long way of saying that ptrace_attach_sync_user_rbs
is guaranteed never to do anything.
Cc: linux-ia64@vger.kernel.org
Tested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lkml.kernel.org/r/20220505182645.497868-4-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
The function __group_send_sig_info is just a light wrapper around
send_signal_locked with one parameter fixed to a constant value. As
the wrapper adds no real value update the code to directly call the
wrapped function.
Tested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lkml.kernel.org/r/20220505182645.497868-2-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Rename send_signal and __send_signal to send_signal_locked and
__send_signal_locked to make send_signal usable outside of
signal.c.
Tested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lkml.kernel.org/r/20220505182645.497868-1-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Currently both expedited and regular grace period stall warnings use
a single timeout value that with units of seconds. However, recent
Android use cases problem require a sub-100-millisecond expedited RCU CPU
stall warning. Given that expedited RCU grace periods normally complete
in far less than a single millisecond, especially for small systems,
this is not unreasonable.
Therefore introduce the CONFIG_RCU_EXP_CPU_STALL_TIMEOUT kernel
configuration that defaults to 20 msec on Android and remains the same
as that of the non-expedited stall warnings otherwise. It also can be
changed in run-time via: /sys/.../parameters/rcu_exp_cpu_stall_timeout.
[ paulmck: Default of zero to use CONFIG_RCU_STALL_TIMEOUT. ]
Signed-off-by: Uladzislau Rezki <uladzislau.rezki@sony.com>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
We observed the error "cacheline tracking ENOMEM, dma-debug disabled"
during a light system load (copying some files). The reason for this error
is that the dma_active_cacheline radix tree uses GFP_NOWAIT allocation -
so it can't access the emergency memory reserves and it fails as soon as
anybody reaches the watermark.
This patch changes GFP_NOWAIT to GFP_ATOMIC, so that it can access the
emergency memory reserves.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
When dma_direct_alloc_pages encounters a highmem page it just gives up
currently. But what we really should do is to try memory using the
page allocator instead - without this platforms with a global highmem
CMA pool will fail all dma_alloc_pages allocations.
Fixes: efa70f2fdc84 ("dma-mapping: add a new dma_alloc_pages API")
Reported-by: Mark O'Neill <mao@tumblingdice.co.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Qian Cai <quic_qiancai@quicinc.com> wrote:
> Reverting the last 3 commits of the series fixed a boot crash.
>
> 1b2552cbdbe0 fork: Stop allowing kthreads to call execve
> 753550eb0ce1 fork: Explicitly set PF_KTHREAD
> 68d85f0a33b0 init: Deal with the init process being a user mode process
>
> BUG: KASAN: null-ptr-deref in task_nr_scan_windows.isra.0
> arch_atomic_long_read at ./include/linux/atomic/atomic-long.h:29
> (inlined by) atomic_long_read at ./include/linux/atomic/atomic-instrumented.h:1266
> (inlined by) get_mm_counter at ./include/linux/mm.h:1996
> (inlined by) get_mm_rss at ./include/linux/mm.h:2049
> (inlined by) task_nr_scan_windows at kernel/sched/fair.c:1123
> Read of size 8 at addr 00000000000003d0 by task swapper/0/1
With the change to init and the user mode helper processes to not have
PF_KTHREAD set before they call kernel_execve the PF_KTHREAD test in
task_tick_numa became insufficient to detect all tasks that have
"->mm == NULL". Correct that by testing for "->mm == NULL" directly.
Reported-by: Qian Cai <quic_qiancai@quicinc.com>
Tested-by: Qian Cai <quic_qiancai@quicinc.com>
Fixes: 1b2552cbdbe0 ("fork: Stop allowing kthreads to call execve")
Link: https://lkml.kernel.org/r/87r150ug1l.fsf_-_@email.froward.int.ebiederm.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>