39419 Commits

Author SHA1 Message Date
Colin Ian King
47b7eae62a relay: remove redundant assignment to pointer buf
Pointer buf is being assigned a value that is never read; buf is
re-assigned in the next statement.  The assignment is redundant and can
be removed.

Cleans up clang scan build warning:
kernel/relay.c:443:8: warning: Although the value stored to 'buf' is
used in the enclosing expression, the value is never actually read
from 'buf' [deadcode.DeadStores]
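
A minimal illustrative sketch of the dead-store pattern being removed
(names are only for illustration, not the exact relay.c lines):

  buf = *per_cpu_ptr(chan->buf, 0);   /* dead store: value is never read... */
  buf = relay_open_buf(chan, cpu);    /* ...because it is overwritten here */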

Link: https://lkml.kernel.org/r/20220508212152.58753-1-colin.i.king@gmail.com
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-12 20:38:37 -07:00
lizhe
a7bd57b87f kernel/crash_core.c: remove redundant check of ck_cmdline
At the end of get_last_crashkernel(), the check of ck_cmdline is
obviously unnecessary and redundant, so let's clean it up.

Link: https://lkml.kernel.org/r/20220506104116.259323-1-sensor1010@163.com
Signed-off-by: lizhe <sensor1010@163.com>
Acked-by: Baoquan He <bhe@redhat.com>
Acked-by: Philipp Rudo <prudo@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-12 20:38:36 -07:00
Jakub Kicinski
9b19e57a3c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Build issue in drivers/net/ethernet/sfc/ptp.c
  54fccfdd7c66 ("sfc: efx_default_channel_type APIs can be static")
  49e6123c65da ("net: sfc: fix memory leak due to ptp channel")
https://lore.kernel.org/all/20220510130556.52598fe2@canb.auug.org.au/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12 16:15:30 -07:00
Linus Torvalds
0ac824f379 Merge branch 'for-5.18-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fix from Tejun Heo:
 "Waiman's fix for a cgroup2 cpuset bug where it could miss nodes which
  were hot-added"

* 'for-5.18-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup/cpuset: Remove cpus_allowed/mems_allowed setup in cpuset_init_smp()
2022-05-12 10:42:56 -07:00
Masahiro Yamada
7390b94a3c module: merge check_exported_symbol() into find_exported_symbol_in_section()
Now check_exported_symbol() always succeeds.

Merge it into find_exported_symbol_in_section() to make the code concise.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2022-05-12 10:29:41 -07:00
Masahiro Yamada
cdd66eb52f module: do not binary-search in __ksymtab_gpl if fsa->gplok is false
Currently, (!fsa->gplok && syms->license == GPL_ONLY) is checked after
bsearch() succeeds.

It is meaningless to do the binary search in the GPL symbol table when
fsa->gplok is false because we know find_exported_symbol_in_section()
will fail anyway.

This check should be done before bsearch().
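
Roughly, the check is hoisted ahead of the binary search (a sketch, not
the exact module.c code):

  /* skip the GPL-only tables entirely when the requester is not GPL-ok */
  if (!fsa->gplok && syms->license == GPL_ONLY)
          return false;

  sym = bsearch(fsa->name, syms->start, syms->stop - syms->start,
                sizeof(struct kernel_symbol), cmp_name);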

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2022-05-12 10:29:41 -07:00
Masahiro Yamada
c6eee9df57 module: do not pass opaque pointer for symbol search
There is no need to use an opaque pointer for check_exported_symbol()
or find_exported_symbol_in_section().

Pass (struct find_symbol_arg *) explicitly.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2022-05-12 10:29:41 -07:00
Lecopzer Chen
8eac910a49 module: show disallowed symbol name for inherit_taint()
The error log for inherit_taint() doesn't really help to find the
symbol which violates GPL rules.

For example,
if a module has 300 symbols and includes 50 disallowed symbols,
the log only shows the line below and we have no idea which symbols they are.
    AAA: module using GPL-only symbols uses symbols from proprietary module BBB.

It's hard for a user who doesn't know how the symbols were parsed.

This patch adds the symbol name to tell the offending symbols explicitly.
    AAA: module using GPL-only symbols uses symbols SSS from proprietary module BBB.

Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2022-05-12 10:29:41 -07:00
Alexey Dobriyan
391e982bfa module: fix [e_shstrndx].sh_size=0 OOB access
It is trivial to craft a module to trigger OOB access in this line:

	if (info->secstrings[strhdr->sh_size - 1] != '\0') {

BUG: unable to handle page fault for address: ffffc90000aa0fff
PGD 100000067 P4D 100000067 PUD 100066067 PMD 10436f067 PTE 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 7 PID: 1215 Comm: insmod Not tainted 5.18.0-rc5-00007-g9bf578647087-dirty #10
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-4.fc34 04/01/2014
RIP: 0010:load_module+0x19b/0x2391
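
A sketch of the kind of validation that prevents the out-of-bounds read
(illustrative only; the exact error handling in the module loader may
differ):

  Elf_Shdr *strhdr = &info->sechdrs[info->hdr->e_shstrndx];

  /* reject a zero-sized section-name string table before indexing it */
  if (strhdr->sh_size == 0)
          return -ENOEXEC;
  if (info->secstrings[strhdr->sh_size - 1] != '\0')
          return -ENOEXEC;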

Fixes: ec2a29593c83 ("module: harden ELF info handling")
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
[rebased patch onto modules-next]
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2022-05-12 10:29:41 -07:00
Aaron Tomlin
99bd995655 module: Introduce module unload taint tracking
Currently, only the initial module that tainted the kernel is
recorded e.g. when an out-of-tree module is loaded.

The purpose of this patch is to allow the kernel to maintain a record of
each unloaded module that taints the kernel. So, in addition to
displaying a list of linked modules (see print_modules()) e.g. in the
event of a detected bad page, unloaded modules that carried a taint or
taints are displayed too. A tainted module unload count is maintained.

The number of tracked modules is not fixed. This feature is disabled by
default.

Signed-off-by: Aaron Tomlin <atomlin@redhat.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2022-05-12 10:29:41 -07:00
Aaron Tomlin
6fb0538d01 module: Move module_assert_mutex_or_preempt() to internal.h
No functional change.

This patch migrates module_assert_mutex_or_preempt() to internal.h.
So, the aforementioned function can be used outside of the core module
code yet will remain restricted to internal use only.

Signed-off-by: Aaron Tomlin <atomlin@redhat.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2022-05-12 10:29:41 -07:00
Aaron Tomlin
c14e522bc7 module: Make module_flags_taint() accept a module's taints bitmap and usable outside core code
No functional change.

The purpose of this patch is to modify module_flags_taint() to accept
a module's taints bitmap as a parameter and to modify all users
accordingly. Furthermore, it is now possible to access a given
module's taint flags data outside of the core module code, though it
remains for internal use only.

This is in preparation for module unload taint tracking support.

Signed-off-by: Aaron Tomlin <atomlin@redhat.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2022-05-12 10:29:41 -07:00
Peter Zijlstra
2760f5a415 stop_machine: Add stop_core_cpuslocked() for per-core operations
Hardware core level testing features require near simultaneous execution
of WRMSR instructions on all threads of a core to initiate a test.

Provide a customized cut down version of stop_machine_cpuslocked() that
just operates on the threads of a single core.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20220506225410.1652287-4-tony.luck@intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2022-05-12 15:35:29 +02:00
Yuntao Wang
a2aa95b71c bpf: Fix potential array overflow in bpf_trampoline_get_progs()
The cnt value in the 'cnt >= BPF_MAX_TRAMP_PROGS' check does not
include BPF_TRAMP_MODIFY_RETURN bpf programs, so the number of
the attached BPF_TRAMP_MODIFY_RETURN bpf programs in a trampoline
can exceed BPF_MAX_TRAMP_PROGS.

When this happens, the assignment '*progs++ = aux->prog' in
bpf_trampoline_get_progs() will cause progs array overflow as the
progs field in the bpf_tramp_progs struct can only hold at most
BPF_MAX_TRAMP_PROGS bpf programs.

Fixes: 88fd9e5352fe ("bpf: Refactor trampoline update code")
Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Link: https://lore.kernel.org/r/20220430130803.210624-1-ytcoode@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-05-11 21:24:20 -07:00
Feng Zhou
07343110b2 bpf: add bpf_map_lookup_percpu_elem for percpu map
Add a new eBPF helper, bpf_map_lookup_percpu_elem.

The implementation is straightforward: it follows the map_lookup_elem
implementation for percpu maps, but takes an additional cpu parameter
and returns the value for the specified CPU.
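
A minimal sketch of how a BPF program might use the new helper (map
layout and names are illustrative):

  struct {
          __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
          __uint(max_entries, 1);
          __type(key, __u32);
          __type(value, __u64);
  } counters SEC(".maps");

  __u32 key = 0, cpu = 1;
  __u64 *val = bpf_map_lookup_percpu_elem(&counters, &key, cpu);

  if (val)
          /* value observed on CPU 1, not on the current CPU */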

Signed-off-by: Feng Zhou <zhoufeng.zf@bytedance.com>
Link: https://lore.kernel.org/r/20220511093854.411-2-zhoufeng.zf@bytedance.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-05-11 18:16:54 -07:00
Delyan Kratunov
9c2136be08 sched/tracing: Append prev_state to tp args instead
Commit fa2c3254d7cf (sched/tracing: Don't re-read p->state when emitting
sched_switch event, 2022-01-20) added a new prev_state argument to the
sched_switch tracepoint, before the prev task_struct pointer.

This reordering of arguments broke BPF programs that use the raw
tracepoint (e.g. tp_btf programs). The type of the second argument has
changed and existing programs that assume a task_struct* argument
(e.g. for bpf_task_storage access) will now fail to verify.

If we instead append the new argument to the end, all existing programs
would continue to work and can conditionally extract the prev_state
argument on supported kernel versions.
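
A hypothetical tp_btf program after this change; prev stays the second
argument, so existing bpf_task_storage lookups keep verifying, and newer
programs can simply declare the appended prev_state parameter:

  SEC("tp_btf/sched_switch")
  int BPF_PROG(handle_switch, bool preempt, struct task_struct *prev,
               struct task_struct *next, unsigned int prev_state)
  {
          /* prev_state is only populated on kernels that pass it */
          return 0;
  }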

Fixes: fa2c3254d7cf (sched/tracing: Don't re-read p->state when emitting sched_switch event, 2022-01-20)
Signed-off-by: Delyan Kratunov <delyank@fb.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lkml.kernel.org/r/c8a6930dfdd58a4a5755fc01732675472979732b.camel@fb.com
2022-05-12 00:37:11 +02:00
Peter Zijlstra
31cae1eaae sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state
Currently ptrace_stop() / do_signal_stop() rely on the special states
TASK_TRACED and TASK_STOPPED resp. to keep unique state. That is, this
state exists only in task->__state and nowhere else.

There's two spots of bother with this:

 - PREEMPT_RT has task->saved_state which complicates matters,
   meaning task_is_{traced,stopped}() needs to check an additional
   variable.

 - An alternative freezer implementation that itself relies on a
   special TASK state would lose TASK_TRACED/TASK_STOPPED and will
   result in misbehaviour.

As such, add additional state to task->jobctl to track this state
outside of task->__state.

NOTE: this doesn't actually fix anything yet, just adds extra state.

--EWB
  * didn't add an unnecessary newline in signal.h
  * Update t->jobctl in signal_wake_up and ptrace_signal_wake_up
    instead of in signal_wake_up_state.  This prevents the clearing
    of TASK_STOPPED and TASK_TRACED from getting lost.
  * Added warnings if JOBCTL_STOPPED or JOBCTL_TRACED are not cleared

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220421150654.757693825@infradead.org
Tested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lkml.kernel.org/r/20220505182645.497868-12-ebiederm@xmission.com
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2022-05-11 14:37:06 -05:00
Eric W. Biederman
5b4197cb28 ptrace: Always take siglock in ptrace_resume
Make code analysis simpler and future changes easier by
always taking siglock in ptrace_resume.

Tested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lkml.kernel.org/r/20220505182645.497868-11-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-05-11 14:36:30 -05:00
Eric W. Biederman
2500ad1c7f ptrace: Don't change __state
Stop playing with tsk->__state to remove TASK_WAKEKILL while a ptrace
command is executing.

Instead remove TASK_WAKEKILL from the definition of TASK_TRACED, and
implement a new jobctl flag TASK_PTRACE_FROZEN.  This new flag is set
in jobctl_freeze_task and cleared when ptrace_stop is awoken or in
jobctl_unfreeze_task (when ptrace_stop remains asleep).

In signal_wake_up add __TASK_TRACED to state along with TASK_WAKEKILL
when the wake up is for a fatal signal.  Skip adding __TASK_TRACED
when TASK_PTRACE_FROZEN is not set.  This has the same effect as
changing TASK_TRACED to __TASK_TRACED as all of the wake_ups that use
TASK_KILLABLE go through signal_wake_up.

Handle a ptrace_stop being called with a pending fatal signal.
Previously it would have been handled by schedule simply failing to
sleep.  As TASK_WAKEKILL is no longer part of TASK_TRACED schedule
will sleep with a fatal_signal_pending.   The code in signal_wake_up
guarantees that the code will be woken by any fatal signal that
comes after TASK_TRACED is set.

Previously the __state value of __TASK_TRACED was changed to
TASK_RUNNING when woken up or back to TASK_TRACED when the code was
left in ptrace_stop.  Now when woken up, ptrace_stop clears
JOBCTL_PTRACE_FROZEN, and when left sleeping, ptrace_unfreeze_traced
clears JOBCTL_PTRACE_FROZEN.

Tested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lkml.kernel.org/r/20220505182645.497868-10-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-05-11 14:35:32 -05:00
Eric W. Biederman
57b6de08b5 ptrace: Admit ptrace_stop can generate spurious SIGTRAPs
Long ago and far away there was a BUG_ON at the start of ptrace_stop
that did "BUG_ON(!(current->ptrace & PT_PTRACED));" [1].  The BUG_ON
had never triggered but examination of the code showed that the BUG_ON
could actually trigger.  To complement removing the BUG_ON an attempt
to better handle the race was added.

The code detected the tracer had gone away and did not call
do_notify_parent_cldstop.  The code also attempted to prevent
ptrace_report_syscall from sending spurious SIGTRAPs when the tracer
went away.

The code to detect when the tracer had gone away before sending a
signal to tracer was a legitimate fix and continues to work to this
date.

The code to prevent sending spurious SIGTRAPs is a failure.  At the
time and until today the code only catches it when the tracer goes
away after siglock is dropped and before read_lock is acquired.  If
the tracer goes away after read_lock is dropped a spurious SIGTRAP can
still be sent to the tracee.  The tracer going away after read_lock
is dropped is the far likelier case as it is the bigger window.

Given that the attempt to prevent the generation of a SIGTRAP was a
failure and continues to be a failure remove the code that attempts to
do that.  This simplifies the code in ptrace_stop and makes
ptrace_stop much easier to reason about.

To successfully deal with the tracer going away, all of the tracer's
instrumentation of the child would need to be removed, and reliably
detecting when the tracer has set a signal to continue with would need
to be implemented.

[1] 66519f549ae5 ("[PATCH] fix ptracer death race yielding bogus BUG_ON")
History-Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Tested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lkml.kernel.org/r/20220505182645.497868-9-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-05-11 14:35:00 -05:00
Eric W. Biederman
7b0fe1367e ptrace: Document that wait_task_inactive can't fail
After ptrace_freeze_traced succeeds it is known that the tracee
has a __state value of __TASK_TRACED and that no __ptrace_unlink will
happen because the tracer is waiting for the tracee, and the tracee is
in ptrace_stop.

The function ptrace_freeze_traced can succeed at any point after
ptrace_stop has set TASK_TRACED and dropped siglock.  The read_lock on
tasklist_lock only excludes ptrace_attach.

This means that the !current->ptrace which executes under a read_lock
of tasklist_lock will never see a ptrace_freeze_traced as the tracer
must have gone away before the tasklist_lock was taken and
ptrace_attach can not occur until the read_lock is dropped.  As
ptrace_freeze_traced depends upon ptrace_attach running before it can
run that excludes ptrace_freeze_traced until __state is set to
TASK_RUNNING.  This means that task_is_traced will fail in
ptrace_freeze_traced and ptrace_freeze_traced will fail.

On the current->ptrace branch of ptrace_stop, which will be reached any
time after ptrace_freeze_traced has succeeded, it is known that __state
is __TASK_TRACED and schedule() will be called with that state.

Use a WARN_ON_ONCE to document that wait_task_inactive(TASK_TRACED)
should never fail.  Remove the stale comment about may_ptrace_stop.

Strictly speaking this is not true because if PREEMPT_RT is enabled
wait_task_inactive can fail because __state can be changed.  I don't
see this as a problem as the ptrace code is currently broken on
PREMPT_RT, and this is one of the issues.  Failing and warning when
the assumptions of the code are broken is good.

Tested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lkml.kernel.org/r/20220505182645.497868-8-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-05-11 14:34:47 -05:00
Eric W. Biederman
6a2d90ba02 ptrace: Reimplement PTRACE_KILL by always sending SIGKILL
The current implementation of PTRACE_KILL is buggy and has been for
many years as it assumes its target has stopped in ptrace_stop.  At a
quick skim it looks like this assumption has existed since ptrace
support was added in linux v1.0.

While PTRACE_KILL has been deprecated we can not remove it as
a quick search with google code search reveals many existing
programs calling it.

When the ptracee is not stopped at ptrace_stop some fields would be
set that are ignored except in ptrace_stop, making the userspace
visible behavior of PTRACE_KILL a noop in those cases.

As the usual rules are not obeyed it is not clear what the
consequences are of calling PTRACE_KILL on a running process.
Presumably userspace does not do this as it achieves nothing.

Replace the implementation of PTRACE_KILL with a simple
send_sig_info(SIGKILL) followed by a return 0.  This changes the
observable user space behavior only in that PTRACE_KILL on a process
not stopped in ptrace_stop will also kill it.  As that has always
been the intent of the code this seems like a reasonable change.
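
The replacement is roughly:

  case PTRACE_KILL:
          send_sig_info(SIGKILL, SEND_SIG_NOINFO, child);
          return 0;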

Cc: stable@vger.kernel.org
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lkml.kernel.org/r/20220505182645.497868-7-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-05-11 14:34:28 -05:00
Eric W. Biederman
cb3c19c93d signal: Use lockdep_assert_held instead of assert_spin_locked
The distinction is that assert_spin_locked() checks if the lock is
held *by anyone* whereas lockdep_assert_held() asserts the current
context holds the lock.  Also, the check goes away if you build
without lockdep.
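
For illustration:

  assert_spin_locked(&t->sighand->siglock);   /* held by anyone at all */
  lockdep_assert_held(&t->sighand->siglock);  /* held by this context; no-op without lockdep */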

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/Ympr/+PX4XgT/UKU@hirez.programming.kicks-ass.net
Tested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lkml.kernel.org/r/20220505182645.497868-6-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-05-11 14:34:14 -05:00
Eric W. Biederman
16cc1bc67d ptrace: Remove arch_ptrace_attach
The last remaining implementation of arch_ptrace_attach is ia64's
ptrace_attach_sync_user_rbs which was added at the end of 2007 in
commit aa91a2e90044 ("[IA64] Synchronize RBS on PTRACE_ATTACH").

Reading the comments and examining the code ptrace_attach_sync_user_rbs
has the sole purpose of saving registers to the stack when ptrace_attach
changes TASK_STOPPED to TASK_TRACED.  In all other cases arch_ptrace_stop
takes care of the register saving.

Commit d79fdd6d96f4 ("ptrace: Clean transitions between TASK_STOPPED and TRACED")
modified ptrace_attach to wake up the thread and enter ptrace_stop normally even
when the thread starts out stopped.

This makes ptrace_attach_sync_user_rbs completely unnecessary.  So just
remove it.

I read through the code to verify that ptrace_attach_sync_user_rbs is
unnecessary.  What I found is that the code is quite dead.

Reading ptrace_attach_sync_user_rbs it is easy to see that it does
nothing unless __state == TASK_STOPPED.

Calling arch_ptrace_attach (aka ptrace_attach_sync_user_rbs) after
ptrace_traceme it is easy to see that because we are talking about the
current process the value of __state is TASK_RUNNING.  Which means
ptrace_attach_sync_user_rbs does nothing.

The only other call of arch_ptrace_attach (aka
ptrace_attach_sync_user_rbs) is after ptrace_attach.

If the task is running (and PTRACE_SEIZE is not specified), a SIGSTOP
is sent which results in do_signal_stop setting JOBCTL_TRAP_STOP on
the target task (as it is ptraced) and the target task stopping
in ptrace_stop with __state == TASK_TRACED.

If the task was already stopped then ptrace_attach sets
JOBCTL_TRAPPING and JOBCTL_TRAP_STOP, wakes it out of __TASK_STOPPED,
and waits until the JOBCTL_TRAPPING_BIT is clear.  At which point
the task stops in ptrace_stop.

In both cases there are a couple of funny exceptions, such as when the
traced task receives a SIGCONT or is sent a fatal signal.

However in all of those cases the tracee never stops in __state
TASK_STOPPED.  Which is a long way of saying that ptrace_attach_sync_user_rbs
is guaranteed never to do anything.

Cc: linux-ia64@vger.kernel.org
Tested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lkml.kernel.org/r/20220505182645.497868-4-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-05-11 14:33:54 -05:00
Eric W. Biederman
e71ba12407 signal: Replace __group_send_sig_info with send_signal_locked
The function __group_send_sig_info is just a light wrapper around
send_signal_locked with one parameter fixed to a constant value.  As
the wrapper adds no real value update the code to directly call the
wrapped function.

Tested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lkml.kernel.org/r/20220505182645.497868-2-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-05-11 14:33:17 -05:00
Eric W. Biederman
157cc18122 signal: Rename send_signal send_signal_locked
Rename send_signal and __send_signal to send_signal_locked and
__send_signal_locked to make send_signal usable outside of
signal.c.

Tested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lkml.kernel.org/r/20220505182645.497868-1-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-05-11 14:32:57 -05:00
Paul E. McKenney
ce13389053 Merge branch 'exp.2022.05.11a' into HEAD
exp.2022.05.11a: Expedited-grace-period latency-reduction updates.
2022-05-11 11:49:35 -07:00
Kalesh Singh
9621fbee44 rcu: Move expedited grace period (GP) work to RT kthread_worker
Enabling CONFIG_RCU_BOOST did not reduce RCU expedited grace-period
latency because its workqueues run at SCHED_OTHER, and thus can be
delayed by normal processes.  This commit avoids these delays by moving
the expedited GP work items to a real-time-priority kthread_worker.

This option is controlled by CONFIG_RCU_EXP_KTHREAD and disabled by
default on PREEMPT_RT=y kernels which disable expedited grace periods
after boot by unconditionally setting rcupdate.rcu_normal_after_boot=1.

The results were evaluated on arm64 Android devices (6GB ram) running
5.10 kernel, and capturing trace data in critical user-level code.

The table below shows the resulting order-of-magnitude improvements
in synchronize_rcu_expedited() latency:

------------------------------------------------------------------------
|                          |   workqueues  |  kthread_worker |  Diff   |
------------------------------------------------------------------------
| Count                    |          725  |            688  |         |
------------------------------------------------------------------------
| Min Duration       (ns)  |          326  |            447  |  37.12% |
------------------------------------------------------------------------
| Q1                 (ns)  |       39,428  |         38,971  |  -1.16% |
------------------------------------------------------------------------
| Q2 - Median        (ns)  |       98,225  |         69,743  | -29.00% |
------------------------------------------------------------------------
| Q3                 (ns)  |      342,122  |        126,638  | -62.98% |
------------------------------------------------------------------------
| Max Duration       (ns)  |  372,766,967  |      2,329,671  | -99.38% |
------------------------------------------------------------------------
| Avg Duration       (ns)  |    2,746,353  |        151,242  | -94.49% |
------------------------------------------------------------------------
| Standard Deviation (ns)  |   19,327,765  |        294,408  |         |
------------------------------------------------------------------------

The table below shows the range of maximums/minimums for
synchronize_rcu_expedited() latency from all experiments:

------------------------------------------------------------------------
|                          |   workqueues  |  kthread_worker |  Diff   |
------------------------------------------------------------------------
| Total No. of Experiments |           25  |             23  |         |
------------------------------------------------------------------------
| Largest  Maximum   (ns)  |  372,766,967  |      2,329,671  | -99.38% |
------------------------------------------------------------------------
| Smallest Maximum   (ns)  |       38,819  |         86,954  | 124.00% |
------------------------------------------------------------------------
| Range of Maximums  (ns)  |  372,728,148  |      2,242,717  |         |
------------------------------------------------------------------------
| Largest  Minimum   (ns)  |       88,623  |         27,588  | -68.87% |
------------------------------------------------------------------------
| Smallest Minimum   (ns)  |          326  |            447  |  37.12% |
------------------------------------------------------------------------
| Range of Minimums  (ns)  |       88,297  |         27,141  |         |
------------------------------------------------------------------------

Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Reported-by: Tim Murray <timmurray@google.com>
Reported-by: Wei Wang <wvw@google.com>
Tested-by: Kyle Lin <kylelin@google.com>
Tested-by: Chunwei Lu <chunweilu@google.com>
Tested-by: Lulu Wang <luluw@google.com>
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-05-11 11:47:10 -07:00
Uladzislau Rezki
28b3ae4265 rcu: Introduce CONFIG_RCU_EXP_CPU_STALL_TIMEOUT
Currently both expedited and regular grace period stall warnings use
a single timeout value with units of seconds.  However, recent
Android use cases require a sub-100-millisecond expedited RCU CPU
stall warning.  Given that expedited RCU grace periods normally complete
in far less than a single millisecond, especially for small systems,
this is not unreasonable.

Therefore introduce the CONFIG_RCU_EXP_CPU_STALL_TIMEOUT kernel
configuration that defaults to 20 msec on Android and remains the same
as that of the non-expedited stall warnings otherwise.  It can also be
changed at run time via /sys/.../parameters/rcu_exp_cpu_stall_timeout.

[ paulmck: Default of zero to use CONFIG_RCU_STALL_TIMEOUT. ]

Signed-off-by: Uladzislau Rezki <uladzislau.rezki@sony.com>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-05-11 11:38:50 -07:00
Mikulas Patocka
84bc4f1dbb dma-debug: change allocation mode from GFP_NOWAIT to GFP_ATOMIC
We observed the error "cacheline tracking ENOMEM, dma-debug disabled"
during a light system load (copying some files). The reason for this error
is that the dma_active_cacheline radix tree uses GFP_NOWAIT allocation -
so it can't access the emergency memory reserves and it fails as soon as
anybody reaches the watermark.

This patch changes GFP_NOWAIT to GFP_ATOMIC, so that it can access the
emergency memory reserves.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-05-11 19:48:34 +02:00
Christoph Hellwig
92826e9675 dma-direct: don't fail on highmem CMA pages in dma_direct_alloc_pages
When dma_direct_alloc_pages encounters a highmem page it currently just
gives up.  But what it really should do is try to allocate the memory
using the page allocator instead - without this, platforms with a global
highmem CMA pool will fail all dma_alloc_pages allocations.

Fixes: efa70f2fdc84 ("dma-mapping: add a new dma_alloc_pages API")
Reported-by: Mark O'Neill <mao@tumblingdice.co.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-05-11 19:48:34 +02:00
Eric W. Biederman
b3f9916d81 sched: Update task_tick_numa to ignore tasks without an mm
Qian Cai <quic_qiancai@quicinc.com> wrote:
> Reverting the last 3 commits of the series fixed a boot crash.
>
> 1b2552cbdbe0 fork: Stop allowing kthreads to call execve
> 753550eb0ce1 fork: Explicitly set PF_KTHREAD
> 68d85f0a33b0 init: Deal with the init process being a user mode process
>
>  BUG: KASAN: null-ptr-deref in task_nr_scan_windows.isra.0
>  arch_atomic_long_read at ./include/linux/atomic/atomic-long.h:29
>  (inlined by) atomic_long_read at ./include/linux/atomic/atomic-instrumented.h:1266
>  (inlined by) get_mm_counter at ./include/linux/mm.h:1996
>  (inlined by) get_mm_rss at ./include/linux/mm.h:2049
>  (inlined by) task_nr_scan_windows at kernel/sched/fair.c:1123
>  Read of size 8 at addr 00000000000003d0 by task swapper/0/1

With the change to init and the user mode helper processes to not have
PF_KTHREAD set before they call kernel_execve the PF_KTHREAD test in
task_tick_numa became insufficient to detect all tasks that have
"->mm == NULL".  Correct that by testing for "->mm == NULL" directly.

Reported-by: Qian Cai <quic_qiancai@quicinc.com>
Tested-by: Qian Cai <quic_qiancai@quicinc.com>
Fixes: 1b2552cbdbe0 ("fork: Stop allowing kthreads to call execve")
Link: https://lkml.kernel.org/r/87r150ug1l.fsf_-_@email.froward.int.ebiederm.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-05-11 12:41:48 -05:00
Pierre Gondois
c9d8923bfb PM: EM: Decrement policy counter
In commit e458716a92b57 ("PM: EM: Mark inefficiencies in CPUFreq"),
cpufreq_cpu_get() is called without a cpufreq_cpu_put(), permanently
increasing the reference counts of the policy struct.

Decrement the reference count once the policy struct is not used
anymore.

Fixes: e458716a92b57 ("PM: EM: Mark inefficiencies in CPUFreq")
Tested-by: Cristian Marussi <cristian.marussi@arm.com>
Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
Reviewed-by: Vincent Donnefort <vincent.donnefort@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-05-11 19:15:14 +02:00
Hao Jia
734387ec2f sched/deadline: Remove superfluous rq clock update in push_dl_task()
The change to call update_rq_clock() before activate_task()
commit 840d719604b0 ("sched/deadline: Update rq_clock of later_rq
when pushing a task") is no longer needed since commit f4904815f97a
("sched/deadline: Fix double accounting of rq/running bw in push & pull")
removed the add_running_bw() before the activate_task().

So we remove some comments that are no longer needed and update
rq clock in activate_task().

Signed-off-by: Hao Jia <jiahao.os@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Link: https://lore.kernel.org/r/20220430085843.62939-3-jiahao.os@bytedance.com
2022-05-11 16:27:12 +02:00
Hao Jia
2679a83731 sched/core: Avoid obvious double update_rq_clock warning
When we use raw_spin_rq_lock() to acquire the rq lock and have to
update the rq clock while holding the lock, the kernel may issue
a WARN_DOUBLE_CLOCK warning.

Since we directly use raw_spin_rq_lock() to acquire rq lock instead of
rq_lock(), there is no corresponding change to rq->clock_update_flags.
In particular, when we have obtained the rq lock of another CPU, the
rq->clock_update_flags of this CPU may be RQCF_UPDATED at this time, and
then calling update_rq_clock() will trigger the WARN_DOUBLE_CLOCK warning.

So we need to clear RQCF_UPDATED of rq->clock_update_flags to avoid
the WARN_DOUBLE_CLOCK warning.

For the sched_rt_period_timer() and migrate_task_rq_dl() cases
we simply replace raw_spin_rq_lock()/raw_spin_rq_unlock() with
rq_lock()/rq_unlock().

For the {pull,push}_{rt,dl}_task() cases, we add the
double_rq_clock_clear_update() function to clear RQCF_UPDATED of
rq->clock_update_flags, and call double_rq_clock_clear_update()
before double_lock_balance()/double_rq_lock() returns to avoid the
WARN_DOUBLE_CLOCK warning.

Some call trace reports:
Call Trace 1:
 <IRQ>
 sched_rt_period_timer+0x10f/0x3a0
 ? enqueue_top_rt_rq+0x110/0x110
 __hrtimer_run_queues+0x1a9/0x490
 hrtimer_interrupt+0x10b/0x240
 __sysvec_apic_timer_interrupt+0x8a/0x250
 sysvec_apic_timer_interrupt+0x9a/0xd0
 </IRQ>
 <TASK>
 asm_sysvec_apic_timer_interrupt+0x12/0x20

Call Trace 2:
 <TASK>
 activate_task+0x8b/0x110
 push_rt_task.part.108+0x241/0x2c0
 push_rt_tasks+0x15/0x30
 finish_task_switch+0xaa/0x2e0
 ? __switch_to+0x134/0x420
 __schedule+0x343/0x8e0
 ? hrtimer_start_range_ns+0x101/0x340
 schedule+0x4e/0xb0
 do_nanosleep+0x8e/0x160
 hrtimer_nanosleep+0x89/0x120
 ? hrtimer_init_sleeper+0x90/0x90
 __x64_sys_nanosleep+0x96/0xd0
 do_syscall_64+0x34/0x90
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Call Trace 3:
 <TASK>
 deactivate_task+0x93/0xe0
 pull_rt_task+0x33e/0x400
 balance_rt+0x7e/0x90
 __schedule+0x62f/0x8e0
 do_task_dead+0x3f/0x50
 do_exit+0x7b8/0xbb0
 do_group_exit+0x2d/0x90
 get_signal+0x9df/0x9e0
 ? preempt_count_add+0x56/0xa0
 ? __remove_hrtimer+0x35/0x70
 arch_do_signal_or_restart+0x36/0x720
 ? nanosleep_copyout+0x39/0x50
 ? do_nanosleep+0x131/0x160
 ? audit_filter_inodes+0xf5/0x120
 exit_to_user_mode_prepare+0x10f/0x1e0
 syscall_exit_to_user_mode+0x17/0x30
 do_syscall_64+0x40/0x90
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Call Trace 4:
 update_rq_clock+0x128/0x1a0
 migrate_task_rq_dl+0xec/0x310
 set_task_cpu+0x84/0x1e4
 try_to_wake_up+0x1d8/0x5c0
 wake_up_process+0x1c/0x30
 hrtimer_wakeup+0x24/0x3c
 __hrtimer_run_queues+0x114/0x270
 hrtimer_interrupt+0xe8/0x244
 arch_timer_handler_phys+0x30/0x50
 handle_percpu_devid_irq+0x88/0x140
 generic_handle_domain_irq+0x40/0x60
 gic_handle_irq+0x48/0xe0
 call_on_irq_stack+0x2c/0x60
 do_interrupt_handler+0x80/0x84

Steps to reproduce:
1. Enable CONFIG_SCHED_DEBUG when compiling the kernel
2. echo 1 > /sys/kernel/debug/clear_warn_once
   echo "WARN_DOUBLE_CLOCK" > /sys/kernel/debug/sched/features
   echo "NO_RT_PUSH_IPI" > /sys/kernel/debug/sched/features
3. Run some rt/dl tasks that periodically work and sleep, e.g.
Create 2*n rt or dl (90% running) tasks via rt-app (on a system
with n CPUs), and Dietmar Eggemann reports Call Trace 4 when running
on PREEMPT_RT kernel.

Signed-off-by: Hao Jia <jiahao.os@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Link: https://lore.kernel.org/r/20220430085843.62939-2-jiahao.os@bytedance.com
2022-05-11 16:27:11 +02:00
Peter Zijlstra
47319846a9 Linux 5.18-rc5
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmJu9FYeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGAyEH/16xtJSpLmLwrQzG
 o+4ToQxSQ+/9UHyu0RTEvHg2THm9/8emtIuYyc/5FgdoWctcSa3AaDcveWmuWmkS
 KYcdhfJsaEqjNHS3OPYXN84fmo9Hel7263shu5+IYmP/sN0DfQp6UWTryX1q4B3Q
 4Pdutkuq63Uwd8nBZ5LXQBumaBrmkkuMgWEdT4+6FOo1mPzwdIGBxCuz1UsNNl5k
 chLWxkQfe2eqgWbYJrgCQfrVdORXVtoU2fGilZUNrHRVGkkldXkkz5clJfapyZD3
 odmZCEbrE4GPKgZwCmDERMfD1hzhZDtYKiHfOQ506szH5ykJjPBcOjHed7dA60eB
 J3+wdek=
 =39Ca
 -----END PGP SIGNATURE-----

Merge branch 'v5.18-rc5'

Obtain the new INTEL_FAM6 stuff required.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
2022-05-11 16:27:06 +02:00
Waiman Long
434e09e757 locking/qrwlock: Change "queue rwlock" to "queued rwlock"
Queued rwlock was originally named "queue rwlock" which wasn't quite
grammatically correct. However there are still some "queue rwlock"
references in the code. Change those to "queued rwlock" for consistency.

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220510192134.434753-1-longman@redhat.com
2022-05-11 16:27:04 +02:00
Kui-Feng Lee
2fcc82411e bpf, x86: Attach a cookie to fentry/fexit/fmod_ret/lsm.
Pass a cookie along with BPF_LINK_CREATE requests.

Add a bpf_cookie field to struct bpf_tracing_link to attach a cookie.
The cookie of a bpf_tracing_link is available by calling
bpf_get_attach_cookie when running the BPF program of the attached
link.

The value of a cookie will be set at bpf_tramp_run_ctx by the
trampoline of the link.
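
A hypothetical program-side sketch (the attach point and names are
illustrative):

  SEC("fentry/do_unlinkat")
  int BPF_PROG(on_unlinkat_entry)
  {
          /* returns the cookie supplied at BPF_LINK_CREATE time, 0 if none */
          __u64 cookie = bpf_get_attach_cookie(ctx);

          bpf_printk("cookie: %llu", cookie);
          return 0;
  }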

Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220510205923.3206889-4-kuifeng@fb.com
2022-05-10 21:58:31 -07:00
Kui-Feng Lee
e384c7b7b4 bpf, x86: Create bpf_tramp_run_ctx on the caller thread's stack
BPF trampolines will create a bpf_tramp_run_ctx, a bpf_run_ctx, on
stacks and set/reset the current bpf_run_ctx before/after calling a
bpf_prog.

Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220510205923.3206889-3-kuifeng@fb.com
2022-05-10 17:50:51 -07:00
Kui-Feng Lee
f7e0beaf39 bpf, x86: Generate trampolines from bpf_tramp_links
Replace struct bpf_tramp_progs with struct bpf_tramp_links to collect
struct bpf_tramp_link(s) for a trampoline.  struct bpf_tramp_link
extends bpf_link to act as a linked list node.

arch_prepare_bpf_trampoline() accepts a struct bpf_tramp_links to
collect all bpf_tramp_link(s) that a trampoline should call.

Change BPF trampoline and bpf_struct_ops to pass bpf_tramp_links
instead of bpf_tramp_progs.

Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220510205923.3206889-2-kuifeng@fb.com
2022-05-10 17:50:40 -07:00
Lukas Wunner
792ea6a074 genirq: Remove WARN_ON_ONCE() in generic_handle_domain_irq()
Since commit 0953fb263714 ("irq: remove handle_domain_{irq,nmi}()"),
generic_handle_domain_irq() warns if called outside hardirq context, even
though the function calls down to handle_irq_desc(), which warns about the
same, but conditionally on handle_enforce_irqctx().

The newly added warning is a false positive if the interrupt originates
from any other irqchip than x86 APIC or ARM GIC/GICv3.  Those are the only
ones for which handle_enforce_irqctx() returns true.  Per commit
c16816acd086 ("genirq: Add protection against unsafe usage of
generic_handle_irq()"):

 "In general calling generic_handle_irq() with interrupts disabled from non
  interrupt context is harmless. For some interrupt controllers like the
  x86 trainwrecks this is outright dangerous as it might corrupt state if
  an interrupt affinity change is pending."

Examples for interrupt chips where the warning is a false positive are
USB-attached GPIO controllers such as drivers/gpio/gpio-dln2.c:

  USB gadgets are incapable of directly signaling an interrupt because they
  cannot initiate a bus transaction by themselves.  All communication on
  the bus is initiated by the host controller, which polls a gadget's
  Interrupt Endpoint in regular intervals.  If an interrupt is pending,
  that information is passed up the stack in softirq context, from which a
  hardirq is synthesized via generic_handle_domain_irq().

Remove the warning to eliminate such false positives.

Fixes: 0953fb263714 ("irq: remove handle_domain_{irq,nmi}()")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Jakub Kicinski <kuba@kernel.org>
CC: Linus Walleij <linus.walleij@linaro.org>
Cc: Bartosz Golaszewski <brgl@bgdev.pl>
Cc: Octavian Purdila <octavian.purdila@nxp.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220505113207.487861b2@kernel.org
Link: https://lore.kernel.org/r/20220506203242.GA1855@wunner.de
Link: https://lore.kernel.org/r/c3caf60bfa78e5fdbdf483096b7174da65d1813a.1652168866.git.lukas@wunner.de
2022-05-11 02:22:52 +02:00
Jiri Olsa
0236fec57a bpf: Resolve symbols with ftrace_lookup_symbols for kprobe multi link
Use the kallsyms_lookup_names function to speed up symbol lookup in
kprobe multi link attachment, replacing the current
kprobe_multi_resolve_syms function.

This speeds up bpftrace kprobe attachment:

  # perf stat -r 5 -e cycles ./src/bpftrace -e 'kprobe:x* {  } i:ms:1 { exit(); }'
  ...
  6.5681 +- 0.0225 seconds time elapsed  ( +-  0.34% )

After:

  # perf stat -r 5 -e cycles ./src/bpftrace -e 'kprobe:x* {  } i:ms:1 { exit(); }'
  ...
  0.5661 +- 0.0275 seconds time elapsed  ( +-  4.85% )

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20220510122616.2652285-5-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-05-10 14:42:06 -07:00
Jiri Olsa
8be9253344 fprobe: Resolve symbols with ftrace_lookup_symbols
Use ftrace_lookup_symbols to speed up symbol lookup in the
register_fprobe_syms API.

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20220510122616.2652285-4-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-05-10 14:42:06 -07:00
Jiri Olsa
bed0d9a50d ftrace: Add ftrace_lookup_symbols function
Add an ftrace_lookup_symbols function that resolves an array of symbols
with a single pass over kallsyms.

The user provides an array of string pointers with a count and a pointer
to a pre-allocated array for the resolved values.

  int ftrace_lookup_symbols(const char **sorted_syms, size_t cnt,
                            unsigned long *addrs)

It iterates over all kallsyms symbols and tries to look up each one in
the provided symbols array with bsearch.  The symbols array needs to be
sorted by name for this reason.

We also check that each symbol passes ftrace_location, because this API
will be used for fprobe symbol resolution.
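
A hypothetical caller (the input array must be sorted by name):

  static const char *syms[] = { "do_exit", "schedule" };
  unsigned long addrs[2];

  if (ftrace_lookup_symbols(syms, 2, addrs))
          /* at least one symbol could not be resolved */;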

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20220510122616.2652285-3-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-05-10 14:42:06 -07:00
Jiri Olsa
d721def739 kallsyms: Make kallsyms_on_each_symbol generally available
Make kallsyms_on_each_symbol generally available, so it can be used
outside of the CONFIG_LIVEPATCH option in the following changes.

Rather than adding another ifdef option let's make the function
generally available (when CONFIG_KALLSYMS option is defined).

Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20220510122616.2652285-2-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-05-10 14:42:06 -07:00
Dmitrii Dolgov
9f88361273 bpf: Add bpf_link iterator
Implement bpf_link iterator to traverse links via bpf_seq_file
operations. The changeset is mostly shamelessly copied from
commit a228a64fc1e4 ("bpf: Add bpf_prog iterator")

Signed-off-by: Dmitrii Dolgov <9erthalion6@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220510155233.9815-2-9erthalion6@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-05-10 11:20:45 -07:00
Takshak Chahande
9263dddc7b bpf: Extend batch operations for map-in-map bpf-maps
This patch extends batch operations support for map-in-map map-types:
BPF_MAP_TYPE_HASH_OF_MAPS and BPF_MAP_TYPE_ARRAY_OF_MAPS

A use case where an outer HASH map holds hundreds of VIP entries, with
the reuse-ports associated with each VIP stored in a REUSEPORT_SOCKARRAY
type inner map, needs batch operations for performance gains.

This patch leverages the existing generic functions for most of the batch
operations. As a map-in-map's value contains the actual reference of the
inner map, the BPF_MAP_TYPE_HASH_OF_MAPS type needs an extra step to fetch
the map_id from the reference value.

selftests are added in next patch 2/2.

Signed-off-by: Takshak Chahande <ctakshak@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220510082221.2390540-1-ctakshak@fb.com
2022-05-10 10:34:57 -07:00
David Hildenbrand
40f2bbf711 mm/rmap: drop "compound" parameter from page_add_new_anon_rmap()
New anonymous pages are always mapped natively: only THP/khugepaged code
maps a new compound anonymous page and passes "true".  Otherwise, we're
just dealing with simple, non-compound pages.

Let's give the interface clearer semantics and document these.  Remove the
PageTransCompound() sanity check from page_add_new_anon_rmap().

Link: https://lkml.kernel.org/r/20220428083441.37290-9-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Don Dutile <ddutile@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Khalid Aziz <khalid.aziz@oracle.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Liang Zhang <zhangliang5@huawei.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Oded Gabbay <oded.gabbay@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pedro Demarchi Gomes <pedrodemargomes@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-09 18:20:43 -07:00
Yuntao Wang
43bf087848 bpf: Remove unused parameter from find_kfunc_desc_btf()
The func_id parameter in find_kfunc_desc_btf() is not used, get rid of it.

Fixes: 2357672c54c3 ("bpf: Introduce BPF support for kernel module function calls")
Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/bpf/20220505070114.3522522-1-ytcoode@gmail.com
2022-05-09 17:45:21 -07:00
YueHaibing
494dcdf46e sched: Fix build warning without CONFIG_SYSCTL
If CONFIG_SYSCTL is n, the build warns:

kernel/sched/core.c:1782:12: warning: ‘sysctl_sched_uclamp_handler’ defined but not used [-Wunused-function]
 static int sysctl_sched_uclamp_handler(struct ctl_table *table, int write,
            ^~~~~~~~~~~~~~~~~~~~~~~~~~~

sysctl_sched_uclamp_handler() is only used when CONFIG_SYSCTL is enabled,
so wrap all related code with CONFIG_SYSCTL to fix this.
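
The fix is roughly of this shape:

  #ifdef CONFIG_SYSCTL
  static int sysctl_sched_uclamp_handler(struct ctl_table *table, int write,
                                         void *buffer, size_t *lenp, loff_t *ppos)
  {
          ...
  }
  #endif /* CONFIG_SYSCTL */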

Fixes: 3267e0156c33 ("sched: Move uclamp_util sysctls to core.c")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2022-05-09 16:54:57 -07:00