37242 Commits

Author SHA1 Message Date
Andrii Nakryiko
b89fbfbb85 bpf: Implement minimal BPF perf link
Introduce a new type of BPF link - BPF perf link. This brings perf_event-based
BPF program attachments (perf_event, tracepoints, kprobes, and uprobes) into
the common BPF link infrastructure, allowing to list all active perf_event
based attachments, auto-detaching BPF program from perf_event when link's FD
is closed, get generic BPF link fdinfo/get_info functionality.

BPF_LINK_CREATE command expects perf_event's FD as target_fd. No extra flags
are currently supported.

Force-detaching and atomic BPF program updates are not yet implemented, but
with perf_event-based BPF links we now have common framework for this without
the need to extend ioctl()-based perf_event interface.

One interesting consideration is a new value for bpf_attach_type, which
BPF_LINK_CREATE command expects. Generally, it's either 1-to-1 mapping from
bpf_attach_type to bpf_prog_type, or many-to-1 mapping from a subset of
bpf_attach_types to one bpf_prog_type (e.g., see BPF_PROG_TYPE_SK_SKB or
BPF_PROG_TYPE_CGROUP_SOCK). In this case, though, we have three different
program types (KPROBE, TRACEPOINT, PERF_EVENT) using the same perf_event-based
mechanism, so it's many bpf_prog_types to one bpf_attach_type. I chose to
define a single BPF_PERF_EVENT attach type for all of them and adjust
link_create()'s logic for checking correspondence between attach type and
program type.

The alternative would be to define three new attach types (e.g., BPF_KPROBE,
BPF_TRACEPOINT, and BPF_PERF_EVENT), but that seemed like unnecessary overkill
and BPF_KPROBE will cause naming conflicts with BPF_KPROBE() macro, defined by
libbpf. I chose to not do this to avoid unnecessary proliferation of
bpf_attach_type enum values and not have to deal with naming conflicts.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/bpf/20210815070609.987780-5-andrii@kernel.org
2021-08-17 00:45:07 +02:00
Andrii Nakryiko
652c1b17b8 bpf: Refactor perf_event_set_bpf_prog() to use struct bpf_prog input
Make internal perf_event_set_bpf_prog() use struct bpf_prog pointer as an
input argument, which makes it easier to re-use for other internal uses
(coming up for BPF link in the next patch). BPF program FD is not as
convenient and in some cases it's not available. So switch to struct bpf_prog,
move out refcounting outside and let caller do bpf_prog_put() in case of an
error. This follows the approach of most of the other BPF internal functions.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210815070609.987780-4-andrii@kernel.org
2021-08-17 00:45:07 +02:00
Andrii Nakryiko
7d08c2c911 bpf: Refactor BPF_PROG_RUN_ARRAY family of macros into functions
Similar to BPF_PROG_RUN, turn BPF_PROG_RUN_ARRAY macros into proper functions
with all the same readability and maintainability benefits. Making them into
functions required shuffling around bpf_set_run_ctx/bpf_reset_run_ctx
functions. Also, explicitly specifying the type of the BPF prog run callback
required adjusting __bpf_prog_run_save_cb() to accept const void *, casted
internally to const struct sk_buff.

Further, split out a cgroup-specific BPF_PROG_RUN_ARRAY_CG and
BPF_PROG_RUN_ARRAY_CG_FLAGS from the more generic BPF_PROG_RUN_ARRAY due to
the differences in bpf_run_ctx used for those two different use cases.

I think BPF_PROG_RUN_ARRAY_CG would benefit from further refactoring to accept
struct cgroup and enum bpf_attach_type instead of bpf_prog_array, fetching
cgrp->bpf.effective[type] and RCU-dereferencing it internally. But that
required including include/linux/cgroup-defs.h, which I wasn't sure is ok with
everyone.

The remaining generic BPF_PROG_RUN_ARRAY function will be extended to
pass-through user-provided context value in the next patch.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210815070609.987780-3-andrii@kernel.org
2021-08-17 00:45:07 +02:00
Andrii Nakryiko
fb7dd8bca0 bpf: Refactor BPF_PROG_RUN into a function
Turn BPF_PROG_RUN into a proper always inlined function. No functional and
performance changes are intended, but it makes it much easier to understand
what's going on with how BPF programs are actually get executed. It's more
obvious what types and callbacks are expected. Also extra () around input
parameters can be dropped, as well as `__` variable prefixes intended to avoid
naming collisions, which makes the code simpler to read and write.

This refactoring also highlighted one extra issue. BPF_PROG_RUN is both
a macro and an enum value (BPF_PROG_RUN == BPF_PROG_TEST_RUN). Turning
BPF_PROG_RUN into a function causes naming conflict compilation error. So
rename BPF_PROG_RUN into lower-case bpf_prog_run(), similar to
bpf_prog_run_xdp(), bpf_prog_run_pin_on_cpu(), etc. All existing callers of
BPF_PROG_RUN, the macro, are switched to bpf_prog_run() explicitly.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210815070609.987780-2-andrii@kernel.org
2021-08-17 00:45:07 +02:00
Valentin Schneider
15538a2057 notifier: Remove atomic_notifier_call_chain_robust()
This now has no more users, remove it.

Suggested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-08-16 18:55:32 +02:00
Valentin Schneider
b2f6662ac0 PM: cpu: Make notifier chain use a raw_spinlock_t
Invoking atomic_notifier_chain_notify() requires acquiring a spinlock_t,
which can block under CONFIG_PREEMPT_RT. Notifications for members of the
cpu_pm notification chain will be issued by the idle task, which can never
block.

Making *all* atomic_notifiers use a raw_spinlock is too big of a hammer, as
only notifications issued by the idle task are problematic.

Special-case cpu_pm_notifier_chain by kludging a raw_notifier and
raw_spinlock_t together, matching the atomic_notifier behavior with a
raw_spinlock_t.

Fixes: 70d932985757 ("notifier: Fix broken error handling pattern")
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-08-16 18:55:32 +02:00
Randy Dunlap
dbcfa7156f PM: sleep: unmark 'state' functions as kernel-doc
Fix kernel-doc warnings in kernel/power/main.c by unmarking the
comment block as kernel-doc notation. This eliminates the following
kernel-doc warnings:

kernel/power/main.c:593: warning: expecting prototype for state(). Prototype was for state_show() instead
kernel/power/main.c:593: warning: Function parameter or member 'kobj' not described in 'state_show'
kernel/power/main.c:593: warning: Function parameter or member 'attr' not described in 'state_show'
kernel/power/main.c:593: warning: Function parameter or member 'buf' not described in 'state_show'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-08-16 18:49:39 +02:00
Linus Torvalds
b88bcc7d54 Fixes and clean ups to tracing:
- Fix header alignment when PREEMPT_RT is enabled for osnoise tracer
 
 - Inject "stop" event to see where osnoise stopped the trace
 
 - Define DYNAMIC_FTRACE_WITH_ARGS as some code had an #ifdef for it
 
 - Fix erroneous message for bootconfig cmdline parameter
 
 - Fix crash caused by not found variable in histograms
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYRVkURQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qmfbAPkBTlhuanWlsoOXnQA+SYHih9Y4NHsU
 QgkkRfVfqnY+XwD/aQ1Ze0O1xufZAA8rq0qOLIEssgZe4xXjjkDAf7ABaw4=
 =HIBc
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.14-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
 "Fixes and clean ups to tracing:

   - Fix header alignment when PREEMPT_RT is enabled for osnoise tracer

   - Inject "stop" event to see where osnoise stopped the trace

   - Define DYNAMIC_FTRACE_WITH_ARGS as some code had an #ifdef for it

   - Fix erroneous message for bootconfig cmdline parameter

   - Fix crash caused by not found variable in histograms"

* tag 'trace-v5.14-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing / histogram: Fix NULL pointer dereference on strcmp() on NULL event name
  init: Suppress wrong warning for bootconfig cmdline parameter
  tracing: define needed config DYNAMIC_FTRACE_WITH_ARGS
  trace/osnoise: Print a stop tracing message
  trace/timerlat: Add a header with PREEMPT_RT additional fields
  trace/osnoise: Add a header with PREEMPT_RT additional fields
2021-08-16 06:31:06 -10:00
zhaoxiao
bd74095389 tracepoint: Fix kerneldoc comments
Fix function name in tracepoint.c kernel-doc comment
to remove a warning found by clang_w1.

kernel/tracepoint.c:589: warning: expecting prototype for register_tracepoint_notifier(). Prototype was for register_tracepoint_module_notifier() instead
kernel/tracepoint.c:613: warning: expecting prototype for unregister_tracepoint_notifier(). Prototype was for unregister_tracepoint_module_notifier() instead

Link: https://lkml.kernel.org/r/20210816052430.16539-1-zhaoxiao@uniontech.com

Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: zhaoxiao <zhaoxiao@uniontech.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-08-16 11:39:51 -04:00
Masami Hiramatsu
64dc7f6958 tracing/boot: Show correct histogram error command
Since trigger_process_regex() modifies given trigger actions
while parsing, the error message couldn't show what command
was passed to the trigger_process_regex() when it returns
an error.

To fix that, show the backed up trigger action command
instead of parsed buffer.

Link: https://lkml.kernel.org/r/162856126413.203126.9465564928450701424.stgit@devnote2

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-08-16 11:37:21 -04:00
Masami Hiramatsu
17abd7c36c tracing/boot: Support multiple histograms for each event
Add multiple histograms support for each event. This allows
user to set multiple histograms to an event.

ftrace.[instance.INSTANCE.]event.GROUP.EVENT.hist[.N] {
...
}

The 'N' is a digit started string and it can be omitted
for the default histogram.

For example, multiple hist triggers example in the
Documentation/trace/histogram.rst can be written as below;

ftrace.event.net.netif_receive_skb.hist {
	1 {
		keys = skbaddr.hex
		values = len
		filter = len < 0
	}
	2 {
		keys = skbaddr.hex
		values = len
		filter = len > 4096
	}
	3 {
		keys = skbaddr.hex
		values = len
		filter = len == 256
	}
	4 {
		keys = skbaddr.hex
		values = len
	}
	5 {
		keys = len
		values = common_preempt_count
	}
}

Link: https://lkml.kernel.org/r/162856125628.203126.15846930277378572120.stgit@devnote2

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-08-16 11:37:21 -04:00
Masami Hiramatsu
8993665abc tracing/boot: Support multiple handlers for per-event histogram
Support multiple handlers for per-event histogram in boot-time tracing.
Since the histogram can register multiple same handler-actions with
different parameters, this expands the syntax to support such cases.

With this update, the 'onmax', 'onchange' and 'onmatch' handler subkeys
under per-event histogram option will take a number subkeys optionally
as below. (see [.N])

ftrace.[instance.INSTANCE.]event.GROUP.EVENT.hist {
     onmax|onchange[.N] { var = <VAR>; <ACTION> [= <PARAM>] }
     onmatch[.N] { event = <EVENT>; <ACTION> [= <PARAM>] }
}

The 'N' must be a digit (or digit started word).

Thus user can add several handler-actions to the histogram,
for example,

ftrace.event.SOMEGROUP.SOMEEVENT.hist {
   keys = SOME_ID; lat = common_timestamp.usecs-$ts0
   onmatch.1 {
	event = GROUP1.STARTEVENT1
	trace = latency_event, SOME_ID, $lat
   }
   onmatch.2 {
	event = GROUP2.STARTEVENT2
	trace = latency_event, SOME_ID, $lat
   }
}

Then, it can trace the elapsed time from GROUP1.STARTEVENT1 to
SOMEGROUP.SOMEEVENT, and from GROUP2.STARTEVENT2 to
SOMEGROUP.SOMEEVENT with SOME_ID key.

Link: https://lkml.kernel.org/r/162856124905.203126.14913731908137885922.stgit@devnote2

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-08-16 11:37:21 -04:00
Masami Hiramatsu
e66ed86ca6 tracing/boot: Add per-event histogram action options
Add a hist-trigger action syntax support to boot-time tracing.
Currently, boot-time tracing supports per-event actions as option
strings. However, for the histogram action, it has a special syntax
and usually needs a long action definition.
To make it readable and fit to the bootconfig syntax, this introduces
a new options for histogram.

Here are the histogram action options for boot-time tracing.

ftrace.[instance.INSTANCE.]event.GROUP.EVENT.hist {
     keys = <KEY>[,...]
     values = <VAL>[,...]
     sort = <SORT-KEY>[,...]
     size = <ENTRIES>
     name = <HISTNAME>
     var { <VAR> = <EXPR> ... }
     pause|continue|clear
     onmax|onchange { var = <VAR>; <ACTION> [= <PARAM>] }
     onmatch { event = <EVENT>; <ACTION> [= <PARAM>] }
     filter = <FILTER>
}

Where <ACTION> is one of below;

     trace = <EVENT>, <ARG1>[, ...]
     save = <ARG1>[, ...]
     snapshot

Link: https://lkml.kernel.org/r/162856124106.203126.10501871028479029087.stgit@devnote2

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-08-16 11:37:21 -04:00
Masahiro Yamada
de32951b29 tracing: Simplify the Kconfig dependency of FTRACE
The entire FTRACE block is surrounded by 'if TRACING_SUPPORT' ...
'endif'.

Using 'depends on' is a simpler way to guard FTRACE.

Link: https://lkml.kernel.org/r/20210731052233.4703-1-masahiroy@kernel.org

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-08-16 11:37:20 -04:00
Steven Rostedt (VMware)
ed2cf90735 tracing: Allow execnames to be passed as args for synthetic events
Allow common_pid.execname to be saved in a variable in one histogram to be
passed to another histogram that can pass it as a parameter to a synthetic
event.

 ># echo 'hist:keys=pid:__arg__1=common_timestamp.usecs:arg2=common_pid.execname' \
       > events/sched/sched_waking/trigger
 ># echo 'wakeup_lat s32 pid; u64 delta; char wake_comm[]' > synthetic_events
 ># echo 'hist:keys=next_pid:pid=next_pid,delta=common_timestamp.usecs-$__arg__1,exec=$arg2'\
':onmatch(sched.sched_waking).trace(wakeup_lat,$pid,$delta,$exec)' \
 > events/sched/sched_switch/trigger

The above is a wake up latency synthetic event setup that passes the execname
of the common_pid that woke the task to the scheduling of that task, which
triggers a synthetic event that passes the original execname as a
parameter to display it.

 ># echo 1 > events/synthetic/enable
 ># cat trace
    <idle>-0       [006] d..4   186.863801: wakeup_lat: pid=1306 delta=65 wake_comm=kworker/u16:3
    <idle>-0       [000] d..4   186.863858: wakeup_lat: pid=163 delta=27 wake_comm=<idle>
    <idle>-0       [001] d..4   186.863903: wakeup_lat: pid=1307 delta=36 wake_comm=kworker/u16:4
    <idle>-0       [000] d..4   186.863927: wakeup_lat: pid=163 delta=5 wake_comm=<idle>
    <idle>-0       [006] d..4   186.863957: wakeup_lat: pid=1306 delta=24 wake_comm=kworker/u16:3
      sshd-1306    [006] d..4   186.864051: wakeup_lat: pid=61 delta=62 wake_comm=<idle>
    <idle>-0       [000] d..4   186.965030: wakeup_lat: pid=609 delta=18 wake_comm=<idle>
    <idle>-0       [006] d..4   186.987582: wakeup_lat: pid=1306 delta=65 wake_comm=kworker/u16:3
    <idle>-0       [000] d..4   186.987639: wakeup_lat: pid=163 delta=27 wake_comm=<idle>

Link: https://lkml.kernel.org/r/20210722142837.458596338@goodmis.org

Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-08-16 11:37:20 -04:00
Steven Rostedt (VMware)
3347d80baa tracing: Have histogram types be constant when possible
Instead of kstrdup("const", GFP_KERNEL), have the hist_field type simply
assign the constant hist_field->type = "const"; And when the value passed
to it is a variable, use "kstrdup_const(var, GFP_KERNEL);" which will just
copy the value if the variable is already a constant. This saves on having
to allocate when not needed.

All frees of the hist_field->type will need to use kfree_const().

Link: https://lkml.kernel.org/r/20210722142837.280718447@goodmis.org

Suggested-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-08-16 11:37:20 -04:00
Steven Rostedt (VMware)
3703643519 tracing/histogram: Update the documentation for the buckets modifier
Update both the tracefs README file as well as the histogram.rst to
include an explanation of what the buckets modifier is and how to use it.
Include an example with the wakeup_latency example for both log2 and the
buckets modifiers as there was no existing log2 example.

Link: https://lkml.kernel.org/r/20210707213922.167218794@goodmis.org

Acked-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-08-16 11:37:20 -04:00
Steven Rostedt (VMware)
de9a48a360 tracing: Add linear buckets to histogram logic
There's been several times I wished the histogram logic had a "grouping"
feature for the buckets. Currently, each bucket has a size of one. That
is, if you trace the amount of requested allocations, each allocation is
its own bucket, even if you are interested in what allocates 100 bytes or
less, 100 to 200, 200 to 300, etc.

Also, without grouping, it fills up the allocated histogram buckets
quickly. If you are tracking latency, and don't care if something is 200
microseconds off, or 201 microseconds off, but want to track them by say
10 microseconds each. This can not currently be done.

There is a log2 but that grouping get's too big too fast for a lot of
cases.

Introduce a "buckets=SIZE" command to each field where it will record in a
rounded number. For example:

 ># echo 'hist:keys=bytes_req.buckets=100:sort=bytes_req' > events/kmem/kmalloc/trigger
 ># cat events/kmem/kmalloc/hist
 # event histogram
 #
 # trigger info:
 hist:keys=bytes_req.buckets=100:vals=hitcount:sort=bytes_req.buckets=100:size=2048
 [active]
 #

 { bytes_req: ~ 0-99 } hitcount:       3149
 { bytes_req: ~ 100-199 } hitcount:       1468
 { bytes_req: ~ 200-299 } hitcount:         39
 { bytes_req: ~ 300-399 } hitcount:        306
 { bytes_req: ~ 400-499 } hitcount:        364
 { bytes_req: ~ 500-599 } hitcount:         32
 { bytes_req: ~ 600-699 } hitcount:         69
 { bytes_req: ~ 700-799 } hitcount:         37
 { bytes_req: ~ 1200-1299 } hitcount:         16
 { bytes_req: ~ 1400-1499 } hitcount:         30
 { bytes_req: ~ 2000-2099 } hitcount:          6
 { bytes_req: ~ 4000-4099 } hitcount:       2168
 { bytes_req: ~ 5000-5099 } hitcount:          6

 Totals:
     Hits: 7690
     Entries: 13
     Dropped: 0

Link: https://lkml.kernel.org/r/20210707213921.980359719@goodmis.org

Acked-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Tested-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-08-16 11:37:20 -04:00
Masami Hiramatsu
6fe7c745f2 tracing/boot: Fix a hist trigger dependency for boot time tracing
Fixes a build error when CONFIG_HIST_TRIGGERS=n with boot-time
tracing. Since the trigger_process_regex() is defined only
when CONFIG_HIST_TRIGGERS=y, if it is disabled, the 'actions'
event option also must be disabled.

Link: https://lkml.kernel.org/r/162856123376.203126.582144262622247352.stgit@devnote2

Fixes: 81a59555ff15 ("tracing/boot: Add per-event settings")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-08-16 11:37:20 -04:00
Pingfan Liu
6c34df6f35 tracing: Apply trace filters on all output channels
The event filters are not applied on all of the output, which results in
the flood of printk when using tp_printk. Unfolding
event_trigger_unlock_commit_regs() into trace_event_buffer_commit(), so
the filters can be applied on every output.

Link: https://lkml.kernel.org/r/20210814034538.8428-1-kernelfans@gmail.com

Cc: stable@vger.kernel.org
Fixes: 0daa2302968c1 ("tracing: Add tp_printk cmdline to have tracepoints go to printk()")
Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-08-16 11:01:52 -04:00
Sagi Grimberg
2a14c9ae15 params: lift param_set_uint_minmax to common code
It is a useful helper hence move it to common code so others can enjoy
it.

Suggested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-08-16 14:42:22 +02:00
Linus Torvalds
c4f14eac22 A set of fixes for PCI/MSI and x86 interrupt startup:
- Mask all MSI-X entries when enabling MSI-X otherwise stale unmasked
    entries stay around e.g. when a crashkernel is booted.
 
  - Enforce masking of a MSI-X table entry when updating it, which mandatory
    according to speification
 
  - Ensure that writes to MSI[-X} tables are flushed.
 
  - Prevent invalid bits being set in the MSI mask register
 
  - Properly serialize modifications to the mask cache and the mask register
    for multi-MSI.
 
  - Cure the violation of the affinity setting rules on X86 during interrupt
    startup which can cause lost and stale interrupts. Move the initial
    affinity setting ahead of actualy enabling the interrupt.
 
  - Ensure that MSI interrupts are completely torn down before freeing them
    in the error handling case.
 
  - Prevent an array out of bounds access in the irq timings code.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmEY5bcTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoaMvD/9KeK2430f4h/x/fQhHmHHIJOv3kqmB
 gRXX4RV+N/DfU9GSflbzPxY9l2SkydgpQjHeGnqpV7DYRIu84nVYAuWcWtPimHHy
 JxapniLlQv2GS+SIy9f1mmChH6VUPS05brHxKSqAQZvQIoZqza8vF3umZlV7eYF4
 uZFd86TCbDFsBxbsKmyV1FtQLo008EeEp8dtZ/1cZ9Fbp0M/mQkuu7aTNqY0qWwZ
 rAoGyE4PjDR+yf87XjE5z7hMs2vfUjiGXg7Kbp30NPKGcRyasb+SlHVKcvZKJIji
 Y0Bk/SOyqoj1Co3U+cEaWolB1MeGff4nP+Xx8xvyNklKxxs1+92Z7L1RElXIc0cL
 kmUehUSf5JuJ83B6ucAYbmnXKNw1XB00PaMy7iSxsYekTXJx+t0b+Rt6o0R3inWB
 xUWbIVmoL2uF1oOAb6mEc3wDNMBVkY33e9l2jD0PUPxKXZ730MVeojWJ8FGFiPOT
 9+aCRLjZHV5slVQAgLnlpcrseJLuUei6HLVwRXxv19Bz5L+HuAXUxWL9h74SRuE9
 14kH63aXSVDlcYyW7c3t8Lh6QjKAf7AIz0iG+u3n09IWyURd4agHuKOl5itileZB
 BK9NuRrNgmr2nEKG461Suc6GojLBXc1ih3ak+MG+O4iaLxnhapTjW3Weqr+OVXr+
 SrIjoxjpEk2ECA==
 =yf3u
 -----END PGP SIGNATURE-----

Merge tag 'irq-urgent-2021-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fixes from Thomas Gleixner:
 "A set of fixes for PCI/MSI and x86 interrupt startup:

   - Mask all MSI-X entries when enabling MSI-X otherwise stale unmasked
     entries stay around e.g. when a crashkernel is booted.

   - Enforce masking of a MSI-X table entry when updating it, which
     mandatory according to speification

   - Ensure that writes to MSI[-X} tables are flushed.

   - Prevent invalid bits being set in the MSI mask register

   - Properly serialize modifications to the mask cache and the mask
     register for multi-MSI.

   - Cure the violation of the affinity setting rules on X86 during
     interrupt startup which can cause lost and stale interrupts. Move
     the initial affinity setting ahead of actualy enabling the
     interrupt.

   - Ensure that MSI interrupts are completely torn down before freeing
     them in the error handling case.

   - Prevent an array out of bounds access in the irq timings code"

* tag 'irq-urgent-2021-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  driver core: Add missing kernel doc for device::msi_lock
  genirq/msi: Ensure deactivation on teardown
  genirq/timings: Prevent potential array overflow in __irq_timings_store()
  x86/msi: Force affinity setup before startup
  x86/ioapic: Force affinity setup before startup
  genirq: Provide IRQCHIP_AFFINITY_PRE_STARTUP
  PCI/MSI: Protect msi_desc::masked for multi-MSI
  PCI/MSI: Use msi_mask_irq() in pci_msi_shutdown()
  PCI/MSI: Correct misleading comments
  PCI/MSI: Do not set invalid bits in MSI mask
  PCI/MSI: Enforce MSI[X] entry updates to be visible
  PCI/MSI: Enforce that MSI-X table entry is masked for update
  PCI/MSI: Mask all unused MSI-X entries
  PCI/MSI: Enable and mask MSI-X early
2021-08-15 06:49:40 -10:00
Linus Torvalds
839da25385 - Fix a CONFIG symbol's spelling
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmEYzdcACgkQEsHwGGHe
 VUoeOhAAltobYZmv2XJdqVOcvCBSwgd4M5ILD7ZFoztw6ZyhvAr8Jzm9P4owW8a8
 r83kEkTIEGnjFK0X0RhakIZcnLJK9wQD74DxHlrcBMCUsnYHtLRvKDJS5niNN2Pz
 7d82NO7mVxPkRoTsG2gzn6xD0dkNGSJU6BhJjo8LdvtpOqYDt3jLPk6kGDxkjO9/
 8QaknaDHz1dYCoRt6YynH9lIH7Vzffjnt3MYPpm0pEtQIVSTK4FwaXkLctwHgXlk
 ZOEMI4qyu9z1Zid3V6pRoKQBCpI0d5bkiqXBaGw7aW2vdnE4LU0UF9RdTHJZwryw
 oRe587wfCln43+yGnath2XR93tkl0vKn0eu7FNyTmEHXduP8/+i9dIhsTiWp4JKL
 TRWFcOy676g+Qq3P8l0gkTJGPTcS91LWMfkx/FsbYBiecNBcMEX6V8FffsM7SF8A
 M558SRjZ/kMGXOdDe+Dtr8Vz7RXgFpye93nXZ9ZOieSeH5DWmTWsSfaGJi3pLJYZ
 utSRivpGDxleNkPMihPkg/aa1D3MsVFwnr7+SwwKNhUTtAzanUuyQrjXsEjuUnIX
 /sXMWrV+N8yk5JT6+GMmvLpeG6kspkn8UuQ+GVz2/8x88HLdIR3q+RATX8i9pQtI
 L0t1NWUFlXt3rb5xolZukG6h5dBTa1AXHJ5AR82tEMBGg2JgFss=
 =y2ev
 -----END PGP SIGNATURE-----

Merge tag 'locking_urgent_for_v5.14_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking fix from Borislav Petkov:

 - Fix a CONFIG symbol's spelling

* tag 'locking_urgent_for_v5.14_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/rtmutex: Use the correct rtmutex debugging config option
2021-08-15 06:46:04 -10:00
Kuniyuki Iwashima
3478cfcfcd bpf: Support "%c" in bpf_bprintf_prepare().
/proc/net/unix uses "%c" to print a single-byte character to escape '\0' in
the name of the abstract UNIX domain socket.  The following selftest uses
it, so this patch adds support for "%c".  Note that it does not support
wide character ("%lc" and "%llc") for simplicity.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210814015718.42704-3-kuniyu@amazon.co.jp
2021-08-15 00:13:33 -07:00
Christoph Hellwig
2a047e0662 dma-mapping: return an unsigned int from dma_map_sg{,_attrs}
These can only return 0 for failure or the number of entries, so turn
the return value into an unsigned int.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
2021-08-14 09:18:36 +02:00
Stanislav Fomichev
f1248dee95 bpf: Allow bpf_get_netns_cookie in BPF_PROG_TYPE_CGROUP_SOCKOPT
This is similar to existing BPF_PROG_TYPE_CGROUP_SOCK
and BPF_PROG_TYPE_CGROUP_SOCK_ADDR.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20210813230530.333779-2-sdf@google.com
2021-08-13 17:50:40 -07:00
Ilya Leoshkevich
45c709f8c7 bpf: Clear zext_dst of dead insns
"access skb fields ok" verifier test fails on s390 with the "verifier
bug. zext_dst is set, but no reg is defined" message. The first insns
of the test prog are ...

   0:	61 01 00 00 00 00 00 00 	ldxw %r0,[%r1+0]
   8:	35 00 00 01 00 00 00 00 	jge %r0,0,1
  10:	61 01 00 08 00 00 00 00 	ldxw %r0,[%r1+8]

... and the 3rd one is dead (this does not look intentional to me, but
this is a separate topic).

sanitize_dead_code() converts dead insns into "ja -1", but keeps
zext_dst. When opt_subreg_zext_lo32_rnd_hi32() tries to parse such
an insn, it sees this discrepancy and bails. This problem can be seen
only with JITs whose bpf_jit_needs_zext() returns true.

Fix by clearning dead insns' zext_dst.

The commits that contributed to this problem are:

1. 5aa5bd14c5f8 ("bpf: add initial suite for selftests"), which
   introduced the test with the dead code.
2. 5327ed3d44b7 ("bpf: verifier: mark verified-insn with
   sub-register zext flag"), which introduced the zext_dst flag.
3. 83a2881903f3 ("bpf: Account for BPF_FETCH in
   insn_has_def32()"), which introduced the sanity check.
4. 9183671af6db ("bpf: Fix leakage under speculation on
   mispredicted branches"), which bisect points to.

It's best to fix this on stable branches that contain the second one,
since that's the point where the inconsistency was introduced.

Fixes: 5327ed3d44b7 ("bpf: verifier: mark verified-insn with sub-register zext flag")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210812151811.184086-2-iii@linux.ibm.com
2021-08-13 17:43:43 +02:00
Jakub Kicinski
f4083a752a Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Conflicts:

drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.h
  9e26680733d5 ("bnxt_en: Update firmware call to retrieve TX PTP timestamp")
  9e518f25802c ("bnxt_en: 1PPS functions to configure TSIO pins")
  099fdeda659d ("bnxt_en: Event handler for PPS events")

kernel/bpf/helpers.c
include/linux/bpf-cgroup.h
  a2baf4e8bb0f ("bpf: Fix potentially incorrect results with bpf_get_local_storage()")
  c7603cfa04e7 ("bpf: Add ambient BPF runtime context stored in current")

drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
  5957cc557dc5 ("net/mlx5: Set all field of mlx5_irq before inserting it to the xarray")
  2d0b41a37679 ("net/mlx5: Refcount mlx5_irq with integer")

MAINTAINERS
  7b637cd52f02 ("MAINTAINERS: fix Microchip CAN BUS Analyzer Tool entry typo")
  7d901a1e878a ("net: phy: add Maxlinear GPY115/21x/24x driver")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-13 06:41:22 -07:00
Thomas Gleixner
04c2721d35 genirq: Fix kernel doc indentation
Fixes: 61377ec14457 ("genirq: Clarify documentation for request_threaded_irq()")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2021-08-13 12:45:13 +02:00
Linus Torvalds
f8e6dfc64f Networking fixes for 5.14-rc6, including fixes from netfilter, bpf,
can and ieee802154.
 
 Current release - regressions:
 
  - r8169: fix ASPM-related link-up regressions
 
  - bridge: fix flags interpretation for extern learn fdb entries
 
  - phy: micrel: fix link detection on ksz87xx switch
 
  - Revert "tipc: Return the correct errno code"
 
  - ptp: fix possible memory leak caused by invalid cast
 
 Current release - new code bugs:
 
  - bpf: add missing bpf_read_[un]lock_trace() for syscall program
 
  - bpf: fix potentially incorrect results with bpf_get_local_storage()
 
  - page_pool: mask the page->signature before the checking, avoid
       dma mapping leaks
 
  - netfilter: nfnetlink_hook: 5 fixes to information in netlink dumps
 
  - bnxt_en: fix firmware interface issues with PTP
 
  - mlx5: Bridge, fix ageing time
 
 Previous releases - regressions:
 
  - linkwatch: fix failure to restore device state across suspend/resume
 
  - bareudp: fix invalid read beyond skb's linear data
 
 Previous releases - always broken:
 
  - bpf: fix integer overflow involving bucket_size
 
  - ppp: fix issues when desired interface name is specified via netlink
 
  - wwan: mhi_wwan_ctrl: fix possible deadlock
 
  - dsa: microchip: ksz8795: fix number of VLAN related bugs
 
  - dsa: drivers: fix broken backpressure in .port_fdb_dump
 
  - dsa: qca: ar9331: make proper initial port defaults
 
 Misc:
 
  - bpf: add lockdown check for probe_write_user helper
 
  - netfilter: conntrack: remove offload_pickup sysctl before 5.14 is out
 
  - netfilter: conntrack: collect all entries in one cycle,
 	      heuristically slow down garbage collection scans
 	      on idle systems to prevent frequent wake ups
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmEVb/AACgkQMUZtbf5S
 Irvlzw//XGDHNNPPOueHVhYK50+WiqPMxezQ5nbnG6uR6JtPyirMNTgzST8rQRsu
 HmQy8/Oi6bK5rbPC9iDtKK28ba6Ldvu1ic8lTkuWyNNthG/pZGJJQ+Pg7dmkd7te
 soJGZKnTbNWwbgGOFbfw9rLRuzWsjQjQ43vxTMjjNnpOwNxANuNR1GN0S/t8e9di
 9BBT8jtgcHhtW5jRMHMNWHk+k8aeyIZPxjl9fjzzsMt7meX50DFrCJgf8bKkZ5dA
 W2b/fzUyMqVQJpgmIY4ktFmR4mV382pWOOs6rl+ppSu+mU/gpTuYCofF7FqAUU5S
 71mzukW6KdOrqymVuwiTXBlGnZB370aT7aUU5PHL/ZkDJ9shSyVRcg/iQa40myzn
 5wxunZX936z5f84bxZPW1J5bBZklba8deKPXHUkl5RoIXsN2qWFPJpZ1M0eHyfPm
 ZdqvRZ1IkSSFZFr6FF374bEqa88NK1wbVKUbGQ+yn8abE+HQfXQR9ZWZa1DR1wkb
 rF8XWOHjQLp/zlTRnj3gj3T4pEwc5L1QOt7RUrYfI36Mh7iUz5EdzowaiEaDQT6/
 neThilci1F6Mz4Uf65pK4TaDTDvj1tqqAdg3g8uneHBTFARS+htGXqkaKxP6kSi+
 T/W4woOqCRT6c0+BhZ2jPRhKsMZ5kR1vKLUVBHShChq32mDpn6g=
 =hzDl
 -----END PGP SIGNATURE-----

Merge tag 'net-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Networking fixes, including fixes from netfilter, bpf, can and
  ieee802154.

  The size of this is pretty normal, but we got more fixes for 5.14
  changes this week than last week. Nothing major but the trend is the
  opposite of what we like. We'll see how the next week goes..

  Current release - regressions:

   - r8169: fix ASPM-related link-up regressions

   - bridge: fix flags interpretation for extern learn fdb entries

   - phy: micrel: fix link detection on ksz87xx switch

   - Revert "tipc: Return the correct errno code"

   - ptp: fix possible memory leak caused by invalid cast

  Current release - new code bugs:

   - bpf: add missing bpf_read_[un]lock_trace() for syscall program

   - bpf: fix potentially incorrect results with bpf_get_local_storage()

   - page_pool: mask the page->signature before the checking, avoid dma
     mapping leaks

   - netfilter: nfnetlink_hook: 5 fixes to information in netlink dumps

   - bnxt_en: fix firmware interface issues with PTP

   - mlx5: Bridge, fix ageing time

  Previous releases - regressions:

   - linkwatch: fix failure to restore device state across
     suspend/resume

   - bareudp: fix invalid read beyond skb's linear data

  Previous releases - always broken:

   - bpf: fix integer overflow involving bucket_size

   - ppp: fix issues when desired interface name is specified via
     netlink

   - wwan: mhi_wwan_ctrl: fix possible deadlock

   - dsa: microchip: ksz8795: fix number of VLAN related bugs

   - dsa: drivers: fix broken backpressure in .port_fdb_dump

   - dsa: qca: ar9331: make proper initial port defaults

  Misc:

   - bpf: add lockdown check for probe_write_user helper

   - netfilter: conntrack: remove offload_pickup sysctl before 5.14 is
     out

   - netfilter: conntrack: collect all entries in one cycle,
     heuristically slow down garbage collection scans on idle systems to
     prevent frequent wake ups"

* tag 'net-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (87 commits)
  vsock/virtio: avoid potential deadlock when vsock device remove
  wwan: core: Avoid returning NULL from wwan_create_dev()
  net: dsa: sja1105: unregister the MDIO buses during teardown
  Revert "tipc: Return the correct errno code"
  net: mscc: Fix non-GPL export of regmap APIs
  net: igmp: increase size of mr_ifc_count
  MAINTAINERS: switch to my OMP email for Renesas Ethernet drivers
  tcp_bbr: fix u32 wrap bug in round logic if bbr_init() called after 2B packets
  net: pcs: xpcs: fix error handling on failed to allocate memory
  net: linkwatch: fix failure to restore device state across suspend/resume
  net: bridge: fix memleak in br_add_if()
  net: switchdev: zero-initialize struct switchdev_notifier_fdb_info emitted by drivers towards the bridge
  net: bridge: fix flags interpretation for extern learn fdb entries
  net: dsa: sja1105: fix broken backpressure in .port_fdb_dump
  net: dsa: lantiq: fix broken backpressure in .port_fdb_dump
  net: dsa: lan9303: fix broken backpressure in .port_fdb_dump
  net: dsa: hellcreek: fix broken backpressure in .port_fdb_dump
  bpf, core: Fix kernel-doc notation
  net: igmp: fix data-race in igmp_ifc_timer_expire()
  net: Fix memory leak in ieee802154_raw_deliver
  ...
2021-08-12 16:24:03 -10:00
Waiman Long
ee9707e859 cgroup/cpuset: Enable memory migration for cpuset v2
When a user changes cpuset.cpus, each task in a v2 cpuset will be moved
to one of the new cpus if it is not there already. For memory, however,
they won't be migrated to the new nodes when cpuset.mems changes. This is
an inconsistency in behavior.

In cpuset v1, there is a memory_migrate control file to enable such
behavior by setting the CS_MEMORY_MIGRATE flag. Make it the default
for cpuset v2 so that we have a consistent set of behavior for both
cpus and memory.

There is certainly a cost to make memory migration the default, but it
is a one time cost that shouldn't really matter as long as cpuset.mems
isn't changed frequenty.  Update the cgroup-v2.rst file to document the
new behavior and recommend against changing cpuset.mems frequently.

Since there won't be any concurrent access to the newly allocated cpuset
structure in cpuset_css_alloc(), we can use the cheaper non-atomic
__set_bit() instead of the more expensive atomic set_bit().

Signed-off-by: Waiman Long <longman@redhat.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
2021-08-12 11:40:20 -10:00
Thomas Gleixner
f80e214895 hrtimer: Unbreak hrtimer_force_reprogram()
Since the recent consoliation of reprogramming functions,
hrtimer_force_reprogram() is affected by a check whether the new expiry
time is past the current expiry time.

This breaks the NOHZ logic as that relies on the fact that the tick hrtimer
is moved into the future. That means cpu_base->expires_next becomes stale
and subsequent reprogramming attempts fail as well until the situation is
cleaned up by an hrtimer interrupts.

For some yet unknown reason this leads to a complete stall, so for now
partially revert the offending commit to a known working state. The root
cause for the stall is still investigated and will be fixed in a subsequent
commit.

Fixes: b14bca97c9f5 ("hrtimer: Consolidate reprogramming code")
Reported-by: Mike Galbraith <efault@gmx.de>
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Mike Galbraith <efault@gmx.de>
Link: https://lore.kernel.org/r/8735recskh.ffs@tglx
2021-08-12 22:34:40 +02:00
Thomas Gleixner
9482fd71db hrtimer: Use raw_cpu_ptr() in clock_was_set()
clock_was_set() can be invoked from preemptible context. Use raw_cpu_ptr()
to check whether high resolution mode is active or not. It does not matter
whether the task migrates after acquiring the pointer.

Fixes: e71a4153b7c2 ("hrtimer: Force clock_was_set() handling for the HIGHRES=n, NOHZ=y case")
Reported-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/875ywacsmb.ffs@tglx
2021-08-12 22:34:40 +02:00
Steven Rostedt (VMware)
5acce0bff2 tracing / histogram: Fix NULL pointer dereference on strcmp() on NULL event name
The following commands:

 # echo 'read_max u64 size;' > synthetic_events
 # echo 'hist:keys=common_pid:count=count:onmax($count).trace(read_max,count)' > events/syscalls/sys_enter_read/trigger

Causes:

 BUG: kernel NULL pointer dereference, address: 0000000000000000
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: 0000 [#1] PREEMPT SMP
 CPU: 4 PID: 1763 Comm: bash Not tainted 5.14.0-rc2-test+ #155
 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01
v03.03 07/14/2016
 RIP: 0010:strcmp+0xc/0x20
 Code: 75 f7 31 c0 0f b6 0c 06 88 0c 02 48 83 c0 01 84 c9 75 f1 4c 89 c0
c3 0f 1f 80 00 00 00 00 31 c0 eb 08 48 83 c0 01 84 d2 74 0f <0f> b6 14 07
3a 14 06 74 ef 19 c0 83 c8 01 c3 31 c0 c3 66 90 48 89
 RSP: 0018:ffffb5fdc0963ca8 EFLAGS: 00010246
 RAX: 0000000000000000 RBX: ffffffffb3a4e040 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: ffff9714c0d0b640 RDI: 0000000000000000
 RBP: 0000000000000000 R08: 00000022986b7cde R09: ffffffffb3a4dff8
 R10: 0000000000000000 R11: 0000000000000000 R12: ffff9714c50603c8
 R13: 0000000000000000 R14: ffff97143fdf9e48 R15: ffff9714c01a2210
 FS:  00007f1fa6785740(0000) GS:ffff9714da400000(0000)
knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 000000002d863004 CR4: 00000000001706e0
 Call Trace:
  __find_event_file+0x4e/0x80
  action_create+0x6b7/0xeb0
  ? kstrdup+0x44/0x60
  event_hist_trigger_func+0x1a07/0x2130
  trigger_process_regex+0xbd/0x110
  event_trigger_write+0x71/0xd0
  vfs_write+0xe9/0x310
  ksys_write+0x68/0xe0
  do_syscall_64+0x3b/0x90
  entry_SYSCALL_64_after_hwframe+0x44/0xae
 RIP: 0033:0x7f1fa6879e87

The problem was the "trace(read_max,count)" where the "count" should be
"$count" as "onmax()" only handles variables (although it really should be
able to figure out that "count" is a field of sys_enter_read). But there's
a path that does not find the variable and ends up passing a NULL for the
event, which ends up getting passed to "strcmp()".

Add a check for NULL to return and error on the command with:

 # cat error_log
  hist:syscalls:sys_enter_read: error: Couldn't create or find variable
  Command: hist:keys=common_pid:count=count:onmax($count).trace(read_max,count)
                                ^
Link: https://lkml.kernel.org/r/20210808003011.4037f8d0@oasis.local.home

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: stable@vger.kernel.org
Fixes: 50450603ec9cb tracing: Add 'onmax' hist trigger action support
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-08-12 13:35:57 -04:00
Lukas Bulwahn
12f9951d3f tracing: define needed config DYNAMIC_FTRACE_WITH_ARGS
Commit 2860cd8a2353 ("livepatch: Use the default ftrace_ops instead of
REGS when ARGS is available") intends to enable config LIVEPATCH when
ftrace with ARGS is available. However, the chain of configs to enable
LIVEPATCH is incomplete, as HAVE_DYNAMIC_FTRACE_WITH_ARGS is available,
but the definition of DYNAMIC_FTRACE_WITH_ARGS, combining DYNAMIC_FTRACE
and HAVE_DYNAMIC_FTRACE_WITH_ARGS, needed to enable LIVEPATCH, is missing
in the commit.

Fortunately, ./scripts/checkkconfigsymbols.py detects this and warns:

DYNAMIC_FTRACE_WITH_ARGS
Referencing files: kernel/livepatch/Kconfig

So, define the config DYNAMIC_FTRACE_WITH_ARGS analogously to the already
existing similar configs, DYNAMIC_FTRACE_WITH_REGS and
DYNAMIC_FTRACE_WITH_DIRECT_CALLS, in ./kernel/trace/Kconfig to connect the
chain of configs.

Link: https://lore.kernel.org/kernel-janitors/CAKXUXMwT2zS9fgyQHKUUiqo8ynZBdx2UEUu1WnV_q0OCmknqhw@mail.gmail.com/
Link: https://lkml.kernel.org/r/20210806195027.16808-1-lukas.bulwahn@gmail.com

Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: stable@vger.kernel.org
Fixes: 2860cd8a2353 ("livepatch: Use the default ftrace_ops instead of REGS when ARGS is available")
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-08-12 13:35:57 -04:00
Daniel Bristot de Oliveira
0e05ba498d trace/osnoise: Print a stop tracing message
When using osnoise/timerlat with stop tracing, sometimes it is
not clear in which CPU the stop condition was hit, mainly
when using some extra events.

Print a message informing in which CPU the trace stopped, like
in the example below:

          <idle>-0       [006] d.h.  2932.676616: #1672599 context    irq timer_latency     34689 ns
          <idle>-0       [006] dNh.  2932.676618: irq_noise: local_timer:236 start 2932.676615639 duration 2391 ns
          <idle>-0       [006] dNh.  2932.676620: irq_noise: virtio0-output.0:47 start 2932.676620180 duration 86 ns
          <idle>-0       [003] d.h.  2932.676621: #1673374 context    irq timer_latency      1200 ns
          <idle>-0       [006] d...  2932.676623: thread_noise: swapper/6:0 start 2932.676615964 duration 4339 ns
          <idle>-0       [003] dNh.  2932.676623: irq_noise: local_timer:236 start 2932.676620597 duration 1881 ns
          <idle>-0       [006] d...  2932.676623: sched_switch: prev_comm=swapper/6 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=timerlat/6 next_pid=852 next_prio=4
      timerlat/6-852     [006] ....  2932.676623: #1672599 context thread timer_latency     41931 ns
          <idle>-0       [003] d...  2932.676623: thread_noise: swapper/3:0 start 2932.676620854 duration 880 ns
          <idle>-0       [003] d...  2932.676624: sched_switch: prev_comm=swapper/3 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=timerlat/3 next_pid=849 next_prio=4
      timerlat/6-852     [006] ....  2932.676624: timerlat_main: stop tracing hit on cpu 6
      timerlat/3-849     [003] ....  2932.676624: #1673374 context thread timer_latency      4310 ns

Link: https://lkml.kernel.org/r/b30a0d7542adba019185f44ee648e60e14923b11.1626598844.git.bristot@kernel.org

Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-08-12 13:35:56 -04:00
Daniel Bristot de Oliveira
e1c4ad4a7f trace/timerlat: Add a header with PREEMPT_RT additional fields
Some extra flags are printed to the trace header when using the
PREEMPT_RT config. The extra flags are: need-resched-lazy,
preempt-lazy-depth, and migrate-disable.

Without printing these fields, the timerlat specific fields are
shifted by three positions, for example:

 # tracer: timerlat
 #
 #                                _-----=> irqs-off
 #                               / _----=> need-resched
 #                              | / _---=> hardirq/softirq
 #                              || / _--=> preempt-depth
 #                              || /
 #                              ||||             ACTIVATION
 #           TASK-PID      CPU# ||||   TIMESTAMP    ID            CONTEXT                LATENCY
 #              | |         |   ||||      |         |                  |                       |
           <idle>-0       [000] d..h...  3279.798871: #1     context    irq timer_latency       830 ns
            <...>-807     [000] .......  3279.798881: #1     context thread timer_latency     11301 ns

Add a new header for timerlat with the missing fields, to be used
when the PREEMPT_RT is enabled.

Link: https://lkml.kernel.org/r/babb83529a3211bd0805be0b8c21608230202c55.1626598844.git.bristot@kernel.org

Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-08-12 13:35:56 -04:00
Daniel Bristot de Oliveira
d03721a6e7 trace/osnoise: Add a header with PREEMPT_RT additional fields
Some extra flags are printed to the trace header when using the
PREEMPT_RT config. The extra flags are: need-resched-lazy,
preempt-lazy-depth, and migrate-disable.

Without printing these fields, the osnoise specific fields are
shifted by three positions, for example:

 # tracer: osnoise
 #
 #                                _-----=> irqs-off
 #                               / _----=> need-resched
 #                              | / _---=> hardirq/softirq
 #                              || / _--=> preempt-depth                            MAX
 #                              || /                                             SINGLE      Interference counters:
 #                              ||||               RUNTIME      NOISE  %% OF CPU  NOISE    +-----------------------------+
 #           TASK-PID      CPU# ||||   TIMESTAMP    IN US       IN US  AVAILABLE  IN US     HW    NMI    IRQ   SIRQ THREAD
 #              | |         |   ||||      |           |             |    |            |      |      |      |      |      |
            <...>-741     [000] .......  1105.690909: 1000000        234  99.97660      36     21      0   1001     22      3
            <...>-742     [001] .......  1105.691923: 1000000        281  99.97190     197      7      0   1012     35     14
            <...>-743     [002] .......  1105.691958: 1000000       1324  99.86760     118     11      0   1016    155    143
            <...>-744     [003] .......  1105.691998: 1000000        109  99.98910      21      4      0   1004     33      7
            <...>-745     [004] .......  1105.692015: 1000000       2023  99.79770      97     37      0   1023     52     18

Add a new header for osnoise with the missing fields, to be used
when the PREEMPT_RT is enabled.

Link: https://lkml.kernel.org/r/1f03289d2a51fde5a58c2e7def063dc630820ad1.1626598844.git.bristot@kernel.org

Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-08-12 13:35:56 -04:00
Linus Torvalds
f8fbb47c6e Merge branch 'for-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull ucounts fix from Eric Biederman:
 "This fixes the ucount sysctls on big endian architectures.

  The counts were expanded to be longs instead of ints, and the sysctl
  code was overlooked, so only the low 32bit were being processed. On
  litte endian just processing the low 32bits is fine, but on 64bit big
  endian processing just the low 32bits results in the high order bits
  instead of the low order bits being processed and nothing works
  proper.

  This change took a little bit to mature as we have the SYSCTL_ZERO,
  and SYSCTL_INT_MAX macros that are only usable for sysctls operating
  on ints, but unfortunately are not obviously broken. Which resulted in
  the versions of this change working on big endian and not on little
  endian, because the int SYSCTL_ZERO when extended 64bit wound up being
  0x100000000. So we only allowed values greater than 0x100000000 and
  less than 0faff. Which unfortunately broken everything that tried to
  set the sysctls. (First reported with the windows subsystem for
  linux).

  I have tested this on x86_64 64bit after first reproducing the
  problems with the earlier version of this change, and then verifying
  the problems do not exist when we use appropriate long min and max
  values for extra1 and extra2"

* 'for-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  ucounts: add missing data type changes
2021-08-12 07:20:16 -10:00
Ran Xiaokai
2863643fb8
set_user: add capability check when rlimit(RLIMIT_NPROC) exceeds
in copy_process(): non root users but with capability CAP_SYS_RESOURCE
or CAP_SYS_ADMIN will clean PF_NPROC_EXCEEDED flag even
rlimit(RLIMIT_NPROC) exceeds. Add the same capability check logic here.

Align the permission checks in copy_process() and set_user(). In
copy_process() CAP_SYS_RESOURCE or CAP_SYS_ADMIN capable users will be
able to circumvent and clear the PF_NPROC_EXCEEDED flag whereas they
aren't able to the same in set_user(). There's no obvious logic to this
and trying to unearth the reason in the thread didn't go anywhere.

The gist seems to be that this code wants to make sure that a program
can't successfully exec if it has gone through a set*id() transition
while exceeding its RLIMIT_NPROC.
A capable but non-INIT_USER caller getting PF_NPROC_EXCEEDED set during
a set*id() transition wouldn't be able to exec right away if they still
exceed their RLIMIT_NPROC at the time of exec. So their exec would fail
in fs/exec.c:

        if ((current->flags & PF_NPROC_EXCEEDED) &&
            is_ucounts_overlimit(current_ucounts(), UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC))) {
                retval = -EAGAIN;
                goto out_ret;
        }

However, if the caller were to fork() right after the set*id()
transition but before the exec while still exceeding their RLIMIT_NPROC
then they would get PF_NPROC_EXCEEDED cleared (while the child would
inherit it):

        retval = -EAGAIN;
        if (is_ucounts_overlimit(task_ucounts(p), UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC))) {
                if (p->real_cred->user != INIT_USER &&
                    !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN))
                        goto bad_fork_free;
        }
        current->flags &= ~PF_NPROC_EXCEEDED;

which means a subsequent exec by the capable caller would now succeed
even though they could still exceed their RLIMIT_NPROC limit. This seems
inconsistent. Allow a CAP_SYS_ADMIN or CAP_SYS_RESOURCE capable user to
avoid PF_NPROC_EXCEEDED as they already can in copy_process().

Cc: peterz@infradead.org, tglx@linutronix.de, linux-kernel@vger.kernel.org, Ran Xiaokai <ran.xiaokai@zte.com.cn>, , ,

Link: https://lore.kernel.org/r/20210728072629.530435-1-ran.xiaokai@zte.com.cn
Cc: Neil Brown <neilb@suse.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: James Morris <jamorris@linux.microsoft.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-08-12 14:54:25 +02:00
Sebastian Andrzej Siewior
80771c8228 padata: Replace deprecated CPU-hotplug functions.
The functions get_online_cpus() and put_online_cpus() have been
deprecated during the CPU hotplug rework. They map directly to
cpus_read_lock() and cpus_read_unlock().

Replace deprecated CPU-hotplug functions with the official version.
The behavior remains unchanged.

Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: linux-crypto@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-08-12 19:16:59 +08:00
Linus Torvalds
fd66ad69ef seccomp fixes for v5.14-rc6
- Fix typo in user notification documentation (Rodrigo Campos)
 
 - Fix userspace counter report when using TSYNC (Hsuan-Chi Kuo, Wiktor Garbacz)
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmEUHhAWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJvHdEACn+Ws1PhWQrypmFOG177DXC/jn
 Z/cBUxAE8F/y+lQ4wRkbN0GEQsrIFgDkTFPykzKfrgChud6cUhRe9gwmpKfpKNkZ
 ciD8jFfV/DP8tT1OtVMy2fPUWTQI/Lc5m9rvHYxYrRA1YbP6QF3RQouDzQw//VgB
 fLkdBs9izQKDreclPy5spumuy2Th07EIJyHHxw9pOW27QdtQgqHobkpXNXZWCGra
 4Uo6lvK9XSnbY0PI95uu+5UgHMmqaZ0S+gDZL+mP105kMj10cbjocJkdUm0Trov6
 06/J36gCFCVWlbDl6QOlJcUCUz2r5eeWTpW1qNrOTyd9CZzKVgNxDvx0jVP962Vg
 DfHxXzrg4EmhthQr3hphBwf0+was3g3s+bxkKn0mV3Vp9RJ6zzModt1+OsRB2zR3
 FqKyGJbujHqYUhi1i9KayQxazWk7dEccvfDjY9F2eGmFBuUlM5jr4Gt4k2Wl93Df
 Yoco1f4AaEV3uU6zUf+Ta40FjLtQO66/ZhmmozGvrWLH8Y02se4ICgS+lXoGs4/k
 R2IMG6OPCpjmJc39+R5Lyh1jtbOXHf4Brxb2MMZkpoXQCMEND9uOleTaVlOLd4sP
 HGyu7dI36d6xH40vSIBRGV4ypE7jQUIQkIopjDqzbtFAyFvBXr3zEr/s3EVyc1E0
 e+/PligHRRYBpl+l0w==
 =N5TG
 -----END PGP SIGNATURE-----

Merge tag 'seccomp-v5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull seccomp fixes from Kees Cook:

 - Fix typo in user notification documentation (Rodrigo Campos)

 - Fix userspace counter report when using TSYNC (Hsuan-Chi Kuo, Wiktor
   Garbacz)

* tag 'seccomp-v5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  seccomp: Fix setting loaded filter count during TSYNC
  Documentation: seccomp: Fix typo in user notification
2021-08-11 19:56:10 -10:00
Elliot Berman
14c4c8e415 cfi: Use rcu_read_{un}lock_sched_notrace
If rcu_read_lock_sched tracing is enabled, the tracing subsystem can
perform a jump which needs to be checked by CFI. For example, stm_ftrace
source is enabled as a module and hooks into enabled ftrace events. This
can cause an recursive loop where find_shadow_check_fn ->
rcu_read_lock_sched -> (call to stm_ftrace generates cfi slowpath) ->
find_shadow_check_fn -> rcu_read_lock_sched -> ...

To avoid the recursion, either the ftrace codes needs to be marked with
__no_cfi or CFI should not trace. Use the "_notrace" in CFI to avoid
tracing so that CFI can guard ftrace.

Signed-off-by: Elliot Berman <quic_eberman@quicinc.com>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Cc: stable@vger.kernel.org
Fixes: cf68fffb66d6 ("add support for Clang CFI")
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210811155914.19550-1-quic_eberman@quicinc.com
2021-08-11 13:11:12 -07:00
Hsuan-Chi Kuo
b4d8a58f8d seccomp: Fix setting loaded filter count during TSYNC
The desired behavior is to set the caller's filter count to thread's.
This value is reported via /proc, so this fixes the inaccurate count
exposed to userspace; it is not used for reference counting, etc.

Signed-off-by: Hsuan-Chi Kuo <hsuanchikuo@gmail.com>
Link: https://lore.kernel.org/r/20210304233708.420597-1-hsuanchikuo@gmail.com
Co-developed-by: Wiktor Garbacz <wiktorg@google.com>
Signed-off-by: Wiktor Garbacz <wiktorg@google.com>
Link: https://lore.kernel.org/lkml/20210810125158.329849-1-wiktorg@google.com
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Fixes: c818c03b661c ("seccomp: Report number of loaded filters in /proc/$pid/status")
2021-08-11 11:48:28 -07:00
Yonghong Song
2d3a1e3615 bpf: Add rcu_read_lock in bpf_get_current_[ancestor_]cgroup_id() helpers
Currently, if bpf_get_current_cgroup_id() or
bpf_get_current_ancestor_cgroup_id() helper is
called with sleepable programs e.g., sleepable
fentry/fmod_ret/fexit/lsm programs, a rcu warning
may appear. For example, if I added the following
hack to test_progs/test_lsm sleepable fentry program
test_sys_setdomainname:

  --- a/tools/testing/selftests/bpf/progs/lsm.c
  +++ b/tools/testing/selftests/bpf/progs/lsm.c
  @@ -168,6 +168,10 @@ int BPF_PROG(test_sys_setdomainname, struct pt_regs *regs)
          int buf = 0;
          long ret;

  +       __u64 cg_id = bpf_get_current_cgroup_id();
  +       if (cg_id == 1000)
  +               copy_test++;
  +
          ret = bpf_copy_from_user(&buf, sizeof(buf), ptr);
          if (len == -2 && ret == 0 && buf == 1234)
                  copy_test++;

I will hit the following rcu warning:

  include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
  other info that might help us debug this:
    rcu_scheduler_active = 2, debug_locks = 1
    1 lock held by test_progs/260:
      #0: ffffffffa5173360 (rcu_read_lock_trace){....}-{0:0}, at: __bpf_prog_enter_sleepable+0x0/0xa0
    stack backtrace:
    CPU: 1 PID: 260 Comm: test_progs Tainted: G           O      5.14.0-rc2+ #176
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
    Call Trace:
      dump_stack_lvl+0x56/0x7b
      bpf_get_current_cgroup_id+0x9c/0xb1
      bpf_prog_a29888d1c6706e09_test_sys_setdomainname+0x3e/0x89c
      bpf_trampoline_6442469132_0+0x2d/0x1000
      __x64_sys_setdomainname+0x5/0x110
      do_syscall_64+0x3a/0x80
      entry_SYSCALL_64_after_hwframe+0x44/0xae

I can get similar warning using bpf_get_current_ancestor_cgroup_id() helper.
syzbot reported a similar issue in [1] for syscall program. Helper
bpf_get_current_cgroup_id() or bpf_get_current_ancestor_cgroup_id()
has the following callchain:
   task_dfl_cgroup
     task_css_set
       task_css_set_check
and we have
   #define task_css_set_check(task, __c)                                   \
           rcu_dereference_check((task)->cgroups,                          \
                   lockdep_is_held(&cgroup_mutex) ||                       \
                   lockdep_is_held(&css_set_lock) ||                       \
                   ((task)->flags & PF_EXITING) || (__c))
Since cgroup_mutex/css_set_lock is not held and the task
is not existing and rcu read_lock is not held, a warning
will be issued. Note that bpf sleepable program is protected by
rcu_read_lock_trace().

The above sleepable bpf programs are already protected
by migrate_disable(). Adding rcu_read_lock() in these
two helpers will silence the above warning.
I marked the patch fixing 95b861a7935b
("bpf: Allow bpf_get_current_ancestor_cgroup_id for tracing")
which added bpf_get_current_ancestor_cgroup_id() to tracing programs
in 5.14. I think backporting 5.14 is probably good enough as sleepable
progrems are not widely used.

This patch should fix [1] as well since syscall program is a sleepable
program protected with migrate_disable().

 [1] https://lore.kernel.org/bpf/0000000000006d5cab05c7d9bb87@google.com/

Fixes: 95b861a7935b ("bpf: Allow bpf_get_current_ancestor_cgroup_id for tracing")
Reported-by: syzbot+7ee5c2c09c284495371f@syzkaller.appspotmail.com
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210810230537.2864668-1-yhs@fb.com
2021-08-11 11:45:43 -07:00
Waiman Long
e7cc9888dc cgroup/cpuset: Enable event notification when partition state changes
A valid cpuset partition can become invalid if all its CPUs are offlined
or somehow removed. This can happen through external events without
"cpuset.cpus.partition" being touched at all.

Users that rely on the property of a partition being present do not
currently have a simple way to get such an event notified other than
constant periodic polling which is both inefficient and cumbersome.

To make life easier for those users, event notification is now enabled
for "cpuset.cpus.partition" whenever its state changes.

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2021-08-11 08:04:43 -10:00
Randy Dunlap
b4cc619608 cgroup: cgroup-v1: clean up kernel-doc notation
Fix kernel-doc warnings found in cgroup-v1.c:

kernel/cgroup/cgroup-v1.c:55: warning: No description found for return value of 'cgroup_attach_task_all'
kernel/cgroup/cgroup-v1.c:94: warning: expecting prototype for cgroup_trasnsfer_tasks(). Prototype was for cgroup_transfer_tasks() instead
cgroup-v1.c:96: warning: No description found for return value of 'cgroup_transfer_tasks'
kernel/cgroup/cgroup-v1.c:687: warning: No description found for return value of 'cgroupstats_build'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Zefan Li <lizefan.x@bytedance.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: cgroups@vger.kernel.org
Signed-off-by: Tejun Heo <tj@kernel.org>
2021-08-11 07:57:43 -10:00
Randy Dunlap
49b3bd213a smp: Fix all kernel-doc warnings
Fix the following warnings:

kernel/smp.c:1189: warning: cannot understand function prototype: 'struct smp_call_on_cpu_struct '
kernel/smp.c:788: warning: No description found for return value of 'smp_call_function_single_async'
kernel/smp.c:990: warning: Function parameter or member 'wait' not described in 'smp_call_function_many'
kernel/smp.c:990: warning: Excess function parameter 'flags' description in 'smp_call_function_many'
kernel/smp.c:1198: warning: Function parameter or member 'work' not described in 'smp_call_on_cpu_struct'
kernel/smp.c:1198: warning: Function parameter or member 'done' not described in 'smp_call_on_cpu_struct'
kernel/smp.c:1198: warning: Function parameter or member 'func' not described in 'smp_call_on_cpu_struct'
kernel/smp.c:1198: warning: Function parameter or member 'data' not described in 'smp_call_on_cpu_struct'
kernel/smp.c:1198: warning: Function parameter or member 'ret' not described in 'smp_call_on_cpu_struct'
kernel/smp.c:1198: warning: Function parameter or member 'cpu' not described in 'smp_call_on_cpu_struct'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210810225051.3938-1-rdunlap@infradead.org
2021-08-11 14:47:16 +02:00
Randy Dunlap
3b35e7e6da genirq: Fix kernel-doc warnings in pm.c, msi.c and ipi.c
Fix all kernel-doc warnings in these 3 files and do some simple editing
(capitalize acronyms, capitalize Linux).

kernel/irq/pm.c:235: warning: expecting prototype for irq_pm_syscore_ops(). Prototype was for irq_pm_syscore_resume() instead
kernel/irq/msi.c:530: warning: expecting prototype for __msi_domain_free_irqs(). Prototype was for msi_domain_free_irqs() instead
kernel/irq/msi.c:31: warning: No description found for return value of 'alloc_msi_entry'
kernel/irq/msi.c:103: warning: No description found for return value of 'msi_domain_set_affinity'
kernel/irq/msi.c:288: warning: No description found for return value of 'msi_create_irq_domain'
kernel/irq/msi.c:499: warning: No description found for return value of 'msi_domain_alloc_irqs'
kernel/irq/msi.c:545: warning: No description found for return value of 'msi_get_domain_info'
kernel/irq/ipi.c:264: warning: expecting prototype for ipi_send_mask(). Prototype was for __ipi_send_mask() instead
kernel/irq/ipi.c:25: warning: No description found for return value of 'irq_reserve_ipi'
kernel/irq/ipi.c:116: warning: No description found for return value of 'irq_destroy_ipi'
kernel/irq/ipi.c:163: warning: No description found for return value of 'ipi_get_hwirq'
kernel/irq/ipi.c:222: warning: No description found for return value of '__ipi_send_single'
kernel/irq/ipi.c:308: warning: No description found for return value of 'ipi_send_single'
kernel/irq/ipi.c:329: warning: No description found for return value of 'ipi_send_mask'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210810234835.12547-1-rdunlap@infradead.org
2021-08-11 14:33:35 +02:00
Zhen Lei
290fdc4b7e genirq/timings: Fix error return code in irq_timings_test_irqs()
Return a negative error code from the error handling case instead of 0, as
done elsewhere in this function.

Fixes: f52da98d900e ("genirq/timings: Add selftest for irqs circular buffer")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210811093333.2376-1-thunder.leizhen@huawei.com
2021-08-11 14:33:35 +02:00