5686 Commits

Author SHA1 Message Date
Xiu Jianfeng
a76d4648a0 tracing: Make tracepoint_print_iter static
After change in commit 4239174570da ("tracing: Make tracepoint_printk a
static_key"), this symbol is not used outside of the file, so mark it
static.

Link: https://lkml.kernel.org/r/20221122091456.72055-1-xiujianfeng@huawei.com

Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-23 19:08:31 -05:00
Chuang Wang
9430cd62b6 tracing/perf: Use strndup_user instead of kzalloc/strncpy_from_user
This patch uses strndup_user instead of kzalloc + strncpy_from_user,
which makes the code more concise.

Link: https://lkml.kernel.org/r/20221121080831.707409-1-nashuiliang@gmail.com

Signed-off-by: Chuang Wang <nashuiliang@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-23 19:08:31 -05:00
Daniel Bristot de Oliveira
30838fcd81 tracing/osnoise: Add OSNOISE_WORKLOAD option
The osnoise tracer is not only a tracer, and a set of tracepoints,
but also a workload dispatcher.

In preparation for having other workloads, e.g., in user-space,
add an option to avoid dispatching the workload.

By not dispatching the workload, the osnoise: tracepoints become
generic events to measure the execution time of *any* task on Linux.

For example:

  # cd /sys/kernel/tracing/
  # cat osnoise/options
  DEFAULTS OSNOISE_WORKLOAD
  # echo NO_OSNOISE_WORKLOAD > osnoise/options
  # cat osnoise/options
  NO_DEFAULTS NO_OSNOISE_WORKLOAD
  # echo osnoise > set_event
  # echo osnoise > current_tracer
  # tail -8 trace
      make-94722   [002] d..3.  1371.794507: thread_noise:     make:94722 start 1371.794302286 duration 200897 ns
        sh-121042  [020] d..3.  1371.794534: thread_noise:       sh:121042 start 1371.781610976 duration 8943683 ns
      make-121097  [005] d..3.  1371.794542: thread_noise:     make:121097 start 1371.794481522 duration 60444 ns
     <...>-40      [005] d..3.  1371.794550: thread_noise: migration/5:40 start 1371.794542256 duration 7154 ns
    <idle>-0       [018] dNh2.  1371.794554: irq_noise: reschedule:253 start 1371.794553547 duration 40 ns
    <idle>-0       [018] dNh2.  1371.794561: irq_noise: local_timer:236 start 1371.794556222 duration 4890 ns
    <idle>-0       [018] .Ns2.  1371.794563: softirq_noise:    SCHED:7 start 1371.794561803 duration 992 ns
    <idle>-0       [018] d..3.  1371.794566: thread_noise: swapper/18:0 start 1371.781368110 duration 13191798 ns

In preparation for the rtla exec_time tracer/tool and
rtla osnoise --user option.

Link: https://lkml.kernel.org/r/f5cfbd37aefd419eefe9243b4d2fc38ed5753fe4.1668692096.git.bristot@kernel.org

Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-23 19:08:31 -05:00
Daniel Bristot de Oliveira
b179d48b6a tracing/osnoise: Add osnoise/options file
Add the tracing/osnoise/options file to control
osnoise/timerlat tracer features. It is a single
file to contain multiple features, similar to
the sched/features file.

Reading the file displays a list of options. Writing
the OPTION_NAME enables it, writing NO_OPTION_NAME disables
it.

The DEAFULTS is a particular option that resets the options
to the default ones.

It uses a bitmask to keep track of the status of the option. When
needed, we can add a list of static keys, but for now
it does not justify the memory increase.

Link: https://lkml.kernel.org/r/f8d34aefdb225d2603fcb4c02a120832a0cd3339.1668692096.git.bristot@kernel.org

Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-23 19:08:31 -05:00
Song Chen
04aabc32fb ring_buffer: Remove unused "event" parameter
After commit a389d86f7fd0 ("ring-buffer: Have nested events still record
running time stamp"), the "event" parameter is no longer used in either
ring_buffer_unlock_commit() or rb_commit(). Best to remove it.

Link: https://lkml.kernel.org/r/1666274811-24138-1-git-send-email-chensong_2000@189.cn

Signed-off-by: Song Chen <chensong_2000@189.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-23 19:08:30 -05:00
Steven Rostedt (Google)
a01fdc897f tracing: Add trace_trigger kernel command line option
Allow triggers to be enabled at kernel boot up. For example:

  trace_trigger="sched_switch.stacktrace if prev_state == 2"

The above will enable the stacktrace trigger on top of the sched_switch
event and only trigger if its prev_state is 2 (TASK_UNINTERRUPTIBLE). Then
at boot up, a stacktrace will trigger and be recorded in the tracing ring
buffer every time the sched_switch happens where the previous state is
TASK_INTERRUPTIBLE.

Another useful trigger would be "traceoff" which can stop tracing on an
event if a field of the event matches a certain value defined by the
filter ("if" statement).

Link: https://lore.kernel.org/linux-trace-kernel/20221020210056.0d8d0a5b@gandalf.local.home

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-23 19:08:30 -05:00
Zheng Yejian
78a01feb40 ftrace: Clean comments related to FTRACE_OPS_FL_PER_CPU
Commit b3a88803ac5b ("ftrace: Kill FTRACE_OPS_FL_PER_CPU") didn't
completely remove the comments related to FTRACE_OPS_FL_PER_CPU.

Link: https://lkml.kernel.org/r/20221025153923.1995973-1-zhengyejian1@huawei.com

Fixes: b3a88803ac5b ("ftrace: Kill FTRACE_OPS_FL_PER_CPU")
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-23 19:08:30 -05:00
Steven Rostedt (Google)
4313e5a613 tracing: Free buffers when a used dynamic event is removed
After 65536 dynamic events have been added and removed, the "type" field
of the event then uses the first type number that is available (not
currently used by other events). A type number is the identifier of the
binary blobs in the tracing ring buffer (known as events) to map them to
logic that can parse the binary blob.

The issue is that if a dynamic event (like a kprobe event) is traced and
is in the ring buffer, and then that event is removed (because it is
dynamic, which means it can be created and destroyed), if another dynamic
event is created that has the same number that new event's logic on
parsing the binary blob will be used.

To show how this can be an issue, the following can crash the kernel:

 # cd /sys/kernel/tracing
 # for i in `seq 65536`; do
     echo 'p:kprobes/foo do_sys_openat2 $arg1:u32' > kprobe_events
 # done

For every iteration of the above, the writing to the kprobe_events will
remove the old event and create a new one (with the same format) and
increase the type number to the next available on until the type number
reaches over 65535 which is the max number for the 16 bit type. After it
reaches that number, the logic to allocate a new number simply looks for
the next available number. When an dynamic event is removed, that number
is then available to be reused by the next dynamic event created. That is,
once the above reaches the max number, the number assigned to the event in
that loop will remain the same.

Now that means deleting one dynamic event and created another will reuse
the previous events type number. This is where bad things can happen.
After the above loop finishes, the kprobes/foo event which reads the
do_sys_openat2 function call's first parameter as an integer.

 # echo 1 > kprobes/foo/enable
 # cat /etc/passwd > /dev/null
 # cat trace
             cat-2211    [005] ....  2007.849603: foo: (do_sys_openat2+0x0/0x130) arg1=4294967196
             cat-2211    [005] ....  2007.849620: foo: (do_sys_openat2+0x0/0x130) arg1=4294967196
             cat-2211    [005] ....  2007.849838: foo: (do_sys_openat2+0x0/0x130) arg1=4294967196
             cat-2211    [005] ....  2007.849880: foo: (do_sys_openat2+0x0/0x130) arg1=4294967196
 # echo 0 > kprobes/foo/enable

Now if we delete the kprobe and create a new one that reads a string:

 # echo 'p:kprobes/foo do_sys_openat2 +0($arg2):string' > kprobe_events

And now we can the trace:

 # cat trace
        sendmail-1942    [002] .....   530.136320: foo: (do_sys_openat2+0x0/0x240) arg1=             cat-2046    [004] .....   530.930817: foo: (do_sys_openat2+0x0/0x240) arg1="������������������������������������������������������������������������������������������������"
             cat-2046    [004] .....   530.930961: foo: (do_sys_openat2+0x0/0x240) arg1="������������������������������������������������������������������������������������������������"
             cat-2046    [004] .....   530.934278: foo: (do_sys_openat2+0x0/0x240) arg1="������������������������������������������������������������������������������������������������"
             cat-2046    [004] .....   530.934563: foo: (do_sys_openat2+0x0/0x240) arg1="������������������������������������������������������������������������������������������������"
            bash-1515    [007] .....   534.299093: foo: (do_sys_openat2+0x0/0x240) arg1="kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk���������@��4Z����;Y�����U

And dmesg has:

==================================================================
BUG: KASAN: use-after-free in string+0xd4/0x1c0
Read of size 1 at addr ffff88805fdbbfa0 by task cat/2049

 CPU: 0 PID: 2049 Comm: cat Not tainted 6.1.0-rc6-test+ #641
 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
 Call Trace:
  <TASK>
  dump_stack_lvl+0x5b/0x77
  print_report+0x17f/0x47b
  kasan_report+0xad/0x130
  string+0xd4/0x1c0
  vsnprintf+0x500/0x840
  seq_buf_vprintf+0x62/0xc0
  trace_seq_printf+0x10e/0x1e0
  print_type_string+0x90/0xa0
  print_kprobe_event+0x16b/0x290
  print_trace_line+0x451/0x8e0
  s_show+0x72/0x1f0
  seq_read_iter+0x58e/0x750
  seq_read+0x115/0x160
  vfs_read+0x11d/0x460
  ksys_read+0xa9/0x130
  do_syscall_64+0x3a/0x90
  entry_SYSCALL_64_after_hwframe+0x63/0xcd
 RIP: 0033:0x7fc2e972ade2
 Code: c0 e9 b2 fe ff ff 50 48 8d 3d b2 3f 0a 00 e8 05 f0 01 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24
 RSP: 002b:00007ffc64e687c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
 RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007fc2e972ade2
 RDX: 0000000000020000 RSI: 00007fc2e980d000 RDI: 0000000000000003
 RBP: 00007fc2e980d000 R08: 00007fc2e980c010 R09: 0000000000000000
 R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000020f00
 R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000
  </TASK>

 The buggy address belongs to the physical page:
 page:ffffea00017f6ec0 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x5fdbb
 flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff)
 raw: 000fffffc0000000 0000000000000000 ffffea00017f6ec8 0000000000000000
 raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
 page dumped because: kasan: bad access detected

 Memory state around the buggy address:
  ffff88805fdbbe80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
  ffff88805fdbbf00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 >ffff88805fdbbf80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                                ^
  ffff88805fdbc000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
  ffff88805fdbc080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ==================================================================

This was found when Zheng Yejian sent a patch to convert the event type
number assignment to use IDA, which gives the next available number, and
this bug showed up in the fuzz testing by Yujie Liu and the kernel test
robot. But after further analysis, I found that this behavior is the same
as when the event type numbers go past the 16bit max (and the above shows
that).

As modules have a similar issue, but is dealt with by setting a
"WAS_ENABLED" flag when a module event is enabled, and when the module is
freed, if any of its events were enabled, the ring buffer that holds that
event is also cleared, to prevent reading stale events. The same can be
done for dynamic events.

If any dynamic event that is being removed was enabled, then make sure the
buffers they were enabled in are now cleared.

Link: https://lkml.kernel.org/r/20221123171434.545706e3@gandalf.local.home
Link: https://lore.kernel.org/all/20221110020319.1259291-1-zhengyejian1@huawei.com/

Cc: stable@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Depends-on: e18eb8783ec49 ("tracing: Add tracing_reset_all_online_cpus_unlocked() function")
Depends-on: 5448d44c38557 ("tracing: Add unified dynamic event framework")
Depends-on: 6212dd29683ee ("tracing/kprobes: Use dyn_event framework for kprobe events")
Depends-on: 065e63f951432 ("tracing: Only have rmmod clear buffers that its events were active in")
Depends-on: 575380da8b469 ("tracing: Only clear trace buffer on module unload if event was traced")
Fixes: 77b44d1b7c283 ("tracing/kprobes: Rename Kprobe-tracer to kprobe-event")
Reported-by: Zheng Yejian <zhengyejian1@huawei.com>
Reported-by: Yujie Liu <yujie.liu@intel.com>
Reported-by: kernel test robot <yujie.liu@intel.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-23 19:07:12 -05:00
Steven Rostedt (Google)
e18eb8783e tracing: Add tracing_reset_all_online_cpus_unlocked() function
Currently the tracing_reset_all_online_cpus() requires the
trace_types_lock held. But only one caller of this function actually has
that lock held before calling it, and the other just takes the lock so
that it can call it. More users of this function is needed where the lock
is not held.

Add a tracing_reset_all_online_cpus_unlocked() function for the one use
case that calls it without being held, and also add a lockdep_assert to
make sure it is held when called.

Then have tracing_reset_all_online_cpus() take the lock internally, such
that callers do not need to worry about taking it.

Link: https://lkml.kernel.org/r/20221123192741.658273220@goodmis.org

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-23 19:06:11 -05:00
Steven Rostedt (Google)
ef38c79a52 tracing: Fix race where histograms can be called before the event
commit 94eedf3dded5 ("tracing: Fix race where eprobes can be called before
the event") fixed an issue where if an event is soft disabled, and the
trigger is being added, there's a small window where the event sees that
there's a trigger but does not see that it requires reading the event yet,
and then calls the trigger with the record == NULL.

This could be solved with adding memory barriers in the hot path, or to
make sure that all the triggers requiring a record check for NULL. The
latter was chosen.

Commit 94eedf3dded5 set the eprobe trigger handle to check for NULL, but
the same needs to be done with histograms.

Link: https://lore.kernel.org/linux-trace-kernel/20221118211809.701d40c0f8a757b0df3c025a@kernel.org/
Link: https://lore.kernel.org/linux-trace-kernel/20221123164323.03450c3a@gandalf.local.home

Cc: Tom Zanussi <zanussi@kernel.org>
Cc: stable@vger.kernel.org
Fixes: 7491e2c442781 ("tracing: Add a probe that attaches to trace events")
Reported-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-23 19:05:50 -05:00
Steven Rostedt (Google)
d5f30a7da8 tracing: Fix race where eprobes can be called before the event
The flag that tells the event to call its triggers after reading the event
is set for eprobes after the eprobe is enabled. This leads to a race where
the eprobe may be triggered at the beginning of the event where the record
information is NULL. The eprobe then dereferences the NULL record causing
a NULL kernel pointer bug.

Test for a NULL record to keep this from happening.

Link: https://lore.kernel.org/linux-trace-kernel/20221116192552.1066630-1-rafaelmendsr@gmail.com/
Link: https://lore.kernel.org/all/20221117214249.2addbe10@gandalf.local.home/

Cc: stable@vger.kernel.org
Fixes: 7491e2c442781 ("tracing: Add a probe that attaches to trace events")
Reported-by: Rafael Mendonca <rafaelmendsr@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2022-11-24 00:42:18 +09:00
Daniel Bristot de Oliveira
022632f6c4 tracing/osnoise: Fix duration type
The duration type is a 64 long value, not an int. This was
causing some long noise to report wrong values.

Change the duration to a 64 bits value.

Link: https://lkml.kernel.org/r/a93d8a8378c7973e9c609de05826533c9e977939.1668692096.git.bristot@kernel.org

Cc: stable@vger.kernel.org
Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Fixes: bce29ac9ce0b ("trace: Add osnoise tracer")
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-22 18:12:01 -05:00
Xiu Jianfeng
ccc6e59007 tracing/user_events: Fix memory leak in user_event_create()
Before current_user_event_group(), it has allocated memory and save it
in @name, this should freed before return error.

Link: https://lkml.kernel.org/r/20221115014445.158419-1-xiujianfeng@huawei.com

Fixes: e5d271812e7a ("tracing/user_events: Move pages/locks into groups to prepare for namespaces")
Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-22 18:09:50 -05:00
Colin Ian King
0a068f4a71 tracing/hist: add in missing * in comment blocks
There are a couple of missing * in comment blocks. Fix these.
Cleans up two clang warnings:

kernel/trace/trace_events_hist.c:986: warning: bad line:
kernel/trace/trace_events_hist.c:3229: warning: bad line:

Link: https://lkml.kernel.org/r/20221020133019.1547587-1-colin.i.king@gmail.com

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-22 16:17:33 -05:00
Ingo Molnar
0ce096db71 Linux 6.1-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmN6wAgeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG0EYH/3/RO90NbrFItraN
 Lzr+d3VdbGjTu8xd1M+PRTmwh3zxLpB+Jwqr0T0A2gzL9B/D+AUPUJdrCVbv9DqS
 FLJAVqoeV20dNBAHSffOOLPsgCZ+Eu+LzlNN7Iqde0e8cyZICFMNktitui84Xm/i
 1NgFVgz9OZ6+aieYvUj3FrFq0p8GTIaC/oybDZrxYKcO8ZzKVMJ11swRw10wwq0g
 qOOECvV3w7wlQ8upQZkzFxItKFc7EexZI6R4elXeGSJJ9Hlc092dv/zsKB9dwV+k
 WcwkJrZRoezYXzgGBFxUcQtzi+ethjrPjuJuM1rYLUSIcfIW/0lkaSLgRoBu8D+I
 1GfXkXs=
 =gt6P
 -----END PGP SIGNATURE-----

Merge tag 'v6.1-rc6' into x86/core, to resolve conflicts

Resolve conflicts between these commits in arch/x86/kernel/asm-offsets.c:

 # upstream:
 debc5a1ec0d1 ("KVM: x86: use a separate asm-offsets.c file")

 # retbleed work in x86/core:
 5d8213864ade ("x86/retbleed: Add SKL return thunk")

... and these commits in include/linux/bpf.h:

  # upstram:
  18acb7fac22f ("bpf: Revert ("Fix dispatcher patchable function entry to 5 bytes nop")")

  # x86/core commits:
  931ab63664f0 ("x86/ibt: Implement FineIBT")
  bea75b33895f ("x86/Kconfig: Introduce function padding")

The latter two modify BPF_DISPATCHER_ATTRIBUTES(), which was removed upstream.

 Conflicts:
	arch/x86/kernel/asm-offsets.c
	include/linux/bpf.h

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2022-11-21 23:01:51 +01:00
Linus Torvalds
c6c67bf9bc tracing/probes: Fixes for v6.1
- Fix possible NULL pointer dereference  on trace_event_file in kprobe_event_gen_test_exit()
 
 - Fix NULL pointer dereference for trace_array in kprobe_event_gen_test_exit()
 
 - Fix memory leak of filter string for eprobes
 
 - Fix a possible memory leak in rethook_alloc()
 
 - Skip clearing aggrprobe's post_handler in kprobe-on-ftrace case
   which can cause a possible use-after-free
 
 - Fix warning in eprobe filter creation
 
 - Fix eprobe filter creation as it picked the wrong event for the fields
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCY3qRDhQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qgCdAP0cB1ZjmMM7O8OFdnrTV0jfavZTnNNC
 ut9sczpYQ0upcwEArmVvB+H3sTM6Y7PrsQEUn8gsc7WmieUoDAOr0hIe4AI=
 =DGVC
 -----END PGP SIGNATURE-----

Merge tag 'trace-probes-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing/probes fixes from Steven Rostedt:

 - Fix possible NULL pointer dereference on trace_event_file in
   kprobe_event_gen_test_exit()

 - Fix NULL pointer dereference for trace_array in
   kprobe_event_gen_test_exit()

 - Fix memory leak of filter string for eprobes

 - Fix a possible memory leak in rethook_alloc()

 - Skip clearing aggrprobe's post_handler in kprobe-on-ftrace case which
   can cause a possible use-after-free

 - Fix warning in eprobe filter creation

 - Fix eprobe filter creation as it picked the wrong event for the
   fields

* tag 'trace-probes-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing/eprobe: Fix eprobe filter to make a filter correctly
  tracing/eprobe: Fix warning in filter creation
  kprobes: Skip clearing aggrprobe's post_handler in kprobe-on-ftrace case
  rethook: fix a potential memleak in rethook_alloc()
  tracing/eprobe: Fix memory leak of filter string
  tracing: kprobe: Fix potential null-ptr-deref on trace_array in kprobe_event_gen_test_exit()
  tracing: kprobe: Fix potential null-ptr-deref on trace_event_file in kprobe_event_gen_test_exit()
2022-11-20 15:31:20 -08:00
Steven Rostedt (Google)
94eedf3dde tracing: Fix race where eprobes can be called before the event
The flag that tells the event to call its triggers after reading the event
is set for eprobes after the eprobe is enabled. This leads to a race where
the eprobe may be triggered at the beginning of the event where the record
information is NULL. The eprobe then dereferences the NULL record causing
a NULL kernel pointer bug.

Test for a NULL record to keep this from happening.

Link: https://lore.kernel.org/linux-trace-kernel/20221116192552.1066630-1-rafaelmendsr@gmail.com/
Link: https://lore.kernel.org/linux-trace-kernel/20221117214249.2addbe10@gandalf.local.home

Cc: Linux Trace Kernel <linux-trace-kernel@vger.kernel.org>
Cc: Tzvetomir Stoyanov <tz.stoyanov@gmail.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: stable@vger.kernel.org
Fixes: 7491e2c442781 ("tracing: Add a probe that attaches to trace events")
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reported-by: Rafael Mendonca <rafaelmendsr@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-20 14:05:50 -05:00
David Vernet
3f00c52393 bpf: Allow trusted pointers to be passed to KF_TRUSTED_ARGS kfuncs
Kfuncs currently support specifying the KF_TRUSTED_ARGS flag to signal
to the verifier that it should enforce that a BPF program passes it a
"safe", trusted pointer. Currently, "safe" means that the pointer is
either PTR_TO_CTX, or is refcounted. There may be cases, however, where
the kernel passes a BPF program a safe / trusted pointer to an object
that the BPF program wishes to use as a kptr, but because the object
does not yet have a ref_obj_id from the perspective of the verifier, the
program would be unable to pass it to a KF_ACQUIRE | KF_TRUSTED_ARGS
kfunc.

The solution is to expand the set of pointers that are considered
trusted according to KF_TRUSTED_ARGS, so that programs can invoke kfuncs
with these pointers without getting rejected by the verifier.

There is already a PTR_UNTRUSTED flag that is set in some scenarios,
such as when a BPF program reads a kptr directly from a map
without performing a bpf_kptr_xchg() call. These pointers of course can
and should be rejected by the verifier. Unfortunately, however,
PTR_UNTRUSTED does not cover all the cases for safety that need to
be addressed to adequately protect kfuncs. Specifically, pointers
obtained by a BPF program "walking" a struct are _not_ considered
PTR_UNTRUSTED according to BPF. For example, say that we were to add a
kfunc called bpf_task_acquire(), with KF_ACQUIRE | KF_TRUSTED_ARGS, to
acquire a struct task_struct *. If we only used PTR_UNTRUSTED to signal
that a task was unsafe to pass to a kfunc, the verifier would mistakenly
allow the following unsafe BPF program to be loaded:

SEC("tp_btf/task_newtask")
int BPF_PROG(unsafe_acquire_task,
             struct task_struct *task,
             u64 clone_flags)
{
        struct task_struct *acquired, *nested;

        nested = task->last_wakee;

        /* Would not be rejected by the verifier. */
        acquired = bpf_task_acquire(nested);
        if (!acquired)
                return 0;

        bpf_task_release(acquired);
        return 0;
}

To address this, this patch defines a new type flag called PTR_TRUSTED
which tracks whether a PTR_TO_BTF_ID pointer is safe to pass to a
KF_TRUSTED_ARGS kfunc or a BPF helper function. PTR_TRUSTED pointers are
passed directly from the kernel as a tracepoint or struct_ops callback
argument. Any nested pointer that is obtained from walking a PTR_TRUSTED
pointer is no longer PTR_TRUSTED. From the example above, the struct
task_struct *task argument is PTR_TRUSTED, but the 'nested' pointer
obtained from 'task->last_wakee' is not PTR_TRUSTED.

A subsequent patch will add kfuncs for storing a task kfunc as a kptr,
and then another patch will add selftests to validate.

Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20221120051004.3605026-3-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-11-20 09:16:21 -08:00
Mark Rutland
94d095ffa0 ftrace: abstract DYNAMIC_FTRACE_WITH_ARGS accesses
In subsequent patches we'll arrange for architectures to have an
ftrace_regs which is entirely distinct from pt_regs. In preparation for
this, we need to minimize the use of pt_regs to where strictly necessary
in the core ftrace code.

This patch adds new ftrace_regs_{get,set}_*() helpers which can be used
to manipulate ftrace_regs. When CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=y,
these can always be used on any ftrace_regs, and when
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=n these can be used when regs are
available. A new ftrace_regs_has_args(fregs) helper is added which code
can use to check when these are usable.

Co-developed-by: Florent Revest <revest@chromium.org>
Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20221103170520.931305-4-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-11-18 13:56:41 +00:00
Mark Rutland
9705bc7096 ftrace: pass fregs to arch_ftrace_set_direct_caller()
In subsequent patches we'll arrange for architectures to have an
ftrace_regs which is entirely distinct from pt_regs. In preparation for
this, we need to minimize the use of pt_regs to where strictly
necessary in the core ftrace code.

This patch changes the prototype of arch_ftrace_set_direct_caller() to
take ftrace_regs rather than pt_regs, and moves the extraction of the
pt_regs into arch_ftrace_set_direct_caller().

On x86, arch_ftrace_set_direct_caller() can be used even when
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=n, and <linux/ftrace.h> defines
struct ftrace_regs. Due to this, it's necessary to define
arch_ftrace_set_direct_caller() as a macro to avoid using an incomplete
type. I've also moved the body of arch_ftrace_set_direct_caller() after
the CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=y defineidion of struct
ftrace_regs.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20221103170520.931305-2-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-11-18 13:56:41 +00:00
Sathvika Vasireddy
280981d699 objtool: Add --mnop as an option to --mcount
Some architectures (powerpc) may not support ftrace locations being nop'ed
out at build time. Introduce CONFIG_HAVE_OBJTOOL_NOP_MCOUNT for objtool, as
a means for architectures to enable nop'ing of ftrace locations. Add --mnop
as an option to objtool --mcount, to indicate support for the same.

Also, make sure that --mnop can be passed as an option to objtool only when
--mcount is passed.

Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Sathvika Vasireddy <sv@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221114175754.1131267-12-sv@linux.ibm.com
2022-11-18 19:00:16 +11:00
Zheng Yejian
067df9e0ad tracing: Fix potential null-pointer-access of entry in list 'tr->err_log'
Entries in list 'tr->err_log' will be reused after entry number
exceed TRACING_LOG_ERRS_MAX.

The cmd string of the to be reused entry will be freed first then
allocated a new one. If the allocation failed, then the entry will
still be in list 'tr->err_log' but its 'cmd' field is set to be NULL,
later access of 'cmd' is risky.

Currently above problem can cause the loss of 'cmd' information of first
entry in 'tr->err_log'. When execute `cat /sys/kernel/tracing/error_log`,
reproduce logs like:
  [   37.495100] trace_kprobe: error: Maxactive is not for kprobe(null) ^
  [   38.412517] trace_kprobe: error: Maxactive is not for kprobe
    Command: p4:myprobe2 do_sys_openat2
            ^

Link: https://lore.kernel.org/linux-trace-kernel/20221114104632.3547266-1-zhengyejian1@huawei.com

Fixes: 1581a884b7ca ("tracing: Remove size restriction on tracing_log_err cmd strings")
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-17 20:41:01 -05:00
Qiujun Huang
b8752064e3 tracing: Remove unused __bad_type_size() method
__bad_type_size() is unused after
commit 04ae87a52074("ftrace: Rework event_create_dir()").
So, remove it.

Link: https://lkml.kernel.org/r/D062EC2E-7DB7-4402-A67E-33C3577F551E@gmail.com

Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Qiujun Huang <hqjagain@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-17 20:21:06 -05:00
Masami Hiramatsu (Google)
40adaf51cb tracing/eprobe: Fix eprobe filter to make a filter correctly
Since the eprobe filter was defined based on the eprobe's trace event
itself, it doesn't work correctly. Use the original trace event of
the eprobe when making the filter so that the filter works correctly.

Without this fix:

 # echo 'e syscalls/sys_enter_openat \
	flags_rename=$flags:u32 if flags < 1000' >> dynamic_events
 # echo 1 > events/eprobes/sys_enter_openat/enable
[  114.551550] event trace: Could not enable event sys_enter_openat
-bash: echo: write error: Invalid argument

With this fix:
 # echo 'e syscalls/sys_enter_openat \
	flags_rename=$flags:u32 if flags < 1000' >> dynamic_events
 # echo 1 > events/eprobes/sys_enter_openat/enable
 # tail trace
cat-241     [000] ...1.   266.498449: sys_enter_openat: (syscalls.sys_enter_openat) flags_rename=0
cat-242     [000] ...1.   266.977640: sys_enter_openat: (syscalls.sys_enter_openat) flags_rename=0

Link: https://lore.kernel.org/all/166823166395.1385292.8931770640212414483.stgit@devnote3/

Fixes: 752be5c5c910 ("tracing/eprobe: Add eprobe filter support")
Reported-by: Rafael Mendonca <rafaelmendsr@gmail.com>
Tested-by: Rafael Mendonca <rafaelmendsr@gmail.com>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2022-11-18 10:15:34 +09:00
Rafael Mendonca
342a4a2f99 tracing/eprobe: Fix warning in filter creation
The filter pointer (filterp) passed to create_filter() function must be a
pointer that references a NULL pointer, otherwise, we get a warning when
adding a filter option to the event probe:

root@localhost:/sys/kernel/tracing# echo 'e:egroup/stat_runtime_4core sched/sched_stat_runtime \
        runtime=$runtime:u32 if cpu < 4' >> dynamic_events
[ 5034.340439] ------------[ cut here ]------------
[ 5034.341258] WARNING: CPU: 0 PID: 223 at kernel/trace/trace_events_filter.c:1939 create_filter+0x1db/0x250
[...] stripped
[ 5034.345518] RIP: 0010:create_filter+0x1db/0x250
[...] stripped
[ 5034.351604] Call Trace:
[ 5034.351803]  <TASK>
[ 5034.351959]  ? process_preds+0x1b40/0x1b40
[ 5034.352241]  ? rcu_read_lock_bh_held+0xd0/0xd0
[ 5034.352604]  ? kasan_set_track+0x29/0x40
[ 5034.352904]  ? kasan_save_alloc_info+0x1f/0x30
[ 5034.353264]  create_event_filter+0x38/0x50
[ 5034.353573]  __trace_eprobe_create+0x16f4/0x1d20
[ 5034.353964]  ? eprobe_dyn_event_release+0x360/0x360
[ 5034.354363]  ? mark_held_locks+0xa6/0xf0
[ 5034.354684]  ? _raw_spin_unlock_irqrestore+0x35/0x60
[ 5034.355105]  ? trace_hardirqs_on+0x41/0x120
[ 5034.355417]  ? _raw_spin_unlock_irqrestore+0x35/0x60
[ 5034.355751]  ? __create_object+0x5b7/0xcf0
[ 5034.356027]  ? lock_is_held_type+0xaf/0x120
[ 5034.356362]  ? rcu_read_lock_bh_held+0xb0/0xd0
[ 5034.356716]  ? rcu_read_lock_bh_held+0xd0/0xd0
[ 5034.357084]  ? kasan_set_track+0x29/0x40
[ 5034.357411]  ? kasan_save_alloc_info+0x1f/0x30
[ 5034.357715]  ? __kasan_kmalloc+0xb8/0xc0
[ 5034.357985]  ? write_comp_data+0x2f/0x90
[ 5034.358302]  ? __sanitizer_cov_trace_pc+0x25/0x60
[ 5034.358691]  ? argv_split+0x381/0x460
[ 5034.358949]  ? write_comp_data+0x2f/0x90
[ 5034.359240]  ? eprobe_dyn_event_release+0x360/0x360
[ 5034.359620]  trace_probe_create+0xf6/0x110
[ 5034.359940]  ? trace_probe_match_command_args+0x240/0x240
[ 5034.360376]  eprobe_dyn_event_create+0x21/0x30
[ 5034.360709]  create_dyn_event+0xf3/0x1a0
[ 5034.360983]  trace_parse_run_command+0x1a9/0x2e0
[ 5034.361297]  ? dyn_event_release+0x500/0x500
[ 5034.361591]  dyn_event_write+0x39/0x50
[ 5034.361851]  vfs_write+0x311/0xe50
[ 5034.362091]  ? dyn_event_seq_next+0x40/0x40
[ 5034.362376]  ? kernel_write+0x5b0/0x5b0
[ 5034.362637]  ? write_comp_data+0x2f/0x90
[ 5034.362937]  ? __sanitizer_cov_trace_pc+0x25/0x60
[ 5034.363258]  ? ftrace_syscall_enter+0x544/0x840
[ 5034.363563]  ? write_comp_data+0x2f/0x90
[ 5034.363837]  ? __sanitizer_cov_trace_pc+0x25/0x60
[ 5034.364156]  ? write_comp_data+0x2f/0x90
[ 5034.364468]  ? write_comp_data+0x2f/0x90
[ 5034.364770]  ksys_write+0x158/0x2a0
[ 5034.365022]  ? __ia32_sys_read+0xc0/0xc0
[ 5034.365344]  __x64_sys_write+0x7c/0xc0
[ 5034.365669]  ? syscall_enter_from_user_mode+0x53/0x70
[ 5034.366084]  do_syscall_64+0x60/0x90
[ 5034.366356]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 5034.366767] RIP: 0033:0x7ff0b43938f3
[...] stripped
[ 5034.371892]  </TASK>
[ 5034.374720] ---[ end trace 0000000000000000 ]---

Link: https://lore.kernel.org/all/20221108202148.1020111-1-rafaelmendsr@gmail.com/

Fixes: 752be5c5c910 ("tracing/eprobe: Add eprobe filter support")
Signed-off-by: Rafael Mendonca <rafaelmendsr@gmail.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2022-11-18 10:15:34 +09:00
Yi Yang
0a1ebe35cb rethook: fix a potential memleak in rethook_alloc()
In rethook_alloc(), the variable rh is not freed or passed out
if handler is NULL, which could lead to a memleak, fix it.

Link: https://lore.kernel.org/all/20221110104438.88099-1-yiyang13@huawei.com/
[Masami: Add "rethook:" tag to the title.]

Fixes: 54ecbe6f1ed5 ("rethook: Add a generic return hook")
Cc: stable@vger.kernel.org
Signed-off-by: Yi Yang <yiyang13@huawei.com>
Acke-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2022-11-18 10:15:34 +09:00
Rafael Mendonca
d1776c0202 tracing/eprobe: Fix memory leak of filter string
The filter string doesn't get freed when a dynamic event is deleted. If a
filter is set, then memory is leaked:

root@localhost:/sys/kernel/tracing# echo 'e:egroup/stat_runtime_4core \
        sched/sched_stat_runtime runtime=$runtime:u32 if cpu < 4' >> dynamic_events
root@localhost:/sys/kernel/tracing# echo "-:egroup/stat_runtime_4core"  >> dynamic_events
root@localhost:/sys/kernel/tracing# echo scan > /sys/kernel/debug/kmemleak
[  224.416373] kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
root@localhost:/sys/kernel/tracing# cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff88810156f1b8 (size 8):
  comm "bash", pid 224, jiffies 4294935612 (age 55.800s)
  hex dump (first 8 bytes):
    63 70 75 20 3c 20 34 00                          cpu < 4.
  backtrace:
    [<000000009f880725>] __kmem_cache_alloc_node+0x18e/0x720
    [<0000000042492946>] __kmalloc+0x57/0x240
    [<0000000034ea7995>] __trace_eprobe_create+0x1214/0x1d30
    [<00000000d70ef730>] trace_probe_create+0xf6/0x110
    [<00000000915c7b16>] eprobe_dyn_event_create+0x21/0x30
    [<000000000d894386>] create_dyn_event+0xf3/0x1a0
    [<00000000e9af57d5>] trace_parse_run_command+0x1a9/0x2e0
    [<0000000080777f18>] dyn_event_write+0x39/0x50
    [<0000000089f0ec73>] vfs_write+0x311/0xe50
    [<000000003da1bdda>] ksys_write+0x158/0x2a0
    [<00000000bb1e616e>] __x64_sys_write+0x7c/0xc0
    [<00000000e8aef1f7>] do_syscall_64+0x60/0x90
    [<00000000fe7fe8ba>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

Additionally, in __trace_eprobe_create() function, if an error occurs after
the call to trace_eprobe_parse_filter(), which allocates the filter string,
then memory is also leaked. That can be reproduced by creating the same
event probe twice:

root@localhost:/sys/kernel/tracing# echo 'e:egroup/stat_runtime_4core \
        sched/sched_stat_runtime runtime=$runtime:u32 if cpu < 4' >> dynamic_events
root@localhost:/sys/kernel/tracing# echo 'e:egroup/stat_runtime_4core \
        sched/sched_stat_runtime runtime=$runtime:u32 if cpu < 4' >> dynamic_events
-bash: echo: write error: File exists
root@localhost:/sys/kernel/tracing# echo scan > /sys/kernel/debug/kmemleak
[  207.871584] kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
root@localhost:/sys/kernel/tracing# cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff8881020d17a8 (size 8):
  comm "bash", pid 223, jiffies 4294938308 (age 31.000s)
  hex dump (first 8 bytes):
    63 70 75 20 3c 20 34 00                          cpu < 4.
  backtrace:
    [<000000000e4f5f31>] __kmem_cache_alloc_node+0x18e/0x720
    [<0000000024f0534b>] __kmalloc+0x57/0x240
    [<000000002930a28e>] __trace_eprobe_create+0x1214/0x1d30
    [<0000000028387903>] trace_probe_create+0xf6/0x110
    [<00000000a80d6a9f>] eprobe_dyn_event_create+0x21/0x30
    [<000000007168698c>] create_dyn_event+0xf3/0x1a0
    [<00000000f036bf6a>] trace_parse_run_command+0x1a9/0x2e0
    [<00000000014bde8b>] dyn_event_write+0x39/0x50
    [<0000000078a097f7>] vfs_write+0x311/0xe50
    [<00000000996cb208>] ksys_write+0x158/0x2a0
    [<00000000a3c2acb0>] __x64_sys_write+0x7c/0xc0
    [<0000000006b5d698>] do_syscall_64+0x60/0x90
    [<00000000780e8ecf>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

Fix both issues by releasing the filter string in
trace_event_probe_cleanup().

Link: https://lore.kernel.org/all/20221108235738.1021467-1-rafaelmendsr@gmail.com/

Fixes: 752be5c5c910 ("tracing/eprobe: Add eprobe filter support")
Signed-off-by: Rafael Mendonca <rafaelmendsr@gmail.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2022-11-18 10:15:34 +09:00
Shang XiaoJing
22ea4ca963 tracing: kprobe: Fix potential null-ptr-deref on trace_array in kprobe_event_gen_test_exit()
When test_gen_kprobe_cmd() failed after kprobe_event_gen_cmd_end(), it
will goto delete, which will call kprobe_event_delete() and release the
corresponding resource. However, the trace_array in gen_kretprobe_test
will point to the invalid resource. Set gen_kretprobe_test to NULL
after called kprobe_event_delete() to prevent null-ptr-deref.

BUG: kernel NULL pointer dereference, address: 0000000000000070
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 246 Comm: modprobe Tainted: G        W
6.1.0-rc1-00174-g9522dc5c87da-dirty #248
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
RIP: 0010:__ftrace_set_clr_event_nolock+0x53/0x1b0
Code: e8 82 26 fc ff 49 8b 1e c7 44 24 0c ea ff ff ff 49 39 de 0f 84 3c
01 00 00 c7 44 24 18 00 00 00 00 e8 61 26 fc ff 48 8b 6b 10 <44> 8b 65
70 4c 8b 6d 18 41 f7 c4 00 02 00 00 75 2f
RSP: 0018:ffffc9000159fe00 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff88810971d268 RCX: 0000000000000000
RDX: ffff8881080be600 RSI: ffffffff811b48ff RDI: ffff88810971d058
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
R10: ffffc9000159fe58 R11: 0000000000000001 R12: ffffffffa0001064
R13: ffffffffa000106c R14: ffff88810971d238 R15: 0000000000000000
FS:  00007f89eeff6540(0000) GS:ffff88813b600000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000070 CR3: 000000010599e004 CR4: 0000000000330ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 __ftrace_set_clr_event+0x3e/0x60
 trace_array_set_clr_event+0x35/0x50
 ? 0xffffffffa0000000
 kprobe_event_gen_test_exit+0xcd/0x10b [kprobe_event_gen_test]
 __x64_sys_delete_module+0x206/0x380
 ? lockdep_hardirqs_on_prepare+0xd8/0x190
 ? syscall_enter_from_user_mode+0x1c/0x50
 do_syscall_64+0x3f/0x90
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f89eeb061b7

Link: https://lore.kernel.org/all/20221108015130.28326-3-shangxiaojing@huawei.com/

Fixes: 64836248dda2 ("tracing: Add kprobe event command generation test module")
Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com>
Cc: stable@vger.kernel.org
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2022-11-18 10:15:34 +09:00
Shang XiaoJing
e0d75267f5 tracing: kprobe: Fix potential null-ptr-deref on trace_event_file in kprobe_event_gen_test_exit()
When trace_get_event_file() failed, gen_kretprobe_test will be assigned
as the error code. If module kprobe_event_gen_test is removed now, the
null pointer dereference will happen in kprobe_event_gen_test_exit().
Check if gen_kprobe_test or gen_kretprobe_test is error code or NULL
before dereference them.

BUG: kernel NULL pointer dereference, address: 0000000000000012
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 3 PID: 2210 Comm: modprobe Not tainted
6.1.0-rc1-00171-g2159299a3b74-dirty #217
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
RIP: 0010:kprobe_event_gen_test_exit+0x1c/0xb5 [kprobe_event_gen_test]
Code: Unable to access opcode bytes at 0xffffffff9ffffff2.
RSP: 0018:ffffc900015bfeb8 EFLAGS: 00010246
RAX: ffffffffffffffea RBX: ffffffffa0002080 RCX: 0000000000000000
RDX: ffffffffa0001054 RSI: ffffffffa0001064 RDI: ffffffffdfc6349c
RBP: ffffffffa0000000 R08: 0000000000000004 R09: 00000000001e95c0
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000800
R13: ffffffffa0002420 R14: 0000000000000000 R15: 0000000000000000
FS:  00007f56b75be540(0000) GS:ffff88813bc00000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffff9ffffff2 CR3: 000000010874a006 CR4: 0000000000330ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 __x64_sys_delete_module+0x206/0x380
 ? lockdep_hardirqs_on_prepare+0xd8/0x190
 ? syscall_enter_from_user_mode+0x1c/0x50
 do_syscall_64+0x3f/0x90
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

Link: https://lore.kernel.org/all/20221108015130.28326-2-shangxiaojing@huawei.com/

Fixes: 64836248dda2 ("tracing: Add kprobe event command generation test module")
Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2022-11-18 10:15:33 +09:00
Shang XiaoJing
1b5f1c34d3 tracing: Fix wild-memory-access in register_synth_event()
In register_synth_event(), if set_synth_event_print_fmt() failed, then
both trace_remove_event_call() and unregister_trace_event() will be
called, which means the trace_event_call will call
__unregister_trace_event() twice. As the result, the second unregister
will causes the wild-memory-access.

register_synth_event
    set_synth_event_print_fmt failed
    trace_remove_event_call
        event_remove
            if call->event.funcs then
            __unregister_trace_event (first call)
    unregister_trace_event
        __unregister_trace_event (second call)

Fix the bug by avoiding to call the second __unregister_trace_event() by
checking if the first one is called.

general protection fault, probably for non-canonical address
	0xfbd59c0000000024: 0000 [#1] SMP KASAN PTI
KASAN: maybe wild-memory-access in range
[0xdead000000000120-0xdead000000000127]
CPU: 0 PID: 3807 Comm: modprobe Not tainted
6.1.0-rc1-00186-g76f33a7eedb4 #299
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
RIP: 0010:unregister_trace_event+0x6e/0x280
Code: 00 fc ff df 4c 89 ea 48 c1 ea 03 80 3c 02 00 0f 85 0e 02 00 00 48
b8 00 00 00 00 00 fc ff df 4c 8b 63 08 4c 89 e2 48 c1 ea 03 <80> 3c 02
00 0f 85 e2 01 00 00 49 89 2c 24 48 85 ed 74 28 e8 7a 9b
RSP: 0018:ffff88810413f370 EFLAGS: 00010a06
RAX: dffffc0000000000 RBX: ffff888105d050b0 RCX: 0000000000000000
RDX: 1bd5a00000000024 RSI: ffff888119e276e0 RDI: ffffffff835a8b20
RBP: dead000000000100 R08: 0000000000000000 R09: fffffbfff0913481
R10: ffffffff8489a407 R11: fffffbfff0913480 R12: dead000000000122
R13: ffff888105d050b8 R14: 0000000000000000 R15: ffff888105d05028
FS:  00007f7823e8d540(0000) GS:ffff888119e00000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f7823e7ebec CR3: 000000010a058002 CR4: 0000000000330ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 __create_synth_event+0x1e37/0x1eb0
 create_or_delete_synth_event+0x110/0x250
 synth_event_run_command+0x2f/0x110
 test_gen_synth_cmd+0x170/0x2eb [synth_event_gen_test]
 synth_event_gen_test_init+0x76/0x9bc [synth_event_gen_test]
 do_one_initcall+0xdb/0x480
 do_init_module+0x1cf/0x680
 load_module+0x6a50/0x70a0
 __do_sys_finit_module+0x12f/0x1c0
 do_syscall_64+0x3f/0x90
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

Link: https://lkml.kernel.org/r/20221117012346.22647-3-shangxiaojing@huawei.com

Fixes: 4b147936fa50 ("tracing: Add support for 'synthetic' events")
Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com>
Cc: stable@vger.kernel.org
Cc: <mhiramat@kernel.org>
Cc: <zanussi@kernel.org>
Cc: <fengguang.wu@intel.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-17 18:24:58 -05:00
Shang XiaoJing
a4527fef9a tracing: Fix memory leak in test_gen_synth_cmd() and test_empty_synth_event()
test_gen_synth_cmd() only free buf in fail path, hence buf will leak
when there is no failure. Add kfree(buf) to prevent the memleak. The
same reason and solution in test_empty_synth_event().

unreferenced object 0xffff8881127de000 (size 2048):
  comm "modprobe", pid 247, jiffies 4294972316 (age 78.756s)
  hex dump (first 32 bytes):
    20 67 65 6e 5f 73 79 6e 74 68 5f 74 65 73 74 20   gen_synth_test
    20 70 69 64 5f 74 20 6e 65 78 74 5f 70 69 64 5f   pid_t next_pid_
  backtrace:
    [<000000004254801a>] kmalloc_trace+0x26/0x100
    [<0000000039eb1cf5>] 0xffffffffa00083cd
    [<000000000e8c3bc8>] 0xffffffffa00086ba
    [<00000000c293d1ea>] do_one_initcall+0xdb/0x480
    [<00000000aa189e6d>] do_init_module+0x1cf/0x680
    [<00000000d513222b>] load_module+0x6a50/0x70a0
    [<000000001fd4d529>] __do_sys_finit_module+0x12f/0x1c0
    [<00000000b36c4c0f>] do_syscall_64+0x3f/0x90
    [<00000000bbf20cf3>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
unreferenced object 0xffff8881127df000 (size 2048):
  comm "modprobe", pid 247, jiffies 4294972324 (age 78.728s)
  hex dump (first 32 bytes):
    20 65 6d 70 74 79 5f 73 79 6e 74 68 5f 74 65 73   empty_synth_tes
    74 20 20 70 69 64 5f 74 20 6e 65 78 74 5f 70 69  t  pid_t next_pi
  backtrace:
    [<000000004254801a>] kmalloc_trace+0x26/0x100
    [<00000000d4db9a3d>] 0xffffffffa0008071
    [<00000000c31354a5>] 0xffffffffa00086ce
    [<00000000c293d1ea>] do_one_initcall+0xdb/0x480
    [<00000000aa189e6d>] do_init_module+0x1cf/0x680
    [<00000000d513222b>] load_module+0x6a50/0x70a0
    [<000000001fd4d529>] __do_sys_finit_module+0x12f/0x1c0
    [<00000000b36c4c0f>] do_syscall_64+0x3f/0x90
    [<00000000bbf20cf3>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

Link: https://lkml.kernel.org/r/20221117012346.22647-2-shangxiaojing@huawei.com

Cc: <mhiramat@kernel.org>
Cc: <zanussi@kernel.org>
Cc: <fengguang.wu@intel.com>
Cc: stable@vger.kernel.org
Fixes: 9fe41efaca08 ("tracing: Add synth event generation test module")
Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-17 17:51:38 -05:00
Xiu Jianfeng
19ba6c8af9 ftrace: Fix null pointer dereference in ftrace_add_mod()
The @ftrace_mod is allocated by kzalloc(), so both the members {prev,next}
of @ftrace_mode->list are NULL, it's not a valid state to call list_del().
If kstrdup() for @ftrace_mod->{func|module} fails, it goes to @out_free
tag and calls free_ftrace_mod() to destroy @ftrace_mod, then list_del()
will write prev->next and next->prev, where null pointer dereference
happens.

BUG: kernel NULL pointer dereference, address: 0000000000000008
Oops: 0002 [#1] PREEMPT SMP NOPTI
Call Trace:
 <TASK>
 ftrace_mod_callback+0x20d/0x220
 ? do_filp_open+0xd9/0x140
 ftrace_process_regex.isra.51+0xbf/0x130
 ftrace_regex_write.isra.52.part.53+0x6e/0x90
 vfs_write+0xee/0x3a0
 ? __audit_filter_op+0xb1/0x100
 ? auditd_test_task+0x38/0x50
 ksys_write+0xa5/0xe0
 do_syscall_64+0x3a/0x90
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
Kernel panic - not syncing: Fatal exception

So call INIT_LIST_HEAD() to initialize the list member to fix this issue.

Link: https://lkml.kernel.org/r/20221116015207.30858-1-xiujianfeng@huawei.com

Cc: stable@vger.kernel.org
Fixes: 673feb9d76ab ("ftrace: Add :mod: caching infrastructure to trace_array")
Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-17 17:16:44 -05:00
Daniil Tatianin
56f4ca0a79 ring_buffer: Do not deactivate non-existant pages
rb_head_page_deactivate() expects cpu_buffer to contain a valid list of
->pages, so verify that the list is actually present before calling it.

Found by Linux Verification Center (linuxtesting.org) with the SVACE
static analysis tool.

Link: https://lkml.kernel.org/r/20221114143129.3534443-1-d-tatianin@yandex-team.ru

Cc: stable@vger.kernel.org
Fixes: 77ae365eca895 ("ring-buffer: make lockless")
Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-17 17:10:40 -05:00
Wang Wensheng
bcea02b096 ftrace: Optimize the allocation for mcount entries
If we can't allocate this size, try something smaller with half of the
size. Its order should be decreased by one instead of divided by two.

Link: https://lkml.kernel.org/r/20221109094434.84046-3-wangwensheng4@huawei.com

Cc: <mhiramat@kernel.org>
Cc: <mark.rutland@arm.com>
Cc: stable@vger.kernel.org
Fixes: a79008755497d ("ftrace: Allocate the mcount record pages as groups")
Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-17 15:42:48 -05:00
Wang Wensheng
08948caebe ftrace: Fix the possible incorrect kernel message
If the number of mcount entries is an integer multiple of
ENTRIES_PER_PAGE, the page count showing on the console would be wrong.

Link: https://lkml.kernel.org/r/20221109094434.84046-2-wangwensheng4@huawei.com

Cc: <mhiramat@kernel.org>
Cc: <mark.rutland@arm.com>
Cc: stable@vger.kernel.org
Fixes: 5821e1b74f0d0 ("function tracing: fix wrong pos computing when read buffer has been fulfilled")
Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-17 15:41:31 -05:00
Wang Yufen
649e72070c tracing: Fix memory leak in tracing_read_pipe()
kmemleak reports this issue:

unreferenced object 0xffff888105a18900 (size 128):
  comm "test_progs", pid 18933, jiffies 4336275356 (age 22801.766s)
  hex dump (first 32 bytes):
    25 73 00 90 81 88 ff ff 26 05 00 00 42 01 58 04  %s......&...B.X.
    03 00 00 00 02 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<00000000560143a1>] __kmalloc_node_track_caller+0x4a/0x140
    [<000000006af00822>] krealloc+0x8d/0xf0
    [<00000000c309be6a>] trace_iter_expand_format+0x99/0x150
    [<000000005a53bdb6>] trace_check_vprintf+0x1e0/0x11d0
    [<0000000065629d9d>] trace_event_printf+0xb6/0xf0
    [<000000009a690dc7>] trace_raw_output_bpf_trace_printk+0x89/0xc0
    [<00000000d22db172>] print_trace_line+0x73c/0x1480
    [<00000000cdba76ba>] tracing_read_pipe+0x45c/0x9f0
    [<0000000015b58459>] vfs_read+0x17b/0x7c0
    [<000000004aeee8ed>] ksys_read+0xed/0x1c0
    [<0000000063d3d898>] do_syscall_64+0x3b/0x90
    [<00000000a06dda7f>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

iter->fmt alloced in
  tracing_read_pipe() -> .. ->trace_iter_expand_format(), but not
freed, to fix, add free in tracing_release_pipe()

Link: https://lkml.kernel.org/r/1667819090-4643-1-git-send-email-wangyufen@huawei.com

Cc: stable@vger.kernel.org
Fixes: efbbdaa22bb7 ("tracing: Show real address for trace event arguments")
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-16 20:29:00 -05:00
Steven Rostedt (Google)
31029a8b2c ring-buffer: Include dropped pages in counting dirty patches
The function ring_buffer_nr_dirty_pages() was created to find out how many
pages are filled in the ring buffer. There's two running counters. One is
incremented whenever a new page is touched (pages_touched) and the other
is whenever a page is read (pages_read). The dirty count is the number
touched minus the number read. This is used to determine if a blocked task
should be woken up if the percentage of the ring buffer it is waiting for
is hit.

The problem is that it does not take into account dropped pages (when the
new writes overwrite pages that were not read). And then the dirty pages
will always be greater than the percentage.

This makes the "buffer_percent" file inaccurate, as the number of dirty
pages end up always being larger than the percentage, event when it's not
and this causes user space to be woken up more than it wants to be.

Add a new counter to keep track of lost pages, and include that in the
accounting of dirty pages so that it is actually accurate.

Link: https://lkml.kernel.org/r/20221021123013.55fb6055@gandalf.local.home

Fixes: 2c2b0a78b3739 ("ring-buffer: Add percentage of ring buffer full to wake up reader")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-16 18:08:29 -05:00
Steven Rostedt (Google)
42fb0a1e84 tracing/ring-buffer: Have polling block on watermark
Currently the way polling works on the ring buffer is broken. It will
return immediately if there's any data in the ring buffer whereas a read
will block until the watermark (defined by the tracefs buffer_percent file)
is hit.

That is, a select() or poll() will return as if there's data available,
but then the following read will block. This is broken for the way
select()s and poll()s are supposed to work.

Have the polling on the ring buffer also block the same way reads and
splice does on the ring buffer.

Link: https://lkml.kernel.org/r/20221020231427.41be3f26@gandalf.local.home

Cc: Linux Trace Kernel <linux-trace-kernel@vger.kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Primiano Tucci <primiano@google.com>
Cc: stable@vger.kernel.org
Fixes: 1e0d6714aceb7 ("ring-buffer: Do not wake up a splice waiter when page is not full")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-16 17:13:04 -05:00
Jakub Kicinski
966a9b4903 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/net/can/pch_can.c
  ae64438be192 ("can: dev: fix skb drop check")
  1dd1b521be85 ("can: remove obsolete PCH CAN driver")
https://lore.kernel.org/all/20221110102509.1f7d63cc@canb.auug.org.au/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-10 17:43:53 -08:00
Shang XiaoJing
66f0919c95 tracing: kprobe: Fix memory leak in test_gen_kprobe/kretprobe_cmd()
test_gen_kprobe_cmd() only free buf in fail path, hence buf will leak
when there is no failure. Move kfree(buf) from fail path to common path
to prevent the memleak. The same reason and solution in
test_gen_kretprobe_cmd().

unreferenced object 0xffff888143b14000 (size 2048):
  comm "insmod", pid 52490, jiffies 4301890980 (age 40.553s)
  hex dump (first 32 bytes):
    70 3a 6b 70 72 6f 62 65 73 2f 67 65 6e 5f 6b 70  p:kprobes/gen_kp
    72 6f 62 65 5f 74 65 73 74 20 64 6f 5f 73 79 73  robe_test do_sys
  backtrace:
    [<000000006d7b836b>] kmalloc_trace+0x27/0xa0
    [<0000000009528b5b>] 0xffffffffa059006f
    [<000000008408b580>] do_one_initcall+0x87/0x2a0
    [<00000000c4980a7e>] do_init_module+0xdf/0x320
    [<00000000d775aad0>] load_module+0x3006/0x3390
    [<00000000e9a74b80>] __do_sys_finit_module+0x113/0x1b0
    [<000000003726480d>] do_syscall_64+0x35/0x80
    [<000000003441e93b>] entry_SYSCALL_64_after_hwframe+0x46/0xb0

Link: https://lore.kernel.org/all/20221102072954.26555-1-shangxiaojing@huawei.com/

Fixes: 64836248dda2 ("tracing: Add kprobe event command generation test module")
Cc: stable@vger.kernel.org
Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2022-11-04 09:10:03 +09:00
Masami Hiramatsu (Google)
61b304b73a tracing/fprobe: Fix to check whether fprobe is registered correctly
Since commit ab51e15d535e ("fprobe: Introduce FPROBE_FL_KPROBE_SHARED flag
for fprobe") introduced fprobe_kprobe_handler() for fprobe::ops::func,
unregister_fprobe() fails to unregister the registered if user specifies
FPROBE_FL_KPROBE_SHARED flag.
Moreover, __register_ftrace_function() is possible to change the
ftrace_ops::func, thus we have to check fprobe::ops::saved_func instead.

To check it correctly, it should confirm the fprobe::ops::saved_func is
either fprobe_handler() or fprobe_kprobe_handler().

Link: https://lore.kernel.org/all/166677683946.1459107.15997653945538644683.stgit@devnote3/

Fixes: cad9931f64dc ("fprobe: Add ftrace based probe APIs")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2022-11-04 08:50:07 +09:00
Rafael Mendonca
d05ea35e7e fprobe: Check rethook_alloc() return in rethook initialization
Check if fp->rethook succeeded to be allocated. Otherwise, if
rethook_alloc() fails, then we end up dereferencing a NULL pointer in
rethook_add_node().

Link: https://lore.kernel.org/all/20221025031209.954836-1-rafaelmendsr@gmail.com/

Fixes: 5b0ab78998e3 ("fprobe: Add exit_handler support")
Cc: stable@vger.kernel.org
Signed-off-by: Rafael Mendonca <rafaelmendsr@gmail.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2022-11-04 08:50:00 +09:00
Li Huafei
0e792b89e6 ftrace: Fix use-after-free for dynamic ftrace_ops
KASAN reported a use-after-free with ftrace ops [1]. It was found from
vmcore that perf had registered two ops with the same content
successively, both dynamic. After unregistering the second ops, a
use-after-free occurred.

In ftrace_shutdown(), when the second ops is unregistered, the
FTRACE_UPDATE_CALLS command is not set because there is another enabled
ops with the same content.  Also, both ops are dynamic and the ftrace
callback function is ftrace_ops_list_func, so the
FTRACE_UPDATE_TRACE_FUNC command will not be set. Eventually the value
of 'command' will be 0 and ftrace_shutdown() will skip the rcu
synchronization.

However, ftrace may be activated. When the ops is released, another CPU
may be accessing the ops.  Add the missing synchronization to fix this
problem.

[1]
BUG: KASAN: use-after-free in __ftrace_ops_list_func kernel/trace/ftrace.c:7020 [inline]
BUG: KASAN: use-after-free in ftrace_ops_list_func+0x2b0/0x31c kernel/trace/ftrace.c:7049
Read of size 8 at addr ffff56551965bbc8 by task syz-executor.2/14468

CPU: 1 PID: 14468 Comm: syz-executor.2 Not tainted 5.10.0 #7
Hardware name: linux,dummy-virt (DT)
Call trace:
 dump_backtrace+0x0/0x40c arch/arm64/kernel/stacktrace.c:132
 show_stack+0x30/0x40 arch/arm64/kernel/stacktrace.c:196
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1b4/0x248 lib/dump_stack.c:118
 print_address_description.constprop.0+0x28/0x48c mm/kasan/report.c:387
 __kasan_report mm/kasan/report.c:547 [inline]
 kasan_report+0x118/0x210 mm/kasan/report.c:564
 check_memory_region_inline mm/kasan/generic.c:187 [inline]
 __asan_load8+0x98/0xc0 mm/kasan/generic.c:253
 __ftrace_ops_list_func kernel/trace/ftrace.c:7020 [inline]
 ftrace_ops_list_func+0x2b0/0x31c kernel/trace/ftrace.c:7049
 ftrace_graph_call+0x0/0x4
 __might_sleep+0x8/0x100 include/linux/perf_event.h:1170
 __might_fault mm/memory.c:5183 [inline]
 __might_fault+0x58/0x70 mm/memory.c:5171
 do_strncpy_from_user lib/strncpy_from_user.c:41 [inline]
 strncpy_from_user+0x1f4/0x4b0 lib/strncpy_from_user.c:139
 getname_flags+0xb0/0x31c fs/namei.c:149
 getname+0x2c/0x40 fs/namei.c:209
 [...]

Allocated by task 14445:
 kasan_save_stack+0x24/0x50 mm/kasan/common.c:48
 kasan_set_track mm/kasan/common.c:56 [inline]
 __kasan_kmalloc mm/kasan/common.c:479 [inline]
 __kasan_kmalloc.constprop.0+0x110/0x13c mm/kasan/common.c:449
 kasan_kmalloc+0xc/0x14 mm/kasan/common.c:493
 kmem_cache_alloc_trace+0x440/0x924 mm/slub.c:2950
 kmalloc include/linux/slab.h:563 [inline]
 kzalloc include/linux/slab.h:675 [inline]
 perf_event_alloc.part.0+0xb4/0x1350 kernel/events/core.c:11230
 perf_event_alloc kernel/events/core.c:11733 [inline]
 __do_sys_perf_event_open kernel/events/core.c:11831 [inline]
 __se_sys_perf_event_open+0x550/0x15f4 kernel/events/core.c:11723
 __arm64_sys_perf_event_open+0x6c/0x80 kernel/events/core.c:11723
 [...]

Freed by task 14445:
 kasan_save_stack+0x24/0x50 mm/kasan/common.c:48
 kasan_set_track+0x24/0x34 mm/kasan/common.c:56
 kasan_set_free_info+0x20/0x40 mm/kasan/generic.c:358
 __kasan_slab_free.part.0+0x11c/0x1b0 mm/kasan/common.c:437
 __kasan_slab_free mm/kasan/common.c:445 [inline]
 kasan_slab_free+0x2c/0x40 mm/kasan/common.c:446
 slab_free_hook mm/slub.c:1569 [inline]
 slab_free_freelist_hook mm/slub.c:1608 [inline]
 slab_free mm/slub.c:3179 [inline]
 kfree+0x12c/0xc10 mm/slub.c:4176
 perf_event_alloc.part.0+0xa0c/0x1350 kernel/events/core.c:11434
 perf_event_alloc kernel/events/core.c:11733 [inline]
 __do_sys_perf_event_open kernel/events/core.c:11831 [inline]
 __se_sys_perf_event_open+0x550/0x15f4 kernel/events/core.c:11723
 [...]

Link: https://lore.kernel.org/linux-trace-kernel/20221103031010.166498-1-lihuafei1@huawei.com

Fixes: edb096e00724f ("ftrace: Fix memleak when unregistering dynamic ops when tracing disabled")
Cc: stable@vger.kernel.org
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Li Huafei <lihuafei1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-02 23:53:22 -04:00
Steven Rostedt (Google)
7433632c9f ring-buffer: Check for NULL cpu_buffer in ring_buffer_wake_waiters()
On some machines the number of listed CPUs may be bigger than the actual
CPUs that exist. The tracing subsystem allocates a per_cpu directory with
access to the per CPU ring buffer via a cpuX file. But to save space, the
ring buffer will only allocate buffers for online CPUs, even though the
CPU array will be as big as the nr_cpu_ids.

With the addition of waking waiters on the ring buffer when closing the
file, the ring_buffer_wake_waiters() now needs to make sure that the
buffer is allocated (with the irq_work allocated with it) before trying to
wake waiters, as it will cause a NULL pointer dereference.

While debugging this, I added a NULL check for the buffer itself (which is
OK to do), and also NULL pointer checks against buffer->buffers (which is
not fine, and will WARN) as well as making sure the CPU number passed in
is within the nr_cpu_ids (which is also not fine if it isn't).

Link: https://lore.kernel.org/all/87h6zklb6n.wl-tiwai@suse.de/
Link: https://lore.kernel.org/all/CAM6Wdxc0KRJMXVAA0Y=u6Jh2V=uWB-_Fn6M4xRuNppfXzL1mUg@mail.gmail.com/
Link: https://lkml.kernel.org/linux-trace-kernel/20221101191009.1e7378c8@rorschach.local.home

Cc: stable@vger.kernel.org
Cc: Steven Noonan <steven.noonan@gmail.com>
Bugzilla: https://bugzilla.opensuse.org/show_bug.cgi?id=1204705
Reported-by: Takashi Iwai <tiwai@suse.de>
Reported-by: Roland Ruckerbauer <roland.rucky@gmail.com>
Fixes: f3ddb74ad079 ("tracing: Wake up ring buffer waiters on closing of the file")
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-02 23:52:03 -04:00
Jakub Kicinski
b54a0d4094 bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCY2GuKgAKCRDbK58LschI
 gy32AP9PI0e/bUGDExKJ8g97PeeEtnpj4TTI6g+XKILtYnyXlgD/Rk4j2D/f3IBF
 Ha9TmqYvAUim+U/g50vUrNuoNLNJ5w8=
 =OKC1
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
bpf-next 2022-11-02

We've added 70 non-merge commits during the last 14 day(s) which contain
a total of 96 files changed, 3203 insertions(+), 640 deletions(-).

The main changes are:

1) Make cgroup local storage available to non-cgroup attached BPF programs
   such as tc BPF ones, from Yonghong Song.

2) Avoid unnecessary deadlock detection and failures wrt BPF task storage
   helpers, from Martin KaFai Lau.

3) Add LLVM disassembler as default library for dumping JITed code
   in bpftool, from Quentin Monnet.

4) Various kprobe_multi_link fixes related to kernel modules,
   from Jiri Olsa.

5) Optimize x86-64 JIT with emitting BMI2-based shift instructions,
   from Jie Meng.

6) Improve BPF verifier's memory type compatibility for map key/value
   arguments, from Dave Marchevsky.

7) Only create mmap-able data section maps in libbpf when data is exposed
   via skeletons, from Andrii Nakryiko.

8) Add an autoattach option for bpftool to load all object assets,
   from Wang Yufen.

9) Various memory handling fixes for libbpf and BPF selftests,
   from Xu Kuohai.

10) Initial support for BPF selftest's vmtest.sh on arm64,
    from Manu Bretelle.

11) Improve libbpf's BTF handling to dedup identical structs,
    from Alan Maguire.

12) Add BPF CI and denylist documentation for BPF selftests,
    from Daniel Müller.

13) Check BPF cpumap max_entries before doing allocation work,
    from Florian Lehner.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (70 commits)
  samples/bpf: Fix typo in README
  bpf: Remove the obsolte u64_stats_fetch_*_irq() users.
  bpf: check max_entries before allocating memory
  bpf: Fix a typo in comment for DFS algorithm
  bpftool: Fix spelling mistake "disasembler" -> "disassembler"
  selftests/bpf: Fix bpftool synctypes checking failure
  selftests/bpf: Panic on hard/soft lockup
  docs/bpf: Add documentation for new cgroup local storage
  selftests/bpf: Add test cgrp_local_storage to DENYLIST.s390x
  selftests/bpf: Add selftests for new cgroup local storage
  selftests/bpf: Fix test test_libbpf_str/bpf_map_type_str
  bpftool: Support new cgroup local storage
  libbpf: Support new cgroup local storage
  bpf: Implement cgroup storage available to non-cgroup-attached bpf progs
  bpf: Refactor some inode/task/sk storage functions for reuse
  bpf: Make struct cgroup btf id global
  selftests/bpf: Tracing prog can still do lookup under busy lock
  selftests/bpf: Ensure no task storage failure for bpf_lsm.s prog due to deadlock detection
  bpf: Add new bpf_task_storage_delete proto with no deadlock detection
  bpf: bpf_task_storage_delete_recur does lookup first before the deadlock check
  ...
====================

Link: https://lore.kernel.org/r/20221102062120.5724-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-02 08:18:27 -07:00
Yonghong Song
c4bcfb38a9 bpf: Implement cgroup storage available to non-cgroup-attached bpf progs
Similar to sk/inode/task storage, implement similar cgroup local storage.

There already exists a local storage implementation for cgroup-attached
bpf programs.  See map type BPF_MAP_TYPE_CGROUP_STORAGE and helper
bpf_get_local_storage(). But there are use cases such that non-cgroup
attached bpf progs wants to access cgroup local storage data. For example,
tc egress prog has access to sk and cgroup. It is possible to use
sk local storage to emulate cgroup local storage by storing data in socket.
But this is a waste as it could be lots of sockets belonging to a particular
cgroup. Alternatively, a separate map can be created with cgroup id as the key.
But this will introduce additional overhead to manipulate the new map.
A cgroup local storage, similar to existing sk/inode/task storage,
should help for this use case.

The life-cycle of storage is managed with the life-cycle of the
cgroup struct.  i.e. the storage is destroyed along with the owning cgroup
with a call to bpf_cgrp_storage_free() when cgroup itself
is deleted.

The userspace map operations can be done by using a cgroup fd as a key
passed to the lookup, update and delete operations.

Typically, the following code is used to get the current cgroup:
    struct task_struct *task = bpf_get_current_task_btf();
    ... task->cgroups->dfl_cgrp ...
and in structure task_struct definition:
    struct task_struct {
        ....
        struct css_set __rcu            *cgroups;
        ....
    }
With sleepable program, accessing task->cgroups is not protected by rcu_read_lock.
So the current implementation only supports non-sleepable program and supporting
sleepable program will be the next step together with adding rcu_read_lock
protection for rcu tagged structures.

Since map name BPF_MAP_TYPE_CGROUP_STORAGE has been used for old cgroup local
storage support, the new map name BPF_MAP_TYPE_CGRP_STORAGE is used
for cgroup storage available to non-cgroup-attached bpf programs. The old
cgroup storage supports bpf_get_local_storage() helper to get the cgroup data.
The new cgroup storage helper bpf_cgrp_storage_get() can provide similar
functionality. While old cgroup storage pre-allocates storage memory, the new
mechanism can also pre-allocate with a user space bpf_map_update_elem() call
to avoid potential run-time memory allocation failure.
Therefore, the new cgroup storage can provide all functionality w.r.t.
the old one. So in uapi bpf.h, the old BPF_MAP_TYPE_CGROUP_STORAGE is alias to
BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED to indicate the old cgroup storage can
be deprecated since the new one can provide the same functionality.

Acked-by: David Vernet <void@manifault.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221026042850.673791-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-10-25 23:19:19 -07:00
Martin KaFai Lau
8a7dac37f2 bpf: Add new bpf_task_storage_delete proto with no deadlock detection
The bpf_lsm and bpf_iter do not recur that will cause a deadlock.
The situation is similar to the bpf_pid_task_storage_delete_elem()
which is called from the syscall map_delete_elem.  It does not need
deadlock detection.  Otherwise, it will cause unnecessary failure
when calling the bpf_task_storage_delete() helper.

This patch adds bpf_task_storage_delete proto that does not do deadlock
detection.  It will be used by bpf_lsm and bpf_iter program.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20221025184524.3526117-8-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-10-25 23:11:46 -07:00
Martin KaFai Lau
4279adb094 bpf: Add new bpf_task_storage_get proto with no deadlock detection
The bpf_lsm and bpf_iter do not recur that will cause a deadlock.
The situation is similar to the bpf_pid_task_storage_lookup_elem()
which is called from the syscall map_lookup_elem.  It does not need
deadlock detection.  Otherwise, it will cause unnecessary failure
when calling the bpf_task_storage_get() helper.

This patch adds bpf_task_storage_get proto that does not do deadlock
detection.  It will be used by bpf_lsm and bpf_iter programs.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20221025184524.3526117-6-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-10-25 23:11:46 -07:00
Martin KaFai Lau
0593dd34e5 bpf: Append _recur naming to the bpf_task_storage helper proto
This patch adds the "_recur" naming to the bpf_task_storage_{get,delete}
proto.  In a latter patch, they will only be used by the tracing
programs that requires a deadlock detection because a tracing
prog may use bpf_task_storage_{get,delete} recursively and cause a
deadlock.

Another following patch will add a different helper proto for the non
tracing programs because they do not need the deadlock prevention.
This patch does this rename to prepare for this future proto
additions.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20221025184524.3526117-3-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-10-25 23:11:46 -07:00
Jiri Olsa
e22061b2d3 bpf: Take module reference on kprobe_multi link
Currently we allow to create kprobe multi link on function from kernel
module, but we don't take the module reference to ensure it's not
unloaded while we are tracing it.

The multi kprobe link is based on fprobe/ftrace layer which takes
different approach and releases ftrace hooks when module is unloaded
even if there's tracer registered on top of it.

Adding code that gathers all the related modules for the link and takes
their references before it's attached. All kernel module references are
released after link is unregistered.

Note that we do it the same way already for trampoline probes
(but for single address).

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20221025134148.3300700-5-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-10-25 10:14:51 -07:00