bb2e04d449
15446 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Kaige Ye
|
58a8d2edd5 |
perf stat-display: Check if snprintf()'s fmt argument is NULL
It is undefined behavior to pass NULL as snprintf()'s fmt argument. Here is an example to trigger the problem: $ perf stat --metric-only -x, -e instructions -- sleep 1 insn per cycle, Segmentation fault (core dumped) With this patch: $ perf stat --metric-only -x, -e instructions -- sleep 1 insn per cycle, , Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Kaige Ye <ye@kaige.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/01CA7674B690CA24+20230804020907.144562-2-ye@kaige.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Arnaldo Carvalho de Melo
|
7d9642311b |
perf bpf augmented_raw_syscalls: Add an assert to make sure sizeof(augmented_arg->value) is a power of two.
Similar to what was done in the previous cset for sizeof(saddr), we need to make sure sizeof(augmented_arg->value) is a power of two to do bounds checking using &=: augmented_len &= sizeof(augmented_arg->value) - 1; Suggested-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZONrPo0NSqdbXiGx@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Arnaldo Carvalho de Melo
|
262b54b6c9 |
perf bpf augmented_raw_syscalls: Add an assert to make sure sizeof(saddr) is a power of two.
We're using the BPF verifier suggestion: 22: (85) call bpf_probe_read#4 R2 min value is negative, either use unsigned or 'var &= const' That works only when const is a (power of two - 1) so add an assert to make sure that that is the case. Suggested-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZONrFmJBNlQpSpZj@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ilkka Koskinen
|
a50b8db3ea |
perf vendor events arm64: AmpereOne: Remove unsupported events
Some of the events included in the ampereone/core-imp-def are not supported on AmpereOne, remove them. Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Dave Kleikamp <dave.kleikamp@oracle.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230803211331.140553-5-ilkka@os.amperecomputing.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ilkka Koskinen
|
705ed54914 |
perf vendor events arm64: Add AmpereOne metrics
This patch adds AmpereOne metrics. The metrics also work around the issue related to some of the events. Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Dave Kleikamp <dave.kleikamp@oracle.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230803211331.140553-4-ilkka@os.amperecomputing.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ilkka Koskinen
|
47715f2b62 |
perf vendor events arm64: AmpereOne: Mark affected STALL_* events impacted by errata
Per errata AC03_CPU_29, STALL_SLOT_FRONTEND, STALL_FRONTEND, and STALL events are not counting as expected. The follow up metrics patch will include correct way to calculate the impacted events. Reviewed-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Dave Kleikamp <dave.kleikamp@oracle.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230803211331.140553-3-ilkka@os.amperecomputing.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ilkka Koskinen
|
b8af10062d |
perf vendor events arm64: Remove L1D_CACHE_LMISS from AmpereOne list
amperene/cache.json file tried to include L1D_CACHE_LMISS while it
doesn't exist in common-and-microarch.json. While this bug doesn't seem to
cause issue in newer kernels with jevents.py script, it prevents building
older perf tools with the backported patch.
Fixes:
|
||
John Garry
|
7298e87607 |
perf jevents: Raise exception for no definition of a arch std event
Recently Ilkka reported that the JSONs for the AmpereOne arm64-based platform included a dud event which referenced a non-existent arch std event [0]. Previously in the times of jevents.c, we would raise an exception for this. This is still invalid, even though the current code just ignores such an event. Re-introduce code to raise an exception for when no definition exists to help catch as many invalid JSONs as possible. [0] https://lore.kernel.org/linux-perf-users/9e851e2a-26c7-ba78-cb20-be4337b2916a@oracle.com/ Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: John Garry <john.g.garry@oracle.com> Tested-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Link: https://lore.kernel.org/r/20230807111631.3033102-1-john.g.garry@oracle.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Arnaldo Carvalho de Melo
|
64917f4df0 |
perf trace: Use heuristic when deciding if a syscall tracepoint "const char *" field is really a string
'perf trace' tries to find BPF progs associated with a syscall that have a signature that is similar to syscalls without one to try and reuse, so, for instance, the 'open' signature can be reused with many other syscalls that have as its first arg a string. It uses the tracefs events format file for finding a signature that can be reused, but then comes the "write" syscall with its second argument as a "const char *": # cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_write/format name: sys_enter_write ID: 746 format: field:unsigned short common_type; offset:0; size:2; signed:0; field:unsigned char common_flags; offset:2; size:1; signed:0; field:unsigned char common_preempt_count; offset:3; size:1; signed:0; field:int common_pid; offset:4; size:4; signed:1; field:int __syscall_nr; offset:8; size:4; signed:1; field:unsigned int fd; offset:16; size:8; signed:0; field:const char * buf; offset:24; size:8; signed:0; field:size_t count; offset:32; size:8; signed:0; print fmt: "fd: 0x%08lx, buf: 0x%08lx, count: 0x%08lx", ((unsigned long)(REC->fd)), ((unsigned long)(REC->buf)), ((unsigned long)(REC->count)) # Which isn't a string (the man page for glibc has buf as "void *"), so we have to use the name of the argument as an heuristic, to consider a string just args that are "const char *" and that have in its name the "path", "file", etc substrings. With that now it reuses: [root@quaco ~]# perf trace -v --max-events=1 |& grep Reus Reusing "open" BPF sys_enter augmenter for "stat" Reusing "open" BPF sys_enter augmenter for "lstat" Reusing "open" BPF sys_enter augmenter for "access" Reusing "connect" BPF sys_enter augmenter for "accept" Reusing "sendto" BPF sys_enter augmenter for "recvfrom" Reusing "connect" BPF sys_enter augmenter for "bind" Reusing "connect" BPF sys_enter augmenter for "getsockname" Reusing "connect" BPF sys_enter augmenter for "getpeername" Reusing "open" BPF sys_enter augmenter for "execve" Reusing "open" BPF sys_enter augmenter for "truncate" Reusing "open" BPF sys_enter augmenter for "chdir" Reusing "open" BPF sys_enter augmenter for "mkdir" Reusing "open" BPF sys_enter augmenter for "rmdir" Reusing "open" BPF sys_enter augmenter for "creat" Reusing "open" BPF sys_enter augmenter for "link" Reusing "open" BPF sys_enter augmenter for "unlink" Reusing "open" BPF sys_enter augmenter for "symlink" Reusing "open" BPF sys_enter augmenter for "readlink" Reusing "open" BPF sys_enter augmenter for "chmod" Reusing "open" BPF sys_enter augmenter for "chown" Reusing "open" BPF sys_enter augmenter for "lchown" Reusing "open" BPF sys_enter augmenter for "mknod" Reusing "open" BPF sys_enter augmenter for "statfs" Reusing "open" BPF sys_enter augmenter for "pivot_root" Reusing "open" BPF sys_enter augmenter for "chroot" Reusing "open" BPF sys_enter augmenter for "acct" Reusing "open" BPF sys_enter augmenter for "swapon" Reusing "open" BPF sys_enter augmenter for "swapoff" Reusing "open" BPF sys_enter augmenter for "delete_module" Reusing "open" BPF sys_enter augmenter for "setxattr" Reusing "open" BPF sys_enter augmenter for "lsetxattr" Reusing "openat" BPF sys_enter augmenter for "fsetxattr" Reusing "open" BPF sys_enter augmenter for "getxattr" Reusing "open" BPF sys_enter augmenter for "lgetxattr" Reusing "openat" BPF sys_enter augmenter for "fgetxattr" Reusing "open" BPF sys_enter augmenter for "listxattr" Reusing "open" BPF sys_enter augmenter for "llistxattr" Reusing "open" BPF sys_enter augmenter for "removexattr" Reusing "open" BPF sys_enter augmenter for "lremovexattr" Reusing "fsetxattr" BPF sys_enter augmenter for "fremovexattr" Reusing "open" BPF sys_enter augmenter for "mq_open" Reusing "open" BPF sys_enter augmenter for "mq_unlink" Reusing "fsetxattr" BPF sys_enter augmenter for "add_key" Reusing "fremovexattr" BPF sys_enter augmenter for "request_key" Reusing "fremovexattr" BPF sys_enter augmenter for "inotify_add_watch" Reusing "fremovexattr" BPF sys_enter augmenter for "mkdirat" Reusing "fremovexattr" BPF sys_enter augmenter for "mknodat" Reusing "fremovexattr" BPF sys_enter augmenter for "fchownat" Reusing "fremovexattr" BPF sys_enter augmenter for "futimesat" Reusing "fremovexattr" BPF sys_enter augmenter for "newfstatat" Reusing "fremovexattr" BPF sys_enter augmenter for "unlinkat" Reusing "fremovexattr" BPF sys_enter augmenter for "linkat" Reusing "open" BPF sys_enter augmenter for "symlinkat" Reusing "fremovexattr" BPF sys_enter augmenter for "readlinkat" Reusing "fremovexattr" BPF sys_enter augmenter for "fchmodat" Reusing "fremovexattr" BPF sys_enter augmenter for "faccessat" Reusing "fremovexattr" BPF sys_enter augmenter for "utimensat" Reusing "connect" BPF sys_enter augmenter for "accept4" Reusing "fremovexattr" BPF sys_enter augmenter for "name_to_handle_at" Reusing "fremovexattr" BPF sys_enter augmenter for "renameat2" Reusing "open" BPF sys_enter augmenter for "memfd_create" Reusing "fremovexattr" BPF sys_enter augmenter for "execveat" Reusing "fremovexattr" BPF sys_enter augmenter for "statx" [root@quaco ~]# Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZN5lrdeEdSMCn7hk@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Arnaldo Carvalho de Melo
|
83a0943b18 |
perf trace: Use the augmented_raw_syscall BPF skel only for tracing syscalls
It is possible to use 'perf trace' with tracepoints and in that case we
can't initialize/use the augmented_raw_syscalls BPF skel.
For instance, this usecase:
# perf trace -e sched:*exec --max-events=5
? ( ): NetworkManager/1183 ... [continued]: poll()) = 1
0.043 ( 0.007 ms): NetworkManager/1183 epoll_wait(epfd: 17<anon_inode:[eventpoll]>, events: 0x55555f90e920, maxevents: 6) = 0
0.060 ( 0.007 ms): NetworkManager/1183 write(fd: 3<anon_inode:[eventfd]>, buf: 0x7ffc5a27cd30, count: 8) = 8
0.073 ( 0.005 ms): NetworkManager/1183 epoll_wait(epfd: 24<anon_inode:[eventpoll]>, events: 0x7ffc5a27cd20, maxevents: 2) = 1
0.082 ( 0.010 ms): NetworkManager/1183 recvmmsg(fd: 26<socket:[30298]>, mmsg: 0x7ffc5a27caa0, vlen: 8) = 1
#
Where we want to trace just some sched tracepoints ending in 'exec' ends
up tracing all syscalls.
Fix it by checking existing trace->trace_syscalls boolean to see if we
need the augmenter.
A followup patch will move those sections of code used only with the
augmenter to separate functions, to get it cleaner and remove the goto,
done just for reviewing purposes.
With this patch in place the previous behaviour is restored: no syscalls
when we have other events and no syscall names:
[root@quaco ~]# perf probe do_filp_open "filename=pathname->name:string"
Added new event:
probe:do_filp_open (on do_filp_open with filename=pathname->name:string)
You can now use it in all perf tools, such as:
perf record -e probe:do_filp_open -aR sleep 1
[root@quaco ~]# perf trace --max-events=10 -e probe:do_filp_open sleep 1
0.000 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/etc/ld.so.cache")
0.056 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/lib64/libc.so.6")
0.481 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/lib/locale/locale-archive")
0.501 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/share/locale/locale.alias")
0.572 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/lib/locale/en_US.UTF-8/LC_IDENTIFICATION")
0.581 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/lib/locale/en_US.utf8/LC_IDENTIFICATION")
0.616 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/lib64/gconv/gconv-modules.cache")
0.656 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/lib/locale/en_US.UTF-8/LC_MEASUREMENT")
0.664 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/lib/locale/en_US.utf8/LC_MEASUREMENT")
0.696 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/lib/locale/en_US.UTF-8/LC_TELEPHONE")
[root@quaco ~]#
As well as mixing syscalls with tracepoints, getting the syscall
tracepoints used augmented using the BPF skel:
[root@quaco ~]# perf trace --max-events=10 -e open*,probe:do_filp_open sleep 1
0.000 ( ): sleep/455124 openat(dfd: CWD, filename: "/etc/ld.so.cache", flags: RDONLY|CLOEXEC) ...
0.005 ( ): sleep/455124 probe:do_filp_open(__probe_ip: -1186560412, filename: "/etc/ld.so.cache")
0.000 ( 0.011 ms): sleep/455124 ... [continued]: openat()) = 3
0.031 ( ): sleep/455124 openat(dfd: CWD, filename: "/lib64/libc.so.6", flags: RDONLY|CLOEXEC) ...
0.033 ( ): sleep/455124 probe:do_filp_open(__probe_ip: -1186560412, filename: "/lib64/libc.so.6")
0.031 ( 0.006 ms): sleep/455124 ... [continued]: openat()) = 3
0.258 ( ): sleep/455124 openat(dfd: CWD, filename: "/usr/lib/locale/locale-archive", flags: RDONLY|CLOEXEC) ...
0.261 ( ): sleep/455124 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/lib/locale/locale-archive")
0.258 ( 0.006 ms): sleep/455124 ... [continued]: openat()) = -1 ENOENT (No such file or directory)
0.272 ( ): sleep/455124 openat(dfd: CWD, filename: "/usr/share/locale/locale.alias", flags: RDONLY|CLOEXEC) ...
0.273 ( ): sleep/455124 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/share/locale/locale.alias")
A final note: the probe:do_filp_open uses a kprobe (probably optimized
as its in the start of a function) that uses the kprobe_tracer mechanism
in the kernel to collect the pathname->name string and stash it into the
tracepoint created by 'perf probe' for that:
[root@quaco ~]# cat /sys/kernel/debug/tracing/kprobe_events
p:probe/do_filp_open _text+4621920 filename=+0(+0(%si)):string
[root@quaco ~]#
While the syscalls:sys_enter_openat tracepoint gets its string from a
BPF program attached to raw_syscalls:sys_enter that tail calls into
another BPF program that knows the types for the openat syscall args and
thus can bpf_probe_read it right after the normal
sys_enter/sys_enter_openat tracepoint payload that comes prefixed with
whatever perf_event_open asked for (CPU, timestamp, etc):
[root@quaco ~]# bpftool prog | grep -E "sys_enter |sys_enter_opena" -A3
3176: tracepoint name sys_enter tag 0bc3fc9d11754ba1 gpl
loaded_at 2023-08-17T12:32:20-0300 uid 0
xlated 272B jited 257B memlock 4096B map_ids 2462,2466,2463
btf_id 2976
--
3180: tracepoint name sys_enter_opena tag 19dd077f00ec2f58 gpl
loaded_at 2023-08-17T12:32:20-0300 uid 0
xlated 328B jited 206B memlock 4096B map_ids 2466,2465
btf_id 2976
[root@quaco ~]#
Fixes:
|
||
Arnaldo Carvalho de Melo
|
abaf1e0355 |
perf lock: Don't pass an ERR_PTR() directly to perf_session__delete()
While debugging a segfault on 'perf lock contention' without an
available perf.data file I noticed that it was basically calling:
perf_session__delete(ERR_PTR(-1))
Resulting in:
(gdb) run lock contention
Starting program: /root/bin/perf lock contention
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
failed to open perf.data: No such file or directory (try 'perf record' first)
Initializing perf session failed
Program received signal SIGSEGV, Segmentation fault.
0x00000000005e7515 in auxtrace__free (session=0xffffffffffffffff) at util/auxtrace.c:2858
2858 if (!session->auxtrace)
(gdb) p session
$1 = (struct perf_session *) 0xffffffffffffffff
(gdb) bt
#0 0x00000000005e7515 in auxtrace__free (session=0xffffffffffffffff) at util/auxtrace.c:2858
#1 0x000000000057bb4d in perf_session__delete (session=0xffffffffffffffff) at util/session.c:300
#2 0x000000000047c421 in __cmd_contention (argc=0, argv=0x7fffffffe200) at builtin-lock.c:2161
#3 0x000000000047dc95 in cmd_lock (argc=0, argv=0x7fffffffe200) at builtin-lock.c:2604
#4 0x0000000000501466 in run_builtin (p=0xe597a8 <commands+552>, argc=2, argv=0x7fffffffe200) at perf.c:322
#5 0x00000000005016d5 in handle_internal_command (argc=2, argv=0x7fffffffe200) at perf.c:375
#6 0x0000000000501824 in run_argv (argcp=0x7fffffffe02c, argv=0x7fffffffe020) at perf.c:419
#7 0x0000000000501b11 in main (argc=2, argv=0x7fffffffe200) at perf.c:535
(gdb)
So just set it to NULL after using PTR_ERR(session) to decode the error
as perf_session__delete(NULL) is supported.
Fixes:
|
||
Arnaldo Carvalho de Melo
|
ef23cb5933 |
perf top: Don't pass an ERR_PTR() directly to perf_session__delete()
While debugging a segfault on 'perf lock contention' without an
available perf.data file I noticed that it was basically calling:
perf_session__delete(ERR_PTR(-1))
Resulting in:
(gdb) run lock contention
Starting program: /root/bin/perf lock contention
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
failed to open perf.data: No such file or directory (try 'perf record' first)
Initializing perf session failed
Program received signal SIGSEGV, Segmentation fault.
0x00000000005e7515 in auxtrace__free (session=0xffffffffffffffff) at util/auxtrace.c:2858
2858 if (!session->auxtrace)
(gdb) p session
$1 = (struct perf_session *) 0xffffffffffffffff
(gdb) bt
#0 0x00000000005e7515 in auxtrace__free (session=0xffffffffffffffff) at util/auxtrace.c:2858
#1 0x000000000057bb4d in perf_session__delete (session=0xffffffffffffffff) at util/session.c:300
#2 0x000000000047c421 in __cmd_contention (argc=0, argv=0x7fffffffe200) at builtin-lock.c:2161
#3 0x000000000047dc95 in cmd_lock (argc=0, argv=0x7fffffffe200) at builtin-lock.c:2604
#4 0x0000000000501466 in run_builtin (p=0xe597a8 <commands+552>, argc=2, argv=0x7fffffffe200) at perf.c:322
#5 0x00000000005016d5 in handle_internal_command (argc=2, argv=0x7fffffffe200) at perf.c:375
#6 0x0000000000501824 in run_argv (argcp=0x7fffffffe02c, argv=0x7fffffffe020) at perf.c:419
#7 0x0000000000501b11 in main (argc=2, argv=0x7fffffffe200) at perf.c:535
(gdb)
So just set it to NULL after using PTR_ERR(session) to decode the error
as perf_session__delete(NULL) is supported.
The same problem was found in 'perf top' after an audit of all
perf_session__new() failure handling.
Fixes:
|
||
James Clark
|
4473949074 |
perf vendor events arm64: Update N2 and V2 metrics and events using Arm telemetry repo
Apart from some slight naming and grouping differences, the new metrics are functionally the same as the existing ones. Any missing metrics were manually appended to the end of the auto generated file. For the events, the new data includes descriptions that may have product specific details and new groupings that will be consistent with other products. After generating the metrics from the telemetry repo [1], the following manual steps were performed: * Change the topdown expressions to compare on CPUID and use #slots so that the same data can be shared between N2 and V2. Apart from these modifications, the expressions now match more closely with the Arm telemetry data which will hopefully make future updates easier. * Append some metrics from the old N2/V2 data that aren't present in the telemetry data. These will possibly be added to the telemetry-solution repo at a later time: l3d_cache_mpki, l3d_cache_miss_rate, branch_pki, ipc_rate, spec_ipc, retired_rate, wasted_rate, branch_immed_spec_rate, branch_return_spec_rate, branch_indirect_spec_rate [1]: https://gitlab.arm.com/telemetry-solution/telemetry-solution/-/blob/main/data/pmu/cpu/neoverse/neoverse-n2.json Signed-off-by: James Clark <james.clark@arm.com> Reviewed-by: John Garry <john.g.garry@oracle.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Haixin Yu <yuhaixin.yhx@linux.alibaba.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Forrington <nick.forrington@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Herring <robh@kernel.org> Cc: Sohom Datta <sohomdatta1@gmail.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230816114841.1679234-7-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
James Clark
|
d43f549121 |
perf vendor events arm64: Update stall_slot workaround for N2 r0p3
N2 r0p3 doesn't require the workaround [1], so gating on (#slots - 5) no longer works because all N2s have 5 slots. Use the new expression builtin that allows calling strcmp_cpuid_str() and comparing CPUIDs in metric formulas. In this case, the commented formula looks like this: strcmp_cpuid_str(0x410fd493) # greater than or equal to N2 r0p3 | strcmp_cpuid_str(0x410fd490) ^ 1 # OR NOT any version of N2 [1]: https://gitlab.arm.com/telemetry-solution/telemetry-solution/-/blob/main/data/pmu/cpu/neoverse/neoverse-n2-r0p3.json Signed-off-by: James Clark <james.clark@arm.com> Reviewed-by: John Garry <john.g.garry@oracle.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Haixin Yu <yuhaixin.yhx@linux.alibaba.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Forrington <nick.forrington@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Herring <robh@kernel.org> Cc: Sohom Datta <sohomdatta1@gmail.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230816114841.1679234-6-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
James Clark
|
9d5da30e4a |
perf jevents: Add a new expression builtin strcmp_cpuid_str()
This will allow writing formulas that are conditional on a specific CPU type or CPU version. It calls through to the existing strcmp_cpuid_str() function in Perf which has a default weak version, and an arch specific version for x86 and arm64. The function takes an 'ID' type value, which is a string. But in this case Arm CPU IDs are hex numbers prefixed with '0x'. metric.py assumes strings are only used by event names, and that they can't start with a number ('0'), so an additional change has to be made to the regex to convert hex numbers back to 'ID' types. Signed-off-by: James Clark <james.clark@arm.com> Reviewed-by: John Garry <john.g.garry@oracle.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Haixin Yu <yuhaixin.yhx@linux.alibaba.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Forrington <nick.forrington@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Herring <robh@kernel.org> Cc: Sohom Datta <sohomdatta1@gmail.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230816114841.1679234-5-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
James Clark
|
81f7da549a |
perf test: Add a test for the new Arm CPU ID comparison behavior
Now that variant and revision fields are taken into account the behavior is slightly more complicated so add a test to ensure that this behaves as expected. Reviewed-by: John Garry <john.g.garry@oracle.com> Signed-off-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Haixin Yu <yuhaixin.yhx@linux.alibaba.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Forrington <nick.forrington@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Herring <robh@kernel.org> Cc: Sohom Datta <sohomdatta1@gmail.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230816114841.1679234-3-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
James Clark
|
c3e1e8cf00 |
perf arm64: Allow version comparisons of CPU IDs
Currently variant and revision fields are masked out of the MIDR so it's not possible to compare different versions of the same CPU. In a later commit a workaround will be removed just for N2 r0p3, so enable comparisons on version. This has the side effect of changing the MIDR stored in the header of the perf.data file to no longer have masked version fields. It also affects the lookups in mapfile.csv, but as that currently only has zeroed version fields, it has no actual effect. The mapfile.csv documentation also states to zero the version fields, so unless this isn't done it will continue to have no effect. There is an existing weak default strcmp_cpuid_str() function, and an x86 version. This adds another version for arm64. Signed-off-by: James Clark <james.clark@arm.com> Reviewed-by: John Garry <john.g.garry@oracle.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Haixin Yu <yuhaixin.yhx@linux.alibaba.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Forrington <nick.forrington@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Herring <robh@kernel.org> Cc: Sohom Datta <sohomdatta1@gmail.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230816114841.1679234-2-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Arnaldo Carvalho de Melo
|
1836480429 |
perf bpf_skel augmented_raw_syscalls: Cap the socklen parameter using &= sizeof(saddr)
This works with: $ clang -v clang version 14.0.5 (Fedora 14.0.5-2.fc36) $ But not with: $ clang -v clang version 16.0.6 (Fedora 16.0.6-2.fc38) $ [root@quaco ~]# perf trace -e connect*,sendto* ping -c 10 localhost libbpf: prog 'sys_enter_sendto': BPF program load failed: Permission denied libbpf: prog 'sys_enter_sendto': -- BEGIN PROG LOAD LOG -- reg type unsupported for arg#0 function sys_enter_sendto#59 0: R1=ctx(off=0,imm=0) R10=fp0 ; int sys_enter_sendto(struct syscall_enter_args *args) 0: (bf) r6 = r1 ; R1=ctx(off=0,imm=0) R6_w=ctx(off=0,imm=0) 1: (b7) r1 = 0 ; R1_w=0 ; int key = 0; 2: (63) *(u32 *)(r10 -4) = r1 ; R1_w=0 R10=fp0 fp-8=0000???? 3: (bf) r2 = r10 ; R2_w=fp0 R10=fp0 ; 4: (07) r2 += -4 ; R2_w=fp-4 ; return bpf_map_lookup_elem(&augmented_args_tmp, &key); 5: (18) r1 = 0xffff8de5a5b8bc00 ; R1_w=map_ptr(off=0,ks=4,vs=8272,imm=0) 7: (85) call bpf_map_lookup_elem#1 ; R0_w=map_value_or_null(id=1,off=0,ks=4,vs=8272,imm=0) 8: (bf) r7 = r0 ; R0_w=map_value_or_null(id=1,off=0,ks=4,vs=8272,imm=0) R7_w=map_value_or_null(id=1,off=0,ks=4,vs=8272,imm=0) 9: (b7) r0 = 1 ; R0_w=1 ; if (augmented_args == NULL) 10: (15) if r7 == 0x0 goto pc+25 ; R7_w=map_value(off=0,ks=4,vs=8272,imm=0) ; unsigned int socklen = args->args[5]; 11: (79) r1 = *(u64 *)(r6 +56) ; R1_w=scalar() R6_w=ctx(off=0,imm=0) ; 12: (bf) r2 = r1 ; R1_w=scalar(id=2) R2_w=scalar(id=2) 13: (67) r2 <<= 32 ; R2_w=scalar(smax=9223372032559808512,umax=18446744069414584320,var_off=(0x0; 0xffffffff00000000),s32_min=0,s32_max=0,u32_max=0) 14: (77) r2 >>= 32 ; R2_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) 15: (b7) r8 = 128 ; R8=128 ; if (socklen > sizeof(augmented_args->saddr)) 16: (25) if r2 > 0x80 goto pc+1 ; R2=scalar(umax=128,var_off=(0x0; 0xff)) 17: (bf) r8 = r1 ; R1=scalar(id=2) R8_w=scalar(id=2) ; const void *sockaddr_arg = (const void *)args->args[4]; 18: (79) r3 = *(u64 *)(r6 +48) ; R3_w=scalar() R6=ctx(off=0,imm=0) ; bpf_probe_read(&augmented_args->saddr, socklen, sockaddr_arg); 19: (bf) r1 = r7 ; R1_w=map_value(off=0,ks=4,vs=8272,imm=0) R7=map_value(off=0,ks=4,vs=8272,imm=0) 20: (07) r1 += 64 ; R1_w=map_value(off=64,ks=4,vs=8272,imm=0) ; bpf_probe_read(&augmented_args->saddr, socklen, sockaddr_arg); 21: (bf) r2 = r8 ; R2_w=scalar(id=2) R8_w=scalar(id=2) 22: (85) call bpf_probe_read#4 R2 min value is negative, either use unsigned or 'var &= const' processed 22 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1 -- END PROG LOAD LOG -- libbpf: prog 'sys_enter_sendto': failed to load: -13 libbpf: failed to load object 'augmented_raw_syscalls_bpf' libbpf: failed to load BPF skeleton 'augmented_raw_syscalls_bpf': -13 So use the suggested &= variant since sizeof(saddr) == 128 bytes. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Alexandre Ghiti
|
26ba042414
|
perf: tests: Adapt mmap-basic.c for riscv
riscv now supports mmaping hardware counters to userspace so adapt the test to run on this architecture. Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Reviewed-by: Ian Rogers <irogers@google.com> |
||
Kajol Jain
|
5ceb8b5b7d |
perf vendor events: Update metric events for power10 platform
Update JSON/events for power10 platform with additional metrics. Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Disha Goel <disgoel@linux.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20230814112803.1508296-7-kjain@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Kajol Jain
|
edd65d2bc5 |
perf vendor events: Update metric event names for power10 platform
Update metric event name for some of the JSON/metric events for
power10 platform.
Fixes:
|
||
Kajol Jain
|
426c804b5a |
perf vendor events: Update JSON/events for power10 platform
Update JSON/events for power10 platform with additional events. Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Disha Goel <disgoel@linux.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20230814112803.1508296-5-kjain@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Kajol Jain
|
7d473f475b |
perf vendor events: Move JSON/events to appropriate files for power10 platform
Move some of the power10 JSON/events to appropriate files.
Fixes:
|
||
Kajol Jain
|
4836b9a85e |
perf vendor events: Drop STORES_PER_INST metric event for power10 platform
Drop STORES_PER_INST metric event for the power10 platform, as the
metric expression of STORES_PER_INST metric event using dropped event
PM_ST_FIN.
Fixes:
|
||
Kajol Jain
|
e104df97b8 |
perf vendor events: Drop some of the JSON/events for power10 platform
Drop some of the JSON/events for power10 platform due to counter
data mismatch.
Fixes:
|
||
Kajol Jain
|
3286f88f31 |
perf vendor events: Update the JSON/events descriptions for power10 platform
Update the description for some of the JSON/events for power10 platform.
Fixes:
|
||
Alexandre Ghiti
|
10da1b8ed7 |
perf tests mmap-basic: Adapt for riscv
riscv now supports mmaping hardware counters to userspace so adapt the test to run on this architecture. Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anup Patel <anup@brainfault.org> Cc: Atish Patra <atishp@atishpatra.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Herring <robh@kernel.org> Cc: Rémi Denis-Courmont <remi@remlab.net> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-doc@vger.kernel.org Cc: linux-riscv@lists.infradead.org Link: https://lore.kernel.org/r/20230802080328.1213905-11-alexghiti@rivosinc.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Leo Yan
|
ff382c1ce8 |
perf parse-regs: Move out arch specific header from util/perf_regs.h
util/perf_regs.h includes another perf_regs.h: #include <perf_regs.h> Here it includes architecture specific header, for example, if we build arm64 target, the header tools/perf/arch/arm64/include/perf_regs.h is included. We use this implicit way to include architecture specific header, which is not directive; furthermore, util/perf_regs.c is coupled with the architecture specific definitions. This patch moves out arch specific header from util/perf_regs.h for generalizing the 'util' folder, as a result, the source files in 'arch' folder explicitly include architecture's perf_regs.h. Signed-off-by: Leo Yan <leo.yan@linaro.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Eric Lin <eric.lin@sifive.com> Cc: Fangrui Song <maskray@google.com> Cc: Guo Ren <guoren@kernel.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Babrou <ivan@cloudflare.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Ming Wang <wangming01@loongson.cn> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-csky@vger.kernel.org Cc: linux-riscv@lists.infradead.org Link: https://lore.kernel.org/r/20230606014559.21783-7-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Leo Yan
|
856caabf72 |
perf parse-regs: Remove PERF_REGS_{MAX|MASK} from common code
The macros PERF_REGS_MAX and PERF_REGS_MASK are architecture specific, let's remove them from the common file util/perf_regs.c. As a side effect, the weak functions arch__intr_reg_mask() and arch__user_reg_mask() just return zeros, every arch defines its own functions in the 'arch' folder for returning right values. Note, we don't need to return intr/user register masks dynamically, this is because these two functions are invoked during recording phase but not decoding phase, they are always invoked on the native environment, thus we don't need to parse them dynamically. Signed-off-by: Leo Yan <leo.yan@linaro.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Eric Lin <eric.lin@sifive.com> Cc: Fangrui Song <maskray@google.com> Cc: Guo Ren <guoren@kernel.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Babrou <ivan@cloudflare.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Ming Wang <wangming01@loongson.cn> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-csky@vger.kernel.org Cc: linux-riscv@lists.infradead.org Link: https://lore.kernel.org/r/20230606014559.21783-6-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Leo Yan
|
6a87e0f0ce |
perf parse-regs: Remove unused macros PERF_REG_{IP|SP}
The macros PERF_REG_{IP|SP} have been replaced by using functions perf_arch_reg_{ip|sp}(), remove them! Signed-off-by: Leo Yan <leo.yan@linaro.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Eric Lin <eric.lin@sifive.com> Cc: Fangrui Song <maskray@google.com> Cc: Guo Ren <guoren@kernel.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Babrou <ivan@cloudflare.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Ming Wang <wangming01@loongson.cn> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-csky@vger.kernel.org Cc: linux-riscv@lists.infradead.org Link: https://lore.kernel.org/r/20230606014559.21783-5-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Leo Yan
|
d8f69fb6fa |
perf unwind: Use perf_arch_reg_{ip|sp}() to substitute macros
We use perf_arch_reg_ip() and perf_arch_reg_sp() to substitute macros for obtaining the register numbers of SP and IP. This modification enables cross analysis in the unwinding, therefore, the unwinding is not restricted to the predefined values by the macros. Consequently, the macros LIBUNWIND__ARCH_REG_{IP|SP} are removed since they are no longer used. Committer notes: Add missing "util/env.h" header to make sure we have the definition for perf_env__arch(), that when built with NO_LIBUNWIND=1 isn't available, i.e. it was being included by sheer luck. Signed-off-by: Leo Yan <leo.yan@linaro.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Eric Lin <eric.lin@sifive.com> Cc: Fangrui Song <maskray@google.com> Cc: Guo Ren <guoren@kernel.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Babrou <ivan@cloudflare.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Ming Wang <wangming01@loongson.cn> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-csky@vger.kernel.org Cc: linux-riscv@lists.infradead.org Link: https://lore.kernel.org/r/20230606014559.21783-4-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Leo Yan
|
34af56afac |
perf parse-regs: Introduce functions perf_arch_reg_{ip|sp}()
The current code uses macros PERF_REG_IP and PERF_REG_SP for parsing registers and we build perf with these macros statically, which means it only can correctly analyze CPU registers for the native architecture and fails to support cross analysis (e.g. we build perf on x86 and cannot analyze Arm64's registers). We need to generalize util/perf_regs.c for support multi architectures, as a first step, this commit introduces new functions perf_arch_reg_ip() and perf_arch_reg_sp(), these two functions dynamically return IP and SP register index respectively according to the parameter "arch". Every architecture has its own functions (like __perf_reg_ip_arm64 and __perf_reg_sp_arm64), these architecture specific functions are defined in each arch source file under folder util/perf-regs-arch; at the end all of them are built into the tool for cross analysis. Committer notes: Make DWARF_MINIMAL_REGS() an inline function, so that we can use the __maybe_unused attribute for the 'arch' parameter, as this will avoid a build failure when that variable is unused in the callers. That happens when building on unsupported architectures, the ones without HAVE_PERF_REGS_SUPPORT defined. Signed-off-by: Leo Yan <leo.yan@linaro.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Eric Lin <eric.lin@sifive.com> Cc: Fangrui Song <maskray@google.com> Cc: Guo Ren <guoren@kernel.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Babrou <ivan@cloudflare.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Ming Wang <wangming01@loongson.cn> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-csky@vger.kernel.org Cc: linux-riscv@lists.infradead.org Link: https://lore.kernel.org/r/20230606014559.21783-3-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Leo Yan
|
5000e7f61a |
perf parse-regs: Refactor arch register parsing functions
Every architecture has a specific register parsing function for returning register name based on register index, to support cross analysis (e.g. we use perf x86 binary to parse Arm64's perf data), we build all these register parsing functions into the tool, this is why we place all related functions into util/perf_regs.c. Unfortunately, since util/perf_regs.c needs to include every arch's perf_regs.h, this easily introduces duplicated definitions coming from multiple headers, finally it's fragile for building and difficult for maintenance. We cannot simply move these register parsing functions into the corresponding 'arch' folder, the folder is only conditionally built based on the target architecture. Therefore, this commit creates a new folder util/perf-regs-arch/ and uses a dedicated source file to keep every architecture's register parsing function to avoid definition conflicts. This is only a refactoring, no functionality change is expected. Committer notes: Had to add util/perf-regs-arch/*.c to tools/perf/util/python-ext-sources to keep 'perf test python' passing. Signed-off-by: Leo Yan <leo.yan@linaro.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Eric Lin <eric.lin@sifive.com> Cc: Fangrui Song <maskray@google.com> Cc: Guo Ren <guoren@kernel.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Babrou <ivan@cloudflare.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Ming Wang <wangming01@loongson.cn> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-csky@vger.kernel.org Cc: linux-riscv@lists.infradead.org Link: https://lore.kernel.org/r/20230606014559.21783-2-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Changbin Du
|
a1ef3aaf6a |
perf docs: Fix format of unordered lists
Fix the format of unordered lists so the can wrap properly. Signed-off-by: Changbin Du <changbin.du@huawei.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230718085242.3090797-1-changbin.du@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
James Clark
|
ab3744007d |
perf vendor events arm64: Update scale units and descriptions of common topdown metrics
Metrics will be published here [1] going forwards, but they have slightly different scale units. To allow autogenerated metrics to be added more easily, update the scale units to match. The more detailed descriptions have also been taken and added to the common file. [1]: https://gitlab.arm.com/telemetry-solution/telemetry-solution/-/tree/main/data/pmu/cpu/ Reviewed-by: John Garry <john.g.garry@oracle.com> Signed-off-by: James Clark <james.clark@arm.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Forrington <nick.forrington@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Herring <robh@kernel.org> Cc: Sohom Datta <sohomdatta1@gmail.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230811144017.491628-5-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
James Clark
|
a4b6452af7 |
perf cs-etm: Don't duplicate FIELD_GET()
linux/bitfield.h can be included as long as linux/kernel.h is included first, so change the order of the includes and drop the duplicate macro. Reviewed-by: John Garry <john.g.garry@oracle.com> Signed-off-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Forrington <nick.forrington@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Herring <robh@kernel.org> Cc: Sohom Datta <sohomdatta1@gmail.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230811144017.491628-2-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Adrian Hunter
|
82b0a10390 |
perf dlfilter: Add al_cleanup()
Add perf_dlfilter_fns.al_cleanup() to do addr_location__exit() on data
passed via perf_dlfilter_fns.resolve_address().
Add dlfilter-test-api-v2 to the "dlfilter C API" test to test it.
Update documentation, clarifying that data returned by APIs should not
be dereferenced after filter_event() and filter_event_early() return.
Fixes:
|
||
Arnaldo Carvalho de Melo
|
42c6dd9d23 |
perf dlfilter: Initialize addr_location before passing it to thread__find_symbol_fb()
As thread__find_symbol_fb() will end up calling thread__find_map() and
it in turn will call these on uninitialized memory:
maps__zput(al->maps);
map__zput(al->map);
thread__zput(al->thread);
Fixes:
|
||
Adrian Hunter
|
f178a76b05 |
perf dlfilter: Add a test for resolve_address()
Extend the "dlfilter C API" test to test perf_dlfilter_fns.resolve_address(). The test currently fails, but passes after a subsequent patch. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20230731091857.10681-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Wei Li
|
41a37430f6 |
perf scripts python: Update audit-libs package name for python3
'audit-libs-python' is the package for python2, update it for python3. On Ubuntu and Fedora, the new package is 'python3-audit'. Signed-off-by: Wei Li <liwei391@huawei.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Li Bin <huawei.libin@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230815131805.1237491-1-liwei391@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Wei Li
|
708a3e8b80 |
perf scripts python: Support syscall name parsing on arm64
In the result of "perf script syscall-counts" on arm64, the syscall events are not resolved currently. Add "aarch64" to audit uname list to support name parsing. * After the patch: [root@localhost ~]# perf script syscall-counts sleep 1 Press control+C to stop and show the summary syscall events: event count ---------------------------------------- ----------- mmap 6 close 5 mprotect 4 brk 3 newfstatat 3 openat 3 getrandom 1 prlimit64 1 munmap 1 clock_nanosleep 1 set_robust_list 1 set_tid_address 1 exit_group 1 read 1 faccessat 1 Signed-off-by: Wei Li <liwei391@huawei.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Li Bin <huawei.libin@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230815131735.1237221-1-liwei391@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Yang Jihong
|
d095ad45e2 |
perf evsel: Remove duplicate check for field in evsel__intval()
The `file` parameter in evsel__intval() is checked repeatedly, fix it. No functional change. Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Link: https://lore.kernel.org/r/20230815221009.3641751-1-yangjihong1@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Namhyung Kim
|
9575ecdd19 |
perf test: Add perf record sample filtering test
$ sudo ./perf test 'sample filter' -v 94: perf record sample filtering (by BPF) tests : --- start --- test child forked, pid 3817527 Checking BPF-filter privilege Basic bpf-filter test Basic bpf-filter test [Success] Failing bpf-filter test Error: task-clock event does not have PERF_SAMPLE_CPU Failing bpf-filter test [Success] Group bpf-filter test Error: task-clock event does not have PERF_SAMPLE_CPU Error: task-clock event does not have PERF_SAMPLE_CODE_PAGE_SIZE Group bpf-filter test [Success] test child finished with 0 ---- end ---- perf record sample filtering (by BPF) tests: Ok Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230811025822.3859771-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Namhyung Kim
|
dc7f01f1bc |
perf bpf-filter: Fix sample flag check with ||
For logical OR operator, the actual sample_flags are in the 'groups'
list so it needs to check entries in the list instead. Otherwise it
would show the following error message.
$ sudo perf record -a -e cycles:p --filter 'period > 100 || weight > 0' sleep 1
Error: cycles:p event does not have sample flags 0
failed to set filter "BPF" on event cycles:p with 2 (No such file or directory)
Actually it should warn on 'weight' is used without WEIGHT flag.
Error: cycles:p event does not have PERF_SAMPLE_WEIGHT
Hint: please add -W option to perf record
failed to set filter "BPF" on event cycles:p with 2 (No such file or directory)
Fixes:
|
||
Ian Rogers
|
cd2cece61a |
perf trace: Tidy comments related to BPF + syscall augmentation
Now tools/perf/examples/bpf/augmented_syscalls.c is tools/perf/util/bpf_skel/augmented_syscalls.bpf.c and not enabled as a BPF event, tidy the comments to reflect this. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Brendan Gregg <brendan.d.gregg@gmail.com> Cc: Carsten Haitzler <carsten.haitzler@arm.com> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Fangrui Song <maskray@google.com> Cc: He Kuang <hekuang@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Rob Herring <robh@kernel.org> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Tom Rix <trix@redhat.com> Cc: Wang Nan <wangnan0@huawei.com> Cc: Wang ShaoBo <bobo.shaobowang@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Yonghong Song <yhs@fb.com> Cc: YueHaibing <yuehaibing@huawei.com> Cc: bpf@vger.kernel.org Cc: llvm@lists.linux.dev Link: https://lore.kernel.org/r/20230810184853.2860737-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ian Rogers
|
5056c99e8d |
perf bpf examples: With no BPF events remove examples
The examples were used to give demonstrations of BPF events but such functionality is now subsumed by using --filter with 'perf record' or the direct use of BPF skeletons. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Brendan Gregg <brendan.d.gregg@gmail.com> Cc: Carsten Haitzler <carsten.haitzler@arm.com> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Fangrui Song <maskray@google.com> Cc: He Kuang <hekuang@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Rob Herring <robh@kernel.org> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Tom Rix <trix@redhat.com> Cc: Wang Nan <wangnan0@huawei.com> Cc: Wang ShaoBo <bobo.shaobowang@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Yonghong Song <yhs@fb.com> Cc: YueHaibing <yuehaibing@huawei.com> Cc: bpf@vger.kernel.org Cc: llvm@lists.linux.dev Link: https://lore.kernel.org/r/20230810184853.2860737-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ian Rogers
|
5e6da6be30 |
perf trace: Migrate BPF augmentation to use a skeleton
Previously a BPF event of augmented_raw_syscalls.c could be used to enable augmentation of syscalls by perf trace. As BPF events are no longer supported, switch to using a BPF skeleton which when attached explicitly opens the sysenter and sysexit tracepoints. The dump map is removed as debugging wasn't supported by the augmentation and bpf_printk can be used when necessary. Remove tools/perf/examples/bpf/augmented_raw_syscalls.c so that the rename/migration to a BPF skeleton captures that this was the source. Committer notes: Some minor stylistic changes to help visualizing the diff. Use libbpf_strerror when failing to load the augmented raw syscalls BPF. Use bpf_object__for_each_program(prog, trace.skel->obj) to disable auto attachment for all but the sys_enter, sys_exit tracepoints, to avoid having to add extra lines as we go adding support for more pointer receiving syscalls. Committer testing: # perf trace -e open* --max-events=10 0.000 ( 0.022 ms): systemd-oomd/1151 openat(dfd: CWD, filename: "/proc/meminfo", flags: RDONLY|CLOEXEC) = 11 208.833 ( ): gnome-terminal/3223 openat(dfd: CWD, filename: "/proc/51250/cmdline") ... 249.993 ( 0.024 ms): systemd-oomd/1151 openat(dfd: CWD, filename: "/proc/meminfo", flags: RDONLY|CLOEXEC) = 11 250.118 ( 0.030 ms): systemd-oomd/1151 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/memory.pressure", flags: RDONLY|CLOEXEC) = 11 250.205 ( 0.016 ms): systemd-oomd/1151 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/memory.current", flags: RDONLY|CLOEXEC) = 11 250.244 ( 0.014 ms): systemd-oomd/1151 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/memory.min", flags: RDONLY|CLOEXEC) = 11 250.282 ( 0.014 ms): systemd-oomd/1151 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/memory.low", flags: RDONLY|CLOEXEC) = 11 250.320 ( 0.014 ms): systemd-oomd/1151 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/memory.swap.current", flags: RDONLY|CLOEXEC) = 11 250.355 ( 0.014 ms): systemd-oomd/1151 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/memory.stat", flags: RDONLY|CLOEXEC) = 11 250.717 ( 0.016 ms): systemd-oomd/1151 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1001.slice/user@1001.service/memory.pressure", flags: RDONLY|CLOEXEC) = 11 # # perf trace -e *nanosleep* --max-events=10 ? ( ): SCTP timer/28304 ... [continued]: clock_nanosleep()) = 0 0.007 (10.058 ms): SCTP timer/28304 clock_nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }, rmtp: 0x7f0466b78de0) = 0 10.069 ( ): SCTP timer/28304 clock_nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }, rmtp: 0x7f0466b78de0) ... 10.069 (10.056 ms): SCTP timer/28304 ... [continued]: clock_nanosleep()) = 0 17.059 ( ): podman/3572 nanosleep(rqtp: 0x7fc4f4d75be0) ... 17.059 (10.061 ms): podman/3572 ... [continued]: nanosleep()) = 0 20.131 (10.059 ms): SCTP timer/28304 clock_nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }, rmtp: 0x7f0466b78de0) = 0 30.195 (10.038 ms): SCTP timer/28304 clock_nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }, rmtp: 0x7f0466b78de0) = 0 40.238 (10.057 ms): SCTP timer/28304 clock_nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }, rmtp: 0x7f0466b78de0) = 0 50.301 ( ): SCTP timer/28304 clock_nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }, rmtp: 0x7f0466b78de0) ... # # perf trace -e perf_event* -- perf stat -e instructions,cycles,cache-misses sleep 0.1 0.000 ( 0.011 ms): perf/51331 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0x1 (PERF_COUNT_HW_INSTRUCTIONS), sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 51332 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 3 0.013 ( 0.003 ms): perf/51331 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0 (PERF_COUNT_HW_CPU_CYCLES), sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 51332 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 4 0.017 ( 0.002 ms): perf/51331 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0x3 (PERF_COUNT_HW_CACHE_MISSES), sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 51332 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 5 Performance counter stats for 'sleep 0.1': 1,495,051 instructions # 1.11 insn per cycle 1,347,641 cycles 35,424 cache-misses 0.100935279 seconds time elapsed 0.000924000 seconds user 0.000000000 seconds sys # # perf trace -e connect* ssh localhost 0.000 ( 0.012 ms): ssh/51346 connect(fd: 4, uservaddr: { .family: LOCAL, path: /var/lib/sss/pipes/nss }, addrlen: 110) = -1 ECONNREFUSED (Connection refused) 0.118 ( 0.004 ms): ssh/51346 connect(fd: 6, uservaddr: { .family: LOCAL, path: /var/lib/sss/pipes/nss }, addrlen: 110) = -1 ECONNREFUSED (Connection refused) 0.399 ( 0.007 ms): ssh/51346 connect(fd: 4, uservaddr: { .family: LOCAL, path: /var/lib/sss/pipes/nss }, addrlen: 110) = -1 ECONNREFUSED (Connection refused) 0.426 ( 0.003 ms): ssh/51346 connect(fd: 4, uservaddr: { .family: LOCAL, path: /var/lib/sss/pipes/nss }, addrlen: 110) = -1 ECONNREFUSED (Connection refused) 0.754 ( 0.009 ms): ssh/51346 connect(fd: 4, uservaddr: { .family: INET, port: 22, addr: 127.0.0.1 }, addrlen: 16) = 0 0.771 ( 0.010 ms): ssh/51346 connect(fd: 4, uservaddr: { .family: INET6, port: 22, addr: ::1 }, addrlen: 28) = 0 0.798 ( 0.053 ms): ssh/51346 connect(fd: 4, uservaddr: { .family: INET6, port: 22, addr: ::1 }, addrlen: 28) = 0 0.870 ( 0.004 ms): ssh/51346 connect(fd: 5, uservaddr: { .family: LOCAL, path: /var/lib/sss/pipes/nss }, addrlen: 110) = -1 ECONNREFUSED (Connection refused) 0.904 ( 0.003 ms): ssh/51346 connect(fd: 5, uservaddr: { .family: LOCAL, path: /var/lib/sss/pipes/nss }, addrlen: 110) = -1 ECONNREFUSED (Connection refused) 0.930 ( 0.003 ms): ssh/51346 connect(fd: 5, uservaddr: { .family: LOCAL, path: /var/lib/sss/pipes/nss }, addrlen: 110) = -1 ECONNREFUSED (Connection refused) 0.957 ( 0.003 ms): ssh/51346 connect(fd: 5, uservaddr: { .family: LOCAL, path: /var/lib/sss/pipes/nss }, addrlen: 110) = -1 ECONNREFUSED (Connection refused) 0.981 ( 0.003 ms): ssh/51346 connect(fd: 5, uservaddr: { .family: LOCAL, path: /var/lib/sss/pipes/nss }, addrlen: 110) = -1 ECONNREFUSED (Connection refused) 1.006 ( 0.004 ms): ssh/51346 connect(fd: 5, uservaddr: { .family: LOCAL, path: /var/lib/sss/pipes/nss }, addrlen: 110) = -1 ECONNREFUSED (Connection refused) 1.036 ( 0.005 ms): ssh/51346 connect(fd: 5, uservaddr: { .family: LOCAL, path: /var/lib/sss/pipes/nss }, addrlen: 110) = -1 ECONNREFUSED (Connection refused) 65.077 ( 0.022 ms): ssh/51346 connect(fd: 5, uservaddr: { .family: LOCAL, path: /var/run/.heim_org.h5l.kcm-socket }, addrlen: 110) = 0 66.608 ( 0.014 ms): ssh/51346 connect(fd: 5, uservaddr: { .family: LOCAL, path: /var/run/.heim_org.h5l.kcm-socket }, addrlen: 110) = 0 root@localhost's password: # # perf trace -e sendto* ping -c 2 localhost PING localhost(localhost (::1)) 56 data bytes 64 bytes from localhost (::1): icmp_seq=1 ttl=64 time=0.024 ms 0.000 ( 0.011 ms): ping/51357 sendto(fd: 5, buff: 0x7ffcca35e620, len: 20, addr: { .family: NETLINK }, addr_len: 0xc) = 20 0.135 ( 0.026 ms): ping/51357 sendto(fd: 4, buff: 0x5601398f7b20, len: 64, addr: { .family: INET6, port: 58, addr: ::1 }, addr_len: 0x1c) = 64 1014.929 ( 0.050 ms): ping/51357 sendto(fd: 4, buff: 0x5601398f7b20, len: 64, flags: CONFIRM, addr: { .family: INET6, port: 58, addr: ::1 }, addr_len: 0x1c) = 64 64 bytes from localhost (::1): icmp_seq=2 ttl=64 time=0.046 ms --- localhost ping statistics --- 2 packets transmitted, 2 received, 0% packet loss, time 1015ms rtt min/avg/max/mdev = 0.024/0.035/0.046/0.011 ms # Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Brendan Gregg <brendan.d.gregg@gmail.com> Cc: Carsten Haitzler <carsten.haitzler@arm.com> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Fangrui Song <maskray@google.com> Cc: He Kuang <hekuang@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Rob Herring <robh@kernel.org> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Tom Rix <trix@redhat.com> Cc: Wang Nan <wangnan0@huawei.com> Cc: Wang ShaoBo <bobo.shaobowang@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Yonghong Song <yhs@fb.com> Cc: YueHaibing <yuehaibing@huawei.com> Cc: bpf@vger.kernel.org Cc: llvm@lists.linux.dev Link: https://lore.kernel.org/r/20230810184853.2860737-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ian Rogers
|
3d6dfae889 |
perf parse-events: Remove BPF event support
New features like the BPF --filter support in perf record have made the
BPF event functionality somewhat redundant. As shown by commit
fcb027c1a4f6 ("perf tools: Revert enable indices setting syntax for BPF
map") and commit
|
||
Ian Rogers
|
56b11a2126 |
perf bpf: Remove support for embedding clang for compiling BPF events (-e foo.c)
This never was in the default build for perf, is difficult to maintain as it uses clang/llvm internals so ditch it, keeping, for now, the external compilation of .c BPF into .o bytecode and its subsequent loading, that is also going to be removed, do it separately to help bisection and to properly document what is being removed and why. Committer notes: Extracted from a larger patch and removed some leftovers, namely deleting these now unused feature tests: tools/build/feature/test-clang.cpp tools/build/feature/test-cxx.cpp tools/build/feature/test-llvm-version.cpp tools/build/feature/test-llvm.cpp Testing the use of BPF events after applying this patch: To use the external clang/llvm toolchain to compile a .c event and then use libbpf to load it, to get the syscalls:sys_enter_open* tracepoints and read the filename pointer, putting it into the ring buffer right after the usual tracepoint payload for 'perf trace' to then print it: [root@quaco ~]# perf trace -e /home/acme/git/perf-tools-next/tools/perf/examples/bpf/augmented_raw_syscalls.c,open* --max-events=10 0.000 systemd-oomd/959 openat(dfd: CWD, filename: "/proc/meminfo", flags: RDONLY|CLOEXEC) = 12 0.083 abrt-dump-jour/1453 openat(dfd: CWD, filename: "/var/log/journal/d6a97235307247e09f13f326fb607e3c/system.journal", flags: RDONLY|CLOEXEC|NONBLOCK) = 4 0.063 abrt-dump-jour/1454 openat(dfd: CWD, filename: "/var/log/journal/d6a97235307247e09f13f326fb607e3c/system.journal", flags: RDONLY|CLOEXEC|NONBLOCK) = 4 0.082 abrt-dump-jour/1455 openat(dfd: CWD, filename: "/var/log/journal/d6a97235307247e09f13f326fb607e3c/system.journal", flags: RDONLY|CLOEXEC|NONBLOCK) = 4 250.124 systemd-oomd/959 openat(dfd: CWD, filename: "/proc/meminfo", flags: RDONLY|CLOEXEC) = 12 250.521 systemd-oomd/959 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/app.slice/memory.pressure", flags: RDONLY|CLOEXEC) = 12 251.047 systemd-oomd/959 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/app.slice/memory.current", flags: RDONLY|CLOEXEC) = 12 251.162 systemd-oomd/959 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/app.slice/memory.min", flags: RDONLY|CLOEXEC) = 12 251.242 systemd-oomd/959 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/app.slice/memory.low", flags: RDONLY|CLOEXEC) = 12 251.353 systemd-oomd/959 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/app.slice/memory.swap.current", flags: RDONLY|CLOEXEC) = 12 [root@quaco ~]# Same thing, but with a prebuilt .o BPF bytecode: [root@quaco ~]# perf trace -e /home/acme/git/perf-tools-next/tools/perf/examples/bpf/augmented_raw_syscalls.o,open* --max-events=10 0.000 systemd-oomd/959 openat(dfd: CWD, filename: "/proc/meminfo", flags: RDONLY|CLOEXEC) = 12 0.083 abrt-dump-jour/1453 openat(dfd: CWD, filename: "/var/log/journal/d6a97235307247e09f13f326fb607e3c/system.journal", flags: RDONLY|CLOEXEC|NONBLOCK) = 4 0.083 abrt-dump-jour/1455 openat(dfd: CWD, filename: "/var/log/journal/d6a97235307247e09f13f326fb607e3c/system.journal", flags: RDONLY|CLOEXEC|NONBLOCK) = 4 0.062 abrt-dump-jour/1454 openat(dfd: CWD, filename: "/var/log/journal/d6a97235307247e09f13f326fb607e3c/system.journal", flags: RDONLY|CLOEXEC|NONBLOCK) = 4 249.985 systemd-oomd/959 openat(dfd: CWD, filename: "/proc/meminfo", flags: RDONLY|CLOEXEC) = 12 466.763 thermald/1234 openat(dfd: CWD, filename: "/sys/class/powercap/intel-rapl/intel-rapl:0/intel-rapl:0:2/energy_uj") = 13 467.145 thermald/1234 openat(dfd: CWD, filename: "/sys/class/powercap/intel-rapl/intel-rapl:0/energy_uj") = 13 467.311 thermald/1234 openat(dfd: CWD, filename: "/sys/class/thermal/thermal_zone2/temp") = 13 500.040 cgroupify/24006 openat(dfd: 4, filename: ".", flags: RDONLY|CLOEXEC|DIRECTORY|NONBLOCK) = 5 500.295 cgroupify/24006 openat(dfd: 4, filename: "24616/cgroup.procs") = 5 [root@quaco ~]# Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Brendan Gregg <brendan.d.gregg@gmail.com> Cc: Carsten Haitzler <carsten.haitzler@arm.com> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Fangrui Song <maskray@google.com> Cc: He Kuang <hekuang@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Rob Herring <robh@kernel.org> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Tom Rix <trix@redhat.com> Cc: Wang Nan <wangnan0@huawei.com> Cc: Wang ShaoBo <bobo.shaobowang@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Yonghong Song <yhs@fb.com> Cc: YueHaibing <yuehaibing@huawei.com> Link: https://lore.kernel.org/lkml/ZNZWsAXg2px1sm2h@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Arnaldo Carvalho de Melo
|
6f769c3458 |
perf tests trace+probe_vfs_getname.sh: Accept quotes surrounding the filename
With augmented_raw_syscalls transformed into a BPF skel made the output have a " around the filenames, which is not what the old perf probe vfs_getname method of obtaining filenames did, so accept the augmented way, with the quotes. At this point probably removing all the logic for the vfs_getname method is in order, will do it at some point. For now lets accept with/without quotes and make that test pass. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Arnaldo Carvalho de Melo
|
7777ac3dfe |
perf test trace+probe_vfs_getname.sh: Remove stray \ before /
Running on fedora:38 in verbose mode I noticed: # perf test -v 117 grep: warning: stray \ before / 117: Check open filename arg using perf trace + vfs_getname : Remove that \ before /. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZNvTDsSMO3nw9Tnp@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Petr Pavlu
|
833fd800bf |
x86/retpoline,kprobes: Skip optprobe check for indirect jumps with retpolines and IBT
The kprobes optimization check can_optimize() calls
insn_is_indirect_jump() to detect indirect jump instructions in
a target function. If any is found, creating an optprobe is disallowed
in the function because the jump could be from a jump table and could
potentially land in the middle of the target optprobe.
With retpolines, insn_is_indirect_jump() additionally looks for calls to
indirect thunks which the compiler potentially used to replace original
jumps. This extra check is however unnecessary because jump tables are
disabled when the kernel is built with retpolines. The same is currently
the case with IBT.
Based on this observation, remove the logic to look for calls to
indirect thunks and skip the check for indirect jumps altogether if the
kernel is built with retpolines or IBT. Remove subsequently the symbols
__indirect_thunk_start and __indirect_thunk_end which are no longer
needed.
Dropping this logic indirectly fixes a problem where the range
[__indirect_thunk_start, __indirect_thunk_end] wrongly included also the
return thunk. It caused that machines which used the return thunk as
a mitigation and didn't have it patched by any alternative ended up not
being able to use optprobes in any regular function.
Fixes:
|
||
Ian Rogers
|
33d9c50621 |
perf script python: Add stub for PMU symbol to the python binding
Fix missing symbol seen in: ``` 19: 'import perf' in python : --- start --- test child forked, pid 2640936 python usage test: "echo "import sys ; sys.path.insert(0, 'python'); import perf" | '/usr/bin/python3' " Traceback (most recent call last): File "<stdin>", line 1, in <module> ImportError: tools/perf/python/perf.cpython-311-x86_64-linux-gnu.so: undefined symbol: perf_pmus__supports_extended_type test child finished with -1 ---- end ---- 'import perf' in python: FAILED! ``` Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20230810180944.2794188-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Athira Rajeev
|
e59fea47f8 |
perf symbols: Fix DSO kernel load and symbol process to correctly map DSO to its long_name, type and adjust_symbols
Test "object code reading" fails sometimes for kernel address as below: Reading object code for memory address: 0xc000000000004c3c File is: [kernel.kallsyms] On file address is: 0x14c3c dso__data_read_offset failed test child finished with -1 ---- end ---- Object code reading: FAILED! Here dso__data_read_offset() fails for symbol address 0xc000000000004c3c. This is because the DSO long_name here is "[kernel.kallsyms]" and hence open_dso() fails to open this file. There is an incorrect DSO to map handling here. The key points here are: - The DSO long_name is set to "[kernel.kallsyms]". This file is not present and hence returns error - The DSO binary type is set to DSO_BINARY_TYPE__NOT_FOUND - The DSO adjust_symbols member is set to zero In the end dso__data_read_offset() returns -1 and the address 0x14c3c can not be resolved. Hence the test fails. But the address actually maps to the kernel DSO # objdump -z -d --start-address=0xc000000000004c3c --stop-address=0xc000000000004cbc /home/athira/linux/vmlinux /home/athira/linux/vmlinux: file format elf64-powerpcle Disassembly of section .head.text: c000000000004c3c <exc_virt_0x4c00_system_call+0x3c>: c000000000004c3c: a6 02 9b 7d mfsrr1 r12 c000000000004c40: 78 13 42 7c mr r2,r2 c000000000004c44: 18 00 4d e9 ld r10,24(r13) c000000000004c48: 60 c6 4a 61 ori r10,r10,50784 c000000000004c4c: a6 03 49 7d mtctr r10 Fix dso__process_kernel_symbol() to set the binary_type and adjust_symbols members. dso->adjust_symbols is used by map__rip_2objdump() which converts the symbol start address to the objdump address. Also set dso->long_name in dso__load_vmlinux(). Suggested-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20230811051546.70039-1-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Arnaldo Carvalho de Melo
|
878460e8d0 |
perf build: Remove -Wno-unused-but-set-variable from the flex flags when building with clang < 13.0.0
clang < 13.0.0 doesn't grok -Wno-unused-but-set-variable, so just remove
it to avoid:
error: unknown warning option '-Wno-unused-but-set-variable'; did you mean '-Wno-unused-const-variable'? [-Werror,-Wunknown-warning-option]
make[4]: *** [/git/perf-6.5.0-rc4/tools/build/Makefile.build:128: /tmp/build/perf/util/pmu-flex.o] Error 1
make[4]: *** Waiting for unfinished jobs....
Fixes:
|
||
Arnaldo Carvalho de Melo
|
55b2905019 |
Merge remote-tracking branch 'torvalds/master' into perf-tools-next
To pick up some more fixes that went upstream via the perf-tools fixes branch. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ian Rogers
|
487ae3b42d |
perf stat: Don't display zero tool counts
Andi reported (see link below) a regression when printing the 'duration_time' tool event, where it gets printed as "not counted" for most of the CPUs, fix it by skipping zero counts for tool events. Reported-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Andi Kleen <ak@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Claire Jensen <cjense@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/all/ZMlrzcVrVi1lTDmn@tassilo/ Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ivan Babrou
|
8c49c6e1a7 |
perf script: Print "cgroup" field on the same line as "comm"
Commit |
||
Arnaldo Carvalho de Melo
|
c0b067588a |
Revert "perf report: Append inlines to non-DWARF callchains"
This reverts commit
|
||
Arnaldo Carvalho de Melo
|
aeb50d3f2c |
perf probe: Make synthesize_perf_probe_point() private to probe-event.c
Not used in any other place, so just make it static. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZM0pjfOe6R4X%2Fcql@kernel.org/ Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Arnaldo Carvalho de Melo
|
a612bbf8b8 |
perf probe: Free string returned by synthesize_perf_probe_point() on failure in synthesize_perf_probe_command()
Building perf with EXTRA_CFLAGS="-fsanitize=address" a leak was detected elsewhere and lead to an audit, where we found that synthesize_perf_probe_command() may leak synthesize_perf_probe_point() return on failure, fix it. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZM0mzpQktHnhXJXr@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Arnaldo Carvalho de Melo
|
7bc0153c53 |
perf probe: Free string returned by synthesize_perf_probe_point() on failure to add a probe
Building perf with EXTRA_CFLAGS="-fsanitize=address" a leak is detect when trying to add a probe to a non-existent function: # perf probe -x ~/bin/perf dso__neW Probe point 'dso__neW' not found. Error: Failed to add events. ================================================================= ==296634==ERROR: LeakSanitizer: detected memory leaks Direct leak of 128 byte(s) in 1 object(s) allocated from: #0 0x7f67642ba097 in calloc (/lib64/libasan.so.8+0xba097) #1 0x7f67641a76f1 in allocate_cfi (/lib64/libdw.so.1+0x3f6f1) Direct leak of 65 byte(s) in 1 object(s) allocated from: #0 0x7f67642b95b5 in __interceptor_realloc.part.0 (/lib64/libasan.so.8+0xb95b5) #1 0x6cac75 in strbuf_grow util/strbuf.c:64 #2 0x6ca934 in strbuf_init util/strbuf.c:25 #3 0x9337d2 in synthesize_perf_probe_point util/probe-event.c:2018 #4 0x92be51 in try_to_find_probe_trace_events util/probe-event.c:964 #5 0x93d5c6 in convert_to_probe_trace_events util/probe-event.c:3512 #6 0x93d6d5 in convert_perf_probe_events util/probe-event.c:3529 #7 0x56f37f in perf_add_probe_events /var/home/acme/git/perf-tools-next/tools/perf/builtin-probe.c:354 #8 0x572fbc in __cmd_probe /var/home/acme/git/perf-tools-next/tools/perf/builtin-probe.c:738 #9 0x5730f2 in cmd_probe /var/home/acme/git/perf-tools-next/tools/perf/builtin-probe.c:766 #10 0x635d81 in run_builtin /var/home/acme/git/perf-tools-next/tools/perf/perf.c:323 #11 0x6362c1 in handle_internal_command /var/home/acme/git/perf-tools-next/tools/perf/perf.c:377 #12 0x63667a in run_argv /var/home/acme/git/perf-tools-next/tools/perf/perf.c:421 #13 0x636b8d in main /var/home/acme/git/perf-tools-next/tools/perf/perf.c:537 #14 0x7f676302950f in __libc_start_call_main (/lib64/libc.so.6+0x2950f) SUMMARY: AddressSanitizer: 193 byte(s) leaked in 2 allocation(s). # synthesize_perf_probe_point() returns a "detachec" strbuf, i.e. a malloc'ed string that needs to be free'd. An audit will be performed to find other such cases. Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZM0l1Oxamr4SVjfY@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Xin Li
|
6e3edb0fb5 |
tools: Get rid of IRQ_MOVE_CLEANUP_VECTOR from tools
IRQ_MOVE_CLEANUP_VECTOR is not longer in use. Remove the last traces. Signed-off-by: Xin Li <xin3.li@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20230621171248.6805-4-xin3.li@intel.com |
||
Arnaldo Carvalho de Melo
|
bf1842996a |
Merge remote-tracking branch 'torvalds/master' into perf-tools-next
To pick up the fixes that were just merged from perf-tools/perf-tools for v6.5. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Namhyung Kim
|
f6b8436bed |
perf hists browser: Fix the number of entries for 'e' key
The 'e' key is to toggle expand/collapse the selected entry only. But the current code has a bug that it only increases the number of entries by 1 in the hierarchy mode so users cannot move under the current entry after the key stroke. This is due to a wrong assumption in the hist_entry__set_folding(). The commit |
||
Namhyung Kim
|
e2cabf2a44 |
perf hists browser: Fix hierarchy mode header
The commit |
||
Arnaldo Carvalho de Melo
|
979e9c9fc9 |
perf annotate bpf: Don't enclose non-debug code with an assert()
In |
||
Thomas Richter
|
8fcaea9fd0 |
perf build: Support llvm and clang support compiled in
Perf build suports llvm and clang support compiled in. Test case 56 builtin clang support provides a test case which is always skipped. Link perf with the latest llvm and clang libraries and enable this test case. Use 'make LIBCLANGLLVM=1' to include this support. V2: Add Library patch before -lclang-cpp Output before: # ./perf test 56 56: builtin clang support : 56.1: builtin clang compile C source to IR : Skip (not compiled in) 56.2: builtin clang compile C source to ELF object: Skip (not compiled in) Output after: # ./perf test 56 56: builtin clang support : 56.1: builtin clang compile C source to IR : Ok 56.2: builtin clang compile C source to ELF object : Ok # From Ian Rogers: Build tested with LLVM 14 and 15 using: BUILD_BPF_SKEL=1 LIBCLANGLLVM=1 LLVM_CONFIG=llvm-config-14 BUILD_BPF_SKEL=1 LIBCLANGLLVM=1 LLVM_CONFIG=llvm-config-15 Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Ian Rogers <irogers@google.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lore.kernel.org/r/20230725150347.3479291-2-tmricht@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Arnaldo Carvalho de Melo
|
c43888e739 |
perf script python: Cope with declarations after statements found in Python.h
With -Werror the build was failing on fedora rawhide: [perfbuilder@27cfe44d67ed perf-6.5.0-rc2]$ gcc -v Using built-in specs. COLLECT_GCC=gcc COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/13/lto-wrapper OFFLOAD_TARGET_NAMES=nvptx-none OFFLOAD_TARGET_DEFAULT=1 Target: x86_64-redhat-linux Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,fortran,objc,obj-c++,ada,go,d,m2,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only --enable-libstdcxx-backtrace --with-libstdcxx-zoneinfo=/usr/share/zoneinfo --with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --with-isl=/builddir/build/BUILD/gcc-13.2.1-20230728/obj-x86_64-redhat-linux/isl-install --enable-offload-targets=nvptx-none --without-cuda-driver --enable-offload-defaulted --enable-gnu-indirect-function --enable-cet --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux --with-build-config=bootstrap-lto --enable-link-serialization=1 Thread model: posix Supported LTO compression algorithms: zlib zstd gcc version 13.2.1 20230728 (Red Hat 13.2.1-1) (GCC) [perfbuilder@27cfe44d67ed perf-6.5.0-rc2]$ In file included from /usr/include/python3.12/Python.h:44, from scripts/python/Perf-Trace-Util/Context.c:14: /usr/include/python3.12/object.h: In function 'Py_SIZE': /usr/include/python3.12/object.h:217:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement] 217 | PyVarObject *var_ob = _PyVarObject_CAST(ob); | ^~~~~~~~~~~ In file included from /usr/include/python3.12/Python.h:53: /usr/include/python3.12/cpython/longintrepr.h: In function '_PyLong_CompactValue': /usr/include/python3.12/cpython/longintrepr.h:121:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement] 121 | Py_ssize_t sign = 1 - (op->long_value.lv_tag & _PyLong_SIGN_MASK); | ^~~~~~~~~~ <SNIP> In file included from /usr/include/python3.12/Python.h:44, from util/scripting-engines/trace-event-python.c:22: /usr/include/python3.12/object.h: In function 'Py_SIZE': /usr/include/python3.12/object.h:217:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement] 217 | PyVarObject *var_ob = _PyVarObject_CAST(ob); | ^~~~~~~~~~~ CC /tmp/build/perf/util/units.o CC /tmp/build/perf/util/time-utils.o In file included from /usr/include/python3.12/Python.h:53: /usr/include/python3.12/cpython/longintrepr.h: In function '_PyLong_CompactValue': /usr/include/python3.12/cpython/longintrepr.h:121:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement] 121 | Py_ssize_t sign = 1 - (op->long_value.lv_tag & _PyLong_SIGN_MASK); | ^~~~~~~~~~ So add -Wno-declaration-after-statement to the python scripting CFLAGS. Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZMpdKeO8gU%2FcWDqH@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Arnaldo Carvalho de Melo
|
a7789d3f2e |
perf python: Cope with declarations after statements found in Python.h
With -Werror the build was failing on fedora rawhide: [perfbuilder@27cfe44d67ed perf-6.5.0-rc2]$ gcc -v Using built-in specs. COLLECT_GCC=gcc COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/13/lto-wrapper OFFLOAD_TARGET_NAMES=nvptx-none OFFLOAD_TARGET_DEFAULT=1 Target: x86_64-redhat-linux Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,fortran,objc,obj-c++,ada,go,d,m2,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only --enable-libstdcxx-backtrace --with-libstdcxx-zoneinfo=/usr/share/zoneinfo --with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --with-isl=/builddir/build/BUILD/gcc-13.2.1-20230728/obj-x86_64-redhat-linux/isl-install --enable-offload-targets=nvptx-none --without-cuda-driver --enable-offload-defaulted --enable-gnu-indirect-function --enable-cet --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux --with-build-config=bootstrap-lto --enable-link-serialization=1 Thread model: posix Supported LTO compression algorithms: zlib zstd gcc version 13.2.1 20230728 (Red Hat 13.2.1-1) (GCC) [perfbuilder@27cfe44d67ed perf-6.5.0-rc2]$ In file included from /usr/include/python3.12/Python.h:44, from /git/perf-6.5.0-rc2/tools/perf/util/python.c:2: /usr/include/python3.12/object.h: In function ‘Py_SIZE’: /usr/include/python3.12/object.h:217:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement] 217 | PyVarObject *var_ob = _PyVarObject_CAST(ob); | ^~~~~~~~~~~ LD /tmp/build/perf/arch/perf-in.o In file included from /usr/include/python3.12/Python.h:53: /usr/include/python3.12/cpython/longintrepr.h: In function ‘_PyLong_CompactValue’: /usr/include/python3.12/cpython/longintrepr.h:121:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement] 121 | Py_ssize_t sign = 1 - (op->long_value.lv_tag & _PyLong_SIGN_MASK); | ^~~~~~~~~~ So add -Wno-declaration-after-statement to the python binding CFLAGS. Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZMpcTMvnQns81YWA@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ian Rogers
|
9a7d82c188 |
perf vendor events intel: Update Icelake+ metric constraints
Avoid grouping events especially in cases where the kernel's PMU driver fails to not open the events, causing the events to report back as "<not counted>". This update comes from: https://github.com/intel/perfmon/pull/94 Fixes issues reported with patch: https://lore.kernel.org/lkml/20230719001836.198363-3-irogers@google.com/ Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20230801053634.1142634-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ian Rogers
|
b691f30700 |
perf vendor events intel: Update sapphirerapids to 1.15
1.15 events were released in:
|
||
Ian Rogers
|
ab0cfb796e |
perf vendor events intel: Update meteorlake to 1.04
1.04 events were released in:
|
||
Ian Rogers
|
714b451111 |
perf parse-events x86: Avoid sorting uops_retired.slots
As topdown.slots may appear as slots it may get confused with uops_retired.slots which is an invalid perf metric event group leader. Special case uops_retired.slots to avoid this confusion. Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20230801053634.1142634-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Masami Hiramatsu
|
e8ca4f0f8c |
perf probe: Show correct error message about @symbol usage for uprobe
Since @symbol variable access is not supported by uprobe event, it must be correctly warn user instead of kernel version update. Committer testing: With/without the patch: [root@quaco ~]# perf probe -x ~/bin/perf -L sigtrap_handler <sigtrap_handler@/home/acme/git/perf-tools-next/tools/perf/tests/sigtrap.c:0> 0 sigtrap_handler(int signum __maybe_unused, siginfo_t *info, void *ucontext __maybe_unused) 1 { 2 if (!__atomic_fetch_add(&ctx.signal_count, 1, __ATOMIC_RELAXED)) 3 ctx.first_siginfo = *info; 4 __atomic_fetch_sub(&ctx.tids_want_signal, syscall(SYS_gettid), __ATOMIC_RELAXED); 5 } static void *test_thread(void *arg) { [root@quaco ~]# perf probe -x ~/bin/perf sigtrap_handler:4 "ctx.signal_count" Without the patch: [root@quaco ~]# perf probe -x ~/bin/perf sigtrap_handler:4 "ctx.signal_count" Failed to write event: Invalid argument Please upgrade your kernel to at least 3.14 to have access to feature @ctx Error: Failed to add events. [root@quaco ~]# With the patch: [root@quaco ~]# Failed to write event: Invalid argument @ctx accesses a variable by symbol name, but that is not supported for user application probe. Error: Failed to add events. [root@quaco ~]# Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Closes: https://lore.kernel.org/all/ZLWDEjvFjrrEJODp@kernel.org/ Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/169055397023.67089.12693645664676964310.stgit@devnote2 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Arnaldo Carvalho de Melo
|
ed847e30f0 |
perf test bpf: Address error about non-null argument for epoll_pwait 2nd arg
First noticed on Fedora Rawhide: tests/bpf.c: In function ‘epoll_pwait_loop’: tests/bpf.c:36:17: error: argument 2 null where non-null expected [-Werror=nonnull] 36 | epoll_pwait(-(i + 1), NULL, 0, 0, NULL); | ^~~~~~~~~~~ In file included from tests/bpf.c:5: /usr/include/sys/epoll.h:134:12: note: in a call to function ‘epoll_pwait’ declared ‘nonnull’ 134 | extern int epoll_pwait (int __epfd, struct epoll_event *__events, | ^~~~~~~~~~~ [perfbuilder@27cfe44d67ed perf-6.5.0-rc2]$ gcc -v Using built-in specs. COLLECT_GCC=gcc COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/13/lto-wrapper OFFLOAD_TARGET_NAMES=nvptx-none OFFLOAD_TARGET_DEFAULT=1 Target: x86_64-redhat-linux Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,fortran,objc,obj-c++,ada,go,d,m2,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only --enable-libstdcxx-backtrace --with-libstdcxx-zoneinfo=/usr/share/zoneinfo --with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --with-isl=/builddir/build/BUILD/gcc-13.2.1-20230728/obj-x86_64-redhat-linux/isl-install --enable-offload-targets=nvptx-none --without-cuda-driver --enable-offload-defaulted --enable-gnu-indirect-function --enable-cet --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux --with-build-config=bootstrap-lto --enable-link-serialization=1 Thread model: posix Supported LTO compression algorithms: zlib zstd gcc version 13.2.1 20230728 (Red Hat 13.2.1-1) (GCC) [perfbuilder@27cfe44d67ed perf-6.5.0-rc2]$ Just add that argument to address this compiler warning. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZMj8+bvN86D0ZKiB@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Athira Rajeev
|
35578a551b |
perf tests stat+std_output: Fix shellcheck warnings about word splitting/quoting and local variables
Running shellcheck on stat_std_output testcase throws below warning: In tests/shell/stat+std_output.sh line 9: . $(dirname $0)/lib/stat_output.sh ^-----------^ SC2046 (warning): Quote this to prevent word splitting. In tests/shell/stat+std_output.sh line 32: local -i cnt=0 ^-^ SC2034 (warning): cnt appears unused. Verify use (or export if used externally). Fixed the warning by adding quotes to avoid word splitting and removed unused variable "cnt" at line 32. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20230709182800.53002-27-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Athira Rajeev
|
8439b44abb |
perf tests stat+std_output: Fix shellcheck warnings about word splitting/quoting
Running shellcheck on stat+csv_output.sh throws below warning: In tests/shell/stat+csv_output.sh line 9: . $(dirname $0)/lib/stat_output.sh ^-----------^ SC2046 (warning): Quote this to prevent word splitting. Fixed the warning by adding quotes to avoid word splitting. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20230709182800.53002-26-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Kajol Jain
|
eef1fb50ca |
perf tests lib stat_output: Fix shellcheck warning about missing shebang
Running shellcheck on stat_output.sh throws below warning: In tests/shell/lib/stat_output.sh line 1: ^-- SC2148 (error): Tips depend on target shell and yours is unknown. Add a shebang or a 'shell' directive. Fixed the warning by adding shell directive. Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20230709182800.53002-25-atrajeev@linux.vnet.ibm.com Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Kajol Jain
|
1f14b8af2c |
perf tests coresight thread_loop_check_tid_2: Fix shellcheck warnings about word splitting/quoting
Running shellcheck on thread_loop_check_tid_2.sh throws below warning: In tests/shell/coresight/thread_loop_check_tid_2.sh line 8: . $(dirname $0)/../lib/coresight.sh ^-----------^ SC2046 (warning): Quote this to prevent word splitting. Fixed the warning by adding quotes to avoid word splitting. Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20230709182800.53002-24-atrajeev@linux.vnet.ibm.com Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Athira Rajeev
|
3a4367c118 |
perf tests record+zstd_comp_decomp: Fix the shellcheck warnings about word splitting/quoting
Running shellcheck on record+zstd_comp_decomp.sh testcases throws below warning: In tests/shell/record+zstd_comp_decomp.sh line 16: $perf_tool record -o $trace_file $gflag -z -F 5000 -- \ ^---------^ SC2086 (info): Double quote to prevent globbing and word splitting. Did you mean: $perf_tool record -o "$trace_file" $gflag -z -F 5000 -- \ In tests/shell/record+zstd_comp_decomp.sh line 22: $perf_tool report -i $trace_file --header --stats | \ ^---------^ SC2086 (info): Double quote to prevent globbing and word splitting. Added double quote around file names to fix these shellcheck reported issues. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20230709182800.53002-23-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Athira Rajeev
|
84caba70d0 |
perf arch x86: Address shellcheck warnings about unused variables in syscalltbl.sh
Running shellcheck on syscalltbl.sh generates below warning: In ./tools/perf/arch/x86/entry/syscalls/syscalltbl.sh line 27: while read nr abi name entry compat; do ^-^ SC2034 (warning): abi appears unused. Verify use (or export if used externally). ^----^ SC2034 (warning): compat appears unused. Verify use (or export if used externally). These variables are intentionally unused since they are needed to parse through the output. Use "_" as a prefix for these throw away variables. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20230709182800.53002-22-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Athira Rajeev
|
5e9310ae23 |
perf trace x86_arch_prctl: Address shellcheck warnings about local variables
Running shellcheck on x86_arch_prctl.sh generates below warning: In ./tools/perf/trace/beauty/x86_arch_prctl.sh line 10: local idx=$1 ^-------^ SC3043 (warning): In POSIX sh, 'local' is undefined. In ./tools/perf/trace/beauty/x86_arch_prctl.sh line 11: local prefix=$2 ^----------^ SC3043 (warning): In POSIX sh, 'local' is undefined. In ./tools/perf/trace/beauty/x86_arch_prctl.sh line 12: local first_entry=$3 ^---------------^ SC3043 (warning): In POSIX sh, 'local' is undefined. Fix this by removing local since these are variables used only in specific function Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20230709182800.53002-21-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Athira Rajeev
|
1e094f925e |
perf tests lib waiting: Fix the shellcheck warnings about missing shebang
Running shellcheck in "lib/waiting.sh" generates below warning: In ./tools/perf/tests/shell/lib/waiting.sh line 1: # SPDX-License-Identifier: GPL-2.0 ^-- SC2148 (error): Tips depend on target shell and yours is unknown. Add a shebang or a 'shell' directive. Fix this by adding shebang in the beginning of the script. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20230709182800.53002-20-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Athira Rajeev
|
a5f3171b13 |
perf tests lib probe_vfs_getname: Fix shellcheck warnings about missing shebang/local variables
Running shellcheck on probe_vfs_getname fails with below warning: In ./tools/perf/tests/shell/lib/probe_vfs_getname.sh line 1: # Arnaldo Carvalho de Melo <acme@kernel.org>, 2017 ^-- SC2148 (error): Tips depend on target shell and yours is unknown. Add a shebang or a 'shell' directive. In ./tools/perf/tests/shell/lib/probe_vfs_getname.sh line 14: local verbose=$1 ^-----------^ SC3043 (warning): In POSIX sh, 'local' is undefined. Fix this: - by adding shebang in the beginning of the file and - rename variable verbose to "add_probe_verbose" after removing local Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20230709182800.53002-19-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Athira Rajeev
|
b19de09bbe |
perf tests unroll_loop_thread_10: Fix shellcheck warnings about word splitting/quoting
Fix the shellcheck warnings for unroll_loop_thread_10.sh Add quotes to prevent word splitting which are caused by unquoted command expansions. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20230709182800.53002-18-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Athira Rajeev
|
5fe0531205 |
perf tests thread_loop_check_tid_10: Fix shellcheck warnings bout word splitting/quoting
Fix the shellcheck warnings for thread_loop_check_tid_10.sh In ./tools/perf/tests/shell/coresight/thread_loop_check_tid_10.sh line 8: . $(dirname $0)/../lib/coresight.sh ^-----------^ SC2046 (warning): Quote this to prevent word splitting. Add quotes to prevent word splitting which are caused by unquoted command expansions. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20230709182800.53002-17-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Kajol Jain
|
e936584214 |
perf build: Fix shellcheck issue about quotes for check-headers.sh
Running shellcheck on check-headers.sh generates below warning: In check-headers.sh line 126: check_2 "tools/$file" "$file" $* ^-- SC2048 (warning): Use "$@" (with quotes) to prevent whitespace problems. In check-headers.sh line 134: check_2 "tools/perf/trace/beauty/$file" "$file" $* ^-- SC2048 (warning): Use "$@" (with quotes) to prevent whitespace problems. In check-headers.sh line 186: cd tools/perf ^-----------^ SC2164 (warning): Use 'cd ... || exit' or 'cd ... || return' in case cd fails. Fixed the warnings by: - Using "$@" instead of $* - Adding exit condition with cd command Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20230709182800.53002-16-atrajeev@linux.vnet.ibm.com Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Kajol Jain
|
f188b2ce65 |
perf beauty arch_errno_names: Fix shellcheck issue about local variables
Running shellcheck on arch_errno_names.sh generates below warning: In arch_errno_names.sh line 20: local arch="$1" ^--------^ SC3043 (warning): In POSIX sh, 'local' is undefined. ...... In arch_errno_names.sh line 61: local arch ^--------^ SC3043 (warning): In POSIX sh, 'local' is undefined. In arch_errno_names.sh line 67: printf '\t\treturn errno_to_name__%s(err);\n' $(arch_string "$arch") ^--------------------^ SC2046 (warning): Quote this to prevent word splitting. In arch_errno_names.sh line 69: printf '\treturn errno_to_name__%s(err);\n' $(arch_string "$default") ^-----------------------^ SC2046 (warning): Quote this to prevent word splitting. Fixed the warnings by: - Fixing shellcheck warnings for local usage, by removing local from the variable names - Adding quotes to avoid word splitting Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20230709182800.53002-15-atrajeev@linux.vnet.ibm.com Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Kajol Jain
|
05ef238cd0 |
perf tests lib probe: Fix shellcheck warning about about missing shebang
Running shellcheck on probe.sh throws below warning: In lib/probe.sh line 1: ^-- SC2148 (error): Tips depend on target shell and yours is unknown. Add a shebang or a 'shell' directive. Fixed the warnings by adding shell directive. Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20230709182800.53002-14-atrajeev@linux.vnet.ibm.com Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Kajol Jain
|
5f83f1d588 |
perf tests memcpy_thread_16k_10: Fix shellcheck warning about word splitting/quote
Running shellcheck on memcpy_thread_16k_10.sh throws below warning: In memcpy_thread_16k_10.sh line 8: . $(dirname $0)/../lib/coresight.sh ^-----------^ SC2046 (warning): Quote this to prevent word splitting. Fixed the warning by adding quotes to avoid word splitting. ShellCheck result with patch: # shellcheck -S warning coresight/memcpy_thread_16k_10.sh # Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20230709182800.53002-13-atrajeev@linux.vnet.ibm.com Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Kajol Jain
|
60f253ea7f |
perf tests asm_pure_loop: Fix shellcheck warning about word splitting/quote
Running shellcheck on asm_pure_loop.sh throws below warning: In coresight/asm_pure_loop.sh line 8: . $(dirname $0)/../lib/coresight.sh ^-----------^ SC2046 (warning): Quote this to prevent word splitting. Fixed the warning by adding quotes to avoid word splitting. ShellCheck result with patch: # shellcheck -S warning coresight/asm_pure_loop.sh # Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20230709182800.53002-12-atrajeev@linux.vnet.ibm.com Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Athira Rajeev
|
d13841e789 |
perf tests stat+shadow_stat: Fix shellcheck warning about unused variable
Running shellcheck on stat+shadow_stat.sh generates below warning: In tests/shell/stat+shadow_stat.sh line 48: while read cpu num evt hash ipc rest ^--^ SC2034 (warning): hash appears unused. Verify use (or export if used externally). This variable is intentionally unused since it is needed to parse through the output. Use "_" as a prefix for this throw away variable. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20230709182800.53002-11-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Athira Rajeev
|
d10eedd87b |
perf tests stat_bpf_counters: Fix usage of '==' to address shellcheck warning
Running shellcheck on stat_bpf_counter.sh generates below warning: In tests/shell/stat_bpf_counters.sh line 34: if [ "$base_cycles" == "<not" ]; then ^-- SC3014 (warning): In POSIX sh, == in place of = is undefined. In tests/shell/stat_bpf_counters.sh line 39: if [ "$bpf_cycles" == "<not" ]; then ^-- SC3014 (warning): In POSIX sh, == in place of = is undefined. Fix this by using "=" instead of "==" to work for all shells. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20230709182800.53002-10-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Athira Rajeev
|
38b3fa07f1 |
perf tests perf_dat _converter_json: Use quoting to avoid word splitting
Running shellcheck on test_perf_data_converter_json.sh throws below warning: In tests/shell/test_perf_data_converter_json.sh line 42: if [ $(cat "${result}" | wc -l) -gt "0" ] ; then ^------------------------^ SC2046 (warning): Quote this to prevent word splitting. Fixed the warning by adding quotes to avoid word splitting. ShellCheck result with patch: # shellcheck -S warning test_perf_data_converter_json.sh # Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20230709182800.53002-9-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Athira Rajeev
|
faae152aa6 |
perf tests stat+csv_summary: Fix unused variable references detected via shellcheck
Running shellcheck on stat+csv_summary.sh throws below warnings: In tests/shell/stat+csv_summary.sh line 26: while read num event run pct ^-^ SC2034 (warning): num appears unused. Verify use (or export if used externally). ^---^ SC2034 (warning): event appears unused. Verify use (or export if used externally). ^-^ SC2034 (warning): run appears unused. Verify use (or export if used externally). ^-^ SC2034 (warning): pct appears unused. Verify use (or export if used externally). These variables are intentionally unused since they are needed to parse through the output. Use "_" as a prefix for these throw away variables. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20230709182800.53002-8-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Athira Rajeev
|
e9c7c3a109 |
perf tests: Address signal case issues detected via shellcheck
Running shellcheck -S on test_arm_spe_fork.sh throws below warnings: In tests/shell/test_arm_spe_fork.sh line 25: trap cleanup_files exit term int ^--^ SC3049 (warning): In POSIX sh, using lower/mixed case for signal names is undefined. ^--^ SC3049 (warning): In POSIX sh, using lower/mixed case for signal names is undefined. ^-^ SC3049 (warning): In POSIX sh, using lower/mixed case for signal names is undefined. Fixed this issue by using uppercase for "EXIT", "TERM" and "INIT" signals to avoid using lower/mixed case for signal names as input. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20230709182800.53002-7-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Kajol Jain
|
0dd1f81554 |
perf test stat_bpf_counters_cgrp: Fix shellcheck issue about logical operators
Running shellcheck on lock_contention.sh generates below warning: In stat_bpf_counters_cgrp.sh line 28: if [ -d /sys/fs/cgroup/system.slice -a -d /sys/fs/cgroup/user.slice ]; then ^-- SC2166 (warning): Prefer [ p ] && [ q ] as [ p -a q ] is not well defined. In stat_bpf_counters_cgrp.sh line 34: local self_cgrp=$(grep perf_event /proc/self/cgroup | cut -d: -f3) ^-------------^ SC3043 (warning): In POSIX sh, 'local' is undefined. ^-------^ SC2155 (warning): Declare and assign separately to avoid masking return values. ^-- SC2046 (warning): Quote this to prevent word splitting. In stat_bpf_counters_cgrp.sh line 51: local output ^----------^ SC3043 (warning): In POSIX sh, 'local' is undefined. In stat_bpf_counters_cgrp.sh line 65: local output ^----------^ SC3043 (warning): In POSIX sh, 'local' is undefined. Fixed above warnings by: - Changing the expression [p -a q] to [p] && [q]. - Fixing shellcheck warnings for local usage, by prefixing function name to the variable. Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20230709182800.53002-6-atrajeev@linux.vnet.ibm.com Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Kajol Jain
|
edf197cb9d |
perf tests lock_contention: Fix shellcheck issue about quoting to avoid word splitting
Running shellcheck on lock_contention.sh generates below warning: In tests/shell/lock_contention.sh line 24: if [ `id -u` != 0 ]; then ^-----^ SC2046 (warning): Quote this to prevent word splitting. In tests/shell/lock_contention.sh line 160: local type=$(head -1 "${result}" | awk '{ print $8 }' | sed -e 's/:.*//') ^--------^ SC3043 (warning): In POSIX sh, 'local' is undefined. ^--^ SC2155 (warning): Declare and assign separately to avoid masking return values. ^-- SC2046 (warning): Quote this to prevent word splitting. Fixed above warnings by: - Adding quotes to avoid word splitting. - Fixing shellcheck warnings for local usage, by prefixing function name to the variable. Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20230709182800.53002-5-atrajeev@linux.vnet.ibm.com Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Kajol Jain
|
a225c30497 |
perf tests record_offcpu: Fix shellcheck warnings about word splitting/quoting and signal names case
Running shellcheck on record_offcpu.sh throws below warning: In tests/shell/record_offcpu.sh line 13: trap - exit term int ^--^ SC3049 (warning): In POSIX sh, using lower/mixed case for signal names is undefined. ^--^ SC3049 (warning): In POSIX sh, using lower/mixed case for signal names is undefined. ^-^ SC3049 (warning): In POSIX sh, using lower/mixed case for signal names is undefined. In tests/shell/record_offcpu.sh line 20: trap trap_cleanup exit term int ^--^ SC3049 (warning): In POSIX sh, using lower/mixed case for signal names is undefined. ^--^ SC3049 (warning): In POSIX sh, using lower/mixed case for signal names is undefined. ^-^ SC3049 (warning): In POSIX sh, using lower/mixed case for signal names is undefined. In tests/shell/record_offcpu.sh line 25: if [ `id -u` != 0 ] ^-----^ SC2046 (warning): Quote this to prevent word splitting. Fixed the warnings by: - Capitalize signals(INT, TERM, EXIT) to avoid mixed/lower case naming of signals. - Adding quotes to avoid word splitting. Result from shellcheck after patch changes: $ shellcheck -S warning record_offcpu.sh $ Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20230709182800.53002-4-atrajeev@linux.vnet.ibm.com Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Kajol Jain
|
7b485d9468 |
perf tests probe_vfs_getname: Fix shellcheck warnings about word splitting/quoting
Running shellcheck on probe_vfs_getname.sh throws below warning: In tests/shell/probe_vfs_getname.sh line 7: . $(dirname $0)/lib/probe.sh ^-----------^ SC2046 (warning): Quote this to prevent word splitting. In tests/shell/probe_vfs_getname.sh line 11: . $(dirname $0)/lib/probe_vfs_getname.sh ^-----------^ SC2046 (warning): Quote this to prevent word splitting. Fixed the warning by adding quotes to avoid word splitting. ShellCheck result with patch: # shellcheck -S warning probe_vfs_getname.sh # Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20230709182800.53002-3-atrajeev@linux.vnet.ibm.com Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Akanksha J N
|
38beba673b |
perf tests trace+probe_vfs_getname: Fix shellcheck warnings about word splitting/quoting
Running shellcheck -S on probe_vfs_getname.sh, throws below warnings: Before fix: $ shellcheck -S warning trace+probe_vfs_getname.sh In trace+probe_vfs_getname.sh line 13: . $(dirname $0)/lib/probe.sh ^-----------^ SC2046 (warning): Quote this to prevent word splitting. In trace+probe_vfs_getname.sh line 18: . $(dirname $0)/lib/probe_vfs_getname.sh ^-----------^ SC2046 (warning): Quote this to prevent word splitting. In trace+probe_vfs_getname.sh line 21: evts=$(echo $(perf list syscalls:sys_enter_open* 2>/dev/null | grep -E 'open(at)? ' | sed -r 's/.*sys_enter_([a-z]+) +\[.*$/\1/') | sed 's/ /,/') ^-- SC2046 (warning): Quote this to prevent word splitting. Fix the shellcheck warnings by adding quotes to prevent word splitting. Signed-off-by: Akanksha J N <akanksha@linux.ibm.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20230709182800.53002-2-atrajeev@linux.vnet.ibm.com Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Aditya Gupta
|
11cb1ed477 |
perf tests task_analyzer: Check perf build options for libtraceevent support
Currently we depend on output of 'perf record -e "sched:sched_switch"', to check whether perf was built with libtraceevent support. Instead, a more straightforward approach can be to check the build options, using 'perf version --build-options', to check for libtraceevent support. When perf is compiled WITHOUT libtraceevent ('make NO_LIBTRACEEVENT=1'), 'perf version --build-options' outputs (output trimmed): ... libtraceevent: [ OFF ] # HAVE_LIBTRACEEVENT ... While, when perf is compiled WITH libtraceevent, 'perf version --build-options' outputs: ... libtraceevent: [ on ] # HAVE_LIBTRACEEVENT ... Committer notes: Removed one grep in the pipleline by combining the two into just one expression that covers the OFF + HAVE_LIBTRACEEVENT. Suggested-by: Ian Rogers <irogers@google.com> Signed-off-by: Aditya Gupta <adityag@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20230725061649.34937-1-adityag@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ian Rogers
|
c9b57eb8dc |
perf parse-events: Remove array remnants
parse_events_array was set up by event term parsing, which no longer exists. Remove this struct and references to it. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: He Kuang <hekuang@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Herring <robh@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Cc: Wang ShaoBo <bobo.shaobowang@huawei.com> Cc: YueHaibing <yuehaibing@huawei.com> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230728001212.457900-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ian Rogers
|
30f4ade33d |
perf tools: Revert enable indices setting syntax for BPF map
This reverts commit
|
||
Ian Rogers
|
c76a1444c0 |
perf parse-event: Avoid BPF test SEGV
loc is passed as NULL in tools/perf/tests/bpf.c do_test, meaning errors trigger a SEGV when trying to access. Add the missing NULL check. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: He Kuang <hekuang@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Herring <robh@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Cc: Wang ShaoBo <bobo.shaobowang@huawei.com> Cc: YueHaibing <yuehaibing@huawei.com> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230728001212.457900-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Namhyung Kim
|
c7e97f215a |
perf build: Include generated header files properly
The flex and bison generate header files from the source. When user
specified a build directory with O= option, it'd generate files under
the directory. The build command has -I option to specify the header
include directory.
But the -I option only affects the files included like <...>. Let's
change the flex and bison headers to use it instead of "...".
Fixes:
|
||
Namhyung Kim
|
7822a8913f |
perf build: Update build rule for generated files
The bison and flex generate C files from the source (.y and .l)
files. When O= option is used, they are saved in a separate directory
but the default build rule assumes the .C files are in the source
directory. So it might read invalid file if there are generated files
from an old version. The same is true for the pmu-events files.
For example, the following command would cause a build failure:
$ git checkout v6.3
$ make -C tools/perf # build in the same directory
$ git checkout v6.5-rc2
$ mkdir build # create a build directory
$ make -C tools/perf O=build # build in a different directory but it
# refers files in the source directory
Let's update the build rule to specify those cases explicitly to depend
on the files in the output directory.
Note that it's not a complete fix and it needs the next patch for the
include path too.
Fixes:
|
||
Ian Rogers
|
f776b0435e |
perf build: Remove -Wno-redundant-decls in 2 cases
Properly fix a warning and remove the -Wno-redundant-decls C flag. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Gaosheng Cui <cuigaosheng1@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Herring <robh@kernel.org> Cc: Tom Rix <trix@redhat.com> Cc: bpf@vger.kernel.org Cc: llvm@lists.linux.dev Link: https://lore.kernel.org/r/20230728064917.767761-7-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ian Rogers
|
ddc8e4c966 |
perf build: Disable fewer bison warnings
If bison is version 3.8.2, reduce the number of bison C warnings disabled. Earlier bison versions have all C warnings disabled. Avoid implicit declarations of yylex by adding the declaration in the C file. A header can't be included as a circular dependency would occur due to the lexer using the bison defined tokens. Committer notes: Some recent versions of gcc and clang (noticed on Alpine Linux 3.17, edge, clearlinux, fedora 37, etc. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Gaosheng Cui <cuigaosheng1@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Herring <robh@kernel.org> Cc: Tom Rix <trix@redhat.com> Cc: bpf@vger.kernel.org Cc: llvm@lists.linux.dev Link: https://lore.kernel.org/r/20230728064917.767761-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ian Rogers
|
10c775afa5 |
perf build: Disable fewer flex warnings
If flex is version 2.6.4, reduce the number of flex C warnings disabled. Earlier flex versions have all C warnings disabled. Committer notes: Added this to the list of ignored warnings to get it building on a Fedora 36 machine with flex 2.6.4: -Wno-misleading-indentation Noticed when building with: $ make LLVM=1 -C tools/perf NO_BPF_SKEL=1 DEBUG=1 Take two: We can't just try to canonicalize flex versions by just removing the dots, as we end up with: 2.6.4 >= 2.5.37 becoming: 264 >= 2537 Failing the build on flex 2.5.37, so instead use the back to the past added $(call version_ge3,$(FLEX_VERSION),2.6.4) variant to check for that. Making sure $(FLEX_VERSION) keeps the dots as we may want to use 'sort -V' or something nicer when available everywhere. Some other tweaks for other flex versions and combinations with gcc and clang versions were added, notes on the patch. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Gaosheng Cui <cuigaosheng1@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Herring <robh@kernel.org> Cc: Tom Rix <trix@redhat.com> Cc: bpf@vger.kernel.org Cc: llvm@lists.linux.dev Link: https://lore.kernel.org/r/20230728064917.767761-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ian Rogers
|
07d2b820fd |
perf test parse-events: Test complex name has required event format
test__checkevent_complex_name will use an "event" format which if not
present, such as with a placeholder PMU, will cause test failures. Skip
the test in this case to avoid failures in restricted environments.
Add perf_pmu__has_format utility as a general PMU utility.
Fixes:
|
||
Ian Rogers
|
34bc65d6d8 |
perf pmus: Create placholder regardless of scanning core_only
If scanning all PMUs the placeholder is still necessary if no core PMU
is found. This situation occurs in perf test's parse-events test,
when uncore events appear before core.
Fixes:
|
||
Ian Rogers
|
e5764ae4c9 |
perf build: Add Wextra for C++ compilation
Commit
|
||
Ian Rogers
|
435bea0a45 |
perf build: Don't always set -funwind-tables and -ggdb3
Commit
|
||
Ian Rogers
|
1134f290d0 |
perf bpf-loader: Remove unneeded diagnostic pragma
Added during the progress to libbpf 1.0 the deprecated functions are no longer used and so the pragma can be removed. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Gaosheng Cui <cuigaosheng1@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Herring <robh@kernel.org> Cc: Tom Rix <trix@redhat.com> Cc: bpf@vger.kernel.org Cc: llvm@lists.linux.dev Link: https://lore.kernel.org/r/20230728064917.767761-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Jing Zhang
|
3e65bd1374 |
perf vendor events arm64: Add JSON metrics for Yitian 710 DDR
Add JSON metrics for T-HEAD Yitian 710 SoC DDR. Reviewed-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Shuai Xue <xueshuai@linux.alibaba.com> Cc: Will Deacon <will@kernel.org> Cc: Zhuo Song <zhuo.song@linux.alibaba.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-doc@vger.kernel.org Link: https://lore.kernel.org/r/1690528175-2499-3-git-send-email-renyu.zj@linux.alibaba.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Jing Zhang
|
24069d8112 |
perf jevents: Add support for Yitian 710 DDR PMU (arm64) aliasing
Add alias support for T-HEAD Yitian 710 SoC DDR PMU events. Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com> Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Zhuo Song <zhuo.song@linux.alibaba.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-doc@vger.kernel.org Link: https://lore.kernel.org/r/1690528175-2499-2-git-send-email-renyu.zj@linux.alibaba.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
James Clark
|
404e077a16 |
perf tools: Add a place to put kernel config fragments for test runs
Defconfig doesn't give full coverage for a perf test run, so these can be merged with defconfig to do so. It's not complete yet, but is a starting point as a place to add to when a specific test needs something extra to run. Signed-off-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Aishwarya.TCV@arm.com Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Brown <broonie@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230628105303.4053478-2-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Anup Sharma
|
f9f72b2ab7 |
perf scripts python: Add command execution for gecko script
This will enable the execution of gecko.py script using record and report commands in 'perf script'. And this will be also reflected at "perf script -l" command. For Example: perf script record gecko perf script report gecko Committer notes: As discussed on the perf tools office hours, I made -F 99 the default for the record script and removed the double -- on the report script so that the existing 'perf script' protocol for the combined operation: # perf script gecko Works, i.e. the record script pipes its stdout into the stdin of the report script, basically: /bin/sh /usr/libexec/perf-core/scripts/python/bin/gecko-record -F 99 -g -a -q -o - | \ /bin/sh /usr/libexec/perf-core/scripts/python/bin/gecko-report -i - Testing it: The resulting JSON file needs to be uploaded to https://profiler.firefox.com, Anup already has code to start a local http server on the trace_begin handler of the gecko python script, start firefox and feed it the JSON. The example below only collects sample for the specified workload, so that we don't produce thousands of lines, to collect system wide samples, use instead: # perf script gecko -a sleep 0.5 # nohup perf script gecko sleep 0.5 { "meta": { "interval": 1, "processType": 0, "product": "x86_64 GNU/Linux", "stackwalk": 1, "debug": 0, "gcpoison": 0, "asyncstack": 1, "startTime": 274601692.636, "shutdownTime": null, "version": 24, "presymbolicated": true, "categories": [ { "name": "User", "color": "yellow", "subcategories": [ "Other" ] }, { "name": "Kernel", "color": "orange", "subcategories": [ "Other" ] } ], "markerSchema": [] }, "libs": [], "threads": [ { "tid": |
||
Anup Sharma
|
2d889c6af1 |
perf scripts python: Implement add sample function and thread processing
The stack has been created for storing func and dso from the callchain. The sample has been added to a specific thread. It first checks if the thread exists in the Thread class. Then it call _add_sample function which is responsible for appending a new entry to the samples list. Also callchain parsing and storing part is implemented. Moreover removed the comment from thread. Signed-off-by: Anup Sharma <anupnewsmail@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/5a112be85ccdcdcd611e343f6a7a7482d01f6299.1689961706.git.anupnewsmail@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Anup Sharma
|
258dfd41c1 |
perf scripts python: Implement add sample function and thread processing
The intern_stack function is responsible for retrieving or creating a stack_id based on the provided frame_id and prefix_id. It first generates a key using the frame_id and prefix_id values. If the stack corresponding to the key is found in the stackMap, it is returned. Otherwise, a new stack is created by appending the prefix_id and frame_id to the stackTable. The key and the index of the newly created stack are added to the stackMap for future reference. The _intern_frame function is responsible for retrieving or creating a frame_id based on the provided frame string. If the frame_id corresponding to the frameString is found in the frameMap, it is returned. Otherwise, a new frame is created by appending relevant information to the frameTable and adding the frameString to the string_id through _intern_string. The _intern_string function will gets a matching string, or saves the new string and returns a String ID. Signed-off-by: Anup Sharma <anupnewsmail@gmail.com> Link: https://lore.kernel.org/r/4442f4b1ab4c7317cf940560a3a285fcdfbeeb08.1689961706.git.anupnewsmail@gmail.com Cc: Mark Rutland <mark.rutland@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-kernel@vger.kernel.org Cc: linux-perf-users@vger.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Anup Sharma
|
833daec7e6 |
perf scripts python: Add trace end processing and PRODUCT and CATEGORIES information
The final output will now be presented in JSON format following the Gecko profile structure. Additionally, the inclusion of PRODUCT allows easy retrieval of header information for UI. Furthermore, CATEGORIES have been introduced to enable customization of kernel and user colors using input arguments. To facilitate this functionality, an argparse-based parser has been implemented. Note: The implementation of threads will be addressed in subsequent commits for now I have commented it out. Signed-off-by: Anup Sharma <anupnewsmail@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/fa6d027e4134c48e8a2ea45dd8f6b21e6a3418e4.1689961706.git.anupnewsmail@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Anup Sharma
|
5aacd7f08a |
perf scripts python: Add classes and conversion functions
This commit introduces new classes and conversion functions to facilitate the representation of Gecko profile information. The new classes Frame, Stack, Sample, and Thread are added to handle specific components of the profile data, also link to the origin docs has been commented out. Additionally, Inside the Thread class _to_json_dict() method has been created that converts the current thread data into the corresponding format expected by the GeckoThread JSON schema, as per the Gecko profile format specification. Signed-off-by: Anup Sharma <anupnewsmail@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/ab7b40bd32df7101a6f8b4a3aa41570b63b831ac.1689961706.git.anupnewsmail@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Anup Sharma
|
0a02e44cc2 |
perf scripts python: Extact necessary information from process event
The script takes in a sample event dictionary(param_dict) and retrieves relevant data such as time stamp, PID, TID, and comm for each event. Also start time is defined as a global variable as it need to be passed to trace_end for gecko meta information field creation. Signed-off-by: Anup Sharma <anupnewsmail@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/19910fefcfe4be03cd5c2aa3fec11d3f86c0381b.1689961706.git.anupnewsmail@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Anup Sharma
|
1699d3efe1 |
perf scripts python: Add initial script file with usage information
Added necessary modules, including the Perf-Trace-Util library, and defines the required functions and variables for using perf script python. The perf_trace_context and Core modules for tracing and processing events has been also imported. Added usage information. Signed-off-by: Anup Sharma <anupnewsmail@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/f2f1a62f1cc69f44a5414da46a26a4cf124d2744.1689961706.git.anupnewsmail@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Xiu Jianfeng
|
1e37201405 |
perf doc: Fix typo in perf.data-file-format.txt
The 'it' should be 'is' here, fix it. Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230727105001.261420-1-xiujianfeng@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Namhyung Kim
|
69a87a32f5 |
perf machine: Include data symbols in the kernel map
When 'perf record -d' is used, it needs data mmaps to symbolize global data. But it missed to collect kernel data maps so it cannot symbolize them. Instead of having a separate map, just increase the kernel map size to include the data section. Probably we can have a separate kernel map for data, but the current code assumes a single kernel map. So it'd require more changes in other places and looks error-prone. I decided not to go that way for now. Also it seems the kernel module size already includes the data section. For example, my system has the following. $ grep -e _stext -e _etext -e _edata /proc/kallsyms ffffffff99800000 T _stext ffffffff9a601ac8 T _etext ffffffff9b446a00 D _edata Size of the text section is (0x9a601ac8 - 0x99800000 = 0xe01ac8) and size including data section is (0x9b446a00 - 0x99800000 = 0x1c46a00). Before: $ perf record -d true $ perf report -D | grep MMAP | head -1 0 0 0x460 [0x60]: PERF_RECORD_MMAP -1/0: [0xffffffff99800000(0xe01ac8) @ 0xffffffff99800000]: x [kernel.kallsyms]_text ^^^^^^^^ here After: $ perf report -D | grep MMAP | head -1 0 0 0x460 [0x60]: PERF_RECORD_MMAP -1/0: [0xffffffff99800000(0x1c46a00) @ 0xffffffff99800000]: x [kernel.kallsyms]_text ^^^^^^^^^ Instead of just replacing it to _edata, try _edata first and then fall back to _etext just in case. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230725001929.368041-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Namhyung Kim
|
f9dd531c5b |
perf symbols: Add kallsyms__get_symbol_start()
The kallsyms__get_symbol_start() to get any symbol address from kallsyms. The existing kallsyms__get_function_start() only allows text symbols so create this to allow data symbols too. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230725001929.368041-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ian Rogers
|
4c11adff67 |
perf parse-events: Remove ABORT_ON
Prefer informative messages rather than none with ABORT_ON. Document one failure mode and add an error message for another. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230627181030.95608-14-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ian Rogers
|
81a4e31f8c |
perf parse-events: Improve location for add pmu
Improve the location for add PMU for cases when PMUs aren't found. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230627181030.95608-13-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ian Rogers
|
d81fa63b09 |
perf parse-events: Populate error column for BPF/tracepoint events
Follow convention from parse_events_terms__num/str and pass the YYLTYPE for the location. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230627181030.95608-12-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ian Rogers
|
b30d4f0b69 |
perf parse-events: Additional error reporting
When no events or PMUs match report an error for event_pmu:
Before:
```
$ perf stat -e 'asdfasdf' -a sleep 1
Run 'perf list' for a list of valid events
Usage: perf stat [<options>] [<command>]
-e, --event <event> event selector. use 'perf list' to list available events
```
After:
```
$ perf stat -e 'asdfasdf' -a sleep 1
event syntax error: 'asdfasdf'
\___ Bad event name
Unabled to find PMU or event on a PMU of 'asdfasdf'
Run 'perf list' for a list of valid events
Usage: perf stat [<options>] [<command>]
-e, --event <event> event selector. use 'perf list' to list available events
```
Fixes the inadvertent removal when hybrid parsing was modified.
Fixes:
|
||
Ian Rogers
|
b52cb995f1 |
perf parse-events: Separate ENOMEM memory handling
Add PE_ABORT that will YYNOMEM or YYABORT accordingly. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230627181030.95608-10-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ian Rogers
|
77cdd787fc |
perf parse-events: Move instances of YYABORT to YYNOMEM
Migration to improve error reporting as YYABORT cases should carry event parsing errors. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230627181030.95608-9-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ian Rogers
|
a7a3252dad |
perf parse-events: Separate YYABORT and YYNOMEM cases
Split cases in event_pmu for greater accuracy. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230627181030.95608-8-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ian Rogers
|
9462e4de62 |
perf parse-event: Add memory allocation test for name terms
If the name memory allocation fails then propagate to the parser. Committer notes: Use $(BISON_FALLBACK_FLAGS) on the bison call so that we continue building with older bison versions, before 3.81, where YYNOMEM isn't present. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230627181030.95608-7-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Arnaldo Carvalho de Melo
|
88cc47e245 |
perf build: Define YYNOMEM as YYNOABORT for bison < 3.81
YYNOMEM was introduced in bison 3.81, so define it as YYABORT for older versions, which should provide the previous perf behaviour. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Georg Müller
|
98ce8e4a9d |
perf test uprobe_from_different_cu: Skip if there is no gcc
Without gcc, the test will fail.
On cleanup, ignore probe removal errors. Otherwise, in case of an error
adding the probe, the temporary directory is not removed.
Fixes:
|
||
Ian Rogers
|
b161f25fa3 |
perf parse-events: Only move force grouped evsels when sorting
Prior to this change, events without a group would be sorted as if they
were from the location of the first event without a group. For example
instructions and cycles are without a group:
instructions,{imc_free_running/data_read/,imc_free_running/data_write/},cycles
parse events would create an eventual evlist like:
instructions,cycles,{uncore_imc_free_running_0/data_read/,uncore_imc_free_running_1/data_read/,uncore_imc_free_running_0/data_write/,uncore_imc_free_running_1/data_write/}
This is done so that perf metric events, that must always be in a
group, will be adjacent and so can be forced into a group.
This change modifies the sorting so that only force grouped events,
like perf metrics, are sorted and all other events keep their position
with respect to groups in the evlist. The location of the force
grouped event is chosen to match the first force grouped event.
For architectures without force grouped events, ie anything not Intel
Icelake or newer, this should mean sorting and fixing doesn't modify
the event positions except when fixing the grouping for PMUs of things
like uncore events.
Fixes:
|
||
Ian Rogers
|
e8d38345da |
perf parse-events: When fixing group leaders always set the leader
The evsel grouping fix iterates over evsels tracking the leader group
and the current position's group, updating the current position's leader
if an evsel is being forced into a group or groups changed. However,
groups changing isn't a sufficient condition as sorting may have
reordered events and the leader may no longer come first. For this
reason update all leaders whenever they disagree.
This change breaks certain Icelake+ metrics due to bugs in the
kernel. For example, tma_l3_bound with threshold enabled tries to
program the events:
{topdown-retiring,slots,CYCLE_ACTIVITY.STALLS_L2_MISS,topdown-fe-bound,EXE_ACTIVITY.BOUND_ON_STORES,EXE_ACTIVITY.1_PORTS_UTIL,topdown-be-bound,cpu/INT_MISC.RECOVERY_CYCLES,cmask=1,edge/,CYCLE_ACTIVITY.STALLS_L3_MISS,CPU_CLK_UNHALTED.THREAD,CYCLE_ACTIVITY.STALLS_MEM_ANY,EXE_ACTIVITY.2_PORTS_UTIL,CYCLE_ACTIVITY.STALLS_TOTAL,topdown-bad-spec}:W
fixing the perf metric event order gives:
{slots,topdown-retiring,topdown-fe-bound,topdown-be-bound,topdown-bad-spec,CYCLE_ACTIVITY.STALLS_L2_MISS,EXE_ACTIVITY.BOUND_ON_STORES,EXE_ACTIVITY.1_PORTS_UTIL,cpu/INT_MISC.RECOVERY_CYCLES,cmask=1,edge/,CYCLE_ACTIVITY.STALLS_L3_MISS,CPU_CLK_UNHALTED.THREAD,CYCLE_ACTIVITY.STALLS_MEM_ANY,EXE_ACTIVITY.2_PORTS_UTIL,CYCLE_ACTIVITY.STALLS_TOTAL}:W
Both of these return "<not counted>" for all events, whilst they work
with the group removed respecting that the perf metric events must still
be grouped. A vendor events update will need to add METRIC_NO_GROUP to
these metrics to workaround the kernel PMU driver issue.
Fixes:
|
||
Ian Rogers
|
5c49b6c3f2 |
perf parse-events: Extra care around force grouped events
Perf metric (topdown) events on Intel Icelake+ machines require a
group, however, they may be next to events that don't require a group.
Consider:
cycles,slots,topdown-fe-bound
The cycles event needn't be grouped but slots and topdown-fe-bound need
grouping.
Prior to this change, as slots and topdown-fe-bound need a group forcing
and all events share the same PMU, slots and topdown-fe-bound would be
forced into a group with cycles.
This is a bug on two fronts, cycles wasn't supposed to be grouped and
cycles can't be a group leader with a perf metric event.
This change adds recognition that cycles isn't force grouped and so it
shouldn't be force grouped with slots and topdown-fe-bound.
Fixes:
|
||
Ian Rogers
|
93d7e9c8fb |
perf parse-events: Avoid regrouped warning for wild card events
There is logic to avoid printing the regrouping warning for wild card PMUs, this logic also needs to apply for wild card events. Before: ``` $ perf stat -e '{data_read,data_write}' -a sleep 1 WARNING: events were regrouped to match PMUs Performance counter stats for 'system wide': 2,979.16 MiB data_read 410.26 MiB data_write 1.001541923 seconds time elapsed ``` After: ``` $ perf stat -e '{data_read,data_write}' -a sleep 1 Performance counter stats for 'system wide': 2,975.94 MiB data_read 432.05 MiB data_write 1.001119499 seconds time elapsed ``` Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230627181030.95608-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ian Rogers
|
22881e2b45 |
perf parse-events: Add more comments to 'struct parse_events_state'
Improve documentation of 'struct parse_events_state'. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230627181030.95608-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ian Rogers
|
7e34daa550 |
perf parse-events: Remove two unused tokens
The tokens PE_PREFIX_RAW and PE_PREFIX_GROUP are unused so remove them. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230627181030.95608-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ian Rogers
|
bf7d46b3a0 |
perf parse-events: Remove unused PE_KERNEL_PMU_EVENT token
Removed by commit
|
||
Ian Rogers
|
84efbdb7fb |
perf parse-events: Remove unused PE_PMU_EVENT_FAKE token
Removed by commit
|
||
Ian Rogers
|
c126ac4a20 |
perf build: Add LTO build option
Add an LTO build option, that sets the appropriate CFLAGS and CXXFLAGS values. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Carsten Haitzler <carsten.haitzler@arm.com> Cc: Fangrui Song <maskray@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Tom Rix <trix@redhat.com> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: bpf@vger.kernel.org Cc: llvm@lists.linux.dev Link: https://lore.kernel.org/r/20230724201247.748146-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ian Rogers
|
5cfb0cc0d9 |
perf test: Avoid weak symbol for arch_tests
GCC LTO will complain that the array length varies for the arch_tests weak symbol. Use extern/static and architecture determining #if to workaround this problem. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Carsten Haitzler <carsten.haitzler@arm.com> Cc: Fangrui Song <maskray@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Tom Rix <trix@redhat.com> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: bpf@vger.kernel.org Cc: llvm@lists.linux.dev Link: https://lore.kernel.org/r/20230724201247.748146-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ian Rogers
|
0f97a3a0de |
perf parse-events: Avoid use uninitialized warning
With GCC LTO a potential use uninitialized is spotted: ``` In function ‘parse_events_config_bpf’, inlined from ‘parse_events_load_bpf’ at util/parse-events.c:874:8: util/parse-events.c:792:37: error: ‘error_pos’ may be used uninitialized [-Werror=maybe-uninitialized] 792 | idx = term->err_term + error_pos; | ^ util/parse-events.c: In function ‘parse_events_load_bpf’: util/parse-events.c:765:13: note: ‘error_pos’ was declared here 765 | int error_pos; | ^ ``` So initialize at declaration. Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Carsten Haitzler <carsten.haitzler@arm.com> Cc: Fangrui Song <maskray@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Tom Rix <trix@redhat.com> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: bpf@vger.kernel.org Cc: llvm@lists.linux.dev Link: https://lore.kernel.org/r/20230724201247.748146-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ian Rogers
|
91f88a0ac8 |
perf stat: Avoid uninitialized use of perf_stat_config
perf_event__read_stat_config will assign values based on number of tags and tag values. Initialize the structs to zero before they are assigned so that no uninitialized values can be seen. This potential error was reported by GCC with LTO enabled. Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Carsten Haitzler <carsten.haitzler@arm.com> Cc: Fangrui Song <maskray@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Tom Rix <trix@redhat.com> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: bpf@vger.kernel.org Cc: llvm@lists.linux.dev Link: https://lore.kernel.org/r/20230724201247.748146-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Athira Rajeev
|
341e0e9f59 |
perf callchain powerpc: Fix addr location init during arch_skip_callchain_idx function
'perf record; with callchain recording fails as below in powerpc: ./perf record -a -gR sleep 10 ./perf report perf: Segmentation fault gdb trace points to thread__find_map 0 0x00000000101df314 in atomic_cmpxchg (newval=1818846826, oldval=1818846827, v=0x1001a8f3) at /home/athira/linux/tools/include/asm-generic/atomic-gcc.h:70 1 refcount_sub_and_test (i=1, r=0x1001a8f3) at /home/athira/linux/tools/include/linux/refcount.h:135 2 refcount_dec_and_test (r=0x1001a8f3) at /home/athira/linux/tools/include/linux/refcount.h:148 3 map__put (map=0x1001a8b3) at util/map.c:311 4 0x000000001016842c in __map__zput (map=0x7fffffffa368) at util/map.h:190 5 thread__find_map (thread=0x105b92f0, cpumode=<optimized out>, addr=13835058055283572736, al=al@entry=0x7fffffffa358) at util/event.c:582 6 0x000000001016882c in thread__find_symbol (thread=<optimized out>, cpumode=<optimized out>, addr=<optimized out>, al=0x7fffffffa358) at util/event.c:656 7 0x00000000102e12b4 in arch_skip_callchain_idx (thread=<optimized out>, chain=<optimized out>) at arch/powerpc/util/skip-callchain-idx.c:255 8 0x00000000101d3bf4 in thread__resolve_callchain_sample (thread=0x105b92f0, cursor=0x1053d160, evsel=<optimized out>, sample=0x7fffffffa908, parent=0x7fffffffa778, root_al=0x7fffffffa710, max_stack=<optimized out>) at util/machine.c:2940 9 0x00000000101cd210 in sample__resolve_callchain (sample=<optimized out>, cursor=<optimized out>, parent=<optimized out>, evsel=<optimized out>, al=<optimized out>, max_stack=<optimized out>) at util/callchain.c:1112 10 0x000000001022a9d8 in hist_entry_iter__add (iter=0x7fffffffa750, al=0x7fffffffa710, max_stack_depth=<optimized out>, arg=0x7fffffffbbd0) at util/hist.c:1232 11 0x0000000010056d98 in process_sample_event (tool=0x7fffffffbbd0, event=0x7ffff6223c38, sample=0x7fffffffa908, evsel=<optimized out>, machine=0x10524ef8) at builtin-report.c:332 Here arch_skip_callchain_idx calls thread__find_symbol and which invokes thread__find_map with uninitialised "addr_location". Snippet: thread__find_symbol(thread, PERF_RECORD_MISC_USER, ip, &al); Recent change with commit |
||
Haixin Yu
|
9754353d0a |
perf pmu arm64: Fix reading the PMU cpu slots in sysfs
Commit |
||
Lu Hongfei
|
681f34d52b |
perf diff: Replaces some ',' as separator with the more usual ';'
When wrapping code, use ';' better than using ',' which is more in line with the coding habits of most engineers. Signed-off-by: Lu Hongfei <luhongfei@vivo.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: opensource.kernel@vivo.com Link: https://lore.kernel.org/r/20230706094635.1553-1-luhongfei@vivo.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Arnaldo Carvalho de Melo
|
7b47623b8c |
perf bench uprobe trace_printk: Add entry attaching an BPF program that does a trace_printk
[root@five ~]# perf bench uprobe all # Running uprobe/baseline benchmark... # Executed 1,000 usleep(1000) calls Total time: 1,053,963 usecs 1,053.963 usecs/op # Running uprobe/empty benchmark... # Executed 1,000 usleep(1000) calls Total time: 1,056,293 usecs +2,330 to baseline 1,056.293 usecs/op 2.330 usecs/op to baseline # Running uprobe/trace_printk benchmark... # Executed 1,000 usleep(1000) calls Total time: 1,056,977 usecs +3,014 to baseline +684 to previous 1,056.977 usecs/op 3.014 usecs/op to baseline 0.684 usecs/op to previous [root@five ~]# Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andre Fredette <anfredet@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: Dave Tucker <datucker@redhat.com> Cc: Derek Barbosa <debarbos@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/20230719204910.539044-6-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Arnaldo Carvalho de Melo
|
6af5e4cf3a |
perf bench uprobe empty: Add entry attaching an empty BPF program
Using libbpf and a BPF skel: # perf bench uprobe all # Running uprobe/baseline benchmark... # Executed 1,000 usleep(1000) calls Total time: 1,055,618 usecs 1,055.618 usecs/op # Running uprobe/empty benchmark... # Executed 1,000 usleep(1000) calls Total time: 1,057,146 usecs +1,528 to baseline 1,057.146 usecs/op # Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andre Fredette <anfredet@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: Dave Tucker <datucker@redhat.com> Cc: Derek Barbosa <debarbos@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/20230719204910.539044-5-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Arnaldo Carvalho de Melo
|
54d811023b |
perf bench uprobe: Show diff to previous
Will be useful to show the incremental overhead as we do more stuff in the BPF program attached to the uprobes. Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andre Fredette <anfredet@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: Dave Tucker <datucker@redhat.com> Cc: Derek Barbosa <debarbos@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/20230719204910.539044-4-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Arnaldo Carvalho de Melo
|
dded6f615b |
perf bench uprobe: Print diff to baseline
This is just prep work to show the diff to the unmodified workload. Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andre Fredette <anfredet@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: Dave Tucker <datucker@redhat.com> Cc: Derek Barbosa <debarbos@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/20230719204910.539044-3-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Arnaldo Carvalho de Melo
|
2df2707164 |
perf bench uprobe: Add benchmark to test uprobe overhead
This just adds the initial "workload", a call to libc's usleep(1000us) function: $ perf stat --null perf bench uprobe all # Running uprobe/baseline benchmark... # Executed 1000 usleep(1000) calls Total time: 1053533 usecs 1053.533 usecs/op Performance counter stats for 'perf bench uprobe all': 1.061042896 seconds time elapsed 0.001079000 seconds user 0.006499000 seconds sys $ More entries will be added using a BPF skel to add various uprobes to the usleep() function. Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andre Fredette <anfredet@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: Dave Tucker <datucker@redhat.com> Cc: Derek Barbosa <debarbos@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/20230719204910.539044-2-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Arnaldo Carvalho de Melo
|
fcca1faf11 |
perf trace: Free thread_trace->files table
The fd->pathname table that is kept in 'struct thread_trace' and thus in thread->priv must be freed when a thread is deleted. This was also detected using -fsanitize=address. Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/20230719202951.534582-6-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Arnaldo Carvalho de Melo
|
7962ef1365 |
perf trace: Really free the evsel->priv area
In |
||
Arnaldo Carvalho de Melo
|
9de251cb50 |
perf trace: Register a thread priv destructor
To plug these leaks detected with: $ make EXTRA_CFLAGS="-fsanitize=address" BUILD_BPF_SKEL=1 CORESIGHT=1 O=/tmp/build/perf-tools-next -C tools/perf install-bin ================================================================= ==473890==ERROR: LeakSanitizer: detected memory leaks Direct leak of 112 byte(s) in 1 object(s) allocated from: #0 0x7fdf19aba097 in calloc (/lib64/libasan.so.8+0xba097) #1 0x987836 in zalloc (/home/acme/bin/perf+0x987836) #2 0x5367ae in thread_trace__new /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:1289 #3 0x5367ae in thread__trace /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:1307 #4 0x5367ae in trace__sys_exit /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:2468 #5 0x52bf34 in trace__handle_event /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:3177 #6 0x52bf34 in __trace__deliver_event /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:3685 #7 0x542927 in trace__deliver_event /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:3712 #8 0x542927 in trace__run /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:4055 #9 0x542927 in cmd_trace /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:5141 #10 0x5ef1a2 in run_builtin /home/acme/git/perf-tools-next/tools/perf/perf.c:323 #11 0x4196da in handle_internal_command /home/acme/git/perf-tools-next/tools/perf/perf.c:377 #12 0x4196da in run_argv /home/acme/git/perf-tools-next/tools/perf/perf.c:421 #13 0x4196da in main /home/acme/git/perf-tools-next/tools/perf/perf.c:537 #14 0x7fdf18a4a50f in __libc_start_call_main (/lib64/libc.so.6+0x2750f) Direct leak of 2048 byte(s) in 1 object(s) allocated from: #0 0x7f788fcba6af in __interceptor_malloc (/lib64/libasan.so.8+0xba6af) #1 0x5337c0 in trace__sys_enter /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:2342 #2 0x52bfb4 in trace__handle_event /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:3191 #3 0x52bfb4 in __trace__deliver_event /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:3699 #4 0x542883 in trace__deliver_event /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:3726 #5 0x542883 in trace__run /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:4069 #6 0x542883 in cmd_trace /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:5155 #7 0x5ef232 in run_builtin /home/acme/git/perf-tools-next/tools/perf/perf.c:323 #8 0x4196da in handle_internal_command /home/acme/git/perf-tools-next/tools/perf/perf.c:377 #9 0x4196da in run_argv /home/acme/git/perf-tools-next/tools/perf/perf.c:421 #10 0x4196da in main /home/acme/git/perf-tools-next/tools/perf/perf.c:537 #11 0x7f788ec4a50f in __libc_start_call_main (/lib64/libc.so.6+0x2750f) Indirect leak of 48 byte(s) in 1 object(s) allocated from: #0 0x7fdf19aba6af in __interceptor_malloc (/lib64/libasan.so.8+0xba6af) #1 0x77b335 in intlist__new util/intlist.c:116 #2 0x5367fd in thread_trace__new /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:1293 #3 0x5367fd in thread__trace /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:1307 #4 0x5367fd in trace__sys_exit /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:2468 #5 0x52bf34 in trace__handle_event /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:3177 #6 0x52bf34 in __trace__deliver_event /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:3685 #7 0x542927 in trace__deliver_event /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:3712 #8 0x542927 in trace__run /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:4055 #9 0x542927 in cmd_trace /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:5141 #10 0x5ef1a2 in run_builtin /home/acme/git/perf-tools-next/tools/perf/perf.c:323 #11 0x4196da in handle_internal_command /home/acme/git/perf-tools-next/tools/perf/perf.c:377 #12 0x4196da in run_argv /home/acme/git/perf-tools-next/tools/perf/perf.c:421 #13 0x4196da in main /home/acme/git/perf-tools-next/tools/perf/perf.c:537 #14 0x7fdf18a4a50f in __libc_start_call_main (/lib64/libc.so.6+0x2750f) Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/20230719202951.534582-4-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Arnaldo Carvalho de Melo
|
04cb4fc4d4 |
perf thread: Allow tools to register a thread->priv destructor
So that when thread__delete() runs it can be called and free stuff tools stashed into thread->priv, like 'perf trace' does and will use this new facility to plug some leaks. Added an assert(thread__priv_destructor == NULL) as suggested in Ian's review. Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/CAP-5=fV3Er=Ek8=iE=bSGbEBmM56_PJffMWot1g_5Bh8B5hO7A@mail.gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Arnaldo Carvalho de Melo
|
3f6a74bd62 |
perf evsel: Free evsel->filter on the destructor
Noticed with: make EXTRA_CFLAGS="-fsanitize=address" BUILD_BPF_SKEL=1 CORESIGHT=1 O=/tmp/build/perf-tools-next -C tools/perf install-bin Direct leak of 45 byte(s) in 1 object(s) allocated from: #0 0x7f213f87243b in strdup (/lib64/libasan.so.8+0x7243b) #1 0x63d15f in evsel__set_filter util/evsel.c:1371 #2 0x63d15f in evsel__append_filter util/evsel.c:1387 #3 0x63d15f in evsel__append_tp_filter util/evsel.c:1400 #4 0x62cd52 in evlist__append_tp_filter util/evlist.c:1145 #5 0x62cd52 in evlist__append_tp_filter_pids util/evlist.c:1196 #6 0x541e49 in trace__set_filter_loop_pids /home/acme/git/perf-tools/tools/perf/builtin-trace.c:3646 #7 0x541e49 in trace__set_filter_pids /home/acme/git/perf-tools/tools/perf/builtin-trace.c:3670 #8 0x541e49 in trace__run /home/acme/git/perf-tools/tools/perf/builtin-trace.c:3970 #9 0x541e49 in cmd_trace /home/acme/git/perf-tools/tools/perf/builtin-trace.c:5141 #10 0x5ef1a2 in run_builtin /home/acme/git/perf-tools/tools/perf/perf.c:323 #11 0x4196da in handle_internal_command /home/acme/git/perf-tools/tools/perf/perf.c:377 #12 0x4196da in run_argv /home/acme/git/perf-tools/tools/perf/perf.c:421 #13 0x4196da in main /home/acme/git/perf-tools/tools/perf/perf.c:537 #14 0x7f213e84a50f in __libc_start_call_main (/lib64/libc.so.6+0x2750f) Free it on evsel__exit(). Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/20230719202951.534582-2-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Andrei Vagin
|
7d5cb68af6 |
perf/benchmark: add a new benchmark for seccom_unotify
The benchmark is similar to the pipe benchmark. It creates two processes, one is calling syscalls, and another process is handling them via seccomp user notifications. It measures the time required to run a specified number of interations. $ ./perf bench sched seccomp-notify --sync-mode --loop 1000000 # Running 'sched/seccomp-notify' benchmark: # Executed 1000000 system calls Total time: 2.769 [sec] 2.769629 usecs/op 361059 ops/sec $ ./perf bench sched seccomp-notify # Running 'sched/seccomp-notify' benchmark: # Executed 1000000 system calls Total time: 8.571 [sec] 8.571119 usecs/op 116670 ops/sec Signed-off-by: Andrei Vagin <avagin@google.com> Acked-by: "Peter Zijlstra (Intel)" <peterz@infradead.org> Link: https://lore.kernel.org/r/20230308073201.3102738-7-avagin@google.com Link: https://lore.kernel.org/r/20230630051953.454638-1-avagin@gmail.com [kees: Added PRIu64 format string] Signed-off-by: Kees Cook <keescook@chromium.org> |
||
Arnaldo Carvalho de Melo
|
2480232c61 |
perf test task_exit: No need for a cycles event to check if we get an PERF_RECORD_EXIT
The intent of this test is to check we get a PERF_RECORD_EXIT as asked for by setting perf_event_attr.task=1. When the test was written we didn't had the "dummy" event so we went with the default event, "cycles". There were reports of this test failing sometimes, one of these reports was with a PREEMPT_RT_FULL, but I noticed it failing sometimes with an aarch64 Firefly board. In the kernel the call to perf_event_task_output(), that generates the PERF_RECORD_EXIT may fail when there is not enough memory in the ring buffer, if the ring buffer is paused, etc. So switch to using the "dummy" event to use the ring buffer just for what the test was designed for, avoiding uneeded PERF_RECORD_SAMPLEs. Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZLGXmMuNRpx1ubFm@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Arnaldo Carvalho de Melo
|
0e022f5bf7 |
perf beauty: Update copy of linux/socket.h with the kernel sources
To pick the changes in: |
||
Ian Rogers
|
5b10c18d1b |
perf parse-events: Avoid SEGV if PMU lookup fails for legacy cache terms
libfuzzer found the following command could SEGV:
$ perf stat -e cpu/L2,L2/ true
This is because the L2 term rewrites the perf_event_attr type to
PERF_TYPE_HW_CACHE which then fails the PMU lookup for the second
legacy cache term.
The new failure is consistent with repeated hardware terms:
$ perf stat -e cpu/L2,L2/ true
event syntax error: 'cpu/L2,L2/'
\___ Failed to find PMU for type 3
Initial error:
event syntax error: 'cpu/L2,L2/'
\___ Failed to find PMU for type 3
Run 'perf list' for a list of valid events
Usage: perf stat [<options>] [<command>]
-e, --event <event> event selector. use 'perf list' to list available events
$ perf stat -e cpu/cycles,cycles/ true
event syntax error: 'cpu/cycles,cycles/'
\___ Failed to find PMU for type 0
Initial error:
event syntax error: 'cpu/cycles,cycles/'
\___ Failed to find PMU for type 0
Run 'perf list' for a list of valid events
Usage: perf stat [<options>] [<command>]
-e, --event <event> event selector. use 'perf list' to list available events
Committer testing:
Before:
$ perf stat -e cpu/L2,L2/ true
Segmentation fault (core dumped)
$
After:
$ perf stat -e cpu/L2,L2/ true
event syntax error: 'cpu/L2,L2/'
\___ Failed to find PMU for type 3
Initial error:
event syntax error: 'cpu/L2,L2/'
\___ Failed to find PMU for type 3
Run 'perf list' for a list of valid events
Usage: perf stat [<options>] [<command>]
-e, --event <event> event selector. use 'perf list' to list available events
$
Fixes:
|
||
Arnaldo Carvalho de Melo
|
920b91d927 |
tools include UAPI: Sync linux/mount.h copy with the kernel sources
To pick the changes from:
|
||
Sandipan Das
|
8d40f74ebf |
perf vendor events amd: Fix large metrics
There are cases where a metric requires more events than the number of available counters. E.g. AMD Zen, Zen 2 and Zen 3 processors have four data fabric counters but the "nps1_die_to_dram" metric has eight events. By default, the constituent events are placed in a group and since the events cannot be scheduled at the same time, the metric is not computed. The "all metrics" test also fails because of this. Use the NO_GROUP_EVENTS constraint for such metrics which anyway expect the user to run perf with "--metric-no-group". E.g. $ sudo perf test -v 101 Before: 101: perf all metrics test : --- start --- test child forked, pid 37131 Testing branch_misprediction_ratio Testing all_remote_links_outbound Testing nps1_die_to_dram Metric 'nps1_die_to_dram' not printed in: Error: Invalid event (dram_channel_data_controller_4) in per-thread mode, enable system wide with '-a'. Testing macro_ops_dispatched Testing all_l2_cache_accesses Testing all_l2_cache_hits Testing all_l2_cache_misses Testing ic_fetch_miss_ratio Testing l2_cache_accesses_from_l2_hwpf Testing l2_cache_misses_from_l2_hwpf Testing op_cache_fetch_miss_ratio Testing l3_read_miss_latency Testing l1_itlb_misses test child finished with -1 ---- end ---- perf all metrics test: FAILED! After: 101: perf all metrics test : --- start --- test child forked, pid 43766 Testing branch_misprediction_ratio Testing all_remote_links_outbound Testing nps1_die_to_dram Testing macro_ops_dispatched Testing all_l2_cache_accesses Testing all_l2_cache_hits Testing all_l2_cache_misses Testing ic_fetch_miss_ratio Testing l2_cache_accesses_from_l2_hwpf Testing l2_cache_misses_from_l2_hwpf Testing op_cache_fetch_miss_ratio Testing l3_read_miss_latency Testing l1_itlb_misses test child finished with 0 ---- end ---- perf all metrics test: Ok Reported-by: Ayush Jain <ayush.jain3@amd.com> Suggested-by: Ian Rogers <irogers@google.com> Signed-off-by: Sandipan Das <sandipan.das@amd.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Santosh Shukla <santosh.shukla@amd.com> Link: https://lore.kernel.org/r/20230706063440.54189-1-sandipan.das@amd.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
James Clark
|
1feece2780 |
perf build: Fix library not found error when using CSLIBS
-L only specifies the search path for libraries directly provided in the link line with -l. Because -lopencsd isn't specified, it's only linked because it's a dependency of -lopencsd_c_api. Dependencies like this are resolved using the default system search paths or -rpath-link=... rather than -L. This means that compilation only works if OpenCSD is installed to the system rather than provided with the CSLIBS (-L) option. This could be fixed by adding -Wl,-rpath-link=$(CSLIBS) but that is less conventional than just adding -lopencsd to the link line so that it uses -L. -lopencsd seems to have been removed in commit |
||
Arnaldo Carvalho de Melo
|
9350a91791 |
tools headers UAPI: Sync files changed by new cachestat syscall with the kernel sources
To pick the changes in these csets:
|
||
Georg Müller
|
c66e1c68c1 |
perf probe: Read DWARF files from the correct CU
After switching from dwarf_decl_file() to die_get_decl_file(), it is not
possible to add probes for certain functions:
$ perf probe -x /usr/lib/systemd/systemd-logind match_unit_removed
A function DIE doesn't have decl_line. Maybe broken DWARF?
A function DIE doesn't have decl_line. Maybe broken DWARF?
Probe point 'match_unit_removed' not found.
Error: Failed to add events.
The problem is that die_get_decl_file() uses the wrong CU to search for
the file. elfutils commit e1db5cdc9f has some good explanation for this:
dwarf_decl_file uses dwarf_attr_integrate to get the DW_AT_decl_file
attribute. This means the attribute might come from a different DIE
in a different CU. If so, we need to use the CU associated with the
attribute, not the original DIE, to resolve the file name.
This patch uses the same source of information as elfutils: use attribute
DW_AT_decl_file and use this CU to search for the file.
Fixes:
|
||
Georg Müller
|
56cbeacf14 |
perf probe: Add test for regression introduced by switch to die_get_decl_file()
This patch adds a test to validate that 'perf probe' works for binaries where DWARF info is split into multiple CUs Signed-off-by: Georg Müller <georgmueller@gmx.net> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: regressions@lists.linux.dev Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230628084551.1860532-5-georgmueller@gmx.net Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Linus Torvalds
|
c206353dfd |
perf tools changes and fixes for v6.5: 2nd batch
Build: - Allow to generate vmlinux.h from BTF using `make GEN_VMLINUX_H=1` and skip if the vmlinux has no BTF. - Replace deprecated clang -target xxx option by --target=xxx. perf record: - Print event attributes with well known type and config symbols in the debug output like below: # perf record -e cycles,cpu-clock -C0 -vv true <SNIP> ------------------------------------------------------------ perf_event_attr: type 0 (PERF_TYPE_HARDWARE) size 136 config 0 (PERF_COUNT_HW_CPU_CYCLES) { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER read_format ID disabled 1 inherit 1 freq 1 sample_id_all 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 5 ------------------------------------------------------------ perf_event_attr: type 1 (PERF_TYPE_SOFTWARE) size 136 config 0 (PERF_COUNT_SW_CPU_CLOCK) { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER read_format ID disabled 1 inherit 1 freq 1 sample_id_all 1 exclude_guest 1 - Update AMD IBS event error message since it now support per-process profiling but no priviledge filters. $ sudo perf record -e ibs_op//k -C 0 Error: AMD IBS doesn't support privilege filtering. Try again without the privilege modifiers (like 'k') at the end. perf lock contention: - Support CSV style output using -x option $ sudo perf lock con -ab -x, sleep 1 # output: contended, total wait, max wait, avg wait, type, caller 19, 194232, 21415, 10222, spinlock, process_one_work+0x1f0 15, 162748, 23843, 10849, rwsem:R, do_user_addr_fault+0x40e 4, 86740, 23415, 21685, rwlock:R, ep_poll_callback+0x2d 1, 84281, 84281, 84281, mutex, iwl_mvm_async_handlers_wk+0x135 8, 67608, 27404, 8451, spinlock, __queue_work+0x174 3, 58616, 31125, 19538, rwsem:W, do_mprotect_pkey+0xff 3, 52953, 21172, 17651, rwlock:W, do_epoll_wait+0x248 2, 30324, 19704, 15162, rwsem:R, do_madvise+0x3ad 1, 24619, 24619, 24619, spinlock, rcu_core+0xd4 - Add --output option to save the data to a file not to be interfered by other debug messages. Test: - Fix event parsing test on ARM where there's no raw PMU nor supports PERF_PMU_CAP_EXTENDED_HW_TYPE. - Update the lock contention test case for CSV output. - Fix a segfault in the daemon command test. Vendor events (JSON): - Add has_event() to check if the given event is available on system at runtime. On Intel machines, some transaction events may not be present when TSC extensions are disabled. - Update Intel event metrics. Misc: - Sort symbols by name using an external array of pointers instead of a rbtree node in the symbol. This will save 16-bytes or 24-bytes per symbol whether the sorting is actually requested or not. - Fix unwinding DWARF callstacks using libdw when --symfs option is used. Signed-off-by: Namhyung Kim <namhyung@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQSo2x5BnqMqsoHtzsmMstVUGiXMgwUCZKb4mwAKCRCMstVUGiXM g1QqAPwKZow/DhAzyN7KvzdNd+SojRGpUMl6RkVphY/9ntDqPAD+L3V5aXLTiC1L 8kUzdpRX5VMjqdR9U7TycUOi4QU40QA= =dEF1 -----END PGP SIGNATURE----- Merge tag 'perf-tools-for-v6.5-2-2023-07-06' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next Pull more perf tools updates from Namhyung Kim: "These are remaining changes and fixes for this cycle. Build: - Allow generating vmlinux.h from BTF using `make GEN_VMLINUX_H=1` and skip if the vmlinux has no BTF. - Replace deprecated clang -target xxx option by --target=xxx. perf record: - Print event attributes with well known type and config symbols in the debug output like below: # perf record -e cycles,cpu-clock -C0 -vv true <SNIP> ------------------------------------------------------------ perf_event_attr: type 0 (PERF_TYPE_HARDWARE) size 136 config 0 (PERF_COUNT_HW_CPU_CYCLES) { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER read_format ID disabled 1 inherit 1 freq 1 sample_id_all 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 5 ------------------------------------------------------------ perf_event_attr: type 1 (PERF_TYPE_SOFTWARE) size 136 config 0 (PERF_COUNT_SW_CPU_CLOCK) { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER read_format ID disabled 1 inherit 1 freq 1 sample_id_all 1 exclude_guest 1 - Update AMD IBS event error message since it now support per-process profiling but no priviledge filters. $ sudo perf record -e ibs_op//k -C 0 Error: AMD IBS doesn't support privilege filtering. Try again without the privilege modifiers (like 'k') at the end. perf lock contention: - Support CSV style output using -x option $ sudo perf lock con -ab -x, sleep 1 # output: contended, total wait, max wait, avg wait, type, caller 19, 194232, 21415, 10222, spinlock, process_one_work+0x1f0 15, 162748, 23843, 10849, rwsem:R, do_user_addr_fault+0x40e 4, 86740, 23415, 21685, rwlock:R, ep_poll_callback+0x2d 1, 84281, 84281, 84281, mutex, iwl_mvm_async_handlers_wk+0x135 8, 67608, 27404, 8451, spinlock, __queue_work+0x174 3, 58616, 31125, 19538, rwsem:W, do_mprotect_pkey+0xff 3, 52953, 21172, 17651, rwlock:W, do_epoll_wait+0x248 2, 30324, 19704, 15162, rwsem:R, do_madvise+0x3ad 1, 24619, 24619, 24619, spinlock, rcu_core+0xd4 - Add --output option to save the data to a file not to be interfered by other debug messages. Test: - Fix event parsing test on ARM where there's no raw PMU nor supports PERF_PMU_CAP_EXTENDED_HW_TYPE. - Update the lock contention test case for CSV output. - Fix a segfault in the daemon command test. Vendor events (JSON): - Add has_event() to check if the given event is available on system at runtime. On Intel machines, some transaction events may not be present when TSC extensions are disabled. - Update Intel event metrics. Misc: - Sort symbols by name using an external array of pointers instead of a rbtree node in the symbol. This will save 16-bytes or 24-bytes per symbol whether the sorting is actually requested or not. - Fix unwinding DWARF callstacks using libdw when --symfs option is used" * tag 'perf-tools-for-v6.5-2-2023-07-06' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next: (38 commits) perf test: Fix event parsing test when PERF_PMU_CAP_EXTENDED_HW_TYPE isn't supported. perf test: Fix event parsing test on Arm perf evsel amd: Fix IBS error message perf: unwind: Fix symfs with libdw perf symbol: Fix uninitialized return value in symbols__find_by_name() perf test: Test perf lock contention CSV output perf lock contention: Add --output option perf lock contention: Add -x option for CSV style output perf lock: Remove stale comments perf vendor events intel: Update tigerlake to 1.13 perf vendor events intel: Update skylakex to 1.31 perf vendor events intel: Update skylake to 57 perf vendor events intel: Update sapphirerapids to 1.14 perf vendor events intel: Update icelakex to 1.21 perf vendor events intel: Update icelake to 1.19 perf vendor events intel: Update cascadelakex to 1.19 perf vendor events intel: Update meteorlake to 1.03 perf vendor events intel: Add rocketlake events/metrics perf vendor metrics intel: Make transaction metrics conditional perf jevents: Support for has_event function ... |
||
James Clark
|
bcd981db12 |
perf test: Fix event parsing test when PERF_PMU_CAP_EXTENDED_HW_TYPE isn't supported.
Arm has multiple PMU types for heterogeneous systems, but doesn't
currently support PERF_PMU_CAP_EXTENDED_HW_TYPE. Make the tests
support both scenarios so that they pass on Arm, and will still pass
once PERF_PMU_CAP_EXTENDED_HW_TYPE support is added.
Fixes:
|
||
James Clark
|
808ce56e7d |
perf test: Fix event parsing test on Arm
The test looks for a PMU from sysfs with type = PERF_TYPE_RAW when
opening a raw event. Arm doesn't have a real raw PMU, only core PMUs
with unique types other than raw.
Instead of looking for a matching PMU, just test that the event type
was parsed as raw and skip the PMU search on Arm. The raw event type
test should also apply to all platforms so add it outside of the ifdef.
Fixes:
|
||
Ravi Bangoria
|
b2ad9549bf |
perf evsel amd: Fix IBS error message
AMD IBS can do per-process profiling[1] and is no longer restricted to per-cpu or systemwide only. Remove stale error message. Also, checking just exclude_kernel is not sufficient since IBS does not support any privilege filters. So include all exclude_* checks. And finally, move these checks under tools/perf/arch/x86/ from generic code. Before: $ sudo ./perf record -e ibs_op//k -C 0 Error: AMD IBS may only be available in system-wide/per-cpu mode. Try using -a, or -C and workload affinity After: $ sudo ./perf record -e ibs_op//k -C 0 Error: AMD IBS doesn't support privilege filtering. Try again without the privilege modifiers (like 'k') at the end. [1] https://git.kernel.org/torvalds/c/30093056f7b2 Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: ananth.narayan@amd.com Cc: sandipan.das@amd.com Cc: santosh.shukla@amd.com Cc: irogers@google.com Cc: peterz@infradead.org Cc: adrian.hunter@intel.com Cc: acme@kernel.org Cc: jolsa@kernel.org Link: https://lore.kernel.org/r/20230630085230.437-1-ravi.bangoria@amd.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
Vincent Whitchurch
|
5f06267b6e |
perf: unwind: Fix symfs with libdw
Pass the full path including the symfs (if any) to libdw. Without this unwinding fails with errors like this when a symfs is used: unwind: failed with 'No such file or directory'" Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: kernel@axis.com Cc: Ian Rogers <irogers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Link: https://lore.kernel.org/r/20230630-perf-libdw-symfs-v2-1-469760dd4d5b@axis.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
James Clark
|
78a175c462 |
perf symbol: Fix uninitialized return value in symbols__find_by_name()
found_idx and s aren't initialized, so if no symbol is found then the
assert at the end will index off the end of the array causing a
segfault. The function also doesn't return NULL when the symbol isn't
found even if the assert passes. Fix it by initializing the values and
only setting them when something is found.
Fixes the following test failure:
$ perf test 1
1: vmlinux symtab matches kallsyms : FAILED!
Fixes:
|
||
Namhyung Kim
|
2aefb4cc90 |
perf test: Test perf lock contention CSV output
To verify CSV output, just check the number of separators (",") using the tr and wc commands like this. grep -v "^#" ${result} | tr -d -c | wc -c Now it expects 6 columns (and 5 separators) in the output, but it may be changed later so count the field in the header first and compare it to the actual output lines. $ cat ${result} # output: contended, total wait, max wait, avg wait, type, caller 1, 28787, 28787, 28787, spinlock, raw_spin_rq_lock_nested+0x1b The test looks like below now: $ sudo ./perf test -v contention 86: kernel lock contention analysis test : --- start --- test child forked, pid 2705822 Testing perf lock record and perf lock contention Testing perf lock contention --use-bpf Testing perf lock record and perf lock contention at the same time Testing perf lock contention --threads Testing perf lock contention --lock-addr Testing perf lock contention --type-filter (w/ spinlock) Testing perf lock contention --lock-filter (w/ tasklist_lock) Testing perf lock contention --callstack-filter (w/ unix_stream) Testing perf lock contention --callstack-filter with task aggregation Testing perf lock contention CSV output test child finished with 0 ---- end ---- kernel lock contention analysis test: Ok Acked-by: Ian Rogers <irogers@google.com> Cc: Hao Luo <haoluo@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230628200141.2739587-5-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
Namhyung Kim
|
f6027053f8 |
perf lock contention: Add --output option
To avoid formatting failures for example in CSV output due to debug messages, add --output option to put the result in a file. Unfortunately the short -o option was taken by the --owner already. $ sudo ./perf lock con -ab --output lock-out.txt -v sleep 1 Looking at the vmlinux_path (8 entries long) symsrc__init: cannot get elf header. Using /proc/kcore for kernel data Using /proc/kallsyms for symbols $ head lock-out.txt contended total wait max wait avg wait type caller 3 76.79 us 26.89 us 25.60 us rwlock:R ep_poll_callback+0x2d 0xffffffff9a23f4b5 _raw_read_lock_irqsave+0x45 0xffffffff99bbd4dd ep_poll_callback+0x2d 0xffffffff999029f3 __wake_up_common+0x73 0xffffffff99902b82 __wake_up_common_lock+0x82 0xffffffff99fa5b1c sock_def_readable+0x3c 0xffffffff9a11521d unix_stream_sendmsg+0x18d 0xffffffff99f9fc9c sock_sendmsg+0x5c Suggested-by: Ian Rogers <irogers@google.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Hao Luo <haoluo@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230628200141.2739587-4-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
Namhyung Kim
|
69c5c9930d |
perf lock contention: Add -x option for CSV style output
Sometimes we want to process the output by external programs. Let's add the -x option to specify the field separator like perf stat. $ sudo ./perf lock con -ab -x, sleep 1 # output: contended, total wait, max wait, avg wait, type, caller 19, 194232, 21415, 10222, spinlock, process_one_work+0x1f0 15, 162748, 23843, 10849, rwsem:R, do_user_addr_fault+0x40e 4, 86740, 23415, 21685, rwlock:R, ep_poll_callback+0x2d 1, 84281, 84281, 84281, mutex, iwl_mvm_async_handlers_wk+0x135 8, 67608, 27404, 8451, spinlock, __queue_work+0x174 3, 58616, 31125, 19538, rwsem:W, do_mprotect_pkey+0xff 3, 52953, 21172, 17651, rwlock:W, do_epoll_wait+0x248 2, 30324, 19704, 15162, rwsem:R, do_madvise+0x3ad 1, 24619, 24619, 24619, spinlock, rcu_core+0xd4 The first line is a comment that shows the output format. Each line is separated by the given string ("," in this case). The time is printed in nsec without the unit so that it can be parsed easily. The characters can be used in the output like (":", "+" and ".") are not allowed for the -x option. $ ./perf lock con -x: Cannot use the separator that is already used Usage: perf lock contention [<options>] -x, --field-separator <separator> print result in CSV format with custom separator The stacktraces are printed in the same line separated by ":". The header is updated to show the stacktrace. Also the debug output is added at the end as a comment. $ sudo ./perf lock con -abv -x, -F wait_total sleep 1 Looking at the vmlinux_path (8 entries long) symsrc__init: cannot get elf header. Using /proc/kcore for kernel data Using /proc/kallsyms for symbols # output: total wait, type, caller, stacktrace 37134, spinlock, rcu_core+0xd4, 0xffffffff9d0401e4 _raw_spin_lock_irqsave+0x44: 0xffffffff9c738114 rcu_core+0xd4: ... 21213, spinlock, raw_spin_rq_lock_nested+0x1b, 0xffffffff9d0407c0 _raw_spin_lock+0x30: 0xffffffff9c6d9cfb raw_spin_rq_lock_nested+0x1b: ... 20506, rwlock:W, ep_done_scan+0x2d, 0xffffffff9c9bc4dd ep_done_scan+0x2d: 0xffffffff9c9bd5f1 do_epoll_wait+0x6d1: ... 18044, rwlock:R, ep_poll_callback+0x2d, 0xffffffff9d040555 _raw_read_lock_irqsave+0x45: 0xffffffff9c9bc81d ep_poll_callback+0x2d: ... 17890, rwlock:W, do_epoll_wait+0x47b, 0xffffffff9c9bd39b do_epoll_wait+0x47b: 0xffffffff9c9be9ef __x64_sys_epoll_wait+0x6d1: ... 12114, spinlock, futex_wait_queue+0x60, 0xffffffff9d0407c0 _raw_spin_lock+0x30: 0xffffffff9d037cae __schedule+0xbe: ... # debug: total=7, bad=0, bad_task=0, bad_stack=0, bad_time=0, bad_data=0 Also note that some field (like lock symbols) can be empty. $ sudo ./perf lock con -abl -x, -E 10 sleep 1 # output: contended, total wait, max wait, avg wait, address, symbol, type 6, 275025, 61764, 45837, ffff9dcc9f7d60d0, , spinlock 18, 87716, 11196, 4873, ffff9dc540059000, , spinlock 2, 6472, 5499, 3236, ffff9dcc7f730e00, rq_lock, spinlock 3, 4429, 2341, 1476, ffff9dcc7f7b0e00, rq_lock, spinlock 3, 3974, 1635, 1324, ffff9dcc7f7f0e00, rq_lock, spinlock 4, 3290, 1326, 822, ffff9dc5f4e2cde0, , rwlock 3, 2894, 1023, 964, ffffffff9e0d7700, rcu_state, spinlock 1, 2567, 2567, 2567, ffff9dcc7f6b0e00, rq_lock, spinlock 4, 1259, 596, 314, ffff9dc69c2adde0, , rwlock 1, 934, 934, 934, ffff9dcc7f670e00, rq_lock, spinlock Acked-by: Ian Rogers <irogers@google.com> Cc: Hao Luo <haoluo@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230628200141.2739587-3-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
Namhyung Kim
|
7b83d597c8 |
perf lock: Remove stale comments
The comment was for symbol_conf.sort_by_name which was deleted already. Let's get rid of the stale comments as well. Acked-by: Ian Rogers <irogers@google.com> Cc: Hao Luo <haoluo@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230628200141.2739587-2-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
Linus Torvalds
|
b30d7a77c5 |
perf tools changes and fixes for v6.5: 1st batch
Internal cleanup: - Refactor PMU data management to handle hybrid systems in a generic way. Do more work in the lexer so that legacy event types parse more easily. A side-effect of this is that if a PMU is specified, scanning sysfs is avoided improving start-up time. - Fix hybrid metrics, for example, the TopdownL1 works for both performance and efficiency cores on Intel machines. To support this, sort and regroup events after parsing. - Add reference count checking for the 'thread' data structure. - Lots of fixes for memory leaks in various places thanks to the ASAN and Ian's refcount checker. - Reduce the binary size by replacing static variables with local or dynamically allocated memory. - Introduce shared_mutex for annotate data to reduce memory footprint. - Make filesystem access library functions more thread safe. Test: - Organize cpu_map tests into a single suite. - Add metric value validation test to check if the values are within correct value ranges. - Add perf stat stdio output test to check if event and metric names match. - Add perf data converter JSON output test. - Fix a lot of issues reported by shellcheck(1). This is a preparation to enable shellcheck by default. - Make the large x86 new instructions test optional at build time using EXTRA_TESTS=1. - Add a test for libpfm4 events. perf script: - Add 'dsoff' outpuf field to display offset from the DSO. $ perf script -F comm,pid,event,ip,dsoff ls 2695501 cycles: 152cc73ef4b5 (/usr/lib/x86_64-linux-gnu/ld-2.31.so+0x1c4b5) ls 2695501 cycles: ffffffff99045b3e ([kernel.kallsyms]) ls 2695501 cycles: ffffffff9968e107 ([kernel.kallsyms]) ls 2695501 cycles: ffffffffc1f54afb ([kernel.kallsyms]) ls 2695501 cycles: ffffffff9968382f ([kernel.kallsyms]) ls 2695501 cycles: ffffffff99e00094 ([kernel.kallsyms]) ls 2695501 cycles: 152cc718a8d0 (/usr/lib/x86_64-linux-gnu/libselinux.so.1+0x68d0) ls 2695501 cycles: ffffffff992a6db0 ([kernel.kallsyms]) - Adjust width for large PID/TID values. perf report: - Robustify reading addr2line output for srcline by checking sentinel output before the actual data and by using timeout of 1 second. - Allow config terms (like 'name=ABC') with breakpoint events. $ perf record -e mem:0x55feb98dd169:x/name=breakpoint/ -p 19646 -- sleep 1 perf annotate: - Handle x86 instruction suffix like 'l' in 'movl' generally. - Parse instruction operands properly even with a whitespace. This is needed for llvm-objdump output. - Support RISC-V binutils lookup using the triplet prefixes. - Add '<' and '>' key to navigate to prev/next symbols in TUI. - Fix instruction association and parsing for LoongArch. perf stat: - Add --per-cache aggregation option, optionally specify a cache level like `--per-cache=L2`. $ sudo perf stat --per-cache -a -e ls_dmnd_fills_from_sys.ext_cache_remote --\ taskset -c 0-15,64-79,128-143,192-207\ perf bench sched messaging -p -t -l 100000 -g 8 # Running 'sched/messaging' benchmark: # 20 sender and receiver threads per group # 8 groups == 320 threads run Total time: 7.648 [sec] Performance counter stats for 'system wide': S0-D0-L3-ID0 16 17,145,912 ls_dmnd_fills_from_sys.ext_cache_remote S0-D0-L3-ID8 16 14,977,628 ls_dmnd_fills_from_sys.ext_cache_remote S0-D0-L3-ID16 16 262,539 ls_dmnd_fills_from_sys.ext_cache_remote S0-D0-L3-ID24 16 3,140 ls_dmnd_fills_from_sys.ext_cache_remote S0-D0-L3-ID32 16 27,403 ls_dmnd_fills_from_sys.ext_cache_remote S0-D0-L3-ID40 16 17,026 ls_dmnd_fills_from_sys.ext_cache_remote S0-D0-L3-ID48 16 7,292 ls_dmnd_fills_from_sys.ext_cache_remote S0-D0-L3-ID56 16 2,464 ls_dmnd_fills_from_sys.ext_cache_remote S1-D1-L3-ID64 16 22,489,306 ls_dmnd_fills_from_sys.ext_cache_remote S1-D1-L3-ID72 16 21,455,257 ls_dmnd_fills_from_sys.ext_cache_remote S1-D1-L3-ID80 16 11,619 ls_dmnd_fills_from_sys.ext_cache_remote S1-D1-L3-ID88 16 30,978 ls_dmnd_fills_from_sys.ext_cache_remote S1-D1-L3-ID96 16 37,628 ls_dmnd_fills_from_sys.ext_cache_remote S1-D1-L3-ID104 16 13,594 ls_dmnd_fills_from_sys.ext_cache_remote S1-D1-L3-ID112 16 10,164 ls_dmnd_fills_from_sys.ext_cache_remote S1-D1-L3-ID120 16 11,259 ls_dmnd_fills_from_sys.ext_cache_remote 7.779171484 seconds time elapsed - Change default (no event/metric) formatting for default metrics so that events are hidden and the metric and group appear. Performance counter stats for 'ls /': 1.85 msec task-clock # 0.594 CPUs utilized 0 context-switches # 0.000 /sec 0 cpu-migrations # 0.000 /sec 97 page-faults # 52.517 K/sec 2,187,173 cycles # 1.184 GHz 2,474,459 instructions # 1.13 insn per cycle 531,584 branches # 287.805 M/sec 13,626 branch-misses # 2.56% of all branches TopdownL1 # 23.5 % tma_backend_bound # 11.5 % tma_bad_speculation # 39.1 % tma_frontend_bound # 25.9 % tma_retiring - Allow --cputype option to have any PMU name (not just hybrid). - Fix output value not to added when it runs multiple times with -r option. perf list: - Show metricgroup description from JSON file called metricgroups.json. - Allow 'pfm' argument to list only libpfm4 events and check each event is supported before showing it. JSON vendor events: - Avoid event grouping using "NO_GROUP_EVENTS" constraints. The topdown events are correctly grouped even if no group exists. - Add "Default" metric group to print it in the default output. And use "DefaultMetricgroupName" to indicate the real metric group name. - Add AmpereOne core PMU events. Misc: - Define man page date correctly. - Track exception level properly on ARM CoreSight ETM. - Allow anonymous struct, union or enum when retrieving type names from DWARF. - Fix incorrect filename when calling `perf inject --jit`. - Handle PLT size correctly on LoongArch. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQSo2x5BnqMqsoHtzsmMstVUGiXMgwUCZJxT3gAKCRCMstVUGiXM g3//AQDyH3tbAVxU6JkvEOjjDvK7MWeXef7GQh8MP8D9Wkxk1AD9HgyxZWXn+mer wxzBMntnxlr9+mkBerrVwUzYMd/IJQk= =hPh8 -----END PGP SIGNATURE----- Merge tag 'perf-tools-for-v6.5-1-2023-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next Pull perf tools updates from Namhyung Kim: "Internal cleanup: - Refactor PMU data management to handle hybrid systems in a generic way. Do more work in the lexer so that legacy event types parse more easily. A side-effect of this is that if a PMU is specified, scanning sysfs is avoided improving start-up time. - Fix hybrid metrics, for example, the TopdownL1 works for both performance and efficiency cores on Intel machines. To support this, sort and regroup events after parsing. - Add reference count checking for the 'thread' data structure. - Lots of fixes for memory leaks in various places thanks to the ASAN and Ian's refcount checker. - Reduce the binary size by replacing static variables with local or dynamically allocated memory. - Introduce shared_mutex for annotate data to reduce memory footprint. - Make filesystem access library functions more thread safe. Test: - Organize cpu_map tests into a single suite. - Add metric value validation test to check if the values are within correct value ranges. - Add perf stat stdio output test to check if event and metric names match. - Add perf data converter JSON output test. - Fix a lot of issues reported by shellcheck(1). This is a preparation to enable shellcheck by default. - Make the large x86 new instructions test optional at build time using EXTRA_TESTS=1. - Add a test for libpfm4 events. perf script: - Add 'dsoff' outpuf field to display offset from the DSO. $ perf script -F comm,pid,event,ip,dsoff ls 2695501 cycles: 152cc73ef4b5 (/usr/lib/x86_64-linux-gnu/ld-2.31.so+0x1c4b5) ls 2695501 cycles: ffffffff99045b3e ([kernel.kallsyms]) ls 2695501 cycles: ffffffff9968e107 ([kernel.kallsyms]) ls 2695501 cycles: ffffffffc1f54afb ([kernel.kallsyms]) ls 2695501 cycles: ffffffff9968382f ([kernel.kallsyms]) ls 2695501 cycles: ffffffff99e00094 ([kernel.kallsyms]) ls 2695501 cycles: 152cc718a8d0 (/usr/lib/x86_64-linux-gnu/libselinux.so.1+0x68d0) ls 2695501 cycles: ffffffff992a6db0 ([kernel.kallsyms]) - Adjust width for large PID/TID values. perf report: - Robustify reading addr2line output for srcline by checking sentinel output before the actual data and by using timeout of 1 second. - Allow config terms (like 'name=ABC') with breakpoint events. $ perf record -e mem:0x55feb98dd169:x/name=breakpoint/ -p 19646 -- sleep 1 perf annotate: - Handle x86 instruction suffix like 'l' in 'movl' generally. - Parse instruction operands properly even with a whitespace. This is needed for llvm-objdump output. - Support RISC-V binutils lookup using the triplet prefixes. - Add '<' and '>' key to navigate to prev/next symbols in TUI. - Fix instruction association and parsing for LoongArch. perf stat: - Add --per-cache aggregation option, optionally specify a cache level like `--per-cache=L2`. $ sudo perf stat --per-cache -a -e ls_dmnd_fills_from_sys.ext_cache_remote --\ taskset -c 0-15,64-79,128-143,192-207\ perf bench sched messaging -p -t -l 100000 -g 8 # Running 'sched/messaging' benchmark: # 20 sender and receiver threads per group # 8 groups == 320 threads run Total time: 7.648 [sec] Performance counter stats for 'system wide': S0-D0-L3-ID0 16 17,145,912 ls_dmnd_fills_from_sys.ext_cache_remote S0-D0-L3-ID8 16 14,977,628 ls_dmnd_fills_from_sys.ext_cache_remote S0-D0-L3-ID16 16 262,539 ls_dmnd_fills_from_sys.ext_cache_remote S0-D0-L3-ID24 16 3,140 ls_dmnd_fills_from_sys.ext_cache_remote S0-D0-L3-ID32 16 27,403 ls_dmnd_fills_from_sys.ext_cache_remote S0-D0-L3-ID40 16 17,026 ls_dmnd_fills_from_sys.ext_cache_remote S0-D0-L3-ID48 16 7,292 ls_dmnd_fills_from_sys.ext_cache_remote S0-D0-L3-ID56 16 2,464 ls_dmnd_fills_from_sys.ext_cache_remote S1-D1-L3-ID64 16 22,489,306 ls_dmnd_fills_from_sys.ext_cache_remote S1-D1-L3-ID72 16 21,455,257 ls_dmnd_fills_from_sys.ext_cache_remote S1-D1-L3-ID80 16 11,619 ls_dmnd_fills_from_sys.ext_cache_remote S1-D1-L3-ID88 16 30,978 ls_dmnd_fills_from_sys.ext_cache_remote S1-D1-L3-ID96 16 37,628 ls_dmnd_fills_from_sys.ext_cache_remote S1-D1-L3-ID104 16 13,594 ls_dmnd_fills_from_sys.ext_cache_remote S1-D1-L3-ID112 16 10,164 ls_dmnd_fills_from_sys.ext_cache_remote S1-D1-L3-ID120 16 11,259 ls_dmnd_fills_from_sys.ext_cache_remote 7.779171484 seconds time elapsed - Change default (no event/metric) formatting for default metrics so that events are hidden and the metric and group appear. Performance counter stats for 'ls /': 1.85 msec task-clock # 0.594 CPUs utilized 0 context-switches # 0.000 /sec 0 cpu-migrations # 0.000 /sec 97 page-faults # 52.517 K/sec 2,187,173 cycles # 1.184 GHz 2,474,459 instructions # 1.13 insn per cycle 531,584 branches # 287.805 M/sec 13,626 branch-misses # 2.56% of all branches TopdownL1 # 23.5 % tma_backend_bound # 11.5 % tma_bad_speculation # 39.1 % tma_frontend_bound # 25.9 % tma_retiring - Allow --cputype option to have any PMU name (not just hybrid). - Fix output value not to added when it runs multiple times with -r option. perf list: - Show metricgroup description from JSON file called metricgroups.json. - Allow 'pfm' argument to list only libpfm4 events and check each event is supported before showing it. JSON vendor events: - Avoid event grouping using "NO_GROUP_EVENTS" constraints. The topdown events are correctly grouped even if no group exists. - Add "Default" metric group to print it in the default output. And use "DefaultMetricgroupName" to indicate the real metric group name. - Add AmpereOne core PMU events. Misc: - Define man page date correctly. - Track exception level properly on ARM CoreSight ETM. - Allow anonymous struct, union or enum when retrieving type names from DWARF. - Fix incorrect filename when calling `perf inject --jit`. - Handle PLT size correctly on LoongArch" * tag 'perf-tools-for-v6.5-1-2023-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next: (269 commits) perf test: Skip metrics w/o event name in stat STD output linter perf test: Reorder event name checks in stat STD output linter perf pmu: Remove a hard coded cpu PMU assumption perf pmus: Add notion of default PMU for JSON events perf unwind: Fix map reference counts perf test: Set PERF_EXEC_PATH for script execution perf script: Initialize buffer for regs_map() perf tests: Fix test_arm_callgraph_fp variable expansion perf symbol: Add LoongArch case in get_plt_sizes() perf test: Remove x permission from lib/stat_output.sh perf test: Rerun failed metrics with longer workload perf test: Add skip list for metrics known would fail perf test: Add metric value validation test perf jit: Fix incorrect file name in DWARF line table perf annotate: Fix instruction association and parsing for LoongArch perf annotation: Switch lock from a mutex to a sharded_mutex perf sharded_mutex: Introduce sharded_mutex tools: Fix incorrect calculation of object size by sizeof perf subcmd: Fix missing check for return value of malloc() in add_cmdname() perf parse-events: Remove unneeded semicolon ... |
||
Ian Rogers
|
887e845f8c |
perf vendor events intel: Update tigerlake to 1.13
Updates were released in:
|
||
Ian Rogers
|
b5d2644da6 |
perf vendor events intel: Update skylakex to 1.31
Updates were released in:
|
||
Ian Rogers
|
ea3eafa08a |
perf vendor events intel: Update skylake to 57
Updates were released in:
|
||
Ian Rogers
|
938e4ad310 |
perf vendor events intel: Update sapphirerapids to 1.14
Updates were released in:
|
||
Ian Rogers
|
663655c91c |
perf vendor events intel: Update icelakex to 1.21
Updates were released in:
|
||
Ian Rogers
|
d1363b9454 |
perf vendor events intel: Update icelake to 1.19
Updates were released in:
|
||
Ian Rogers
|
0c9e39421c |
perf vendor events intel: Update cascadelakex to 1.19
Updates were released in:
|
||
Ian Rogers
|
dfc83cc877 |
perf vendor events intel: Update meteorlake to 1.03
1.03 events were released in:
|
||
Ian Rogers
|
7e74ece31a |
perf vendor events intel: Add rocketlake events/metrics
Add RocketLake events added to Intel perfmon in:
|
||
Ian Rogers
|
8076dc8c68 |
perf vendor metrics intel: Make transaction metrics conditional
Make the transaction metrics conditional on the cycles-tx event being present. This event may not be present when TSX extensions have been disabled. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Namhyung Kim <namhyung@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Sohom Datta <sohomdatta1@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Zhengjun Xing <zhengjun.xing@linux.intel.com> Cc: John Garry <john.g.garry@oracle.com> Cc: Ingo Molnar <mingo@redhat.com> Link: https://lore.kernel.org/r/20230623151016.4193660-4-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
Ian Rogers
|
eadc0040a9 |
perf jevents: Support for has_event function
Support for the new has_event function in metrics. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Namhyung Kim <namhyung@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Sohom Datta <sohomdatta1@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Zhengjun Xing <zhengjun.xing@linux.intel.com> Cc: John Garry <john.g.garry@oracle.com> Cc: Ingo Molnar <mingo@redhat.com> Link: https://lore.kernel.org/r/20230623151016.4193660-3-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
Ian Rogers
|
4a4a9bf907 |
perf expr: Add has_event function
Some events are dependent on firmware/kernel enablement. Allow such events to be detected when the metric is parsed so that the metric's event parsing doesn't fail. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Namhyung Kim <namhyung@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Sohom Datta <sohomdatta1@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Zhengjun Xing <zhengjun.xing@linux.intel.com> Cc: John Garry <john.g.garry@oracle.com> Cc: Ingo Molnar <mingo@redhat.com> Link: https://lore.kernel.org/r/20230623151016.4193660-2-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
Namhyung Kim
|
36cee69f57 |
perf tools: Do not remove addr_location.thread in thread__find_map()
The thread__find_map() is to find a map for a given address in the
given thread's address space. It searches maps based on the cpu mode
and fills various information in the addr_location data structure.
It might change al->maps and al->map, but not al->thread. Then I
think no reason to not set the al->thread at the beginning.
Also get rid of the duplicate 'al->map = NULL' part.
Fixes:
|
||
Linus Torvalds
|
3a8a670eee |
Networking changes for 6.5.
Core ---- - Rework the sendpage & splice implementations. Instead of feeding data into sockets page by page extend sendmsg handlers to support taking a reference on the data, controlled by a new flag called MSG_SPLICE_PAGES. Rework the handling of unexpected-end-of-file to invoke an additional callback instead of trying to predict what the right combination of MORE/NOTLAST flags is. Remove the MSG_SENDPAGE_NOTLAST flag completely. - Implement SCM_PIDFD, a new type of CMSG type analogous to SCM_CREDENTIALS, but it contains pidfd instead of plain pid. - Enable socket busy polling with CONFIG_RT. - Improve reliability and efficiency of reporting for ref_tracker. - Auto-generate a user space C library for various Netlink families. Protocols --------- - Allow TCP to shrink the advertised window when necessary, prevent sk_rcvbuf auto-tuning from growing the window all the way up to tcp_rmem[2]. - Use per-VMA locking for "page-flipping" TCP receive zerocopy. - Prepare TCP for device-to-device data transfers, by making sure that payloads are always attached to skbs as page frags. - Make the backoff time for the first N TCP SYN retransmissions linear. Exponential backoff is unnecessarily conservative. - Create a new MPTCP getsockopt to retrieve all info (MPTCP_FULL_INFO). - Avoid waking up applications using TLS sockets until we have a full record. - Allow using kernel memory for protocol ioctl callbacks, paving the way to issuing ioctls over io_uring. - Add nolocalbypass option to VxLAN, forcing packets to be fully encapsulated even if they are destined for a local IP address. - Make TCPv4 use consistent hash in TIME_WAIT and SYN_RECV. Ensure in-kernel ECMP implementation (e.g. Open vSwitch) select the same link for all packets. Support L4 symmetric hashing in Open vSwitch. - PPPoE: make number of hash bits configurable. - Allow DNS to be overwritten by DHCPACK in the in-kernel DHCP client (ipconfig). - Add layer 2 miss indication and filtering, allowing higher layers (e.g. ACL filters) to make forwarding decisions based on whether packet matched forwarding state in lower devices (bridge). - Support matching on Connectivity Fault Management (CFM) packets. - Hide the "link becomes ready" IPv6 messages by demoting their printk level to debug. - HSR: don't enable promiscuous mode if device offloads the proto. - Support active scanning in IEEE 802.15.4. - Continue work on Multi-Link Operation for WiFi 7. BPF --- - Add precision propagation for subprogs and callbacks. This allows maintaining verification efficiency when subprograms are used, or in fact passing the verifier at all for complex programs, especially those using open-coded iterators. - Improve BPF's {g,s}setsockopt() length handling. Previously BPF assumed the length is always equal to the amount of written data. But some protos allow passing a NULL buffer to discover what the output buffer *should* be, without writing anything. - Accept dynptr memory as memory arguments passed to helpers. - Add routing table ID to bpf_fib_lookup BPF helper. - Support O_PATH FDs in BPF_OBJ_PIN and BPF_OBJ_GET commands. - Drop bpf_capable() check in BPF_MAP_FREEZE command (used to mark maps as read-only). - Show target_{obj,btf}_id in tracing link fdinfo. - Addition of several new kfuncs (most of the names are self-explanatory): - Add a set of new dynptr kfuncs: bpf_dynptr_adjust(), bpf_dynptr_is_null(), bpf_dynptr_is_rdonly(), bpf_dynptr_size() and bpf_dynptr_clone(). - bpf_task_under_cgroup() - bpf_sock_destroy() - force closing sockets - bpf_cpumask_first_and(), rework bpf_cpumask_any*() kfuncs Netfilter --------- - Relax set/map validation checks in nf_tables. Allow checking presence of an entry in a map without using the value. - Increase ip_vs_conn_tab_bits range for 64BIT builds. - Allow updating size of a set. - Improve NAT tuple selection when connection is closing. Driver API ---------- - Integrate netdev with LED subsystem, to allow configuring HW "offloaded" blinking of LEDs based on link state and activity (i.e. packets coming in and out). - Support configuring rate selection pins of SFP modules. - Factor Clause 73 auto-negotiation code out of the drivers, provide common helper routines. - Add more fool-proof helpers for managing lifetime of MDIO devices associated with the PCS layer. - Allow drivers to report advanced statistics related to Time Aware scheduler offload (taprio). - Allow opting out of VF statistics in link dump, to allow more VFs to fit into the message. - Split devlink instance and devlink port operations. New hardware / drivers ---------------------- - Ethernet: - Synopsys EMAC4 IP support (stmmac) - Marvell 88E6361 8 port (5x1GE + 3x2.5GE) switches - Marvell 88E6250 7 port switches - Microchip LAN8650/1 Rev.B0 PHYs - MediaTek MT7981/MT7988 built-in 1GE PHY driver - WiFi: - Realtek RTL8192FU, 2.4 GHz, b/g/n mode, 2T2R, 300 Mbps - Realtek RTL8723DS (SDIO variant) - Realtek RTL8851BE - CAN: - Fintek F81604 Drivers ------- - Ethernet NICs: - Intel (100G, ice): - support dynamic interrupt allocation - use meta data match instead of VF MAC addr on slow-path - nVidia/Mellanox: - extend link aggregation to handle 4, rather than just 2 ports - spawn sub-functions without any features by default - OcteonTX2: - support HTB (Tx scheduling/QoS) offload - make RSS hash generation configurable - support selecting Rx queue using TC filters - Wangxun (ngbe/txgbe): - add basic Tx/Rx packet offloads - add phylink support (SFP/PCS control) - Freescale/NXP (enetc): - report TAPRIO packet statistics - Solarflare/AMD: - support matching on IP ToS and UDP source port of outer header - VxLAN and GENEVE tunnel encapsulation over IPv4 or IPv6 - add devlink dev info support for EF10 - Virtual NICs: - Microsoft vNIC: - size the Rx indirection table based on requested configuration - support VLAN tagging - Amazon vNIC: - try to reuse Rx buffers if not fully consumed, useful for ARM servers running with 16kB pages - Google vNIC: - support TCP segmentation of >64kB frames - Ethernet embedded switches: - Marvell (mv88e6xxx): - enable USXGMII (88E6191X) - Microchip: - lan966x: add support for Egress Stage 0 ACL engine - lan966x: support mapping packet priority to internal switch priority (based on PCP or DSCP) - Ethernet PHYs: - Broadcom PHYs: - support for Wake-on-LAN for BCM54210E/B50212E - report LPI counter - Microsemi PHYs: support RGMII delay configuration (VSC85xx) - Micrel PHYs: receive timestamp in the frame (LAN8841) - Realtek PHYs: support optional external PHY clock - Altera TSE PCS: merge the driver into Lynx PCS which it is a variant of - CAN: Kvaser PCIEcan: - support packet timestamping - WiFi: - Intel (iwlwifi): - major update for new firmware and Multi-Link Operation (MLO) - configuration rework to drop test devices and split the different families - support for segmented PNVM images and power tables - new vendor entries for PPAG (platform antenna gain) feature - Qualcomm 802.11ax (ath11k): - Multiple Basic Service Set Identifier (MBSSID) and Enhanced MBSSID Advertisement (EMA) support in AP mode - support factory test mode - RealTek (rtw89): - add RSSI based antenna diversity - support U-NII-4 channels on 5 GHz band - RealTek (rtl8xxxu): - AP mode support for 8188f - support USB RX aggregation for the newer chips Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmSbJM4ACgkQMUZtbf5S IrtoDhAAhEim1+LBIKf4lhPcVdZ2p/TkpnwTz5jsTwSeRBAxTwuNJ2fQhFXg13E3 MnRq6QaEp8G4/tA/gynLvQop+FEZEnv+horP0zf/XLcC8euU7UrKdrpt/4xxdP07 IL/fFWsoUGNO+L9LNaHwBo8g7nHvOkPscHEBHc2Xrvzab56TJk6vPySfLqcpKlNZ CHWDwTpgRqNZzSKiSpoMVd9OVMKUXcPYHpDmfEJ5l+e8vTXmZzOLHrSELHU5nP5f mHV7gxkDCTshoGcaed7UTiOvgu1p6E5EchDJxiLaSUbgsd8SZ3u4oXwRxgj33RK/ fB2+UaLrRt/DdlHvT/Ph8e8Ygu77yIXMjT49jsfur/zVA0HEA2dFb7V6QlsYRmQp J25pnrdXmE15llgqsC0/UOW5J1laTjII+T2T70UOAqQl4LWYAQDG4WwsAqTzU0KY dueydDouTp9XC2WYrRUEQxJUzxaOaazskDUHc5c8oHp/zVBT+djdgtvVR9+gi6+7 yy4elI77FlEEqL0ItdU/lSWINayAlPLsIHkMyhSGKX0XDpKjeycPqkNx4UterXB/ JKIR5RBWllRft+igIngIkKX0tJGMU0whngiw7d1WLw25wgu4sB53hiWWoSba14hv tXMxwZs5iGaPcT38oRVMZz8I1kJM4Dz3SyI7twVvi4RUut64EG4= =9i4I -----END PGP SIGNATURE----- Merge tag 'net-next-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking changes from Jakub Kicinski: "WiFi 7 and sendpage changes are the biggest pieces of work for this release. The latter will definitely require fixes but I think that we got it to a reasonable point. Core: - Rework the sendpage & splice implementations Instead of feeding data into sockets page by page extend sendmsg handlers to support taking a reference on the data, controlled by a new flag called MSG_SPLICE_PAGES Rework the handling of unexpected-end-of-file to invoke an additional callback instead of trying to predict what the right combination of MORE/NOTLAST flags is Remove the MSG_SENDPAGE_NOTLAST flag completely - Implement SCM_PIDFD, a new type of CMSG type analogous to SCM_CREDENTIALS, but it contains pidfd instead of plain pid - Enable socket busy polling with CONFIG_RT - Improve reliability and efficiency of reporting for ref_tracker - Auto-generate a user space C library for various Netlink families Protocols: - Allow TCP to shrink the advertised window when necessary, prevent sk_rcvbuf auto-tuning from growing the window all the way up to tcp_rmem[2] - Use per-VMA locking for "page-flipping" TCP receive zerocopy - Prepare TCP for device-to-device data transfers, by making sure that payloads are always attached to skbs as page frags - Make the backoff time for the first N TCP SYN retransmissions linear. Exponential backoff is unnecessarily conservative - Create a new MPTCP getsockopt to retrieve all info (MPTCP_FULL_INFO) - Avoid waking up applications using TLS sockets until we have a full record - Allow using kernel memory for protocol ioctl callbacks, paving the way to issuing ioctls over io_uring - Add nolocalbypass option to VxLAN, forcing packets to be fully encapsulated even if they are destined for a local IP address - Make TCPv4 use consistent hash in TIME_WAIT and SYN_RECV. Ensure in-kernel ECMP implementation (e.g. Open vSwitch) select the same link for all packets. Support L4 symmetric hashing in Open vSwitch - PPPoE: make number of hash bits configurable - Allow DNS to be overwritten by DHCPACK in the in-kernel DHCP client (ipconfig) - Add layer 2 miss indication and filtering, allowing higher layers (e.g. ACL filters) to make forwarding decisions based on whether packet matched forwarding state in lower devices (bridge) - Support matching on Connectivity Fault Management (CFM) packets - Hide the "link becomes ready" IPv6 messages by demoting their printk level to debug - HSR: don't enable promiscuous mode if device offloads the proto - Support active scanning in IEEE 802.15.4 - Continue work on Multi-Link Operation for WiFi 7 BPF: - Add precision propagation for subprogs and callbacks. This allows maintaining verification efficiency when subprograms are used, or in fact passing the verifier at all for complex programs, especially those using open-coded iterators - Improve BPF's {g,s}setsockopt() length handling. Previously BPF assumed the length is always equal to the amount of written data. But some protos allow passing a NULL buffer to discover what the output buffer *should* be, without writing anything - Accept dynptr memory as memory arguments passed to helpers - Add routing table ID to bpf_fib_lookup BPF helper - Support O_PATH FDs in BPF_OBJ_PIN and BPF_OBJ_GET commands - Drop bpf_capable() check in BPF_MAP_FREEZE command (used to mark maps as read-only) - Show target_{obj,btf}_id in tracing link fdinfo - Addition of several new kfuncs (most of the names are self-explanatory): - Add a set of new dynptr kfuncs: bpf_dynptr_adjust(), bpf_dynptr_is_null(), bpf_dynptr_is_rdonly(), bpf_dynptr_size() and bpf_dynptr_clone(). - bpf_task_under_cgroup() - bpf_sock_destroy() - force closing sockets - bpf_cpumask_first_and(), rework bpf_cpumask_any*() kfuncs Netfilter: - Relax set/map validation checks in nf_tables. Allow checking presence of an entry in a map without using the value - Increase ip_vs_conn_tab_bits range for 64BIT builds - Allow updating size of a set - Improve NAT tuple selection when connection is closing Driver API: - Integrate netdev with LED subsystem, to allow configuring HW "offloaded" blinking of LEDs based on link state and activity (i.e. packets coming in and out) - Support configuring rate selection pins of SFP modules - Factor Clause 73 auto-negotiation code out of the drivers, provide common helper routines - Add more fool-proof helpers for managing lifetime of MDIO devices associated with the PCS layer - Allow drivers to report advanced statistics related to Time Aware scheduler offload (taprio) - Allow opting out of VF statistics in link dump, to allow more VFs to fit into the message - Split devlink instance and devlink port operations New hardware / drivers: - Ethernet: - Synopsys EMAC4 IP support (stmmac) - Marvell 88E6361 8 port (5x1GE + 3x2.5GE) switches - Marvell 88E6250 7 port switches - Microchip LAN8650/1 Rev.B0 PHYs - MediaTek MT7981/MT7988 built-in 1GE PHY driver - WiFi: - Realtek RTL8192FU, 2.4 GHz, b/g/n mode, 2T2R, 300 Mbps - Realtek RTL8723DS (SDIO variant) - Realtek RTL8851BE - CAN: - Fintek F81604 Drivers: - Ethernet NICs: - Intel (100G, ice): - support dynamic interrupt allocation - use meta data match instead of VF MAC addr on slow-path - nVidia/Mellanox: - extend link aggregation to handle 4, rather than just 2 ports - spawn sub-functions without any features by default - OcteonTX2: - support HTB (Tx scheduling/QoS) offload - make RSS hash generation configurable - support selecting Rx queue using TC filters - Wangxun (ngbe/txgbe): - add basic Tx/Rx packet offloads - add phylink support (SFP/PCS control) - Freescale/NXP (enetc): - report TAPRIO packet statistics - Solarflare/AMD: - support matching on IP ToS and UDP source port of outer header - VxLAN and GENEVE tunnel encapsulation over IPv4 or IPv6 - add devlink dev info support for EF10 - Virtual NICs: - Microsoft vNIC: - size the Rx indirection table based on requested configuration - support VLAN tagging - Amazon vNIC: - try to reuse Rx buffers if not fully consumed, useful for ARM servers running with 16kB pages - Google vNIC: - support TCP segmentation of >64kB frames - Ethernet embedded switches: - Marvell (mv88e6xxx): - enable USXGMII (88E6191X) - Microchip: - lan966x: add support for Egress Stage 0 ACL engine - lan966x: support mapping packet priority to internal switch priority (based on PCP or DSCP) - Ethernet PHYs: - Broadcom PHYs: - support for Wake-on-LAN for BCM54210E/B50212E - report LPI counter - Microsemi PHYs: support RGMII delay configuration (VSC85xx) - Micrel PHYs: receive timestamp in the frame (LAN8841) - Realtek PHYs: support optional external PHY clock - Altera TSE PCS: merge the driver into Lynx PCS which it is a variant of - CAN: Kvaser PCIEcan: - support packet timestamping - WiFi: - Intel (iwlwifi): - major update for new firmware and Multi-Link Operation (MLO) - configuration rework to drop test devices and split the different families - support for segmented PNVM images and power tables - new vendor entries for PPAG (platform antenna gain) feature - Qualcomm 802.11ax (ath11k): - Multiple Basic Service Set Identifier (MBSSID) and Enhanced MBSSID Advertisement (EMA) support in AP mode - support factory test mode - RealTek (rtw89): - add RSSI based antenna diversity - support U-NII-4 channels on 5 GHz band - RealTek (rtl8xxxu): - AP mode support for 8188f - support USB RX aggregation for the newer chips" * tag 'net-next-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1602 commits) net: scm: introduce and use scm_recv_unix helper af_unix: Skip SCM_PIDFD if scm->pid is NULL. net: lan743x: Simplify comparison netlink: Add __sock_i_ino() for __netlink_diag_dump(). net: dsa: avoid suspicious RCU usage for synced VLAN-aware MAC addresses Revert "af_unix: Call scm_recv() only after scm_set_cred()." phylink: ReST-ify the phylink_pcs_neg_mode() kdoc libceph: Partially revert changes to support MSG_SPLICE_PAGES net: phy: mscc: fix packet loss due to RGMII delays net: mana: use vmalloc_array and vcalloc net: enetc: use vmalloc_array and vcalloc ionic: use vmalloc_array and vcalloc pds_core: use vmalloc_array and vcalloc gve: use vmalloc_array and vcalloc octeon_ep: use vmalloc_array and vcalloc net: usb: qmi_wwan: add u-blox 0x1312 composition perf trace: fix MSG_SPLICE_PAGES build error ipvlan: Fix return value of ipvlan_queue_xmit() netfilter: nf_tables: fix underflow in chain reference counter netfilter: nf_tables: unbind non-anonymous set if rule construction fails ... |
||
Ian Rogers
|
628eaa4e87 |
perf pmus: Add placeholder core PMU
If loading a core PMU fails, legacy hardware/cache events may segv due to there being no PMU. Create a placeholder empty PMU for this case. This was discussed in: https://lore.kernel.org/lkml/20230614151625.2077-1-yangjihong1@huawei.com/ Reported-by: Yang Jihong <yangjihong1@huawei.com> Tested-by: Yang Jihong <yangjihong1@huawei.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Rob Herring <robh@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Link: https://lore.kernel.org/r/20230627182834.117565-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |