16159 Commits

Author SHA1 Message Date
Namhyung Kim
0993d72467 perf hist: Move histogram related code to hist.h
It's strange that sort.h has the definition of struct hist_entry.  As
sort.h already includes hist.h, let's move the data structure to hist.h.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20240411181718.2367948-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-17 12:21:39 -03:00
Namhyung Kim
a5a00497b9 perf annotate-data: Handle RSP if it's not the FB register
In some cases, the stack pointer on x86 (rsp = reg7) is used to point
variables on stack but it's not the frame base register.  Then it
should handle the register like normal registers (IOW not to access
the other stack variables using offset calculation) but it should not
assume it would have a pointer.

Before:
  -----------------------------------------------------------
  find data type for 0x7c(reg7) at tcp_getsockopt+0xb62
  CU for net/ipv4/tcp.c (die:0x7b5f516)
  frame base: cfa=0 fbreg=6
  no pointer or no type
  check variable "zc" failed (die: 0x7b9580a)
   variable location: base=reg7, offset=0x40
   type='struct tcp_zerocopy_receive' size=0x40 (die:0x7b947f4)

After:
  -----------------------------------------------------------
  find data type for 0x7c(reg7) at tcp_getsockopt+0xb62
  CU for net/ipv4/tcp.c (die:0x7b5f516)
  frame base: cfa=0 fbreg=6
  found "zc" in scope=3/3 (die: 0x7b957fc) type_offset=0x3c
   variable location: base=reg7, offset=0x40
   type='struct tcp_zerocopy_receive' size=0x40 (die:0x7b947f4)

Note that the type-offset was properly calculated to 0x3c as the
variable starts at 0x40.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240412183310.2518474-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-16 10:46:55 -03:00
Namhyung Kim
0519fadbbe perf dwarf-aux: Check variable address range properly
In match_var_offset(), it just checked the end address of the variable
with the given offset because it assumed the register holds a pointer
to the data type and the offset starts from the base.

But I found some cases that the stack pointer (rsp = reg7) register is
used to pointer a stack variable while the frame base is maintained by a
different register (rbp = reg6).  In that case, it cannot simply use the
stack pointer as it cannot guarantee that it points to the frame base.
So it needs to check both boundaries of the variable location.

Before:
  -----------------------------------------------------------
  find data type for 0x7c(reg7) at tcp_getsockopt+0xb62
  CU for net/ipv4/tcp.c (die:0x7b5f516)
  frame base: cfa=0 fbreg=6
  no pointer or no type
  check variable "tss" failed (die: 0x7b95801)
   variable location: base reg7, offset=0x110
   type='struct scm_timestamping_internal' size=0x30 (die:0x7b8c126)

So the current code just checks register number for the non-PC and
non-FB registers and assuming it has offset 0.  But this variable has
offset 0x110 so it should not match to this.

After:
  -----------------------------------------------------------
  find data type for 0x7c(reg7) at tcp_getsockopt+0xb62
  CU for net/ipv4/tcp.c (die:0x7b5f516)
  frame base: cfa=0 fbreg=6
  no pointer or no type
  check variable "zc" failed (die: 0x7b9580a)
   variable location: base=reg7, offset=0x40
   type='struct tcp_zerocopy_receive' size=0x40 (die:7b947f4)

Now it find the correct variable "zc".  It was located at reg7 + 0x40
and the size if 0x40 which means it should cover [0x40, 0x80).  And the
access was for reg7 + 0x7c so it found the right one.  But it still
failed to use the variable and it would be handled in the next patch.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240412183310.2518474-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-16 10:46:55 -03:00
Namhyung Kim
645af3fb62 perf dwarf-aux: Check pointer offset when checking variables
In match_var_offset(), it checks the offset range with the target type
only for non-pointer types.  But it also needs to check the pointer
types with the target type.

This is because there can be more than one pointer variable located in
the same register.  Let's look at the following example.  It's looking
up a variable for reg3 at tcp_get_info+0x62.  It found "sk" variable but
it wasn't the right one since it accesses beyond the target type (struct
'sock' in this case) size.

  -----------------------------------------------------------
  find data type for 0x7bc(reg3) at tcp_get_info+0x62
  CU for net/ipv4/tcp.c (die:0x7b5f516)
  frame base: cfa=0 fbreg=6
  offset: 1980 is bigger than size: 760
  check variable "sk" failed (die: 0x7b92b2c)
   variable location: reg3
   type='struct sock' size=0x2f8 (die:0x7b63c3a)

Actually there was another variable "tp" in the function and it's
located at the same (reg3) because it's just type-casted like below.

  void tcp_get_info(struct sock *sk, struct tcp_info *info)
  {
      const struct tcp_sock *tp = tcp_sk(sk);
      ...

The 'struct tcp_sock' contains the 'struct sock' at offset 0 so it can
just use the same address as a pointer to tcp_sock.  That means it
should match variables correctly by checking the offset and size.
Actually it cannot distinguish if the offset was smaller than the size
of the original struct sock.  But I think it's fine as they are the same
at that part.

So let's check the target type size and retry if it doesn't match.
Now it succeeded to find the correct variable.

  -----------------------------------------------------------
  find data type for 0x7bc(reg3) at tcp_get_info+0x62
  CU for net/ipv4/tcp.c (die:0x7b5f516)
  frame base: cfa=0 fbreg=6
  found "tp" in scope=1/1 (die: 0x7b92b16) type_offset=0x7bc
   variable location: reg3
   type='struct tcp_sock' size=0xa68 (die:0x7b81380)

Fixes: bc10db8eb8955fbc ("perf annotate-data: Support stack variables")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240412183310.2518474-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-16 10:46:55 -03:00
Namhyung Kim
2bc3cf575a perf annotate-data: Improve debug message with location info
To verify it found the correct variable, let's add the location
expression to the debug message.

  $ perf --debug type-profile annotate --data-type
  ...
  -----------------------------------------------------------
  find data type for 0xaf0(reg15) at schedule+0xeb
  CU for kernel/sched/core.c (die:0x1180523)
  frame base: cfa=0 fbreg=6
  found "rq" in scope=3/4 (die: 0x11b6a00) type_offset=0xaf0
   variable location: reg15
   type='struct rq' size=0xfc0 (die:0x11892e2)
  -----------------------------------------------------------
  find data type for 0x7bc(reg3) at tcp_get_info+0x62
  CU for net/ipv4/tcp.c (die:0x7b5f516)
  frame base: cfa=0 fbreg=6
  offset: 1980 is bigger than size: 760
  check variable "sk" failed (die: 0x7b92b2c)
   variable location: reg3
   type='struct sock' size=0x2f8 (die:0x7b63c3a)
  -----------------------------------------------------------
  ...

The first case is fine.  It looked up a data type in r15 with offset of
0xaf0 at schedule+0xeb.  It found the CU die and the frame base info and
the variable "rq" was found in the scope 3/4.  Its location is the r15
register and the type size is 0xfc0 which includes 0xaf0.

But the second case is not good.  It looked up a data type in rbx (reg3)
with offset 0x7bc.  It found a CU and the frame base which is good so
far.  And it also found a variable "sk" but the access offset is bigger
than the type size (1980 vs. 760 or 0x7bc vs. 0x2f8).  The variable has
the right location (reg3) but I need to figure out why it accesses
beyond what it's supposed to.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240412183310.2518474-2-namhyung@kernel.org
[ Fix the build on 32-bit by casting Dwarf_Word to (long) in pr_debug_location() ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-16 10:46:08 -03:00
Ian Rogers
988052f4bf perf bench uprobe: Add uretprobe variant of uprobe benchmarks
Name benchmarks with _ret at the end to avoid creating a new set of
benchmarks.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andrei Vagin <avagin@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kees Kook <keescook@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240406040911.1603801-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12 17:54:02 -03:00
Ian Rogers
459fee7b50 perf bench uprobe: Remove lib64 from libc.so.6 binary path
bpf_program__attach_uprobe_opts will search LD_LIBRARY_PATH and so
specifying `/lib64` is unnecessary and causes failures for libc.so.6
paths like `/lib/x86_64-linux-gnu/libc.so.6`.

Fixes: 7b47623b8cae8149 ("perf bench uprobe trace_printk: Add entry attaching an BPF program that does a trace_printk")
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andrei Vagin <avagin@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kees Kook <keescook@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240406040911.1603801-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12 17:54:02 -03:00
Ian Rogers
2b8c43e768 perf trace beauty: Add shellcheck to scripts
Add shell check to scripts generating perf trace lookup tables. Fix
quoting issue in arch_errno_names.sh.

Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lore.kernel.org/r/20240409023216.2342032-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12 17:54:02 -03:00
Ian Rogers
61ff60aab7 perf util: Add shellcheck to generate-cmdlist.sh
Add shellcheck to generate-cmdlist.sh to avoid basic shell script
mistakes.

Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lore.kernel.org/r/20240409023216.2342032-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12 17:54:02 -03:00
Ian Rogers
ec440763bb perf arch x86: Add shellcheck to build
Add shellcheck for:

  tools/perf/arch/x86/tests/gen-insn-x86-dat.sh
  tools/perf/arch/x86/entry/syscalls/syscalltbl.sh

Address a minor quoting issue.

Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lore.kernel.org/r/20240409023216.2342032-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12 17:54:02 -03:00
Ian Rogers
646e22eb87 perf build: Add shellcheck to tools/perf scripts
Address shell check errors/warnings in perf-archive.sh and
perf-completion.sh.

Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lore.kernel.org/r/20240409023216.2342032-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12 17:54:02 -03:00
Ian Rogers
20b0027ca1 perf list: Escape '\r' in JSON output
Events like for sapphirerapids have '\r' in the uncore descriptions. The
non-escaped versions of this fail JSON validation the the 'perf list'
test.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240410222353.1722840-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12 17:54:02 -03:00
Ian Rogers
0ffc8fca5c perf dsos: Switch more loops to dsos__for_each_dso()
Switch loops within dsos.c, add a version that isn't locked. Switch
some unlocked loops to hold the read lock.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Anne Macedo <retpolanne@posteo.net>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ben Gainey <ben.gainey@arm.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Chengen Du <chengen.du@canonical.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Li Dong <lidong@vivo.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Markus Elfring <Markus.Elfring@web.de>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paran Lee <p4ranlee@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com>
Link: https://lore.kernel.org/r/20240410064214.2755936-6-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12 12:04:13 -03:00
Ian Rogers
1d6eff9305 perf dso: Move dso functions out of dsos.c
Move dso and dso_id functions to dso.c to match the struct declarations.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Anne Macedo <retpolanne@posteo.net>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ben Gainey <ben.gainey@arm.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Chengen Du <chengen.du@canonical.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Li Dong <lidong@vivo.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Markus Elfring <Markus.Elfring@web.de>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paran Lee <p4ranlee@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com>
Link: https://lore.kernel.org/r/20240410064214.2755936-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12 12:04:13 -03:00
Ian Rogers
73f3fea2e1 perf dsos: Introduce dsos__for_each_dso()
To better abstract the dsos internals, introduce dsos__for_each_dso that
does a callback on each dso.

This also means the read lock can be correctly held.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Anne Macedo <retpolanne@posteo.net>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ben Gainey <ben.gainey@arm.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Chengen Du <chengen.du@canonical.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Li Dong <lidong@vivo.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Markus Elfring <Markus.Elfring@web.de>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paran Lee <p4ranlee@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com>
Link: https://lore.kernel.org/r/20240410064214.2755936-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12 12:04:13 -03:00
Ian Rogers
f649ed80f3 perf dsos: Tidy reference counting and locking
Move more functionality into dsos.c generally from machine.c, renaming
functions to match their new usage.

The find function is made to always "get" before returning a dso.

Reduce the scope of locks in vdso to match the locking paradigm.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Anne Macedo <retpolanne@posteo.net>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ben Gainey <ben.gainey@arm.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Chengen Du <chengen.du@canonical.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Li Dong <lidong@vivo.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Markus Elfring <Markus.Elfring@web.de>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paran Lee <p4ranlee@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com>
Link: https://lore.kernel.org/r/20240410064214.2755936-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12 12:04:13 -03:00
Ian Rogers
83acca9f90 perf dsos: Attempt to better abstract DSOs internals
Move functions from machine and build-id to dsos. Pass 'struct dsos'
rather than internal state.

Rename some functions to better represent which data structure they
operate on.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Anne Macedo <retpolanne@posteo.net>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ben Gainey <ben.gainey@arm.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Chengen Du <chengen.du@canonical.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Li Dong <lidong@vivo.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Markus Elfring <Markus.Elfring@web.de>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paran Lee <p4ranlee@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com>
Link: https://lore.kernel.org/r/20240410064214.2755936-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12 12:04:13 -03:00
Adrian Hunter
792bc998ba perf record: Fix debug message placement for test consumption
evlist__config() might mess up the debug output consumed by test
"Test per-thread recording" in "Miscellaneous Intel PT testing".

Move it out from between the debug prints:

  "perf record opening and mmapping events" and
  "perf record done opening and mmapping events"

Fixes: da4062021e0e6da5 ("perf tools: Add debug messages and comments for testing")
Closes: https://lore.kernel.org/linux-perf-users/ZhVfc5jYLarnGzKa@x1/
Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240411075447.17306-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12 12:02:06 -03:00
Namhyung Kim
873a83731f perf annotate: Skip DSOs not found
In some data file, I see the following messages repeated.  It seems it
doesn't have DSOs in the system and the dso->binary_type is set to
DSO_BINARY_TYPE__NOT_FOUND.  Let's skip them to avoid the followings.

  No output from objdump  --start-address=0x0000000000000000 --stop-address=0x00000000000000d4  -d --no-show-raw-insn       -C "$1"
  Error running objdump  --start-address=0x0000000000000000 --stop-address=0x0000000000000631  -d --no-show-raw-insn       -C "$1"
  ...

Closes: https://lore.kernel.org/linux-perf-users/15e1a2847b8cebab4de57fc68e033086aa6980ce.camel@yandex.ru/
Reported-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240410185117.1987239-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12 12:02:06 -03:00
Namhyung Kim
6cdd977ec2 perf report: Do not collect sample histogram unnecessarily
The data type profiling alone doesn't need the sample histogram for
functions.  It only needs the histogram for the types.

Let's remove the condition in the report_callback to check if data type
profiling is selected and make sure the annotation has the 'struct
annotated_source' instantiated before calling symbol__disassemble().

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240411033256.2099646-8-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12 12:02:06 -03:00
Namhyung Kim
0bfbe661a2 perf report: Add a menu item to annotate data type in TUI
When the hist entry has the type info, it should be able to display the
annotation browser for the type like in `perf annotate --data-type`.

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240411033256.2099646-7-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12 12:02:06 -03:00
Namhyung Kim
2b08f219d5 perf annotate-data: Support event group display in TUI
Like in stdio, it should print all events in a group together.

Committer notes:

Collect it:

  root@number:~# perf record -a -e '{cpu_core/mem-loads,ldlat=30/P,cpu_core/mem-stores/P}'
  ^C[ perf record: Woken up 8 times to write data ]
  [ perf record: Captured and wrote 4.980 MB perf.data (55825 samples) ]
  root@number:~#

Then do it in stdio:

  root@number:~# perf annotate --stdio --data-type

  Annotate type: 'union ' in /usr/lib64/libc.so.6 (1131 samples):
   event[0] = cpu_core/mem-loads,ldlat=30/P
   event[1] = cpu_core/mem-stores/P
  ============================================================================
           Percent     offset       size  field
    100.00  100.00          0         40  union    {
    100.00  100.00          0         40      struct __pthread_mutex_s    __data {
     48.61   23.46          0          4          int     __lock;
      0.00    0.48          4          4          unsigned int    __count;
      6.38   41.32          8          4          int     __owner;
      8.74   34.02         12          4          unsigned int    __nusers;
     35.66    0.26         16          4          int     __kind;
      0.61    0.45         20          2          short int       __spins;
      0.00    0.00         22          2          short int       __elision;
      0.00    0.00         24         16          __pthread_list_t        __list {
      0.00    0.00         24          8              struct __pthread_internal_list*     __prev;
      0.00    0.00         32          8              struct __pthread_internal_list*     __next;
                                                  };
                                              };
      0.00    0.00          0          0      char*       __size;
     48.61   23.94          0          8      long int    __align;
                                          };

Now with TUI before this patch:

  root@number:~# perf annotate --tui --data-type
  Annotate type: 'union ' (790 samples)
      Percent     Offset       Size  Field
       100.00          0         40  union  {
       100.00          0         40      struct __pthread_mutex_s __data {
        48.61          0          4          int  __lock;
         0.00          4          4          unsigned int __count;
         6.38          8          4          int  __owner;
         8.74         12          4          unsigned int __nusers;
        35.66         16          4          int  __kind;
         0.61         20          2          short int    __spins;
         0.00         22          2          short int    __elision;
         0.00         24         16          __pthread_list_t     __list {
         0.00         24          8              struct __pthread_internal_list*  __prev;
         0.00         32          8              struct __pthread_internal_list*  __next;

         0.00          0          0      char*    __size;
        48.61          0          8      long int __align;
                                     };

And now after this patch:

Annotate type: 'union ' (790 samples)
               Percent     Offset       Size  Field
     100.00     100.00          0         40  union  {
     100.00     100.00          0         40      struct __pthread_mutex_s      __data {
      48.61      23.46          0          4          int       __lock;
       0.00       0.48          4          4          unsigned int      __count;
       6.38      41.32          8          4          int       __owner;
       8.74      34.02         12          4          unsigned int      __nusers;
      35.66       0.26         16          4          int       __kind;
       0.61       0.45         20          2          short int __spins;
       0.00       0.00         22          2          short int __elision;
       0.00       0.00         24         16          __pthread_list_t  __list {
       0.00       0.00         24          8              struct __pthread_internal_list*       __prev;
       0.00       0.00         32          8              struct __pthread_internal_list*       __next;
                                                      };
                                                  };
       0.00       0.00          0          0      char* __size;
      48.61      23.94          0          8      long int      __align;
                                              };

On a followup patch the --tui output should have this that is present in
--stdio:

  And the --stdio has all the missing info in TUI:

    Annotate type: 'union ' in /usr/lib64/libc.so.6 (1131 samples):
     event[0] = cpu_core/mem-loads,ldlat=30/P
     event[1] = cpu_core/mem-stores/P

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240411033256.2099646-6-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12 12:02:06 -03:00
Namhyung Kim
d001c7a7f4 perf annotate-data: Add hist_entry__annotate_data_tui()
Support data type profiling output on TUI.

Testing from Arnaldo:

First make sure that the debug information for your workload binaries
in embedded in them by building it with '-g' or install the debuginfo
packages, since our workload is 'find':

  root@number:~# type find
  find is hashed (/usr/bin/find)
  root@number:~# rpm -qf /usr/bin/find
  findutils-4.9.0-5.fc39.x86_64
  root@number:~# dnf debuginfo-install findutils
  <SNIP>
  root@number:~#

Then collect some data:

  root@number:~# echo 1 > /proc/sys/vm/drop_caches
  root@number:~# perf mem record find / > /dev/null
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.331 MB perf.data (3982 samples) ]
  root@number:~#

Finally do data-type annotation with the following command, that will
default, as 'perf report' to the --tui mode, with lines colored to
highlight the hotspots, etc.

  root@number:~# perf annotate --data-type
  Annotate type: 'struct predicate' (58 samples)
      Percent     Offset       Size  Field
       100.00          0        312  struct predicate {
         0.00          0          8      PRED_FUNC        pred_func;
         0.00          8          8      char*    p_name;
         0.00         16          4      enum predicate_type      p_type;
         0.00         20          4      enum predicate_precedence        p_prec;
         0.00         24          1      _Bool    side_effects;
         0.00         25          1      _Bool    no_default_print;
         0.00         26          1      _Bool    need_stat;
         0.00         27          1      _Bool    need_type;
         0.00         28          1      _Bool    need_inum;
         0.00         32          4      enum EvaluationCost      p_cost;
         0.00         36          4      float    est_success_rate;
         0.00         40          1      _Bool    literal_control_chars;
         0.00         41          1      _Bool    artificial;
         0.00         48          8      char*    arg_text;
  <SNIP>

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240411033256.2099646-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12 12:02:06 -03:00
Namhyung Kim
9b561be15f perf annotate-data: Add hist_entry__annotate_data_tty()
And move the related code into util/annotate-data.c file.

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240411033256.2099646-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12 12:02:06 -03:00
Namhyung Kim
d9aedc12d3 perf annotate: Show progress of sample processing
Like 'perf report', it can take a while to process samples.

Show a progress window to inform users how that it is not stuck.

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240411033256.2099646-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12 12:02:06 -03:00
Namhyung Kim
eb83348863 perf annotate-data: Skip sample histogram for stack canary
It's a pseudo data type and has no field.

Fixes: b3c95109c131fcc9 ("perf annotate-data: Add stack canary type")
Closes: https://lore.kernel.org/lkml/Zhb6jJneP36Z-or0@x1
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240411033256.2099646-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12 12:02:06 -03:00
James Clark
7aa8749979 perf tests: Remove dependency on lscpu
This check can be done with uname which is more portable. At the same
time re-arrange it into a standard if statement so that it's more
readable.

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: James Clark <james.clark@arm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Spoorthy S <spoorts2@in.ibm.com>
Link: https://lore.kernel.org/r/20240410103458.813656-5-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12 12:02:06 -03:00
James Clark
df12e21d4e perf map: Remove kernel map before updating start and end addresses
In a debug build there is validation that mmap lists are sorted when
taking a lock. In machine__update_kernel_mmap() the start and end
addresses are updated resulting in an unsorted list before the map is
removed from the list. When the map is removed, the lock is taken which
triggers the validation and the failure:

  $ perf test "object code reading"
  --- start ---
  perf: util/maps.c:88: check_invariants: Assertion `map__start(prev) <= map__start(map)' failed.
  Aborted

Fix it by updating the addresses after removal, but before insertion.
The bug depends on the ordering and type of debug info on the system and
doesn't reproduce everywhere.

Fixes: 659ad3492b913c90 ("perf maps: Switch from rbtree to lazily sorted array for addresses")
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: James Clark <james.clark@arm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Spoorthy S <spoorts2@in.ibm.com>
Link: https://lore.kernel.org/r/20240410103458.813656-4-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12 12:02:06 -03:00
James Clark
2dade41a53 perf tests: Apply attributes to all events in object code reading test
PERF_PMU_CAP_EXTENDED_HW_TYPE results in multiple events being opened on
heterogeneous systems. Currently this test only sets its required
attributes on the first event. Not disabling enable_on_exec on the other
events causes the test to fail because the forked objdump processes are
sampled. No tracking event is opened so Perf only knows about its own
mappings causing the objdump samples to give the following error:

  $ perf test -vvv "object code reading"

  Reading object code for memory address: 0xffff9aaa55ec
  thread__find_map failed
  ---- end(-1) ----
  24: Object code reading              : FAILED!

Fixes: 251aa040244a3b17 ("perf parse-events: Wildcard most "numeric" events")
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: James Clark <james.clark@arm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Spoorthy S <spoorts2@in.ibm.com>
Link: https://lore.kernel.org/r/20240410103458.813656-3-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12 12:02:05 -03:00
James Clark
256ef072b3 perf tests: Make "test data symbol" more robust on Neoverse N1
To prevent anyone from seeing a test failure appear as a regression and
thinking that it was caused by their code change, insert some noise into
the loop which makes it immune to sampling bias issues (errata 1694299).

The "test data symbol" test can fail with any unrelated change that
shifts the loop into an unfortunate position in the Perf binary which is
almost impossible to debug as the root cause of the test failure.
Ultimately it's caused by the referenced errata.

Fixes: 60abedb8aa902b06 ("perf test: Introduce script for data symbol testing")
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: James Clark <james.clark@arm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Spoorthy S <spoorts2@in.ibm.com>
Link: https://lore.kernel.org/r/20240410103458.813656-2-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12 12:02:05 -03:00
Ian Rogers
4b5ee6db2d perf metrics: Remove the "No_group" metric group
Rather than place metrics without a metric group in "No_group" place
them in a a metric group that is their name. Still allow such metrics
to be selected if "No_group" is passed, this change just impacts perf
list.

Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240403164636.3429091-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12 12:02:05 -03:00
Namhyung Kim
0235abd89f perf annotate: Get rid of symbol__ensure_annotate()
Now symbol__annotate() is reentrant and it doesn't need to remove
non-instruction lines.  Let's get rid of symbol__ensure_annotate() and
call symbol__annotate() directly.  Also we can use it to get the arch
pointer instead of calling evsel__get_arch() directly.

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240405211800.1412920-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12 12:02:05 -03:00
Namhyung Kim
879ebf3c83 perf annotate-data: Do not delete non-asm lines
For data type profiling, it removed non-instruction lines from the list
of annotation lines.  It was to simplify the implementation dealing with
instructions like to calculate the PC-relative address and to search the
shortest path to the target instruction or basic block.

But it means that it removes all the comments and debug information in
the annotate output like source file name and line numbers.  To support
both code annotation and data type annotation, it'd be better to keep
the non-instruction lines as well.

So this change is to skip those lines during the data type profiling
and to display them in the normal perf annotate output.

No function changes intended (other than having more lines).

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240405211800.1412920-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12 12:02:05 -03:00
Namhyung Kim
657852135d perf annotate-data: Fix global variable lookup
The recent change in the global variable handling added a bug to miss
setting the return value even if it found a data type.  Also add the
type name in the debug message.

Fixes: 1ebb5e17ef21b492 ("perf annotate-data: Add get_global_var_type()")
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240405211800.1412920-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12 12:02:05 -03:00
Namhyung Kim
f3408580ba perf lock contention: Add a missing NULL check
I got a report for a failure in BPF verifier on a recent kernel with
perf lock contention command.  It checks task->sighand->siglock without
checking if sighand is NULL or not.  Let's add one.

  ; if (&curr->sighand->siglock == (void *)lock)
  265: (79) r1 = *(u64 *)(r0 +2624)     ; frame1: R0_w=trusted_ptr_task_struct(off=0,imm=0)
                                        ;         R1_w=rcu_ptr_or_null_sighand_struct(off=0,imm=0)
  266: (b7) r2 = 0                      ; frame1: R2_w=0
  267: (0f) r1 += r2
  R1 pointer arithmetic on rcu_ptr_or_null_ prohibited, null-check it first
  processed 164 insns (limit 1000000) max_states_per_insn 1 total_states 15 peak_states 15 mark_read 5
  -- END PROG LOAD LOG --
  libbpf: prog 'contention_end': failed to load: -13
  libbpf: failed to load object 'lock_contention_bpf'
  libbpf: failed to load BPF skeleton 'lock_contention_bpf': -13
  Failed to load lock-contention BPF skeleton
  lock contention BPF setup failed
  lock contention did not detect any lock contention

Fixes: 1811e82767dcc ("perf lock contention: Track and show siglock with address")
Reviewed-by: Ian Rogers <irogers@google.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240409225542.1870999-1-namhyung@kernel.org
2024-04-11 10:30:06 -07:00
Namhyung Kim
2b8dbf69ec perf annotate: Make sure to call symbol__annotate2() in TUI
The symbol__annotate2() initializes some data structures needed by TUI.
It has a logic to prevent calling it multiple times by checking if it
has the annotated source.  But data type profiling uses a different
code (symbol__annotate) to allocate the annotated lines in advance.
So TUI missed to call symbol__annotate2() when it shows the annotation
browser.

Make symbol__annotate() reentrant and handle that situation properly.
This fixes a crash in the annotation browser started by perf report in
TUI like below.

  $ perf report -s type,sym --tui
  # and press 'a' key and then move down

Fixes: 81e57deec325 ("perf report: Support data type profiling")
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240405211800.1412920-2-namhyung@kernel.org
2024-04-11 10:14:58 -07:00
Namhyung Kim
8c004c7a60 perf annotate: Move 'start' field struct to 'struct annotated_source'
It's only used in 'perf annotate' output which means functions with actual
samples.  No need to consume memory for every symbol ('struct annotation').

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240404175716.1225482-10-namhyung@kernel.org
Cc: Ian Rogers <irogers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: LKML <linux-kernel@vger.kernel.org>
Cc:  <linux-perf-users@vger.kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-08 17:43:20 -03:00
Namhyung Kim
6f94a72d45 perf annotate: Move nr_events struct to 'struct annotated_source'
It's only used in 'perf annotate' output which means functions with actual
samples.  No need to consume memory for every symbol ('struct annotation').

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240404175716.1225482-9-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-08 17:43:20 -03:00
Namhyung Kim
f6b18ababa perf annotate: Move 'max_jump_sources' struct to 'struct annotated_source'
It's only used in 'perf annotate' output which means functions with actual
samples.  No need to consume memory for every symbol ('struct annotation').

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240404175716.1225482-8-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-08 17:43:20 -03:00
Namhyung Kim
a46acc4567 perf annotate: Move 'widths' struct to 'struct annotated_source'
It's only used in 'perf annotate' output which means functions with
actual samples.  No need to consume memory for every symbol
('struct annotation').

Also move the 'max_line_len' field into it as it's related.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240404175716.1225482-7-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-08 17:43:20 -03:00
Namhyung Kim
cee9b86043 perf annotate: Get rid of offsets array
The struct annotated_source.offsets[] is to save pointers to
annotation_line at each offset.  We can use annotated_source__get_line()
helper instead so let's get rid of the array.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240404175716.1225482-6-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-08 17:43:20 -03:00
Namhyung Kim
0c053ed273 perf annotate: Check annotation lines more efficiently
In some places, it checks annotated (disasm) lines for each byte.  But
as it already has a list of disasm lines, it'd be better to traverse the
list entries instead of checking every offset with linear search (by
annotated_source__get_line() helper).

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240404175716.1225482-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-08 17:43:20 -03:00
Namhyung Kim
6f157d9af1 perf annotate: Introduce annotated_source__get_line()
It's a helper function to get annotation_line at the given offset
without using the offsets array.  The goal is to get rid of the
offsets array altogether.  It just does the linear search but I
think it's better to save memory as it won't be called in a hot
path.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240404175716.1225482-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-08 17:43:20 -03:00
Namhyung Kim
bfd98ceb62 perf annotate: Staticize some local functions
I found annotation__mark_jump_targets(), annotation__set_offsets()
and annotation__init_column_widths() are only used in the same file.
Let's make them static.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240404175716.1225482-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-08 17:43:20 -03:00
Namhyung Kim
aaf494cf48 perf annotate: Fix annotation_calc_lines() to pass correct address to get_srcline()
It should pass a proper address (i.e. suitable for objdump or addr2line)
to get_srcline() in order to work correctly.  It used to pass an address
with map__rip_2objdump() as the second argument but later it's changed
to use notes->start.  It's ok in normal cases but it can be changed when
annotate_opts.full_addr is set.  So let's convert the address directly
instead of using the notes->start.

Also the last argument is an IP to print symbol offset if requested.  So
it should pass symbol-relative address.

Fixes: 7d18a824b5e57ddd ("perf annotate: Toggle full address <-> offset display")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240404175716.1225482-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-08 17:43:20 -03:00
Adrian Hunter
218c200f67 perf script: Consolidate capstone print functions
Consolidate capstone print functions, to reduce duplication. Amend call
sites to use a file pointer for output, which is consistent with most
perf tools print functions. Add print_opts with an option to print also
the hex value of a resolved symbol+offset.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/20240401210925.209671-4-ak@linux.intel.com
Signed-off-by: Andi Kleen <ak@linux.intel.com>
[ Added missing inttypes.h include to use PRIx64 in util/print_insn.c ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-08 17:42:27 -03:00
Andi Kleen
d812044688 perf script: Add capstone support for '-F +brstackdisasm'
Support capstone output for the '-F +brstackinsn' branch dump.

The new output is enabled with the new field 'brstackdisasm'.

This was possible before with --xed, but now also allow it for users
that don't have xed using the builtin capstone support.

Before:

  perf record -b emacs -Q --batch '()'
  perf script -F +brstackinsn
  ...
            emacs   55778 1814366.755945:     151564 cycles:P:      7f0ab2d17192 intel_check_word.constprop.0+0x162 (/usr/lib64/ld-linux-x86-64.s>        intel_check_word.constprop.0+237:
          00007f0ab2d1711d        insn: 75 e6                     # PRED 3 cycles [3]
          00007f0ab2d17105        insn: 73 51
          00007f0ab2d17107        insn: 48 89 c1
          00007f0ab2d1710a        insn: 48 39 ca
          00007f0ab2d1710d        insn: 73 96
          00007f0ab2d1710f        insn: 48 8d 04 11
          00007f0ab2d17113        insn: 48 d1 e8
          00007f0ab2d17116        insn: 49 8d 34 c1
          00007f0ab2d1711a        insn: 44 3a 06
          00007f0ab2d1711d        insn: 75 e6                     # PRED 3 cycles [6] 3.00 IPC
          00007f0ab2d17105        insn: 73 51                     # PRED 1 cycles [7] 1.00 IPC
          00007f0ab2d17158        insn: 48 8d 50 01
          00007f0ab2d1715c        insn: eb 92                     # PRED 1 cycles [8] 2.00 IPC
          00007f0ab2d170f0        insn: 48 39 ca
          00007f0ab2d170f3        insn: 73 b0                     # PRED 1 cycles [9] 2.00 IPC

After (perf must be compiled with capstone):

  perf script -F +brstackdisasm

  ...
             emacs   55778 1814366.755945:     151564 cycles:P:      7f0ab2d17192 intel_check_word.constprop.0+0x162 (/usr/lib64/ld-linux-x86-64.s>        intel_check_word.constprop.0+237:
          00007f0ab2d1711d        jne intel_check_word.constprop.0+0xd5   # PRED 3 cycles [3]
          00007f0ab2d17105        jae intel_check_word.constprop.0+0x128
          00007f0ab2d17107        movq %rax, %rcx
          00007f0ab2d1710a        cmpq %rcx, %rdx
          00007f0ab2d1710d        jae intel_check_word.constprop.0+0x75
          00007f0ab2d1710f        leaq (%rcx, %rdx), %rax
          00007f0ab2d17113        shrq $1, %rax
          00007f0ab2d17116        leaq (%r9, %rax, 8), %rsi
          00007f0ab2d1711a        cmpb (%rsi), %r8b
          00007f0ab2d1711d        jne intel_check_word.constprop.0+0xd5   # PRED 3 cycles [6] 3.00 IPC
          00007f0ab2d17105        jae intel_check_word.constprop.0+0x128  # PRED 1 cycles [7] 1.00 IPC
          00007f0ab2d17158        leaq 1(%rax), %rdx
          00007f0ab2d1715c        jmp intel_check_word.constprop.0+0xc0   # PRED 1 cycles [8] 2.00 IPC
          00007f0ab2d170f0        cmpq %rcx, %rdx
          00007f0ab2d170f3        jae intel_check_word.constprop.0+0x75   # PRED 1 cycles [9] 2.00 IPC

Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: https://lore.kernel.org/r/20240401210925.209671-3-ak@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-05 10:43:07 -03:00
Andi Kleen
38ab60132b perf script: Support 32bit code under 64bit OS with capstone
Use the DSO to resolve whether an IP is 32bit or 64bit and use that to
configure capstone to the correct mode. This allows to correctly
disassemble 32bit code under a 64bit OS.

  % cat > loop.c
  volatile int var;
  int main(void)
  {
  	int i;
  	for (i = 0; i < 100000; i++)
  		var++;
  }
  % gcc -m32 -o loop loop.c
  % perf record -e cycles:u ./loop
  % perf script -F +disasm
    loop   82665 1833176.618023:      1 cycles:u:   f7eed500 _start+0x0 (/usr/lib/ld-linux.so.2)   movl %esp, %eax
    loop   82665 1833176.618029:      1 cycles:u:   f7eed500 _start+0x0 (/usr/lib/ld-linux.so.2)   movl %esp, %eax
    loop   82665 1833176.618031:      7 cycles:u:   f7eed500 _start+0x0 (/usr/lib/ld-linux.so.2)   movl %esp, %eax
    loop   82665 1833176.618034:     91 cycles:u:   f7eed500 _start+0x0 (/usr/lib/ld-linux.so.2)   movl %esp, %eax
    loop   82665 1833176.618036:   1242 cycles:u:   f7eed500 _start+0x0 (/usr/lib/ld-linux.so.2)   movl %esp, %eax

Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: https://lore.kernel.org/r/20240401210925.209671-2-ak@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-05 09:42:36 -03:00
Thomas Richter
c2f3d7dfc7 perf stat: Do not fail on metrics on s390 z/VM systems
On s390 z/VM virtual machines command 'perf list' also displays metrics:

  # perf list | grep -A 20 'Metric Groups:'
  Metric Groups:

  No_group:
   cpi
        [Cycles per Instruction]
   est_cpi
        [Estimated Instruction Complexity CPI infinite Level 1]
   finite_cpi
        [Cycles per Instructions from Finite cache/memory]
   l1mp
        [Level One Miss per 100 Instructions]
   l2p
        [Percentage sourced from Level 2 cache]
   l3p
        [Percentage sourced from Level 3 on same chip cache]
   l4lp
        [Percentage sourced from Level 4 Local cache on same book]
   l4rp
        [Percentage sourced from Level 4 Remote cache on different book]
   memp
        [Percentage sourced from memory]
   ....
  #

The command

  # perf stat -M cpi -- true
  event syntax error: '{CPU_CYCLES/metric-id=CPU_CYCLES/.....'
                        \___ Bad event or PMU

  Unable to find PMU or event on a PMU of 'CPU_CYCLES'

   event syntax error: '{CPU_CYCLES/metric-id=CPU_CYCLES/...'
                        \___ Cannot find PMU `CPU_CYCLES'.
                             Missing kernel support?
 #

fails. 'perf stat' should not fail on metrics when the referenced CPU
Counter Measurement PMU is not available.

Output after:

  # perf stat -M est_cpi -- sleep 1

  Performance counter stats for 'sleep 1':

     1,000,887,494 ns   duration_time   #     0.00 est_cpi

       1.000887494 seconds time elapsed

       0.000143000 seconds user
       0.000662000 seconds sys

 #

Fixes: 7f76b31130680fb3 ("perf list: Add IBM z16 event description for s390")
Suggested-by: Ian Rogers <irogers@google.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: https://lore.kernel.org/r/20240404064806.1362876-2-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-04 18:10:11 -03:00
Thomas Richter
b74bc5a633 perf report: Fix PAI counter names for s390 virtual machines
s390 introduced the Processor Activity Instrumentation (PAI) counter
facility on LPAR and virtual machines z/VM for models 3931 and 3932.

These counters are stored as raw data in the perf.data file and are
displayed with:

 # perf report -i /tmp//perfout-635468 -D | grep Counter
	Counter:007 <unknown> Value:0x00000000000186a0
	Counter:032 <unknown> Value:0x0000000000000001
	Counter:032 <unknown> Value:0x0000000000000001
	Counter:032 <unknown> Value:0x0000000000000001
 #

However on z/VM virtual machines, the counter names are not retrieved
from the PMU and are shown as '<unknown>'.  This is caused by the CPU
string saved in the mapfile.csv for this machine:

   ^IBM.393[12].*3\.7.[[:xdigit:]]+$,3,cf_z16,core

This string contains the CPU Measurement facility first and second
version number and authorization level (3\.7.[[:xdigit:]]+).  These
numbers do not apply to the PAI counter facility.  In fact they can be
omitted.

Shorten the CPU identification string for this machine to manufacturer
and model. This is sufficient for all PMU devices.

Output after:

 # perf report -i /tmp//perfout-635468 -D | grep Counter
	Counter:007 km_aes_128 Value:0x00000000000186a0
	Counter:032 kma_gcm_aes_256 Value:0x0000000000000001
	Counter:032 kma_gcm_aes_256 Value:0x0000000000000001
	Counter:032 kma_gcm_aes_256 Value:0x0000000000000001
 #

Fixes: b539deafbadb2fc6 ("perf report: Add s390 raw data interpretation for PAI counters")
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: https://lore.kernel.org/r/20240404064806.1362876-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-04 18:08:21 -03:00