7205 Commits

Author SHA1 Message Date
Ingo Molnar
e855e80d00 Linux 5.12-rc5
-----BEGIN PGP SIGNATURE-----
 
 iQFRBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmBhB7AeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGCPUH+KKkSoOlN2YNu1oc
 iy2nznwZoSQTk5ZLz7PypO/WWmmtgzudkObG7yqIURdrncsAkHR17Wu2P7rdBr1j
 Ma+VhF9MQ+xx+r86upH7c3gYfhyfdUMvzuLy0rwLQ1Yrzrb7xFcVkj3BHk54TAQA
 w05sRPuVJ3/c/HPYV2iXkkdnnMbXSTCebeDDwjFb9D3qagr4vcd/PjDHmGbfNF8R
 o6gLpbK5Ly6ww1nth9gGGUjzrW95yVItvcroP6vQWljxhuy+NE1lXRm8LsGhxqtW
 foFFptJup5nhSNJXWtQt/U3huVD6mZ3W3y9cOThPjXZRy2wva3I1IpBKoEFReUpG
 /Tq8EA==
 =tPUY
 -----END PGP SIGNATURE-----

Merge tag 'v5.12-rc5' into WIP.x86/core, to pick up recent NOP related changes

In particular we want to have this upstream commit:

  b90829704780: ("bpf: Use NOP_ATOMIC5 instead of emit_nops(&prog, 5) for BPF_TRAMP_F_CALL_ORIG")

... before merging in x86/cpu changes and the removal of the NOP optimizations, and
applying PeterZ's !retpoline objtool series.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-04-02 12:33:16 +02:00
Fabian Hemmer
292c5ed168 perf tools: Preserve identifier id in OCaml demangler
Some OCaml developers reported that this bit of information is sometimes
useful for disambiguating functions for which the OCaml compiler assigns
the same name, e.g. nested or inlined functions.

Signed-off-by: Fabian Hemmer <copy@copy.sh>
Link: http://lore.kernel.org/lkml/20210226075223.p3s5oz4jbxwnqjtv@nyu
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-30 12:45:59 -03:00
Arnaldo Carvalho de Melo
b0a752d43b Merge remote-tracking branch 'torvalds/master' into perf/core
To pick up fixes sent via perf/urgent and in the BPF tools/ directories.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-29 10:39:10 -03:00
Athira Rajeev
50fa3a531e perf sort: Display sort dimension p_stage_cyc only on supported archs
The sort dimension "p_stage_cyc" is used to represent pipeline
stage cycle information. Presently, this is used only in powerpc.

For unsupported platforms, we don't want to display it
in the perf report output columns. Hence add check in sort_dimension__add()
and skip the sort key incase it is not applicable for the particular arch.

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Link: https://lore.kernel.org/r/1616425047-1666-6-git-send-email-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-26 08:50:00 -03:00
Athira Rajeev
06e5ca746c perf tools: Support pipeline stage cycles for powerpc
The pipeline stage cycles details can be recorded on powerpc from the
contents of Performance Monitor Unit (PMU) registers. On ISA v3.1
platform, sampling registers exposes the cycles spent in different
pipeline stages. Patch adds perf tools support to present two of the
cycle counter information along with memory latency (weight).

Re-use the field 'ins_lat' for storing the first pipeline stage cycle.
This is stored in 'var2_w' field of 'perf_sample_weight'.

Add a new field 'p_stage_cyc' to store the second pipeline stage cycle
which is stored in 'var3_w' field of perf_sample_weight.

Add new sort function 'Pipeline Stage Cycle' and include this in
default_mem_sort_order[]. This new sort function may be used to denote
some other pipeline stage in another architecture. So add this to list
of sort entries that can have dynamic header string.

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Link: https://lore.kernel.org/r/1616425047-1666-5-git-send-email-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-26 08:49:54 -03:00
Athira Rajeev
0a606822c4 perf sort: Add dynamic headers for perf report columns
Currently the header string for different columns in perf report is
fixed. Some fields of perf sample could have different meaning for
different architectures than the meaning conveyed by the header string.

An example is the new field 'var2_w' of perf_sample_weight structure.
This is presently captured as 'Local INSTR Latency' in perf mem report.
But this could be used to denote a different latency cycle in another
architecture.

Introduce a weak function arch_perf_header_entry() to set the arch
specific header string for the fields which can contain dynamic header.
If the architecture do not have this function, fall back to the default
header string value.

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Link: https://lore.kernel.org/r/1616425047-1666-3-git-send-email-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-26 08:49:27 -03:00
Wan Jiabing
405e07010d perf tools: Remove duplicate struct forward declarations
'struct evlist' has been declared at 10th line.

'struct comm' has been declared at 15th line.

Remove the duplicates

Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kael_w@yeah.net
Link: http://lore.kernel.org/lkml/20210325043947.846093-1-wanjiabing@vivo.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-25 08:59:10 -03:00
Namhyung Kim
41d5854113 perf record: Fix memory leak in vDSO found using ASAN
I got several memory leak reports from Asan with a simple command.  It
was because VDSO is not released due to the refcount.  Like in
__dsos_addnew_id(), it should put the refcount after adding to the list.

  $ perf record true
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.030 MB perf.data (10 samples) ]

  =================================================================
  ==692599==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 439 byte(s) in 1 object(s) allocated from:
    #0 0x7fea52341037 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
    #1 0x559bce4aa8ee in dso__new_id util/dso.c:1256
    #2 0x559bce59245a in __machine__addnew_vdso util/vdso.c:132
    #3 0x559bce59245a in machine__findnew_vdso util/vdso.c:347
    #4 0x559bce50826c in map__new util/map.c:175
    #5 0x559bce503c92 in machine__process_mmap2_event util/machine.c:1787
    #6 0x559bce512f6b in machines__deliver_event util/session.c:1481
    #7 0x559bce515107 in perf_session__deliver_event util/session.c:1551
    #8 0x559bce51d4d2 in do_flush util/ordered-events.c:244
    #9 0x559bce51d4d2 in __ordered_events__flush util/ordered-events.c:323
    #10 0x559bce519bea in __perf_session__process_events util/session.c:2268
    #11 0x559bce519bea in perf_session__process_events util/session.c:2297
    #12 0x559bce2e7a52 in process_buildids /home/namhyung/project/linux/tools/perf/builtin-record.c:1017
    #13 0x559bce2e7a52 in record__finish_output /home/namhyung/project/linux/tools/perf/builtin-record.c:1234
    #14 0x559bce2ed4f6 in __cmd_record /home/namhyung/project/linux/tools/perf/builtin-record.c:2026
    #15 0x559bce2ed4f6 in cmd_record /home/namhyung/project/linux/tools/perf/builtin-record.c:2858
    #16 0x559bce422db4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313
    #17 0x559bce2acac8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365
    #18 0x559bce2acac8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409
    #19 0x559bce2acac8 in main /home/namhyung/project/linux/tools/perf/perf.c:539
    #20 0x7fea51e76d09 in __libc_start_main ../csu/libc-start.c:308

  Indirect leak of 32 byte(s) in 1 object(s) allocated from:
    #0 0x7fea52341037 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
    #1 0x559bce520907 in nsinfo__copy util/namespaces.c:169
    #2 0x559bce50821b in map__new util/map.c:168
    #3 0x559bce503c92 in machine__process_mmap2_event util/machine.c:1787
    #4 0x559bce512f6b in machines__deliver_event util/session.c:1481
    #5 0x559bce515107 in perf_session__deliver_event util/session.c:1551
    #6 0x559bce51d4d2 in do_flush util/ordered-events.c:244
    #7 0x559bce51d4d2 in __ordered_events__flush util/ordered-events.c:323
    #8 0x559bce519bea in __perf_session__process_events util/session.c:2268
    #9 0x559bce519bea in perf_session__process_events util/session.c:2297
    #10 0x559bce2e7a52 in process_buildids /home/namhyung/project/linux/tools/perf/builtin-record.c:1017
    #11 0x559bce2e7a52 in record__finish_output /home/namhyung/project/linux/tools/perf/builtin-record.c:1234
    #12 0x559bce2ed4f6 in __cmd_record /home/namhyung/project/linux/tools/perf/builtin-record.c:2026
    #13 0x559bce2ed4f6 in cmd_record /home/namhyung/project/linux/tools/perf/builtin-record.c:2858
    #14 0x559bce422db4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313
    #15 0x559bce2acac8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365
    #16 0x559bce2acac8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409
    #17 0x559bce2acac8 in main /home/namhyung/project/linux/tools/perf/perf.c:539
    #18 0x7fea51e76d09 in __libc_start_main ../csu/libc-start.c:308

  SUMMARY: AddressSanitizer: 471 byte(s) leaked in 2 allocation(s).

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210315045641.700430-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-24 10:38:56 -03:00
Jin Yao
0bdad97801 perf stat: Align CSV output for summary mode
The 'perf stat' subcommand supports the request for a summary of the
interval counter readings.  But the summary lines break the CSV output
so it's hard for scripts to parse the result.

Before:

  # perf stat -x, -I1000 --interval-count 1 --summary
       1.001323097,8013.48,msec,cpu-clock,8013483384,100.00,8.013,CPUs utilized
       1.001323097,270,,context-switches,8013513297,100.00,0.034,K/sec
       1.001323097,13,,cpu-migrations,8013530032,100.00,0.002,K/sec
       1.001323097,184,,page-faults,8013546992,100.00,0.023,K/sec
       1.001323097,20574191,,cycles,8013551506,100.00,0.003,GHz
       1.001323097,10562267,,instructions,8013564958,100.00,0.51,insn per cycle
       1.001323097,2019244,,branches,8013575673,100.00,0.252,M/sec
       1.001323097,106152,,branch-misses,8013585776,100.00,5.26,of all branches
  8013.48,msec,cpu-clock,8013483384,100.00,7.984,CPUs utilized
  270,,context-switches,8013513297,100.00,0.034,K/sec
  13,,cpu-migrations,8013530032,100.00,0.002,K/sec
  184,,page-faults,8013546992,100.00,0.023,K/sec
  20574191,,cycles,8013551506,100.00,0.003,GHz
  10562267,,instructions,8013564958,100.00,0.51,insn per cycle
  2019244,,branches,8013575673,100.00,0.252,M/sec
  106152,,branch-misses,8013585776,100.00,5.26,of all branches

The summary line loses the timestamp column, which breaks the CSV
output.

We add a column at the original 'timestamp' position and it just says
'summary' for the summary line.

After:

  # perf stat -x, -I1000 --interval-count 1 --summary
       1.001196053,8012.72,msec,cpu-clock,8012722903,100.00,8.013,CPUs utilized
       1.001196053,218,,context-switches,8012753271,100.00,0.027,K/sec
       1.001196053,9,,cpu-migrations,8012769767,100.00,0.001,K/sec
       1.001196053,0,,page-faults,8012786257,100.00,0.000,K/sec
       1.001196053,15004518,,cycles,8012790637,100.00,0.002,GHz
       1.001196053,7954691,,instructions,8012804027,100.00,0.53,insn per cycle
       1.001196053,1590259,,branches,8012814766,100.00,0.198,M/sec
       1.001196053,82601,,branch-misses,8012824365,100.00,5.19,of all branches
           summary,8012.72,msec,cpu-clock,8012722903,100.00,7.986,CPUs utilized
           summary,218,,context-switches,8012753271,100.00,0.027,K/sec
           summary,9,,cpu-migrations,8012769767,100.00,0.001,K/sec
           summary,0,,page-faults,8012786257,100.00,0.000,K/sec
           summary,15004518,,cycles,8012790637,100.00,0.002,GHz
           summary,7954691,,instructions,8012804027,100.00,0.53,insn per cycle
           summary,1590259,,branches,8012814766,100.00,0.198,M/sec
           summary,82601,,branch-misses,8012824365,100.00,5.19,of all branches

Now it's easy for script to analyse the summary lines.

Of course, we also consider not to break possible existing scripts which
can continue to use the broken CSV format by using a new '--no-csv-summary.'
option.

  # perf stat -x, -I1000 --interval-count 1 --summary --no-csv-summary
       1.001213261,8012.67,msec,cpu-clock,8012672327,100.00,8.013,CPUs utilized
       1.001213261,197,,context-switches,8012703742,100.00,24.586,/sec
       1.001213261,9,,cpu-migrations,8012720902,100.00,1.123,/sec
       1.001213261,644,,page-faults,8012738266,100.00,80.373,/sec
       1.001213261,18350698,,cycles,8012744109,100.00,0.002,GHz
       1.001213261,12745021,,instructions,8012759001,100.00,0.69,insn per cycle
       1.001213261,2458033,,branches,8012770864,100.00,306.768,K/sec
       1.001213261,102107,,branch-misses,8012781751,100.00,4.15,of all branches
  8012.67,msec,cpu-clock,8012672327,100.00,7.985,CPUs utilized
  197,,context-switches,8012703742,100.00,24.586,/sec
  9,,cpu-migrations,8012720902,100.00,1.123,/sec
  644,,page-faults,8012738266,100.00,80.373,/sec
  18350698,,cycles,8012744109,100.00,0.002,GHz
  12745021,,instructions,8012759001,100.00,0.69,insn per cycle
  2458033,,branches,8012770864,100.00,306.768,K/sec
  102107,,branch-misses,8012781751,100.00,4.15,of all branches

This option can be enabled in perf config by setting the variable
'stat.no-csv-summary'.

  # perf config stat.no-csv-summary=true

  # perf config -l
  stat.no-csv-summary=true

  # perf stat -x, -I1000 --interval-count 1 --summary
       1.001330198,8013.28,msec,cpu-clock,8013279201,100.00,8.013,CPUs utilized
       1.001330198,205,,context-switches,8013308394,100.00,25.583,/sec
       1.001330198,10,,cpu-migrations,8013324681,100.00,1.248,/sec
       1.001330198,0,,page-faults,8013340926,100.00,0.000,/sec
       1.001330198,8027742,,cycles,8013344503,100.00,0.001,GHz
       1.001330198,2871717,,instructions,8013356501,100.00,0.36,insn per cycle
       1.001330198,553564,,branches,8013366204,100.00,69.081,K/sec
       1.001330198,54021,,branch-misses,8013375952,100.00,9.76,of all branches
  8013.28,msec,cpu-clock,8013279201,100.00,7.985,CPUs utilized
  205,,context-switches,8013308394,100.00,25.583,/sec
  10,,cpu-migrations,8013324681,100.00,1.248,/sec
  0,,page-faults,8013340926,100.00,0.000,/sec
  8027742,,cycles,8013344503,100.00,0.001,GHz
  2871717,,instructions,8013356501,100.00,0.36,insn per cycle
  553564,,branches,8013366204,100.00,69.081,K/sec
  54021,,branch-misses,8013375952,100.00,9.76,of all branches

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210319070156.20394-1-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-24 10:21:49 -03:00
Song Liu
7fac83aaf2 perf stat: Introduce 'bperf' to share hardware PMCs with BPF
The perf tool uses performance monitoring counters (PMCs) to monitor
system performance. The PMCs are limited hardware resources. For
example, Intel CPUs have 3x fixed PMCs and 4x programmable PMCs per cpu.

Modern data center systems use these PMCs in many different ways: system
level monitoring, (maybe nested) container level monitoring, per process
monitoring, profiling (in sample mode), etc. In some cases, there are
more active perf_events than available hardware PMCs. To allow all
perf_events to have a chance to run, it is necessary to do expensive
time multiplexing of events.

On the other hand, many monitoring tools count the common metrics
(cycles, instructions). It is a waste to have multiple tools create
multiple perf_events of "cycles" and occupy multiple PMCs.

bperf tries to reduce such wastes by allowing multiple perf_events of
"cycles" or "instructions" (at different scopes) to share PMUs. Instead
of having each perf-stat session to read its own perf_events, bperf uses
BPF programs to read the perf_events and aggregate readings to BPF maps.
Then, the perf-stat session(s) reads the values from these BPF maps.

Please refer to the comment before the definition of bperf_ops for the
description of bperf architecture.

bperf is off by default. To enable it, pass --bpf-counters option to
perf-stat. bperf uses a BPF hashmap to share information about BPF
programs and maps used by bperf. This map is pinned to bpffs. The
default path is /sys/fs/bpf/perf_attr_map. The user could change the
path with option --bpf-attr-map.

Committer testing:

  # dmesg|grep "Performance Events" -A5
  [    0.225277] Performance Events: Fam17h+ core perfctr, AMD PMU driver.
  [    0.225280] ... version:                0
  [    0.225280] ... bit width:              48
  [    0.225281] ... generic registers:      6
  [    0.225281] ... value mask:             0000ffffffffffff
  [    0.225281] ... max period:             00007fffffffffff
  #
  #  for a in $(seq 6) ; do perf stat -a -e cycles,instructions sleep 100000 & done
  [1] 2436231
  [2] 2436232
  [3] 2436233
  [4] 2436234
  [5] 2436235
  [6] 2436236
  # perf stat -a -e cycles,instructions sleep 0.1

   Performance counter stats for 'system wide':

         310,326,987      cycles                                                        (41.87%)
         236,143,290      instructions              #    0.76  insn per cycle           (41.87%)

         0.100800885 seconds time elapsed

  #

We can see that the counters were enabled for this workload 41.87% of
the time.

Now with --bpf-counters:

  #  for a in $(seq 32) ; do perf stat --bpf-counters -a -e cycles,instructions sleep 100000 & done
  [1] 2436514
  [2] 2436515
  [3] 2436516
  [4] 2436517
  [5] 2436518
  [6] 2436519
  [7] 2436520
  [8] 2436521
  [9] 2436522
  [10] 2436523
  [11] 2436524
  [12] 2436525
  [13] 2436526
  [14] 2436527
  [15] 2436528
  [16] 2436529
  [17] 2436530
  [18] 2436531
  [19] 2436532
  [20] 2436533
  [21] 2436534
  [22] 2436535
  [23] 2436536
  [24] 2436537
  [25] 2436538
  [26] 2436539
  [27] 2436540
  [28] 2436541
  [29] 2436542
  [30] 2436543
  [31] 2436544
  [32] 2436545
  #
  # ls -la /sys/fs/bpf/perf_attr_map
  -rw-------. 1 root root 0 Mar 23 14:53 /sys/fs/bpf/perf_attr_map
  # bpftool map | grep bperf | wc -l
  64
  #

  # bpftool map | tail
  1265: percpu_array  name accum_readings  flags 0x0
  	key 4B  value 24B  max_entries 1  memlock 4096B
  1266: hash  name filter  flags 0x0
  	key 4B  value 4B  max_entries 1  memlock 4096B
  1267: array  name bperf_fo.bss  flags 0x400
  	key 4B  value 8B  max_entries 1  memlock 4096B
  	btf_id 996
  	pids perf(2436545)
  1268: percpu_array  name accum_readings  flags 0x0
  	key 4B  value 24B  max_entries 1  memlock 4096B
  1269: hash  name filter  flags 0x0
  	key 4B  value 4B  max_entries 1  memlock 4096B
  1270: array  name bperf_fo.bss  flags 0x400
  	key 4B  value 8B  max_entries 1  memlock 4096B
  	btf_id 997
  	pids perf(2436541)
  1285: array  name pid_iter.rodata  flags 0x480
  	key 4B  value 4B  max_entries 1  memlock 4096B
  	btf_id 1017  frozen
  	pids bpftool(2437504)
  1286: array  flags 0x0
  	key 4B  value 32B  max_entries 1  memlock 4096B
  #
  # bpftool map dump id 1268 | tail
  value (CPU 21):
  8f f3 bc ca 00 00 00 00  80 fd 2a d1 4d 00 00 00
  80 fd 2a d1 4d 00 00 00
  value (CPU 22):
  7e d5 64 4d 00 00 00 00  a4 8a 2e ee 4d 00 00 00
  a4 8a 2e ee 4d 00 00 00
  value (CPU 23):
  a7 78 3e 06 01 00 00 00  b2 34 94 f6 4d 00 00 00
  b2 34 94 f6 4d 00 00 00
  Found 1 element
  # bpftool map dump id 1268 | tail
  value (CPU 21):
  c6 8b d9 ca 00 00 00 00  20 c6 fc 83 4e 00 00 00
  20 c6 fc 83 4e 00 00 00
  value (CPU 22):
  9c b4 d2 4d 00 00 00 00  3e 0c df 89 4e 00 00 00
  3e 0c df 89 4e 00 00 00
  value (CPU 23):
  18 43 66 06 01 00 00 00  5b 69 ed 83 4e 00 00 00
  5b 69 ed 83 4e 00 00 00
  Found 1 element
  # bpftool map dump id 1268 | tail
  value (CPU 21):
  f2 6e db ca 00 00 00 00  92 67 4c ba 4e 00 00 00
  92 67 4c ba 4e 00 00 00
  value (CPU 22):
  dc 8e e1 4d 00 00 00 00  d9 32 7a c5 4e 00 00 00
  d9 32 7a c5 4e 00 00 00
  value (CPU 23):
  bd 2b 73 06 01 00 00 00  7c 73 87 bf 4e 00 00 00
  7c 73 87 bf 4e 00 00 00
  Found 1 element
  #

  # perf stat --bpf-counters -a -e cycles,instructions sleep 0.1

   Performance counter stats for 'system wide':

       119,410,122      cycles
       152,105,479      instructions              #    1.27  insn per cycle

       0.101395093 seconds time elapsed

  #

See? We had the counters enabled all the time.

Signed-off-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: kernel-team@fb.com
Link: http://lore.kernel.org/lkml/20210316211837.910506-2-songliubraving@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-23 17:46:44 -03:00
Ingo Molnar
4d39c89f0b perf tools: Fix various typos in comments
Fix ~124 single-word typos and a few spelling errors in the perf tooling code,
accumulated over the years.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210321113734.GA248990@gmail.com
Link: http://lore.kernel.org/lkml/20210323160915.GA61903@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-23 17:13:43 -03:00
Jackie Liu
1a096ae46e perf top: Fix BPF support related crash with perf_event_paranoid=3 + kptr_restrict
After installing the libelf-dev package and compiling perf, if we have
kptr_restrict=2 and perf_event_paranoid=3 'perf top' will crash because
the value of /proc/kallsyms cannot be obtained, which leads to
info->jited_ksyms == NULL. In order to solve this problem, Add a
check before use.

Also plug some leaks on the error path.

Suggested-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: jackie liu <liuyun01@kylinos.cn>
Link: http://lore.kernel.org/lkml/20210316012453.1156-1-liuyun01@kylinos.cn
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-16 10:01:44 -03:00
Changbin Du
6859bc0e78 perf stat: Improve readability of shadow stats
This adds function convert_unit_double() and selects appropriate
unit for shadow stats between K/M/G.

  $ sudo perf stat -a -- sleep 1

Before: Unit 'M' is selected even the number is very small.

 Performance counter stats for 'system wide':

          4,003.06 msec cpu-clock                 #    3.998 CPUs utilized
            16,179      context-switches          #    0.004 M/sec
               161      cpu-migrations            #    0.040 K/sec
             4,699      page-faults               #    0.001 M/sec
     6,135,801,925      cycles                    #    1.533 GHz                      (83.21%)
     5,783,308,491      stalled-cycles-frontend   #   94.26% frontend cycles idle     (83.21%)
     4,543,694,050      stalled-cycles-backend    #   74.05% backend cycles idle      (66.49%)
     4,720,130,587      instructions              #    0.77  insn per cycle
                                                  #    1.23  stalled cycles per insn  (83.28%)
       753,848,078      branches                  #  188.318 M/sec                    (83.61%)
        37,457,747      branch-misses             #    4.97% of all branches          (83.48%)

       1.001283725 seconds time elapsed

After:

$ sudo perf stat -a -- sleep 2

 Performance counter stats for 'system wide':

          8,005.52 msec cpu-clock                 #    3.999 CPUs utilized
            10,715      context-switches          #    1.338 K/sec
               785      cpu-migrations            #   98.057 /sec
               102      page-faults               #   12.741 /sec
     1,948,202,279      cycles                    #    0.243 GHz
     2,816,470,932      stalled-cycles-frontend   #  144.57% frontend cycles idle
     2,661,172,207      stalled-cycles-backend    #  136.60% backend cycles idle
       464,172,105      instructions              #    0.24  insn per cycle
                                                  #    6.07  stalled cycles per insn
        91,567,662      branches                  #   11.438 M/sec
         7,756,054      branch-misses             #    8.47% of all branches

       2.002040043 seconds time elapsed

v2:
  o do not change 'sec' to 'cpu-sec'.
  o use convert_unit_double to implement convert_unit.

Signed-off-by: Changbin Du <changbin.du@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210315143047.3867-1-changbin.du@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-15 11:36:54 -03:00
Arnaldo Carvalho de Melo
a7672d1df5 perf evlist: Change the COMM when preparing the workload
It was reported that --exclude-perf wasn't working, as tracepoints were
appearing in 'perf script' output as having the 'perf' COMM, that is
just the window in evlist__prepare_workload() after the fork() and
before the execvp() call for workloads specified in the command line.

Example:

  # perf record -e kmem:kmalloc --filter 'bytes_alloc<650 && bytes_alloc>620' --exclude-perf -e kmem:kfree --exclude-perf -aR sleep 30

Then:

  # perf script
          perf 15905 [009] 1498.356094: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=(nil)
          perf 15905 [009] 1498.356116: kmem:kfree: call_site=free_bprm+0x8f ptr=(nil)
          perf 15905 [009] 1498.356116: kmem:kfree: call_site=do_execveat_common+0x19d ptr=0xffff9cf750421c00
          perf 15905 [009] 1498.356138: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=(nil)
          perf 15905 [009] 1498.356148: kmem:kfree: call_site=free_bprm+0x8f ptr=(nil)
          perf 15905 [009] 1498.356148: kmem:kfree: call_site=do_execveat_common+0x19d ptr=0xffff9cf750421c00
          perf 15905 [009] 1498.356168: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=(nil)
          perf 15905 [009] 1498.356176: kmem:kfree: call_site=free_bprm+0x8f ptr=(nil)
  <SNIP>
          perf 15905 [009] 1498.356348: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=(nil)
          perf 15905 [014] 1498.356386: kmem:kfree: call_site=security_compute_sid.part.0+0x3b2 ptr=(nil)
          perf 15905 [014] 1498.356423: kmem:kfree: call_site=load_elf_binary+0x207 ptr=0xffff9cf5b2a34220
          perf 15905 [014] 1498.356694: kmem:kfree: call_site=__free_slab+0xb5 ptr=0xffff9cf6d0b3b000
         sleep 15905 [014] 1498.356739: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=(nil)

Use prctl() to show that that is just the preparation of the workload:

  # perf script
     perf-exec 19036 [009] 2199.357582: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=(nil)
     perf-exec 19036 [009] 2199.357604: kmem:kfree: call_site=free_bprm+0x8f ptr=(nil)
     perf-exec 19036 [009] 2199.357604: kmem:kfree: call_site=do_execveat_common+0x19d ptr=0xffff9cf786459800
     perf-exec 19036 [009] 2199.357630: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=(nil)
  <SNIP>
     perf-exec 19036 [000] 2199.358277: kmem:kfree: call_site=__free_slab+0xb5 ptr=0xffff9cf786fb9c00
     perf-exec 19036 [000] 2199.358278: kmem:kfree: call_site=__free_slab+0xb5 ptr=0xffff9cf786458200
     perf-exec 19036 [000] 2199.358279: kmem:kfree: call_site=__free_slab+0xb5 ptr=0xffff9cf786458600
         sleep 19036 [000] 2199.358316: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=(nil)
         sleep 19036 [000] 2199.358323: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=(nil)
         sleep 19036 [000] 2199.358330: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=0xffff9cf58be2d000
         sleep 19036 [000] 2199.358337: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=0xffff9cf58be2d000
         sleep 19036 [000] 2199.358339: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=0xffff9cf58be2d000
         sleep 19036 [000] 2199.358341: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=0xffff9cf58be2d000

Reporter: zhanweiw <wingfancy@hotmail.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=212213
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-15 10:13:22 -03:00
Jin Yao
e40647762f perf pmu: Validate raw event with sysfs exported format bits
A raw PMU event (eventsel+umask) in the form of rNNN is supported
by perf but lacks of checking for the validity of raw encoding.

For example, bit 16 and bit 17 are not valid on KBL but perf doesn't
report warning when encoding with these bits.

Before:

  # ./perf stat -e cpu/r031234/ -a -- sleep 1

   Performance counter stats for 'system wide':

                   0      cpu/r031234/

         1.003798924 seconds time elapsed

It may silently measure the wrong event!

The kernel supported bits have been exported through
/sys/devices/<pmu>/format/. Perf collects the information to
'struct perf_pmu_format' and links it to 'pmu->format' list.

The 'struct perf_pmu_format' has a bitmap which records the
valid bits for this format. For example,

  root@kbl-ppc:/sys/devices/cpu/format# cat umask
  config:8-15

The valid bits (bit8-bit15) are recorded in bitmap of format 'umask'.

We collect total valid bits of all formats, save to a local variable
'masks' and reverse it. Now '~masks' represents total invalid bits.

bits = config & ~masks;

The set bits in 'bits' indicate the invalid bits used in config.
Finally we use bitmap_scnprintf to report the invalid bits.

Some architectures may not export supported bits through sysfs,
so if masks is 0, perf_pmu__warn_invalid_config directly returns.

After:

Single event without name:

  # ./perf stat -e cpu/r031234/ -a -- sleep 1
  WARNING: event 'N/A' not valid (bits 16-17 of config '31234' not supported by kernel)!

   Performance counter stats for 'system wide':

                   0      cpu/r031234/

         1.001597373 seconds time elapsed

Multiple events with names:

  # ./perf stat -e cpu/rf01234,name=aaa/,cpu/r031234,name=bbb/ -a -- sleep 1
  WARNING: event 'aaa' not valid (bits 20,22 of config 'f01234' not supported by kernel)!
  WARNING: event 'bbb' not valid (bits 16-17 of config '31234' not supported by kernel)!

   Performance counter stats for 'system wide':

                   0      aaa
                   0      bbb

         1.001573787 seconds time elapsed

Warnings are reported for invalid bits.

Co-developed-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210310051138.12154-1-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-15 10:12:02 -03:00
Borislav Petkov
62660b0fd2 tools/perf: Convert to insn_decode()
Simplify code, no functional changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lkml.kernel.org/r/20210304174237.31945-20-bp@alien8.de
2021-03-15 12:41:26 +01:00
Ian Rogers
2a76f6de07 perf synthetic events: Avoid write of uninitialized memory when generating PERF_RECORD_MMAP* records
Account for alignment bytes in the zero-ing memset.

Fixes: 1a853e36871b533c ("perf record: Allow specifying a pid to record")
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20210309234945.419254-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-10 09:20:59 -03:00
Thomas Richter
c3d59cfde9 perf synthetic-events: Fix uninitialized 'kernel_thread' variable
perf build fails on 5.12.0rc2 on s390 with this error message:

util/synthetic-events.c: In function
				‘__event__synthesize_thread.part.0.isra’:
util/synthetic-events.c:787:19: error: ‘kernel_thread’ may be
    used uninitialized in this function [-Werror=maybe-uninitialized]
    787 |   if (_pid == pid && !kernel_thread) {
        |       ~~~~~~~~~~~~^~~~~~~~~~~~~~~~~

The build succeeds using command 'make DEBUG=y'.

The variable kernel_thread is set by this function sequence:

__event__synthesize_thread()
|    defines bool kernel_thread; as local variable and calls
+--> perf_event__prepare_comm(..., &kernel_thread)
     +--> perf_event__get_comm_ids(..., bool *kernel);
          On return of this function variable kernel is always
          set to true or false.

To prevent this compile error, assign variable kernel_thread
a value when it is defined.

Output after:

  [root@m35lp76 perf]# make  util/synthetic-events.o
  ....
   CC       util/synthetic-events.o
  [root@m35lp76 perf]#

Fixes: c1b907953b2cd9ff ("perf tools: Skip PERF_RECORD_MMAP event synthesis for kernel threads")
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: http://lore.kernel.org/lkml/20210309110447.834292-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-10 09:16:47 -03:00
Adrian Hunter
b410ed2a85 perf auxtrace: Fix auxtrace queue conflict
The only requirement of an auxtrace queue is that the buffers are in
time order.  That is achieved by making separate queues for separate
perf buffer or AUX area buffer mmaps.

That generally means a separate queue per cpu for per-cpu contexts, and
a separate queue per thread for per-task contexts.

When buffers are added to a queue, perf checks that the buffer cpu and
thread id (tid) match the queue cpu and thread id.

However, generally, that need not be true, and perf will queue buffers
correctly anyway, so the check is not needed.

In addition, the check gets erroneously hit when using sample mode to
trace multiple threads.

Consequently, fix that case by removing the check.

Fixes: e502789302a6 ("perf auxtrace: Add helpers for queuing AUX area tracing data")
Reported-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lore.kernel.org/lkml/20210308151143.18338-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-10 09:16:47 -03:00
Jiapeng Chong
83ff0f93b0 perf machine: Assign boolean values to a bool variable
Fix the following coccicheck warnings:

./tools/perf/util/machine.c:2041:9-10: WARNING: return of 0/1 in
function 'symbol__match_regex' with return type bool.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org
Link: http://lore.kernel.org/lkml/1615284669-82139-1-git-send-email-jiapeng.chong@linux.alibaba.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-09 09:09:45 -03:00
Arnaldo Carvalho de Melo
905203411d perf stat: Fixup __perf_stat_evsel__is() prefix
This is a perf_stat_evsel method, so should have that as its prefix,
previously it was swapped as __perf_evsel_stat__is().

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-09 09:03:40 -03:00
Arnaldo Carvalho de Melo
210e4c89ef perf symbols: Fix dso__fprintf_symbols_by_name() to return the number of printed chars
The 'ret' variable was initialized to zero but then it was not updated
from the fprintf() return, fix it.

Reported-by: Yang Li <yang.lee@linux.alibaba.com>
cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
cc: Ingo Molnar <mingo@redhat.com>
cc: Jiri Olsa <jolsa@redhat.com>
cc: Mark Rutland <mark.rutland@arm.com>
cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Fixes: 90f18e63fbd00513 ("perf symbols: List symbols in a dso in ascending name order")
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-08 11:17:51 -03:00
Arnaldo Carvalho de Melo
009ef05f98 Merge remote-tracking branch 'torvalds/master' into perf/core
To pick up the fixes sent for v5.12 and continue development based on
v5.12-rc2, i.e. without the swap on file bug.

This also gets a slightly newer and better tools/perf/arch/arm/util/cs-etm.c
patch version, using the BIT() macro, that had already been slated to
v5.13 but ended up going to v5.12-rc1 on an older version.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-08 10:11:33 -03:00
Arnaldo Carvalho de Melo
77d02bd00c perf map: Tighten snprintf() string precision to pass gcc check on some 32-bit arches
Noticed on a debian:experimental mips and mipsel cross build build
environment:

  perfbuilder@ec265a086e9b:~$ mips-linux-gnu-gcc --version | head -1
  mips-linux-gnu-gcc (Debian 10.2.1-3) 10.2.1 20201224
  perfbuilder@ec265a086e9b:~$

    CC       /tmp/build/perf/util/map.o
  util/map.c: In function 'map__new':
  util/map.c:109:5: error: '%s' directive output may be truncated writing between 1 and 2147483645 bytes into a region of size 4096 [-Werror=format-truncation=]
    109 |    "%s/platforms/%s/arch-%s/usr/lib/%s",
        |     ^~
  In file included from /usr/mips-linux-gnu/include/stdio.h:867,
                   from util/symbol.h:11,
                   from util/map.c:2:
  /usr/mips-linux-gnu/include/bits/stdio2.h:67:10: note: '__builtin___snprintf_chk' output 32 or more bytes (assuming 4294967321) into a destination of size 4096
     67 |   return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
        |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     68 |        __bos (__s), __fmt, __va_arg_pack ());
        |        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  cc1: all warnings being treated as errors

Since we have the lenghts for what lands in that place, use it to give
the compiler more info and make it happy.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:54:32 -03:00
Ravi Bangoria
6740a4e70e perf report: Fix -F for branch & mem modes
perf report fails to add valid additional fields with -F when
used with branch or mem modes. Fix it.

Before patch:

  $ perf record -b
  $ perf report -b -F +srcline_from --stdio
  Error:
  Invalid --fields key: `srcline_from'

After patch:

  $ perf report -b -F +srcline_from --stdio
  # Samples: 8K of event 'cycles'
  # Event count (approx.): 8784
  ...

Committer notes:

There was an inversion: when looking at branch stack dimensions (keys)
it was checking if the sort mode was 'mem', not 'branch'.

Fixes: aa6b3c99236b ("perf report: Make -F more strict like -s")
Reported-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Reviewed-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/20210304062958.85465-1-ravi.bangoria@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:54:32 -03:00
Namhyung Kim
513068f2b1 perf stat: Fix use-after-free when -r option is used
I got a segfault when using -r option with event groups.  The option
makes it run the workload multiple times and it will reuse the evlist
and evsel for each run.

While most of resources are allocated and freed properly, the id hash
in the evlist was not and it resulted in the bug.  You can see it with
the address sanitizer like below:

  $ perf stat -r 100 -e '{cycles,instructions}' true
  =================================================================
  ==693052==ERROR: AddressSanitizer: heap-use-after-free on
      address 0x6080000003d0 at pc 0x558c57732835 bp 0x7fff1526adb0 sp 0x7fff1526ada8
  WRITE of size 8 at 0x6080000003d0 thread T0
    #0 0x558c57732834 in hlist_add_head /home/namhyung/project/linux/tools/include/linux/list.h:644
    #1 0x558c57732834 in perf_evlist__id_hash /home/namhyung/project/linux/tools/lib/perf/evlist.c:237
    #2 0x558c57732834 in perf_evlist__id_add /home/namhyung/project/linux/tools/lib/perf/evlist.c:244
    #3 0x558c57732834 in perf_evlist__id_add_fd /home/namhyung/project/linux/tools/lib/perf/evlist.c:285
    #4 0x558c5747733e in store_evsel_ids util/evsel.c:2765
    #5 0x558c5747733e in evsel__store_ids util/evsel.c:2782
    #6 0x558c5730b717 in __run_perf_stat /home/namhyung/project/linux/tools/perf/builtin-stat.c:895
    #7 0x558c5730b717 in run_perf_stat /home/namhyung/project/linux/tools/perf/builtin-stat.c:1014
    #8 0x558c5730b717 in cmd_stat /home/namhyung/project/linux/tools/perf/builtin-stat.c:2446
    #9 0x558c57427c24 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313
    #10 0x558c572b1a48 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365
    #11 0x558c572b1a48 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409
    #12 0x558c572b1a48 in main /home/namhyung/project/linux/tools/perf/perf.c:539
    #13 0x7fcadb9f7d09 in __libc_start_main ../csu/libc-start.c:308
    #14 0x558c572b60f9 in _start (/home/namhyung/project/linux/tools/perf/perf+0x45d0f9)

Actually the nodes in the hash table are struct perf_stream_id and
they were freed in the previous run.  Fix it by resetting the hash.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20210225035148.778569-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:54:31 -03:00
Jin Yao
034f7ee130 perf stat: Fix wrong skipping for per-die aggregation
Uncore becomes die-scope on Xeon Cascade Lake-AP and perf has supported
--per-die aggregation yet.

One issue is found in check_per_pkg() for uncore events running on AP
system. On cascade Lake-AP, we have:

S0-D0
S0-D1
S1-D0
S1-D1

But in check_per_pkg(), S0-D1 and S1-D1 are skipped because the mask
bits for S0 and S1 have been set for S0-D0 and S1-D0. It doesn't check
die_id. So the counting for S0-D1 and S1-D1 are set to zero.  That's not
correct.

  root@lkp-csl-2ap4 ~# ./perf stat -a -I 1000 -e llc_misses.mem_read --per-die -- sleep 5
     1.001460963 S0-D0           1            1317376 Bytes llc_misses.mem_read
     1.001460963 S0-D1           1             998016 Bytes llc_misses.mem_read
     1.001460963 S1-D0           1             970496 Bytes llc_misses.mem_read
     1.001460963 S1-D1           1            1291264 Bytes llc_misses.mem_read
     2.003488021 S0-D0           1            1082048 Bytes llc_misses.mem_read
     2.003488021 S0-D1           1            1919040 Bytes llc_misses.mem_read
     2.003488021 S1-D0           1             890752 Bytes llc_misses.mem_read
     2.003488021 S1-D1           1            2380800 Bytes llc_misses.mem_read
     3.005613270 S0-D0           1            1126080 Bytes llc_misses.mem_read
     3.005613270 S0-D1           1            2898176 Bytes llc_misses.mem_read
     3.005613270 S1-D0           1             870912 Bytes llc_misses.mem_read
     3.005613270 S1-D1           1            3388608 Bytes llc_misses.mem_read
     4.007627598 S0-D0           1            1124608 Bytes llc_misses.mem_read
     4.007627598 S0-D1           1            3884416 Bytes llc_misses.mem_read
     4.007627598 S1-D0           1             921088 Bytes llc_misses.mem_read
     4.007627598 S1-D1           1            4451840 Bytes llc_misses.mem_read
     5.001479927 S0-D0           1             963328 Bytes llc_misses.mem_read
     5.001479927 S0-D1           1            4831936 Bytes llc_misses.mem_read
     5.001479927 S1-D0           1             895104 Bytes llc_misses.mem_read
     5.001479927 S1-D1           1            5496640 Bytes llc_misses.mem_read

From above output, we can see S0-D1 and S1-D1 don't report the interval
values, they are continued to grow. That's because check_per_pkg()
wrongly decides to use zero counts for S0-D1 and S1-D1.

So in check_per_pkg(), we should use hashmap(socket,die) to decide if
the cpu counts needs to skip. Only considering socket is not enough.

Now with this patch,

  root@lkp-csl-2ap4 ~# ./perf stat -a -I 1000 -e llc_misses.mem_read --per-die -- sleep 5
     1.001586691 S0-D0           1            1229440 Bytes llc_misses.mem_read
     1.001586691 S0-D1           1             976832 Bytes llc_misses.mem_read
     1.001586691 S1-D0           1             938304 Bytes llc_misses.mem_read
     1.001586691 S1-D1           1            1227328 Bytes llc_misses.mem_read
     2.003776312 S0-D0           1            1586752 Bytes llc_misses.mem_read
     2.003776312 S0-D1           1             875392 Bytes llc_misses.mem_read
     2.003776312 S1-D0           1             855616 Bytes llc_misses.mem_read
     2.003776312 S1-D1           1             949376 Bytes llc_misses.mem_read
     3.006512788 S0-D0           1            1338880 Bytes llc_misses.mem_read
     3.006512788 S0-D1           1             920064 Bytes llc_misses.mem_read
     3.006512788 S1-D0           1             877184 Bytes llc_misses.mem_read
     3.006512788 S1-D1           1            1020736 Bytes llc_misses.mem_read
     4.008895291 S0-D0           1             926592 Bytes llc_misses.mem_read
     4.008895291 S0-D1           1             906368 Bytes llc_misses.mem_read
     4.008895291 S1-D0           1             892224 Bytes llc_misses.mem_read
     4.008895291 S1-D1           1             987712 Bytes llc_misses.mem_read
     5.001590993 S0-D0           1             962624 Bytes llc_misses.mem_read
     5.001590993 S0-D1           1             912512 Bytes llc_misses.mem_read
     5.001590993 S1-D0           1             891200 Bytes llc_misses.mem_read
     5.001590993 S1-D1           1             978432 Bytes llc_misses.mem_read

On no-die system, die_id is 0, actually it's hashmap(socket,0), original behavior
is not changed.

Reported-by: Ying Huang <ying.huang@intel.com>
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ying Huang <ying.huang@intel.com>
Link: http://lore.kernel.org/lkml/20210128013417.25597-1-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:54:30 -03:00
Jiri Olsa
84ea603650 perf tools: Fix event's PMU name parsing
Jin Yao reported parser error for software event:

  # perf stat -e software/r1a/ -a -- sleep 1
  event syntax error: 'software/r1a/'
                       \___ parser error

This happens after commit 8c3b1ba0e7ea9a80 ("drm/i915/gt: Track the
overall awake/busy time"), where new software-gt-awake-time event's
non-pmu-event-style makes event parser conflict with software PMU.

If we allow PE_PMU_EVENT_PRE to be parsed as PMU name, we fix the
conflict and the following character '/' for PMU or '-' for
non-pmu-event-style event allows parser to decide what even is
specified.

Fixes: 8c3b1ba0e7ea9a80 ("drm/i915/gt: Track the overall awake/busy time")
Reported-by: Jin Yao <yao.jin@linux.intel.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210301122315.63471-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:54:27 -03:00
Ian Rogers
137a525893 perf traceevent: Ensure read cmdlines are null terminated.
Issue detected by address sanitizer.

Fixes: cd4ceb63438e9e28 ("perf util: Save pid-cmdline mapping into tracing header")
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20210226221431.1985458-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:54:26 -03:00
Pierre Gondois
ded2e511a8 perf tools: Cast (struct timeval).tv_sec when printing
The musl-libc [1] defines (struct timeval).tv_sec as a 'long long' for
arm and other architectures. The default build having a '-Wformat' flag,
not casting the field when printing prevents from building perf.

This patch casts the (struct timeval).tv_sec fields to the expected
format.

[1] git://git.musl-libc.org/musl

Signed-off-by: Pierre Gondois <Pierre.Gondois@arm.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Douglas.raillard@arm.com
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210224182410.5366-1-Pierre.Gondois@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:54:24 -03:00
Martin Liska
2777b81b37 perf annotate: Show full source location with 'l' hotkey
Right now, when Line numbers are displayed, one can't easily find a
source file that the line corresponds to.

When a source line is selected and 'l' is pressed, full source file
location is displayed in perf UI footer line. The hotkey works only for
source code lines.

Signed-off-by: Martin Liška <mliska@suse.cz>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lore.kernel.org/lkml/25a6384f-d862-5dda-4fec-8f0555599c75@suse.cz
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:42:31 -03:00
Martin Liska
44e176501c perf config: Add annotate.demangle{,_kernel}
Committer notes:

This allows setting this in from the command line:

  $ perf config annotate.demangle
  $ perf config annotate.demangle=yes
  $ perf config annotate.demangle
  annotate.demangle=yes
  $ cat ~/.perfconfig
  # this file is auto-generated.
  [report]
  	sort-order = srcline
  [annotate]
  	demangle = yes
  $
  $
  $ perf config annotate.demangle_kernel
  $ perf config annotate.demangle_kernel=yes
  $ perf config annotate.demangle_kernel
  annotate.demangle_kernel=yes
  $ cat ~/.perfconfig
  # this file is auto-generated.
  [report]
  	sort-order = srcline
  [annotate]
  	demangle = yes
  	demangle_kernel = yes
  $

Signed-off-by: Martin Liška <mliska@suse.cz>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/c96aabe7-791f-9503-295f-3147a9d19b60@suse.cz
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:42:31 -03:00
Ian Rogers
509bbd75f7 perf bpf: Minor whitespace cleanup.
Missed space after #include.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Song Liu <songliubraving@fb.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20210306080840.3785816-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:42:23 -03:00
Ian Rogers
35276a4f05 perf skel: Remove some unused variables.
Fixes -Wall warnings.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Song Liu <songliubraving@fb.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20210306080840.3785816-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:42:02 -03:00
Suzuki K Poulose
8e1488a46d perf cs-etm: Detect pid in VMID for kernel running at EL2
The PID of the task could be traced as VMID when the kernel is running
at EL2.  Teach the decoder to look for VMID when the CONTEXTIDR (Arm32)
or CONTEXTIDR_EL1 (Arm64) is invalid but we have a valid VMID.

Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Co-developed-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Al Grant <al.grant@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Link: https://lore.kernel.org/r/20210213113220.292229-6-leo.yan@linaro.org
Link: https://lore.kernel.org/r/20210224164835.3497311-7-mathieu.poirier@linaro.org
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-02 09:49:35 -03:00
Leo Yan
47f0d94c20 perf cs-etm: Add helper cs_etm__get_pid_fmt()
This patch adds helper function cs_etm__get_pid_fmt(), by passing
parameter "traceID", it returns the PID format.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Link: https://lore.kernel.org/r/20210213113220.292229-5-leo.yan@linaro.org
Link: https://lore.kernel.org/r/20210224164835.3497311-6-mathieu.poirier@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-02 09:49:16 -03:00
Mike Leach
42b2b570b3 perf cs-etm: Update ETM metadata format
The current fixed metadata version format (version 0), means that adding
metadata parameter items renders files from a previous version of perf
unreadable. Per CPU parameters appear in a fixed order, but there is no
field to indicate the number of ETM parameters per CPU.

This patch updates the per CPU parameter blocks to include a NR_PARAMs
value which indicates the number of parameters in the block.

The header version is incremented to 1. Fixed ordering is retained,
new ETM parameters are added to the end of the list.

The reader code is updated to be able to read current version 0 files,
For version 1, the reader will read the number of parameters in the
per CPU block. This allows the reader to process older or newer files
that may have different numbers of parameters than in use at the
time perf was built.

Signed-off-by: Mike Leach <mike.leach@linaro.org>
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Tested-by: Leo Yan <leo.yan@linaro.org>
Link: https://lore.kernel.org/r/20210202214040.32349-1-mike.leach@linaro.org
Link: https://lore.kernel.org/r/20210224164835.3497311-2-mathieu.poirier@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-02 09:46:28 -03:00
Tiezhu Yang
d9fd5a7189 perf tools: Generate mips syscalls_n64.c syscall table
Grab a copy of arch/mips/kernel/syscalls/syscall_n64.tbl and use it to
generate tools/perf/arch/mips/include/generated/asm/syscalls_n64.c file,
this is similar with commit 1b700c997500 ("perf tools: Build syscall
table .c header from kernel's syscall_64.tbl")

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Juxin Gao <gaojuxin@loongson.cn>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Xuefeng Li <lixuefeng@loongson.cn>
Cc: linux-mips@vger.kernel.org
Link: http://lore.kernel.org/lkml/1612409724-3516-4-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-01 14:49:28 -03:00
Tiezhu Yang
b5f184fbdb perf tools: Support MIPS unwinding and dwarf-regs
Map perf APIs (perf_reg_name/get_arch_regstr/unwind__arch_reg_id) with
MIPS specific registers.

[ayan@wavecomp.com: repick this patch for unwinding userstack backtrace
by perf and libunwind on MIPS based CPU.]

[yangtiezhu@loongson.cn: Add sample_reg_masks[] to fix build error,
silence some checkpatch errors and warnings, and also separate the
original patches into two parts (MIPS kernel and perf tools) to merge
easily.]

The original patches:

https://lore.kernel.org/patchwork/patch/1126521/
https://lore.kernel.org/patchwork/patch/1126520/

Committer notes:

Do it as __perf_reg_name() to cope with:

  067012974c8ae31a ("perf tools: Fix arm64 build error with gcc-11")

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Archer Yan <ayan@wavecomp.com>
Cc: David Daney <david.daney@cavium.com>
Cc: Jianlin Lv <Jianlin.Lv@arm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Juxin Gao <gaojuxin@loongson.cn>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Xuefeng Li <lixuefeng@loongson.cn>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: linux-mips@vger.kernel.org
Link: http://lore.kernel.org/lkml/1612409724-3516-3-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Archer Yan <ayan@wavecomp.com>
Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-01 14:47:50 -03:00
Nicholas Fraser
3027ce36cc perf buildid-cache: Don't skip 16-byte build-ids
lsdir_bid_tail_filter() ignored any build-id that wasn't exactly 20
bytes. This worked only for SHA-1 build-ids. The build-id for a PE file
is always a 16-byte GUID and ELF files can also have MD5 or UUID
build-ids.

This fix changes the filter to allow build-ids between 16 and 20 bytes.

Signed-off-by: Nicholas Fraser <nfraser@codeweavers.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Frank Ch. Eigler <fche@redhat.com>
Cc: Huw Davies <huw@codeweavers.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Remi Bernon <rbernon@codeweavers.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Tommi Rantala <tommi.t.rantala@nokia.com>
Cc: Ulrich Czekalla <uczekalla@codeweavers.com>
Link: http://lore.kernel.org/lkml/597788e4-661d-633f-857c-3de700115d02@codeweavers.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-18 16:41:40 -03:00
Nicholas Fraser
bff8b3072e perf symbol: Remove redundant libbfd checks
This removes the redundant checks bfd_check_format() and
bfd_target_elf_flavour. They were previously checking different files.

Signed-off-by: Nicholas Fraser <nfraser@codeweavers.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Frank Ch. Eigler <fche@redhat.com>
Cc: Huw Davies <huw@codeweavers.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Remi Bernon <rbernon@codeweavers.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Tommi Rantala <tommi.t.rantala@nokia.com>
Cc: Ulrich Czekalla <uczekalla@codeweavers.com>
Link: https://lore.kernel.org/r/94758ca1-0031-d7c6-6c6a-900fd77ef695@codeweavers.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-18 16:38:31 -03:00
Jianlin Lv
067012974c perf tools: Fix arm64 build error with gcc-11
gcc version: 11.0.0 20210208 (experimental) (GCC)

Following build error on arm64:

.......
In function ‘printf’,
    inlined from ‘regs_dump__printf’ at util/session.c:1141:3,
    inlined from ‘regs__printf’ at util/session.c:1169:2:
/usr/include/aarch64-linux-gnu/bits/stdio2.h:107:10: \
  error: ‘%-5s’ directive argument is null [-Werror=format-overflow=]

107 |   return __printf_chk (__USE_FORTIFY_LEVEL - 1, __fmt, \
                __va_arg_pack ());

......
In function ‘fprintf’,
  inlined from ‘perf_sample__fprintf_regs.isra’ at \
    builtin-script.c:622:14:
/usr/include/aarch64-linux-gnu/bits/stdio2.h💯10: \
    error: ‘%5s’ directive argument is null [-Werror=format-overflow=]
  100 |   return __fprintf_chk (__stream, __USE_FORTIFY_LEVEL - 1, __fmt,
  101 |                         __va_arg_pack ());

cc1: all warnings being treated as errors
.......

This patch fixes Wformat-overflow warnings. Add helper function to
convert NULL to "unknown".

Signed-off-by: Jianlin Lv <Jianlin.Lv@arm.com>
Reviewed-by: John Garry <john.garry@huawei.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Guo Ren <guoren@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: iecedge@gmail.com
Cc: linux-csky@vger.kernel.org
Cc: linux-riscv@lists.infradead.org
Link: http://lore.kernel.org/lkml/20210218031245.2078492-1-Jianlin.Lv@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-18 16:24:43 -03:00
Adrian Hunter
19854e45b3 perf intel-pt: Split VM-Entry and VM-Exit branches
Events record a single cpumode so the tools cannot handle a branch from
the host machine to a virtual machine, or vice versa. Split it in two so
that each branch can have a different cpumode.

  E.g.		host ip -> guest ip

  becomes:	host ip -> 0
		      0 -> guest ip

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20210218095801.19576-11-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-18 16:15:38 -03:00
Adrian Hunter
695fc45106 perf intel-pt: Adjust sample flags for VM-Exit
Use the change of NR to detect whether an asynchronous branch is a VM-Exit.

Note VM-Entry is determined from the vmlaunch or vmresume instruction,
in which case, sample flags will show "VMentry" even if the VM-Entry fails.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20210218095801.19576-10-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-18 16:15:26 -03:00
Adrian Hunter
65faca5ce8 perf intel-pt: Allow for a guest kernel address filter
Handling TIP.PGD for an address filter for a guest kernel is the same as a
host kernel, but user space decoding, and hence address filters, are not
supported.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20210218095801.19576-9-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-18 16:15:08 -03:00
Adrian Hunter
6e86bfdc4a perf intel-pt: Support decoding of guest kernel
The guest kernel can be found from any guest thread belonging to the guest
machine. The guest machine is associated with the current host process pid.
An idle thread (pid=tid=0) is created as a vehicle from which to find the
guest kernel map.

Decoding guest user space is not supported.

Synthesized samples just need the cpumode set for the guest.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20210218095801.19576-8-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-18 16:14:49 -03:00
Adrian Hunter
3035cb6cbd perf machine: Factor out machine__idle_thread()
Factor out machine__idle_thread() so it can be re-used for guest machines.

A thread is needed to find executable code, even for the guest kernel. To
avoid possible future pid number conflicts, the idle thread can be used.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20210218095801.19576-7-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-18 16:14:33 -03:00
Adrian Hunter
fcda5ff711 perf machine: Factor out machines__find_guest()
Factor out machines__find_guest() so it can be re-used.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20210218095801.19576-6-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-18 16:14:14 -03:00
Adrian Hunter
80a038860b perf intel-pt: Amend decoder to track the NR flag
The PIP packet NR (non-root) flag indicates whether or not a virtual
machine is being traced (NR=1 => VM). Add support for tracking its value.

In particular note that the PIP packet (outside of PSB+) will be
associated with a TIP packet from which address the NR value takes
effect. At that point, there is a branch from_ip, to_ip with
corresponding from_nr and to_nr.

In the event of VM-Entry failure, there should still PIP and TIP packets
that can be followed in the same way.

Also note that this assumes that a host VMM is not employing VMX controls
that affect Intel PT, e.g. to hide the host from a guest using Intel PT.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20210218095801.19576-5-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-18 16:13:56 -03:00
Adrian Hunter
90af7555c3 perf intel-pt: Retain the last PIP packet payload as is
Retain the PIP packet payload as is, instead of just the CR3, because it
contains also the VMX NR flag which is needed to track VM-Entry.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20210218095801.19576-4-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-18 16:13:46 -03:00