736 Commits

Author SHA1 Message Date
Linus Torvalds
2268735045 - Add support for a couple new insn sets to the insn decoder: AVX512-FP16,
AMX, other misc insns.
 
 - Update VMware-specific MAINTAINERS entries
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmI4URIACgkQEsHwGGHe
 VUob3A/9GFyqt9bBKrSaq9Rt1UVkq6dQhG3kO7dW5d0YDvy8JmR9is4rNDV9GGx6
 A1OAue/gDlZFIz/829oS1qwjB7GZ4Rfb0gRo33bytDLLmd0BRXW7ioZ54jBRnWvy
 8dZ2WruMmazK6uJxoHvtOA+Pt3ukb074CZZ1SfW344clWK6FJZeptyRclWaT1Py2
 QOIJOxMraCdNAay/1ZvOdIqqdIPx5+JyzbHIYOWUFzwT4y+Q8kFNbigrJnqxe5Ij
 aqRjzMIvt6MeLwbq9CfLsPFA3gaSzYeOkuXQPcqRgd5LU5ZyXBLStUrGEv1fsMvd
 9Kh7VFycZPS7MKzxoEcbuJTTOR4cBsINOlbo9iWr7UD5pm5h7c3vc+nCyia+U+Xo
 5XRpf8nitt4a3r1f6HxwXJS0OlBkS4CqexE2OejY4yhWRlxhMcIvRyquU+Z0J4Bp
 mgDJuXSzfJfFcBzp4jjOBxGPNEjXXOdy/qc/1jR97eMmTKrk3gk/74NWUx9hw4oN
 5RGeC+khAD13TL0yVQfKBe5HuLK5tHppAzXAnT2xi6qUn+VJjLxNWgg3iV9tbShM
 4q5vJp3BmvNOY8HQv1R3IDFfN0IAL09Q9v6EzEroNuVUhEOzBdH7JSzWkvBBveZb
 FVgD3I+wNBE1nQD3cP/6DGbRe1JG3ULDF95WJshB8gNJwavlZGs=
 =f7VZ
 -----END PGP SIGNATURE-----

Merge tag 'x86_misc_for_v5.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull misc x86 updates from Borislav Petkov:

 - Add support for a couple new insn sets to the insn decoder:
   AVX512-FP16, AMX, other misc insns.

 - Update VMware-specific MAINTAINERS entries

* tag 'x86_misc_for_v5.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  MAINTAINERS: Mark VMware mailing list entries as email aliases
  MAINTAINERS: Add Zack as maintainer of vmmouse driver
  MAINTAINERS: Update maintainers for paravirt ops and VMware hypervisor interface
  x86/insn: Add AVX512-FP16 instructions to the x86 instruction decoder
  perf/tests: Add AVX512-FP16 instructions to x86 instruction decoder test
  x86/insn: Add misc instructions to x86 instruction decoder
  perf/tests: Add misc instructions to the x86 instruction decoder test
  x86/insn: Add AMX instructions to the x86 instruction decoder
  perf/tests: Add AMX instructions to x86 instruction decoder test
2022-03-21 11:19:00 -07:00
Ian Rogers
7bd1da15d2 perf parse-events: Ignore case in topdown.slots check
An issue with icelakex metrics:

  https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tree/tools/perf/pmu-events/arch/x86/icelakex/icx-metrics.json?h=perf/core&id=65eab2bc7dab326ee892ec5a4c749470b368b51a#n48

That causes the slots not to be first.

Fixes: 94dbfd6781a0e87b ("perf parse-events: Architecture specific leader override")
Reported-by: Caleb Biggers <caleb.biggers@intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Zhengjun Xing <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220317224309.543736-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-03-18 18:39:09 -03:00
Alan Kao
aec499c75c nds32: Remove the architecture
The nds32 architecture, also known as AndeStar V3, is a custom 32-bit
RISC target designed by Andes Technologies. Support was added to the
kernel in 2016 as the replacement RISC-V based V5 processors were
already announced, and maintained by (current or former) Andes
employees.

As explained by Alan Kao, new customers are now all using RISC-V,
and all known nds32 users are already on longterm stable kernels
provided by Andes, with no development work going into mainline
support any more.

While the port is still in a reasonably good shape, it only gets
worse over time without active maintainers, so it seems best
to remove it before it becomes unusable. As always, if it turns
out that there are mainline users after all, and they volunteer
to maintain the port in the future, the removal can be reverted.

Link: https://lore.kernel.org/linux-mm/YhdWNLUhk+x9RAzU@yamatobi.andestech.com/
Link: https://lore.kernel.org/lkml/20220302065213.82702-1-alankao@andestech.com/
Link: https://www.andestech.com/en/products-solutions/andestar-architecture/
Signed-off-by: Alan Kao <alankao@andestech.com>
[arnd: rewrite changelog to provide more background]
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-03-07 13:54:59 +01:00
German Gomez
521f2688c5 perf arm-spe: Use advertised caps/min_interval as default sample_period
When recording SPE traces, the default sample_period is currently being
set to 1 in the perf_event_attr fields, instead of the value advertised
in '/sys/devices/arm_spe_0/caps/min_interval':

Before:

  $ perf record -e arm_spe// -vv -- sleep 1
  [...]
    { sample_period, sample_freq }   1
  [...]

Use the value from the above sysfs location as a more sensible default
(it was already being read, but the value not being used)

After:

  $ perf record -e arm_spe// -vv -- sleep 1
  [...]
    { sample_period, sample_freq }   1024
  [...]

Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: German Gomez <german.gomez@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20220221171042.58460-1-german.gomez@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-23 07:02:23 -03:00
James Clark
aca8af3c2e perf cs-etm: Update deduction of TRCCONFIGR register for branch broadcast
Now that a config flag for branch broadcast has been added, take it into
account when trying to deduce what the driver would have programmed the
TRCCONFIGR register to.

Reviewed-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Mike Leach <mike.leach@linaro.org>
Reviewed-by: Suzuki Poulouse <suzuki.poulose@arm.com>
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Link: https://lore.kernel.org/r/20220113091056.1297982-4-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-15 17:15:32 -03:00
Adrian Hunter
f2be829e72 perf intel-pt: Record Event Trace capability flag
The change to the MODE.Exec packet means processing must distinguish
between the old and new cases. Record the Event Trace capability flag to
make that possible.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20220124084201.2699795-14-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-15 17:11:16 -03:00
Adrian Hunter
f7934477ce perf intel-pt: pkt-decoder: Add MODE.Exec IFLAG bit
As of Intel SDM (https://www.intel.com/sdm) version 076, there is a new
Intel PT feature called Event Trace which adds a bit to the existing
MODE.Exec packet to record the interrupt flag. Amend the packet decoder and
packet decoder test accordingly.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20220124084201.2699795-5-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-15 17:06:11 -03:00
Adrian Hunter
2750af50a3 perf intel-pt: pkt-decoder: Add CFE and EVD packets
As of Intel SDM (https://www.intel.com/sdm) version 076, there is a new
Intel PT feature called Event Trace which requires 2 new packets CFE and
EVD. Add them to the packet decoder and packet decoder test.

Committer notes:

I got the "Intel® 64 and IA-32 architectures software developer’s manual
combined volumes: 1, 2A, 2B, 2C, 2D, 3A, 3B, 3C, 3D, and 4" PDF at:

  https://cdrdv2.intel.com/v1/dl/getContent/671200

And these new packets are described in page 3951:

<quote>
32.2.4

Event Trace is a capability that exposes details about the asynchronous
events, when they are generated, and when their corresponding software
event handler completes execution. These include:

o Interrupts, including NMI and SMI, including the interrupt vector when
defined.

o Faults, exceptions including the fault vector.

— Page faults additionally include the page fault address, when in context.

o Event handler returns, including IRET and RSM.

o VM exits and VM entries.¹

— VM exits include the values written to the “exit reason” and “exit qualification” VMCS fields.
INIT and SIPI events.

o TSX aborts, including the abort status returned for the RTM instructions.

o Shutdown.

Additionally, it provides indication of the status of the Interrupt Flag
(IF), to indicate when interrupts are masked.
</quote>

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20220124084201.2699795-4-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-15 17:05:44 -03:00
Adrian Hunter
32449b430f perf intel-pt: pkt-decoder-test: Fix scope of test_data
Make test_data 'static' otherwise it will conflict with any global
variable of the same name.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20220124084201.2699795-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-15 17:05:20 -03:00
Ian Rogers
1a97cee604 perf maps: Use a pointer for kmaps
struct maps is reference counted, using a pointer is more idiomatic.

Committer notes:

Delay:

   maps = machine__kernel_maps(&vmlinux);

To after:

  machine__init(&vmlinux, "", HOST_KERNEL_ID);

To avoid this on f34:

  In file included from /var/home/acme/git/perf/tools/perf/util/build-id.h:10,
                   from /var/home/acme/git/perf/tools/perf/util/dso.h:13,
                   from tests/vmlinux-kallsyms.c:8:
  In function ‘machine__kernel_maps’,
      inlined from ‘test__vmlinux_matches_kallsyms’ at tests/vmlinux-kallsyms.c:122:22:
  /var/home/acme/git/perf/tools/perf/util/machine.h:86:23: error: ‘vmlinux.kmaps’ is used uninitialized [-Werror=uninitialized]
     86 |         return machine->kmaps;
        |                ~~~~~~~^~~~~~~
  tests/vmlinux-kallsyms.c: In function ‘test__vmlinux_matches_kallsyms’:
  tests/vmlinux-kallsyms.c:121:34: note: ‘vmlinux’ declared here
    121 |         struct machine kallsyms, vmlinux;
        |                                  ^~~~~~~
  cc1: all warnings being treated as errors

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: André Almeida <andrealmeid@collabora.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Hao Luo <haoluo@google.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miaoqian Lin <linmq006@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Yury Norov <yury.norov@gmail.com>
Link: http://lore.kernel.org/lkml/20220211103415.2737789-6-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-14 16:47:13 -03:00
Heiko Carstens
f36e7c9845 s390: remove invalid email address of Heiko Carstens
Remove my old invalid email address which can be found in a couple of
files. Instead of updating it, just remove my contact data completely
from source files.
We have git and other tools which allow to figure out who is responsible
for what with recent contact data.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-02-06 23:31:29 +01:00
Adrian Hunter
cdb63ba98c perf/tests: Add AVX512-FP16 instructions to x86 instruction decoder test
The x86 instruction decoder is used for both kernel instructions and
user space instructions (e.g. uprobes, perf tools Intel PT), so it is
good to update it with new instructions.

Add AVX512-FP16 instructions to x86 instruction decoder test.

A subsequent patch adds the instructions to the instruction decoder.

Reference:
Intel AVX512-FP16 Architecture Specification
June 2021
Revision 1.0
Document Number: 347407-001US

Example:

  $ perf test -v "x86 instruction decoder" |& grep vfcmaddcph | head -2
  Failed to decode: 62 f6 6f 48 56 cb     vfcmaddcph %zmm3,%zmm2,%zmm1
  Failed to decode: 62 f6 6f 48 56 8c c8 78 56 34 12      vfcmaddcph 0x12345678(%eax,%ecx,8),%zmm2,%zmm1

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20211202095029.2165714-6-adrian.hunter@intel.com
2022-01-23 20:37:57 +01:00
Adrian Hunter
a6ea1142de perf/tests: Add misc instructions to the x86 instruction decoder test
The x86 instruction decoder is used for both kernel instructions and
user space instructions (e.g. uprobes, perf tools Intel PT), so it is
good to update it with new instructions.

Add the following instructions to the x86 instruction decoder test:

	User Interrupt

		clui
		senduipi
		stui
		testui
		uiret

	Prediction history reset

		hreset

	Serialize instruction execution

		serialize

	TSX suspend load address tracking

		xresldtrk
		xsusldtrk

A subsequent patch adds the instructions to the instruction decoder.

Reference:
Intel Architecture Instruction Set Extensions and Future Features
Programming Reference
May 2021
Document Number: 319433-044

Example:

  $ perf test -v "x86 instruction decoder" |& grep -i hreset
  Failed to decode length (4 vs expected 6): f3 0f 3a f0 c0 00    	hreset $0x0
  Failed to decode length (4 vs expected 6): f3 0f 3a f0 c0 00    	hreset $0x0

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20211202095029.2165714-4-adrian.hunter@intel.com
2022-01-23 20:37:50 +01:00
Adrian Hunter
4810dd2c94 perf/tests: Add AMX instructions to x86 instruction decoder test
The x86 instruction decoder is used for both kernel instructions and
user space instructions (e.g. uprobes, perf tools Intel PT), so it is
good to update it with new instructions.

Add AMX instructions to the x86 instruction decoder test.

A subsequent patch adds the instructions to the instruction decoder.

Reference:
Intel Architecture Instruction Set Extensions and Future Features
Programming Reference
May 2021
Document Number: 319433-044

Example:

  $ INSN='ldtilecfg\|sttilecfg\|tdpbf16ps\|tdpbssd\|'
  $ INSN+='tdpbsud\|tdpbusd\|'tdpbuud\|tileloadd\|'
  $ INSN+='tileloaddt1\|tilerelease\|tilestored\|tilezero'
  $ perf test -v "x86 instruction decoder" |& grep -i $INSN
  Failed to decode: c4 e2 78 49 04 c8    	ldtilecfg (%rax,%rcx,8)
  Failed to decode: c4 c2 78 49 04 c8    	ldtilecfg (%r8,%rcx,8)
  Failed to decode: c4 e2 79 49 04 c8    	sttilecfg (%rax,%rcx,8)
  Failed to decode: c4 c2 79 49 04 c8    	sttilecfg (%r8,%rcx,8)
  Failed to decode: c4 e2 7a 5c d1       	tdpbf16ps %tmm0,%tmm1,%tmm2
  Failed to decode: c4 e2 7b 5e d1       	tdpbssd %tmm0,%tmm1,%tmm2
  Failed to decode: c4 e2 7a 5e d1       	tdpbsud %tmm0,%tmm1,%tmm2
  Failed to decode: c4 e2 79 5e d1       	tdpbusd %tmm0,%tmm1,%tmm2
  Failed to decode: c4 e2 78 5e d1       	tdpbuud %tmm0,%tmm1,%tmm2
  Failed to decode: c4 e2 7b 4b 0c c8    	tileloadd (%rax,%rcx,8),%tmm1
  Failed to decode: c4 c2 7b 4b 14 c8    	tileloadd (%r8,%rcx,8),%tmm2
  Failed to decode: c4 e2 79 4b 0c c8    	tileloaddt1 (%rax,%rcx,8),%tmm1
  Failed to decode: c4 c2 79 4b 14 c8    	tileloaddt1 (%r8,%rcx,8),%tmm2
  Failed to decode: c4 e2 78 49 c0       	tilerelease
  Failed to decode: c4 e2 7a 4b 0c c8    	tilestored %tmm1,(%rax,%rcx,8)
  Failed to decode: c4 c2 7a 4b 14 c8    	tilestored %tmm2,(%r8,%rcx,8)
  Failed to decode: c4 e2 7b 49 c0       	tilezero %tmm0
  Failed to decode: c4 e2 7b 49 f8       	tilezero %tmm7

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20211202095029.2165714-2-adrian.hunter@intel.com
2022-01-23 20:37:42 +01:00
Arnaldo Carvalho de Melo
6e10e21915 tools headers UAPI: Sync files changed by new set_mempolicy_home_node syscall
To pick the changes in these csets:

  21b084fdf2a49ca1 ("mm/mempolicy: wire up syscall set_mempolicy_home_node")

That add support for this new syscall in tools such as 'perf trace'.

For instance, this is now possible:

  [root@five ~]# perf trace -e set_mempolicy_home_node
  ^C[root@five ~]#
  [root@five ~]# perf trace -v -e set_mempolicy_home_node
  Using CPUID AuthenticAMD-25-21-0
  event qualifier tracepoint filter: (common_pid != 253729 && common_pid != 3585) && (id == 450)
  mmap size 528384B
  ^C[root@five ~]
  [root@five ~]# perf trace -v -e set*  --max-events 5
  Using CPUID AuthenticAMD-25-21-0
  event qualifier tracepoint filter: (common_pid != 253734 && common_pid != 3585) && (id == 38 || id == 54 || id == 105 || id == 106 || id == 109 || id == 112 || id == 113 || id == 114 || id == 116 || id == 117 || id == 119 || id == 122 || id == 123 || id == 141 || id == 160 || id == 164 || id == 170 || id == 171 || id == 188 || id == 205 || id == 218 || id == 238 || id == 273 || id == 308 || id == 450)
  mmap size 528384B
       0.000 ( 0.008 ms): bash/253735 setpgid(pid: 253735 (bash), pgid: 253735 (bash))      = 0
    6849.011 ( 0.008 ms): bash/16046 setpgid(pid: 253736 (bash), pgid: 253736 (bash))       = 0
    6849.080 ( 0.005 ms): bash/253736 setpgid(pid: 253736 (bash), pgid: 253736 (bash))      = 0
    7437.718 ( 0.009 ms): gnome-shell/253737 set_robust_list(head: 0x7f34b527e920, len: 24) = 0
   13445.986 ( 0.010 ms): bash/16046 setpgid(pid: 253738 (bash), pgid: 253738 (bash))       = 0
  [root@five ~]#

That is the filter expression attached to the raw_syscalls:sys_{enter,exit}
tracepoints.

  $ find tools/perf/arch/ -name "syscall*tbl" | xargs grep -w set_mempolicy_home_node
  tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl:450	common	set_mempolicy_home_node		sys_set_mempolicy_home_node
  tools/perf/arch/powerpc/entry/syscalls/syscall.tbl:450 	nospu	set_mempolicy_home_node		sys_set_mempolicy_home_node
  tools/perf/arch/s390/entry/syscalls/syscall.tbl:450  common	set_mempolicy_home_node	sys_set_mempolicy_home_node	sys_set_mempolicy_home_node
  tools/perf/arch/x86/entry/syscalls/syscall_64.tbl:450	common	set_mempolicy_home_node	sys_set_mempolicy_home_node
  $

  $ grep -w set_mempolicy_home_node /tmp/build/perf/arch/x86/include/generated/asm/syscalls_64.c
	[450] = "set_mempolicy_home_node",
  $

This addresses these perf build warnings:

  Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/unistd.h' differs from latest version at 'include/uapi/asm-generic/unistd.h'
  diff -u tools/include/uapi/asm-generic/unistd.h include/uapi/asm-generic/unistd.h
  Warning: Kernel ABI header at 'tools/perf/arch/x86/entry/syscalls/syscall_64.tbl' differs from latest version at 'arch/x86/entry/syscalls/syscall_64.tbl'
  diff -u tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl
  Warning: Kernel ABI header at 'tools/perf/arch/powerpc/entry/syscalls/syscall.tbl' differs from latest version at 'arch/powerpc/kernel/syscalls/syscall.tbl'
  diff -u tools/perf/arch/powerpc/entry/syscalls/syscall.tbl arch/powerpc/kernel/syscalls/syscall.tbl
  Warning: Kernel ABI header at 'tools/perf/arch/s390/entry/syscalls/syscall.tbl' differs from latest version at 'arch/s390/kernel/syscalls/syscall.tbl'
  diff -u tools/perf/arch/s390/entry/syscalls/syscall.tbl arch/s390/kernel/syscalls/syscall.tbl
  Warning: Kernel ABI header at 'tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl' differs from latest version at 'arch/mips/kernel/syscalls/syscall_n64.tbl'
  diff -u tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl arch/mips/kernel/syscalls/syscall_n64.tbl

Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-20 11:20:37 -03:00
Ian Rogers
6d18804b96 perf cpumap: Give CPUs their own type
A common problem is confusing CPU map indices with the CPU, by wrapping
the CPU with a struct then this is avoided. This approach is similar to
atomic_t.

Committer notes:

To make it build with BUILD_BPF_SKEL=1 these files needed the
conversions to 'struct perf_cpu' usage:

  tools/perf/util/bpf_counter.c
  tools/perf/util/bpf_counter_cgroup.c
  tools/perf/util/bpf_ftrace.c

Also perf_env__get_cpu() was removed back in "perf cpumap: Switch
cpu_map__build_map to cpu function".

Additionally these needed to be fixed for the ARM builds to complete:

  tools/perf/arch/arm/util/cs-etm.c
  tools/perf/arch/arm64/util/pmu.c

Suggested-by: John Garry <john.garry@huawei.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Vineet Singh <vineet.singh@intel.com>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: zhengjun.xing@intel.com
Link: https://lore.kernel.org/r/20220105061351.120843-49-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-12 14:28:23 -03:00
Ian Rogers
dfc66beff7 perf cpumap: Move 'has' function to libperf
Make the cpu map argument const for consistency with the rest of the
API. Modify cpu_map__idx accordingly.

Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Vineet Singh <vineet.singh@intel.com>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: zhengjun.xing@intel.com
Link: https://lore.kernel.org/r/20220105061351.120843-21-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-12 14:28:22 -03:00
Athira Rajeev
befee3775b perf powerpc: Update global/local variants for p_stage_cyc
Update the arch_support_sort_key() function in powerpc to enable
presenting local and global variants of sort key 'p_stage_cyc'.

Update the "se_header" strings for these in arch_perf_header_entry()
along with instruction latency.

Reported-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20211203022038.48240-2-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-10 15:39:00 -03:00
Alexandre Truong
7248e308a5 perf tools: Record ARM64 LR register automatically
On ARM64, automatically record the link register if the frame pointer
mode is on. It will be used to do a dwarf unwind to find the caller of
the leaf frame if the frame pointer was omitted.

Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Alexandre Truong <alexandre.truong@arm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20211217154521.80603-2-german.gomez@arm.com
Signed-off-by: German Gomez <german.gomez@arm.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-21 18:35:23 -03:00
German Gomez
83869019c7 perf arch: Support register names from all archs
When reading a perf.data file with register values, there is a mismatch
between the names and the values of the registers because the tool is
built using only the register names from the local architecture.

Reading a perf.data file that was recorded on ARM64, gives the following
erroneous output on an X86 machine:

  # perf report -i perf_arm64.data -D
  [...]
  24661932634451 0x698 [0x21d0]: PERF_RECORD_SAMPLE(IP, 0x1): 43239/43239: 0xffffc5be8f100f98 period: 1 addr: 0
  ... user regs: mask 0x1ffffffff ABI 64-bit
  .... AX    0x0000ffffd1515817
  .... BX    0x0000ffffd1515480
  .... CX    0x0000aaaadabf6c80
  .... DX    0x000000000000002e
  .... SI    0x0000000040100401
  .... DI    0x0040600200000080
  .... BP    0x0000ffffd1510e10
  .... SP    0x0000000000000000
  .... IP    0x00000000000000dd
  .... FLAGS 0x0000ffffd1510cd0
  .... CS    0x0000000000000000
  .... SS    0x0000000000000030
  .... DS    0x0000ffffa569a208
  .... ES    0x0000000000000000
  .... FS    0x0000000000000000
  .... GS    0x0000000000000000
  .... R8    0x0000aaaad3de9650
  .... R9    0x0000ffffa57397f0
  .... R10   0x0000000000000001
  .... R11   0x0000ffffa57fd000
  .... R12   0x0000ffffd1515817
  .... R13   0x0000ffffd1515480
  .... R14   0x0000aaaadabf6c80
  .... R15   0x0000000000000000
  .... unknown 0x0000000000000001
  .... unknown 0x0000000000000000
  .... unknown 0x0000000000000000
  .... unknown 0x0000000000000000
  .... unknown 0x0000000000000000
  .... unknown 0x0000ffffd1510d90
  .... unknown 0x0000ffffa5739b90
  .... unknown 0x0000ffffd1510d80
  .... XMM0  0x0000ffffa57392c8
   ... thread: perf-exec:43239
   ...... dso: [kernel.kallsyms]

As can be seen, the register names correspond to X86 registers, even
though the perf.data file was recorded on an ARM64 system. After this
patch, the output of the command displays the correct register names:

  # perf report -i perf_arm64.data -D
  [...]
  24661932634451 0x698 [0x21d0]: PERF_RECORD_SAMPLE(IP, 0x1): 43239/43239: 0xffffc5be8f100f98 period: 1 addr: 0
  ... user regs: mask 0x1ffffffff ABI 64-bit
  .... x0    0x0000ffffd1515817
  .... x1    0x0000ffffd1515480
  .... x2    0x0000aaaadabf6c80
  .... x3    0x000000000000002e
  .... x4    0x0000000040100401
  .... x5    0x0040600200000080
  .... x6    0x0000ffffd1510e10
  .... x7    0x0000000000000000
  .... x8    0x00000000000000dd
  .... x9    0x0000ffffd1510cd0
  .... x10   0x0000000000000000
  .... x11   0x0000000000000030
  .... x12   0x0000ffffa569a208
  .... x13   0x0000000000000000
  .... x14   0x0000000000000000
  .... x15   0x0000000000000000
  .... x16   0x0000aaaad3de9650
  .... x17   0x0000ffffa57397f0
  .... x18   0x0000000000000001
  .... x19   0x0000ffffa57fd000
  .... x20   0x0000ffffd1515817
  .... x21   0x0000ffffd1515480
  .... x22   0x0000aaaadabf6c80
  .... x23   0x0000000000000000
  .... x24   0x0000000000000001
  .... x25   0x0000000000000000
  .... x26   0x0000000000000000
  .... x27   0x0000000000000000
  .... x28   0x0000000000000000
  .... x29   0x0000ffffd1510d90
  .... lr    0x0000ffffa5739b90
  .... sp    0x0000ffffd1510d80
  .... pc    0x0000ffffa57392c8
   ... thread: perf-exec:43239
   ...... dso: [kernel.kallsyms]

Tester comments:

Athira reports:

"Looks good to me. Tested this patchset in powerpc by capturing regs in
powerpc and doing perf report to read the data from x86."

Reported-by: Alexandre Truong <alexandre.truong@arm.com>
Reviewed-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: German Gomez <german.gomez@arm.com>
Tested-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-csky@vger.kernel.org
Cc: linux-riscv@lists.infradead.org
Link: https://lore.kernel.org/r/20211207180653.1147374-4-german.gomez@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-16 12:18:12 -03:00
German Gomez
d3b58af9a8 perf arm64: Rename perf_event_arm_regs for ARM64 registers
The registers for ARM and ARM64 are enumerated using two enums that have
the same name. In order to be able to import both headers, the name of
one can be replaced using the C preprocessor like so:

  #define perf_event_arm_regs perf_event_arm64_regs
  #include <asm/perf_regs.h>
  #undef perf_event_arm_regs

This patch updates all imports of ARM64's perf_regs.h in order to
prevent the naming collision.

Reviewed-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: German Gomez <german.gomez@arm.com>
Tested-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-csky@vger.kernel.org
Cc: linux-riscv@lists.infradead.org
Link: https://lore.kernel.org/r/20211207180653.1147374-3-german.gomez@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-16 12:18:12 -03:00
James Clark
7cc9680c4b perf cs-etm: Remove duplicate and incorrect aux size checks
There are two checks, one is for size when running without admin, but
this one is covered by the driver and reported on in more detail here
(builtin-record.c):

  pr_err("Permission error mapping pages.\n"
         "Consider increasing "
         "/proc/sys/kernel/perf_event_mlock_kb,\n"
         "or try again with a smaller value of -m/--mmap_pages.\n"
         "(current value: %u,%u)\n",

This had the effect of artificially limiting the aux buffer size to a
value smaller than what was allowed because perf_event_mlock_kb wasn't
taken into account.

The second is to check for a power of two, but this is covered here
(evlist.c):

  pr_info("rounding mmap pages size to %s (%lu pages)\n",
          buf, pages);

Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20211208115435.610101-1-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-16 12:18:12 -03:00
Ian Rogers
94dbfd6781 perf parse-events: Architecture specific leader override
Currently topdown events must appear after a slots event:

  $ perf stat -e '{slots,topdown-fe-bound}' /bin/true

   Performance counter stats for '/bin/true':

         3,183,090      slots
           986,133      topdown-fe-bound

Reversing the events yields:

  $ perf stat -e '{topdown-fe-bound,slots}' /bin/true
  Error:
  The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (topdown-fe-bound).

For metrics the order of events is determined by iterating over a
hashmap, and so slots isn't guaranteed to be first which can yield this
error.

Change the set_leader in parse-events, called when a group is closed, so
that rather than always making the first event the leader, if the slots
event exists then it is made the leader. It is then moved to the head of
the evlist otherwise it won't be opened in the correct order.

The result is:

  $ perf stat -e '{topdown-fe-bound,slots}' /bin/true

   Performance counter stats for '/bin/true':

         3,274,795      slots
         1,001,702      topdown-fe-bound

A problem with this approach is the slots event is identified by name,
names can be overwritten like 'cpu/slots,name=foo/' and this causes the
leader change to fail.

The change also modifies and fixes mixed groups like, with the change:

  $ perf stat -e '{instructions,slots,topdown-fe-bound}' -a -- sleep 2

   Performance counter stats for 'system wide':

        5574985410      slots
         971981616      instructions
        1348461887      topdown-fe-bound

       2.001263120 seconds time elapsed

Without the change:

  $ perf stat -e '{instructions,slots,topdown-fe-bound}' -a -- sleep 2

   Performance counter stats for 'system wide':

     <not counted>      instructions
     <not counted>      slots
   <not supported>      topdown-fe-bound

       2.006247990 seconds time elapsed

Something that may be undesirable here is that the events are reordered
in the output.

Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vineet Singh <vineet.singh@intel.com>
Link: http://lore.kernel.org/lkml/20211130174945.247604-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-07 22:18:24 -03:00
Arnaldo Carvalho de Melo
cba43fcf7a tools headers UAPI: Sync powerpc syscall table file changed by new futex_waitv syscall
To pick the changes in this cset:

  a0eb2da92b715d0c ("futex: Wireup futex_waitv syscall")

That add support for this new syscall in tools such as 'perf trace'.

For instance, this is now possible (adapted from the x86_64 test output):

  # perf trace -e futex_waitv
  ^C#
  # perf trace -v -e futex_waitv
  event qualifier tracepoint filter: (common_pid != 807333 && common_pid != 3564) && (id == 449)
  ^C#
  # perf trace -v -e futex* --max-events 10
  event qualifier tracepoint filter: (common_pid != 812168 && common_pid != 3564) && (id == 221 || id == 449)
  mmap size 528384B
           ? (         ): Timer/219310  ... [continued]: futex())                                            = -1 ETIMEDOUT (Connection timed out)
       0.012 ( 0.002 ms): Timer/219310 futex(uaddr: 0x7fd0b152d3c8, op: WAKE|PRIVATE_FLAG, val: 1)           = 0
       0.024 ( 0.060 ms): Timer/219310 futex(uaddr: 0x7fd0b152d420, op: WAIT_BITSET|PRIVATE_FLAG, utime: 0x7fd0b1657840, val3: MATCH_ANY) = 0
       0.086 ( 0.001 ms): Timer/219310 futex(uaddr: 0x7fd0b152d3c8, op: WAKE|PRIVATE_FLAG, val: 1)           = 0
       0.088 (         ): Timer/219310 futex(uaddr: 0x7fd0b152d424, op: WAIT_BITSET|PRIVATE_FLAG, utime: 0x7fd0b1657840, val3: MATCH_ANY) ...
       0.075 ( 0.005 ms): Web Content/219299 futex(uaddr: 0x7fd0b152d420, op: WAKE|PRIVATE_FLAG, val: 1)     = 1
       0.169 ( 0.004 ms): Web Content/219299 futex(uaddr: 0x7fd0b152d424, op: WAKE|PRIVATE_FLAG, val: 1)     = 1
       0.088 ( 0.089 ms): Timer/219310  ... [continued]: futex())                                            = 0
       0.179 ( 0.001 ms): Timer/219310 futex(uaddr: 0x7fd0b152d3c8, op: WAKE|PRIVATE_FLAG, val: 1)           = 0
       0.181 (         ): Timer/219310 futex(uaddr: 0x7fd0b152d420, op: WAIT_BITSET|PRIVATE_FLAG, utime: 0x7fd0b1657840, val3: MATCH_ANY) ...
  #

That is the filter expression attached to the raw_syscalls:sys_{enter,exit}
tracepoints.

  $ grep futex tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
  221	32	futex				sys_futex_time32
  221	64	futex				sys_futex
  221	spu	futex				sys_futex
  422	32	futex_time64			sys_futex			sys_futex
  449	common  futex_waitv                     sys_futex_waitv
  $

This addresses this perf build warnings:

  Warning: Kernel ABI header at 'tools/perf/arch/powerpc/entry/syscalls/syscall.tbl' differs from latest version at 'arch/powerpc/kernel/syscalls/syscall.tbl'
  diff -u tools/perf/arch/powerpc/entry/syscalls/syscall.tbl arch/powerpc/kernel/syscalls/syscall.tbl

Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>,
Cc: André Almeida <andrealmeid@collabora.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/YZ%2F1OU9mJuyS2HMa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-06 21:57:53 -03:00
Arnaldo Carvalho de Melo
71a16df164 tools headers UAPI: Sync s390 syscall table file changed by new futex_waitv syscall
To pick the changes in these csets:

  6c122360cf2f4c5a ("s390: wire up sys_futex_waitv system call")

That add support for this new syscall in tools such as 'perf trace'.

For instance, this is now possible (adapted from the x86_64 test output):

  # perf trace -e futex_waitv
  ^C#
  # perf trace -v -e futex_waitv
  event qualifier tracepoint filter: (common_pid != 807333 && common_pid != 3564) && (id == 449)
  ^C#
  # perf trace -v -e futex* --max-events 10
  event qualifier tracepoint filter: (common_pid != 812168 && common_pid != 3564) && (id == 238 || id == 449)
           ? (         ): Timer/219310  ... [continued]: futex())                                            = -1 ETIMEDOUT (Connection timed out)
       0.012 ( 0.002 ms): Timer/219310 futex(uaddr: 0x7fd0b152d3c8, op: WAKE|PRIVATE_FLAG, val: 1)           = 0
       0.024 ( 0.060 ms): Timer/219310 futex(uaddr: 0x7fd0b152d420, op: WAIT_BITSET|PRIVATE_FLAG, utime: 0x7fd0b1657840, val3: MATCH_ANY) = 0
       0.086 ( 0.001 ms): Timer/219310 futex(uaddr: 0x7fd0b152d3c8, op: WAKE|PRIVATE_FLAG, val: 1)           = 0
       0.088 (         ): Timer/219310 futex(uaddr: 0x7fd0b152d424, op: WAIT_BITSET|PRIVATE_FLAG, utime: 0x7fd0b1657840, val3: MATCH_ANY) ...
       0.075 ( 0.005 ms): Web Content/219299 futex(uaddr: 0x7fd0b152d420, op: WAKE|PRIVATE_FLAG, val: 1)     = 1
       0.169 ( 0.004 ms): Web Content/219299 futex(uaddr: 0x7fd0b152d424, op: WAKE|PRIVATE_FLAG, val: 1)     = 1
       0.088 ( 0.089 ms): Timer/219310  ... [continued]: futex())                                            = 0
       0.179 ( 0.001 ms): Timer/219310 futex(uaddr: 0x7fd0b152d3c8, op: WAKE|PRIVATE_FLAG, val: 1)           = 0
       0.181 (         ): Timer/219310 futex(uaddr: 0x7fd0b152d420, op: WAIT_BITSET|PRIVATE_FLAG, utime: 0x7fd0b1657840, val3: MATCH_ANY) ...
  #

That is the filter expression attached to the raw_syscalls:sys_{enter,exit}
tracepoints.

  $ grep futex tools/perf/arch/s390/entry/syscalls/syscall.tbl
  238  common	futex			sys_futex			sys_futex_time32
  422	32	futex_time64		-				sys_futex
  449  common	futex_waitv		sys_futex_waitv			sys_futex_waitv
  $

This addresses this perf build warnings:

  Warning: Kernel ABI header at 'tools/perf/arch/s390/entry/syscalls/syscall.tbl' differs from latest version at 'arch/s390/kernel/syscalls/syscall.tbl'
  diff -u tools/perf/arch/s390/entry/syscalls/syscall.tbl arch/s390/kernel/syscalls/syscall.tbl

Acked-by: Heiko Carstens <hca@linux.ibm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>,
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: https://lore.kernel.org/lkml/YZ%2F2qRW%2FTScYTP1U@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-06 21:57:52 -03:00
Arnaldo Carvalho de Melo
8b8dcc3720 tools headers UAPI: Sync MIPS syscall table file changed by new futex_waitv syscall
To pick the changes in these csets:

  b3ff2881ba18b852 ("MIPS: syscalls: Wire up futex_waitv syscall")

That add support for this new syscall in tools such as 'perf trace'.

For instance, this is now possible (adapted from the x86_64 test output):

  # perf trace -e futex_waitv
  ^C#
  # perf trace -v -e futex_waitv
  event qualifier tracepoint filter: (common_pid != 807333 && common_pid != 3564) && (id == 449)
  ^C#
  # perf trace -v -e futex* --max-events 10
  event qualifier tracepoint filter: (common_pid != 812168 && common_pid != 3564) && (id == 202 || id == 449)
  mmap size 528384B
           ? (         ): Timer/219310  ... [continued]: futex())                                            = -1 ETIMEDOUT (Connection timed out)
       0.012 ( 0.002 ms): Timer/219310 futex(uaddr: 0x7fd0b152d3c8, op: WAKE|PRIVATE_FLAG, val: 1)           = 0
       0.024 ( 0.060 ms): Timer/219310 futex(uaddr: 0x7fd0b152d420, op: WAIT_BITSET|PRIVATE_FLAG, utime: 0x7fd0b1657840, val3: MATCH_ANY) = 0
       0.086 ( 0.001 ms): Timer/219310 futex(uaddr: 0x7fd0b152d3c8, op: WAKE|PRIVATE_FLAG, val: 1)           = 0
       0.088 (         ): Timer/219310 futex(uaddr: 0x7fd0b152d424, op: WAIT_BITSET|PRIVATE_FLAG, utime: 0x7fd0b1657840, val3: MATCH_ANY) ...
       0.075 ( 0.005 ms): Web Content/219299 futex(uaddr: 0x7fd0b152d420, op: WAKE|PRIVATE_FLAG, val: 1)     = 1
       0.169 ( 0.004 ms): Web Content/219299 futex(uaddr: 0x7fd0b152d424, op: WAKE|PRIVATE_FLAG, val: 1)     = 1
       0.088 ( 0.089 ms): Timer/219310  ... [continued]: futex())                                            = 0
       0.179 ( 0.001 ms): Timer/219310 futex(uaddr: 0x7fd0b152d3c8, op: WAKE|PRIVATE_FLAG, val: 1)           = 0
       0.181 (         ): Timer/219310 futex(uaddr: 0x7fd0b152d420, op: WAIT_BITSET|PRIVATE_FLAG, utime: 0x7fd0b1657840, val3: MATCH_ANY) ...
  #

That is the filter expression attached to the raw_syscalls:sys_{enter,exit}
tracepoints.

  $ grep futex_waitv tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl
  449	n64	futex_waitv			sys_futex_waitv
  $

This addresses these perf build warnings:

  Warning: Kernel ABI header at 'tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl' differs from latest version at 'arch/mips/kernel/syscalls/syscall_n64.tbl'
  diff -u tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl arch/mips/kernel/syscalls/syscall_n64.tbl

Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Wang Haojun <jiangliuer01@gmail.com>
Link: https://lore.kernel.org/lkml/YZZRxuIyvSGLZhM4@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-18 10:15:27 -03:00
Arnaldo Carvalho de Melo
7380aa8990 tools headers UAPI: Sync files changed by new futex_waitv syscall
To pick the changes in these csets:

  039c0ec9bb77446d ("futex,x86: Wire up sys_futex_waitv()")
  bf69bad38cf63d98 ("futex: Implement sys_futex_waitv()")

That add support for this new syscall in tools such as 'perf trace'.

For instance, this is now possible:

  # perf trace -e futex_waitv
  ^C#
  # perf trace -v -e futex_waitv
  Using CPUID AuthenticAMD-25-21-0
  event qualifier tracepoint filter: (common_pid != 807333 && common_pid != 3564) && (id == 449)
  mmap size 528384B
  ^C#
  # perf trace -v -e futex* --max-events 10
  Using CPUID AuthenticAMD-25-21-0
  event qualifier tracepoint filter: (common_pid != 812168 && common_pid != 3564) && (id == 202 || id == 449)
  mmap size 528384B
           ? (         ): Timer/219310  ... [continued]: futex())                                            = -1 ETIMEDOUT (Connection timed out)
       0.012 ( 0.002 ms): Timer/219310 futex(uaddr: 0x7fd0b152d3c8, op: WAKE|PRIVATE_FLAG, val: 1)           = 0
       0.024 ( 0.060 ms): Timer/219310 futex(uaddr: 0x7fd0b152d420, op: WAIT_BITSET|PRIVATE_FLAG, utime: 0x7fd0b1657840, val3: MATCH_ANY) = 0
       0.086 ( 0.001 ms): Timer/219310 futex(uaddr: 0x7fd0b152d3c8, op: WAKE|PRIVATE_FLAG, val: 1)           = 0
       0.088 (         ): Timer/219310 futex(uaddr: 0x7fd0b152d424, op: WAIT_BITSET|PRIVATE_FLAG, utime: 0x7fd0b1657840, val3: MATCH_ANY) ...
       0.075 ( 0.005 ms): Web Content/219299 futex(uaddr: 0x7fd0b152d420, op: WAKE|PRIVATE_FLAG, val: 1)     = 1
       0.169 ( 0.004 ms): Web Content/219299 futex(uaddr: 0x7fd0b152d424, op: WAKE|PRIVATE_FLAG, val: 1)     = 1
       0.088 ( 0.089 ms): Timer/219310  ... [continued]: futex())                                            = 0
       0.179 ( 0.001 ms): Timer/219310 futex(uaddr: 0x7fd0b152d3c8, op: WAKE|PRIVATE_FLAG, val: 1)           = 0
       0.181 (         ): Timer/219310 futex(uaddr: 0x7fd0b152d420, op: WAIT_BITSET|PRIVATE_FLAG, utime: 0x7fd0b1657840, val3: MATCH_ANY) ...
  #

That is the filter expression attached to the raw_syscalls:sys_{enter,exit}
tracepoints.

  $ grep futex_waitv tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
  449	common	futex_waitv		sys_futex_waitv
  $

This addresses these perf build warnings:

  Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/unistd.h' differs from latest version at 'include/uapi/asm-generic/unistd.h'
  diff -u tools/include/uapi/asm-generic/unistd.h include/uapi/asm-generic/unistd.h
  Warning: Kernel ABI header at 'tools/perf/arch/x86/entry/syscalls/syscall_64.tbl' differs from latest version at 'arch/x86/entry/syscalls/syscall_64.tbl'
  diff -u tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl

Cc: André Almeida <andrealmeid@collabora.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-13 18:11:51 -03:00
German Gomez
455c988225 perf arm-spe: Update --switch-events docs in 'perf record'
Update 'perf record' docs and ARM SPE recording options so that they are
consistent. This includes supporting the --no-switch-events flag in ARM
SPE as well.

Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: German Gomez <german.gomez@arm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20211111133625.193568-3-german.gomez@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-13 18:11:50 -03:00
Namhyung Kim
9dc9855f18 perf arm-spe: Track task context switch for cpu-mode events
When perf report synthesize events from ARM SPE data, it refers to
current cpu, pid and tid in the machine.  But there's no place to set
them in the ARM SPE decoder.  I'm seeing all pid/tid is set to -1 and
user symbols are not resolved in the output.

  # perf record -a -e arm_spe_0/ts_enable=1/ sleep 1

  # perf report -q | head
     8.77%     8.77%  :-1      [kernel.kallsyms]  [k] format_decode
     7.02%     7.02%  :-1      [kernel.kallsyms]  [k] seq_printf
     7.02%     7.02%  :-1      [unknown]          [.] 0x0000ffff9f687c34
     5.26%     5.26%  :-1      [kernel.kallsyms]  [k] vsnprintf
     3.51%     3.51%  :-1      [kernel.kallsyms]  [k] string
     3.51%     3.51%  :-1      [unknown]          [.] 0x0000ffff9f66ae20
     3.51%     3.51%  :-1      [unknown]          [.] 0x0000ffff9f670b3c
     3.51%     3.51%  :-1      [unknown]          [.] 0x0000ffff9f67c040
     1.75%     1.75%  :-1      [kernel.kallsyms]  [k] ___cache_free
     1.75%     1.75%  :-1      [kernel.kallsyms]  [k] __count_memcg_events

Like Intel PT, add context switch records to track task info.  As ARM
SPE support was added later than PERF_RECORD_SWITCH_CPU_WIDE, I think
we can safely set the attr.context_switch bit and use it.

Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: German Gomez <german.gomez@arm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20211111133625.193568-2-german.gomez@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-13 18:11:50 -03:00
German Gomez
56c31cdff7 perf arm-spe: Implement find_snapshot callback
The head pointer of the AUX buffer managed by the arm_spe_pmu.c driver
is not monotonically increasing, therefore the find_snapshot callback is
needed in order to find the trace data within the AUX buffer and avoid
wasting space in the perf.data file.

The pointer is assumed to have wrapped if the buffer contains non-zero
data at the end. If it has wrapped, the entire contents of the AUX
buffer are stored in the perf.data file. Otherwise only the data up to
the head pointer is stored.

Reviewed-by: James Clark <james.clark@arm.com>
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: German Gomez <german.gomez@arm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20211109163009.92072-3-german.gomez@arm.com
Tested-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-13 18:11:50 -03:00
German Gomez
0901b56028 perf arm-spe: Add snapshot mode support
This patch enables support for snapshot mode of arm_spe events,
including the implementation of the necessary callbacks (excluding
find_snapshot, which is to be included in a followup commit).

Reviewed-by: James Clark <james.clark@arm.com>
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: German Gomez <german.gomez@arm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20211109163009.92072-2-german.gomez@arm.com
Tested-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-13 18:11:50 -03:00
Ian Rogers
33f44bfd3c perf test: Rename struct test to test_suite
This is to align with kunit's terminology.

Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Sohaib Mohamed <sohaib.amhmd@gmail.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Daniel Latypov <dlatypov@google.com>
Cc: David Gow <davidgow@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20211104064208.3156807-6-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-13 10:32:22 -03:00
Ian Rogers
d68f036508 perf test: Move each test suite struct to its test
Rather than export test functions, export the test struct. Rename with a
suite__ prefix to avoid name collisions.

Committer notes:

Its '&suite__vectors_page', not '&suite__vectors_pages', noticed when
cross building to arm (32-bit).

Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Sohaib Mohamed <sohaib.amhmd@gmail.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Daniel Latypov <dlatypov@google.com>
Cc: David Gow <davidgow@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20211104064208.3156807-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-13 10:30:58 -03:00
Ian Rogers
df2252054e perf test: Make each test/suite its own struct.
By switching to an array of pointers to tests (later to be suites)
the definition of the tests can be moved to the file containing the
tests.

Committer notes:

It's "&vectors_page", not "&vectors_pages", noticed when cross building
to 32-bit ARM.

Also the DEFINE_SUITE(vectors_page) should be done where its function is
implemented, in tools/perf/arch/arm/tests/vectors-page.c, so that we can
make it static, as we don't have anymore its declaration in tests.h.

Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Sohaib Mohamed <sohaib.amhmd@gmail.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Daniel Latypov <dlatypov@google.com>
Cc: David Gow <davidgow@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20211104064208.3156807-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-13 10:30:04 -03:00
Ian Rogers
07eafd4e05 perf parse-event: Add init and exit to parse_event_error
parse_events() may succeed but leave string memory allocations reachable
in the error.

Add an init/exit that must be called to initialize and clean up the
error. This fixes a leak in metricgroup parse_ids.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20211107090002.3784612-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-07 15:39:25 -03:00
Ian Rogers
6c1912898e perf parse-events: Rename parse_events_error functions
Group error functions and name after the data type they manipulate.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20211107090002.3784612-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-07 15:38:54 -03:00
Ravi Bangoria
eb39bf3256 perf evsel: Don't set exclude_guest by default
Perf tool sets exclude_guest by default while calling perf_event_open().
Because IBS does not have filtering capability, it always gets rejected
by IBS PMU driver and thus perf falls back to non-precise sampling. Fix
it by not setting exclude_guest by default on AMD.

Before:
  $ sudo ./perf record -C 0 -vvv true |& grep precise
    precise_ip                       3
  decreasing precise_ip by one (2)
    precise_ip                       2
  decreasing precise_ip by one (1)
    precise_ip                       1
  decreasing precise_ip by one (0)

After:
  $ sudo ./perf record -C 0 -vvv true |& grep precise
    precise_ip                       3
  decreasing precise_ip by one (2)
    precise_ip                       2

Committer notes:

Fixup init to zero for perf_env in older compilers:

  arch/x86/util/evsel.c:15:26: error: missing field 'os_release' initializer [-Werror,-Wmissing-field-initializers]
          struct perf_env env = {0};
                                  ^

Committer notes:

Namhyung remarked:

  It'd be nice if it can cover explicit "-e cycles:pp" as well.

Ravi clarified:

  For explicit :pp modifier, evsel->precise_max does not get set and thus perf
  does not try with different attr->precise_ip values while exclude_guest set.
  So no issue with explicit :pp:

    $ sudo ./perf record -C 0 -e cycles:pp -vvv |& grep "precise_ip\|exclude_guest"
      precise_ip                       2
      exclude_guest                    1
      precise_ip                       2
      exclude_guest                    1
    switching off exclude_guest, exclude_host
      precise_ip                       2
    ^C

  Also, with :P modifier, evsel->precise_max gets set but exclude_guest does
  not and thus :P also works fine:

    $ sudo ./perf record -C 0 -e cycles:P -vvv |& grep "precise_ip\|exclude_guest"
      precise_ip                       3
    decreasing precise_ip by one (2)
      precise_ip                       2
    ^C

Reported-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20211103072112.32312-1-ravi.bangoria@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-07 12:26:24 -03:00
Arnaldo Carvalho de Melo
875eaa3990 Merge remote-tracking branch 'torvalds/master' into perf/core
To pick up fixes.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-01 07:10:30 -03:00
Jiri Olsa
89ac61ff05 perf callchain: Fix compilation on powerpc with gcc11+
Got following build fail on powerpc:

    CC      arch/powerpc/util/skip-callchain-idx.o
  In function ‘check_return_reg’,
      inlined from ‘check_return_addr’ at arch/powerpc/util/skip-callchain-idx.c:213:7,
      inlined from ‘arch_skip_callchain_idx’ at arch/powerpc/util/skip-callchain-idx.c:265:7:
  arch/powerpc/util/skip-callchain-idx.c:54:18: error: ‘dwarf_frame_register’ accessing 96 bytes \
  in a region of size 64 [-Werror=stringop-overflow=]
     54 |         result = dwarf_frame_register(frame, ra_regno, ops_mem, &ops, &nops);
        |                  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  arch/powerpc/util/skip-callchain-idx.c: In function ‘arch_skip_callchain_idx’:
  arch/powerpc/util/skip-callchain-idx.c:54:18: note: referencing argument 3 of type ‘Dwarf_Op *’
  In file included from /usr/include/elfutils/libdwfl.h:32,
                   from arch/powerpc/util/skip-callchain-idx.c:10:
  /usr/include/elfutils/libdw.h:1069:12: note: in a call to function ‘dwarf_frame_register’
   1069 | extern int dwarf_frame_register (Dwarf_Frame *frame, int regno,
        |            ^~~~~~~~~~~~~~~~~~~~
  cc1: all warnings being treated as errors

The dwarf_frame_register args changed with [1],
Updating ops_mem accordingly.

[1] https://sourceware.org/git/?p=elfutils.git;a=commit;h=5621fe5443da23112170235dd5cac161e5c75e65

Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Mark Wieelard <mjw@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Link: https://lore.kernel.org/r/20210928195253.1267023-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-10-31 12:51:41 -03:00
Athira Rajeev
83e1ada67a perf powerpc: Add support to expose instruction and data address registers as part of extended regs
This patch enables presenting Sampled Instruction Address Register
(SIAR) and Sampled Data Address Register (SDAR) SPRs as part of extended
registers for the perf tool.

Add these SPR's to sample_reg_mask in the tool side (to use with -I?
option).

Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nageswara R Sastry <rnsastry@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20211018114948.16830-3-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-10-25 13:47:42 -03:00
Ian Rogers
47f572aad5 perf pmu: Make pmu_event tables const.
Make lookup nature of data structures clearer through their type. Reduce
scope of architecture specific pmu_event tables by making them static.

Suggested-by: John Garry <john.garry@huawei.com>
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Antonov <alexander.antonov@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andrew Kilroy <andrew.kilroy@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Changbin Du <changbin.du@intel.com>
Cc: Denys Zagorui <dzagorui@cisco.com>
Cc: Fabian Hemmer <copy@copy.sh>
Cc: Felix Fietkau <nbd@nbd.name>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jacob Keller <jacob.e.keller@intel.com>
Cc: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joakim Zhang <qiangqing.zhang@nxp.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kees Kook <keescook@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nicholas Fraser <nfraser@codeweavers.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: ShihCheng Tu <mrtoastcheng@gmail.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Wan Jiabing <wanjiabing@vivo.com>
Cc: Zhen Lei <thunder.leizhen@huawei.com>
Link: https://lore.kernel.org/r/20211015172132.1162559-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-10-20 10:32:33 -03:00
Ian Rogers
0ec43c0837 perf pmu: Add const to pmu_events_map.
The pmu_events_map is generated at compile time and used for lookup. For
testing purposes we need to swap the map being used.

Having the pmu_events_map be non-const is misleading as it may be an out
argument.

Make it const and update uses so they work on const too.

Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Antonov <alexander.antonov@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andrew Kilroy <andrew.kilroy@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Changbin Du <changbin.du@intel.com>
Cc: Denys Zagorui <dzagorui@cisco.com>
Cc: Fabian Hemmer <copy@copy.sh>
Cc: Felix Fietkau <nbd@nbd.name>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jacob Keller <jacob.e.keller@intel.com>
Cc: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joakim Zhang <qiangqing.zhang@nxp.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kees Kook <keescook@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nicholas Fraser <nfraser@codeweavers.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: ShihCheng Tu <mrtoastcheng@gmail.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Wan Jiabing <wanjiabing@vivo.com>
Cc: Zhen Lei <thunder.leizhen@huawei.com>
Link: https://lore.kernel.org/r/20211015172132.1162559-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-10-20 10:31:47 -03:00
Arnaldo Carvalho de Melo
47e7dd34a2 Merge remote-tracking branch 'torvalds/master' into perf/core
To pick up the fixes in perf/urgent that were just merged into upstream.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-10-08 11:13:17 -03:00
Like Xu
4da8b12188 perf iostat: Fix Segmentation fault from NULL 'struct perf_counts_values *'
If the 'perf iostat' user specifies two or more iio_root_ports and also
specifies the cpu(s) by -C which is not *connected to all* the above iio
ports, the iostat_print_metric() will run into trouble:

For example:

  $ perf iostat list
  S0-uncore_iio_0<0000:16>
  S1-uncore_iio_0<0000:97> # <--- CPU 1 is located in the socket S0

  $ perf iostat 0000:16,0000:97 -C 1 -- ls
  port 	Inbound Read(MB)	Inbound Write(MB)	Outbound Read(MB)	Outbound
  Write(MB) ../perf-iostat: line 12: 104418 Segmentation fault
  (core dumped) perf stat --iostat$DELIMITER$*

The core-dump stack says, in the above corner case, the returned
(struct perf_counts_values *) count will be NULL, and the caller
iostat_print_metric() apparently doesn't not handle this case.

  433	struct perf_counts_values *count = perf_counts(evsel->counts, die, 0);
  434
  435	if (count->run && count->ena) {
  (gdb) p count
  $1 = (struct perf_counts_values *) 0x0

The deeper reason is that there are actually no statistics from the user
specified pair "iostat 0000:X, -C (disconnected) Y ", but let's fix it with
minimum cost by adding a NULL check in the user space.

Fixes: f9ed693e8bc0e7de ("perf stat: Enable iostat mode for x86 platforms")
Signed-off-by: Like Xu <likexu@tencent.com>
Cc: Alexander Antonov <alexander.antonov@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20210927081115.39568-2-likexu@tencent.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-09-27 09:41:07 -03:00
William Cohen
0ba37e05c2 perf annotate: Add riscv64 support
This patch adds basic arch initialization and instruction associate
support for the riscv64 CPU architecture.

Example output:

  $ perf annotate --stdio2
  Samples: 122K of event 'task-clock:u', 4000 Hz, Event count (approx.): 30637250000, [percent: local period]
  strcmp() /usr/lib64/libc-2.32.so
  Percent

	      Disassembly of section .text:

	      0000000000069a30 <strcmp>:
	      __GI_strcmp():
	      const unsigned char *s2 = (const unsigned char *) p2;
	      unsigned char c1, c2;

	      do
	      {
	      c1 = (unsigned char) *s1++;
   37.30        lbu  a5,0(a0)
	      c2 = (unsigned char) *s2++;
    1.23        addi a1,a1,1
	      c1 = (unsigned char) *s1++;
   18.68        addi a0,a0,1
	      c2 = (unsigned char) *s2++;
    1.37        lbu  a4,-1(a1)
	      if (c1 == '\0')
   18.71      ↓ beqz a5,18
	       return c1 - c2;
	       }

Signed-off-by: William Cohen <wcohen@redhat.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-riscv@lists.infradead.org
Link: http://lore.kernel.org/lkml/20210927005115.610264-1-wcohen@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-09-27 09:33:44 -03:00
Ian Rogers
c6613bd4a5 perf arm: Fix off-by-one directory paths.
Relative path include works in the regular build due to -I paths but may
fail in other situations.

v2. Rebase. Comments on v1 were that we should handle include paths
    differently and it is agreed that can be a sensible refactor but
    beyond the scope of this change.
https://lore.kernel.org/lkml/20210504191227.793712-1-irogers@google.com/

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lore.kernel.org/lkml/20210923154254.737657-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-09-27 09:32:28 -03:00
Ravi Bangoria
3149733584 perf annotate: Add fusion logic for AMD microarchs
AMD family 15h and above microarchs fuse a subset of cmp/test/ALU
instructions with branch instructions[1][2]. Add perf annotate
fused instruction support for these microarchs.

Before:
         │       testb  $0x80,0x51(%rax)
         │    ┌──jne    5b3
    0.78 │    │  mov    %r13,%rdi
         │    │→ callq  mark_page_accessed
    1.08 │5b3:└─→mov    0x8(%r13),%rax

After:
         │    ┌──testb  $0x80,0x51(%rax)
         │    ├──jne    5b3
    0.78 │    │  mov    %r13,%rdi
         │    │→ callq  mark_page_accessed
    1.08 │5b3:└─→mov    0x8(%r13),%rax

[1] https://bugzilla.kernel.org/attachment.cgi?id=298553
[2] https://bugzilla.kernel.org/attachment.cgi?id=298555

Committer testing:

On a:

  $ grep -m1 "model name" /proc/cpuinfo
  model name	: AMD Ryzen 9 3900X 12-Core Processor
  $

  Samples: 44K of event 'cycles', 4000 Hz, Event count (approx.): 7533249650
  _int_malloc  /usr/lib64/libc-2.33.so [Percent: local period]
  Percent│    ┌──test   %eax,%eax
         │    ├──jne    884
         │    │↓ jmpq   943
         │    │  nop
         │878:│  add    $0x10,%rdx
    0.64 │    │  add    %eax,%eax
    0.57 │    │↓ je     cc9
    0.77 │884:└─→test   %esi,%eax
         │     ↑ je     878
         │       mov    0x18(%rdx),%r15

Reported-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https //lore.kernel.org/r/20210911043854.8373-2-ravi.bangoria@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-09-15 17:54:52 -03:00
Arnaldo Carvalho de Melo
64f4535166 tools headers UAPI: Sync files changed by new process_mrelease syscall and the removal of some compat entry points
To pick the changes in these csets:

  59ab844eed9c6b01 ("compat: remove some compat entry points")
  dce49103962840dd ("mm: wire up syscall process_mrelease")
  b48c7236b13cb5ef ("exit/bdflush: Remove the deprecated bdflush system call")

That add support for this new syscall in tools such as 'perf trace'.

For instance, this is now possible:

  # perf trace -v -e process_mrelease
  event qualifier tracepoint filter: (common_pid != 19351 && common_pid != 9112) && (id == 448)
  ^C#

That is the filter expression attached to the raw_syscalls:sys_{enter,exit}
tracepoints.

  $ grep process_mrelease tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
  448    common  process_mrelease            sys_process_mrelease
  $

This addresses these perf build warnings:

  Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/unistd.h' differs from latest version at 'include/uapi/asm-generic/unistd.h'
  diff -u tools/include/uapi/asm-generic/unistd.h include/uapi/asm-generic/unistd.h
  Warning: Kernel ABI header at 'tools/perf/arch/x86/entry/syscalls/syscall_64.tbl' differs from latest version at 'arch/x86/entry/syscalls/syscall_64.tbl'
  diff -u tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl
  Warning: Kernel ABI header at 'tools/perf/arch/powerpc/entry/syscalls/syscall.tbl' differs from latest version at 'arch/powerpc/kernel/syscalls/syscall.tbl'
  diff -u tools/perf/arch/powerpc/entry/syscalls/syscall.tbl arch/powerpc/kernel/syscalls/syscall.tbl
  Warning: Kernel ABI header at 'tools/perf/arch/s390/entry/syscalls/syscall.tbl' differs from latest version at 'arch/s390/kernel/syscalls/syscall.tbl'
  diff -u tools/perf/arch/s390/entry/syscalls/syscall.tbl arch/s390/kernel/syscalls/syscall.tbl
  Warning: Kernel ABI header at 'tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl' differs from latest version at 'arch/mips/kernel/syscalls/syscall_n64.tbl'
  diff -u tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl arch/mips/kernel/syscalls/syscall_n64.tbl

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-09-10 11:45:07 -03:00
Linus Torvalds
27151f1778 perf tools changes for v5.15:
New features:
 
 - Improvements for the flamegraph python script, including:
 
   - Display perf.data header
   - Display PIDs of user stacks
   - Added option to change color scheme
   - Default to blue/green color scheme to improve accessibility
   - Correctly identify kernel stacks when debuginfo is available
 
 - Improvements for 'perf bench futex':
   - Add --mlockall parameter
   - Add --broadcast and --pi to the 'requeue' sub benchmark
 
 - Add support for PMU aliases.
 
 - Introduce an ARM Coresight ETE decoder.
 
 - Add a 'perf bench' entry for evlist open/close operations, to help quantify
   improvements with multithreading 'perf record'.
 
 - Allow reporting the [un]throttle PERF_RECORD_ meta event in 'perf script's
   python scripting.
 
 - Add a 'perf test' entry for PMU aliases.
 
 - Add a 'perf test' entry for 'perf record/perf report/perf script' pipe mode.
 
 Fixes:
 
 - perf script dlfilter (API for filtering via dynamically loaded shared object
   introduced in v5.14) fixes and a 'perf test' entry for it.
 
 - Fix get_current_dir_name() compilation on Android.
 
 - Fix issues with asciidoc and double dashes uses.
 
 - Fix memory leaks in the BTF handling code.
 
 - Fix leftover problems in the Documentation from the infrastructure originally
   lifted from the git codebase.
 
 - Fix *probe_vfs_getname.sh 'perf test' failures.
 
 - Handle fd gaps in 'perf test's test__dso_data_reopen().
 
 - Make sure to show disasembly warnings for 'perf annotate --stdio'.
 
 - Fix output from pipe to file and vice-versa in 'perf record/report/script'.
 
 - Correct 'perf data -h' output.
 
 - Fix wrong comm in system-wide mode with 'perf record --delay'.
 
 - Do not allow --for-each-cgroup without cpu in 'perf stat'
 
 - Make 'perf test --skip' work on shell tests.
 
 - Fix libperf's verbose printing.
 
 Misc improvements:
 
 - Preparatory patches for multithreading varios 'perf record' phases
   (synthesizing, opening, recording, etc).
 
 - Add sparse context/locking annotations in compiler-types.h, also to help with
   the multithreading effort.
 
 - Optimize the generation of the arch specific erno tables used in 'perf trace'.
 
 - Optimize libperf's perf_cpu_map__max().
 
 - Improve ARM's CoreSight warnings.
 
 - Report collisions in AUX records.
 
 - Improve warnings for the LLVM 'perf test' entry.
 
 - Improve the PMU events 'perf test' codebase.
 
 - perf test: Do not compare overheads in the zstd comp test
 
 - Better support annotation on ARM.
 
 - Update 'perf trace's cmd string table to decode sys_bpf() first arg.
 
 Vendor events:
 
 - Add JSON events and metrics for Intel's Ice Lake, Tiger Lake and Elhart Lake.
 
 - Update JSON eventsand metrics for Intel's Cascade Lake and Sky Lake servers.
 
 Hardware tracing:
 
 - Improvements for the ARM hardware tracing auxtrace support.
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCYTO8OAAKCRCyPKLppCJ+
 J/N0AQCTAfxKsVyh9vjIY5vwHvNkYMDM4Wvs7TmlfXbVgKLP3AEA0wYlTyar/teI
 NC6jQpipxpWASFUJbLSmU8qn/XFXwQo=
 =4r0w
 -----END PGP SIGNATURE-----

Merge tag 'perf-tools-for-v5.15-2021-09-04' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tool updates from Arnaldo Carvalho de Melo:
 "New features:

   - Improvements for the flamegraph python script, including:
       - Display perf.data header
       - Display PIDs of user stacks
       - Added option to change color scheme
       - Default to blue/green color scheme to improve accessibility
       - Correctly identify kernel stacks when debuginfo is available

   - Improvements for 'perf bench futex':
       - Add --mlockall parameter
       - Add --broadcast and --pi to the 'requeue' sub benchmark

   - Add support for PMU aliases.

   - Introduce an ARM Coresight ETE decoder.

   - Add a 'perf bench' entry for evlist open/close operations, to help
     quantify improvements with multithreading 'perf record'.

   - Allow reporting the [un]throttle PERF_RECORD_ meta event in 'perf
     script's python scripting.

   - Add a 'perf test' entry for PMU aliases.

   - Add a 'perf test' entry for 'perf record/perf report/perf script'
     pipe mode.

  Fixes:

   - perf script dlfilter (API for filtering via dynamically loaded
     shared object introduced in v5.14) fixes and a 'perf test' entry
     for it.

   - Fix get_current_dir_name() compilation on Android.

   - Fix issues with asciidoc and double dashes uses.

   - Fix memory leaks in the BTF handling code.

   - Fix leftover problems in the Documentation from the infrastructure
     originally lifted from the git codebase.

   - Fix *probe_vfs_getname.sh 'perf test' failures.

   - Handle fd gaps in 'perf test's test__dso_data_reopen().

   - Make sure to show disasembly warnings for 'perf annotate --stdio'.

   - Fix output from pipe to file and vice-versa in 'perf
     record/report/script'.

   - Correct 'perf data -h' output.

   - Fix wrong comm in system-wide mode with 'perf record --delay'.

   - Do not allow --for-each-cgroup without cpu in 'perf stat'

   - Make 'perf test --skip' work on shell tests.

   - Fix libperf's verbose printing.

  Misc improvements:

   - Preparatory patches for multithreading various 'perf record' phases
     (synthesizing, opening, recording, etc).

   - Add sparse context/locking annotations in compiler-types.h, also to
     help with the multithreading effort.

   - Optimize the generation of the arch specific erno tables used in
     'perf trace'.

   - Optimize libperf's perf_cpu_map__max().

   - Improve ARM's CoreSight warnings.

   - Report collisions in AUX records.

   - Improve warnings for the LLVM 'perf test' entry.

   - Improve the PMU events 'perf test' codebase.

   - perf test: Do not compare overheads in the zstd comp test

   - Better support annotation on ARM.

   - Update 'perf trace's cmd string table to decode sys_bpf() first
     arg.

  Vendor events:

   - Add JSON events and metrics for Intel's Ice Lake, Tiger Lake and
     Elhart Lake.

   - Update JSON eventsand metrics for Intel's Cascade Lake and Sky Lake
     servers.

  Hardware tracing:

   - Improvements for the ARM hardware tracing auxtrace support"

* tag 'perf-tools-for-v5.15-2021-09-04' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (130 commits)
  perf tests: Add test for PMU aliases
  perf pmu: Add PMU alias support
  perf session: Report collisions in AUX records
  perf script python: Allow reporting the [un]throttle PERF_RECORD_ meta event
  perf build: Report failure for testing feature libopencsd
  perf cs-etm: Show a warning for an unknown magic number
  perf cs-etm: Print the decoder name
  perf cs-etm: Create ETE decoder
  perf cs-etm: Update OpenCSD decoder for ETE
  perf cs-etm: Fix typo
  perf cs-etm: Save TRCDEVARCH register
  perf cs-etm: Refactor out ETMv4 header saving
  perf cs-etm: Initialise architecture based on TRCIDR1
  perf cs-etm: Refactor initialisation of decoder params.
  tools build: Fix feature detect clean for out of source builds
  perf evlist: Add evlist__for_each_entry_from() macro
  perf evsel: Handle precise_ip fallback in evsel__open_cpu()
  perf evsel: Move bpf_counter__install_pe() to success path in evsel__open_cpu()
  perf evsel: Move test_attr__open() to success path in evsel__open_cpu()
  perf evsel: Move ignore_missing_thread() to fallback code
  ...
2021-09-05 11:56:18 -07:00
Kan Liang
13d60ba073 perf pmu: Add PMU alias support
A perf uncore PMU may have two PMU names, a real name and an alias. The
alias is exported at /sys/bus/event_source/devices/uncore_*/alias.
The perf tool should support the alias as well.

Add alias_name in the struct perf_pmu to store the alias. For the PMU
which doesn't have an alias. It's NULL.

Introduce two X86 specific functions to retrieve the real name and the
alias separately.

Only go through the sysfs to retrieve the mapping between the real name
and the alias once. The result is cached in a list, uncore_pmu_list.

Nothing changed for the other ARCHs.

With the patch, the perf tool can monitor the PMU with either the real
name or the alias.

Use the real name,
 $ perf stat -e uncore_cha_2/event=1/ -x,
   4044879584,,uncore_cha_2/event=1/,2528059205,100.00,,

Use the alias,
 $ perf stat -e uncore_type_0_2/event=1/ -x,
   3659675336,,uncore_type_0_2/event=1/,2287306455,100.00,,

Committer notes:

Rename 'struct perf_pmu_alias_name' to 'pmu_alias', the 'perf_' prefix
should be used for libperf, things inside just tools/perf/ are being
moved away from that prefix.

Also 'pmu_alias' is shorter and reflects the abstraction.

Also don't use 'pmu' as the name for variables for that type, we should
use that for the 'struct perf_pmu' variables, avoiding confusion. Use
'pmu_alias' for 'struct pmu_alias' variables.

Co-developed-by: Jin Yao <yao.jin@linux.intel.com>
Co-developed-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Link: http://lore.kernel.org/lkml/20210902065955.1299-2-yao.jin@linux.intel.com
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-09-03 08:33:26 -03:00