linux

iv/linux

Author	SHA1	Message	Date
Athira Rajeev	9eef41014f	perf vendor events powerpc: Update datasource event name to fix duplicate events Running "perf list" on powerpc fails with segfault as below: $ ./perf list Segmentation fault (core dumped) $ This happens because of duplicate events in the JSON list. The powerpc JSON event list contains some event with same event name, but different event code. They are: - PM_INST_FROM_L3MISS (Present in datasource and frontend) - PM_MRK_DATA_FROM_L2MISS (Present in datasource and marked) - PM_MRK_INST_FROM_L3MISS (Present in datasource and marked) - PM_MRK_DATA_FROM_L3MISS (Present in datasource and marked) pmu_events_table__num_events() uses the value from table_pmu->num_entries which includes duplicate events as well. This causes issue during "perf list" and results in a segmentation fault. Since both event codes are valid, append _DSRC to the Data Source events (datasource.json), so that they would have a unique name. Also add PM_DATA_FROM_L2MISS_DSRC and PM_DATA_FROM_L3MISS_DSRC events. With the fix, 'perf list' works as expected. Fixes: `fc14358075` ("perf vendor events power10: Update JSON/events") Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Disha Goel <disgoel@linux.ibm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: linuxppc-dev@lists.ozlabs.org Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20231123160110.94090-1-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-12-05 15:48:52 -03:00
Ian Rogers	7d723ef83b	perf test: Add basic 'perf list --json" test Test that JSON output produces valid JSON. Reviewed-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20231129213428.2227448-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-12-05 15:48:52 -03:00
Ian Rogers	8226e4a3b3	perf test: Use common python setup library Avoid replicated logic by having a common library to set the PYTHON environment variable. Reviewed-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20231129213428.2227448-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-12-05 15:48:52 -03:00
Ian Rogers	b809fc656e	perf build: Shellcheck support for OUTPUT directory Migrate Makefile.tests to Build so that variables like rule_mkdir are defined via Makefile.build (needed so the output directory can be created). This requires SHELLCHECK being exported and the clean rule tweaking to remove the files in find. Change find "-perm -o=x" as it was failing on my Debian based Linux kernel tree, switch to using "-executable". Adding a filename prefix of "." to the shellcheck log files is a pain and error prone in make, remove this prefix and just add the shellcheck log files to .gitignore. Fix the command echo so that running the test is displayed. Fixes: `1638b11ef8` ("perf tools: Add perf binary dependent rule for shellcheck log in Makefile.perf") Reviewed-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20231129213428.2227448-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-12-05 15:46:43 -03:00
Ilkka Koskinen	16438b652b	perf vendor events arm64 AmpereOneX: Add core PMU events and metrics Add JSON files for AmpereOneX core PMU events and metrics. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20231201021550.1109196-4-ilkka@os.amperecomputing.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-12-05 15:46:43 -03:00
Ilkka Koskinen	10a149e4b4	perf vendor events arm64 AmpereOne: Rename BPU_FLUSH_MEM_FAULT to GPC_FLUSH_MEM_FAULT The documentation wrongly called the event as BPU_FLUSH_MEM_FAULT and now has been fixed. Correct the name in the perf tool as well. Fixes: `a9650b7f6f` ("perf vendor events arm64: Add AmpereOne core PMU events") Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: linux-arm-kernel@lists.infradead.org Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20231201021550.1109196-3-ilkka@os.amperecomputing.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-12-05 15:46:43 -03:00
Ilkka Koskinen	90fe70d4e2	perf vendor events arm64: AmpereOne: Add missing DefaultMetricgroupName fields AmpereOne metrics were missing DefaultMetricgroupName from metrics with "Default" in group name resulting perf to segfault. Add the missing field to address the issue. Fixes: `59faeaf80d` ("perf vendor events arm64: Fix for AmpereOne metrics") Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Reviewed-by: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: John Garry <john.g.garry@oracle.com> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20231201021550.1109196-2-ilkka@os.amperecomputing.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-12-05 10:16:40 -08:00
Ian Rogers	e2b005d6ec	perf metrics: Avoid segv if default metricgroup isn't set A metric is default by having "Default" within its groups. The default metricgroup name needn't be set and this can result in segv in default_metricgroup_cmp and perf_stat__print_shadow_stats_metricgroup that assume it has a value when there is a Default metric group. To avoid the segv initialize the value to "". Fixes: `1c0e47956a` ("perf metrics: Sort the Default metricgroup") Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-and-tested-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: James Clark <james.clark@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: John Garry <john.g.garry@oracle.com> Cc: stable@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20231204182330.654255-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-12-05 10:15:10 -08:00
Veronika Molnarova	28b01743ca	perf test record user-regs: Fix mask for vg register The 'vg' register for arm64 shows up in --user_regs as available when masking the variable AT_HWCAP with 1 << 22 returns '1' as done in perf_regs.c. However, in subtests for support of SVE, the check for the 'vg' register is done by masking the variable AT_HWCAP with the value 0x200000 which is equals to 1 << 21 instead of 1 << 22. This results in inconsistencies on certain systems where the test expects that the 'vg' register is not operational when it is, and vice-versa. During the testing on a machine that the test expected not to have the 'vg' register available, 'perf record' with the option --user-regs showed records for the 'vg' register together with all of the others, which means that the mask for the subtest of perf_event_attr is off by one. Change the value of the mask from 0x200000 to 0x400000 to correct it. Fixes: `9440ebdc33` ("perf test arm64: Add attr tests for new VG register") Reviewed-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Link: https://lore.kernel.org/r/20231201194617.13012-1-vmolnaro@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-12-04 16:42:25 -03:00
Arnaldo Carvalho de Melo	4acef67646	perf env: Cache the arch specific strerrno function in perf_env__arch_strerrno() So that we don't have to go thru the series of strcmp(arch) calls for each id -> string translation. Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Laight <David.Laight@ACULAB.COM> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/20231201203046.486596-3-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-12-04 16:42:15 -03:00
Arnaldo Carvalho de Melo	54373b5d53	perf env: Introduce perf_env__arch_strerrno() That will cache the arch specific function translating error numbers to strings. Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Laight <David.Laight@ACULAB.COM> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/20231201203046.486596-2-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-12-04 16:42:09 -03:00
Arnaldo Carvalho de Melo	556bed5c6d	perf beauty: Don't use 'find ... -printf' as it isn't available in busybox Namhyung reported: I'm seeing a build error on my Alpine linux image which uses busybox + musl libc: In file included from trace/beauty/arch_errno_names.c:1, from builtin-trace.c:899: /build/trace/beauty/generated/arch_errno_name_array.c: In function 'arch_syscalls__strerrno': /build/trace/beauty/generated/arch_errno_name_array.c:142:49: error: unused parameter 'arch' [-Werror=unused-parameter] 142 \| const char arch_syscalls__strerrno(const char arch, int err) It looks like busybox find command doesn't have -printf option find: unrecognized: -printf , Yesterday 9:16 PM , BusyBox v1.36.1 (2023-07-27 17:12:24 UTC) multi-call binary. Usage: find [-HL] [PATH]... [OPTIONS] [ACTIONS] Search for files and perform actions on them. First failed action stops processing of current file. Defaults: PATH is current directory, action is '-print' So just remove it and pipe find's entry to a basename loop to produce the same result. Then use an alternative loop that relies on the shell to avoid needless forks and execs. The discussion about it generated the impetus to stop doing strcmps to find the right table at each errno to string translation but instead do this just once and then use a function pointer to the right arch specific table. Suggested-by: David Laight <David.Laight@ACULAB.COM> Reported-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Thomas Richter <tmricht@linux.vnet.ibm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-12-03 21:25:44 -03:00
Nick Forrington	072b6ad7ca	perf docs: Fix man page formatting for 'perf lock' This makes "CONTENTION" a top level section (rather than a subsection of "INFO"). Fixes: `79079f21f5` ("perf lock: Add -k and -F options to 'contention' subcommand") Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Nick Forrington <nick.forrington@arm.com> Tested-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20231102161117.49533-1-nick.forrington@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-12-03 21:25:44 -03:00
Likhitha Korrapati	72a2a0a494	perf test record+probe_libc_inet_pton: Fix call chain match on powerpc The perf test "probe libc's inet_pton & backtrace it with ping" fails on powerpc as below: # perf test -v "probe libc's inet_pton & backtrace it with ping" 85: probe libc's inet_pton & backtrace it with ping : --- start --- test child forked, pid 96028 ping 96056 [002] 127271.101961: probe_libc:inet_pton: (7fffa1779a60) 7fffa1779a60 __GI___inet_pton+0x0 (/usr/lib64/glibc-hwcaps/power10/libc.so.6) 7fffa172a73c getaddrinfo+0x121c (/usr/lib64/glibc-hwcaps/power10/libc.so.6) FAIL: expected backtrace entry "gaih_inet.\+0x[[:xdigit:]]+[[:space:]]$/usr/lib64/glibc-hwcaps/power10/libc.so.6$$" got "7fffa172a73c getaddrinfo+0x121c (/usr/lib64/glibc-hwcaps/power10/libc.so.6)" test child finished with -1 ---- end ---- probe libc's inet_pton & backtrace it with ping: FAILED! This test installs a probe on libc's inet_pton function, which will use uprobes and then uses perf trace on a ping to localhost. It gets 3 levels deep backtrace and checks whether it is what we expected or not. The test started failing from RHEL 9.4 where as it works in previous distro version (RHEL 9.2). Test expects gaih_inet function to be part of backtrace. But in the glibc version (2.34-86) which is part of distro where it fails, this function is missing and hence the test is failing. From nm and ping command output we can confirm that gaih_inet function is not present in the expected backtrace for glibc version glibc-2.34-86 [root@xxx perf]# nm /usr/lib64/glibc-hwcaps/power10/libc.so.6 \| grep gaih_inet 00000000001273e0 t gaih_inet_serv 00000000001cd8d8 r gaih_inet_typeproto [root@xxx perf]# perf script -i /tmp/perf.data.6E8 ping 104048 [000] 128582.508976: probe_libc:inet_pton: (7fff83779a60) 7fff83779a60 __GI___inet_pton+0x0 (/usr/lib64/glibc-hwcaps/power10/libc.so.6) 7fff8372a73c getaddrinfo+0x121c (/usr/lib64/glibc-hwcaps/power10/libc.so.6) 11dc73534 [unknown] (/usr/bin/ping) 7fff8362a8c4 __libc_start_call_main+0x84 (/usr/lib64/glibc-hwcaps/power10/libc.so.6) FAIL: expected backtrace entry "gaih_inet.\+0x[[:xdigit:]]+[[:space:]]$/usr/lib64/glibc-hwcaps/power10/libc.so.6$$" got "7fff9d52a73c getaddrinfo+0x121c (/usr/lib64/glibc-hwcaps/power10/libc.so.6)" With version glibc-2.34-60 gaih_inet function is present as part of the expected backtrace. So we cannot just remove the gaih_inet function from the backtrace. [root@xxx perf]# nm /usr/lib64/glibc-hwcaps/power10/libc.so.6 \| grep gaih_inet 0000000000130490 t gaih_inet.constprop.0 000000000012e830 t gaih_inet_serv 00000000001d45e4 r gaih_inet_typeproto [root@xxx perf]# ./perf script -i /tmp/perf.data.b6S ping 67906 [000] 22699.591699: probe_libc:inet_pton_3: (7fffbdd80820) 7fffbdd80820 __GI___inet_pton+0x0 (/usr/lib64/glibc-hwcaps/power10/libc.so.6) 7fffbdd31160 gaih_inet.constprop.0+0xcd0 (/usr/lib64/glibc-hwcaps/power10/libc.so.6) 7fffbdd31c7c getaddrinfo+0x14c (/usr/lib64/glibc-hwcaps/power10/libc.so.6) 1140d3558 [unknown] (/usr/bin/ping) This patch solves this issue by doing a conditional skip. If there is a gaih_inet function present in the libc then it will be added to the expected backtrace else the function will be skipped from being added to the expected backtrace. Output with the patch [root@xxx perf]# ./perf test -v "probe libc's inet_pton & backtrace it with ping" 83: probe libc's inet_pton & backtrace it with ping : --- start --- test child forked, pid 102662 ping 102692 [000] 127935.549973: probe_libc:inet_pton: (7fff93379a60) 7fff93379a60 __GI___inet_pton+0x0 (/usr/lib64/glibc-hwcaps/power10/libc.so.6) 7fff9332a73c getaddrinfo+0x121c (/usr/lib64/glibc-hwcaps/power10/libc.so.6) 11ef03534 [unknown] (/usr/bin/ping) test child finished with 0 ---- end ---- probe libc's inet_pton & backtrace it with ping: Ok Reported-by: Disha Goel <disgoel@linux.ibm.com> Reviewed-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Likhitha Korrapati <likhitha@linux.ibm.com> Tested-by: Disha Goel <disgoel@linux.ibm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20231126070914.175332-1-likhitha@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-29 17:59:36 -03:00
Arnaldo Carvalho de Melo	650e0bde43	perf tests sigtrap: Skip if running on a kernel with sleepable spinlocks There are issues as reported that need some more investigation on the RT kernel front, till that is addressed, skip this test. This test is already skipped for multiple hardware architectures where the tested kernel feature is not supported. Acked-by: Marco Elver <elver@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Clark Williams <williams@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Kate Carcia <kcarcia@redhat.com> Cc: Marco Elver <elver@google.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/e368f2c848d77fbc8d259f44e2055fe469c219cf.camel@gmx.de/ Link: https://lore.kernel.org/r/20231129154718.326330-3-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-29 17:49:24 -03:00
Arnaldo Carvalho de Melo	a472ee42e6	perf test sigtrap: Generalize the BTF routine to reuse it in this test Move the part that loads the BTF info to a "btf__available()" that will lazy load the BTF info so that if we need it for some other test, which we will in the following cset, we can reuse it. At some point this will move from this specific 'perf test' entry to be used in other parts of perf, do it when needed. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Clark Williams <williams@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kate Carcia <kcarcia@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20231129154718.326330-2-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-29 17:49:17 -03:00
Ian Rogers	5940a20a18	perf mmap: Lazily initialize zstd streams to save memory when not using it Zstd streams create dictionaries that can require significant RAM, especially when there is one per-CPU. Tools like 'perf record' won't use the streams without the -z option, and so the creation of the streams is pure overhead. Switch to creating the streams on first use. Committer notes: ssize_t comes from sys/types.h, size_t from stddef.h. This worked on glibc as stdlib.h includes both, but not on musl libc. So do what 'man size_t' says and include sys/types.h and stddef.h instead of stdlib.h Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: German Gomez <german.gomez@arm.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Li Dong <lidong@vivo.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Ming Wang <wangming01@loongson.cn> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Terrell <terrelln@fb.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Vincent Whitchurch <vincent.whitchurch@axis.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20231102175735.2272696-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-28 14:25:06 -03:00
Namhyung Kim	d60469d7c0	perf dwarf-aux: Add die_find_variable_by_addr() The die_find_variable_by_addr() is to find a variables in the given DIE using given (PC-relative) address. Global variables will have a location expression with DW_OP_addr which has an address so can simply compare it with the address. <1><143a7>: Abbrev Number: 2 (DW_TAG_variable) <143a8> DW_AT_name : loops_per_jiffy <143ac> DW_AT_type : <0x1cca> <143b0> DW_AT_external : 1 <143b0> DW_AT_decl_file : 193 <143b1> DW_AT_decl_line : 213 <143b2> DW_AT_location : 9 byte block: 3 b0 46 41 82 ff ff ff ff (DW_OP_addr: ffffffff824146b0) Note that the type-offset should be calculated from the base address of the global variable. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: linux-toolchains@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Link: https://lore.kernel.org/r/20231110000012.3538610-33-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-28 14:14:53 -03:00
Yang Jihong	72108c0b9c	perf tools: Add --debug-file option to redirect debug output Currently, debug messages is output to stderr, add --debug-file option to support redirection to a specified file. Some test scenarios: # perf --list-opts --help --version --exec-path --html-path --paginate --no-pager --debugfs-dir --buildid-dir --list-cmds --list-opts --debug --debug-file # perf --debug-file No path given for --debug-file. Usage: perf [--version] [--help] [OPTIONS] COMMAND [ARGS] # perf --debug-file /sys/perf.log record -v true Open debug file '/sys/perf.log' failed: Permission denied Usage: perf [--version] [--help] [OPTIONS] COMMAND [ARGS] # perf --debug-file /tmp/perf.log record -v true [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.013 MB perf.data (26 samples) ] # cat /tmp/perf.log DEBUGINFOD_URLS= Using CPUID GenuineIntel-6-3E-4 nr_cblocks: 0 affinity: SYS mmap flush: 1 comp level: 0 mmap size 528384B Control descriptor is not initialized mmap size 528384B Looking at the vmlinux_path (8 entries long) Using /proc/kcore for kernel data Using /proc/kallsyms for symbols symbol:unmap_start file:(null) line:0 offset:0 return:0 lazy:(null) symbol:unmap_complete file:(null) line:0 offset:0 return:0 lazy:(null) symbol:map_start file:(null) line:0 offset:0 return:0 lazy:(null) symbol:map_complete file:(null) line:0 offset:0 return:0 lazy:(null) symbol:reloc_start file:(null) line:0 offset:0 return:0 lazy:(null) symbol:reloc_complete file:(null) line:0 offset:0 return:0 lazy:(null) symbol:init_start file:(null) line:0 offset:0 return:0 lazy:(null) symbol:init_complete file:(null) line:0 offset:0 return:0 lazy:(null) symbol:lll_lock_wait_private file:(null) line:0 offset:0 return:0 lazy:(null) symbol:lll_lock_wait file:(null) line:0 offset:0 return:0 lazy:(null) symbol:setjmp file:(null) line:0 offset:0 return:0 lazy:(null) symbol:longjmp file:(null) line:0 offset:0 return:0 lazy:(null) symbol:longjmp_target file:(null) line:0 offset:0 return:0 lazy:(null) failed to write feature HYBRID_TOPOLOGY Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20231031105523.1472558-1-yangjihong1@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-28 14:14:53 -03:00
Namhyung Kim	08973307d2	perf annotate: Check if operand has multiple regs It needs to check all possible information in an instruction. Let's add a field indicating if the operand has multiple registers. I'll be used to search type information like in an array access on x86 like: mov 0x10(%rax,%rbx,8), %rcx ------------- here Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: linux-toolchains@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Link: https://lore.kernel.org/r/20231012035111.676789-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-27 16:05:19 -03:00
James Clark	ffa96259ca	perf test: Use existing config value for objdump path There is already an existing config value for changing the objdump path, so instead of having two values that do the same thing, make 'perf test' use annotate.objdump as well. Signed-off-by: James Clark <james.clark@arm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Fangrui Song <maskray@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/ZU5Cx4LTrB5q0sIG@kernel.org Link: https://lore.kernel.org/r/20231113102327.695386-1-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-27 15:56:34 -03:00
Inochi Amaoto	7340c6df49	perf vendor events riscv: add T-HEAD C9xx JSON file Add JSON file of T-HEAD C9xx series events. The event idx (raw value) is summary as following: event id range \| support cpu 0x01 - 0x2a \| c906,c910,c920 The event ids are based on the public document of T-HEAD and cover the c900 series. These events are the max that c900 series support. Since T-HEAD let manufacturers decide whether events are usable, the final support of the perf events is determined by the pmu node of the soc dtb. Signed-off-by: Inochi Amaoto <inochiama@outlook.com> Tested-by: Guo Ren <guoren@kernel.org> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Chen Wang <unicorn_wang@outlook.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Jisheng Zhang <jszhang@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wei Fu <wefu@redhat.com> Cc: linux-riscv@lists.infradead.org Link: https://lore.kernel.org/r/IA1PR20MB495325FCF603BAA841E29281BBBAA@IA1PR20MB4953.namprd20.prod.outlook.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-27 15:53:33 -03:00
Ian Rogers	19dd49c933	perf vendor events: Add skx, clx, icx and spr upi bandwidth metric Add upi_data_receive_bw metric for skylakex, cascadelakex, icelakex and sapphirerapids. The metric was added to perfmon metrics in: https://github.com/intel/perfmon/pull/119 Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20231109232732.2973015-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-27 15:43:09 -03:00
Adrian Hunter	124bf6360a	perf tests: Skip data symbol test if buf1 symbol is missing perf data symbol test depends on finding symbol buf1 in perf, and fails if perf has been stripped and no debug object is available. In that case, skip the test instead. Example: Before: $ strip tools/perf/perf $ tools/perf/perf buildid-cache -p `realpath tools/perf/perf` $ tools/perf/perf test -v 'data symbol' 113: Test data symbol : --- start --- test child forked, pid 125646 Recording workload... [ perf record: Woken up 3 times to write data ] [ perf record: Captured and wrote 0.577 MB /tmp/__perf_test.perf.data.Jhbdp (7794 samples) ] Cleaning up files... test child finished with -1 ---- end ---- Test data symbol: FAILED! After: $ tools/perf/perf test -v 'data symbol' 113: Test data symbol : --- start --- test child forked, pid 125747 perf does not have symbol 'buf1' perf is missing symbols - skipping test test child finished with -2 ---- end ---- Test data symbol: Skip Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20231123075848.9652-9-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-27 15:40:28 -03:00
Adrian Hunter	3b24b15cf6	perf tests: Make data symbol test wait for perf to start The perf data symbol test waits 1 second for perf to run and collect data, which may be too little if perf takes a long time to start up, which has been noticed on systems with many CPUs. Use existing wait_for_perf_to_start helper to wait for perf to start. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20231123075848.9652-8-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-27 15:40:25 -03:00
Adrian Hunter	fcfb5a6189	perf tests: Skip branch stack sampling test if brstack_bench symbol is missing The test "Check branch stack sampling" depends on finding symbol brstack_bench (and several others) in perf, and fails if perf has been stripped and no debug object is available. In that case, skip the test instead. Example: Before: $ strip tools/perf/perf $ tools/perf/perf buildid-cache -p `realpath tools/perf/perf` $ tools/perf/perf test -v 'branch stack sampling' 112: Check branch stack sampling : --- start --- test child forked, pid 123741 Testing user branch stack sampling + grep -E -m1 ^brstack_bench\+[^ ]/brstack_foo\+[^ ]/IND_CALL/.*$ /tmp/__perf_test.program.5Dz1U/perf.script + cleanup + rm -rf /tmp/__perf_test.program.5Dz1U test child finished with -1 ---- end ---- Check branch stack sampling: FAILED! After: $ tools/perf/perf test -v 'branch stack sampling' 112: Check branch stack sampling : --- start --- test child forked, pid 125157 perf does not have symbol 'brstack_bench' perf is missing symbols - skipping test test child finished with -2 ---- end ---- Check branch stack sampling: Skip Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20231123075848.9652-7-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-27 15:40:22 -03:00
Adrian Hunter	fc1de29a8b	perf tests: Skip Arm64 callgraphs test if leafloop symbol is missing The test "Check Arm64 callgraphs are complete in fp mode" depends on finding symbol leafloop in perf, and fails if perf has been stripped and no debug object is available. In that case, skip the test instead. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20231123075848.9652-6-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-27 15:40:20 -03:00
Adrian Hunter	3c489dbe69	perf tests: Skip record test if test_loop symbol is missing perf record test depends on finding symbol test_loop in perf, and fails if perf has been stripped and no debug object is available. In that case, skip the test instead. Example: Note, building with perl support adds option -Wl,-E which causes the linker to add all (global) symbols to the dynamic symbol table. So the test_loop symbol, being global, does not get stripped unless NO_LIBPERL=1 Before: $ make NO_LIBPERL=1 -C tools/perf >/dev/null 2>&1 $ strip tools/perf/perf $ tools/perf/perf buildid-cache -p `realpath tools/perf/perf` $ tools/perf/perf test -v 'record tests' 91: perf record tests : --- start --- test child forked, pid 118750 Basic --per-thread mode test Per-thread record [Failed missing output] Register capture test Register capture test [Success] Basic --system-wide mode test System-wide record [Skipped not supported] Basic target workload test Workload record [Failed missing output] test child finished with -1 ---- end ---- perf record tests: FAILED! After: $ tools/perf/perf test -v 'record tests' 91: perf record tests : --- start --- test child forked, pid 120025 perf does not have symbol 'test_loop' perf is missing symbols - skipping test test child finished with -2 ---- end ---- perf record tests: Skip Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20231123075848.9652-5-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-27 15:40:16 -03:00
Adrian Hunter	c9526a7350	perf tests: Skip pipe test if noploop symbol is missing perf pipe recording and injection test depends on finding symbol noploop in perf, and fails if perf has been stripped and no debug object is available. In that case, skip the test instead. Example: Before: $ strip tools/perf/perf $ tools/perf/perf buildid-cache -p `realpath tools/perf/perf` $ tools/perf/perf test -v pipe 86: perf pipe recording and injection test : --- start --- test child forked, pid 47734 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.000 MB - ] 47741 47741 -1 \|perf [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.000 MB - ] cannot find noploop function in pipe #1 test child finished with -1 ---- end ---- perf pipe recording and injection test: FAILED! After: $ tools/perf/perf test -v pipe 86: perf pipe recording and injection test : --- start --- test child forked, pid 48996 perf does not have symbol 'noploop' perf is missing symbols - skipping test test child finished with -2 ---- end ---- perf pipe recording and injection test: Skip Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20231123075848.9652-4-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-27 15:40:00 -03:00
Adrian Hunter	96ba5999e8	perf tests lib: Add perf_has_symbol.sh Some shell tests depend on finding symbols for perf itself, and fail if perf has been stripped and no debug object is available. Add helper functions to check if perf has a needed symbol. This is preparation for amending the tests themselves to be skipped if a needed symbol is not found. The functions make use of the "Symbols" test which reads and checks symbols from a dso, perf itself by default. Note the "Symbols" test will find symbols using the same method as other perf tests, including, for example, looking in the buildid cache. An alternative would be to prevent the needed symbols from being stripped, which seems to work with gcc's externally_visible attribute, but that attribute is not supported by clang. Another alternative would be to use option -Wl,-E (which is already used when perf is built with perl support) which causes the linker to add all (global) symbols to the dynamic symbol table. Then the required symbols need only be made global in scope to avoid being strippable. However that goes beyond what is needed. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20231123075848.9652-3-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-27 15:39:55 -03:00
Adrian Hunter	70df07838f	perf header: Fix segfault on build_mem_topology() error path Do not increase the node count unless a node has been successfully read, because it can lead to a segfault if an error occurs. For example, if perf exceeds the open file limit in memory_node__read(), which, on a test system, could be made to happen by setting the file limit to exactly 32: Before: $ ulimit -n 32 $ perf mem record --all-user -- sleep 1 [ perf record: Woken up 1 times to write data ] failed: can't open memory sysfs data perf: Segmentation fault Obtained 14 stack frames. perf(sighandler_dump_stack+0x48) [0x55f4b1f59558] /lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7f4ba1c42520] /lib/x86_64-linux-gnu/libc.so.6(free+0x1e) [0x7f4ba1ca53fe] perf(+0x178ff4) [0x55f4b1f48ff4] perf(+0x179a70) [0x55f4b1f49a70] perf(+0x17ef5d) [0x55f4b1f4ef5d] perf(+0x85c0b) [0x55f4b1e55c0b] perf(cmd_record+0xe1d) [0x55f4b1e5920d] perf(cmd_mem+0xc96) [0x55f4b1e80e56] perf(+0x130460) [0x55f4b1f00460] perf(main+0x689) [0x55f4b1e427d9] /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f4ba1c29d90] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7f4ba1c29e40] perf(_start+0x25) [0x55f4b1e42a25] Segmentation fault (core dumped) $ After: $ ulimit -n 32 $ perf mem record --all-user -- sleep 1 [ perf record: Woken up 1 times to write data ] failed: can't open memory sysfs data [ perf record: Captured and wrote 0.005 MB perf.data (11 samples) ] $ Fixes: `f8e502b9d1` ("perf header: Ensure bitmaps are freed") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20231123075848.9652-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-27 15:38:45 -03:00
Thomas Richter	8aa1e6e29a	perf report: Remove warning on missing raw data for s390 Command # ./perf report -i /tmp/111 -D > /dev/null emits an error message when a sample for event CRYPTO_ALL in the perf.data file does not contain any raw data. This is ok. Do not trigger this warning when the sample in the perf.data files does not contain any raw data at all. Check for availability of raw data for all events and return if none is available. Output before: # ./perf report -i /tmp/111 -D > /dev/null Invalid CRYPTO_ALL raw data encountered Invalid CRYPTO_ALL raw data encountered Invalid CRYPTO_ALL raw data encountered # Output after: # ./perf report -i /tmp/111 -D > /dev/null # Fixes: `b539deafba` ("perf report: Add s390 raw data interpretation for PAI counters") Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Link: https://lore.kernel.org/r/20231122092703.3163191-1-tmricht@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-27 15:38:37 -03:00
Athira Rajeev	1638b11ef8	perf tools: Add perf binary dependent rule for shellcheck log in Makefile.perf Add rule in new Makefile "tests/Makefile.tests" for running shellcheck on shell test scripts. This automates below shellcheck into the build. $ for F in $(find tests/shell/ -perm -o=x -name '.sh'); do shellcheck -S warning $F; done Condition for shellcheck is added in Makefile.perf to avoid build breakage in the absence of shellcheck binary. Update Makefile.perf to contain new rule for "SHELLCHECK_TEST" which is for making shellcheck test as a dependency on perf binary. Added "tests/Makefile.tests" to run shellcheck on shellscripts in tests/shell. The make rule "SHLLCHECK_RUN" ensures that, every time during make, shellcheck will be run only on modified files during subsequent invocations. By this, if any newly added shell scripts or fixes in existing scripts breaks coding/formatting style, it will get captured during the perf build. Example build failure by modifying probe_vfs_getname.sh in tests/shell: In tests/shell/probe_vfs_getname.sh line 8: . $(dirname $0)/lib/probe.sh ^-----------^ SC2046 (warning): Quote this to prevent word splitting. For more information: https://www.shellcheck.net/wiki/SC2046 -- Quote this to prevent word splitt... make[3]: [/root/athira/perf-tools-next/tools/perf/tests/Makefile.tests:18: tests/shell/.probe_vfs_getname.sh.shellcheck_log] Error 1 make[2]: * [Makefile.perf:686: SHELLCHECK_TEST] Error 2 make[2]: * Waiting for unfinished jobs.... make[1]: * [Makefile.perf:244: sub-make] Error 2 make: * [Makefile:70: all] Error 2 Here, like other files which gets created during compilation (ex: .builtin-bench.o.cmd or .perf.o.cmd ), create .shellcheck_log also as a hidden file. Example: tests/shell/.probe_vfs_getname.sh.shellcheck_log shellcheck is re-run if any of the script gets modified based on its dependency of this log file. After this, for testing, changed "tests/shell/trace+probe_vfs_getname.sh" to break shellcheck format. In the next make run, it is also captured: In tests/shell/probe_vfs_getname.sh line 8: . $(dirname $0)/lib/probe.sh ^-----------^ SC2046 (warning): Quote this to prevent word splitting. For more information: https://www.shellcheck.net/wiki/SC2046 -- Quote this to prevent word splitt... make[3]: * [/root/athira/perf-tools-next/tools/perf/tests/Makefile.tests:18: tests/shell/.probe_vfs_getname.sh.shellcheck_log] Error 1 make[3]: * Waiting for unfinished jobs.... In tests/shell/trace+probe_vfs_getname.sh line 14: . $(dirname $0)/lib/probe.sh ^-----------^ SC2046 (warning): Quote this to prevent word splitting. For more information: https://www.shellcheck.net/wiki/SC2046 -- Quote this to prevent word splitt... make[3]: * [/root/athira/perf-tools-next/tools/perf/tests/Makefile.tests:18: tests/shell/.trace+probe_vfs_getname.sh.shellcheck_log] Error 1 make[2]: * [Makefile.perf:686: SHELLCHECK_TEST] Error 2 make[2]: * Waiting for unfinished jobs.... make[1]: * [Makefile.perf:244: sub-make] Error 2 make: * [Makefile:70: all] Error 2 Failure log can be found in the stdout of make itself. This is reported at build time. To be able to go ahead with the build or disable shellcheck even though it is known that some test is broken, add a "NO_SHELLCHECK" option. Example: make NO_SHELLCHECK=1 INSTALL libsubcmd_headers INSTALL libsymbol_headers INSTALL libapi_headers INSTALL libperf_headers INSTALL libbpf_headers LINK perf Note: This is tested on RHEL and also SLES. Use below check: "$(shell which shellcheck 2> /dev/null)" to look for presence of shellcheck binary. The approach "shell command -v" is not used here. In some of the distros(RHEL), command is available as executable file (/usr/bin/command). But in some distros(SLES), it is a shell builtin and not available as executable file. Committer testing: $ type shellcheck shellcheck is hashed (/usr/bin/shellcheck) $ rpm -qf /usr/bin/shellcheck ShellCheck-0.9.0-2.fc38.x86_64 $ $ alias m $ git diff diff --git a/tools/perf/tests/shell/probe_vfs_getname.sh b/tools/perf/tests/shell/probe_vfs_getname.sh index 554e12e83c55fd56..dbc14634678e2bf6 100755 --- a/tools/perf/tests/shell/probe_vfs_getname.sh +++ b/tools/perf/tests/shell/probe_vfs_getname.sh @@ -5,7 +5,7 @@ # Arnaldo Carvalho de Melo <acme@kernel.org>, 2017 # shellcheck source=lib/probe.sh -. "$(dirname $0)"/lib/probe.sh +. $(dirname $0)/lib/probe.sh skip_if_no_perf_probe \|\| exit 2 alias m='rm -rf ~/libexec/perf-core/ ; make -k CORESIGHT=1 O=/tmp/build/$(basename $PWD) -C tools/perf install-bin && perf test python' $ m make: Entering directory '/home/acme/git/perf-tools-next/tools/perf' BUILD: Doing 'make -j32' parallel build <SNIP> INSTALL libbpf_headers In tests/shell/probe_vfs_getname.sh line 8: . $(dirname $0)/lib/probe.sh ^-----------^ SC2046 (warning): Quote this to prevent word splitting. For more information: https://www.shellcheck.net/wiki/SC2046 -- Quote this to prevent word splitt... make[3]: * [/home/acme/git/perf-tools-next/tools/perf/tests/Makefile.tests:18: tests/shell/.probe_vfs_getname.sh.shellcheck_log] Error 1 make[2]: * [Makefile.perf:686: SHELLCHECK_TEST] Error 2 make[2]: * Waiting for unfinished jobs.... make[1]: * [Makefile.perf:244: sub-make] Error 2 make: *** [Makefile:113: install-bin] Error 2 make: Leaving directory '/home/acme/git/perf-tools-next/tools/perf' $ Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20231123160232.94253-1-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-27 11:48:33 -03:00
Ji Sheng Teoh	5ebe2f4bf0	perf vendor events riscv: Add StarFive Dubhe-90 JSON file Similar to StarFive's Dubhe-80, Dubhe-90 supports raw event id 0x00 - 0x22. Reuse Dubhe-80 firmware and common json file. The raw events are enabled through PMU node of DT binding. Besides raw event, add standard RISC-V firmware events to support monitoring of firmware event. Example of PMU DT node: pmu { compatible = "riscv,pmu"; riscv,raw-event-to-mhpmcounters = /* Event ID 1-31 / <0x00 0x00 0xFFFFFFFF 0xFFFFFFE0 0x00007FF8>, / Event ID 32-33 / <0x00 0x20 0xFFFFFFFF 0xFFFFFFFE 0x00007FF8>, / Event ID 34 */ <0x00 0x22 0xFFFFFFFF 0xFFFFFF22 0x00007FF8>; }; 'perf stat' output: [root@user]# perf stat -a \ -e access_mmu_stlb \ -e miss_mmu_stlb \ -e access_mmu_pte_c \ -e rob_flush \ -e btb_prediction_miss \ -e itlb_miss \ -e sync_del_fetch_g \ -e icache_miss \ -e bpu_br_retire \ -e bpu_br_miss \ -e ret_ins_retire \ -e ret_ins_miss \ -- openssl speed rsa2048 Doing 2048 bits private rsa's for 10s: 39 2048 bits private RSA's in 10.03s Doing 2048 bits public rsa's for 10s: 1469 2048 bits public RSA's in 9.47s version: 3.0.10 built on: Tue Aug 1 13:47:24 2023 UTC options: bn(64,64) CPUINFO: N/A sign verify sign/s verify/s rsa 2048 bits 0.257179s 0.006447s 3.9 155.1 Performance counter stats for 'system wide': `3112882` access_mmu_stlb 10550 miss_mmu_stlb 18251 access_mmu_pte_c 274765 rob_flush 22470560 btb_prediction_miss 3035839 itlb_miss 643549060 sync_del_fetch_g 133013 icache_miss 62982796 bpu_br_retire 287548 bpu_br_miss 8935910 ret_ins_retire 8308 ret_ins_miss 20.656182600 seconds time elapsed Reviewed-by: Ley Foon Tan <leyfoon.tan@starfivetech.com> Signed-off-by: Ji Sheng Teoh <jisheng.teoh@starfivetech.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nikita Shubin <n.shubin@yadro.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-riscv@lists.infradead.org Link: https://lore.kernel.org/r/20231122030908.2981502-1-jisheng.teoh@starfivetech.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-27 11:38:32 -03:00
zhujun2	581ff5b66c	perf tests coresight: Remove unused variables These variables are never referenced in the code, just remove them. Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: zhujun2 <zhujun2@cmss.chinamobile.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: coresight@lists.linaro.org Link: https://lore.kernel.org/r/20231115064255.11057-1-zhujun2@cmss.chinamobile.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-27 11:35:43 -03:00
zhaimingbing	4a18ab4678	perf lock: Fix a memory leak on an error path if a strdup-ed string is NULL,the allocated memory needs freeing. Signed-off-by: zhaimingbing <zhaimingbing@cmss.chinamobile.com> Acked-by: Ingo Molnar <mingo@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Li Dong <lidong@vivo.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20231124092657.10392-1-zhaimingbing@cmss.chinamobile.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-27 10:21:27 -03:00
Ian Rogers	a24d9d9dc0	perf parse-events: Make legacy events lower priority than sysfs/JSON The perf tool has previously made legacy events the priority so with or without a PMU the legacy event would be opened: $ perf stat -e cpu-cycles,cpu/cpu-cycles/ true Using CPUID GenuineIntel-6-8D-1 intel_pt default config: tsc,mtc,mtc_period=3,psb_period=3,pt,branch Attempting to add event pmu 'cpu' with 'cpu-cycles,' that may result in non-fatal errors After aliases, add event pmu 'cpu' with 'cpu-cycles,' that may result in non-fatal errors Control descriptor is not initialized ------------------------------------------------------------ perf_event_attr: type 0 (PERF_TYPE_HARDWARE) size 136 config 0 (PERF_COUNT_HW_CPU_CYCLES) sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED\|TOTAL_TIME_RUNNING disabled 1 inherit 1 enable_on_exec 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid 833967 cpu -1 group_fd -1 flags 0x8 = 3 ------------------------------------------------------------ perf_event_attr: type 0 (PERF_TYPE_HARDWARE) size 136 config 0 (PERF_COUNT_HW_CPU_CYCLES) sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED\|TOTAL_TIME_RUNNING disabled 1 inherit 1 enable_on_exec 1 exclude_guest 1 ------------------------------------------------------------ ... Fixes to make hybrid/BIG.little PMUs behave correctly, ie as core PMUs capable of opening legacy events on each, removing hard coded "cpu_core" and "cpu_atom" Intel PMU names, etc. caused a behavioral difference on Apple/ARM due to latent issues in the PMU driver reported in: https://lore.kernel.org/lkml/08f1f185-e259-4014-9ca4-6411d5c1bc65@marcan.st/ As part of that report Mark Rutland <mark.rutland@arm.com> requested that legacy events not be higher in priority when a PMU is specified reversing what has until this change been perf's default behavior. With this change the above becomes: $ perf stat -e cpu-cycles,cpu/cpu-cycles/ true Using CPUID GenuineIntel-6-8D-1 Attempt to add: cpu/cpu-cycles=0/ ..after resolving event: cpu/event=0x3c/ Control descriptor is not initialized ------------------------------------------------------------ perf_event_attr: type 0 (PERF_TYPE_HARDWARE) size 136 config 0 (PERF_COUNT_HW_CPU_CYCLES) sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED\|TOTAL_TIME_RUNNING disabled 1 inherit 1 enable_on_exec 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid 827628 cpu -1 group_fd -1 flags 0x8 = 3 ------------------------------------------------------------ perf_event_attr: type 4 (PERF_TYPE_RAW) size 136 config 0x3c sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED\|TOTAL_TIME_RUNNING disabled 1 inherit 1 enable_on_exec 1 exclude_guest 1 ------------------------------------------------------------ ... So the second event has become a raw event as /sys/devices/cpu/events/cpu-cycles exists. A fix was necessary to config_term_pmu in parse-events.c as check_alias expansion needs to happen after config_term_pmu, and config_term_pmu may need calling a second time because of this. config_term_pmu is updated to not use the legacy event when the PMU has such a named event (either from JSON or sysfs). The bulk of this change is updating all of the parse-events test expectations so that if a sysfs/JSON event exists for a PMU the test doesn't fail - a further sign, if it were needed, that the legacy event priority was a known and tested behavior of the perf tool. Reported-by: Hector Martin <marcan@marcan.st> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Hector Martin <marcan@marcan.st> Tested-by: Marc Zyngier <maz@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20231123042922.834425-1-irogers@google.com [ Initialize the 'alias_rewrote_terms' variable to false to address a clang warning ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-27 10:21:27 -03:00
Leo Yan	a4271827e6	perf cs-etm: Enable itrace option 'T' Prior to Armv8.4, the feature FEAT_TRF is not supported by Arm CPUs. Consequently, the sysfs node 'ts_source' will not be set as 1 by the CoreSight ETM driver. On the other hand, the perf tool relies on the 'ts_source' node to determine whether the kernel timestamp is traced. Since the 'ts_source' is not set for Arm CPUs prior to Armv8.4, platforms in this case cannot utilize the traced timestamp as the kernel time. This patch enables the 'T' itrace option, which forcibly utilizes the traced timestamp as the kernel time. If users are aware that their working platform's Arm CoreSight shares the same counter with the kernel time, they can specify 'T' option to decode the traced timestamp as the kernel time. An usage example is: # perf record -e cs_etm// -- test_program # perf script --itrace=i10ibT # perf report --itrace=i10ibT Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Leo Yan <leo.yan@linaro.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20231014074513.1668000-3-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-27 10:21:27 -03:00
Leo Yan	26218331f4	perf auxtrace: Add 'T' itrace option for timestamp trace An AUX trace can contain timestamp, but in some situations, the hardware trace module (e.g. Arm CoreSight) cannot decide the traced timestamp is the same source with CPU's time, thus the decoder can not use the timestamp trace for samples. This patch introduces 'T' itrace option. If users know the platforms they are working on have the same time counter with CPUs, users can use this new option to tell a decoder for using timestamp trace as kernel time. Signed-off-by: Leo Yan <leo.yan@linaro.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20231014074513.1668000-2-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-27 10:21:27 -03:00
Kan Liang	697579629f	perf test: Basic branch counter support Add a basic test for the branch counter feature. The test verifies that - The new filter can be successfully applied on the supported platforms. - The counter value can be outputted via the perf report -D Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Tested-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tinghao Zhang <tinghao.zhang@intel.com> Link: https://lore.kernel.org/r/20231107184020.1497571-1-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-27 10:21:27 -03:00
zhaimingbing	cd38d6b5fa	perf script perl: Fail check on dynamic allocation Return ENOMEM when dynamic allocation failed. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: zhaimingbing <zhaimingbing@cmss.chinamobile.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Li Dong <lidong@vivo.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20231120112356.8652-1-zhaimingbing@cmss.chinamobile.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-27 10:21:27 -03:00
Paran Lee	b457c52607	perf script python: Fail check on dynamic allocation Add PyList_New() Fail check in get_field_numeric_entry() function and dynamic allocation checking for set_regs_in_dict(), python_start_script(). Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: MichelleJin <shjy180909@gmail.com> Signed-off-by: Paran Lee <p4ranlee@gmail.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Austin Kim <austindh.kim@gmail.com> Cc: Honggyu Kim <honggyu.kp@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Li Dong <lidong@vivo.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20231120223218.9036-1-p4ranlee@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-27 10:21:27 -03:00
Nick Forrington	72b4ca7e99	perf test: Remove atomics from test_loop to avoid test failures The current use of atomics can lead to test failures, as tests (such as tests/shell/record.sh) search for samples with "test_loop" as the top-most stack frame, but find frames related to the atomic operation (e.g. __aarch64_ldadd4_relax). This change simply removes the "count" variable, as it is not necessary. Fixes: `1962ab6f6e` ("perf test workload thloop: Make count increments atomic") Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Nick Forrington <nick.forrington@arm.com> Acked-by: Leo Yan <leo.yan@linaro.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20231102162225.50028-1-nick.forrington@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-27 10:21:06 -03:00
Benjamin Gray	280b4e4a9e	perf tools: Address python 3.6 DeprecationWarning for string scapes Python 3.6 introduced a DeprecationWarning for invalid escape sequences. This is upgraded to a SyntaxWarning in Python 3.12, and will eventually be a syntax error. Fix these now to get ahead of it before it's an error. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Hartley Sweeten <hsweeten@visionengravers.com> Cc: Ian Abbott <abbotti@mev.co.uk> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jan Kiszka <jan.kiszka@siemens.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kieran Bingham <kbingham@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mykola Lysenko <mykolal@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Todd E Brandt <todd.e.brandt@linux.intel.com> Cc: Tom Rix <trix@redhat.com> Cc: linux-doc@vger.kernel.org Cc: linux-ia64@vger.kernel.org Cc: linux-kselftest@vger.kernel.org Cc: linux-pm@vger.kernel.org Cc: llvm@lists.linux.dev Link: https://lore.kernel.org/r/20230912060801.95533-6-bgray@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-23 10:56:09 -03:00
Oliver Upton	a29ee6aea7	perf build: Ensure sysreg-defs Makefile respects output dir Currently the sysreg-defs are written out to the source tree unconditionally, ignoring the specified output directory. Correct the build rule to emit the header to the output directory. Opportunistically reorganize the rules to avoid interleaving with the set of beauty make rules. Reported-by: Ian Rogers <irogers@google.com> Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20231121192956.919380-3-oliver.upton@linux.dev Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-11-22 11:17:53 -08:00
Oliver Upton	ef5c958090	tools perf: Add arm64 sysreg files to MANIFEST Ian pointed out that source tarballs are incomplete as of commit `e2bdd172e6` ("perf build: Generate arm64's sysreg-defs.h and add to include path"), since the source files needed from the kernel tree do not appear in the manifest. Add them. Reported-by: Ian Rogers <irogers@google.com> Fixes: `e2bdd172e6` ("perf build: Generate arm64's sysreg-defs.h and add to include path") Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20231121192956.919380-2-oliver.upton@linux.dev Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-11-22 11:17:53 -08:00
Namhyung Kim	027905fe5b	tools/perf: Update tools's copy of mips syscall table tldr; Just FYI, I'm carrying this on the perf tools tree. Full explanation: There used to be no copies, with tools/ code using kernel headers directly. From time to time tools/perf/ broke due to legitimate kernel hacking. At some point Linus complained about such direct usage. Then we adopted the current model. The way these headers are used in perf are not restricted to just including them to compile something. There are sometimes used in scripts that convert defines into string tables, etc, so some change may break one of these scripts, or new MSRs may use some different #define pattern, etc. E.g.: $ ls -1 tools/perf/trace/beauty/.sh \| head -5 tools/perf/trace/beauty/arch_errno_names.sh tools/perf/trace/beauty/drm_ioctl.sh tools/perf/trace/beauty/fadvise.sh tools/perf/trace/beauty/fsconfig.sh tools/perf/trace/beauty/fsmount.sh $ $ tools/perf/trace/beauty/fadvise.sh static const char fadvise_advices[] = { [0] = "NORMAL", [1] = "RANDOM", [2] = "SEQUENTIAL", [3] = "WILLNEED", [4] = "DONTNEED", [5] = "NOREUSE", }; $ The tools/perf/check-headers.sh script, part of the tools/ build process, points out changes in the original files. So its important not to touch the copies in tools/ when doing changes in the original kernel headers, that will be done later, when check-headers.sh inform about the change to the perf tools hackers. Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: linux-mips@vger.kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20231121225650.390246-14-namhyung@kernel.org	2023-11-22 10:57:47 -08:00
Namhyung Kim	d3968c974a	tools/perf: Update tools's copy of s390 syscall table tldr; Just FYI, I'm carrying this on the perf tools tree. Full explanation: There used to be no copies, with tools/ code using kernel headers directly. From time to time tools/perf/ broke due to legitimate kernel hacking. At some point Linus complained about such direct usage. Then we adopted the current model. The way these headers are used in perf are not restricted to just including them to compile something. There are sometimes used in scripts that convert defines into string tables, etc, so some change may break one of these scripts, or new MSRs may use some different #define pattern, etc. E.g.: $ ls -1 tools/perf/trace/beauty/.sh \| head -5 tools/perf/trace/beauty/arch_errno_names.sh tools/perf/trace/beauty/drm_ioctl.sh tools/perf/trace/beauty/fadvise.sh tools/perf/trace/beauty/fsconfig.sh tools/perf/trace/beauty/fsmount.sh $ $ tools/perf/trace/beauty/fadvise.sh static const char fadvise_advices[] = { [0] = "NORMAL", [1] = "RANDOM", [2] = "SEQUENTIAL", [3] = "WILLNEED", [4] = "DONTNEED", [5] = "NOREUSE", }; $ The tools/perf/check-headers.sh script, part of the tools/ build process, points out changes in the original files. So its important not to touch the copies in tools/ when doing changes in the original kernel headers, that will be done later, when check-headers.sh inform about the change to the perf tools hackers. Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: linux-s390@vger.kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20231121225650.390246-13-namhyung@kernel.org	2023-11-22 10:57:47 -08:00
Namhyung Kim	3483d24405	tools/perf: Update tools's copy of powerpc syscall table tldr; Just FYI, I'm carrying this on the perf tools tree. Full explanation: There used to be no copies, with tools/ code using kernel headers directly. From time to time tools/perf/ broke due to legitimate kernel hacking. At some point Linus complained about such direct usage. Then we adopted the current model. The way these headers are used in perf are not restricted to just including them to compile something. There are sometimes used in scripts that convert defines into string tables, etc, so some change may break one of these scripts, or new MSRs may use some different #define pattern, etc. E.g.: $ ls -1 tools/perf/trace/beauty/.sh \| head -5 tools/perf/trace/beauty/arch_errno_names.sh tools/perf/trace/beauty/drm_ioctl.sh tools/perf/trace/beauty/fadvise.sh tools/perf/trace/beauty/fsconfig.sh tools/perf/trace/beauty/fsmount.sh $ $ tools/perf/trace/beauty/fadvise.sh static const char fadvise_advices[] = { [0] = "NORMAL", [1] = "RANDOM", [2] = "SEQUENTIAL", [3] = "WILLNEED", [4] = "DONTNEED", [5] = "NOREUSE", }; $ The tools/perf/check-headers.sh script, part of the tools/ build process, points out changes in the original files. So its important not to touch the copies in tools/ when doing changes in the original kernel headers, that will be done later, when check-headers.sh inform about the change to the perf tools hackers. Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20231121225650.390246-12-namhyung@kernel.org	2023-11-22 10:57:47 -08:00
Namhyung Kim	b3b11aed14	tools/perf: Update tools's copy of x86 syscall table tldr; Just FYI, I'm carrying this on the perf tools tree. Full explanation: There used to be no copies, with tools/ code using kernel headers directly. From time to time tools/perf/ broke due to legitimate kernel hacking. At some point Linus complained about such direct usage. Then we adopted the current model. The way these headers are used in perf are not restricted to just including them to compile something. There are sometimes used in scripts that convert defines into string tables, etc, so some change may break one of these scripts, or new MSRs may use some different #define pattern, etc. E.g.: $ ls -1 tools/perf/trace/beauty/.sh \| head -5 tools/perf/trace/beauty/arch_errno_names.sh tools/perf/trace/beauty/drm_ioctl.sh tools/perf/trace/beauty/fadvise.sh tools/perf/trace/beauty/fsconfig.sh tools/perf/trace/beauty/fsmount.sh $ $ tools/perf/trace/beauty/fadvise.sh static const char fadvise_advices[] = { [0] = "NORMAL", [1] = "RANDOM", [2] = "SEQUENTIAL", [3] = "WILLNEED", [4] = "DONTNEED", [5] = "NOREUSE", }; $ The tools/perf/check-headers.sh script, part of the tools/ build process, points out changes in the original files. So its important not to touch the copies in tools/ when doing changes in the original kernel headers, that will be done later, when check-headers.sh inform about the change to the perf tools hackers. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: x86@kernel.org Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20231121225650.390246-11-namhyung@kernel.org	2023-11-22 10:57:47 -08:00
Namhyung Kim	fd2ddee727	tools headers: Update tools's copy of socket.h header tldr; Just FYI, I'm carrying this on the perf tools tree. Full explanation: There used to be no copies, with tools/ code using kernel headers directly. From time to time tools/perf/ broke due to legitimate kernel hacking. At some point Linus complained about such direct usage. Then we adopted the current model. The way these headers are used in perf are not restricted to just including them to compile something. There are sometimes used in scripts that convert defines into string tables, etc, so some change may break one of these scripts, or new MSRs may use some different #define pattern, etc. E.g.: $ ls -1 tools/perf/trace/beauty/.sh \| head -5 tools/perf/trace/beauty/arch_errno_names.sh tools/perf/trace/beauty/drm_ioctl.sh tools/perf/trace/beauty/fadvise.sh tools/perf/trace/beauty/fsconfig.sh tools/perf/trace/beauty/fsmount.sh $ $ tools/perf/trace/beauty/fadvise.sh static const char fadvise_advices[] = { [0] = "NORMAL", [1] = "RANDOM", [2] = "SEQUENTIAL", [3] = "WILLNEED", [4] = "DONTNEED", [5] = "NOREUSE", }; $ The tools/perf/check-headers.sh script, part of the tools/ build process, points out changes in the original files. So its important not to touch the copies in tools/ when doing changes in the original kernel headers, that will be done later, when check-headers.sh inform about the change to the perf tools hackers. Cc: netdev@vger.kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20231121225650.390246-7-namhyung@kernel.org	2023-11-22 10:57:47 -08:00
Yang Jihong	29b8e94dcf	perf lock contention: Fix a build error on 32-bit Fix a build error on 32-bit system: util/bpf_lock_contention.c: In function 'lock_contention_get_name': util/bpf_lock_contention.c:253:50: error: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'u64 {aka long long unsigned int}' [-Werror=format=] snprintf(name_buf, sizeof(name_buf), "cgroup:%lu", cgrp_id); ~~^ %llu cc1: all warnings being treated as errors Fixes: `d0c502e46e` ("perf lock contention: Prepare to handle cgroups") Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: avagin@google.com Cc: daniel.diaz@linaro.org Link: https://lore.kernel.org/r/20231118024858.1567039-3-yangjihong1@huawei.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-11-21 10:02:38 -08:00
Yang Jihong	a6dda77a75	perf kwork: Fix a build error on 32-bit lkft reported a build error for 32-bit system: builtin-kwork.c: In function 'top_print_work': builtin-kwork.c:1646:28: error: format '%ld' expects argument of type 'long int', but argument 3 has type 'u64' {aka 'long long unsigned int'} [-Werror=format=] 1646 \| ret += printf(" %ld ", PRINT_PID_WIDTH, work->id); \| ~~~^ ~~~~~~~~ \| \| \| \| long int u64 {aka long long unsigned int} \| %lld cc1: all warnings being treated as errors make[3]: *** [/builds/linux/tools/build/Makefile.build:106: /home/tuxbuild/.cache/tuxmake/builds/1/build/builtin-kwork.o] Error 1 Fix it. Fixes: `55c40e5052` ("perf kwork top: Introduce new top utility") Reported-by: Linux Kernel Functional Testing <lkft@linaro.org> Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: avagin@google.com Cc: daniel.diaz@linaro.org Link: https://lore.kernel.org/r/20231118024858.1567039-2-yangjihong1@huawei.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-11-21 10:02:38 -08:00
Ji Sheng Teoh	acbf6de674	perf vendor events riscv: Add StarFive Dubhe-80 JSON file StarFive's Dubhe-80 supports raw event id 0x00 - 0x22. The raw events are enabled through PMU node of DT binding. Besides raw event, add standard RISC-V firmware events to support monitoring of firmware event. Example of PMU DT node: pmu { compatible = "riscv,pmu"; riscv,raw-event-to-mhpmcounters = /* Event ID 1-31 / <0x00 0x00 0xFFFFFFFF 0xFFFFFFE0 0x00007FF8>, / Event ID 32-33 / <0x00 0x20 0xFFFFFFFF 0xFFFFFFFE 0x00007FF8>, / Event ID 34 */ <0x00 0x22 0xFFFFFFFF 0xFFFFFF22 0x00007FF8>; }; Example of 'perf stat' output: [root@user]# perf stat -a \ -e access_mmu_stlb \ -e miss_mmu_stlb \ -e access_mmu_pte_c \ -e rob_flush \ -e btb_prediction_miss \ -e itlb_miss \ -e sync_del_fetch_g \ -e icache_miss \ -e bpu_br_retire \ -e bpu_br_miss \ -e ret_ins_retire \ -e ret_ins_miss \ -- openssl speed rsa2048 Doing 2048 bits private rsa's for 10s: 39 2048 bits private RSA's in 10.14s Doing 2048 bits public rsa's for 10s: 1563 2048 bits public RSA's in 10.00s version: 3.0.11 built on: Tue Sep 19 13:02:31 2023 UTC options: bn(64,64) CPUINFO: N/A sign verify sign/s verify/s rsa 2048 bits 0.260000s 0.006398s 3.8 156.3 Performance counter stats for 'system wide': 1338350 access_mmu_stlb 1154025 miss_mmu_stlb 1162691 access_mmu_pte_c 34067 rob_flush 11212384 btb_prediction_miss 1256242 itlb_miss 652523491 sync_del_fetch_g 384465 icache_miss 64635789 bpu_br_retire 323440 bpu_br_miss 8785143 ret_ins_retire 31236 ret_ins_miss 20.760822480 seconds time elapsed Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Ji Sheng Teoh <jisheng.teoh@starfivetech.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Ley Foon Tan <leyfoon.tan@starfivetech.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nikita Shubin <n.shubin@yadro.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-riscv@lists.infradead.org Link: https://lore.kernel.org/r/20231103082441.1389842-1-jisheng.teoh@starfivetech.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-15 12:53:07 -05:00
Thomas Richter	b539deafba	perf report: Add s390 raw data interpretation for PAI counters Commit `39d62336f5` ("s390/pai: add support for cryptography counters") added support for Processor Activity Instrumentation Facility (PAI) counters. These counters values are added as raw data with the perf sample during 'perf record'. Now add support to display these counters in the 'perf report' command. The counter number, its assigned name and value is now printed in addition to the hexadecimal output. Output before: # perf report -D 6 514766399626050 0x7b058 [0x48]: PERF_RECORD_SAMPLE(IP, 0x1): 303977/303977: 0 period: 1 addr: 0 ... thread: paitest:303977 ...... dso: <not found> 0x7b0a0@/root/perf.data.paicrypto [0x48]: event: 9 . . ... raw event: size 72 bytes . 0000: 00 00 00 09 00 01 00 48 00 00 00 00 00 00 00 00 .......H........ . 0010: 00 04 a3 69 00 04 a3 69 00 01 d4 2d 76 de a0 bb ...i...i...-v... . 0020: 00 00 00 00 00 01 5c 53 00 00 00 06 00 00 00 00 ......\S........ . 0030: 00 00 00 00 00 00 00 01 00 00 00 0c 00 07 00 00 ................ . 0040: 00 00 00 53 96 af 00 00 ...S.... Output after: # perf report -D 6 514766399626050 0x7b058 [0x48]: PERF_RECORD_SAMPLE(IP, 0x1): 303977/303977: 0 period: 1 addr: 0 ... thread: paitest:303977 ...... dso: <not found> 0x7b0a0@/root/perf.data.paicrypto [0x48]: event: 9 . . ... raw event: size 72 bytes . 0000: 00 00 00 09 00 01 00 48 00 00 00 00 00 00 00 00 .......H........ . 0010: 00 04 a3 69 00 04 a3 69 00 01 d4 2d 76 de a0 bb ...i...i...-v... . 0020: 00 00 00 00 00 01 5c 53 00 00 00 06 00 00 00 00 ......\S........ . 0030: 00 00 00 00 00 00 00 01 00 00 00 0c 00 07 00 00 ................ . 0040: 00 00 00 53 96 af 00 00 ...S.... Counter:007 km_aes_128 Value:0x00000000005396af <--- new Committer notes: Had to add ignore pragmas for that __packed function: +#pragma GCC diagnostic ignored "-Wpacked" +#pragma GCC diagnostic ignored "-Wattributes" Otherwise this doesn't build in things like debian experimentao cross building to mips64, etc. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Tested-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Link: https://lore.kernel.org/r/20231110110908.2312308-1-tmricht@linux.ibm.com [ Corrected non-existent commit referred to the right one: `39d62336f5` ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-15 12:51:53 -05:00
Casey Schaufler	5f42375904	LSM: wireup Linux Security Module syscalls Wireup lsm_get_self_attr, lsm_set_self_attr and lsm_list_modules system calls. Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: linux-api@vger.kernel.org Reviewed-by: Mickaël Salaün <mic@digikod.net> [PM: forward ported beyond v6.6 due merge window changes] Signed-off-by: Paul Moore <paul@paul-moore.com>	2023-11-12 22:54:42 -05:00
Namhyung Kim	c06547d020	perf probe: Convert to check dwarf_getcfi feature Now it has a feature check for the dwarf_getcfi(), use it and convert the code to check HAVE_DWARF_CFI_SUPPORT definition. Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: linux-toolchains@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Link: https://lore.kernel.org/r/20231110000012.3538610-10-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-10 09:04:21 -03:00
Namhyung Kim	3f5928e461	perf dwarf-aux: Add die_find_variable_by_reg() helper The die_find_variable_by_reg() will search for a variable or a parameter sub-DIE in the given scope DIE where the location matches to the given register. For the simplest and most common case, memory access usually happens with a base register and an offset to the field so the register holds a pointer in a variable or function parameter. Then we can find one if it has a location expression at the (instruction) address. This function only handles such a simple case for now. In this case, the expression has a DW_OP_regN operation where N < 32. If the register index (N) is greater than or equal to 32, DW_OP_regx operation with an operand which saves the value for the N would be used. It rejects expressions with more operations. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: linux-toolchains@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Link: https://lore.kernel.org/r/20231110000012.3538610-8-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-10 09:04:12 -03:00
Namhyung Kim	981620fd27	perf dwarf-aux: Add die_get_scopes() alternative to dwarf_getscopes() The die_get_scopes() returns the number of enclosing DIEs for the given address and it fills an array of DIEs like dwarf_getscopes(). But it doesn't follow the abstract origin of inlined functions as we want information of the concrete instance. This is needed to check the location of parameters and local variables properly. Users can check the origin separately if needed. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: linux-toolchains@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Link: https://lore.kernel.org/r/20231110000012.3538610-7-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-10 09:04:08 -03:00
Namhyung Kim	3796eba7c1	perf dwarf-aux: Move #else block of #ifdef HAVE_DWARF_GETLOCATIONS_SUPPORT code to the header file It's a usual convention that the conditional code is handled in a header file. As I'm planning to add some more of them, let's move the current code to the header first. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: linux-toolchains@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Link: https://lore.kernel.org/r/20231110000012.3538610-6-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-10 09:04:03 -03:00
Namhyung Kim	a65e8c0b78	perf dwarf-aux: Fix die_get_typename() for void * The die_get_typename() is to return a C-like type name from DWARF debug entry and it follows data type if the target entry is a pointer type. But I found that void pointers don't have the type attribute to follow and then the function returns an error for that case. This results in a broken type string for void pointer types. For example, the following type entries are pointer types. <1><48c>: Abbrev Number: 4 (DW_TAG_pointer_type) <48d> DW_AT_byte_size : 8 <48d> DW_AT_type : <0x481> <1><491>: Abbrev Number: 211 (DW_TAG_pointer_type) <493> DW_AT_byte_size : 8 <1><494>: Abbrev Number: 4 (DW_TAG_pointer_type) <495> DW_AT_byte_size : 8 <495> DW_AT_type : <0x49e> The first one at offset 48c and the third one at offset 494 have type information. Then they are pointer types for the referenced types. But the second one at offset 491 doesn't have the type attribute. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: linux-toolchains@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Link: https://lore.kernel.org/r/20231110000012.3538610-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-10 09:03:58 -03:00
Namhyung Kim	6f1b6291cf	perf tools: Add util/debuginfo.[ch] files Split debuginfo data structure and related functions into a separate file so that it can be used by other components than the probe-finder. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: linux-toolchains@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Link: https://lore.kernel.org/r/20231110000012.3538610-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-10 09:03:54 -03:00
Namhyung Kim	fb7fd2a14a	perf annotate: Move raw_comment and raw_func_start fields out of 'struct ins_operands' Thoese two fields are used only for the jump_ops, so move them into the union to save some bytes. Also add jump__delete() callback not to free the fields as they didn't allocate new strings. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: WANG Rui <wangrui@loongson.cn> Cc: linux-toolchains@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Link: https://lore.kernel.org/r/20231110000012.3538610-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-10 09:03:48 -03:00
Namhyung Kim	ded8c48497	perf annotate: Pass "-l" option to objdump conditionally The "-l" option is to print line numbers in the objdump output. perf annotate TUI only can show the line numbers later but it causes big slow downs for the kernel binary. Similarly, showing source code also takes a long time and it already has an option to control it. $ time objdump ... -d -S -C vmlinux > /dev/null real 0m3.474s user 0m3.047s sys 0m0.428s $ time objdump ... -d -l -C vmlinux > /dev/null real 0m1.796s user 0m1.459s sys 0m0.338s $ time objdump ... -d -C vmlinux > /dev/null real 0m0.051s user 0m0.036s sys 0m0.016s As it's not needed for data type profiling, let's make it conditional so that it can skip the unnecessary work. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: linux-toolchains@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Link: https://lore.kernel.org/r/20231110000012.3538610-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-10 09:03:23 -03:00
Arnaldo Carvalho de Melo	dd678532f9	perf header: Additional note on AMD IBS for max_precise pmu cap x86 core PMU exposes supported maximum precision level via max_precise PMU capability. Although, AMD core PMU does not support precise mode, certain core PMU events with precise_ip > 0 are allowed and forwarded to IBS OP PMU. Display a note about this in the 'perf report' header output and document the details in the perf-list man page. Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kim Phillips <kim.phillips@amd.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Ming Wang <wangming01@loongson.cn> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ross Zwisler <zwisler@chromium.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Santosh Shukla <santosh.shukla@amd.com> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20231107083331.901-2-ravi.bangoria@amd.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-10 08:31:13 -03:00
Ian Rogers	6512b6aa23	perf bpf: Don't synthesize BPF events when disabled If BPF sideband events are disabled on the command line, don't synthesize BPF events too. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Song Liu <song@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: German Gomez <german.gomez@arm.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Li Dong <lidong@vivo.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Ming Wang <wangming01@loongson.cn> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Terrell <terrelln@fb.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Vincent Whitchurch <vincent.whitchurch@axis.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20231102175735.2272696-13-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-09 13:49:33 -03:00
James Clark	6aad765d10	perf test: Add support for setting objdump binary via perf config Add a 'perf config' variable that does the same thing as "perf test --objdump <x>". Also update the man page. Committer testing: # perf config test.objdump # perf test "object code reading" 26: Object code reading : Ok # perf config test.objdump=blah # perf config test.objdump test.objdump=blah # perf test "object code reading" 26: Object code reading : FAILED! # perf test -v "object code reading" 26: Object code reading : --- start --- test child forked, pid 600599 Looking at the vmlinux_path (8 entries long) Using /proc/kcore for kernel data Using /proc/kallsyms for symbols Parsing event 'cycles' Using CPUID AuthenticAMD-25-21-0 mmap size 528384B Reading object code for memory address: 0x4d9a02 File is: /home/acme/bin/perf On file address is: 0xd9a02 Objdump command is: blah -z -d --start-address=0x4d9a02 --stop-address=0x4d9a82 /home/acme/bin/perf objdump read too few bytes: 128 Bytes read differ from those read by objdump buf1 (dso): 0x48 0x85 0xff 0x74 0x29 0xe8 0x94 0xdf 0x07 0x00 0x8b 0x73 0x1c 0x48 0x8b 0x43 0x08 0xeb 0xa5 0x0f 0x1f 0x00 0x48 0x8b 0x45 0xe8 0x64 0x48 0x2b 0x04 0x25 0x28 0x00 0x00 0x00 0x75 0x0f 0x48 0x8b 0x5d 0xf8 0xc9 0xc3 0x0f 0x1f 0x00 0x48 0x8b 0x43 0x08 0xeb 0x84 0xe8 0xc5 0x3e 0xf3 0xff 0x0f 0x1f 0x44 0x00 0x00 0x55 0x48 0x89 0xe5 0x41 0x56 0x41 0x55 0x49 0x89 0xd5 0x41 0x54 0x49 0x89 0xfc 0x53 0x48 0x89 0xf3 0x48 0x83 0xec 0x30 0x48 0x8b 0x7e 0x20 0x64 0x48 0x8b 0x04 0x25 0x28 0x00 0x00 0x00 0x48 0x89 0x45 0xd8 0x31 0xc0 0x48 0x89 0x75 0xb0 0x48 0xc7 0x45 0xb8 0x00 0x00 0x00 0x00 0x48 0xc7 0x45 0xc0 0x00 0x00 0x00 0x00 0xe8 0xad 0xfa buf2 (objdump): 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 test child finished with -1 ---- end ---- Object code reading: FAILED! # perf config test.objdump=/usr/bin/objdump # perf config test.objdump test.objdump=/usr/bin/objdump # perf test "object code reading" 26: Object code reading : Ok # Signed-off-by: James Clark <james.clark@arm.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Fangrui Song <maskray@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Tom Rix <trix@redhat.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Yonghong Song <yhs@fb.com> Cc: llvm@lists.linux.dev Link: https://lore.kernel.org/r/20231106151051.129440-3-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-09 13:49:33 -03:00
James Clark	33ce9fc4f8	perf test: Add option to change objdump binary All of the other Perf subcommands that use objdump have an option to specify the binary, so add the same option to 'perf test'. This is useful if you have built the kernel with a different toolchain to the system one, where the system objdump may fail to disassemble vmlinux. Now this can be fixed with something like this: $ perf test --objdump llvm-objdump "object code reading" Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Fangrui Song <maskray@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Tom Rix <trix@redhat.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: llvm@lists.linux.dev Link: https://lore.kernel.org/r/20231106151051.129440-2-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-09 13:49:33 -03:00
Thomas Richter	b861fd7e0e	perf tests offcpu: Adjust test case perf record offcpu profiling tests for s390 On s390 using linux-next the test case: 87: perf record offcpu profiling tests fails. The root cause is this command # ./perf record --off-cpu -e dummy -- ./perf bench sched messaging -l 10 # Running 'sched/messaging' benchmark: # 20 sender and receiver processes per group # 10 groups == 400 processes run Total time: 0.231 [sec] [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.077 MB perf.data (401 samples) ] # It does not generate 800+ sample entries, on s390 usually around 40[1-9], sometimes a few more, but never more than 450. The higher the number of CPUs the lower the number of samples. Looking at function chain: bench_sched_messaging() +--> group() the senders and receiver threads are created. The senders and receivers call function ready() which writes one bytes and wait for a reply using poll system() call. As context switches are counted, the function ready() will trigger a context switch when no input data is available after the write system call. The write system call does not trigger context switches when the data size is small. And writing 1000 bytes (10 iterations with 100 bytes) is not much and certainly won't block. The 400+ context switch on s390 occur when the some receiver/sender threads call ready() and wait for the response from function bench_sched_messaging() being kicked off. Lower the number of expected context switches to 400 to succeed on s390. Suggested-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Co-developed-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20231106091627.2022530-1-tmricht@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-09 13:49:33 -03:00
Yang Jihong	36c70e44a3	perf tools: Add the python_ext_build directory to .gitignore `python_ext_build` is the build directory for python.so, ignore it for cleaner git status. Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20231030111438.1357962-2-yangjihong1@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-09 13:49:33 -03:00
zhaimingbing	4a5aaaf308	perf tests attr: Fix spelling mistake "whic" to "which" There is a spelling mistake, Please fix it. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: zhaimingbing <zhaimingbing@cmss.chinamobile.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: kernel-janitors@vger.kernel.org Link: https://lore.kernel.org/r/20231030075825.3701-1-zhaimingbing@cmss.chinamobile.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-09 13:49:33 -03:00
Namhyung Kim	b753d48f53	perf annotate: Move offsets array from 'struct annotation' to 'struct annotated_source' The offsets array keeps pointers to 'struct annotation_line' entries which are available in the 'struct annotated_source'. Let's move it to there. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20231103191907.54531-6-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-09 13:49:33 -03:00
Namhyung Kim	0aae4c99c5	perf annotate: Move some source code related fields from 'struct annotation' to 'struct annotated_source' Some fields in the 'struct annotation' are only used with 'struct annotated_source' so better to be moved there in order to reduce memory consumption for other symbols. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20231103191907.54531-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-09 13:49:33 -03:00
Namhyung Kim	2b215ec71b	perf annotate: Move max_coverage from 'struct annotation' to 'struct annotated_branch' The max_coverage field is only used when branch stack info is available so it'd be natural to move to 'struct annotated_branch'. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20231103191907.54531-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-09 13:49:33 -03:00
Namhyung Kim	b7f87e3259	perf annotate: Split branch stack cycles info from 'struct annotation' The cycles info is only meaningful when sample has branch stacks. To save the memory for normal cases, move those fields to a new 'struct annotated_branch' and dynamically allocate it when needed. Also move cycles_hist from annotated_source as it's related here. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20231103191907.54531-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-09 13:49:33 -03:00
Namhyung Kim	de2c7eb59c	perf annotate: Split branch stack cycles information out of 'struct annotation_line' The cycles info is used only when branch stack is provided. Separate them from 'struct annotation_line' into a separate struct and lazy allocate them to save some memory. Committer notes: Make annotation__compute_ipc() check if the lazy allocation works, bailing out if so, its callers already do error checking and propagation. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20231103191907.54531-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-09 13:49:33 -03:00
Namhyung Kim	89d5c48c34	perf test: Simplify "object code reading" test It tries cycles (or cpu-clock on s390) event with exclude_kernel bit to open. But other arch on a VM can fail with the hardware event and need to fallback to the software event in the same way. So let's get rid of the cpuid check and use generic fallback mechanism using an array of event candidates. Now event in the odd index excludes the kernel so use that for the return value. Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: James Clark <james.clark@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Link: https://lore.kernel.org/r/20231103195541.67788-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-09 13:49:33 -03:00
Ian Rogers	9ffa6c7512	perf machine thread: Remove exited threads by default 'struct thread' values hold onto references to mmaps, DSOs, etc. When a thread exits it is necessary to clean all of this memory up by removing the thread from the machine's threads. Some tools require this doesn't happen, such as auxtrace events, 'perf report' if offcpu events exist or if a task list is being generated, so add a 'struct symbol_conf' member to make the behavior optional. When an exited thread is left in the machine's threads, mark it as exited. This change relates to commit `40826c45eb` ("perf thread: Remove notion of dead threads") . Dead threads were removed as they had a reference count of 0 and were difficult to reason about with the reference count checker. Here a thread is removed from threads when it exits, unless via symbol_conf the exited thread isn't remove and is marked as exited. Reference counting behaves as it normally does. Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: German Gomez <german.gomez@arm.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Li Dong <lidong@vivo.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Ming Wang <wangming01@loongson.cn> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Terrell <terrelln@fb.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Vincent Whitchurch <vincent.whitchurch@axis.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20231102175735.2272696-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-09 13:49:32 -03:00
Ian Rogers	1a27fc0170	perf record: Lazy load kernel symbols Commit `5b7ba82a75` ("perf symbols: Load kernel maps before using") changed it so that loading a kernel DSO would cause the symbols for the DSO to be eagerly loaded. For 'perf record' this is overhead as the symbols won't be used. Add a field to 'struct symbol_conf' to control the behavior and disable it for 'perf record' and 'perf inject'. Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: German Gomez <german.gomez@arm.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Li Dong <lidong@vivo.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Ming Wang <wangming01@loongson.cn> Cc: Nick Terrell <terrelln@fb.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Vincent Whitchurch <vincent.whitchurch@axis.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20231102175735.2272696-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-09 13:49:32 -03:00
Colin Ian King	7ff7b7afe3	perf tools: Fix spelling mistake "parametrized" -> "parameterized" There are spelling mistakes in comments and a pr_debug message. Fix them. Reviewed-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: kernel-janitors@vger.kernel.org Link: https://lore.kernel.org/r/20231003074911.220216-1-colin.i.king@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-09 13:49:32 -03:00
Kan Liang	9fbb4b0230	perf tools: Add branch counter knob Add a new branch filter, "counter", for the branch counter option. It is used to mark the events which should be logged in the branch. If it is applied with the -j option, the counters of all the events should be logged in the branch. If the legacy kernel doesn't support the new branch sample type, switching off the branch counter filter. The stored counter values in each branch are displayed right after the regular branch stack information via perf report -D. Usage examples: # perf record -e "{branch-instructions,branch-misses}:S" -j any,counter Only the first event, branch-instructions, collect the LBR. Both branch-instructions and branch-misses are marked as logged events. The occurrences information of them can be found in the branch stack extension space of each branch. # perf record -e "{cpu/branch-instructions,branch_type=any/,cpu/branch-misses,branch_type=counter/}" Only the first event, branch-instructions, collect the LBR. Only the branch-misses event is marked as a logged event. Committer notes: I noticed 'perf test "Sample parsing"' failing, reported to the list and Kan provided a patch that checks if the evsel has a leader and that evsel->evlist is set, the comment in the source code further explains it. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Tinghao Zhang <tinghao.zhang@intel.com> Link: https://lore.kernel.org/r/20231025201626.3000228-8-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-09 13:47:50 -03:00
Kan Liang	ac9cd7245f	perf header: Support num and width of branch counters To support the branch counters feature, the information of the maximum number of supported counters and the width of the counters is exposed in the sysfs caps folder. The perf tool can use the information to parse the logged counters in each branch. Store the information in the perf_env for later usage. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Tinghao Zhang <tinghao.zhang@intel.com> Link: https://lore.kernel.org/r/20231025201626.3000228-7-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-06 18:15:16 -03:00
Linus Torvalds	7ab89417ed	perf tools changes for v6.7 Build ----- * Compile BPF programs by default if clang (>= 12.0.1) is available to enable more features like kernel lock contention, off-cpu profiling, kwork, sample filtering and so on. It can be disabled by passing BUILD_BPF_SKEL=0 to make. * Produce better error messages for bison on debug build (make DEBUG=1) by defining YYDEBUG symbol internally. perf record ----------- * Track sideband events (like FORK/MMAP) from all CPUs even if perf record targets a subset of CPUs only (using -C option). Otherwise it may lose some information happened on a CPU out of the target list. * Fix checking raw sched_switch tracepoint argument using system BTF. This affects off-cpu profiling which attaches a BPF program to the raw tracepoint. perf lock contention -------------------- * Add --lock-cgroup option to see contention by cgroups. This should be used with BPF only (using -b option). $ sudo perf lock con -ab --lock-cgroup -- sleep 1 contended total wait max wait avg wait cgroup 835 14.06 ms 41.19 us 16.83 us /system.slice/led.service 25 122.38 us 13.77 us 4.89 us / 44 23.73 us 3.87 us 539 ns /user.slice/user-657345.slice/session-c4.scope 1 491 ns 491 ns 491 ns /system.slice/connectd.service * Add -G/--cgroup-filter option to see contention only for given cgroups. This can be useful when you identified a cgroup in the above command and want to investigate more on it. It also works with other output options like -t/--threads and -l/--lock-addr. $ sudo perf lock con -ab -G /user.slice/user-657345.slice/session-c4.scope -- sleep 1 contended total wait max wait avg wait type caller 8 77.11 us 17.98 us 9.64 us spinlock futex_wake+0xc8 2 24.56 us 14.66 us 12.28 us spinlock tick_do_update_jiffies64+0x25 1 4.97 us 4.97 us 4.97 us spinlock futex_q_lock+0x2a * Use per-cpu array for better spinlock tracking. This is to improve performance of the BPF program and to avoid nested contention on a lock in the BPF hash map. * Update callstack check for PowerPC. To find a representative caller of a lock, it needs to look up the call stacks. It ends the lookup when it sees 0 in the call stack buffer. However, PowerPC call stacks can have 0 values in the beginning so skip them when it expects valid call stacks after. perf kwork ---------- * Support 'sched' class (for -k option) so that it can see task scheduling event (using sched_switch tracepoint) as well as irq and workqueue items. * Add perf kwork top subcommand to show more accurate cpu utilization with sched class above. It works both with a recorded data (using perf kwork record command) and BPF (using -b option). Unlike perf top command, it does not support interactive mode (yet). $ sudo perf kwork top -b -k sched Starting trace, Hit <Ctrl+C> to stop and report ^C Total : 160702.425 ms, 8 cpus %Cpu(s): 36.00% id, 0.00% hi, 0.00% si %Cpu0 [\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\| 61.66%] %Cpu1 [\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\| 61.27%] %Cpu2 [\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\| 66.40%] %Cpu3 [\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\| 61.28%] %Cpu4 [\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\| 61.82%] %Cpu5 [\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\| 77.41%] %Cpu6 [\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\| 61.73%] %Cpu7 [\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\| 63.25%] PID SPID %CPU RUNTIME COMMMAND ------------------------------------------------------------- 0 0 38.72 8089.463 ms [swapper/1] 0 0 38.71 8084.547 ms [swapper/3] 0 0 38.33 8007.532 ms [swapper/0] 0 0 38.26 7992.985 ms [swapper/6] 0 0 38.17 7971.865 ms [swapper/4] 0 0 36.74 7447.765 ms [swapper/7] 0 0 33.59 6486.942 ms [swapper/2] 0 0 22.58 3771.268 ms [swapper/5] 9545 9351 2.48 447.136 ms sched-messaging 9574 9351 2.09 418.583 ms sched-messaging 9724 9351 2.05 372.407 ms sched-messaging 9531 9351 2.01 368.804 ms sched-messaging 9512 9351 2.00 362.250 ms sched-messaging 9514 9351 1.95 357.767 ms sched-messaging 9538 9351 1.86 384.476 ms sched-messaging 9712 9351 1.84 386.490 ms sched-messaging 9723 9351 1.83 380.021 ms sched-messaging 9722 9351 1.82 382.738 ms sched-messaging 9517 9351 1.81 354.794 ms sched-messaging 9559 9351 1.79 344.305 ms sched-messaging 9725 9351 1.77 365.315 ms sched-messaging <SNIP> * Add hard/soft-irq statistics to perf kwork top. This will show the total CPU utilization with IRQ stats like below: $ sudo perf kwork top -b -k sched,irq,softirq Starting trace, Hit <Ctrl+C> to stop and report ^C Total : 12554.889 ms, 8 cpus %Cpu(s): 96.23% id, 0.10% hi, 0.19% si <---- here %Cpu0 [\| 4.60%] %Cpu1 [\| 4.59%] %Cpu2 [ 2.73%] %Cpu3 [\| 3.81%] <SNIP> perf bench ---------- * Add -G/--cgroups option to perf bench sched pipe. The pipe bench is good to measure context switch overhead. With this option, it puts the reader and writer tasks in separate cgroups to enforce context switch between two different cgroups. Also it needs to set CPU affinity of the tasks in a CPU to accurately measure the impact of cgroup context switches. $ sudo perf stat -e context-switches,cgroup-switches -- \ > taskset -c 0 perf bench sched pipe -l 100000 # Running 'sched/pipe' benchmark: # Executed 100000 pipe operations between two processes Total time: 0.307 [sec] 3.078180 usecs/op 324867 ops/sec Performance counter stats for 'taskset -c 0 perf bench sched pipe -l 100000': 200,026 context-switches 63 cgroup-switches 0.321637922 seconds time elapsed You can see small number of cgroup-switches because both write and read tasks are in the same cgroup. $ sudo mkdir /sys/fs/cgroup/{AAA,BBB} $ sudo perf stat -e context-switches,cgroup-switches -- \ > taskset -c 0 perf bench sched pipe -l 100000 -G AAA,BBB # Running 'sched/pipe' benchmark: # Executed 100000 pipe operations between two processes Total time: 0.351 [sec] 3.512990 usecs/op 284657 ops/sec Performance counter stats for 'taskset -c 0 perf bench sched pipe -l 100000 -G AAA,BBB': 200,020 context-switches 200,019 cgroup-switches 0.365034567 seconds time elapsed Now context-switches and cgroup-switches are almost same. And you can see the pipe operation took little more. * Kill child processes when perf bench sched messaging exited abnormally. Otherwise it'd leave the child doing unnecessary work. perf test --------- * Fix various shellcheck issues on the tests written in shell script. * Skip tests when condition is not satisfied: - object code reading test for non-text section addresses. - CoreSight test if cs_etm// event is not available. - lock contention test if not enough CPUs. Event parsing ------------- * Make PMU alias name loading lazy to reduce the startup time in the event parsing code for perf record, stat and others in the general case. * Lazily compute PMU default config. In the same sense, delay PMU initialization until it's really needed to reduce the startup cost. * Fix event term values that are raw events. The event specification can have several terms including event name. But sometimes it clashes with raw event encoding which starts with 'r' and has hex-digits. For example, an event named 'read' should be processed as a normal event but it was mis-treated as a raw encoding and caused a failure. $ perf stat -e 'uncore_imc_free_running/event=read/' -a sleep 1 event syntax error: '..nning/event=read/' \___ parser error Run 'perf list' for a list of valid events Usage: perf stat [<options>] [<command>] -e, --event <event> event selector. use 'perf list' to list available events Event metrics ------------- * Add "Compat" regex to match event with multiple identifiers. * Usual updates for Intel, Power10, Arm telemetry/CMN and AmpereOne. Misc ---- * Assorted memory leak fixes and footprint reduction. * Add "bpf_skeletons" to perf version --build-options so that users can check whether their perf tools have BPF support easily. * Fix unaligned access in Intel-PT packet decoder found by undefined-behavior sanitizer. * Avoid frequency mode for the dummy event. Surprisingly it'd impact kernel timer tick handler performance by force iterating all PMU events. * Update bash shell completion for events and metrics. Signed-off-by: Namhyung Kim <namhyung@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQSo2x5BnqMqsoHtzsmMstVUGiXMgwUCZUMg7wAKCRCMstVUGiXM g8FvAQC9KED6H8rlH7UTvxE6fM947EJbldwGrNA1zGx++Ucd3gD/ewA2A6SUcIh6 Tua/XovmYOQbuDYOwlRHe+sdDag0sgg= =GrCE -----END PGP SIGNATURE----- Merge tag 'perf-tools-for-v6.7-1-2023-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf tools updates from Namhyung Kim: "Build: - Compile BPF programs by default if clang (>= 12.0.1) is available to enable more features like kernel lock contention, off-cpu profiling, kwork, sample filtering and so on. This can be disabled by passing BUILD_BPF_SKEL=0 to make. - Produce better error messages for bison on debug build (make DEBUG=1) by defining YYDEBUG symbol internally. perf record: - Track sideband events (like FORK/MMAP) from all CPUs even if perf record targets a subset of CPUs only (using -C option). Otherwise it may lose some information happened on a CPU out of the target list. - Fix checking raw sched_switch tracepoint argument using system BTF. This affects off-cpu profiling which attaches a BPF program to the raw tracepoint. perf lock contention: - Add --lock-cgroup option to see contention by cgroups. This should be used with BPF only (using -b option). $ sudo perf lock con -ab --lock-cgroup -- sleep 1 contended total wait max wait avg wait cgroup 835 14.06 ms 41.19 us 16.83 us /system.slice/led.service 25 122.38 us 13.77 us 4.89 us / 44 23.73 us 3.87 us 539 ns /user.slice/user-657345.slice/session-c4.scope 1 491 ns 491 ns 491 ns /system.slice/connectd.service - Add -G/--cgroup-filter option to see contention only for given cgroups. This can be useful when you identified a cgroup in the above command and want to investigate more on it. It also works with other output options like -t/--threads and -l/--lock-addr. $ sudo perf lock con -ab -G /user.slice/user-657345.slice/session-c4.scope -- sleep 1 contended total wait max wait avg wait type caller 8 77.11 us 17.98 us 9.64 us spinlock futex_wake+0xc8 2 24.56 us 14.66 us 12.28 us spinlock tick_do_update_jiffies64+0x25 1 4.97 us 4.97 us 4.97 us spinlock futex_q_lock+0x2a - Use per-cpu array for better spinlock tracking. This is to improve performance of the BPF program and to avoid nested contention on a lock in the BPF hash map. - Update callstack check for PowerPC. To find a representative caller of a lock, it needs to look up the call stacks. It ends the lookup when it sees 0 in the call stack buffer. However, PowerPC call stacks can have 0 values in the beginning so skip them when it expects valid call stacks after. perf kwork: - Support 'sched' class (for -k option) so that it can see task scheduling event (using sched_switch tracepoint) as well as irq and workqueue items. - Add perf kwork top subcommand to show more accurate cpu utilization with sched class above. It works both with a recorded data (using perf kwork record command) and BPF (using -b option). Unlike perf top command, it does not support interactive mode (yet). $ sudo perf kwork top -b -k sched Starting trace, Hit <Ctrl+C> to stop and report ^C Total : 160702.425 ms, 8 cpus %Cpu(s): 36.00% id, 0.00% hi, 0.00% si %Cpu0 [\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\| 61.66%] %Cpu1 [\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\| 61.27%] %Cpu2 [\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\| 66.40%] %Cpu3 [\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\| 61.28%] %Cpu4 [\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\| 61.82%] %Cpu5 [\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\| 77.41%] %Cpu6 [\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\| 61.73%] %Cpu7 [\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\| 63.25%] PID SPID %CPU RUNTIME COMMMAND ------------------------------------------------------------- 0 0 38.72 8089.463 ms [swapper/1] 0 0 38.71 8084.547 ms [swapper/3] 0 0 38.33 8007.532 ms [swapper/0] 0 0 38.26 7992.985 ms [swapper/6] 0 0 38.17 7971.865 ms [swapper/4] 0 0 36.74 7447.765 ms [swapper/7] 0 0 33.59 6486.942 ms [swapper/2] 0 0 22.58 3771.268 ms [swapper/5] 9545 9351 2.48 447.136 ms sched-messaging 9574 9351 2.09 418.583 ms sched-messaging 9724 9351 2.05 372.407 ms sched-messaging 9531 9351 2.01 368.804 ms sched-messaging 9512 9351 2.00 362.250 ms sched-messaging 9514 9351 1.95 357.767 ms sched-messaging 9538 9351 1.86 384.476 ms sched-messaging 9712 9351 1.84 386.490 ms sched-messaging 9723 9351 1.83 380.021 ms sched-messaging 9722 9351 1.82 382.738 ms sched-messaging 9517 9351 1.81 354.794 ms sched-messaging 9559 9351 1.79 344.305 ms sched-messaging 9725 9351 1.77 365.315 ms sched-messaging <SNIP> - Add hard/soft-irq statistics to perf kwork top. This will show the total CPU utilization with IRQ stats like below: $ sudo perf kwork top -b -k sched,irq,softirq Starting trace, Hit <Ctrl+C> to stop and report ^C Total : 12554.889 ms, 8 cpus %Cpu(s): 96.23% id, 0.10% hi, 0.19% si <---- here %Cpu0 [\| 4.60%] %Cpu1 [\| 4.59%] %Cpu2 [ 2.73%] %Cpu3 [\| 3.81%] <SNIP> perf bench: - Add -G/--cgroups option to perf bench sched pipe. The pipe bench is good to measure context switch overhead. With this option, it puts the reader and writer tasks in separate cgroups to enforce context switch between two different cgroups. Also it needs to set CPU affinity of the tasks in a CPU to accurately measure the impact of cgroup context switches. $ sudo perf stat -e context-switches,cgroup-switches -- \ > taskset -c 0 perf bench sched pipe -l 100000 # Running 'sched/pipe' benchmark: # Executed 100000 pipe operations between two processes Total time: 0.307 [sec] 3.078180 usecs/op 324867 ops/sec Performance counter stats for 'taskset -c 0 perf bench sched pipe -l 100000': 200,026 context-switches 63 cgroup-switches 0.321637922 seconds time elapsed You can see small number of cgroup-switches because both write and read tasks are in the same cgroup. $ sudo mkdir /sys/fs/cgroup/{AAA,BBB} $ sudo perf stat -e context-switches,cgroup-switches -- \ > taskset -c 0 perf bench sched pipe -l 100000 -G AAA,BBB # Running 'sched/pipe' benchmark: # Executed 100000 pipe operations between two processes Total time: 0.351 [sec] 3.512990 usecs/op 284657 ops/sec Performance counter stats for 'taskset -c 0 perf bench sched pipe -l 100000 -G AAA,BBB': 200,020 context-switches 200,019 cgroup-switches 0.365034567 seconds time elapsed Now context-switches and cgroup-switches are almost same. And you can see the pipe operation took little more. - Kill child processes when perf bench sched messaging exited abnormally. Otherwise it'd leave the child doing unnecessary work. perf test: - Fix various shellcheck issues on the tests written in shell script. - Skip tests when condition is not satisfied: - object code reading test for non-text section addresses. - CoreSight test if cs_etm// event is not available. - lock contention test if not enough CPUs. Event parsing: - Make PMU alias name loading lazy to reduce the startup time in the event parsing code for perf record, stat and others in the general case. - Lazily compute PMU default config. In the same sense, delay PMU initialization until it's really needed to reduce the startup cost. - Fix event term values that are raw events. The event specification can have several terms including event name. But sometimes it clashes with raw event encoding which starts with 'r' and has hex-digits. For example, an event named 'read' should be processed as a normal event but it was mis-treated as a raw encoding and caused a failure. $ perf stat -e 'uncore_imc_free_running/event=read/' -a sleep 1 event syntax error: '..nning/event=read/' \___ parser error Run 'perf list' for a list of valid events Usage: perf stat [<options>] [<command>] -e, --event <event> event selector. use 'perf list' to list available events Event metrics: - Add "Compat" regex to match event with multiple identifiers. - Usual updates for Intel, Power10, Arm telemetry/CMN and AmpereOne. Misc: - Assorted memory leak fixes and footprint reduction. - Add "bpf_skeletons" to perf version --build-options so that users can check whether their perf tools have BPF support easily. - Fix unaligned access in Intel-PT packet decoder found by undefined-behavior sanitizer. - Avoid frequency mode for the dummy event. Surprisingly it'd impact kernel timer tick handler performance by force iterating all PMU events. - Update bash shell completion for events and metrics" * tag 'perf-tools-for-v6.7-1-2023-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (187 commits) perf vendor events intel: Update tsx_cycles_per_elision metrics perf vendor events intel: Update bonnell version number to v5 perf vendor events intel: Update westmereex events to v4 perf vendor events intel: Update meteorlake events to v1.06 perf vendor events intel: Update knightslanding events to v16 perf vendor events intel: Add typo fix for ivybridge FP perf vendor events intel: Update a spelling in haswell/haswellx perf vendor events intel: Update emeraldrapids to v1.01 perf vendor events intel: Update alderlake/alderlake events to v1.23 perf build: Disable BPF skeletons if clang version is < 12.0.1 perf callchain: Fix spelling mistake "statisitcs" -> "statistics" perf report: Fix spelling mistake "heirachy" -> "hierarchy" perf python: Fix binding linkage due to rename and move of evsel__increase_rlimit() perf tests: test_arm_coresight: Simplify source iteration perf vendor events intel: Add tigerlake two metrics perf vendor events intel: Add broadwellde two metrics perf vendor events intel: Fix broadwellde tma_info_system_dram_bw_use metric perf mem_info: Add and use map_symbol__exit and addr_map_symbol__exit perf callchain: Minor layout changes to callchain_list perf callchain: Make brtype_stat in callchain_list optional ...	2023-11-03 08:17:38 -10:00
Arnaldo Carvalho de Melo	851bbccf6b	perf build: Warn about missing libelf before warning about missing libbpf As libelf is a requirement for libbpf if it is not available, as in some container build tests where NO_LIBELF=1 is used, then better warn about the most basic library first. Ditto for libz, check its availability before libbpf too. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZUEehyDk0FkPnvMR@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-03 12:25:58 -03:00
Arnaldo Carvalho de Melo	c8e3ade38b	perf tests make: Remove the last egrep call, use 'grep -E' instead One last case, caught while testing with amazonlinux:2, centos:stream, etc: 4 7.28 amazonlinux:2 : FAIL egrep: warning: egrep is obsolescent; using grep -E gcc version 7.3.1 20180712 (Red Hat 7.3.1-17) (GCC) 8 13.87 centos:stream : FAIL egrep: warning: egrep is obsolescent; using grep -E Reviewed-by: Guilherme Amadio <amadio@gentoo.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZUEdtblE8qDAQkBK@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-11-03 12:25:53 -03:00
Linus Torvalds	6803bd7956	ARM: * Generalized infrastructure for 'writable' ID registers, effectively allowing userspace to opt-out of certain vCPU features for its guest * Optimization for vSGI injection, opportunistically compressing MPIDR to vCPU mapping into a table * Improvements to KVM's PMU emulation, allowing userspace to select the number of PMCs available to a VM * Guest support for memory operation instructions (FEAT_MOPS) * Cleanups to handling feature flags in KVM_ARM_VCPU_INIT, squashing bugs and getting rid of useless code * Changes to the way the SMCCC filter is constructed, avoiding wasted memory allocations when not in use * Load the stage-2 MMU context at vcpu_load() for VHE systems, reducing the overhead of errata mitigations * Miscellaneous kernel and selftest fixes LoongArch: * New architecture. The hardware uses the same model as x86, s390 and RISC-V, where guest/host mode is orthogonal to supervisor/user mode. The virtualization extensions are very similar to MIPS, therefore the code also has some similarities but it's been cleaned up to avoid some of the historical bogosities that are found in arch/mips. The kernel emulates MMU, timer and CSR accesses, while interrupt controllers are only emulated in userspace, at least for now. RISC-V: * Support for the Smstateen and Zicond extensions * Support for virtualizing senvcfg * Support for virtualized SBI debug console (DBCN) S390: * Nested page table management can be monitored through tracepoints and statistics x86: * Fix incorrect handling of VMX posted interrupt descriptor in KVM_SET_LAPIC, which could result in a dropped timer IRQ * Avoid WARN on systems with Intel IPI virtualization * Add CONFIG_KVM_MAX_NR_VCPUS, to allow supporting up to 4096 vCPUs without forcing more common use cases to eat the extra memory overhead. * Add virtualization support for AMD SRSO mitigation (IBPB_BRTYPE and SBPB, aka Selective Branch Predictor Barrier). * Fix a bug where restoring a vCPU snapshot that was taken within 1 second of creating the original vCPU would cause KVM to try to synchronize the vCPU's TSC and thus clobber the correct TSC being set by userspace. * Compute guest wall clock using a single TSC read to avoid generating an inaccurate time, e.g. if the vCPU is preempted between multiple TSC reads. * "Virtualize" HWCR.TscFreqSel to make Linux guests happy, which complain about a "Firmware Bug" if the bit isn't set for select F/M/S combos. Likewise "virtualize" (ignore) MSR_AMD64_TW_CFG to appease Windows Server 2022. * Don't apply side effects to Hyper-V's synthetic timer on writes from userspace to fix an issue where the auto-enable behavior can trigger spurious interrupts, i.e. do auto-enabling only for guest writes. * Remove an unnecessary kick of all vCPUs when synchronizing the dirty log without PML enabled. * Advertise "support" for non-serializing FS/GS base MSR writes as appropriate. * Harden the fast page fault path to guard against encountering an invalid root when walking SPTEs. * Omit "struct kvm_vcpu_xen" entirely when CONFIG_KVM_XEN=n. * Use the fast path directly from the timer callback when delivering Xen timer events, instead of waiting for the next iteration of the run loop. This was not done so far because previously proposed code had races, but now care is taken to stop the hrtimer at critical points such as restarting the timer or saving the timer information for userspace. * Follow the lead of upstream Xen and ignore the VCPU_SSHOTTMR_future flag. * Optimize injection of PMU interrupts that are simultaneous with NMIs. * Usual handful of fixes for typos and other warts. x86 - MTRR/PAT fixes and optimizations: * Clean up code that deals with honoring guest MTRRs when the VM has non-coherent DMA and host MTRRs are ignored, i.e. EPT is enabled. * Zap EPT entries when non-coherent DMA assignment stops/start to prevent using stale entries with the wrong memtype. * Don't ignore guest PAT for CR0.CD=1 && KVM_X86_QUIRK_CD_NW_CLEARED=y. This was done as a workaround for virtual machine BIOSes that did not bother to clear CR0.CD (because ancient KVM/QEMU did not bother to set it, in turn), and there's zero reason to extend the quirk to also ignore guest PAT. x86 - SEV fixes: * Report KVM_EXIT_SHUTDOWN instead of EINVAL if KVM intercepts SHUTDOWN while running an SEV-ES guest. * Clean up the recognition of emulation failures on SEV guests, when KVM would like to "skip" the instruction but it had already been partially emulated. This makes it possible to drop a hack that second guessed the (insufficient) information provided by the emulator, and just do the right thing. Documentation: * Various updates and fixes, mostly for x86 * MTRR and PAT fixes and optimizations: -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmVBZc0UHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroP1LQf+NgsmZ1lkGQlKdSdijoQ856w+k0or l2SV1wUwiEdFPSGK+RTUlHV5Y1ni1dn/CqCVIJZKEI3ZtZ1m9/4HKIRXvbMwFHIH hx+E4Lnf8YUjsGjKTLd531UKcpphztZavQ6pXLEwazkSkDEra+JIKtooI8uU+9/p bd/eF1V+13a8CHQf1iNztFJVxqBJbVlnPx4cZDRQQvewskIDGnVDtwbrwCUKGtzD eNSzhY7si6O2kdQNkuA8xPhg29dYX9XLaCK2K1l8xOUm8WipLdtF86GAKJ5BVuOL 6ek/2QCYjZ7a+coAZNfgSEUi8JmFHEqCo7cnKmWzPJp+2zyXsdudqAhT1g== =UIxm -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm updates from Paolo Bonzini: "ARM: - Generalized infrastructure for 'writable' ID registers, effectively allowing userspace to opt-out of certain vCPU features for its guest - Optimization for vSGI injection, opportunistically compressing MPIDR to vCPU mapping into a table - Improvements to KVM's PMU emulation, allowing userspace to select the number of PMCs available to a VM - Guest support for memory operation instructions (FEAT_MOPS) - Cleanups to handling feature flags in KVM_ARM_VCPU_INIT, squashing bugs and getting rid of useless code - Changes to the way the SMCCC filter is constructed, avoiding wasted memory allocations when not in use - Load the stage-2 MMU context at vcpu_load() for VHE systems, reducing the overhead of errata mitigations - Miscellaneous kernel and selftest fixes LoongArch: - New architecture for kvm. The hardware uses the same model as x86, s390 and RISC-V, where guest/host mode is orthogonal to supervisor/user mode. The virtualization extensions are very similar to MIPS, therefore the code also has some similarities but it's been cleaned up to avoid some of the historical bogosities that are found in arch/mips. The kernel emulates MMU, timer and CSR accesses, while interrupt controllers are only emulated in userspace, at least for now. RISC-V: - Support for the Smstateen and Zicond extensions - Support for virtualizing senvcfg - Support for virtualized SBI debug console (DBCN) S390: - Nested page table management can be monitored through tracepoints and statistics x86: - Fix incorrect handling of VMX posted interrupt descriptor in KVM_SET_LAPIC, which could result in a dropped timer IRQ - Avoid WARN on systems with Intel IPI virtualization - Add CONFIG_KVM_MAX_NR_VCPUS, to allow supporting up to 4096 vCPUs without forcing more common use cases to eat the extra memory overhead. - Add virtualization support for AMD SRSO mitigation (IBPB_BRTYPE and SBPB, aka Selective Branch Predictor Barrier). - Fix a bug where restoring a vCPU snapshot that was taken within 1 second of creating the original vCPU would cause KVM to try to synchronize the vCPU's TSC and thus clobber the correct TSC being set by userspace. - Compute guest wall clock using a single TSC read to avoid generating an inaccurate time, e.g. if the vCPU is preempted between multiple TSC reads. - "Virtualize" HWCR.TscFreqSel to make Linux guests happy, which complain about a "Firmware Bug" if the bit isn't set for select F/M/S combos. Likewise "virtualize" (ignore) MSR_AMD64_TW_CFG to appease Windows Server 2022. - Don't apply side effects to Hyper-V's synthetic timer on writes from userspace to fix an issue where the auto-enable behavior can trigger spurious interrupts, i.e. do auto-enabling only for guest writes. - Remove an unnecessary kick of all vCPUs when synchronizing the dirty log without PML enabled. - Advertise "support" for non-serializing FS/GS base MSR writes as appropriate. - Harden the fast page fault path to guard against encountering an invalid root when walking SPTEs. - Omit "struct kvm_vcpu_xen" entirely when CONFIG_KVM_XEN=n. - Use the fast path directly from the timer callback when delivering Xen timer events, instead of waiting for the next iteration of the run loop. This was not done so far because previously proposed code had races, but now care is taken to stop the hrtimer at critical points such as restarting the timer or saving the timer information for userspace. - Follow the lead of upstream Xen and ignore the VCPU_SSHOTTMR_future flag. - Optimize injection of PMU interrupts that are simultaneous with NMIs. - Usual handful of fixes for typos and other warts. x86 - MTRR/PAT fixes and optimizations: - Clean up code that deals with honoring guest MTRRs when the VM has non-coherent DMA and host MTRRs are ignored, i.e. EPT is enabled. - Zap EPT entries when non-coherent DMA assignment stops/start to prevent using stale entries with the wrong memtype. - Don't ignore guest PAT for CR0.CD=1 && KVM_X86_QUIRK_CD_NW_CLEARED=y This was done as a workaround for virtual machine BIOSes that did not bother to clear CR0.CD (because ancient KVM/QEMU did not bother to set it, in turn), and there's zero reason to extend the quirk to also ignore guest PAT. x86 - SEV fixes: - Report KVM_EXIT_SHUTDOWN instead of EINVAL if KVM intercepts SHUTDOWN while running an SEV-ES guest. - Clean up the recognition of emulation failures on SEV guests, when KVM would like to "skip" the instruction but it had already been partially emulated. This makes it possible to drop a hack that second guessed the (insufficient) information provided by the emulator, and just do the right thing. Documentation: - Various updates and fixes, mostly for x86 - MTRR and PAT fixes and optimizations" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (164 commits) KVM: selftests: Avoid using forced target for generating arm64 headers tools headers arm64: Fix references to top srcdir in Makefile KVM: arm64: Add tracepoint for MMIO accesses where ISV==0 KVM: arm64: selftest: Perform ISB before reading PAR_EL1 KVM: arm64: selftest: Add the missing .guest_prepare() KVM: arm64: Always invalidate TLB for stage-2 permission faults KVM: x86: Service NMI requests after PMI requests in VM-Enter path KVM: arm64: Handle AArch32 SPSR_{irq,abt,und,fiq} as RAZ/WI KVM: arm64: Do not let a L1 hypervisor access the *32_EL2 sysregs KVM: arm64: Refine _EL2 system register list that require trap reinjection arm64: Add missing _EL2 encodings arm64: Add missing _EL12 encodings KVM: selftests: aarch64: vPMU test for validating user accesses KVM: selftests: aarch64: vPMU register test for unimplemented counters KVM: selftests: aarch64: vPMU register test for implemented counters KVM: selftests: aarch64: Introduce vpmu_counter_access test tools: Import arm_pmuv3.h KVM: arm64: PMU: Allow userspace to limit PMCR_EL0.N for the guest KVM: arm64: Sanitize PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR} before first run KVM: arm64: Add {get,set}_user for PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR} ...	2023-11-02 15:45:15 -10:00
Linus Torvalds	1e0c505e13	asm-generic updates for v6.7 The ia64 architecture gets its well-earned retirement as planned, now that there is one last (mostly) working release that will be maintained as an LTS kernel. The architecture specific system call tables are updated for the added map_shadow_stack() syscall and to remove references to the long-gone sys_lookup_dcookie() syscall. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmVC40IACgkQYKtH/8kJ Uidhmw/9EX+aWSXGoObJ3fngaNSMw+PmrEuP8qEKBHxfKHcCdX3hc451Oh4GlhaQ tru91pPwgNvN2/rfoKusxT+V4PemGIzfNni/04rp+P0kvmdw5otQ2yNhsQNsfVmq XGWvkxF4P2GO6bkjjfR/1dDq7GtlyXtwwPDKeLbYb6TnJOZjtx+EAN27kkfSn1Ms R4Sa3zJ+DfHUmHL5S9g+7UD/CZ5GfKNmIskI4Mz5GsfoUz/0iiU+Bge/9sdcdSJQ kmbLy5YnVzfooLZ3TQmBFsO3iAMWb0s/mDdtyhqhTVmTUshLolkPYyKnPFvdupyv shXcpEST2XJNeaDRnL2K4zSCdxdbnCZHDpjfl9wfioBg7I8NfhXKpf1jYZHH1de4 LXq8ndEFEOVQw/zSpYWfQq1sux8Jiqr+UK/ukbVeFWiGGIUs91gEWtPAf8T0AZo9 ujkJvaWGl98O1g5wmBu0/dAR6QcFJMDfVwbmlIFpU8O+MEaz6X8mM+O5/T0IyTcD eMbAUjj4uYcU7ihKzHEv/0SS9Of38kzff67CLN5k8wOP/9NlaGZ78o1bVle9b52A BdhrsAefFiWHp1jT6Y9Rg4HOO/TguQ9e6EWSKOYFulsiLH9LEFaB9RwZLeLytV0W vlAgY9rUW77g1OJcb7DoNv33nRFuxsKqsnz3DEIXtgozo9CzbYI= =H1vH -----END PGP SIGNATURE----- Merge tag 'asm-generic-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull ia64 removal and asm-generic updates from Arnd Bergmann: - The ia64 architecture gets its well-earned retirement as planned, now that there is one last (mostly) working release that will be maintained as an LTS kernel. - The architecture specific system call tables are updated for the added map_shadow_stack() syscall and to remove references to the long-gone sys_lookup_dcookie() syscall. * tag 'asm-generic-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: hexagon: Remove unusable symbols from the ptrace.h uapi asm-generic: Fix spelling of architecture arch: Reserve map_shadow_stack() syscall number for all architectures syscalls: Cleanup references to sys_lookup_dcookie() Documentation: Drop or replace remaining mentions of IA64 lib/raid6: Drop IA64 support Documentation: Drop IA64 from feature descriptions kernel: Drop IA64 support from sig_fault handlers arch: Remove Itanium (IA-64) architecture	2023-11-01 15:28:33 -10:00
Paolo Bonzini	45b890f768	KVM/arm64 updates for 6.7 - Generalized infrastructure for 'writable' ID registers, effectively allowing userspace to opt-out of certain vCPU features for its guest - Optimization for vSGI injection, opportunistically compressing MPIDR to vCPU mapping into a table - Improvements to KVM's PMU emulation, allowing userspace to select the number of PMCs available to a VM - Guest support for memory operation instructions (FEAT_MOPS) - Cleanups to handling feature flags in KVM_ARM_VCPU_INIT, squashing bugs and getting rid of useless code - Changes to the way the SMCCC filter is constructed, avoiding wasted memory allocations when not in use - Load the stage-2 MMU context at vcpu_load() for VHE systems, reducing the overhead of errata mitigations - Miscellaneous kernel and selftest fixes -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQSNXHjWXuzMZutrKNKivnWIJHzdFgUCZUFJRgAKCRCivnWIJHzd FtgYAP9cMsc1Mhlw3jNQnTc6q0cbTulD/SoEDPUat1dXMqjs+gEAnskwQTrTX834 fgGQeCAyp7Gmar+KeP64H0xm8kPSpAw= =R4M7 -----END PGP SIGNATURE----- Merge tag 'kvmarm-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for 6.7 - Generalized infrastructure for 'writable' ID registers, effectively allowing userspace to opt-out of certain vCPU features for its guest - Optimization for vSGI injection, opportunistically compressing MPIDR to vCPU mapping into a table - Improvements to KVM's PMU emulation, allowing userspace to select the number of PMCs available to a VM - Guest support for memory operation instructions (FEAT_MOPS) - Cleanups to handling feature flags in KVM_ARM_VCPU_INIT, squashing bugs and getting rid of useless code - Changes to the way the SMCCC filter is constructed, avoiding wasted memory allocations when not in use - Load the stage-2 MMU context at vcpu_load() for VHE systems, reducing the overhead of errata mitigations - Miscellaneous kernel and selftest fixes	2023-10-31 16:37:07 -04:00
Arnaldo Carvalho de Melo	1715b6359c	perf beauty socket/prctl_option: Cope with extended regexp complaint by grep Noticed on fedora 38, the extended regexp that so far was ok for both grep and sed now gets complaints by grep, that says '/' doesn't need to be escaped with '\'. So stop using '/' in sed, use '%' instead and remove the \ before / in the common extended regexp. Link: https://x.com/SMT_Solvers/status/1710380010098344192?s=20 Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZUEddFPTJHVLhH%2F6@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2023-10-31 12:31:19 -03:00
Namhyung Kim	fed3a1be64	perf tools fixes for v6.6: 2nd batch - Fix regression in reading scale and unit files from sysfs for PMU events, so that we can use that info to pretty print instead of printing raw numbers: # perf stat -e power/energy-ram/,power/energy-gpu/ sleep 2 Performance counter stats for 'system wide': 1.64 Joules power/energy-ram/ 0.20 Joules power/energy-gpu/ 2.001228914 seconds time elapsed # # grep -m1 "model name" /proc/cpuinfo model name : Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz # - The small llvm.cpp file used to check if the llvm devel files are present was incorrectly deleted when removing the BPF event in 'perf trace', put it back as it is also used by tools/bpf/bpftool, that uses llvm routines to do disassembly of BPF object files. - Fix use of addr_location__exit() in dlfilter__object_code(), making sure that it is only used to pair a previous addr_location__init() call. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCZTKh5AAKCRCyPKLppCJ+ J/g/AP0f6SNyHJz21JzDTzyjXAeSdMzKwic0LXv+kATQy31HJAD+Kf7UKQieUeZB fxvp60aKyFN8IVIgpYiAjZMS3k9XPAY= =N7Gv -----END PGP SIGNATURE----- Merge tag 'perf-tools-fixes-for-v6.6-2-2023-10-20' into perf-tools-next To get the latest fixes in the perf tools including perf stat output, dlfilter and LLVM feature detection. Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-10-30 13:46:27 -07:00
Ian Rogers	c43c64f8a1	perf vendor events intel: Update tsx_cycles_per_elision metrics Update tsx_cycles_per_elision as per: https://github.com/intel/perfmon/pull/116 Prefer the el-start event rather than cycles-t for detecting whether the metric will work as HLE may be disabled. Remove the metric from sapphirerapids that has no el-start event. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Zhengjun Xing <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20231026003149.3287633-9-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-10-28 00:45:27 -07:00
Ian Rogers	c44c311859	perf vendor events intel: Update bonnell version number to v5 Spelling fixes were already incorporated in the Linux perf tree, update the version number to reflect this. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Zhengjun Xing <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20231026003149.3287633-8-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-10-28 00:45:27 -07:00
Ian Rogers	b629208161	perf vendor events intel: Update westmereex events to v4 Update westmereex events from v3 to v4 fixing a spelling issue. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Zhengjun Xing <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20231026003149.3287633-7-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-10-28 00:45:22 -07:00
Ian Rogers	247730767c	perf vendor events intel: Update meteorlake events to v1.06 Update meteorlake from v1.04 to v1.06 adding the changes from: `bc84df0430` `405d3ee987` Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Zhengjun Xing <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20231026003149.3287633-6-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-10-28 00:45:18 -07:00
Ian Rogers	f9418b524d	perf vendor events intel: Update knightslanding events to v16 Update knightslanding from v10 to v16 adding the changes from: `6c1f169f6e` `b22ca587ec` `e685286f08` Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Zhengjun Xing <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20231026003149.3287633-5-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-10-28 00:45:12 -07:00
Ian Rogers	20e6a51f61	perf vendor events intel: Add typo fix for ivybridge FP Add a missed space. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Zhengjun Xing <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20231026003149.3287633-4-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-10-28 00:45:07 -07:00
Ian Rogers	99a8a4c990	perf vendor events intel: Update a spelling in haswell/haswellx The spelling of "in-flight" was switched to "inflight". Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Zhengjun Xing <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20231026003149.3287633-3-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-10-28 00:41:41 -07:00
Ian Rogers	8a94d3bfaf	perf vendor events intel: Update emeraldrapids to v1.01 Update emeraldrapids to v1.01 from v1.00 adding the changes from: `3993b600e0` Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Zhengjun Xing <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20231026003149.3287633-2-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-10-28 00:41:15 -07:00
Ian Rogers	a28a0f6773	perf vendor events intel: Update alderlake/alderlake events to v1.23 Update alderlake and alderlaken events from v1.21 to v1.23 adding the changes from: `8df4db9433` `846bd247c6` The tsx_cycles_per_elision metric is updated from PR: https://github.com/intel/perfmon/pull/116 Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Zhengjun Xing <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20231026003149.3287633-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-10-28 00:40:53 -07:00
Arnaldo Carvalho de Melo	1768d3a014	perf build: Disable BPF skeletons if clang version is < 12.0.1 While building on a wide range of distros and clang versions it was noticed that at least version 12.0.1 (noticed on Alpine 3.15 with "Alpine clang version 12.0.1") is needed to not fail with BTF generation errors such as: Debian:10 Debian clang version 11.0.1-2~deb10u1: CLANG /tmp/build/perf/util/bpf_skel/.tmp/sample_filter.bpf.o <SNIP> GENSKEL /tmp/build/perf/util/bpf_skel/sample_filter.skel.h libbpf: failed to find BTF for extern 'bpf_cast_to_kern_ctx' [21] section: -2 Error: failed to open BPF object file: No such file or directory make[2]: * [Makefile.perf:1121: /tmp/build/perf/util/bpf_skel/sample_filter.skel.h] Error 254 make[2]: * Deleting file '/tmp/build/perf/util/bpf_skel/sample_filter.skel.h' Amazon Linux 2: clang version 11.1.0 (Amazon Linux 2 11.1.0-1.amzn2.0.2) GENSKEL /tmp/build/perf/util/bpf_skel/sample_filter.skel.h libbpf: elf: skipping unrecognized data section(18) .eh_frame libbpf: elf: skipping relo section(19) .rel.eh_frame for section(18) .eh_frame libbpf: failed to find BTF for extern 'bpf_cast_to_kern_ctx' [21] section: -2 Error: failed to open BPF object file: No such file or directory make[2]: * [/tmp/build/perf/util/bpf_skel/sample_filter.skel.h] Error 254 make[2]: * Deleting file `/tmp/build/perf/util/bpf_skel/sample_filter.skel.h' Ubuntu 20.04: clang version 10.0.0-4ubuntu1 CLANG /tmp/build/perf/util/bpf_skel/.tmp/augmented_raw_syscalls.bpf.o GENSKEL /tmp/build/perf/util/bpf_skel/bench_uprobe.skel.h GENSKEL /tmp/build/perf/util/bpf_skel/bperf_leader.skel.h libbpf: sec '.reluprobe': corrupted symbol #27 pointing to invalid section #65522 for relo #0 GENSKEL /tmp/build/perf/util/bpf_skel/bperf_follower.skel.h Error: failed to open BPF object file: BPF object format invalid make[2]: * [Makefile.perf:1121: /tmp/build/perf/util/bpf_skel/bench_uprobe.skel.h] Error 95 make[2]: * Deleting file '/tmp/build/perf/util/bpf_skel/bench_uprobe.skel.h' So check if the version is at least 12.0.1 otherwise disable building BPF skels and provide a message about it, continuing the build. The message, when running on amazonlinux:2: Makefile.config:698: Warning: Disabled BPF skeletons as reliable BTF generation needs at least clang version 12.0.1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/ZTvGx/Ou6BVnYBqi@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-10-27 19:27:33 -07:00
Colin Ian King	ee40490dd7	perf callchain: Fix spelling mistake "statisitcs" -> "statistics" There are a couple of spelling mistakes in perror messages. Fix them. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Cc: kernel-janitors@vger.kernel.org Link: https://lore.kernel.org/r/20231027084633.1167530-1-colin.i.king@gmail.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-10-27 19:26:13 -07:00
Colin Ian King	0e0f03d7fc	perf report: Fix spelling mistake "heirachy" -> "hierarchy" There is a spelling mistake in a ui error message. Fix it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Cc: kernel-janitors@vger.kernel.org Link: https://lore.kernel.org/r/20231027084011.1167091-1-colin.i.king@gmail.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-10-27 19:25:13 -07:00
Arnaldo Carvalho de Melo	93c65d6143	perf python: Fix binding linkage due to rename and move of evsel__increase_rlimit() The changes in ("perf evsel: Rename evsel__increase_rlimit to rlimit__increase_nofile") ended up breaking the python binding that now references the rlimit__increase_nofile function, add the util/rlimit.o to the tools/perf/util/python-ext-sources to cure that. This was detected by the 'perf test python' regression test: $ perf test python 14: 'import perf' in python : FAILED! $ perf test -v python Couldn't bump rlimit(MEMLOCK), failures may take place when creating BPF maps, etc 14: 'import perf' in python : --- start --- test child forked, pid 2912462 python usage test: "echo "import sys ; sys.path.insert(0, '/tmp/build/perf-tools-next/python'); import perf" \| '/usr/bin/python3' " Traceback (most recent call last): File "<stdin>", line 1, in <module> ImportError: /tmp/build/perf-tools-next/python/perf.cpython-311-x86_64-linux-gnu.so: undefined symbol: rlimit__increase_nofile test child finished with -1 ---- end ---- 'import perf' in python: FAILED! $ Fixes: `e093a222d7` ("perf evsel: Rename evsel__increase_rlimit to rlimit__increase_nofile") Acked-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/lkml/ZTrCS5Z3PZAmfPdV@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-10-27 19:23:52 -07:00
James Clark	0b783d2e82	perf tests: test_arm_coresight: Simplify source iteration There are two reasons to do this, firstly there is a shellcheck warning in cs_etm_dev_name(), which can be completely deleted. And secondly the current iteration method doesn't support systems with both ETE and ETM because it picks one or the other. There isn't a known system with this configuration, but it could happen in the future. Iterating over all the sources for each CPU can be done by going through /sys/bus/event_source/devices/cs_etm/cpu* and following the symlink back to the Coresight device in /sys/bus/coresight/devices. This will work whether the device is ETE, ETM or any future name, and is much simpler and doesn't require any hard coded version numbers Suggested-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: James Clark <james.clark@arm.com> Acked-by: Ian Rogers <irogers@google.com> Tested-by: Leo Yan <leo.yan@linaro.org> Cc: tianruidong@linux.alibaba.com Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Anushree Mathur <anushree.mathur@linux.vnet.ibm.com> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: atrajeev@linux.vnet.ibm.com Cc: coresight@lists.linaro.org Link: https://lore.kernel.org/r/20231023131550.487760-1-james.clark@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-10-26 10:58:10 -07:00
Ian Rogers	4ece2a7e88	perf vendor events intel: Add tigerlake two metrics Add tma_info_system_socket_clks and uncore_freq metrics. The associated converter script fix is in: https://github.com/intel/perfmon/pull/112 Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Link: https://lore.kernel.org/r/20230926205948.1399594-3-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-10-26 10:15:04 -07:00
Ian Rogers	19a214bffd	perf vendor events intel: Add broadwellde two metrics Add tma_info_system_socket_clks and uncore_freq metrics that require a broadwellx style uncore event for UNC_CLOCK. The associated converter script fix is in: https://github.com/intel/perfmon/pull/112 Fixes: `7d124303d6` ("perf vendor events intel: Update broadwell variant events/metrics") Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Link: https://lore.kernel.org/r/20230926205948.1399594-2-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-10-26 10:14:53 -07:00
Ian Rogers	3779416eed	perf vendor events intel: Fix broadwellde tma_info_system_dram_bw_use metric Broadwell-de has a consumer core and server uncore. The uncore_arb PMU isn't present and the broadwellx style cbox PMU should be used instead. Fix the tma_info_system_dram_bw_use metric to use the server metric rather than client. The associated converter script fix is in: https://github.com/intel/perfmon/pull/111 Fixes: `7d124303d6` ("perf vendor events intel: Update broadwell variant events/metrics") Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Link: https://lore.kernel.org/r/20230926031034.1201145-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-10-25 15:19:28 -07:00
Ian Rogers	56e144fe98	perf mem_info: Add and use map_symbol__exit and addr_map_symbol__exit Fix leak where mem_info__put wouldn't release the maps/map as used by perf mem. Add exit functions and use elsewhere that the maps and map are released. Signed-off-by: Ian Rogers <irogers@google.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: German Gomez <german.gomez@arm.com> Cc: James Clark <james.clark@arm.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: liuwenyu <liuwenyu7@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Song Liu <song@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20231024222353.3024098-12-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-10-25 13:39:58 -07:00
Ian Rogers	dec07fe5d4	perf callchain: Minor layout changes to callchain_list Avoid 6 byte hole for padding. Place more frequently used fields first in an attempt to use just 1 cacheline in the common case. Before: ``` struct callchain_list { u64 ip; /* 0 8 / struct map_symbol ms; / 8 24 / struct { _Bool unfolded; / 32 1 / _Bool has_children; / 33 1 / }; / 32 2 / / XXX 6 bytes hole, try to pack / u64 branch_count; / 40 8 / u64 from_count; / 48 8 / u64 predicted_count; / 56 8 / / --- cacheline 1 boundary (64 bytes) --- / u64 abort_count; / 64 8 / u64 cycles_count; / 72 8 / u64 iter_count; / 80 8 / u64 iter_cycles; / 88 8 / struct branch_type_stat brtype_stat; /* 96 8 / const char srcline; /* 104 8 / struct list_head list; / 112 16 / / size: 128, cachelines: 2, members: 13 / / sum members: 122, holes: 1, sum holes: 6 / }; ``` After: ``` struct callchain_list { struct list_head list; / 0 16 / u64 ip; / 16 8 / struct map_symbol ms; / 24 24 / const char srcline; /* 48 8 / u64 branch_count; / 56 8 / / --- cacheline 1 boundary (64 bytes) --- / u64 from_count; / 64 8 / u64 cycles_count; / 72 8 / u64 iter_count; / 80 8 / u64 iter_cycles; / 88 8 / struct branch_type_stat brtype_stat; /* 96 8 / u64 predicted_count; / 104 8 / u64 abort_count; / 112 8 / struct { _Bool unfolded; / 120 1 / _Bool has_children; / 121 1 / }; / 120 2 / / size: 128, cachelines: 2, members: 13 / / padding: 6 */ }; ``` Signed-off-by: Ian Rogers <irogers@google.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: German Gomez <german.gomez@arm.com> Cc: James Clark <james.clark@arm.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: liuwenyu <liuwenyu7@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Song Liu <song@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20231024222353.3024098-11-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-10-25 13:39:32 -07:00
Ian Rogers	6ba29fbb0b	perf callchain: Make brtype_stat in callchain_list optional struct callchain_list is 352bytes in size, 232 of which are brtype_stat. brtype_stat is only used for certain callchain_list items so make it optional, allocating when necessary. So that printing doesn't need to deal with an optional brtype_stat, pass an empty/zero version. Before: ``` struct callchain_list { u64 ip; /* 0 8 / struct map_symbol ms; / 8 24 / struct { _Bool unfolded; / 32 1 / _Bool has_children; / 33 1 / }; / 32 2 / / XXX 6 bytes hole, try to pack / u64 branch_count; / 40 8 / u64 from_count; / 48 8 / u64 predicted_count; / 56 8 / / --- cacheline 1 boundary (64 bytes) --- / u64 abort_count; / 64 8 / u64 cycles_count; / 72 8 / u64 iter_count; / 80 8 / u64 iter_cycles; / 88 8 / struct branch_type_stat brtype_stat; / 96 232 / / --- cacheline 5 boundary (320 bytes) was 8 bytes ago --- / const char srcline; /* 328 8 / struct list_head list; / 336 16 / / size: 352, cachelines: 6, members: 13 / / sum members: 346, holes: 1, sum holes: 6 / / last cacheline: 32 bytes / }; ``` After: ``` struct callchain_list { u64 ip; / 0 8 / struct map_symbol ms; / 8 24 / struct { _Bool unfolded; / 32 1 / _Bool has_children; / 33 1 / }; / 32 2 / / XXX 6 bytes hole, try to pack / u64 branch_count; / 40 8 / u64 from_count; / 48 8 / u64 predicted_count; / 56 8 / / --- cacheline 1 boundary (64 bytes) --- / u64 abort_count; / 64 8 / u64 cycles_count; / 72 8 / u64 iter_count; / 80 8 / u64 iter_cycles; / 88 8 / struct branch_type_stat brtype_stat; /* 96 8 / const char srcline; /* 104 8 / struct list_head list; / 112 16 / / size: 128, cachelines: 2, members: 13 / / sum members: 122, holes: 1, sum holes: 6 */ }; ``` Signed-off-by: Ian Rogers <irogers@google.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: German Gomez <german.gomez@arm.com> Cc: James Clark <james.clark@arm.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: liuwenyu <liuwenyu7@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Song Liu <song@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20231024222353.3024098-10-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-10-25 13:39:08 -07:00
Ian Rogers	d47d876d72	perf callchain: Make display use of branch_type_stat const Display code doesn't modify the branch_type_stat so switch uses to const. This is done to aid refactoring struct callchain_list where current the branch_type_stat is embedded even if not used. Signed-off-by: Ian Rogers <irogers@google.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: German Gomez <german.gomez@arm.com> Cc: James Clark <james.clark@arm.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: liuwenyu <liuwenyu7@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Song Liu <song@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20231024222353.3024098-9-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-10-25 13:38:50 -07:00
Ian Rogers	67a3ebf1c3	perf offcpu: Add missed btf_free Caught by address/leak sanitizer. Signed-off-by: Ian Rogers <irogers@google.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: German Gomez <german.gomez@arm.com> Cc: James Clark <james.clark@arm.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: liuwenyu <liuwenyu7@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Song Liu <song@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20231024222353.3024098-8-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-10-25 13:38:33 -07:00
Ian Rogers	7b2e444b76	perf threads: Remove unused dead thread list Commit `40826c45eb` ("perf thread: Remove notion of dead threads") removed dead threads but the list head wasn't removed. Remove it here. Signed-off-by: Ian Rogers <irogers@google.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: German Gomez <german.gomez@arm.com> Cc: James Clark <james.clark@arm.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: liuwenyu <liuwenyu7@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Song Liu <song@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20231024222353.3024098-7-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-10-25 13:38:09 -07:00
Ian Rogers	c1149037f6	perf hist: Add missing puts to hist__account_cycles Caught using reference count checking on perf top with "--call-graph=lbr". After this no memory leaks were detected. Fixes: `57849998e2` ("perf report: Add processing for cycle histograms") Signed-off-by: Ian Rogers <irogers@google.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: German Gomez <german.gomez@arm.com> Cc: James Clark <james.clark@arm.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: liuwenyu <liuwenyu7@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Song Liu <song@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20231024222353.3024098-6-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-10-25 13:37:48 -07:00
Ian Rogers	78c32f4cb1	libperf rc_check: Add RC_CHK_EQUAL Comparing pointers with reference count checking is tricky to avoid a SEGV. Add a convenience macro to simplify and use. Signed-off-by: Ian Rogers <irogers@google.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: German Gomez <german.gomez@arm.com> Cc: James Clark <james.clark@arm.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: liuwenyu <liuwenyu7@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Song Liu <song@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20231024222353.3024098-5-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-10-25 13:37:22 -07:00
Ian Rogers	ab8ce15078	perf machine: Avoid out of bounds LBR memory read Running perf top with address sanitizer and "--call-graph=lbr" fails due to reading sample 0 when no samples exist. Add a guard to prevent this. Fixes: `e2b23483eb` ("perf machine: Factor out lbr_callchain_add_lbr_ip()") Signed-off-by: Ian Rogers <irogers@google.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: German Gomez <german.gomez@arm.com> Cc: James Clark <james.clark@arm.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: liuwenyu <liuwenyu7@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Song Liu <song@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20231024222353.3024098-3-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-10-25 13:36:20 -07:00
Ian Rogers	7a8f349e9d	perf rwsem: Add debug mode that uses a mutex Mutex error check will capture trying to take the lock recursively and other problems that rwlock won't. At the expense of concurrency, adda debug mode that uses a mutex in place of a rwsem. Signed-off-by: Ian Rogers <irogers@google.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: German Gomez <german.gomez@arm.com> Cc: James Clark <james.clark@arm.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: liuwenyu <liuwenyu7@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Song Liu <song@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20231024222353.3024098-2-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-10-25 13:35:35 -07:00
Arnaldo Carvalho de Melo	b27778ed5d	perf build: Address stray '\' before # that is warned about since grep 3.8 To address this grep 3.8 warning: grep: warning: stray \ before # We needed to remove the '' around the grep expression and keep the \ before # so that it is escaped by the $(shell grep ...) and thus doesn't get to grep. We need that \ before the #, otherwise we get this: Makefile.perf:364: * unterminated call to function 'shell': missing ')'. Stop. As everything after the # will be considered a comment. Removing the single quotes needs some more escaping so that _some_ of the escaped chars gets to grep, like the '\\|' that becomes '\\\\|´. Running on debian:10, where there is no libtraceevent-devel available, we get: Makefile.perf:367: * PYTHON_EXT_SRCS= util/python.c ../lib/ctype.c util/cap.c util/evlist.c util/evsel.c util/evsel_fprintf.c util/perf_event_attr_fprintf.c util/cpumap.c util/memswap.c util/mmap.c util/namespaces.c ../lib/bitmap.c ../lib/find_bit.c ../lib/list_sort.c ../lib/hweight.c ../lib/string.c ../lib/vsprintf.c util/thread_map.c util/util.c util/cgroup.c util/parse-branch-options.c util/rblist.c util/counts.c util/print_binary.c util/strlist.c ../lib/rbtree.c util/string.c util/symbol_fprintf.c util/units.c util/affinity.c util/rwsem.c util/hashmap.c util/perf_regs.c util/fncache.c util/perf-regs-arch/perf_regs_aarch64.c util/perf-regs-arch/perf_regs_arm.c util/perf-regs-arch/perf_regs_csky.c util/perf-regs-arch/perf_regs_loongarch.c util/perf-regs-arch/perf_regs_mips.c util/perf-regs-arch/perf_regs_powerpc.c util/perf-regs-arch/perf_regs_riscv.c util/perf-regs-arch/perf_regs_s390.c util/perf-regs-arch/perf_regs_x86.c. Stop. make[1]: * [Makefile.perf:242: sub-make] Error 2 I.e. both the comments and the util/trace-event.c were removed. When using: msg := $(error PYTHON_EXT_SRCS=$(PYTHON_EXT_SRCS)) While on the more recent fedora:38, with the new grep and make packages and libtraceevent-devel installed: Makefile.perf:367: * PYTHON_EXT_SRCS= util/python.c ../lib/ctype.c util/cap.c util/evlist.c util/evsel.c util/evsel_fprintf.c util/perf_event_attr_fprintf.c util/cpumap.c util/memswap.c util/mmap.c util/namespaces.c ../lib/bitmap.c ../lib/find_bit.c ../lib/list_sort.c ../lib/hweight.c ../lib/string.c ../lib/vsprintf.c util/thread_map.c util/util.c util/cgroup.c util/parse-branch-options.c util/rblist.c util/counts.c util/print_binary.c util/strlist.c util/trace-event.c ../lib/rbtree.c util/string.c util/symbol_fprintf.c util/units.c util/affinity.c util/rwsem.c util/hashmap.c util/perf_regs.c util/fncache.c util/perf-regs-arch/perf_regs_aarch64.c util/perf-regs-arch/perf_regs_arm.c util/perf-regs-arch/perf_regs_csky.c util/perf-regs-arch/perf_regs_loongarch.c util/perf-regs-arch/perf_regs_mips.c util/perf-regs-arch/perf_regs_powerpc.c util/perf-regs-arch/perf_regs_riscv.c util/perf-regs-arch/perf_regs_s390.c util/perf-regs-arch/perf_regs_x86.c. Stop. make[1]: * [Makefile.perf:242: sub-make] Error 2 make: * [Makefile:113: install-bin] Error 2 make: Leaving directory '/home/acme/git/perf-tools-next/tools/perf' $ I.e. only the comments were removed. If we build it on the same fedora:38 system, but using NO_LIBTRACEEVENT=1 $ make NO_LIBTRACEEVENT=1 CORESIGHT=1 O=/tmp/build/$(basename $PWD) -C tools/perf install-bin Makefile.perf:367: * PYTHON_EXT_SRCS= util/python.c ../lib/ctype.c util/cap.c util/evlist.c util/evsel.c util/evsel_fprintf.c util/perf_event_attr_fprintf.c util/cpumap.c util/memswap.c util/mmap.c util/namespaces.c ../lib/bitmap.c ../lib/find_bit.c ../lib/list_sort.c ../lib/hweight.c ../lib/string.c ../lib/vsprintf.c util/thread_map.c util/util.c util/cgroup.c util/parse-branch-options.c util/rblist.c util/counts.c util/print_binary.c util/strlist.c ../lib/rbtree.c util/string.c util/symbol_fprintf.c util/units.c util/affinity.c util/rwsem.c util/hashmap.c util/perf_regs.c util/fncache.c util/perf-regs-arch/perf_regs_aarch64.c util/perf-regs-arch/perf_regs_arm.c util/perf-regs-arch/perf_regs_csky.c util/perf-regs-arch/perf_regs_loongarch.c util/perf-regs-arch/perf_regs_mips.c util/perf-regs-arch/perf_regs_powerpc.c util/perf-regs-arch/perf_regs_riscv.c util/perf-regs-arch/perf_regs_s390.c util/perf-regs-arch/perf_regs_x86.c. Stop. make[1]: * [Makefile.perf:242: sub-make] Error 2 make: *** [Makefile:113: install-bin] Error 2 make: Leaving directory '/home/acme/git/perf-tools-next/tools/perf' $ Both comments and the util/trace-event.c file removed. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/ZTj6mfM9UqY2DggC@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-10-25 10:05:03 -07:00
Namhyung Kim	a6e4a4a14a	perf report: Fix hierarchy mode on pipe input The hierarchy mode needs to setup output formats for each evsel. Normally setup_sorting() handles this at the beginning, but it cannot do that if data comes from a pipe since there's no evsel info before reading the data. And then perf report cannot process the samples in hierarchy mode and think as if there's no sample. Let's check the condition and setup the output formats after reading data so that it can find evsels. Before: $ ./perf record -o- true \| ./perf report -i- --hierarchy -q [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.000 MB - ] Error: The - data has no samples! After: $ ./perf record -o- true \| ./perf report -i- --hierarchy -q [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.000 MB - ] 94.76% true 94.76% [kernel.kallsyms] 94.76% [k] filemap_fault 5.24% perf-ex 5.24% [kernel.kallsyms] 5.06% [k] __memset 0.18% [k] native_write_msr Acked-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20231025003121.2811738-1-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-10-25 10:04:19 -07:00
Namhyung Kim	b5711042a1	perf lock contention: Use per-cpu array map for spinlocks Currently lock contention timestamp is maintained in a hash map keyed by pid. That means it needs to get and release a map element (which is proctected by spinlock!) on each contention begin and end pair. This can impact on performance if there are a lot of contention (usually from spinlocks). It used to go with task local storage but it had an issue on memory allocation in some critical paths. Although it's addressed in recent kernels IIUC, the tool should support old kernels too. So it cannot simply switch to the task local storage at least for now. As spinlocks create lots of contention and they disabled preemption during the spinning, it can use per-cpu array to keep the timestamp to avoid overhead in hashmap update and delete. In contention_begin, it's easy to check the lock types since it can see the flags. But contention_end cannot see it. So let's try to per-cpu array first (unconditionally) if it has an active element (lock != 0). Then it should be used and per-task tstamp map should not be used until the per-cpu array element is cleared which means nested spinlock contention (if any) was finished and it nows see (the outer) lock. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Hao Luo <haoluo@google.com> Cc: Song Liu <song@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20231020204741.1869520-3-namhyung@kernel.org	2023-10-25 10:02:55 -07:00
Namhyung Kim	6a070573f2	perf lock contention: Check race in tstamp elem creation When pelem is NULL, it'd create a new entry with zero data. But it might be preempted by IRQ/NMI just before calling bpf_map_update_elem() then there's a chance to call it twice for the same pid. So it'd be better to use BPF_NOEXIST flag and check the return value to prevent the race. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Hao Luo <haoluo@google.com> Cc: Song Liu <song@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20231020204741.1869520-2-namhyung@kernel.org	2023-10-25 10:02:47 -07:00
Namhyung Kim	d99317f214	perf lock contention: Clear lock addr after use It checks the current lock to calculated the delta of contention time. The address is saved in the tstamp map which is allocated at begining of contention and released at end of contention. But it's possible for bpf_map_delete_elem() to fail. In that case, the element in the tstamp map kept for the current lock and it makes the next contention for the same lock tracked incorrectly. Specificially the next contention begin will see the existing element for the task and it'd just return. Then the next contention end will see the element and calculate the time using the timestamp for the previous begin. This can result in a large value for two small contentions happened from time to time. Let's clear the lock address so that it can be updated next time even if the bpf_map_delete_elem() failed. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Hao Luo <haoluo@google.com> Cc: Song Liu <song@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20231020204741.1869520-1-namhyung@kernel.org	2023-10-25 10:02:34 -07:00
Yang Jihong	e093a222d7	perf evsel: Rename evsel__increase_rlimit to rlimit__increase_nofile evsel__increase_rlimit() helper does nothing with evsel, and description of the functionality is inaccurate, rename it and move to util/rlimit.c. By the way, fix a checkppatch warning about misplaced license tag: WARNING: Misplaced SPDX-License-Identifier tag - use line 1 instead #160: FILE: tools/perf/util/rlimit.h:3: /* SPDX-License-Identifier: LGPL-2.1 */ No functional change. Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20231023033144.1011896-1-yangjihong1@huawei.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-10-25 10:02:11 -07:00
Namhyung Kim	79a3371bdf	perf bench sched pipe: Add -G/--cgroups option The -G/--cgroups option is to put sender and receiver in different cgroups in order to measure cgroup context switch overheads. Users need to make sure the cgroups exist and accessible. The following example should the effect of this change. Please don't forget taskset before the perf bench to measure cgroup switches properly. Otherwise each task would run on a different CPU and generate cgroup switches regardless of this change. # perf stat -e context-switches,cgroup-switches \ > taskset -c 0 perf bench sched pipe -l 10000 > /dev/null Performance counter stats for 'taskset -c 0 perf bench sched pipe -l 10000': 20,001 context-switches 2 cgroup-switches 0.053449651 seconds time elapsed 0.011286000 seconds user 0.041869000 seconds sys # perf stat -e context-switches,cgroup-switches \ > taskset -c 0 perf bench sched pipe -l 10000 -G AAA,BBB > /dev/null Performance counter stats for 'taskset -c 0 perf bench sched pipe -l 10000 -G AAA,BBB': 20,001 context-switches 20,001 cgroup-switches 0.052768627 seconds time elapsed 0.006284000 seconds user 0.046266000 seconds sys Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20231017202342.1353124-1-namhyung@kernel.org	2023-10-25 10:02:10 -07:00
Michael Petlan	cbf5f58461	perf test: Skip CoreSight tests if cs_etm// event is not available CoreSight might be not available, in such case, skip the tests. Signed-off-by: Michael Petlan <mpetlan@redhat.com> Reviewed-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Carsten Haitzler <carsten.haitzler@arm.com> Cc: vmolnaro@redhat.com Link: https://lore.kernel.org/r/20231019091137.22525-1-mpetlan@redhat.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-10-25 10:02:01 -07:00
Yang Jihong	c4a852635e	perf data: Increase RLIMIT_NOFILE limit when open too many files in perf_data__create_dir() If using parallel threads to collect data, perf record needs at least 6 fds per CPU. (one for sys_perf_event_open, four for pipe msg and ack of the pipe, see record__thread_data_open_pipes(), and one for open perf.data.XXX) For an environment with more than 100 cores, if perf record uses both `-a` and `--threads` options, it is easy to exceed the upper limit of the file descriptor number, when we run out of them try to increase the limits. Before: $ ulimit -n 1024 $ lscpu \| grep 'On-line CPU(s)' On-line CPU(s) list: 0-159 $ perf record --threads -a sleep 1 Failed to create data directory: Too many open files After: $ ulimit -n 1024 $ lscpu \| grep 'On-line CPU(s)' On-line CPU(s) list: 0-159 $ perf record --threads -a sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.394 MB perf.data (1576 samples) ] Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20231013075945.698874-1-yangjihong1@huawei.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-10-19 23:38:27 -07:00
Kajol Jain	3f8b6e5b11	perf vendor events: Update PMC used in PM_RUN_INST_CMPL event for power10 platform The CPI_STALL_RATIO metric group can be used to present the high level CPI stall breakdown metrics in powerpc, which will show: - DISPATCH_STALL_CPI ( Dispatch stall cycles per insn ) - ISSUE_STALL_CPI ( Issue stall cycles per insn ) - EXECUTION_STALL_CPI ( Execution stall cycles per insn ) - COMPLETION_STALL_CPI ( Completion stall cycles per insn ) Commit `cf26e043c2` ("perf vendor events power10: Add JSON metric events to present CPI stall cycles in powerpc)" which added the CPI_STALL_RATIO metric group, also modified the PMC value used in PM_RUN_INST_CMPL event from PMC4 to PMC5, to avoid multiplexing of events. But that got revert in recent changes. Fix this issue by changing back the PMC value used in PM_RUN_INST_CMPL to PMC5. Result with the fix: ./perf stat --metric-no-group -M CPI_STALL_RATIO <workload> Performance counter stats for 'workload': 68,745,426 PM_CMPL_STALL # 0.21 COMPLETION_STALL_CPI 7,692,827 PM_ISSUE_STALL # 0.02 ISSUE_STALL_CPI 322,638,223 PM_RUN_INST_CMPL # 0.05 DISPATCH_STALL_CPI # 0.48 EXECUTION_STALL_CPI 16,858,553 PM_DISP_STALL_CYC 153,880,133 PM_EXEC_STALL 0.089774592 seconds time elapsed "--metric-no-group" is used for forcing PM_RUN_INST_CMPL to be scheduled in all group for more accuracy. Fixes: `7d473f475b` ("perf vendor events: Move JSON/events to appropriate files for power10 platform") Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com> Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Disha Goel<disgoel@linux.ibm.com> Cc: maddy@linux.ibm.com Link: https://lore.kernel.org/r/20231016143110.244255-1-kjain@linux.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-10-19 23:35:20 -07:00
Thomas Richter	5069211e2f	perf trace: Use the right bpf_probe_read(_str) variant for reading user data Perf test case 111 Check open filename arg using perf trace + vfs_getname fails on s390. This is caused by a failing function bpf_probe_read() in file util/bpf_skel/augmented_raw_syscalls.bpf.c. The root cause is the lookup by address. Function bpf_probe_read() is used. This function works only for architectures with ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE. On s390 is not possible to determine from the address to which address space the address belongs to (user or kernel space). Replace bpf_probe_read() by bpf_probe_read_kernel() and bpf_probe_read_str() by bpf_probe_read_user_str() to explicity specify the address space the address refers to. Output before: # ./perf trace -eopen,openat -- touch /tmp/111 libbpf: prog 'sys_enter': BPF program load failed: Invalid argument libbpf: prog 'sys_enter': -- BEGIN PROG LOAD LOG -- reg type unsupported for arg#0 function sys_enter#75 0: R1=ctx(off=0,imm=0) R10=fp0 ; int sys_enter(struct syscall_enter_args args) 0: (bf) r6 = r1 ; R1=ctx(off=0,imm=0) R6_w=ctx(off=0,imm=0) ; return bpf_get_current_pid_tgid(); 1: (85) call bpf_get_current_pid_tgid#14 ; R0_w=scalar() 2: (63) (u32 )(r10 -8) = r0 ; R0_w=scalar() R10=fp0 fp-8=????mmmm 3: (bf) r2 = r10 ; R2_w=fp0 R10=fp0 ; ..... lines deleted here ..... 23: (bf) r3 = r6 ; R3_w=ctx(off=0,imm=0) R6=ctx(off=0,imm=0) 24: (85) call bpf_probe_read#4 unknown func bpf_probe_read#4 processed 23 insns (limit 1000000) max_states_per_insn 0 \ total_states 2 peak_states 2 mark_read 2 -- END PROG LOAD LOG -- libbpf: prog 'sys_enter': failed to load: -22 libbpf: failed to load object 'augmented_raw_syscalls_bpf' libbpf: failed to load BPF skeleton 'augmented_raw_syscalls_bpf': -22 .... Output after: # ./perf test -Fv 111 111: Check open filename arg using perf trace + vfs_getname : --- start --- 1.085 ( 0.011 ms): touch/320753 openat(dfd: CWD, filename: \ "/tmp/temporary_file.SWH85", \ flags: CREAT\|NOCTTY\|NONBLOCK\|WRONLY, mode: IRUGO\|IWUGO) = 3 ---- end ---- Check open filename arg using perf trace + vfs_getname: Ok # Test with the sleep command shows: Output before: # ./perf trace -e sleep sleep 1.234567890 0.000 (1234.681 ms): sleep/63114 clock_nanosleep(rqtp: \ { .tv_sec: 0, .tv_nsec: 0 }, rmtp: 0x3ffe0979720) = 0 # Output after: # ./perf trace -e *sleep sleep 1.234567890 0.000 (1234.686 ms): sleep/64277 clock_nanosleep(rqtp: \ { .tv_sec: 1, .tv_nsec: 234567890 }, rmtp: 0x3fff3df9ea0) = 0 # Fixes: `14e4b9f428` ("perf trace: Raw augmented syscalls fix libbpf 1.0+ compatibility") Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Co-developed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Ilya Leoshkevich <iii@linux.ibm.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: gor@linux.ibm.com Cc: hca@linux.ibm.com Cc: sumanthk@linux.ibm.com Cc: svens@linux.ibm.com Link: https://lore.kernel.org/r/20231019082642.3286650-1-tmricht@linux.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-10-19 22:41:46 -07:00
Oliver Upton	e2bdd172e6	perf build: Generate arm64's sysreg-defs.h and add to include path Start generating sysreg-defs.h in anticipation of updating sysreg.h to a version that needs the generated output. Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231011195740.3349631-3-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2023-10-18 23:36:25 +00:00
Namhyung Kim	1f36b190ad	perf tools: Do not ignore the default vmlinux.h The recent change made it possible to generate vmlinux.h from BTF and to ignore the file. But we also have a minimal vmlinux.h that will be used by default. It should not be ignored by GIT. Fixes: `b7a2d774c9` ("perf build: Add ability to build with a generated vmlinux.h") Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Ian Rogers <irogers@google.com> Closes: https://lore.kernel.org/oe-kbuild-all/202310110451.rvdUZJEY-lkp@intel.com/ Cc: oe-kbuild-all@lists.linux.dev Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-10-18 15:35:20 -07:00
Changbin Du	9a13ee457a	perf: script: fix missing ',' for fields option A comma is missed at the end of line. Signed-off-by: Changbin Du <changbin.du@huawei.com> Link: https://lore.kernel.org/r/20231017015524.797065-1-changbin.du@huawei.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-10-17 12:40:51 -07:00
Athira Rajeev	a20fca2c5d	perf tests: Fix shellcheck warning in stat_all_metricgroups Running shellcheck on stat_all_metricgroups.sh reports below warning: In ./tests/shell/stat_all_metricgroups.sh line 7: function ParanoidAndNotRoot() ^-- SC2112: 'function' keyword is non-standard. Delete it. As per the format, "function" is a non-standard keyword that can be used to declare functions. Fix this by removing the "function" keyword from ParanoidAndNotRoot function Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: kjain@linux.ibm.com Cc: maddy@linux.ibm.com Cc: disgoel@linux.vnet.ibm.com Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20231013073021.99794-4-atrajeev@linux.vnet.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-10-17 12:40:51 -07:00
Athira Rajeev	eff65ee26e	perf tests: Fix shellcheck warning in record_sideband.sh Running shellcheck on record_sideband.sh throws below warning: In tests/shell/record_sideband.sh line 25: if ! perf record -o ${perfdata} -BN --no-bpf-event -C $1 true 2>&1 >/dev/null ^--^ SC2069: To redirect stdout+stderr, 2>&1 must be last (or use '{ cmd > file; } 2>&1' to clarify). This shows shellcheck warning SC2069 where the redirection order needs to be fixed. Use "cmd > /dev/null 2>&1" to fix the redirection of perf record output Fixes: `23b97c7ee9` ("perf test: Add test case for record sideband events") Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Reviewed-by: Kajol Jain <kjain@linux.ibm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: disgoel@linux.vnet.ibm.com Link: https://lore.kernel.org/r/20231013073021.99794-3-atrajeev@linux.vnet.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-10-17 12:40:51 -07:00
Athira Rajeev	47f5693c4c	perf tests: Ignore shellcheck warning in lock_contention Running shellcheck on lock_contention.sh generates below warning In tests/shell/lock_contention.sh line 36: if [ `nproc` -lt 4 ]; then ^-----^ SC2046: Quote this to prevent word splitting. Here since nproc will generate a single word output and there is no possibility of word splitting, this warning can be ignored. Use exception for this with "disable" option in shellcheck. This warning is observed after commit: "commit `29441ab3a3` ("perf test lock_contention.sh: Skip test if not enough CPUs")" Fixes: `29441ab3a3` ("perf test lock_contention.sh: Skip test if not enough CPUs") Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Reviewed-by: Kajol Jain <kjain@linux.ibm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: maddy@linux.ibm.com Cc: disgoel@linux.vnet.ibm.com Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20231013073021.99794-2-atrajeev@linux.vnet.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-10-17 12:40:51 -07:00
Athira Rajeev	f6a66ff98a	tools/perf/arch/powerpc: Fix the CPU ID const char* value by adding 0x prefix Simple expression parser test fails in powerpc as below: 4: Simple expression parser test child forked, pid 170385 Using CPUID 004e2102 division by zero syntax error syntax error FAILED tests/expr.c:65 parse test failed test child finished with -1 Simple expression parser: FAILED! This is observed after commit: 'commit `9d5da30e4a` ("perf jevents: Add a new expression builtin strcmp_cpuid_str()")' With this commit, a new expression builtin strcmp_cpuid_str got added. This function takes an 'ID' type value, which is a string. So expression parse for strcmp_cpuid_str expects const char * as cpuid value type. In case of powerpc, CPU IDs are numbers. Hence it doesn't get interpreted correctly by bison parser. Example in case of power9, cpuid string returns as: 004e2102 cpuid of string type is expected in two cases: 1. char get_cpuid_str(struct perf_pmu pmu __maybe_unused); Testcase "tests/expr.c" uses "perf_pmu__getcpuid" which calls get_cpuid_str to get the cpuid string. 2. cpuid field in :struct pmu_events_map struct pmu_events_map { const char arch; const char cpuid; Here cpuid field is used in "perf_pmu__find_events_table" function as "strcmp_cpuid_str(map->cpuid, cpuid)". The value for cpuid field is picked from mapfile.csv. Fix the mapfile.csv and get_cpuid_str function to prefix cpuid with 0x so that it gets correctly interpreted by the bison parser Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Disha Goel<disgoel@linux.ibm.com> Cc: kjain@linux.ibm.com Cc: maddy@linux.ibm.com Cc: disgoel@linux.vnet.ibm.com Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20231009050052.64935-1-atrajeev@linux.vnet.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-10-17 12:40:51 -07:00
Leo Yan	78efa7b411	perf cs-etm: Respect timestamp option When users pass the option '--timestamp' or '-T' in the record command, all events will set the PERF_SAMPLE_TIME bit in the attribution. In this case, the AUX event will record the kernel timestamp, but it doesn't mean Arm CoreSight enables timestamp packets in its hardware tracing. If the option '--timestamp' or '-T' is set, this patch always enables Arm CoreSight timestamp, as a result, the bit 28 in event's config is to be set. Before: # perf record -e cs_etm// --per-thread --timestamp -- ls # perf script --header-only ... # event : name = cs_etm//, , id = { 69 }, type = 12, size = 136, config = 0, { sample_period, sample_freq } = 1, sample_type = IP\|TID\|TIME\|CPU\|IDENTIFIER, read_format = ID\|LOST, disabled = 1, enable_on_exec = 1, sample_id_all = 1, exclude_guest = 1 ... After: # perf record -e cs_etm// --per-thread --timestamp -- ls # perf script --header-only ... # event : name = cs_etm//, , id = { 49 }, type = 12, size = 136, config = 0x10000000, { sample_period, sample_freq } = 1, sample_type = IP\|TID\|TIME\|CPU\|IDENTIFIER, read_format = ID\|LOST, disabled = 1, enable_on_exec = 1, sample_id_all = 1, exclude_guest = 1 ... Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: James Clark <james.clark@arm.com> Acked-by: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: John Garry <john.g.garry@oracle.com> Cc: linux-arm-kernel@lists.infradead.org Cc: coresight@lists.linaro.org Link: https://lore.kernel.org/r/20231014074159.1667880-3-leo.yan@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-10-17 12:40:51 -07:00
Leo Yan	f8ccc2d5cc	perf cs-etm: Validate timestamp tracing in per-thread mode So far, it's impossible to validate timestamp trace in Arm CoreSight when the perf is in the per-thread mode. E.g. for the command: perf record -e cs_etm/timestamp/ --per-thread -- ls The command enables config 'timestamp' for 'cs_etm' event in the per-thread mode. In this case, the function cs_etm_validate_config() directly bails out and skips validation. Given profiled process can be scheduled on any CPUs in the per-thread mode, this patch validates timestamp tracing for all CPUs when detect the CPU map is empty. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: James Clark <james.clark@arm.com> Acked-by: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: John Garry <john.g.garry@oracle.com> Cc: linux-arm-kernel@lists.infradead.org Cc: coresight@lists.linaro.org Link: https://lore.kernel.org/r/20231014074159.1667880-2-leo.yan@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-10-17 12:40:51 -07:00
Ian Rogers	0197da7aff	perf pmu: Lazily compute default config The default config is computed during creation of the PMU and may do things like scanning sysfs, when the PMU may just be used as part of scanning. Change default_config to perf_event_attr_init_default, a callback that is used when a default config needs initializing. This avoids holding onto the memory for a perf_event_attr and copying. On a tigerlake laptop running the pmu-scan benchmark: Before: Running 'internals/pmu-scan' benchmark: Computing performance of sysfs PMU event scan for 100 times Average core PMU scanning took: 28.780 usec (+- 0.503 usec) Average PMU scanning took: 283.480 usec (+- 18.471 usec) Number of openat syscalls: 30,227 After: Running 'internals/pmu-scan' benchmark: Computing performance of sysfs PMU event scan for 100 times Average core PMU scanning took: 27.880 usec (+- 0.169 usec) Average PMU scanning took: 245.260 usec (+- 15.758 usec) Number of openat syscalls: 28,914 Over 3 runs it is a nearly 12% reduction in execution time and a 4.3% of openat calls. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Will Deacon <will@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: John Garry <john.g.garry@oracle.com> Cc: linux-arm-kernel@lists.infradead.org Cc: coresight@lists.linaro.org Link: https://lore.kernel.org/r/20231012175645.1849503-8-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-10-17 12:40:50 -07:00
Ian Rogers	f20c15d13f	perf pmu-events: Remember the perf_events_map for a PMU strcmp_cpuid_str performs regular expression comparisons and so per CPUID linear searches over the perf_events_map are expensive. Add a helper function called map_for_pmu that does the search but also caches the map specific to a PMU. As the PMU may differ, also cache the CPUID string so that PMUs with the same CPUID string don't require the linear search and regular expression comparisons. This speeds loading PMUs as the search is done once per PMU to find the appropriate tables. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Yang Jihong <yangjihong1@huawei.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: John Garry <john.g.garry@oracle.com> Cc: linux-arm-kernel@lists.infradead.org Cc: coresight@lists.linaro.org Link: https://lore.kernel.org/r/20231012175645.1849503-7-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-10-17 12:40:50 -07:00
Ian Rogers	63883cb063	perf pmu: Const-ify perf_pmu__config_terms Add const to related APIs, this is so they can be used to default initialize a perf_event_attr from a const pmu. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Will Deacon <will@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: John Garry <john.g.garry@oracle.com> Cc: linux-arm-kernel@lists.infradead.org Cc: coresight@lists.linaro.org Link: https://lore.kernel.org/r/20231012175645.1849503-6-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-10-17 12:40:50 -07:00
Ian Rogers	3a42f4c796	perf pmu: Const-ify file APIs File APIs don't alter the struct pmu so allow const ones to be passed. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Will Deacon <will@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: John Garry <john.g.garry@oracle.com> Cc: linux-arm-kernel@lists.infradead.org Cc: coresight@lists.linaro.org Link: https://lore.kernel.org/r/20231012175645.1849503-5-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-10-17 12:40:50 -07:00
Ian Rogers	672bd21390	perf arm-spe: Move PMU initialization from default config code Avoid setting PMU values in arm_spe_pmu_default_config, move to perf_pmu__arch_init. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Leo Yan <leo.yan@linaro.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Will Deacon <will@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: John Garry <john.g.garry@oracle.com> Cc: linux-arm-kernel@lists.infradead.org Cc: coresight@lists.linaro.org Link: https://lore.kernel.org/r/20231012175645.1849503-4-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-10-17 12:40:50 -07:00
Ian Rogers	461e3e636a	perf intel-pt: Move PMU initialization from default config code Avoid setting PMU values in intel_pt_pmu_default_config, move to perf_pmu__arch_init. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Will Deacon <will@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: John Garry <john.g.garry@oracle.com> Cc: linux-arm-kernel@lists.infradead.org Cc: coresight@lists.linaro.org Link: https://lore.kernel.org/r/20231012175645.1849503-3-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-10-17 12:40:50 -07:00
Ian Rogers	aa61360155	perf pmu: Rename perf_pmu__get_default_config to perf_pmu__arch_init Assign default_config as part of the init. perf_pmu__get_default_config was doing more than just getting the default config and so this is intended to better align with the code. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Will Deacon <will@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: John Garry <john.g.garry@oracle.com> Cc: linux-arm-kernel@lists.infradead.org Cc: coresight@lists.linaro.org Link: https://lore.kernel.org/r/20231012175645.1849503-2-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-10-17 12:40:50 -07:00
Adrian Hunter	661ce78105	perf intel-pt: Prefer get_unaligned_le64 to memcpy_le64 Use get_unaligned_le64() instead of memcpy_le64(..., 8) because it produces simpler code. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20231005190451.175568-6-adrian.hunter@intel.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-10-17 12:40:50 -07:00
Adrian Hunter	3b4fa67fc6	perf intel-pt: Use get_unaligned_le16() etc Avoid unaligned access by using get_unaligned_le16(), get_unaligned_le32() and get_unaligned_le64(). Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20231005190451.175568-5-adrian.hunter@intel.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-10-17 12:40:49 -07:00
Adrian Hunter	f058fa5b07	perf intel-pt: Use existing definitions of le16_to_cpu() etc Use definitions from tools/include/linux/kernel.h Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20231005190451.175568-4-adrian.hunter@intel.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-10-17 12:40:49 -07:00
Adrian Hunter	1d2dbce9bb	perf intel-pt: Simplify intel_pt_get_vmcs() Simplify and remove unnecessary constant expressions. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20231005190451.175568-3-adrian.hunter@intel.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-10-17 12:40:49 -07:00
Adrian Hunter	a91c987254	perf tools: Add get_unaligned_leNN() Add get_unaligned_le16(), get_unaligned_le32 and get_unaligned_le64, same as include/asm-generic/unaligned.h. And add include/asm-generic/unaligned.h to check-headers.sh bringing tools/include/asm-generic/unaligned.h up to date so that the kernel and tools versions match. Use diagnostic pragmas to ignore -Wpacked used by perf build. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20231005190451.175568-2-adrian.hunter@intel.com Link: https://lore.kernel.org/r/20231010142234.20061-1-adrian.hunter@intel.com [ squashed check-header.sh addition ] Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-10-17 12:40:02 -07:00
Besar Wicaksono	a16afcc58a	perf cs-etm: Fix incorrect or missing decoder for raw trace The decoder creation for raw trace uses metadata from the first CPU. On per-cpu mode, this metadata is incorrectly used for every decoder. On per-process/per-thread traces, the first CPU is CPU0. If CPU0 trace is not enabled, its metadata will be marked unused and the decoder is not created. Perf report dump skips the decoding part because the decoder is missing. To fix this, use metadata of the CPU associated with sample object. Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com> Reviewed-by: James Clark <james.clark@arm.com> Cc: suzuki.poulose@arm.com Cc: mike.leach@linaro.org Cc: jonathanh@nvidia.com Cc: rwiley@nvidia.com Cc: treding@nvidia.com Cc: vsethi@nvidia.com Cc: ywan@nvidia.com Cc: linux-arm-kernel@lists.infradead.org Cc: coresight@lists.linaro.org Cc: linux-tegra@vger.kernel.org Link: https://lore.kernel.org/r/20231010234803.5419-1-bwicaksono@nvidia.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2023-10-12 10:01:57 -07:00

1 2 3 4 5 ...

15745 Commits