linux/tools/perf
Ian Rogers a59fb796a3 perf metrics: Compute unmerged uncore metrics individually
When merging counts from multiple uncore PMUs the metric is only
computed for the metric leader. When merging/aggregation is disabled,
prior to this patch just the leader's metric would be computed. Fix
this by computing the metric for each PMU.

On a SkylakeX:
Before:
```
$ perf stat -A -M memory_bandwidth_total -a sleep 1

 Performance counter stats for 'system wide':

CPU0               82,217      UNC_M_CAS_COUNT.RD [uncore_imc_0] #      9.2 MB/s  memory_bandwidth_total
CPU18                   0      UNC_M_CAS_COUNT.RD [uncore_imc_0] #      0.0 MB/s  memory_bandwidth_total
CPU0               61,395      UNC_M_CAS_COUNT.WR [uncore_imc_0]
CPU18                   0      UNC_M_CAS_COUNT.WR [uncore_imc_0]
CPU0                    0      UNC_M_CAS_COUNT.RD [uncore_imc_1]
CPU18                   0      UNC_M_CAS_COUNT.RD [uncore_imc_1]
CPU0                    0      UNC_M_CAS_COUNT.WR [uncore_imc_1]
CPU18                   0      UNC_M_CAS_COUNT.WR [uncore_imc_1]
CPU0               81,570      UNC_M_CAS_COUNT.RD [uncore_imc_2]
CPU18             113,886      UNC_M_CAS_COUNT.RD [uncore_imc_2]
CPU0               62,330      UNC_M_CAS_COUNT.WR [uncore_imc_2]
CPU18              66,942      UNC_M_CAS_COUNT.WR [uncore_imc_2]
CPU0               75,489      UNC_M_CAS_COUNT.RD [uncore_imc_3]
CPU18              27,958      UNC_M_CAS_COUNT.RD [uncore_imc_3]
CPU0               55,864      UNC_M_CAS_COUNT.WR [uncore_imc_3]
CPU18              38,727      UNC_M_CAS_COUNT.WR [uncore_imc_3]
CPU0                    0      UNC_M_CAS_COUNT.RD [uncore_imc_4]
CPU18                   0      UNC_M_CAS_COUNT.RD [uncore_imc_4]
CPU0                    0      UNC_M_CAS_COUNT.WR [uncore_imc_4]
CPU18                   0      UNC_M_CAS_COUNT.WR [uncore_imc_4]
CPU0               75,423      UNC_M_CAS_COUNT.RD [uncore_imc_5]
CPU18             104,527      UNC_M_CAS_COUNT.RD [uncore_imc_5]
CPU0               57,596      UNC_M_CAS_COUNT.WR [uncore_imc_5]
CPU18              56,777      UNC_M_CAS_COUNT.WR [uncore_imc_5]
CPU0        1,003,440,851 ns   duration_time

       1.003440851 seconds time elapsed
```

After:
```
$ perf stat -A -M memory_bandwidth_total -a sleep 1

 Performance counter stats for 'system wide':

CPU0               88,968      UNC_M_CAS_COUNT.RD [uncore_imc_0] #      9.5 MB/s  memory_bandwidth_total
CPU18                   0      UNC_M_CAS_COUNT.RD [uncore_imc_0] #      0.0 MB/s  memory_bandwidth_total
CPU0               59,498      UNC_M_CAS_COUNT.WR [uncore_imc_0]
CPU18                   0      UNC_M_CAS_COUNT.WR [uncore_imc_0]
CPU0                    0      UNC_M_CAS_COUNT.RD [uncore_imc_1] #      0.0 MB/s  memory_bandwidth_total
CPU18                   0      UNC_M_CAS_COUNT.RD [uncore_imc_1] #      0.0 MB/s  memory_bandwidth_total
CPU0                    0      UNC_M_CAS_COUNT.WR [uncore_imc_1]
CPU18                   0      UNC_M_CAS_COUNT.WR [uncore_imc_1]
CPU0               88,635      UNC_M_CAS_COUNT.RD [uncore_imc_2] #      9.5 MB/s  memory_bandwidth_total
CPU18             117,975      UNC_M_CAS_COUNT.RD [uncore_imc_2] #     11.5 MB/s  memory_bandwidth_total
CPU0               60,829      UNC_M_CAS_COUNT.WR [uncore_imc_2]
CPU18              62,105      UNC_M_CAS_COUNT.WR [uncore_imc_2]
CPU0               82,238      UNC_M_CAS_COUNT.RD [uncore_imc_3] #      8.7 MB/s  memory_bandwidth_total
CPU18              22,906      UNC_M_CAS_COUNT.RD [uncore_imc_3] #      3.6 MB/s  memory_bandwidth_total
CPU0               53,959      UNC_M_CAS_COUNT.WR [uncore_imc_3]
CPU18              32,990      UNC_M_CAS_COUNT.WR [uncore_imc_3]
CPU0                    0      UNC_M_CAS_COUNT.RD [uncore_imc_4] #      0.0 MB/s  memory_bandwidth_total
CPU18                   0      UNC_M_CAS_COUNT.RD [uncore_imc_4] #      0.0 MB/s  memory_bandwidth_total
CPU0                    0      UNC_M_CAS_COUNT.WR [uncore_imc_4]
CPU18                   0      UNC_M_CAS_COUNT.WR [uncore_imc_4]
CPU0               83,595      UNC_M_CAS_COUNT.RD [uncore_imc_5] #      8.9 MB/s  memory_bandwidth_total
CPU18             110,151      UNC_M_CAS_COUNT.RD [uncore_imc_5] #     10.5 MB/s  memory_bandwidth_total
CPU0               56,540      UNC_M_CAS_COUNT.WR [uncore_imc_5]
CPU18              53,816      UNC_M_CAS_COUNT.WR [uncore_imc_5]
CPU0        1,003,353,416 ns   duration_time
```

Signed-off-by: Ian Rogers <irogers@google.com>                                  |
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Kaige Ye <ye@kaige.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: John Garry <john.g.garry@oracle.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240221070754.4163916-2-irogers@google.com
2024-02-22 08:57:09 -08:00
..
arch perf parse-regs: Introduce a weak function arch__sample_reg_masks() 2024-02-15 13:48:36 -08:00
bench libperf cpumap: Replace usage of perf_cpu_map__new(NULL) with perf_cpu_map__new_online_cpus() 2023-12-12 14:55:13 -03:00
dlfilters
Documentation perf: script: prefer capstone to XED 2024-02-20 18:07:34 -08:00
include/perf
jvmti
pmu-events perf vendor events intel: Update tigerlake TMA metrics to 4.7 2024-02-16 15:29:11 -08:00
python
scripts perf scripts python arm-cs-trace-disasm.py: Do not ignore disam first sample 2023-12-20 14:31:59 -03:00
tests perf: build: introduce the libcapstone 2024-02-20 18:06:25 -08:00
trace tools headers uapi: Sync linux/stat.h with the kernel sources to pick STATX_MNT_ID_UNIQUE 2024-01-26 10:51:48 -03:00
ui perf: script: prefer capstone to XED 2024-02-20 18:07:34 -08:00
util perf metrics: Compute unmerged uncore metrics individually 2024-02-22 08:57:09 -08:00
.gitignore perf build: Shellcheck support for OUTPUT directory 2023-12-05 15:46:43 -03:00
Build
builtin-annotate.c perf annotate: Add --insn-stat option for debugging 2023-12-23 22:40:17 -03:00
builtin-bench.c
builtin-buildid-cache.c perf buildid-cache: Fix use of uninitialized value 2023-10-12 10:01:56 -07:00
builtin-buildid-list.c
builtin-c2c.c perf mem: Clean up perf_pmus__num_mem_pmus() 2024-01-24 14:05:22 -08:00
builtin-config.c
builtin-daemon.c
builtin-data.c
builtin-diff.c
builtin-evlist.c
builtin-ftrace.c libperf cpumap: Replace usage of perf_cpu_map__new(NULL) with perf_cpu_map__new_online_cpus() 2023-12-12 14:55:13 -03:00
builtin-help.c
builtin-inject.c perf record: Lazy load kernel symbols 2023-11-09 13:49:32 -03:00
builtin-kallsyms.c
builtin-kmem.c
builtin-kvm.c
builtin-kwork.c perf kwork: Fix a build error on 32-bit 2023-11-21 10:02:38 -08:00
builtin-list.c perf list: For metricgroup only list include description 2024-02-16 16:07:34 -08:00
builtin-lock.c perf lock: Fix a memory leak on an error path 2023-11-27 10:21:27 -03:00
builtin-mem.c perf mem: Clean up perf_pmus__num_mem_pmus() 2024-01-24 14:05:22 -08:00
builtin-probe.c
builtin-record.c Merge branch 'perf-tools' into perf-tools-next 2024-02-12 12:19:21 -08:00
builtin-report.c perf tools: Make it possible to see perf's kernel and module memory mappings 2024-02-08 15:49:39 -08:00
builtin-sched.c perf sched: Move curr_pid and cpu_last_switched initialization to perf_sched__{lat|map|replay}() 2024-02-09 14:08:41 -08:00
builtin-script.c perf: script: add raw|disasm arguments to --insn-trace option 2024-02-20 18:07:21 -08:00
builtin-stat.c perf stat: Support per-cluster aggregation 2024-02-09 14:59:53 -08:00
builtin-timechart.c
builtin-top.c Merge branch 'perf-tools' into perf-tools-next 2024-02-12 12:19:21 -08:00
builtin-trace.c perf env: Introduce perf_env__arch_strerrno() 2023-12-04 16:42:09 -03:00
builtin-version.c perf: build: introduce the libcapstone 2024-02-20 18:06:25 -08:00
builtin.h
check-headers.sh perf tools: Add get_unaligned_leNN() 2023-10-17 12:40:02 -07:00
command-list.txt
CREDITS
design.txt
Makefile
Makefile.config perf: build: introduce the libcapstone 2024-02-20 18:06:25 -08:00
Makefile.perf perf: build: introduce the libcapstone 2024-02-20 18:06:25 -08:00
MANIFEST tools perf: Add arm64 sysreg files to MANIFEST 2023-11-22 11:17:53 -08:00
perf-archive.sh perf archive: Add new option '--unpack' to expand tarballs 2023-12-20 13:20:45 -03:00
perf-completion.sh
perf-iostat.sh
perf-read-vdso.c
perf-sys.h
perf.c perf tools: Add --debug-file option to redirect debug output 2023-11-28 14:14:53 -03:00
perf.h