linux

iv/linux

Author	SHA1	Message	Date
Ian Rogers	d5bcade989	perf test: Rename builtin-test-list and add missed header guard builtin-test-list is primarily concerned with shell script tests. Rename the file to better reflect this and add a missed header guard. Signed-off-by: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: Justin Stitt <justinstitt@google.com> Cc: Bill Wendling <morbo@google.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: llvm@lists.linux.dev Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240221034155.1500118-6-irogers@google.com	2024-02-22 09:12:40 -08:00
Ian Rogers	526f2ac9f6	perf tests: Avoid fork in perf_has_symbol test perf test -vv Symbols is used to indentify symbols within the perf binary. Add the -F flag so that the test command doesn't fork the test before running. This removes a little overhead. Acked-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: Justin Stitt <justinstitt@google.com> Cc: Bill Wendling <morbo@google.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: llvm@lists.linux.dev Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240221034155.1500118-4-irogers@google.com	2024-02-22 09:12:04 -08:00
Ian Rogers	8ece26ad5a	perf list: Add scandirat compatibility function scandirat is used during the printing of tracepoint events but may be missing from certain libcs. Add a compatibility implementation that uses the symlink of an fd in /proc as a path for the reliably present scandir. Signed-off-by: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: Justin Stitt <justinstitt@google.com> Cc: Bill Wendling <morbo@google.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: llvm@lists.linux.dev Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240221034155.1500118-3-irogers@google.com	2024-02-22 09:11:41 -08:00
Ian Rogers	510e528786	perf thread_map: Skip exited threads when scanning /proc Scanning /proc is inherently racy. Scanning /proc/pid/task within that is also racy as the pid can terminate. Rather than failing in __thread_map__new_all_cpus, skip pids for such failures. Signed-off-by: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: Justin Stitt <justinstitt@google.com> Cc: Bill Wendling <morbo@google.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: llvm@lists.linux.dev Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240221034155.1500118-2-irogers@google.com	2024-02-22 09:11:03 -08:00
Thomas Richter	b6968f9b50	perf list: fix short description for some cache events Correct the short description of the following events: DCW_REQ, DCW_REQ_CHIP_HIT, DCW_REQ_DRAWER_HIT, DCW_REQ_IV, DCW_ON_CHIP, DCW_ON_CHIP_IV, DCW_ON_CHIP_CHIP_HIT, DCW_ON_CHIP_DRAWER_HIT, CW_ON_MODULE, DCW_ON_DRAWER, DCW_OFF_DRAWER, IDCW_ON_MODULE_IV, IDCW_ON_MODULE_CHIP_HIT, IDCW_ON_MODULE_DRAWER_HIT, IDCW_ON_DRAWER_IV, IDCW_ON_DRAWER_CHIP_HIT, IDCW_ON_DRAWER_DRAWER_HIT, IDCW_OFF_DRAWER_IV, IDCW_OFF_DRAWER_CHIP_HIT, IDCW_OFF_DRAWER_DRAWER_HIT, ICW_REQ, ICW_REQ_IV, CW_REQ_CHIP_HIT, ICW_REQ_DRAWER_HIT, ICW_ON_CHIP, ICW_ON_CHIP_IV, ICW_ON_CHIP_CHIP_HIT, ICW_ON_CHIP_DRAWER_HIT, ICW_ON_MODULE and ICW_OFF_DRAWER. The second Cache should be L2-Cache. Output before (display diff of the first four events) # perf list -d DCW_REQ [Directory Write Level 1 Data Cache from Cache. Unit: cpum_cf] DCW_REQ_CHIP_HIT [Directory Write Level 1 Data Cache from Cache with Chip HP \ Hit. Unit: cpum_cf] DCW_REQ_DRAWER_HIT [Directory Write Level 1 Data Cache from Cache with Drawer \ HP Hit. Unit: cpum_cf] DCW_REQ_IV [Directory Write Level 1 Data Cache from Cache with Intervention. \ Unit: cpum_cf] Output after: # perf list -d DCW_REQ [Directory Write Level 1 Data Cache from L2-Cache. Unit: cpum_cf] DCW_REQ_CHIP_HIT [Directory Write Level 1 Data Cache from L2-Cache with Chip HP \ Hit. Unit: cpum_cf] DCW_REQ_DRAWER_HIT [Directory Write Level 1 Data Cache from L2-Cache with Drawer \ HP Hit. Unit: cpum_cf] DCW_REQ_IV [Directory Write Level 1 Data Cache from L2-Cache with \ Intervention. Unit: cpum_cf] Fixes: 7f76b3113068 ("perf list: Add IBM z16 event description for s390") Reported-by: Andreas Krebbel <krebbel@linux.ibm.com> Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Andreas Krebbel <krebbel@linux.ibm.com> Reviewed-by: Ian Rogers <irogers@google.com> Cc: gor@linux.ibm.com Cc: hca@linux.ibm.com Cc: sumanthk@linux.ibm.com Cc: svens@linux.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240221091908.1759083-1-tmricht@linux.ibm.com	2024-02-22 09:02:59 -08:00
Ian Rogers	bafd4e75c1	perf stat: Fix metric-only aggregation index Aggregation index was being computed using the evsel's cpumap which may have a different (typically the same or fewer) entries. Before: ``` $ perf stat --metric-only -A -M memory_bandwidth_total -a sleep 1 Performance counter stats for 'system wide': MB/s memory_bandwidth_total MB/s memory_bandwidth_total MB/s memory_bandwidth_total MB/s memory_bandwidth_total MB/s memory_bandwidth_total MB/s memory_bandwidth_total CPU0 12.8 0.0 12.9 12.7 0.0 12.6 CPU1 1.007806367 seconds time elapsed ``` After: ``` $ perf stat --metric-only -A -M memory_bandwidth_total -a sleep 1 Performance counter stats for 'system wide': MB/s memory_bandwidth_total MB/s memory_bandwidth_total MB/s memory_bandwidth_total MB/s memory_bandwidth_total MB/s memory_bandwidth_total MB/s memory_bandwidth_total CPU0 15.4 0.0 15.3 15.0 0.0 14.9 CPU18 0.0 0.0 13.5 5.2 0.0 11.9 1.007858736 seconds time elapsed ``` Signed-off-by: Ian Rogers <irogers@google.com> \| Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Kaige Ye <ye@kaige.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: John Garry <john.g.garry@oracle.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240221070754.4163916-3-irogers@google.com	2024-02-22 08:57:29 -08:00
Ian Rogers	a59fb796a3	perf metrics: Compute unmerged uncore metrics individually When merging counts from multiple uncore PMUs the metric is only computed for the metric leader. When merging/aggregation is disabled, prior to this patch just the leader's metric would be computed. Fix this by computing the metric for each PMU. On a SkylakeX: Before: ``` $ perf stat -A -M memory_bandwidth_total -a sleep 1 Performance counter stats for 'system wide': CPU0 82,217 UNC_M_CAS_COUNT.RD [uncore_imc_0] # 9.2 MB/s memory_bandwidth_total CPU18 0 UNC_M_CAS_COUNT.RD [uncore_imc_0] # 0.0 MB/s memory_bandwidth_total CPU0 61,395 UNC_M_CAS_COUNT.WR [uncore_imc_0] CPU18 0 UNC_M_CAS_COUNT.WR [uncore_imc_0] CPU0 0 UNC_M_CAS_COUNT.RD [uncore_imc_1] CPU18 0 UNC_M_CAS_COUNT.RD [uncore_imc_1] CPU0 0 UNC_M_CAS_COUNT.WR [uncore_imc_1] CPU18 0 UNC_M_CAS_COUNT.WR [uncore_imc_1] CPU0 81,570 UNC_M_CAS_COUNT.RD [uncore_imc_2] CPU18 113,886 UNC_M_CAS_COUNT.RD [uncore_imc_2] CPU0 62,330 UNC_M_CAS_COUNT.WR [uncore_imc_2] CPU18 66,942 UNC_M_CAS_COUNT.WR [uncore_imc_2] CPU0 75,489 UNC_M_CAS_COUNT.RD [uncore_imc_3] CPU18 27,958 UNC_M_CAS_COUNT.RD [uncore_imc_3] CPU0 55,864 UNC_M_CAS_COUNT.WR [uncore_imc_3] CPU18 38,727 UNC_M_CAS_COUNT.WR [uncore_imc_3] CPU0 0 UNC_M_CAS_COUNT.RD [uncore_imc_4] CPU18 0 UNC_M_CAS_COUNT.RD [uncore_imc_4] CPU0 0 UNC_M_CAS_COUNT.WR [uncore_imc_4] CPU18 0 UNC_M_CAS_COUNT.WR [uncore_imc_4] CPU0 75,423 UNC_M_CAS_COUNT.RD [uncore_imc_5] CPU18 104,527 UNC_M_CAS_COUNT.RD [uncore_imc_5] CPU0 57,596 UNC_M_CAS_COUNT.WR [uncore_imc_5] CPU18 56,777 UNC_M_CAS_COUNT.WR [uncore_imc_5] CPU0 1,003,440,851 ns duration_time 1.003440851 seconds time elapsed ``` After: ``` $ perf stat -A -M memory_bandwidth_total -a sleep 1 Performance counter stats for 'system wide': CPU0 88,968 UNC_M_CAS_COUNT.RD [uncore_imc_0] # 9.5 MB/s memory_bandwidth_total CPU18 0 UNC_M_CAS_COUNT.RD [uncore_imc_0] # 0.0 MB/s memory_bandwidth_total CPU0 59,498 UNC_M_CAS_COUNT.WR [uncore_imc_0] CPU18 0 UNC_M_CAS_COUNT.WR [uncore_imc_0] CPU0 0 UNC_M_CAS_COUNT.RD [uncore_imc_1] # 0.0 MB/s memory_bandwidth_total CPU18 0 UNC_M_CAS_COUNT.RD [uncore_imc_1] # 0.0 MB/s memory_bandwidth_total CPU0 0 UNC_M_CAS_COUNT.WR [uncore_imc_1] CPU18 0 UNC_M_CAS_COUNT.WR [uncore_imc_1] CPU0 88,635 UNC_M_CAS_COUNT.RD [uncore_imc_2] # 9.5 MB/s memory_bandwidth_total CPU18 117,975 UNC_M_CAS_COUNT.RD [uncore_imc_2] # 11.5 MB/s memory_bandwidth_total CPU0 60,829 UNC_M_CAS_COUNT.WR [uncore_imc_2] CPU18 62,105 UNC_M_CAS_COUNT.WR [uncore_imc_2] CPU0 82,238 UNC_M_CAS_COUNT.RD [uncore_imc_3] # 8.7 MB/s memory_bandwidth_total CPU18 22,906 UNC_M_CAS_COUNT.RD [uncore_imc_3] # 3.6 MB/s memory_bandwidth_total CPU0 53,959 UNC_M_CAS_COUNT.WR [uncore_imc_3] CPU18 32,990 UNC_M_CAS_COUNT.WR [uncore_imc_3] CPU0 0 UNC_M_CAS_COUNT.RD [uncore_imc_4] # 0.0 MB/s memory_bandwidth_total CPU18 0 UNC_M_CAS_COUNT.RD [uncore_imc_4] # 0.0 MB/s memory_bandwidth_total CPU0 0 UNC_M_CAS_COUNT.WR [uncore_imc_4] CPU18 0 UNC_M_CAS_COUNT.WR [uncore_imc_4] CPU0 83,595 UNC_M_CAS_COUNT.RD [uncore_imc_5] # 8.9 MB/s memory_bandwidth_total CPU18 110,151 UNC_M_CAS_COUNT.RD [uncore_imc_5] # 10.5 MB/s memory_bandwidth_total CPU0 56,540 UNC_M_CAS_COUNT.WR [uncore_imc_5] CPU18 53,816 UNC_M_CAS_COUNT.WR [uncore_imc_5] CPU0 1,003,353,416 ns duration_time ``` Signed-off-by: Ian Rogers <irogers@google.com> \| Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Kaige Ye <ye@kaige.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: John Garry <john.g.garry@oracle.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240221070754.4163916-2-irogers@google.com	2024-02-22 08:57:09 -08:00
Ian Rogers	eee41e6b28	perf stat: Pass fewer metric arguments Pass metric_expr and evsel rather than specific variables from the struct, thereby reducing the number of arguments. This will enable later fixes. To reduce the size of the diff, local variables are added to match the previous parameter names. This isn't done in the case of "name" as evsel->name is more intention revealing. A whitespace issue is also addressed. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Kaige Ye <ye@kaige.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: John Garry <john.g.garry@oracle.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240221070754.4163916-1-irogers@google.com	2024-02-22 08:56:45 -08:00
Changbin Du	659663f0bc	perf: script: prefer capstone to XED Now perf can show assembly instructions with libcapstone for x86, and the capstone is better in general. Signed-off-by: Changbin Du <changbin.du@huawei.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Cc: changbin.du@gmail.com Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240217074046.4100789-6-changbin.du@huawei.com	2024-02-20 18:07:34 -08:00
Changbin Du	6750ba4b64	perf: script: add raw\|disasm arguments to --insn-trace option Now '--insn-trace' accept a argument to specify the output format: - raw: display raw instructions. - disasm: display mnemonic instructions (if capstone is installed). $ sudo perf script --insn-trace=raw ls 1443864 [006] 2275506.209908875: 7f216b426100 _start+0x0 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) insn: 48 89 e7 ls 1443864 [006] 2275506.209908875: 7f216b426103 _start+0x3 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) insn: e8 e8 0c 00 00 ls 1443864 [006] 2275506.209908875: 7f216b426df0 _dl_start+0x0 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) insn: f3 0f 1e fa $ sudo perf script --insn-trace=disasm ls 1443864 [006] 2275506.209908875: 7f216b426100 _start+0x0 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) movq %rsp, %rdi ls 1443864 [006] 2275506.209908875: 7f216b426103 _start+0x3 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) callq _dl_start+0x0 ls 1443864 [006] 2275506.209908875: 7f216b426df0 _dl_start+0x0 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) illegal instruction ls 1443864 [006] 2275506.209908875: 7f216b426df4 _dl_start+0x4 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) pushq %rbp ls 1443864 [006] 2275506.209908875: 7f216b426df5 _dl_start+0x5 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) movq %rsp, %rbp ls 1443864 [006] 2275506.209908875: 7f216b426df8 _dl_start+0x8 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) pushq %r15 Signed-off-by: Changbin Du <changbin.du@huawei.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Cc: changbin.du@gmail.com Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240217074046.4100789-5-changbin.du@huawei.com	2024-02-20 18:07:21 -08:00
Changbin Du	9941723438	perf: script: add field 'disasm' to display mnemonic instructions In addition to the 'insn' field, this adds a new field 'disasm' to display mnemonic instructions instead of the raw code. $ sudo perf script -F +disasm perf-exec 1443864 [006] 2275506.209848: psb: psb offs: 0 0 [unknown] ([unknown]) perf-exec 1443864 [006] 2275506.209848: cbr: cbr: 41 freq: 4100 MHz (114%) 0 [unknown] ([unknown]) ls 1443864 [006] 2275506.209905: 1 branches:uH: 7f216b426100 _start+0x0 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) movq %rsp, %rdi ls 1443864 [006] 2275506.209908: 1 branches:uH: 7f216b426103 _start+0x3 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) callq _dl_start+0x0 Signed-off-by: Changbin Du <changbin.du@huawei.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Cc: changbin.du@gmail.com Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240217074046.4100789-4-changbin.du@huawei.com	2024-02-20 18:07:07 -08:00
Changbin Du	8f0ec15ff6	perf: util: use capstone disasm engine to show assembly instructions Currently, the instructions of samples are shown as raw hex strings which are hard to read. x86 has a special option '--xed' to disassemble the hex string via intel XED tool. Here we use capstone as our disassembler engine to give more friendly instructions. We select libcapstone because capstone can provide more insn details. Perf will fallback to raw instructions if libcapstone is not available. The advantages compared to XED tool: * Support arm, arm64, x86-32, x86_64 (more could be supported), xed only for x86_64. * Immediate address operands are shown as symbol+offs. Signed-off-by: Changbin Du <changbin.du@huawei.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Cc: changbin.du@gmail.com Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240217074046.4100789-3-changbin.du@huawei.com	2024-02-20 18:06:48 -08:00
Changbin Du	8b767db330	perf: build: introduce the libcapstone Later we will use libcapstone to disassemble instructions of samples. Signed-off-by: Changbin Du <changbin.du@huawei.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Cc: changbin.du@gmail.com Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240217074046.4100789-2-changbin.du@huawei.com	2024-02-20 18:06:25 -08:00
Ian Rogers	81377de00f	perf list: For metricgroup only list include description If perf list is invoked with 'metricgroups' include the description unless it is invoked with flags to exclude it. Make the description of metricgroup dumping dependent on the desc flag in print_state as with metrics. Before: ``` $ perf list metricgroups List of pre-defined events (to be used in -e or -M): Metric Groups: Backend Bad BadSpec ... ``` After: ``` $ perf list metricgroups List of pre-defined events (to be used in -e or -M): Metric Groups: Backend [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet] Bad [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet] BadSpec ... ``` Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240216192044.119897-1-irogers@google.com	2024-02-16 16:07:34 -08:00
Namhyung Kim	bacefe0c7b	perf tools: Fixup module symbol end address properly I got a strange error on ARM to fail on processing FINISHED_ROUND record. It turned out that it was failing in symbol__alloc_hist() because the symbol size is too big. When a sample is captured on a specific BPF program, it failed. I've added a debug code and found the end address of the symbol is from the next module which is placed far way. ffff800008795778-ffff80000879d6d8: bpf_prog_1bac53b8aac4bc58_netcg_sock [bpf] ffff80000879d6d8-ffff80000ad656b4: bpf_prog_76867454b5944e15_netcg_getsockopt [bpf] ffff80000ad656b4-ffffd69b7af74048: bpf_prog_1d50286d2eb1be85_hn_egress [bpf] <---------- here ffffd69b7af74048-ffffd69b7af74048: $x.5 [sha3_generic] ffffd69b7af74048-ffffd69b7af740b8: crypto_sha3_init [sha3_generic] ffffd69b7af740b8-ffffd69b7af741e0: crypto_sha3_update [sha3_generic] The logic in symbols__fixup_end() just uses curr->start to update the prev->end. But in this case, it won't work as it's too different. I think ARM has a different kernel memory layout for modules and BPF than on x86. Actually there's a logic to handle kernel and module boundary. Let's do the same for symbols between different modules. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: Leo Yan <leo.yan@linux.dev> Cc: Will Deacon <will@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20240212233322.1855161-1-namhyung@kernel.org	2024-02-16 16:07:28 -08:00
Ian Rogers	6f146b249b	perf vendor events intel: Update tigerlake TMA metrics to 4.7 Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - tma_info_bottleneck* metrics, an abstraction or summarization of the 100+ TMA tree nodes into 12-entry familiar performance metrics. - Reduce number of events (multiplexing) for tma_info_system_gflops, tma_info_core_flopc, tma_info_inst_mix_ipflop and tma_ports_utilized_0. - Fixes for tma_info_bottleneck_mispredictions and tma_info_bad_spec_branch_misprediction_cost. - New tma_info_inst_mix_ippause metric. - tma_serializing_operation is raised to level 3. - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - tma_nop_instructions and tma_shuffles_256b are lowered to level 4 under tma_other_light_ops_group. - Reduced number of events when SMT is off. - Tuned thresholds for tma_info_bottleneck_branching_overhead, tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-31-irogers@google.com	2024-02-16 15:29:11 -08:00
Ian Rogers	e2c8b40e37	perf vendor events intel: Update skylakex TMA metrics to 4.7 Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - tma_info_bottleneck* metrics, an abstraction or summarization of the 100+ TMA tree nodes into 12-entry familiar performance metrics. - Reduce number of events (multiplexing) for tma_info_system_gflops, tma_info_core_flopc, tma_info_inst_mix_ipflop and tma_ports_utilized_0. - Fixes for tma_info_bottleneck_mispredictions and tma_info_bad_spec_branch_misprediction_cost. - tma_serializing_operation is raised to level 3. - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - tma_nop_instructions and tma_shuffles_256b are lowered to level 4 under tma_other_light_ops_group. - Reduced number of events when SMT is off. - Tuned thresholds for tma_info_bottleneck_branching_overhead, tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-30-irogers@google.com	2024-02-16 15:28:59 -08:00
Ian Rogers	f15fa6ba76	perf vendor events intel: Update skylake TMA metrics to 4.7 Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - tma_info_bottleneck* metrics, an abstraction or summarization of the 100+ TMA tree nodes into 12-entry familiar performance metrics. - Reduce number of events (multiplexing) for tma_info_system_gflops, tma_info_core_flopc, tma_info_inst_mix_ipflop and tma_ports_utilized_0. - Fixes for tma_info_bottleneck_mispredictions and tma_info_bad_spec_branch_misprediction_cost. - tma_serializing_operation is raised to level 3. - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - tma_nop_instructions and tma_shuffles_256b are lowered to level 4 under tma_other_light_ops_group. - Reduced number of events when SMT is off. - Tuned thresholds for tma_info_bottleneck_branching_overhead, tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-29-irogers@google.com	2024-02-16 15:28:47 -08:00
Ian Rogers	53c83c79aa	perf vendor events intel: Update sapphirerapids TMA metrics to 4.7 Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - tma_info_bottleneck* metrics, an abstraction or summarization of the 100+ TMA tree nodes into 12-entry familiar performance metrics. - tma_c01_wait and tma_c02_wait metrics measure power-performance states. - Reduce number of events (multiplexing) for tma_info_system_gflops, tma_info_core_flopc, tma_info_inst_mix_ipflop and tma_ports_utilized_0. - Fixes for tma_info_bottleneck_mispredictions and tma_info_bad_spec_branch_misprediction_cost. - New tma_info_inst_mix_ippause metric. - tma_serializing_operation is raised to level 3. - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - tma_nop_instructions and tma_shuffles_256b are lowered to level 4 under tma_other_light_ops_group. - Reduced number of events when SMT is off. - Tuned thresholds for tma_info_bottleneck_branching_overhead, tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-28-irogers@google.com	2024-02-16 15:28:36 -08:00
Ian Rogers	176e66715d	perf vendor events intel: Update sandybridge TMA metrics to 4.7 Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - Add metrics tma_fp_vector_128b, tma_fp_vector_256b and tma_info_system_cpus_utilized. - Remove metrics tma_info_system_mem_parallel_requests, tma_info_system_core_frequency and tma_info_system_mem_request_latency. - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - Tuned thresholds for tma_fetch_bandwidth. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-27-irogers@google.com	2024-02-16 15:28:24 -08:00
Ian Rogers	74f76c3ba7	perf vendor events intel: Update rocketlake TMA metrics to 4.7 Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - tma_info_bottleneck* metrics, an abstraction or summarization of the 100+ TMA tree nodes into 12-entry familiar performance metrics. - Reduce number of events (multiplexing) for tma_info_system_gflops, tma_info_core_flopc, tma_info_inst_mix_ipflop and tma_ports_utilized_0. - Fixes for tma_info_bottleneck_mispredictions and tma_info_bad_spec_branch_misprediction_cost. - New tma_info_inst_mix_ippause metric. - tma_serializing_operation is raised to level 3. - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - tma_nop_instructions and tma_shuffles_256b are lowered to level 4 under tma_other_light_ops_group. - Reduced number of events when SMT is off. - Tuned thresholds for tma_info_bottleneck_branching_overhead, tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-26-irogers@google.com	2024-02-16 15:28:12 -08:00
Ian Rogers	5f9a13bee0	perf vendor events intel: Update jaketown TMA metrics to 4.7 Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - Tuned thresholds for tma_fetch_bandwidth. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-25-irogers@google.com	2024-02-16 15:27:59 -08:00
Ian Rogers	14bc1a59f2	perf vendor events intel: Update ivytown TMA metrics to 4.7 Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - Reduced number of events when SMT is off. - Tuned thresholds for tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-24-irogers@google.com	2024-02-16 15:27:47 -08:00
Ian Rogers	8cf54fa844	perf vendor events intel: Update ivybridge TMA metrics to 4.7 Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - Reduced number of events when SMT is off. - Tuned thresholds for tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-23-irogers@google.com	2024-02-16 15:27:34 -08:00
Ian Rogers	b15cae3f69	perf vendor events intel: Update icelakex TMA metrics to 4.7 Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - tma_info_bottleneck* metrics, an abstraction or summarization of the 100+ TMA tree nodes into 12-entry familiar performance metrics. - Reduce number of events (multiplexing) for tma_info_system_gflops, tma_info_core_flopc, tma_info_inst_mix_ipflop and tma_ports_utilized_0. - Fixes for tma_info_bottleneck_mispredictions and tma_info_bad_spec_branch_misprediction_cost. - New tma_info_inst_mix_ippause metric. - tma_serializing_operation is raised to level 3. - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - tma_nop_instructions and tma_shuffles_256b are lowered to level 4 under tma_other_light_ops_group. - Reduced number of events when SMT is off. - Tuned thresholds for tma_info_bottleneck_branching_overhead, tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-22-irogers@google.com	2024-02-16 15:27:22 -08:00
Ian Rogers	70bfdad63f	perf vendor events intel: Update icelake TMA metrics to 4.7 Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - tma_info_bottleneck* metrics, an abstraction or summarization of the 100+ TMA tree nodes into 12-entry familiar performance metrics. - Reduce number of events (multiplexing) for tma_info_system_gflops, tma_info_core_flopc, tma_info_inst_mix_ipflop and tma_ports_utilized_0. - Fixes for tma_info_bottleneck_mispredictions and tma_info_bad_spec_branch_misprediction_cost. - New tma_info_inst_mix_ippause metric. - tma_serializing_operation is raised to level 3. - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - tma_nop_instructions and tma_shuffles_256b are lowered to level 4 under tma_other_light_ops_group. - Reduced number of events when SMT is off. - Tuned thresholds for tma_info_bottleneck_branching_overhead, tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-21-irogers@google.com	2024-02-16 15:27:07 -08:00
Ian Rogers	2a264a1946	perf vendor events intel: Update haswellx TMA metrics to 4.7 Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - Tuned thresholds for tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-20-irogers@google.com	2024-02-16 15:26:54 -08:00
Ian Rogers	89b66259a7	perf vendor events intel: Update haswell TMA metrics to 4.7 Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - Tuned thresholds for tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-19-irogers@google.com	2024-02-16 15:26:42 -08:00
Ian Rogers	c72a20435a	perf vendor events intel: Update cascadelakex TMA metrics to 4.7 Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - tma_info_bottleneck* metrics, an abstraction or summarization of the 100+ TMA tree nodes into 12-entry familiar performance metrics. - Reduce number of events (multiplexing) for tma_info_system_gflops, tma_info_core_flopc, tma_info_inst_mix_ipflop and tma_ports_utilized_0. - Fixes for tma_info_bottleneck_mispredictions and tma_info_bad_spec_branch_misprediction_cost. - New tma_info_inst_mix_ippause metric. - tma_serializing_operation is raised to level 3. - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - tma_nop_instructions and tma_shuffles_256b are lowered to level 4 under tma_other_light_ops_group. - Reduced number of events when SMT is off. - Tuned thresholds for tma_info_bottleneck_branching_overhead, tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-18-irogers@google.com	2024-02-16 15:26:28 -08:00
Ian Rogers	8792e8f89d	perf vendor events intel: Update broadwellx TMA metrics to 4.7 Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - Reduce number of events (multiplexing) for tma_info_system_gflops, tma_info_core_flopc and tma_info_inst_mix_ipflop. - Removal of tma_info_bad_spec_branch_misprediction_cost. - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - Tuned thresholds for tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-17-irogers@google.com	2024-02-16 15:26:15 -08:00
Ian Rogers	4018680df9	perf vendor events intel: Update broadwellde TMA metrics to 4.7 Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - Reduce number of events (multiplexing) for tma_info_system_gflops, tma_info_core_flopc and tma_info_inst_mix_ipflop. - Removal of tma_info_bad_spec_branch_misprediction_cost. - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - Tuned thresholds for tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-16-irogers@google.com	2024-02-16 15:26:03 -08:00
Ian Rogers	eedd6d0a72	perf vendor events intel: Update broadwell TMA metrics to 4.7 Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - Reduce number of events (multiplexing) for tma_info_system_gflops, tma_info_core_flopc and tma_info_inst_mix_ipflop. - Removal of tma_info_bad_spec_branch_misprediction_cost. - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - Tuned thresholds for tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-15-irogers@google.com	2024-02-16 15:25:51 -08:00
Ian Rogers	52530942ba	perf vendor events intel: Update alderlake TMA metrics to 4.7 Top-Down Microarchitecture Analysis (TMA) metrics simplify cycle-accounting using microarchitecture-abstracted metrics organized in one hierarchy. This update is from version 4.5 to 4.7. The update includes: - tma_info_bottleneck* metrics, an abstraction or summarization of the 100+ TMA tree nodes into 12-entry familiar performance metrics. - tma_c01_wait and tma_c02_wait metrics measure power-performance states. - Reduce number of events (multiplexing) for tma_info_system_gflops, tma_info_core_flopc, tma_info_inst_mix_ipflop and tma_ports_utilized_0. - Fixes for tma_info_bottleneck_mispredictions and tma_info_bad_spec_branch_misprediction_cost. - New tma_info_inst_mix_ippause metric. - tma_serializing_operation is raised to level 3. - Swapped tma_info_core_ilp (becomes per SMT thread) and tma_info_pipeline_execute (per physical core). - tma_nop_instructions and tma_shuffles_256b are lowered to level 4 under tma_other_light_ops_group. - Reduced number of events when SMT is off. - Tuned thresholds for tma_info_bottleneck_branching_overhead, tma_fetch_bandwidth and tma_ports_utilized_3m. The update came from: https://github.com/intel/perfmon/pull/140 https://github.com/intel/perfmon/pull/138 Running the script: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-14-irogers@google.com	2024-02-16 15:25:40 -08:00
Ian Rogers	c4bb31c7b0	perf vendor events intel: Update tigerlake events to v1.15 Update alderlake events to v1.15 released in: `282a6951fd` Documentation fixes, removal of TOPDOWN.BR_MISPREDICT_SLOTS, deprecation of UNC_ARB_DAT_REQUESTS.RD, UNC_ARB_DAT_REQUESTS.RD and UNC_ARB_IFA_OCCUPANCY.ALL. Event json automatically generated by: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-13-irogers@google.com	2024-02-16 15:25:28 -08:00
Ian Rogers	c31d718ca2	perf vendor events intel: Update skylake events to v58 Update skylake events to v58 released in: `625fb75073` Improves documentation. Event json automatically generated by: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-12-irogers@google.com	2024-02-16 15:25:17 -08:00
Ian Rogers	9626368d42	perf vendor events intel: Update sierraforst events to v1.01 Update sierraforest events to v1.01 released in: `582bca24aa` Adds the majority of core and uncore events. Event json automatically generated by: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-11-irogers@google.com	2024-02-16 15:25:06 -08:00
Ian Rogers	8972c03353	perf vendor events intel: Update rocketlake events to v1.02 Update alderlake events to v1.02 released in: `4931178d1e` Improves documentation and removes TOPDOWN.BR_MISPREDICT_SLOTS. Event json automatically generated by: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-10-irogers@google.com	2024-02-16 15:24:54 -08:00
Ian Rogers	1d262a85e2	perf vendor events intel: Update meteorlake events to v1.07 Update meteorlake events to v1.07 released in: `6251722308` Umask changed on atom mem_bound events. Adds atom events ARITH.FPDIV_ACTIVE, FP_FLOPS_RETIRED.ALL, FP_FLOPS_RETIRED.DP, FP_FLOPS_RETIRED.FP32, ARITH.DIV_ACTIVE, BR_INST_RETIRED.COND, BR_INST_RETIRED.COND_TAKEN, BR_INST_RETIRED.INDIRECT, BR_INST_RETIRED.INDIRECT_CALL, BR_INST_RETIRED.IND_CALL, BR_INST_RETIRED.NEAR_RETURN, DTLB_LOAD_MISSES.WALK_COMPLETED_4K, DTLB_STORE_MISSES.WALK_COMPLETED_2M_4M, DTLB_STORE_MISSES.WALK_COMPLETED_4K, ITLB_MISSES.WALK_COMPLETED_4K, and alias events. Event json automatically generated by: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-9-irogers@google.com	2024-02-16 15:24:16 -08:00
Ian Rogers	e8866cdbe1	perf vendor events intel: Update icelake events to v1.21 Update icelake events to v1.21 released in: `54f1246b04` Improves descriptions, removes TOPDOWN.BR_MISPREDICT_SLOTS. Event json automatically generated by: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-8-irogers@google.com	2024-02-16 15:24:04 -08:00
Ian Rogers	f9044d46b7	perf vendor events intel: Update haswell events to v35 Update haswell events to v35 released in: `c0f9b34d42` Updates "must be precise" on RTM_RETIRED.ABORTED. Event json automatically generated by: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: linux-perf-users@vger.kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-7-irogers@google.com	2024-02-16 15:23:53 -08:00
Ian Rogers	24cda3081a	perf vendor events intel: Update grandridge events to v1.01 Update grandridge events to v1.01 released in: `211d607165` Adds the majority of core and uncore events. Event json automatically generated by: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-6-irogers@google.com	2024-02-16 15:23:40 -08:00
Ian Rogers	ea518afc99	perf vendor events intel: Update emeraldrapids events to v1.03 Update emeraldrapids events to v1.03 released in: `c7c6f72dae` Adds uncore CHA events. Event json automatically generated by: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-5-irogers@google.com	2024-02-16 15:23:24 -08:00
Ian Rogers	7163acea30	perf vendor events intel: Update broadwell events to v29 Update broadwell events to v29 released in: `47117146c6` Updates "must be precise" on RTM_RETIRED.ABORTED. Event json automatically generated by: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-4-irogers@google.com	2024-02-16 15:23:07 -08:00
Ian Rogers	5dcc2abaa5	perf vendor events intel: Update alderlaken events to v1.24 Update alderlaken events to v1.24 released in: `e627dd8d89` Adds LBR_INSERTS.ANY/MISC_RETIRED.LBR_INSERTS event. Event json automatically generated by: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-3-irogers@google.com	2024-02-16 15:22:48 -08:00
Ian Rogers	2252ddf434	perf vendor events intel: Update alderlake events to v1.24 Update alderlake events to v1.24 released in: `e627dd8d89` Adds aliased events, improves documentation and fix some event fields. Event json automatically generated by: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240214011820.644458-2-irogers@google.com	2024-02-16 15:22:26 -08:00
Arnaldo Carvalho de Melo	29d16de26d	perf augmented_raw_syscalls.bpf: Move 'struct timespec64' to vmlinux.h If we instead decide to generate vmlinux.h from BTF info, it will be there: $ pahole timespec64 struct timespec64 { time64_t tv_sec; /* 0 8 / long int tv_nsec; / 8 8 / / size: 16, cachelines: 1, members: 2 / / last cacheline: 16 bytes */ }; $ pahole manages to find it from /sys/kernel/btf/vmlinux, that is generated from the kernel types. With this linux/bpf.h doesn't need to be included, as its already in the minimalistic tools/perf/util/bpf_skel/vmlinux/vmlinux.h file or what we need comes when generating a vmlinux.h file from BTF info, i.e. when using GEN_VMLINUX_H=1, as noticed by Namyung in a build break before removing linux/bpf.h. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/Zc_fp6CgDClPhS_O@x1	2024-02-16 15:19:57 -08:00
Michael Petlan	f512e08fd0	perf testsuite: Install kprobe tests and common files Signed-off-by: Michael Petlan <mpetlan@redhat.com> Cc: kjain@linux.ibm.com Cc: atrajeev@linux.vnet.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240215110231.15385-8-mpetlan@redhat.com	2024-02-16 11:50:02 -08:00
Veronika Molnarova	e7d759f31c	perf testsuite: Add test for kprobe handling Test perf interface to kprobes: listing, adding and removing probes. It is run as a part of perftool-testsuite_probe test case. Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com> Signed-off-by: Michael Petlan <mpetlan@redhat.com> Cc: kjain@linux.ibm.com Cc: atrajeev@linux.vnet.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240215110231.15385-7-mpetlan@redhat.com	2024-02-16 11:49:47 -08:00
Veronika Molnarova	61d348f1e9	perf testsuite: Add common output checking helpers As a form of validation, it is a common practice to check the outputs of commands whether they contain expected patterns or match a certain regex. Add helpers for verifying that all regexes are found in the output, that all lines match any pattern from a set and that a certain expression is not present in the output. In verbose mode these helpers log mismatches for easier failure investigation. Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com> Signed-off-by: Michael Petlan <mpetlan@redhat.com> Cc: kjain@linux.ibm.com Cc: atrajeev@linux.vnet.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240215110231.15385-6-mpetlan@redhat.com	2024-02-16 11:49:36 -08:00
Veronika Molnarova	c8eb2a9ff8	perf testsuite: Add test case for perf probe Add new perf probe test case that acts as an entry element in perf test list. Runs multiple subtests from directory "base_probe", which will be added in incomming patches and can be expanded without further editing. Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com> Signed-off-by: Michael Petlan <mpetlan@redhat.com> Cc: kjain@linux.ibm.com Cc: atrajeev@linux.vnet.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240215110231.15385-5-mpetlan@redhat.com	2024-02-16 11:49:22 -08:00

... 5 6 7 8 9 ...

16183 Commits