perf doc: Refresh topdown documentation

perf stat now supports --topdown for any platform with the TopdownL1 metric group including Intel before Icelake. Tweak the documentation to reflect this.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-stm32@st-md-mailman.stormreply.com
Link: https://lore.kernel.org/r/20230219092848.639226-43-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

parent 7b86475f02
commit 20cb10eadb

@@ -394,10 +394,10 @@ See perf list output for the possible metrics and metricgroups.
 Do not aggregate counts across all monitored CPUs.
 
 --topdown::
-Print complete top-down metrics supported by the CPU. This allows to
-determine bottle necks in the CPU pipeline for CPU bound workloads,
-by breaking the cycles consumed down into frontend bound, backend bound,
-bad speculation and retiring.
+Print top-down metrics supported by the CPU. This allows to determine
+bottle necks in the CPU pipeline for CPU bound workloads, by breaking
+the cycles consumed down into frontend bound, backend bound, bad
+speculation and retiring.
 
 Frontend bound means that the CPU cannot fetch and decode instructions fast
 enough. Backend bound means that computation or memory access is the bottle
@@ -430,15 +430,18 @@ CPUs the workload runs on. If needed the CPUs can be forced using
 taskset.
 
 --td-level::
-Print the top-down statistics that equal to or lower than the input level.
-It allows users to print the interested top-down metrics level instead of
-the complete top-down metrics.
+Print the top-down statistics that equal the input level. It allows
+users to print the interested top-down metrics level instead of the
+level 1 top-down metrics.
 
-The availability of the top-down metrics level depends on the hardware. For
-example, Ice Lake only supports L1 top-down metrics. The Sapphire Rapids
-supports both L1 and L2 top-down metrics.
+As the higher levels gather more metrics and use more counters they
+will be less accurate. By convention a metric can be examined by
+appending '_group' to it and this will increase accuracy compared to
+gathering all metrics for a level. For example, level 1 analysis may
+highlight 'tma_frontend_bound'. This metric may be drilled into with
+'tma_frontend_bound_group' with
+'perf stat -M tma_frontend_bound_group...'.
 
-Default: 0 means the max level that the current hardware support.
 Error out if the input is higher than the supported max level.
 
 --no-merge::
@@ -1,46 +1,35 @@
-Using TopDown metrics in user space
------------------------------------
+Using TopDown metrics
+---------------------
 
-Intel CPUs (since Sandy Bridge and Silvermont) support a TopDown
-methodology to break down CPU pipeline execution into 4 bottlenecks:
-frontend bound, backend bound, bad speculation, retiring.
+TopDown metrics break apart performance bottlenecks. Starting at level
+1 it is typical to get metrics on retiring, bad speculation, frontend
+bound, and backend bound. Higher levels provide more detail in to the
+level 1 bottlenecks, such as at level 2: core bound, memory bound,
+heavy operations, light operations, branch mispredicts, machine
+clears, fetch latency and fetch bandwidth. For more details see [1][2][3].
 
-For more details on Topdown see [1][5]
+perf stat --topdown implements this using available metrics that vary
+per architecture.
 
-Traditionally this was implemented by events in generic counters
-and specific formulas to compute the bottlenecks.
+% perf stat -a --topdown -I1000
+# time % tma_retiring % tma_backend_bound % tma_frontend_bound % tma_bad_speculation
+1.001141351 11.5 34.9 46.9 6.7
+2.006141972 13.4 28.1 50.4 8.1
+3.010162040 12.9 28.1 51.1 8.0
+4.014009311 12.5 28.6 51.8 7.2
+5.017838554 11.8 33.0 48.0 7.2
+5.704818971 14.0 27.5 51.3 7.3
+...
 
-perf stat --topdown implements this.
-
-Full Top Down includes more levels that can break down the
-bottlenecks further. This is not directly implemented in perf,
-but available in other tools that can run on top of perf,
-such as toplev[2] or vtune[3]
-
-New Topdown features in Ice Lake
-===============================
+New Topdown features in Intel Ice Lake
+======================================
 
 With Ice Lake CPUs the TopDown metrics are directly available as
 fixed counters and do not require generic counters. This allows
 to collect TopDown always in addition to other events.
 
-% perf stat -a --topdown -I1000
-# time retiring bad speculation frontend bound backend bound
-1.001281330 23.0% 15.3% 29.6% 32.1%
-2.003009005 5.0% 6.8% 46.6% 41.6%
-3.004646182 6.7% 6.7% 46.0% 40.6%
-4.006326375 5.0% 6.4% 47.6% 41.0%
-5.007991804 5.1% 6.3% 46.3% 42.3%
-6.009626773 6.2% 7.1% 47.3% 39.3%
-7.011296356 4.7% 6.7% 46.2% 42.4%
-8.012951831 4.7% 6.7% 47.5% 41.1%
-...
-
-This also enables measuring TopDown per thread/process instead
-of only per core.
-
-Using TopDown through RDPMC in applications on Ice Lake
-======================================================
+Using TopDown through RDPMC in applications on Intel Ice Lake
+=============================================================
 
 For more fine grained measurements it can be useful to
 access the new directly from user space. This is more complicated,
@@ -301,8 +290,8 @@ This "opens" a new measurement period.
 A program using RDPMC for TopDown should schedule such a reset
 regularly, as in every few seconds.
 
-Limits on Ice Lake
-==================
+Limits on Intel Ice Lake
+========================
 
 Four pseudo TopDown metric events are exposed for the end-users,
 topdown-retiring, topdown-bad-spec, topdown-fe-bound and topdown-be-bound.
@@ -318,8 +307,8 @@ a sampling read group. Since the SLOTS event must be the leader of a TopDown
 group, the second event of the group is the sampling event.
 For example, perf record -e '{slots, $sampling_event, topdown-retiring}:S'
 
-Extension on Sapphire Rapids Server
-===================================
+Extension on Intel Sapphire Rapids Server
+=========================================
 The metrics counter is extended to support TMA method level 2 metrics.
 The lower half of the register is the TMA level 1 metrics (legacy).
 The upper half is also divided into four 8-bit fields for the new level 2
@@ -338,7 +327,6 @@ other four level 2 metrics by subtracting corresponding metrics as below.
 
 
 [1] https://software.intel.com/en-us/top-down-microarchitecture-analysis-method-win
-[2] https://github.com/andikleen/pmu-tools/wiki/toplev-manual
-[3] https://software.intel.com/en-us/intel-vtune-amplifier-xe
+[2] https://sites.google.com/site/analysismethods/yasin-pubs
+[3] https://perf.wiki.kernel.org/index.php/Top-Down_Analysis
 [4] https://github.com/andikleen/pmu-tools/tree/master/jevents
-[5] https://sites.google.com/site/analysismethods/yasin-pubs
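
The updated perf-stat.txt and topdown.txt text in the diff above describes a two-step workflow. First, read the level 1 `--topdown` output. Then drill into the highlighted metric by appending '_group' to its name.

The sketch below shows that workflow in Python. It assumes whitespace-separated interval rows whose columns follow the header of the new `perf stat -a --topdown -I1000` example. Real perf output is column-aligned and the available metrics vary per architecture. `parse_interval` and `drill_down` are hypothetical helpers, not part of perf.

```python
# Column order copied from the header of the --topdown example above.
# ASSUMPTION: real output is column-aligned and its metrics vary per
# architecture, so a robust parser should read the header line instead.
COLUMNS = ("tma_retiring", "tma_backend_bound",
           "tma_frontend_bound", "tma_bad_speculation")


def parse_interval(line: str) -> tuple[float, dict[str, float]]:
    """Split one interval row into its timestamp and level 1 percentages."""
    time_str, *values = line.split()
    if len(values) != len(COLUMNS):
        raise ValueError(
            f"expected {len(COLUMNS)} metric columns, got {len(values)}")
    return float(time_str), dict(zip(COLUMNS, map(float, values)))


def drill_down(level1: dict[str, float]) -> str:
    """Return a 'perf stat -M <metric>_group' prefix for the largest bottleneck."""
    worst = max(level1, key=level1.__getitem__)
    # Only the start of the command: the workload or -a still has to be added.
    return f"perf stat -M {worst}_group"


if __name__ == "__main__":
    t, level1 = parse_interval("1.001141351 11.5 34.9 46.9 6.7")
    print(t, drill_down(level1))
```

For the first row of the example, frontend bound (46.9%) dominates, so the sketch suggests `tma_frontend_bound_group`. This matches the drill-down example in the --td-level text.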
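
The Ice Lake limits in the topdown.txt context above require the SLOTS event to lead a TopDown group. When sampling, the sampling event must be the second member, with the group read as a sampling read group (`:S`).

The following Python helper builds such a group string for `perf -e` under those rules. `topdown_group` is hypothetical, not a perf API. `cycles` in the usage line is only an illustrative choice of sampling event.

```python
# The four pseudo TopDown metric events listed in the Ice Lake limits above.
TOPDOWN_EVENTS = ("topdown-retiring", "topdown-bad-spec",
                  "topdown-fe-bound", "topdown-be-bound")


def topdown_group(metrics=TOPDOWN_EVENTS, sampling_event=None):
    """Build a perf event-group string with the SLOTS event as leader.

    If a sampling event is given it becomes the second member and the
    group gets the :S (sampling read) modifier, mirroring the
    perf record example in topdown.txt.
    """
    members = ["slots"]  # SLOTS must always be the group leader
    if sampling_event is not None:
        members.append(sampling_event)
    members.extend(metrics)
    group = "{" + ",".join(members) + "}"
    return group + ":S" if sampling_event is not None else group


if __name__ == "__main__":
    print(topdown_group())
    print(topdown_group(("topdown-retiring",), sampling_event="cycles"))
```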
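
The Sapphire Rapids context in topdown.txt describes the metrics register this way:

* The lower half holds the four level 1 metrics.
* The upper half holds four 8-bit level 2 fields.
* The other four level 2 metrics are obtained by subtraction.

The Python sketch below decodes such a 64-bit value. Several details are assumptions for illustration, because the hunk above cuts off before the exact layout:

* the byte position of each field;
* scaling each byte as a fraction of 0xff;
* the parent/child pairs used for subtraction.

The register value in the example is made up.

```python
# ASSUMPTION: illustrative byte layout; check the full topdown.txt.
LEVEL1_FIELDS = ("retiring", "bad_speculation",
                 "frontend_bound", "backend_bound")          # bytes 0..3
LEVEL2_FIELDS = ("heavy_operations", "branch_mispredicts",
                 "fetch_latency", "memory_bound")            # bytes 4..7

# Derived level 2 metric -> (level 1 parent, measured level 2 child).
DERIVED = {
    "light_operations": ("retiring", "heavy_operations"),
    "machine_clears": ("bad_speculation", "branch_mispredicts"),
    "fetch_bandwidth": ("frontend_bound", "fetch_latency"),
    "core_bound": ("backend_bound", "memory_bound"),
}


def field(metrics: int, index: int) -> float:
    """Return 8-bit field `index` as a fraction of all slots (0xff == 100%)."""
    return ((metrics >> (index * 8)) & 0xFF) / 0xFF


def decode_topdown(metrics: int) -> dict[str, float]:
    """Decode level 1 (lower 32 bits) and level 2 (upper 32 bits) fractions."""
    out = {name: field(metrics, i) for i, name in enumerate(LEVEL1_FIELDS)}
    out.update({name: field(metrics, 4 + i)
                for i, name in enumerate(LEVEL2_FIELDS)})
    # The remaining level 2 metrics are the parent minus its measured child.
    for name, (parent, child) in DERIVED.items():
        out[name] = out[parent] - out[child]
    return out


if __name__ == "__main__":
    # Made-up value; bytes 0..7 = 0x33 0x19 0x66 0x4d 0x11 0x0f 0x44 0x33.
    m = decode_topdown(0x33440F114D661933)
    print(f"frontend_bound={m['frontend_bound']:.3f} "
          f"fetch_bandwidth={m['fetch_bandwidth']:.3f}")
```

Keeping the parent/child pairs in one table makes the subtraction rule explicit and easy to correct if the real pairing differs.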