e89eaa611c
The Intel hybrid description is written in a different style than the rest of the perf record man page. There were some new command line options added after it which resulted in very strange section ordering. Move the hybrid include last.

Also the sub sections in the hybrid document don't fit the record manpage well (especially since it talks about all kinds of unrelated commands). I left this for now, but would be better to separate this properly in the different man pages.

It would be better to use sub sections for the other sections, but these don't seem to be supported in AsciiDoc?

Some of the examples are still misrendered in the manpage with an indented troff command, but I don't know how to fix that. In any case it's now better than before.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: zhengjun.xing@intel.com
Link: https://lore.kernel.org/r/20220818100127.249401-1-ak@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
205 lines
6.9 KiB
Plaintext
Intel hybrid support
--------------------
Support for Intel hybrid events within perf tools.

Some Intel platforms, such as Alder Lake, are hybrid platforms that
consist of atom CPUs and core CPUs. Each CPU type has a dedicated event
list. Some events are available only on core CPUs, some only on atom
CPUs, and some on both.

The kernel exports two new CPU PMUs via sysfs:
/sys/devices/cpu_core
/sys/devices/cpu_atom

A 'cpus' file is created under each of these directories. For example,

cat /sys/devices/cpu_core/cpus
0-15

cat /sys/devices/cpu_atom/cpus
16-23

This indicates that cpu0-cpu15 are core CPUs and cpu16-cpu23 are atom CPUs.

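The 'cpus' files use the usual sysfs CPU list notation. The following is a minimal sketch, not part of perf, of how such a string could be expanded into individual CPU numbers. The function name parse_cpu_list is hypothetical, and the sample strings are the ones from the example above.

```python
def parse_cpu_list(text):
    """Expand a CPU list string such as "0-15" or "0,2-3" into sorted CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate an empty file or a stray comma
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


# Strings as shown for cpu_core/cpus and cpu_atom/cpus in the example.
core_cpus = parse_cpu_list("0-15")
atom_cpus = parse_cpu_list("16-23")
print("core:", core_cpus)
print("atom:", atom_cpus)
```

On a real system the input would be read from /sys/devices/cpu_core/cpus and /sys/devices/cpu_atom/cpus instead of being hard-coded.
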
As before, use perf-list to list the symbolic events.

perf list

inst_retired.any
[Fixed Counter: Counts the number of instructions retired. Unit: cpu_atom]
inst_retired.any
[Number of instructions retired. Fixed Counter - architectural event. Unit: cpu_core]

'Unit: xxx' is added to the brief description to indicate which PMU
the event belongs to. The same event name can be supported on
different PMUs.

Enable hybrid event with a specific pmu

To enable a core-only or atom-only event, the following syntax is supported:

cpu_core/<event name>/
or
cpu_atom/<event name>/

For example, to count the 'cycles' event on core CPUs:

perf stat -e cpu_core/cycles/

Create two events for one hardware event automatically

When an event is requested that is available on both atom and core,
two events are created automatically: one for atom and one for core.
Most hardware events and cache events are available on both
cpu_core and cpu_atom.

Hardware events have pre-defined configs (e.g. 0 for cycles).
On a hybrid platform, however, the kernel needs to know which PMU the
event is for (atom or core). The original perf event type
PERF_TYPE_HARDWARE can't carry PMU information, so this type is now
extended to be a PMU-aware type. The PMU type ID is stored at
attr.config[63:32].

The PMU type ID is retrieved from sysfs:
/sys/devices/cpu_atom/type
/sys/devices/cpu_core/type

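As an illustration only, the type files can be read with a few lines of Python. The helper names read_pmu_type and hybrid_pmu_types are hypothetical, and the sketch assumes that each 'type' file holds a single decimal integer.

```python
from pathlib import Path


def read_pmu_type(pmu, root="/sys/devices"):
    """Return the integer PMU type ID stored in <root>/<pmu>/type."""
    return int((Path(root) / pmu / "type").read_text().strip())


def hybrid_pmu_types(root="/sys/devices"):
    """Map each hybrid PMU found under root to its type ID."""
    types = {}
    for pmu in ("cpu_core", "cpu_atom"):
        if (Path(root) / pmu / "type").is_file():
            types[pmu] = read_pmu_type(pmu, root)
    return types


if __name__ == "__main__":
    # On a non-hybrid system neither directory exists and the dict is empty.
    print(hybrid_pmu_types())
```

The root parameter exists only so the helpers can be pointed at a copy of the sysfs tree, for example when experimenting on a machine without hybrid PMUs.
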
The new attr.config layout for PERF_TYPE_HARDWARE:

PERF_TYPE_HARDWARE: 0xEEEEEEEE000000AA
AA: hardware event ID
EEEEEEEE: PMU type ID

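The layout above can be sketched as a small helper that packs a PMU type ID and a hardware event ID into one config value. This is an illustration, not code from perf. The function name hw_config is hypothetical, and the PMU type IDs 4 and 8 are simply the values that appear in this document's example, not fixed constants.

```python
PERF_COUNT_HW_CPU_CYCLES = 0    # hardware event ID for 'cycles'
PERF_COUNT_HW_INSTRUCTIONS = 1  # hardware event ID for 'instructions'
PMU_TYPE_SHIFT = 32             # PMU type ID occupies config[63:32]


def hw_config(pmu_type, event_id):
    """Build a PMU-aware PERF_TYPE_HARDWARE config: 0xEEEEEEEE000000AA."""
    if not 0 <= pmu_type <= 0xFFFFFFFF:
        raise ValueError("PMU type ID must fit in 32 bits")
    if not 0 <= event_id <= 0xFF:
        raise ValueError("hardware event ID must fit in the AA byte")
    return (pmu_type << PMU_TYPE_SHIFT) | event_id


# Example PMU type IDs (assumed here: 4 and 8) combined with 'cycles'.
print(hex(hw_config(4, PERF_COUNT_HW_CPU_CYCLES)))
print(hex(hw_config(8, PERF_COUNT_HW_CPU_CYCLES)))
```
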
Cache events are similar. The type PERF_TYPE_HW_CACHE is extended to be
a PMU-aware type. The PMU type ID is stored at attr.config[63:32].

The new attr.config layout for PERF_TYPE_HW_CACHE:

PERF_TYPE_HW_CACHE: 0xEEEEEEEE00DDCCBB
BB: hardware cache ID
CC: hardware cache op ID
DD: hardware cache op result ID
EEEEEEEE: PMU type ID

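The cache layout can be sketched the same way. The function name hw_cache_config is hypothetical, and the PMU type IDs used in the calls are example values only. The cache, op and result IDs follow the perf_event_open(2) enumeration (L1D is 0, L1I is 1, read is 0, access is 0, miss is 1).

```python
PERF_COUNT_HW_CACHE_L1D = 0
PERF_COUNT_HW_CACHE_L1I = 1
PERF_COUNT_HW_CACHE_OP_READ = 0
PERF_COUNT_HW_CACHE_RESULT_ACCESS = 0
PERF_COUNT_HW_CACHE_RESULT_MISS = 1


def hw_cache_config(pmu_type, cache_id, op_id, result_id):
    """Build a PMU-aware PERF_TYPE_HW_CACHE config: 0xEEEEEEEE00DDCCBB."""
    for name, value in (("cache", cache_id), ("op", op_id), ("result", result_id)):
        if not 0 <= value <= 0xFF:
            raise ValueError(f"{name} ID must fit in one byte")
    if not 0 <= pmu_type <= 0xFFFFFFFF:
        raise ValueError("PMU type ID must fit in 32 bits")
    # BB = cache ID, CC = op ID, DD = op result ID, EEEEEEEE = PMU type ID
    return (pmu_type << 32) | (result_id << 16) | (op_id << 8) | cache_id


# L1 instruction cache read accesses on a PMU whose type ID is assumed to be 8.
config = hw_cache_config(8, PERF_COUNT_HW_CACHE_L1I,
                         PERF_COUNT_HW_CACHE_OP_READ,
                         PERF_COUNT_HW_CACHE_RESULT_ACCESS)
print(format(config, "#018x"))
```
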
When a hardware event is enabled without a specified PMU, for example
with perf stat -e cycles -a (system-wide in this example), two events
are created automatically.

------------------------------------------------------------
perf_event_attr:
  size                             120
  config                           0x400000000
  sample_type                      IDENTIFIER
  read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
  disabled                         1
  inherit                          1
  exclude_guest                    1
------------------------------------------------------------

and

------------------------------------------------------------
perf_event_attr:
  size                             120
  config                           0x800000000
  sample_type                      IDENTIFIER
  read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
  disabled                         1
  inherit                          1
  exclude_guest                    1
------------------------------------------------------------

The event type is 0, which is PERF_TYPE_HARDWARE.
0x4 in 0x400000000 indicates the cpu_core PMU.
0x8 in 0x800000000 indicates the cpu_atom PMU (the atom PMU type ID is
assigned dynamically, so it may differ on other systems).

The kernel creates 'cycles' (0x400000000) on cpu0-cpu15 (core CPUs)
and 'cycles' (0x800000000) on cpu16-cpu23 (atom CPUs).

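Going the other way, a config value can be split back into its PMU type ID and event ID. The sketch below is illustrative only. The names split_hw_config, describe and EXAMPLE_PMU_NAMES are hypothetical, and the mapping of type IDs 4 and 8 to PMU names reflects this example, not a general rule.

```python
# PMU type IDs as they appear in this example; on a real system
# they are read from /sys/devices/<pmu>/type.
EXAMPLE_PMU_NAMES = {4: "cpu_core", 8: "cpu_atom"}


def split_hw_config(config):
    """Split a PMU-aware PERF_TYPE_HARDWARE config into (PMU type ID, event ID)."""
    pmu_type = config >> 32   # config[63:32]
    event_id = config & 0xFF  # the AA byte of the documented layout
    return pmu_type, event_id


def describe(config):
    """Return a one-line, human-readable description of a config value."""
    pmu_type, event_id = split_hw_config(config)
    name = EXAMPLE_PMU_NAMES.get(pmu_type, "unknown")
    return f"config {config:#x}: pmu={name} (type {pmu_type}), event id {event_id}"


for cfg in (0x400000000, 0x800000000):
    print(describe(cfg))
```
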
The perf-stat result displays two events:

 Performance counter stats for 'system wide':

         6,744,979      cpu_core/cycles/
         1,965,552      cpu_atom/cycles/

The first 'cycles' is the core event; the second 'cycles' is the atom event.

Thread mode example:

perf-stat reports scaled counts for hybrid events, together with a
percentage. The percentage is the ratio of the event's running time to
its enabled time.

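The arithmetic behind the scaled count and the percentage can be sketched as follows. This assumes the usual scaling based on the TOTAL_TIME_ENABLED and TOTAL_TIME_RUNNING values requested in read_format. The function names, parameter names and input numbers are hypothetical and are not taken from any perf run.

```python
def scaled_count(raw_count, time_enabled, time_running):
    """Extrapolate a raw count to the full enabled time."""
    if time_running == 0:
        return 0  # the event never ran, so there is nothing to scale
    return round(raw_count * time_enabled / time_running)


def running_percentage(time_enabled, time_running):
    """Percentage of the enabled time during which the event was counting."""
    if time_enabled == 0:
        return 0.0
    return 100.0 * time_running / time_enabled


# Made-up numbers: the event counted for 50 of the 200 time units it was enabled.
print(scaled_count(1000, 200, 50))
print(f"({running_percentage(200, 50):.2f}%)")
```

A small percentage means the event was scheduled on its PMU for only a small part of the time the task was running, so the scaled count is an extrapolation from a short measurement window.
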
In this example, 'triad_loop' runs on cpu16 (an atom CPU), yet a scaled
value of 233,066,666 is still reported for core cycles, with a
percentage of 0.43%.

perf stat -e cycles \-- taskset -c 16 ./triad_loop

As before, two events are created.

------------------------------------------------------------
perf_event_attr:
  size                             120
  config                           0x400000000
  sample_type                      IDENTIFIER
  read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
  disabled                         1
  inherit                          1
  enable_on_exec                   1
  exclude_guest                    1
------------------------------------------------------------

and

------------------------------------------------------------
perf_event_attr:
  size                             120
  config                           0x800000000
  sample_type                      IDENTIFIER
  read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
  disabled                         1
  inherit                          1
  enable_on_exec                   1
  exclude_guest                    1
------------------------------------------------------------

 Performance counter stats for 'taskset -c 16 ./triad_loop':

       233,066,666      cpu_core/cycles/                (0.43%)
       604,097,080      cpu_atom/cycles/                (99.57%)

perf-record:

If no '-e' is specified to perf record on a hybrid platform,
it creates two default 'cycles' events and adds them to the event
list: one for core and one for atom.

perf-stat:

If no '-e' is specified to perf stat on a hybrid platform, then in
addition to the software events, the following events are created and
added to the event list in this order:

cpu_core/cycles/,
cpu_atom/cycles/,
cpu_core/instructions/,
cpu_atom/instructions/,
cpu_core/branches/,
cpu_atom/branches/,
cpu_core/branch-misses/,
cpu_atom/branch-misses/

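The ordering of this default list (each event first for cpu_core, then for cpu_atom) can be reproduced with a short sketch. It is an illustration of the ordering only, not how perf builds the list internally, and all names in it are hypothetical.

```python
from itertools import product

DEFAULT_EVENTS = ["cycles", "instructions", "branches", "branch-misses"]
HYBRID_PMUS = ["cpu_core", "cpu_atom"]


def default_hybrid_events(events=DEFAULT_EVENTS, pmus=HYBRID_PMUS):
    """Return 'pmu/event/' strings, iterating over the PMUs for each event in turn."""
    return [f"{pmu}/{event}/" for event, pmu in product(events, pmus)]


# The joined list has the same form as an explicit -e argument.
print(",".join(default_hybrid_events()))
```
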
Both perf-stat and perf-record also support enabling a hybrid event
with a specific PMU.

e.g.
perf stat -e cpu_core/cycles/
perf stat -e cpu_atom/cycles/
perf stat -e cpu_core/r1a/
perf stat -e cpu_atom/L1-icache-loads/
perf stat -e cpu_core/cycles/,cpu_atom/instructions/
perf stat -e '{cpu_core/cycles/,cpu_core/instructions/}'

However, '{cpu_core/cycles/,cpu_atom/instructions/}' produces a
warning and disables grouping, because the PMUs in the group do
not match (cpu_core vs. cpu_atom).

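The grouping rule can be checked before running perf with a sketch like the following. It is not perf's own parser. The function names are hypothetical, and the comma split is a simplification that only handles simple 'pmu/event/' members such as the ones above, not events that carry comma-separated parameters.

```python
def event_pmu(event):
    """Return the PMU prefix of a 'pmu/event/' string, or None if there is none."""
    event = event.strip()
    if "/" not in event:
        return None
    return event.split("/", 1)[0]


def parse_group(group):
    """Split a '{a,b}' group string into its member events."""
    inner = group.strip()
    if inner.startswith("{") and inner.endswith("}"):
        inner = inner[1:-1]
    # Simplification: assumes no commas inside an individual event.
    return [member.strip() for member in inner.split(",") if member.strip()]


def group_pmus_match(group):
    """True if every event in the group names the same PMU."""
    pmus = {event_pmu(member) for member in parse_group(group)}
    return len(pmus) <= 1


for group in ("{cpu_core/cycles/,cpu_core/instructions/}",
              "{cpu_core/cycles/,cpu_atom/instructions/}"):
    status = "ok" if group_pmus_match(group) else "mixed PMUs, grouping would be disabled"
    print(group, "->", status)
```
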