f7b58cbdb3
The 'perf mem' and 'perf c2c' tools are wrappers around 'perf record' with mem
load/store events. IBS tagged load/store sample provides most of the
information needed for these tools. Wire in the "ibs_op//" event as mem-ldst
event for AMD.

There are some limitations though: Only load/store micro-ops provide mem/c2c
information. Whereas, IBS does not have a way to choose a particular type of
micro-op to tag. This results in many non-LS micro-ops being tagged which
appear as N/A in the perf report. IBS, being an uncore pmu from kernel point
of view[1], does not support per process monitoring. Thus, perf mem/c2c on
AMD are currently supported in per-cpu mode only.

Example:

  $ sudo perf mem record -- -c 10000
  ^C[ perf record: Woken up 227 times to write data ]
  [ perf record: Captured and wrote 58.760 MB perf.data (836978 samples) ]

  $ sudo perf mem report -F mem,sample,snoop
  Samples: 836K of event 'ibs_op//', Event count (approx.): 8418762

  Memory access                  Samples  Snoop
  N/A                             700620  N/A
  L1 hit                          126675  N/A
  L2 hit                             424  N/A
  L3 hit                             664  HitM
  L3 hit                              10  N/A
  Local RAM hit                        2  N/A
  Remote RAM (1 hop) hit            8558  N/A
  Remote Cache (1 hop) hit             3  N/A
  Remote Cache (1 hop) hit             2  HitM
  Remote Cache (2 hops) hit           10  HitM
  Remote Cache (2 hops) hit            6  N/A
  Uncached hit                         4  N/A
  $

[1]: https://lore.kernel.org/lkml/20220829113347.295-1-ravi.bangoria@amd.com

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ali Saidi <alisaidi@amazon.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Santosh Shukla <santosh.shukla@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Link: https://lore.kernel.org/r/20221006153946.7816-6-ravi.bangoria@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
97 lines
2.3 KiB
Plaintext
perf-mem(1)
===========

NAME
----
perf-mem - Profile memory accesses

SYNOPSIS
--------
[verse]
'perf mem' [<options>] (record [<command>] | report)

DESCRIPTION
-----------
"perf mem record" runs a command and gathers memory operation data
from it into perf.data. Perf record options are accepted and are passed through.

"perf mem report" displays the result. It invokes perf report with the
right set of options to display a memory access profile. By default, loads
and stores are sampled. Use the -t option to limit to loads or stores.
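
The invocation shape described above can be sketched as a small Python helper that only assembles an argument list. The helper name `build_perf_mem_record` and the `./workload` command are hypothetical, and the option placement is an assumption taken from the SYNOPSIS (options before `record`, record options after it); the result has not been checked against any particular perf build.

```python
import shlex


def build_perf_mem_record(command, op_type=None, cpus=None, ldlat=None):
    """Assemble an argv list for 'perf mem [<options>] record [<command>]'.

    Hypothetical helper: it only builds the list and never runs perf.
    """
    argv = ["perf", "mem"]
    if op_type is not None:
        # -t limits sampling to loads or stores (default: load,store).
        if op_type not in ("load", "store"):
            raise ValueError("type must be 'load' or 'store'")
        argv += ["-t", op_type]
    if cpus is not None:
        # -C takes a comma-separated list and/or ranges, e.g. "0,1" or "0-2".
        argv += ["-C", cpus]
    argv.append("record")
    if ldlat is not None:
        # --ldlat is a record option (Intel and Arm64 only per this page).
        argv += ["--ldlat", str(ldlat)]
    argv += list(command)
    return argv


argv = build_perf_mem_record(["./workload"], op_type="load", cpus="0-2", ldlat=30)
print(shlex.join(argv))
```

The printed line can be pasted into a shell, or the list can be handed to `subprocess.run` on a system where perf is installed.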

Note that on Intel systems the memory latency reported is the use-latency,
not the pure load (or store) latency. Use-latency includes any pipeline
queueing delays in addition to the memory subsystem latency.

OPTIONS
-------
<command>...::
Any command you can specify in a shell.

-i::
--input=<file>::
Input file name.

-f::
--force::
Don't do ownership validation.

-t::
--type=<type>::
Select the memory operation type: load or store (default: load,store).

-D::
--dump-raw-samples::
Dump the raw decoded samples on the screen in a format that is easy to parse with
one sample per line.

-x::
--field-separator=<separator>::
Specify the field separator used when dumping raw samples (-D option). By default,
the separator is the space character.

-C::
--cpu=<cpu>::
Monitor only on the list of CPUs provided. Multiple CPUs can be provided as a
comma-separated list with no space: 0,1. Ranges of CPUs are specified with -: 0-2. Default
is to monitor all CPUs.

-U::
--hide-unresolved::
Only display entries resolved to a symbol.

-p::
--phys-data::
Record/Report sample physical addresses.

--data-page-size::
Record/Report sample data address page size.
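
To illustrate the CPU list syntax accepted by -C (comma-separated entries with no space, and ranges written with -), here is a minimal Python sketch that expands such a list into explicit CPU numbers. The function name `expand_cpu_list` is hypothetical, and the strict rejection of whitespace is an assumption based on the "no space" wording above; perf's own parser may behave differently.

```python
def expand_cpu_list(spec):
    """Expand a CPU list such as '0,1' or '0-2' into sorted CPU numbers.

    Hypothetical helper mirroring the syntax described for -C/--cpu.
    """
    # The list must not contain spaces ("comma-separated list with no space").
    if any(ch.isspace() for ch in spec):
        raise ValueError(f"CPU list must not contain spaces: {spec!r}")
    cpus = set()
    for part in spec.split(","):
        if "-" in part:
            # A range such as "0-2" covers both endpoints.
            first, last = part.split("-", 1)
            low, high = int(first), int(last)
            if low > high:
                raise ValueError(f"descending range in CPU list: {part!r}")
            cpus.update(range(low, high + 1))
        else:
            # int() raises ValueError on empty or non-numeric entries.
            cpus.add(int(part))
    return sorted(cpus)


print(expand_cpu_list("0,1"))
print(expand_cpu_list("0-2"))
```

Returning a sorted list of unique numbers keeps overlapping entries such as `0-2,1` from producing duplicates.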

RECORD OPTIONS
--------------
-e::
--event <event>::
Event selector. Use 'perf mem record -e list' to list available events.

-K::
--all-kernel::
Configure all used events to run in kernel space.

-U::
--all-user::
Configure all used events to run in user space.

-v::
--verbose::
Be more verbose (show counter open errors, etc.).

--ldlat <n>::
Specify the desired latency for load events. Supported on Intel and Arm64
processors only. Ignored on other architectures.

In addition, all perf report options are valid for report, and all perf record
options are valid for record.

SEE ALSO
--------
linkperf:perf-record[1], linkperf:perf-report[1]