perf-mem(1)
===========
NAME
----
perf-mem - Profile memory accesses
SYNOPSIS
--------
[verse]
'perf mem' [<options>] (record [<command>] | report)
DESCRIPTION
-----------
"perf mem record" runs a command and gathers memory operation data
from it into perf.data. Perf record options are accepted and are passed through.

"perf mem report" displays the result. It invokes perf report with the
right set of options to display a memory access profile. By default, loads
and stores are sampled. Use the -t option to limit to loads or stores.

Note that on Intel systems the memory latency reported is the use-latency,
not the pure load (or store) latency. Use-latency includes any pipeline
queueing delays in addition to the memory subsystem latency.

On Arm64 this uses SPE to sample load and store operations, therefore hardware
and kernel support is required. See linkperf:perf-arm-spe[1] for a setup guide.
Due to the statistical nature of SPE sampling, not every memory operation will
be sampled.
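As a rough illustration of the record and report workflow described above, the following Python sketch assembles argument lists that follow the synopsis, with the -t option placed before the subcommand. The helper name perf_mem_argv and the ./workload command are hypothetical and chosen only for this example. The sketch only builds the command lines; it does not run perf.

```python
import shlex


def perf_mem_argv(subcommand, mem_type=None, command=None):
    """Build an argument list shaped like the synopsis:

    perf mem [<options>] (record [<command>] | report)

    perf_mem_argv is a hypothetical helper, not part of perf.
    """
    if subcommand not in ("record", "report"):
        raise ValueError("subcommand must be 'record' or 'report'")
    if mem_type not in (None, "load", "store"):
        raise ValueError("mem_type must be 'load', 'store' or None")

    argv = ["perf", "mem"]
    if mem_type is not None:
        # -t limits sampling to loads or stores; without it both are sampled.
        argv += ["-t", mem_type]
    argv.append(subcommand)
    if subcommand == "record" and command:
        # The profiled command follows "record".
        argv += list(command)
    return argv


# "./workload" is a placeholder for the program to profile.
record_argv = perf_mem_argv("record", mem_type="load", command=["./workload"])
report_argv = perf_mem_argv("report", mem_type="load")

print(shlex.join(record_argv))
print(shlex.join(report_argv))
```

On a system with perf installed, either list could be passed to subprocess.run().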
OPTIONS
-------
<command>...::
Any command you can specify in a shell.
-i::
--input=<file>::
Input file name.
-f::
--force::
Don't do ownership validation.
-t::
--type=<type>::
Select the memory operation type: load or store (default: load,store).
-D::
--dump-raw-samples::
Dump the raw decoded samples on the screen in a format that is easy to parse,
with one sample per line.
-x::
--field-separator=<separator>::
Specify the field separator used when dumping raw samples (-D option). By default,
the separator is the space character.
-C::
--cpu=<cpu>::
Monitor only the CPUs in the provided list. Multiple CPUs can be provided as a
comma-separated list with no spaces: 0,1. Ranges of CPUs are specified with -: 0-2.
The default is to monitor all CPUs.
-U::
--hide-unresolved::
Only display entries resolved to a symbol.
-p::
--phys-data::
Record/Report sample physical addresses.
--data-page-size::
Record/Report the page size of the sample data address.
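Output produced with -D and a custom -x separator can be post-processed by a script. The sketch below is one possible approach, assuming a comma was chosen as the separator. The column names, the helper name parse_raw_samples and the sample line are assumptions made for illustration. The actual columns depend on the perf version and on options such as --phys-data and --data-page-size, so check the "#" header line in real output.

```python
# Column names are assumed for illustration; real output may differ.
ASSUMED_COLUMNS = ["pid", "tid", "ip", "addr", "local_weight", "dsrc", "symbol"]


def parse_raw_samples(text, separator=",", columns=ASSUMED_COLUMNS):
    """Turn one-sample-per-line text into a list of dicts keyed by column name."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "#" comment/header lines.
        if not line or line.startswith("#"):
            continue
        # Limit the number of splits so the last field stays intact even if
        # it happens to contain the separator.
        fields = line.split(separator, len(columns) - 1)
        if len(fields) != len(columns):
            continue  # not a line in the assumed layout
        samples.append(dict(zip(columns, fields)))
    return samples


# Hypothetical sample text, not captured from a real perf run.
SAMPLE_OUTPUT = """\
# hypothetical header line
1234,1234,0x401136,0x7ffd5010,12,0x68100142,/usr/bin/demo:main
"""

samples = parse_raw_samples(SAMPLE_OUTPUT)
for sample in samples:
    print(sample["pid"], sample["addr"], int(sample["local_weight"]), sample["symbol"])
```

All values are kept as strings; callers convert the fields they need, as done for local_weight in the loop.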
RECORD OPTIONS
--------------
-e::
--event <event>::
Event selector. Use 'perf mem record -e list' to list available events.
-K::
--all-kernel::
Configure all used events to run in kernel space.
-U::
--all-user::
Configure all used events to run in user space.
-v::
--verbose::
Be more verbose (show counter open errors, etc.).
--ldlat <n>::
Specify the desired latency for the load event. Supported on Intel and Arm64
processors only. Ignored on other architectures.

In addition, all perf report options are valid for report, and all perf record
options are valid for record.
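The CPU list syntax accepted by -C/--cpu (described under OPTIONS) can be illustrated with a small Python sketch that expands such a list into individual CPU numbers. The helper name expand_cpu_list is hypothetical, and mixing single CPUs and ranges in one list (for example 0-2,4) is an assumption of this sketch. It mirrors the documented syntax only and is not how perf itself parses the option.

```python
def expand_cpu_list(spec):
    """Expand a CPU list such as "0,1" or "0-2" into a sorted list of ints.

    expand_cpu_list is a hypothetical helper, not part of perf.
    """
    # The list must not contain spaces.
    if any(ch.isspace() for ch in spec):
        raise ValueError("CPU list must not contain whitespace")

    cpus = set()
    for part in spec.split(","):
        if "-" in part:
            # A range "first-last" includes both end points.
            first, last = part.split("-", 1)
            low, high = int(first), int(last)
            if low > high:
                raise ValueError("invalid CPU range: " + part)
            cpus.update(range(low, high + 1))
        else:
            # int() raises ValueError for empty or non-numeric entries.
            cpus.add(int(part))
    return sorted(cpus)


print(expand_cpu_list("0,1"))
print(expand_cpu_list("0-2"))
```

A script could use such a helper to validate a CPU list before passing it to -C.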
SEE ALSO
--------
linkperf:perf-record[1], linkperf:perf-report[1], linkperf:perf-arm-spe[1]