7e55b95651
There is no good reason why we cannot synthesize "cycle" events from Intel PT just as we can synthesize "instruction" events, in particular when CYC packets are available. This enables using PT to get much more accurate cycle profiles than regular sampling (record -e cycles) when the work lasts for very short periods (<10 ms). Thus, add support for this, based off of the existing IPC calculation framework. The new option to --itrace is "y" (for cYcles), as c was taken for calls. Cycle and instruction events can be synthesized together, and are by default.

The only real caveat is that CYC packets are only emitted whenever some other packet is, which in practice is when a branch instruction is encountered (and not even all branches). Thus, even at no subsampling (e.g. --itrace=y0ns), it is impossible to get more accuracy than a single basic block, and all cycles spent executing that block will get attributed to the branch instruction that ends the packet. Thus, one cannot know whether the cycles came from e.g. a specific load, a mispredicted branch, or something else. When subsampling (which is the default), the cycle events will get smeared out even more, but will still be generally useful to attribute cycle counts to functions.

Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Steinar H. Gunderson <sesse@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220322082452.1429091-1-sesse@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
72 lines
2.6 KiB
Plaintext
        i       synthesize instructions events
        y       synthesize cycles events
        b       synthesize branches events (branch misses for Arm SPE)
        c       synthesize branches events (calls only)
        r       synthesize branches events (returns only)
        x       synthesize transactions events
        w       synthesize ptwrite events
        p       synthesize power events (incl. PSB events for Intel PT)
        o       synthesize other events recorded due to the use
                of aux-output (refer to perf record)
        I       synthesize interrupt or similar (asynchronous) events
                (e.g. Intel PT Event Trace)
        e       synthesize error events
        d       create a debug log
        f       synthesize first level cache events
        m       synthesize last level cache events
        M       synthesize memory events
        t       synthesize TLB events
        a       synthesize remote access events
        g       synthesize a call chain (use with i or x)
        G       synthesize a call chain on existing event records
        l       synthesize last branch entries (use with i or x)
        L       synthesize last branch entries on existing event records
        s       skip initial number of events
        q       quicker (less detailed) decoding
        A       approximate IPC
        Z       prefer to ignore timestamps (so-called "timeless" decoding)

The default is all events i.e. the same as --itrace=iybxwpe,
except for perf script where it is --itrace=ce

In addition, the period (default 100000, except for perf script where it is 1)
for instructions events can be specified in units of:

        i       instructions
        t       ticks
        ms      milliseconds
        us      microseconds
        ns      nanoseconds (default)

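To make the period syntax concrete, the following is a minimal Python sketch that splits a period specification such as `100us` into a count and a unit name. It is not perf's own parser: the helper name `parse_period` and the `UNITS` table are hypothetical, and the only facts taken from the text above are the five unit suffixes and that `ns` is the default unit.

```python
import re

# Unit suffixes as listed in the documentation above; "ns" is the documented default.
# The table and the function name are illustrative assumptions, not perf code.
UNITS = {
    "i": "instructions",
    "t": "ticks",
    "ms": "milliseconds",
    "us": "microseconds",
    "ns": "nanoseconds",
}


def parse_period(spec, default_unit="ns"):
    """Split a period spec such as '100us' into (count, unit name)."""
    match = re.fullmatch(r"(\d+)(i|t|ms|us|ns)?", spec)
    if match is None:
        raise ValueError(f"unrecognized period spec: {spec!r}")
    count = int(match.group(1))
    # Fall back to the default unit when no suffix is given.
    unit = match.group(2) or default_unit
    return count, UNITS[unit]
```

For instance, `parse_period("100us")` yields a count of 100 with the unit name `microseconds`, and a bare number falls back to nanoseconds.
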
Also the call chain size (default 16, max. 1024) for instructions or
transactions events can be specified.

Also the number of last branch entries (default 64, max. 1024) for
instructions or transactions events can be specified.

Similar to options g and l, size may also be specified for options G and L.
On x86, note that G and L work poorly when data has been recorded with
large PEBS. Refer to the linkperf:perf-intel-pt[1] man page for details.

It is also possible to skip events generated (instructions, branches, transactions,
ptwrite, power) at the beginning. This is useful to ignore initialization code.

        --itrace=i0nss1000000

skips the first million instructions.

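The example string above reads as `i` (instructions events), `0ns` (a period of 0 nanoseconds) and `s1000000` (skip the first million events). As a rough illustration of that composition, here is a small Python sketch that assembles such a value from its parts. The function `itrace_arg` and its parameter names are hypothetical and exist only for this example; the sketch handles just the `i` event with a period and an optional skip count.

```python
# Unit suffixes accepted for the period, as listed in the documentation.
PERIOD_UNITS = ("i", "t", "ms", "us", "ns")


def itrace_arg(period=0, unit="ns", skip=0):
    """Build an --itrace value for instructions events (hypothetical helper)."""
    if unit not in PERIOD_UNITS:
        raise ValueError(f"unknown period unit: {unit!r}")
    if period < 0 or skip < 0:
        raise ValueError("period and skip must be non-negative")
    value = f"i{period}{unit}"
    # The 's' option carries the number of initial events to skip.
    if skip:
        value += f"s{skip}"
    return f"--itrace={value}"
```

Calling `itrace_arg(0, "ns", 1_000_000)` reproduces the documented string `--itrace=i0nss1000000`.
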
The 'e' option may be followed by flags which affect what errors will or
will not be reported. Each flag must be preceded by either '+' or '-'.
The flags are:
        o       overflow
        l       trace data lost

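The flag syntax can be illustrated with a short Python sketch that reads a run of sign-and-letter pairs, such as the `-o+l` part of `e-o+l`. This is an assumption-laden illustration rather than perf's implementation: `parse_flags` and `ERROR_FLAGS` are hypothetical names, and the result only records which sign preceded each flag without interpreting what the sign does.

```python
# Flag letters for the 'e' option, as listed in the documentation.
ERROR_FLAGS = {"o": "overflow", "l": "trace data lost"}


def parse_flags(text, names):
    """Map each flag name to True ('+') or False ('-').

    Hypothetical helper: 'text' is the run of characters following the
    option letter, e.g. '-o+l' taken from 'e-o+l'.
    """
    result = {}
    pos = 0
    while pos < len(text):
        sign = text[pos]
        # Every flag letter must be preceded by exactly one sign.
        if sign not in ("+", "-") or pos + 1 >= len(text):
            raise ValueError(f"expected '+' or '-' followed by a flag at offset {pos}")
        letter = text[pos + 1]
        if letter not in names:
            raise ValueError(f"unknown flag: {letter!r}")
        result[names[letter]] = sign == "+"
        pos += 2
    return result
```

The same function can be reused for the 'd' option by passing a table of its flag letters instead of `ERROR_FLAGS`.
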
If supported, the 'd' option may be followed by flags which affect what
debug messages will or will not be logged. Each flag must be preceded
by either '+' or '-'. The flags are:
        a       all perf events
        e       output only on errors (size configurable - see linkperf:perf-config[1])
        o       output to stdout

If supported, the 'q' option may be repeated to increase the effect.