2a09a84c72
This patch enables perf-diff with "--stream" option. "--stream": Enable hot streams comparison Now let's see example. perf record -b ... Generate perf.data.old with branch data perf record -b ... Generate perf.data with branch data perf diff --stream [ Matched hot streams ] hot chain pair 1: cycles: 1, hits: 27.77% cycles: 1, hits: 9.24% --------------------------- -------------------------- main div.c:39 main div.c:39 main div.c:44 main div.c:44 hot chain pair 2: cycles: 34, hits: 20.06% cycles: 27, hits: 16.98% --------------------------- -------------------------- __random_r random_r.c:360 __random_r random_r.c:360 __random_r random_r.c:388 __random_r random_r.c:388 __random_r random_r.c:388 __random_r random_r.c:388 __random_r random_r.c:380 __random_r random_r.c:380 __random_r random_r.c:357 __random_r random_r.c:357 __random random.c:293 __random random.c:293 __random random.c:293 __random random.c:293 __random random.c:291 __random random.c:291 __random random.c:291 __random random.c:291 __random random.c:291 __random random.c:291 __random random.c:288 __random random.c:288 rand rand.c:27 rand rand.c:27 rand rand.c:26 rand rand.c:26 rand@plt rand@plt rand@plt rand@plt compute_flag div.c:25 compute_flag div.c:25 compute_flag div.c:22 compute_flag div.c:22 main div.c:40 main div.c:40 main div.c:40 main div.c:40 main div.c:39 main div.c:39 hot chain pair 3: cycles: 9, hits: 4.48% cycles: 6, hits: 4.51% --------------------------- -------------------------- __random_r random_r.c:360 __random_r random_r.c:360 __random_r random_r.c:388 __random_r random_r.c:388 __random_r random_r.c:388 __random_r random_r.c:388 __random_r random_r.c:380 __random_r random_r.c:380 [ Hot streams in old perf data only ] hot chain 1: cycles: 18, hits: 6.75% -------------------------- __random_r random_r.c:360 __random_r random_r.c:388 __random_r random_r.c:388 __random_r random_r.c:380 __random_r random_r.c:357 __random random.c:293 __random random.c:293 __random random.c:291 __random random.c:291 __random random.c:291 __random random.c:288 rand rand.c:27 rand rand.c:26 rand@plt rand@plt compute_flag div.c:25 compute_flag div.c:22 main div.c:40 hot chain 2: cycles: 29, hits: 2.78% -------------------------- compute_flag div.c:22 main div.c:40 main div.c:40 main div.c:39 [ Hot streams in new perf data only ] hot chain 1: cycles: 4, hits: 4.54% -------------------------- main div.c:42 compute_flag div.c:28 hot chain 2: cycles: 5, hits: 3.51% -------------------------- main div.c:39 main div.c:44 main div.c:42 compute_flag div.c:28 Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20201009022845.13141-8-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
306 lines
9.1 KiB
Plaintext
306 lines
9.1 KiB
Plaintext
perf-diff(1)
|
|
============
|
|
|
|
NAME
|
|
----
|
|
perf-diff - Read perf.data files and display the differential profile
|
|
|
|
SYNOPSIS
|
|
--------
|
|
[verse]
|
|
'perf diff' [baseline file] [data file1] [[data file2] ... ]
|
|
|
|
DESCRIPTION
|
|
-----------
|
|
This command displays the performance difference amongst two or more perf.data
|
|
files captured via perf record.
|
|
|
|
If no parameters are passed it will assume perf.data.old and perf.data.
|
|
|
|
The differential profile is displayed only for events matching both
|
|
specified perf.data files.
|
|
|
|
If no parameters are passed the samples will be sorted by dso and symbol.
|
|
As the perf.data files could come from different binaries, the symbols addresses
|
|
could vary. So perf diff is based on the comparison of the files and
|
|
symbols name.
|
|
|
|
OPTIONS
|
|
-------
|
|
-D::
|
|
--dump-raw-trace::
|
|
Dump raw trace in ASCII.
|
|
|
|
--kallsyms=<file>::
|
|
kallsyms pathname
|
|
|
|
-m::
|
|
--modules::
|
|
Load module symbols. WARNING: use only with -k and LIVE kernel
|
|
|
|
-d::
|
|
--dsos=::
|
|
Only consider symbols in these dsos. CSV that understands
|
|
file://filename entries. This option will affect the percentage
|
|
of the Baseline/Delta column. See --percentage for more info.
|
|
|
|
-C::
|
|
--comms=::
|
|
Only consider symbols in these comms. CSV that understands
|
|
file://filename entries. This option will affect the percentage
|
|
of the Baseline/Delta column. See --percentage for more info.
|
|
|
|
-S::
|
|
--symbols=::
|
|
Only consider these symbols. CSV that understands
|
|
file://filename entries. This option will affect the percentage
|
|
of the Baseline/Delta column. See --percentage for more info.
|
|
|
|
-s::
|
|
--sort=::
|
|
Sort by key(s): pid, comm, dso, symbol, cpu, parent, srcline.
|
|
Please see description of --sort in the perf-report man page.
|
|
|
|
-t::
|
|
--field-separator=::
|
|
|
|
Use a special separator character and don't pad with spaces, replacing
|
|
all occurrences of this separator in symbol names (and other output)
|
|
with a '.' character, that thus it's the only non valid separator.
|
|
|
|
-v::
|
|
--verbose::
|
|
Be verbose, for instance, show the raw counts in addition to the
|
|
diff.
|
|
|
|
-q::
|
|
--quiet::
|
|
Do not show any message. (Suppress -v)
|
|
|
|
-f::
|
|
--force::
|
|
Don't do ownership validation.
|
|
|
|
--symfs=<directory>::
|
|
Look for files with symbols relative to this directory.
|
|
|
|
-b::
|
|
--baseline-only::
|
|
Show only items with match in baseline.
|
|
|
|
-c::
|
|
--compute::
|
|
Differential computation selection - delta, ratio, wdiff, cycles,
|
|
delta-abs (default is delta-abs). Default can be changed using
|
|
diff.compute config option. See COMPARISON METHODS section for
|
|
more info.
|
|
|
|
--cycles-hist::
|
|
Report a histogram and the standard deviation for cycles data.
|
|
It can help us to judge if the reported cycles data is noisy or
|
|
not. This option should be used with '-c cycles'.
|
|
|
|
-p::
|
|
--period::
|
|
Show period values for both compared hist entries.
|
|
|
|
-F::
|
|
--formula::
|
|
Show formula for given computation.
|
|
|
|
-o::
|
|
--order::
|
|
Specify compute sorting column number. 0 means sorting by baseline
|
|
overhead and 1 (default) means sorting by computed value of column 1
|
|
(data from the first file other base baseline). Values more than 1
|
|
can be used only if enough data files are provided.
|
|
The default value can be set using the diff.order config option.
|
|
|
|
--percentage::
|
|
Determine how to display the overhead percentage of filtered entries.
|
|
Filters can be applied by --comms, --dsos and/or --symbols options.
|
|
|
|
"relative" means it's relative to filtered entries only so that the
|
|
sum of shown entries will be always 100%. "absolute" means it retains
|
|
the original value before and after the filter is applied.
|
|
|
|
--time::
|
|
Analyze samples within given time window. It supports time
|
|
percent with multiple time ranges. Time string is 'a%/n,b%/m,...'
|
|
or 'a%-b%,c%-%d,...'.
|
|
|
|
For example:
|
|
|
|
Select the second 10% time slice to diff:
|
|
|
|
perf diff --time 10%/2
|
|
|
|
Select from 0% to 10% time slice to diff:
|
|
|
|
perf diff --time 0%-10%
|
|
|
|
Select the first and the second 10% time slices to diff:
|
|
|
|
perf diff --time 10%/1,10%/2
|
|
|
|
Select from 0% to 10% and 30% to 40% slices to diff:
|
|
|
|
perf diff --time 0%-10%,30%-40%
|
|
|
|
It also supports analyzing samples within a given time window
|
|
<start>,<stop>. Times have the format seconds.nanoseconds. If 'start'
|
|
is not given (i.e. time string is ',x.y') then analysis starts at
|
|
the beginning of the file. If stop time is not given (i.e. time
|
|
string is 'x.y,') then analysis goes to the end of the file.
|
|
Multiple ranges can be separated by spaces, which requires the argument
|
|
to be quoted e.g. --time "1234.567,1234.789 1235,"
|
|
Time string is'a1.b1,c1.d1:a2.b2,c2.d2'. Use ':' to separate timestamps
|
|
for different perf.data files.
|
|
|
|
For example, we get the timestamp information from 'perf script'.
|
|
|
|
perf script -i perf.data.old
|
|
mgen 13940 [000] 3946.361400: ...
|
|
|
|
perf script -i perf.data
|
|
mgen 13940 [000] 3971.150589 ...
|
|
|
|
perf diff --time 3946.361400,:3971.150589,
|
|
|
|
It analyzes the perf.data.old from the timestamp 3946.361400 to
|
|
the end of perf.data.old and analyzes the perf.data from the
|
|
timestamp 3971.150589 to the end of perf.data.
|
|
|
|
--cpu:: Only diff samples for the list of CPUs provided. Multiple CPUs can
|
|
be provided as a comma-separated list with no space: 0,1. Ranges of
|
|
CPUs are specified with -: 0-2. Default is to report samples on all
|
|
CPUs.
|
|
|
|
--pid=::
|
|
Only diff samples for given process ID (comma separated list).
|
|
|
|
--tid=::
|
|
Only diff samples for given thread ID (comma separated list).
|
|
|
|
--stream::
|
|
Enable hot streams comparison. Stream can be a callchain which is
|
|
aggregated by the branch records from samples.
|
|
|
|
COMPARISON
|
|
----------
|
|
The comparison is governed by the baseline file. The baseline perf.data
|
|
file is iterated for samples. All other perf.data files specified on
|
|
the command line are searched for the baseline sample pair. If the pair
|
|
is found, specified computation is made and result is displayed.
|
|
|
|
All samples from non-baseline perf.data files, that do not match any
|
|
baseline entry, are displayed with empty space within baseline column
|
|
and possible computation results (delta) in their related column.
|
|
|
|
Example files samples:
|
|
- file A with samples f1, f2, f3, f4, f6
|
|
- file B with samples f2, f4, f5
|
|
- file C with samples f1, f2, f5
|
|
|
|
Example output:
|
|
x - computation takes place for pair
|
|
b - baseline sample percentage
|
|
|
|
- perf diff A B C
|
|
|
|
baseline/A compute/B compute/C samples
|
|
---------------------------------------
|
|
b x f1
|
|
b x x f2
|
|
b f3
|
|
b x f4
|
|
b f6
|
|
x x f5
|
|
|
|
- perf diff B A C
|
|
|
|
baseline/B compute/A compute/C samples
|
|
---------------------------------------
|
|
b x x f2
|
|
b x f4
|
|
b x f5
|
|
x x f1
|
|
x f3
|
|
x f6
|
|
|
|
- perf diff C B A
|
|
|
|
baseline/C compute/B compute/A samples
|
|
---------------------------------------
|
|
b x f1
|
|
b x x f2
|
|
b x f5
|
|
x f3
|
|
x x f4
|
|
x f6
|
|
|
|
COMPARISON METHODS
|
|
------------------
|
|
delta
|
|
~~~~~
|
|
If specified the 'Delta' column is displayed with value 'd' computed as:
|
|
|
|
d = A->period_percent - B->period_percent
|
|
|
|
with:
|
|
- A/B being matching hist entry from data/baseline file specified
|
|
(or perf.data/perf.data.old) respectively.
|
|
|
|
- period_percent being the % of the hist entry period value within
|
|
single data file
|
|
|
|
- with filtering by -C, -d and/or -S, period_percent might be changed
|
|
relative to how entries are filtered. Use --percentage=absolute to
|
|
prevent such fluctuation.
|
|
|
|
delta-abs
|
|
~~~~~~~~~
|
|
Same as 'delta` method, but sort the result with the absolute values.
|
|
|
|
ratio
|
|
~~~~~
|
|
If specified the 'Ratio' column is displayed with value 'r' computed as:
|
|
|
|
r = A->period / B->period
|
|
|
|
with:
|
|
- A/B being matching hist entry from data/baseline file specified
|
|
(or perf.data/perf.data.old) respectively.
|
|
|
|
- period being the hist entry period value
|
|
|
|
wdiff:WEIGHT-B,WEIGHT-A
|
|
~~~~~~~~~~~~~~~~~~~~~~~
|
|
If specified the 'Weighted diff' column is displayed with value 'd' computed as:
|
|
|
|
d = B->period * WEIGHT-A - A->period * WEIGHT-B
|
|
|
|
- A/B being matching hist entry from data/baseline file specified
|
|
(or perf.data/perf.data.old) respectively.
|
|
|
|
- period being the hist entry period value
|
|
|
|
- WEIGHT-A/WEIGHT-B being user supplied weights in the the '-c' option
|
|
behind ':' separator like '-c wdiff:1,2'.
|
|
- WEIGHT-A being the weight of the data file
|
|
- WEIGHT-B being the weight of the baseline data file
|
|
|
|
cycles
|
|
~~~~~~
|
|
If specified the '[Program Block Range] Cycles Diff' column is displayed.
|
|
It displays the cycles difference of same program basic block amongst
|
|
two perf.data. The program basic block is the code between two branches.
|
|
|
|
'[Program Block Range]' indicates the range of a program basic block.
|
|
Source line is reported if it can be found otherwise uses symbol+offset
|
|
instead.
|
|
|
|
SEE ALSO
|
|
--------
|
|
linkperf:perf-record[1], linkperf:perf-report[1]
|