#include <stdlib.h>
#include <stdio.h>
#include <inttypes.h>
#include <linux/string.h>
#include <linux/time64.h>
#include <math.h>
#include "color.h"
#include "counts.h"
#include "evlist.h"
#include "evsel.h"
#include "stat.h"
#include "top.h"
#include "thread_map.h"
#include "cpumap.h"
#include "string2.h"
#include <linux/ctype.h>
#include "cgroup.h"
#include <api/fs/fs.h>
#include "util.h"
#include "iostat.h"
#include "pmu-hybrid.h"
#include "evlist-hybrid.h"

#define CNTR_NOT_SUPPORTED	"<not supported>"
#define CNTR_NOT_COUNTED	"<not counted>"
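
/*
 * Print how long the counter actually ran vs. how long it was enabled:
 * raw value plus percentage in CSV mode, otherwise only when they differ.
 */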
static void print_running(struct perf_stat_config *config,
                          u64 run, u64 ena)
{
        if (config->csv_output) {
                fprintf(config->output, "%s%" PRIu64 "%s%.2f",
                                        config->csv_sep,
                                        run,
                                        config->csv_sep,
                                        ena ? 100.0 * run / ena : 100.0);
        } else if (run != ena) {
                fprintf(config->output, "  (%.2f%%)", 100.0 * run / ena);
        }
}
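
/* Print the relative standard deviation ("noise") as a percentage. */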
static void print_noise_pct(struct perf_stat_config *config,
                            double total, double avg)
{
        double pct = rel_stddev_stats(total, avg);

        if (config->csv_output)
                fprintf(config->output, "%s%.2f%%", config->csv_sep, pct);
        else if (pct)
                fprintf(config->output, "  ( +-%6.2f%% )", pct);
}

static void print_noise(struct perf_stat_config *config,
                        struct evsel *evsel, double avg)
{
        struct perf_stat_evsel *ps;

        if (config->run_count == 1)
                return;

        ps = evsel->stats;
        print_noise_pct(config, stddev_stats(&ps->res_stats[0]), avg);
}
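
/*
 * Emit the cgroup column; non-cgroup events get an empty field so the CSV
 * columns stay aligned.
 */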
static void print_cgroup(struct perf_stat_config *config, struct evsel *evsel)
{
        if (nr_cgroups) {
                const char *cgrp_name = evsel->cgrp ? evsel->cgrp->name : "";

                fprintf(config->output, "%s%s", config->csv_sep, cgrp_name);
        }
}
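
/*
 * Print the aggregation prefix (socket, die, core, node, CPU or thread)
 * for the current output line.
 */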
static void aggr_printout(struct perf_stat_config *config,
                          struct evsel *evsel, struct aggr_cpu_id id, int nr)
{
        switch (config->aggr_mode) {
        case AGGR_CORE:
                fprintf(config->output, "S%d-D%d-C%*d%s%*d%s",
                        id.socket,
                        id.die,
                        config->csv_output ? 0 : -8,
                        id.core,
                        config->csv_sep,
                        config->csv_output ? 0 : 4,
                        nr,
                        config->csv_sep);
                break;
        case AGGR_DIE:
                fprintf(config->output, "S%d-D%*d%s%*d%s",
                        id.socket,
                        config->csv_output ? 0 : -8,
                        id.die,
                        config->csv_sep,
                        config->csv_output ? 0 : 4,
                        nr,
                        config->csv_sep);
                break;
        case AGGR_SOCKET:
                fprintf(config->output, "S%*d%s%*d%s",
                        config->csv_output ? 0 : -5,
                        id.socket,
                        config->csv_sep,
                        config->csv_output ? 0 : 4,
                        nr,
                        config->csv_sep);
                break;
        case AGGR_NODE:
                fprintf(config->output, "N%*d%s%*d%s",
                        config->csv_output ? 0 : -5,
                        id.node,
                        config->csv_sep,
                        config->csv_output ? 0 : 4,
                        nr,
                        config->csv_sep);
                break;
        case AGGR_NONE:
                if (evsel->percore && !config->percore_show_thread) {
                        fprintf(config->output, "S%d-D%d-C%*d%s",
                                id.socket,
                                id.die,
                                config->csv_output ? 0 : -3,
                                id.core, config->csv_sep);
                } else if (id.core > -1) {
                        fprintf(config->output, "CPU%*d%s",
                                config->csv_output ? 0 : -7,
                                evsel__cpus(evsel)->map[id.core],
                                config->csv_sep);
                }
                break;
        case AGGR_THREAD:
                fprintf(config->output, "%*s-%*d%s",
                        config->csv_output ? 0 : 16,
                        perf_thread_map__comm(evsel->core.threads, id.thread),
                        config->csv_output ? 0 : -8,
                        perf_thread_map__pid(evsel->core.threads, id.thread),
                        config->csv_sep);
                break;
        case AGGR_GLOBAL:
        case AGGR_UNSET:
        default:
                break;
        }
}
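
/* Shared state for the print_metric/new_line output callbacks below. */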
struct outstate {
        FILE *fh;
        bool newline;
        const char *prefix;
        int nfields;
        int nr;
        struct aggr_cpu_id id;
        struct evsel *evsel;
};

#define METRIC_LEN 35

static void new_line_std(struct perf_stat_config *config __maybe_unused,
                         void *ctx)
{
        struct outstate *os = ctx;

        os->newline = true;
}

static void do_new_line_std(struct perf_stat_config *config,
                            struct outstate *os)
{
        fputc('\n', os->fh);
        fputs(os->prefix, os->fh);
        aggr_printout(config, os->evsel, os->id, os->nr);
        if (config->aggr_mode == AGGR_NONE)
                fprintf(os->fh, "        ");
        fprintf(os->fh, "                                                 ");
}
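
/*
 * Default human-readable metric printer: " # " followed by the value and
 * its unit, padded to METRIC_LEN.
 */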
static void print_metric_std(struct perf_stat_config *config,
                             void *ctx, const char *color, const char *fmt,
                             const char *unit, double val)
{
        struct outstate *os = ctx;
        FILE *out = os->fh;
        int n;
        bool newline = os->newline;

        os->newline = false;

        if (unit == NULL || fmt == NULL) {
                fprintf(out, "%-*s", METRIC_LEN, "");
                return;
        }

        if (newline)
                do_new_line_std(config, os);

        n = fprintf(out, " # ");
        if (color)
                n += color_fprintf(out, color, fmt, val);
        else
                n += fprintf(out, fmt, val);
        fprintf(out, " %-*s", METRIC_LEN - n - 1, unit);
}

static void new_line_csv(struct perf_stat_config *config, void *ctx)
{
        struct outstate *os = ctx;
        int i;

        fputc('\n', os->fh);
        if (os->prefix)
                fprintf(os->fh, "%s%s", os->prefix, config->csv_sep);
        aggr_printout(config, os->evsel, os->id, os->nr);
        for (i = 0; i < os->nfields; i++)
                fputs(config->csv_sep, os->fh);
}
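
/*
 * CSV metric printer: split the formatted value into a number field and a
 * unit field.
 */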
static void print_metric_csv(struct perf_stat_config *config __maybe_unused,
                             void *ctx,
                             const char *color __maybe_unused,
                             const char *fmt, const char *unit, double val)
{
        struct outstate *os = ctx;
        FILE *out = os->fh;
        char buf[64], *vals, *ends;

        if (unit == NULL || fmt == NULL) {
                fprintf(out, "%s%s", config->csv_sep, config->csv_sep);
                return;
        }
        snprintf(buf, sizeof(buf), fmt, val);
        ends = vals = skip_spaces(buf);
        while (isdigit(*ends) || *ends == '.')
                ends++;
        *ends = 0;
        fprintf(out, "%s%s%s%s", config->csv_sep, vals, config->csv_sep, skip_spaces(unit));
}

/* Filter out some columns that don't work well in metrics only mode */
static bool valid_only_metric(const char *unit)
{
        if (!unit)
                return false;
        if (strstr(unit, "/sec") ||
            strstr(unit, "CPUs utilized"))
                return false;
        return true;
}
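
/*
 * Prefix "of all ..." units with the event name so the metric stays
 * unambiguous in metric-only mode.
 */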
static const char *fixunit(char *buf, struct evsel *evsel,
                           const char *unit)
{
        if (!strncmp(unit, "of all", 6)) {
                snprintf(buf, 1024, "%s %s", evsel__name(evsel),
                         unit);
                return buf;
        }
        return unit;
}

static void print_metric_only(struct perf_stat_config *config,
                              void *ctx, const char *color, const char *fmt,
                              const char *unit, double val)
{
        struct outstate *os = ctx;
        FILE *out = os->fh;
        char buf[1024], str[1024];
        unsigned mlen = config->metric_only_len;

        if (!valid_only_metric(unit))
                return;
        unit = fixunit(buf, os->evsel, unit);
        if (mlen < strlen(unit))
                mlen = strlen(unit) + 1;

        if (color)
                mlen += strlen(color) + sizeof(PERF_COLOR_RESET) - 1;

        color_snprintf(str, sizeof(str), color ?: "", fmt, val);
        fprintf(out, "%*s ", mlen, str);
}

static void print_metric_only_csv(struct perf_stat_config *config __maybe_unused,
                                  void *ctx, const char *color __maybe_unused,
                                  const char *fmt,
                                  const char *unit, double val)
{
        struct outstate *os = ctx;
        FILE *out = os->fh;
        char buf[64], *vals, *ends;
        char tbuf[1024];

        if (!valid_only_metric(unit))
                return;
        unit = fixunit(tbuf, os->evsel, unit);
        snprintf(buf, sizeof buf, fmt, val);
        ends = vals = skip_spaces(buf);
        while (isdigit(*ends) || *ends == '.')
                ends++;
        *ends = 0;
        fprintf(out, "%s%s", vals, config->csv_sep);
}

static void new_line_metric(struct perf_stat_config *config __maybe_unused,
                            void *ctx __maybe_unused)
{
}

static void print_metric_header(struct perf_stat_config *config,
                                void *ctx, const char *color __maybe_unused,
                                const char *fmt __maybe_unused,
                                const char *unit, double val __maybe_unused)
{
        struct outstate *os = ctx;
        char tbuf[1024];

        /* In case of iostat, print metric header for first root port only */
        if (config->iostat_run &&
            os->evsel->priv != os->evsel->evlist->selected->priv)
                return;

        if (!valid_only_metric(unit))
                return;
        unit = fixunit(tbuf, os->evsel, unit);

        if (config->csv_output)
                fprintf(os->fh, "%s%s", unit, config->csv_sep);
        else
                fprintf(os->fh, "%*s ", config->metric_only_len, unit);
}
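
/*
 * Find the first CPU that belongs to this aggregation id; it is used to
 * index the shadow stats.
 */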
static int first_shadow_cpu(struct perf_stat_config *config,
                            struct evsel *evsel, struct aggr_cpu_id id)
{
        struct evlist *evlist = evsel->evlist;
        int i;

        if (config->aggr_mode == AGGR_NONE)
                return id.core;

        if (!config->aggr_get_id)
                return 0;

        for (i = 0; i < evsel__nr_cpus(evsel); i++) {
                int cpu2 = evsel__cpus(evsel)->map[i];

                if (cpu_map__compare_aggr_cpu_id(
                                        config->aggr_get_id(config, evlist->core.cpus, cpu2),
                                        id)) {
                        return cpu2;
                }
        }
        return 0;
}
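
/*
 * Print the aggregation prefix, the (scaled) counter value, its unit, the
 * event name and the cgroup.
 */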
static void abs_printout(struct perf_stat_config *config,
                         struct aggr_cpu_id id, int nr, struct evsel *evsel, double avg)
{
        FILE *output = config->output;
        double sc = evsel->scale;
        const char *fmt;

        if (config->csv_output) {
                fmt = floor(sc) != sc ? "%.2f%s" : "%.0f%s";
        } else {
                if (config->big_num)
                        fmt = floor(sc) != sc ? "%'18.2f%s" : "%'18.0f%s";
                else
                        fmt = floor(sc) != sc ? "%18.2f%s" : "%18.0f%s";
        }

        aggr_printout(config, evsel, id, nr);

        fprintf(output, fmt, avg, config->csv_sep);

        if (evsel->unit)
                fprintf(output, "%-*s%s",
                        config->csv_output ? 0 : config->unit_width,
                        evsel->unit, config->csv_sep);

        fprintf(output, "%-*s", config->csv_output ? 0 : 25, evsel__name(evsel));

        print_cgroup(config, evsel);
}
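
/*
 * Return true if a group mixes hardware events from different PMUs
 * (software events may join any group).
 */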
static bool is_mixed_hw_group(struct evsel *counter)
{
        struct evlist *evlist = counter->evlist;
        u32 pmu_type = counter->core.attr.type;
        struct evsel *pos;

        if (counter->core.nr_members < 2)
                return false;

        evlist__for_each_entry(evlist, pos) {
                /* software events can be part of any hardware group */
                if (pos->core.attr.type == PERF_TYPE_SOFTWARE)
                        continue;
                if (pmu_type == PERF_TYPE_SOFTWARE) {
                        pmu_type = pos->core.attr.type;
                        continue;
                }
                if (pmu_type != pos->core.attr.type)
                        return true;
        }

        return false;
}
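
/*
 * Print one counter line: value, unit, event name, cgroup, derived metrics,
 * noise and running time.
 */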
static void printout(struct perf_stat_config *config, struct aggr_cpu_id id, int nr,
                     struct evsel *counter, double uval,
                     char *prefix, u64 run, u64 ena, double noise,
                     struct runtime_stat *st)
{
        struct perf_stat_output_ctx out;
        struct outstate os = {
                .fh = config->output,
                .prefix = prefix ? prefix : "",
                .id = id,
                .nr = nr,
                .evsel = counter,
        };
        print_metric_t pm = print_metric_std;
        new_line_t nl;

        if (config->metric_only) {
                nl = new_line_metric;
                if (config->csv_output)
                        pm = print_metric_only_csv;
                else
                        pm = print_metric_only;
        } else
                nl = new_line_std;

        if (config->csv_output && !config->metric_only) {
                static int aggr_fields[] = {
                        [AGGR_GLOBAL] = 0,
                        [AGGR_THREAD] = 1,
                        [AGGR_NONE] = 1,
                        [AGGR_SOCKET] = 2,
                        [AGGR_DIE] = 2,
                        [AGGR_CORE] = 2,
                };

                pm = print_metric_csv;
                nl = new_line_csv;
                os.nfields = 3;
                os.nfields += aggr_fields[config->aggr_mode];
                if (counter->cgrp)
                        os.nfields++;
        }

        if (!config->no_csv_summary && config->csv_output &&
            config->summary && !config->interval) {
                fprintf(config->output, "%16s%s", "summary", config->csv_sep);
        }

        if (run == 0 || ena == 0 || counter->counts->scaled == -1) {
                if (config->metric_only) {
                        pm(config, &os, NULL, "", "", 0);
                        return;
                }

                aggr_printout(config, counter, id, nr);

                fprintf(config->output, "%*s%s",
                        config->csv_output ? 0 : 18,
                        counter->supported ? CNTR_NOT_COUNTED : CNTR_NOT_SUPPORTED,
                        config->csv_sep);

                if (counter->supported) {
                        if (!evlist__has_hybrid(counter->evlist)) {
                                config->print_free_counters_hint = 1;
                                if (is_mixed_hw_group(counter))
                                        config->print_mixed_hw_group_error = 1;
                        }
                }

                fprintf(config->output, "%-*s%s",
                        config->csv_output ? 0 : config->unit_width,
                        counter->unit, config->csv_sep);

                fprintf(config->output, "%*s",
                        config->csv_output ? 0 : -25, evsel__name(counter));

                print_cgroup(config, counter);

                if (!config->csv_output)
                        pm(config, &os, NULL, NULL, "", 0);
                print_noise(config, counter, noise);
                print_running(config, run, ena);
                if (config->csv_output)
                        pm(config, &os, NULL, NULL, "", 0);
                return;
        }

        if (!config->metric_only)
                abs_printout(config, id, nr, counter, uval);

        out.print_metric = pm;
        out.new_line = nl;
        out.ctx = &os;
        out.force_header = false;

        if (config->csv_output && !config->metric_only) {
                print_noise(config, counter, noise);
                print_running(config, run, ena);
        }

        perf_stat__print_shadow_stats(config, counter, uval,
                                      first_shadow_cpu(config, counter, id),
                                      &out, &config->metric_events, st);
        if (!config->csv_output && !config->metric_only) {
                print_noise(config, counter, noise);
                print_running(config, run, ena);
        }
}
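
/*
 * Accumulate per-aggregation counts into the shadow stats used for derived
 * metrics.
 */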
static void aggr_update_shadow(struct perf_stat_config *config,
                               struct evlist *evlist)
{
        int cpu, s;
        struct aggr_cpu_id s2, id;
        u64 val;
        struct evsel *counter;

        for (s = 0; s < config->aggr_map->nr; s++) {
                id = config->aggr_map->map[s];
                evlist__for_each_entry(evlist, counter) {
                        val = 0;
                        for (cpu = 0; cpu < evsel__nr_cpus(counter); cpu++) {
                                s2 = config->aggr_get_id(config, evlist->core.cpus, cpu);
                                if (!cpu_map__compare_aggr_cpu_id(s2, id))
                                        continue;
                                val += perf_counts(counter->counts, cpu, 0)->val;
                        }
                        perf_stat__update_shadow_stats(counter, val,
                                        first_shadow_cpu(config, counter, id),
                                        &rt_stat);
                }
        }
}
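
/* Make the names of merged events unique by folding in the PMU name. */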
static void uniquify_event_name(struct evsel *counter)
{
        char *new_name;
        char *config;
        int ret = 0;

        if (counter->uniquified_name || counter->use_config_name ||
            !counter->pmu_name || !strncmp(counter->name, counter->pmu_name,
                                           strlen(counter->pmu_name)))
                return;

        config = strchr(counter->name, '/');
        if (config) {
                if (asprintf(&new_name,
                             "%s%s", counter->pmu_name, config) > 0) {
                        free(counter->name);
                        counter->name = new_name;
                }
        } else {
                if (perf_pmu__has_hybrid()) {
                        ret = asprintf(&new_name, "%s/%s/",
                                       counter->pmu_name, counter->name);
                } else {
                        ret = asprintf(&new_name, "%s [%s]",
                                       counter->name, counter->pmu_name);
                }

                if (ret) {
                        free(counter->name);
                        counter->name = new_name;
                }
        }

        counter->uniquified_name = true;
}
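
/*
 * Walk the evsels that follow this counter and are aliases of it (same name,
 * unit and cgroup on another PMU), mark them merged and pass them to the
 * callback.
 */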
static void collect_all_aliases(struct perf_stat_config *config, struct evsel *counter,
                                void (*cb)(struct perf_stat_config *config, struct evsel *counter, void *data,
                                           bool first),
                                void *data)
{
        struct evlist *evlist = counter->evlist;
        struct evsel *alias;

        alias = list_prepare_entry(counter, &(evlist->core.entries), core.node);
        list_for_each_entry_continue(alias, &evlist->core.entries, core.node) {
                if (strcmp(evsel__name(alias), evsel__name(counter)) ||
                    alias->scale != counter->scale ||
                    alias->cgrp != counter->cgrp ||
                    strcmp(alias->unit, counter->unit) ||
                    evsel__is_clock(alias) != evsel__is_clock(counter) ||
                    !strcmp(alias->pmu_name, counter->pmu_name))
                        break;
                alias->merged_stat = true;
                cb(config, alias, data, false);
        }
}
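
/*
 * On hybrid systems core events keep their PMU name, while uncore events are
 * still merged across PMUs.
 */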
static bool is_uncore(struct evsel *evsel)
{
        struct perf_pmu *pmu = evsel__find_pmu(evsel);

        return pmu && pmu->is_uncore;
}

static bool hybrid_uniquify(struct evsel *evsel)
{
        return perf_pmu__has_hybrid() && !is_uncore(evsel);
}
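
/*
 * Visit a counter and, unless merging is disabled or the event must keep a
 * unique per-PMU name, all of its aliases as well.
 */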
static bool collect_data(struct perf_stat_config *config, struct evsel *counter,
                         void (*cb)(struct perf_stat_config *config, struct evsel *counter, void *data,
                                    bool first),
                         void *data)
{
        if (counter->merged_stat)
                return false;
        cb(config, counter, data, true);
        if (config->no_merge || hybrid_uniquify(counter))
                uniquify_event_name(counter);
        else if (counter->auto_merge_stats)
                collect_all_aliases(config, counter, cb, data);
        return true;
}

struct aggr_data {
        u64 ena, run, val;
        struct aggr_cpu_id id;
        int nr;
        int cpu;
};
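
/*
 * collect_data() callback: sum up the counts of all CPUs that map to the
 * requested aggregation id.
 */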
static void aggr_cb(struct perf_stat_config *config,
                    struct evsel *counter, void *data, bool first)
{
        struct aggr_data *ad = data;
        int cpu;
        struct aggr_cpu_id s2;

        for (cpu = 0; cpu < evsel__nr_cpus(counter); cpu++) {
                struct perf_counts_values *counts;

                s2 = config->aggr_get_id(config, evsel__cpus(counter), cpu);
                if (!cpu_map__compare_aggr_cpu_id(s2, ad->id))
                        continue;
                if (first)
                        ad->nr++;
                counts = perf_counts(counter->counts, cpu, 0);
                /*
                 * When any result is bad, make them all to give
                 * consistent output in interval mode.
                 */
                if (counts->ena == 0 || counts->run == 0 ||
                    counter->counts->scaled == -1) {
                        ad->ena = 0;
                        ad->run = 0;
                        break;
                }
                ad->val += counts->val;
                ad->ena += counts->ena;
                ad->run += counts->run;
        }
}
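
/* Print one counter for a single aggregation unit (socket, die, core, node, ...). */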
static void print_counter_aggrdata(struct perf_stat_config *config,
                                   struct evsel *counter, int s,
                                   char *prefix, bool metric_only,
				   bool *first, int cpu)
2019-04-12 21:59:48 +08:00
{
	struct aggr_data ad;
	FILE *output = config->output;
	u64 ena, run, val;
2020-11-26 16:13:20 +02:00
	int nr;
	struct aggr_cpu_id id;
2019-04-12 21:59:48 +08:00
	double uval;
2020-11-26 16:13:23 +02:00
	ad.id = id = config->aggr_map->map[s];
2019-04-12 21:59:48 +08:00
	ad.val = ad.ena = ad.run = 0;
	ad.nr = 0;

	if (!collect_data(config, counter, aggr_cb, &ad))
		return;
perf stat: Filter out unmatched aggregation for hybrid event
perf-stat has supported some aggregation modes, such as --per-core,
--per-socket, etc. But a hybrid event may only be available on a subset
of the CPUs. So for --per-core we need to filter out the unavailable
cores; for --per-socket, the unavailable sockets; and so on.
Before:
# perf stat --per-core -e cpu_core/cycles/ -a -- sleep 1
Performance counter stats for 'system wide':
S0-D0-C0 2 479,530 cpu_core/cycles/
S0-D0-C4 2 175,007 cpu_core/cycles/
S0-D0-C8 2 166,240 cpu_core/cycles/
S0-D0-C12 2 704,673 cpu_core/cycles/
S0-D0-C16 2 865,835 cpu_core/cycles/
S0-D0-C20 2 2,958,461 cpu_core/cycles/
S0-D0-C24 2 163,988 cpu_core/cycles/
S0-D0-C28 2 164,729 cpu_core/cycles/
S0-D0-C32 0 <not counted> cpu_core/cycles/
S0-D0-C33 0 <not counted> cpu_core/cycles/
S0-D0-C34 0 <not counted> cpu_core/cycles/
S0-D0-C35 0 <not counted> cpu_core/cycles/
S0-D0-C36 0 <not counted> cpu_core/cycles/
S0-D0-C37 0 <not counted> cpu_core/cycles/
S0-D0-C38 0 <not counted> cpu_core/cycles/
S0-D0-C39 0 <not counted> cpu_core/cycles/
1.003597211 seconds time elapsed
After:
# perf stat --per-core -e cpu_core/cycles/ -a -- sleep 1
Performance counter stats for 'system wide':
S0-D0-C0 2 210,428 cpu_core/cycles/
S0-D0-C4 2 444,830 cpu_core/cycles/
S0-D0-C8 2 435,241 cpu_core/cycles/
S0-D0-C12 2 423,976 cpu_core/cycles/
S0-D0-C16 2 859,350 cpu_core/cycles/
S0-D0-C20 2 1,559,589 cpu_core/cycles/
S0-D0-C24 2 163,924 cpu_core/cycles/
S0-D0-C28 2 376,610 cpu_core/cycles/
1.003621290 seconds time elapsed
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Co-developed-by: Jiri Olsa <jolsa@redhat.com>
Reviewed-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210427070139.25256-16-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-04-27 15:01:28 +08:00
	if (perf_pmu__has_hybrid() && ad.ena == 0)
		return;
2019-04-12 21:59:48 +08:00
	nr = ad.nr;
	ena = ad.ena;
	run = ad.run;
	val = ad.val;
	if (*first && metric_only) {
		*first = false;
		aggr_printout(config, counter, id, nr);
	}
	if (prefix && !metric_only)
		fprintf(output, "%s", prefix);

	uval = val * counter->scale;
2020-11-26 16:13:20 +02:00
	if (cpu != -1) {
		id = cpu_map__empty_aggr_cpu_id();
2020-11-26 16:13:27 +02:00
		id.core = cpu;
2020-11-26 16:13:20 +02:00
	}
	printout(config, id, nr, counter, uval,
		 prefix, run, ena, 1.0, &rt_stat);
2019-04-12 21:59:48 +08:00
	if (!metric_only)
		fputc('\n', output);
}
2018-08-30 08:32:52 +02:00
static void print_aggr(struct perf_stat_config *config,
2019-07-21 13:23:52 +02:00
		       struct evlist *evlist,
2018-08-30 08:32:52 +02:00
		       char *prefix)
{
	bool metric_only = config->metric_only;
	FILE *output = config->output;
2019-07-21 13:23:51 +02:00
	struct evsel *counter;
2019-04-12 21:59:48 +08:00
	int s;
2018-08-30 08:32:52 +02:00
	bool first;
2020-06-05 17:17:40 +08:00
	if (!config->aggr_map || !config->aggr_get_id)
2018-08-30 08:32:52 +02:00
		return;

	aggr_update_shadow(config, evlist);

	/*
	 * With metric_only everything is on a single line.
	 * Without each counter has its own line.
	 */
	for (s = 0; s < config->aggr_map->nr; s++) {
		if (prefix && metric_only)
			fprintf(output, "%s", prefix);

		first = true;
		evlist__for_each_entry(evlist, counter) {
2019-04-12 21:59:48 +08:00
			print_counter_aggrdata(config, counter, s,
					       prefix, metric_only,
2020-02-14 16:04:52 +08:00
					       &first, -1);
2018-08-30 08:32:52 +02:00
		}
		if (metric_only)
			fputc('\n', output);
	}
}

static int cmp_val(const void *a, const void *b)
{
	return ((struct perf_aggr_thread_value *)b)->val -
		((struct perf_aggr_thread_value *)a)->val;
}

static struct perf_aggr_thread_value *sort_aggr_thread(
2019-07-21 13:23:51 +02:00
					struct evsel *counter,
2018-08-30 08:32:52 +02:00
					int nthreads, int ncpus,
					int *ret,
					struct target *_target)
{
	int cpu, thread, i = 0;
	double uval;
	struct perf_aggr_thread_value *buf;

	buf = calloc(nthreads, sizeof(struct perf_aggr_thread_value));
	if (!buf)
		return NULL;

	for (thread = 0; thread < nthreads; thread++) {
		u64 ena = 0, run = 0, val = 0;

		for (cpu = 0; cpu < ncpus; cpu++) {
			val += perf_counts(counter->counts, cpu, thread)->val;
			ena += perf_counts(counter->counts, cpu, thread)->ena;
			run += perf_counts(counter->counts, cpu, thread)->run;
		}

		uval = val * counter->scale;

		/*
		 * Skip value 0 when enabling --per-thread globally,
		 * otherwise too many 0 output.
		 */
		if (uval == 0.0 && target__has_per_thread(_target))
			continue;

		buf[i].counter = counter;
2020-11-26 16:13:20 +02:00
		buf[i].id = cpu_map__empty_aggr_cpu_id();
2020-11-26 16:13:28 +02:00
		buf[i].id.thread = thread;
2018-08-30 08:32:52 +02:00
		buf[i].uval = uval;
		buf[i].val = val;
		buf[i].run = run;
		buf[i].ena = ena;
		i++;
	}

	qsort(buf, i, sizeof(struct perf_aggr_thread_value), cmp_val);

	if (ret)
		*ret = i;

	return buf;
}
static void print_aggr_thread(struct perf_stat_config *config,
			      struct target *_target,
2019-07-21 13:23:51 +02:00
			      struct evsel *counter, char *prefix)
2018-08-30 08:32:52 +02:00
{
	FILE *output = config->output;
2019-08-22 13:11:41 +02:00
	int nthreads = perf_thread_map__nr(counter->core.threads);
2019-08-22 13:11:38 +02:00
	int ncpus = perf_cpu_map__nr(counter->core.cpus);
2020-11-26 16:13:20 +02:00
	int thread, sorted_threads;
	struct aggr_cpu_id id;
2018-08-30 08:32:52 +02:00
	struct perf_aggr_thread_value *buf;

	buf = sort_aggr_thread(counter, nthreads, ncpus, &sorted_threads, _target);
	if (!buf) {
		perror("cannot sort aggr thread");
		return;
	}

	for (thread = 0; thread < sorted_threads; thread++) {
		if (prefix)
			fprintf(output, "%s", prefix);

		id = buf[thread].id;
		if (config->stats)
			printout(config, id, 0, buf[thread].counter, buf[thread].uval,
				 prefix, buf[thread].run, buf[thread].ena, 1.0,
2020-11-26 16:13:28 +02:00
				 &config->stats[id.thread]);
2018-08-30 08:32:52 +02:00
		else
			printout(config, id, 0, buf[thread].counter, buf[thread].uval,
				 prefix, buf[thread].run, buf[thread].ena, 1.0,
				 &rt_stat);
		fputc('\n', output);
	}

	free(buf);
}
struct caggr_data {
	double avg, avg_enabled, avg_running;
};

static void counter_aggr_cb(struct perf_stat_config *config __maybe_unused,
2019-07-21 13:23:51 +02:00
			    struct evsel *counter, void *data,
2018-08-30 08:32:52 +02:00
			    bool first __maybe_unused)
{
	struct caggr_data *cd = data;
2021-04-22 19:38:33 -07:00
	struct perf_counts_values *aggr = &counter->counts->aggr;
2018-08-30 08:32:52 +02:00
2021-04-22 19:38:33 -07:00
	cd->avg += aggr->val;
	cd->avg_enabled += aggr->ena;
	cd->avg_running += aggr->run;
2018-08-30 08:32:52 +02:00
}
/*
 * Print out the results of a single counter:
 * aggregated counts in system-wide mode
 */
static void print_counter_aggr(struct perf_stat_config *config,
2019-07-21 13:23:51 +02:00
			       struct evsel *counter, char *prefix)
2018-08-30 08:32:52 +02:00
{
	bool metric_only = config->metric_only;
	FILE *output = config->output;
	double uval;
	struct caggr_data cd = { .avg = 0.0 };

	if (!collect_data(config, counter, counter_aggr_cb, &cd))
		return;

	if (prefix && !metric_only)
		fprintf(output, "%s", prefix);

	uval = cd.avg * counter->scale;
2020-11-26 16:13:20 +02:00
	printout(config, cpu_map__empty_aggr_cpu_id(), 0, counter, uval, prefix, cd.avg_running,
		 cd.avg_enabled, cd.avg, &rt_stat);
2018-08-30 08:32:52 +02:00
	if (!metric_only)
		fprintf(output, "\n");
}
static void counter_cb(struct perf_stat_config *config __maybe_unused,
2019-07-21 13:23:51 +02:00
		       struct evsel *counter, void *data,
2018-08-30 08:32:52 +02:00
		       bool first __maybe_unused)
{
	struct aggr_data *ad = data;

	ad->val += perf_counts(counter->counts, ad->cpu, 0)->val;
	ad->ena += perf_counts(counter->counts, ad->cpu, 0)->ena;
	ad->run += perf_counts(counter->counts, ad->cpu, 0)->run;
}
/*
 * Print out the results of a single counter:
 * does not use aggregated count in system-wide
 */
static void print_counter(struct perf_stat_config *config,
2019-07-21 13:23:51 +02:00
			  struct evsel *counter, char *prefix)
2018-08-30 08:32:52 +02:00
{
	FILE *output = config->output;
	u64 ena, run, val;
	double uval;
	int cpu;
2020-11-26 16:13:20 +02:00
	struct aggr_cpu_id id;
2018-08-30 08:32:52 +02:00
2020-04-29 15:45:09 -03:00
	for (cpu = 0; cpu < evsel__nr_cpus(counter); cpu++) {
2018-08-30 08:32:52 +02:00
		struct aggr_data ad = { .cpu = cpu };

		if (!collect_data(config, counter, counter_cb, &ad))
			return;
		val = ad.val;
		ena = ad.ena;
		run = ad.run;

		if (prefix)
			fprintf(output, "%s", prefix);

		uval = val * counter->scale;
2020-11-26 16:13:20 +02:00
		id = cpu_map__empty_aggr_cpu_id();
2020-11-26 16:13:27 +02:00
		id.core = cpu;
2020-11-26 16:13:20 +02:00
		printout(config, id, 0, counter, uval, prefix,
			 run, ena, 1.0, &rt_stat);
2018-08-30 08:32:52 +02:00
		fputc('\n', output);
	}
}
static void print_no_aggr_metric(struct perf_stat_config *config,
2019-07-21 13:23:52 +02:00
				 struct evlist *evlist,
2018-08-30 08:32:52 +02:00
				 char *prefix)
{
	int cpu;
	int nrcpus = 0;
2019-07-21 13:23:51 +02:00
	struct evsel *counter;
2018-08-30 08:32:52 +02:00
	u64 ena, run, val;
	double uval;
2020-11-26 16:13:20 +02:00
	struct aggr_cpu_id id;
2018-08-30 08:32:52 +02:00
2019-07-21 13:24:41 +02:00
	nrcpus = evlist->core.cpus->nr;
2018-08-30 08:32:52 +02:00
	for (cpu = 0; cpu < nrcpus; cpu++) {
		bool first = true;

		if (prefix)
			fputs(prefix, config->output);
		evlist__for_each_entry(evlist, counter) {
2020-11-26 16:13:20 +02:00
			id = cpu_map__empty_aggr_cpu_id();
2020-11-26 16:13:27 +02:00
			id.core = cpu;
2018-08-30 08:32:52 +02:00
			if (first) {
2020-11-26 16:13:20 +02:00
				aggr_printout(config, counter, id, 0);
2018-08-30 08:32:52 +02:00
				first = false;
			}
			val = perf_counts(counter->counts, cpu, 0)->val;
			ena = perf_counts(counter->counts, cpu, 0)->ena;
			run = perf_counts(counter->counts, cpu, 0)->run;

			uval = val * counter->scale;
2020-11-26 16:13:20 +02:00
			printout(config, id, 0, counter, uval, prefix,
				 run, ena, 1.0, &rt_stat);
2018-08-30 08:32:52 +02:00
		}
		fputc('\n', config->output);
	}
}
static int aggr_header_lens[] = {
2019-06-04 15:50:42 -07:00
	[AGGR_CORE] = 24,
	[AGGR_DIE] = 18,
2018-08-30 08:32:52 +02:00
	[AGGR_SOCKET] = 12,
	[AGGR_NONE] = 6,
	[AGGR_THREAD] = 24,
	[AGGR_GLOBAL] = 0,
};

static const char *aggr_header_csv[] = {
	[AGGR_CORE] = "core,cpus,",
2019-06-04 15:50:42 -07:00
	[AGGR_DIE] = "die,cpus",
2018-08-30 08:32:52 +02:00
	[AGGR_SOCKET] = "socket,cpus",
	[AGGR_NONE] = "cpu,",
	[AGGR_THREAD] = "comm-pid,",
	[AGGR_GLOBAL] = ""
};
static void print_metric_headers(struct perf_stat_config *config,
2019-07-21 13:23:52 +02:00
				 struct evlist *evlist,
2018-08-30 08:32:52 +02:00
				 const char *prefix, bool no_indent)
{
	struct perf_stat_output_ctx out;
2019-07-21 13:23:51 +02:00
	struct evsel *counter;
2018-08-30 08:32:52 +02:00
	struct outstate os = {
		.fh = config->output
	};

	if (prefix)
		fprintf(config->output, "%s", prefix);

	if (!config->csv_output && !no_indent)
		fprintf(config->output, "%*s",
			aggr_header_lens[config->aggr_mode], "");
	if (config->csv_output) {
		if (config->interval)
			fputs("time,", config->output);
2021-04-19 12:41:44 +03:00
		if (!config->iostat_run)
			fputs(aggr_header_csv[config->aggr_mode], config->output);
2018-08-30 08:32:52 +02:00
	}
2021-04-19 12:41:44 +03:00
	if (config->iostat_run)
		iostat_print_header_prefix(config);
2018-08-30 08:32:52 +02:00
	/* Print metrics headers only */
	evlist__for_each_entry(evlist, counter) {
		os.evsel = counter;
		out.ctx = &os;
		out.print_metric = print_metric_header;
		out.new_line = new_line_metric;
		out.force_header = true;
		perf_stat__print_shadow_stats(config, counter, 0,
					      0,
					      &out,
					      &config->metric_events,
					      &rt_stat);
	}
	fputc('\n', config->output);
}
static void print_interval(struct perf_stat_config *config,
2019-07-21 13:23:52 +02:00
			   struct evlist *evlist,
2018-08-30 08:32:52 +02:00
			   char *prefix, struct timespec *ts)
{
	bool metric_only = config->metric_only;
	unsigned int unit_width = config->unit_width;
	FILE *output = config->output;
	static int num_print_interval;

	if (config->interval_clear)
		puts(CONSOLE_CLEAR);
2021-04-19 12:41:44 +03:00
	if (!config->iostat_run)
		sprintf(prefix, "%6lu.%09lu%s", (unsigned long) ts->tv_sec, ts->tv_nsec, config->csv_sep);
2018-08-30 08:32:52 +02:00
	if ((num_print_interval == 0 && !config->csv_output) || config->interval_clear) {
		switch (config->aggr_mode) {
perf stat: Add --per-node agregation support
Adding a new --per-node option to aggregate counts per NUMA
node for system-wide mode measurements.
You can specify --per-node in live mode:
# perf stat -a -I 1000 -e cycles --per-node
# time node cpus counts unit events
1.000542550 N0 20 6,202,097 cycles
1.000542550 N1 20 639,559 cycles
2.002040063 N0 20 7,412,495 cycles
2.002040063 N1 20 2,185,577 cycles
3.003451699 N0 20 6,508,917 cycles
3.003451699 N1 20 765,607 cycles
...
Or in the record/report stat session:
# perf stat record -a -I 1000 -e cycles
# time counts unit events
1.000536937 10,008,468 cycles
2.002090152 9,578,539 cycles
3.003625233 7,647,869 cycles
4.005135036 7,032,086 cycles
^C 4.340902364 3,923,893 cycles
# perf stat report --per-node
# time node cpus counts unit events
1.000536937 N0 20 9,355,086 cycles
1.000536937 N1 20 653,382 cycles
2.002090152 N0 20 7,712,838 cycles
2.002090152 N1 20 1,865,701 cycles
3.003625233 N0 20 6,604,441 cycles
3.003625233 N1 20 1,043,428 cycles
4.005135036 N0 20 6,350,522 cycles
4.005135036 N1 20 681,564 cycles
4.340902364 N0 20 3,403,188 cycles
4.340902364 N1 20 520,705 cycles
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20190904073415.723-4-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-28 10:17:43 +02:00
		case AGGR_NODE:
			fprintf(output, "#           time node   cpus");
			if (!metric_only)
				fprintf(output, "             counts %*s events\n", unit_width, "unit");
			break;
2018-08-30 08:32:52 +02:00
		case AGGR_SOCKET:
			fprintf(output, "#           time socket cpus");
			if (!metric_only)
				fprintf(output, "             counts %*s events\n", unit_width, "unit");
			break;
2019-06-04 15:50:42 -07:00
		case AGGR_DIE:
			fprintf(output, "#           time die          cpus");
			if (!metric_only)
				fprintf(output, "             counts %*s events\n", unit_width, "unit");
			break;
2018-08-30 08:32:52 +02:00
		case AGGR_CORE:
2019-06-04 15:50:42 -07:00
			fprintf(output, "#           time core         cpus");
2018-08-30 08:32:52 +02:00
			if (!metric_only)
				fprintf(output, "             counts %*s events\n", unit_width, "unit");
			break;
		case AGGR_NONE:
			fprintf(output, "#           time CPU    ");
			if (!metric_only)
				fprintf(output, "                counts %*s events\n", unit_width, "unit");
			break;
		case AGGR_THREAD:
			fprintf(output, "#           time             comm-pid");
			if (!metric_only)
				fprintf(output, "                  counts %*s events\n", unit_width, "unit");
			break;
		case AGGR_GLOBAL:
		default:
2021-04-19 12:41:44 +03:00
			if (!config->iostat_run) {
				fprintf(output, "#           time");
				if (!metric_only)
					fprintf(output, "             counts %*s events\n", unit_width, "unit");
			}
2018-08-30 08:32:52 +02:00
		case AGGR_UNSET:
			break;
		}
	}

	if ((num_print_interval == 0 || config->interval_clear) && metric_only)
		print_metric_headers(config, evlist, " ", true);
	if (++num_print_interval == 25)
		num_print_interval = 0;
}
static void print_header(struct perf_stat_config *config,
			 struct target *_target,
			 int argc, const char **argv)
{
	FILE *output = config->output;
	int i;

	fflush(stdout);

	if (!config->csv_output) {
		fprintf(output, "\n");
		fprintf(output, " Performance counter stats for ");
perf stat: Enable counting events for BPF programs
Introduce 'perf stat -b' option, which counts events for BPF programs, like:
[root@localhost ~]# ~/perf stat -e ref-cycles,cycles -b 254 -I 1000
1.487903822 115,200 ref-cycles
1.487903822 86,012 cycles
2.489147029 80,560 ref-cycles
2.489147029 73,784 cycles
3.490341825 60,720 ref-cycles
3.490341825 37,797 cycles
4.491540887 37,120 ref-cycles
4.491540887 31,963 cycles
The example above counts 'cycles' and 'ref-cycles' of the BPF program
with id 254. This is similar to the bpftool-prog-profile command, but
more flexible.
'perf stat -b' creates per-cpu perf_event and loads fentry/fexit BPF
programs (monitor-progs) to the target BPF program (target-prog). The
monitor-progs read perf_event before and after the target-prog, and
aggregate the difference in a BPF map. Then the user space reads data
from these maps.
A new 'struct bpf_counter' is introduced to provide a common interface
that uses BPF programs/maps to count perf events.
Committer notes:
Removed all but bpf_counter.h includes from evsel.h, not needed at all.
Also, BPF map lookups for PERCPU_ARRAYs need the value receive buffer
passed to the kernel to have libbpf_num_possible_cpus() entries, not
evsel__nr_cpus(evsel): the former uses /sys/devices/system/cpu/possible
while the latter uses /sys/devices/system/cpu/online, which may be less
than the 'possible' number, making the bpf map lookup overwrite memory
and cause hard to debug memory corruption.
We need to continue using evsel__nr_cpus(evsel) when accessing the
perf_counts array though, not to overwrite another area of memory :-)
Signed-off-by: Song Liu <songliubraving@fb.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/lkml/20210120163031.GU12699@kernel.org/
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kernel-team@fb.com
Link: http://lore.kernel.org/lkml/20201229214214.3413833-4-songliubraving@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-29 13:42:14 -08:00
		if (_target->bpf_str)
			fprintf(output, "\'BPF program(s) %s", _target->bpf_str);
		else if (_target->system_wide)
2018-08-30 08:32:52 +02:00
			fprintf(output, "\'system wide");
		else if (_target->cpu_list)
			fprintf(output, "\'CPU(s) %s", _target->cpu_list);
		else if (!target__has_task(_target)) {
			fprintf(output, "\'%s", argv ? argv[0] : "pipe");
			for (i = 1; argv && (i < argc); i++)
				fprintf(output, " %s", argv[i]);
		} else if (_target->pid)
			fprintf(output, "process id \'%s", _target->pid);
		else
			fprintf(output, "thread id \'%s", _target->tid);

		fprintf(output, "\'");
		if (config->run_count > 1)
			fprintf(output, " (%d runs)", config->run_count);
		fprintf(output, ":\n\n");
	}
}
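The committer notes in the 'perf stat -b' change above turn on a sizing detail: a lookup on a BPF_MAP_TYPE_PERCPU_ARRAY must be given a value buffer with one slot per possible CPU. Here is a minimal sketch of such a read, sized by libbpf_num_possible_cpus() as the notes require; it assumes <bpf/bpf.h>, <bpf/libbpf.h> and <errno.h> are available, and the helper name is illustrative, not part of this file.

/* Illustrative sketch only: sum one PERCPU_ARRAY slot across all possible CPUs. */
static int read_percpu_counter_sketch(int map_fd, __u32 key, __u64 *sum)
{
	int ncpus = libbpf_num_possible_cpus();
	__u64 *values;
	int i, err;

	if (ncpus < 0)
		return ncpus;

	/* One slot per *possible* CPU, not per online CPU of the evsel. */
	values = calloc(ncpus, sizeof(*values));
	if (!values)
		return -ENOMEM;

	err = bpf_map_lookup_elem(map_fd, &key, values);
	if (!err) {
		*sum = 0;
		for (i = 0; i < ncpus; i++)
			*sum += values[i];
	}

	free(values);
	return err;
}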
static int get_precision(double num)
{
	if (num > 1)
		return 0;

	return lround(ceil(-log10(num)));
}
static void print_table(struct perf_stat_config *config,
			FILE *output, int precision, double avg)
{
	char tmp[64];
	int idx, indent = 0;

	scnprintf(tmp, 64, " %17.*f", precision, avg);
	while (tmp[indent] == ' ')
		indent++;

	fprintf(output, "%*s# Table of individual measurements:\n", indent, "");

	for (idx = 0; idx < config->run_count; idx++) {
		double run = (double) config->walltime_run[idx] / NSEC_PER_SEC;
		int h, n = 1 + abs((int) (100.0 * (run - avg)/run) / 5);

		fprintf(output, " %17.*f (%+.*f) ",
			precision, run, precision, run - avg);

		for (h = 0; h < n; h++)
			fprintf(output, "#");

		fprintf(output, "\n");
	}

	fprintf(output, "\n%*s# Final result:\n", indent, "");
}

static double timeval2double(struct timeval *t)
{
	return t->tv_sec + (double) t->tv_usec / USEC_PER_SEC;
}
static void print_footer(struct perf_stat_config *config)
{
	double avg = avg_stats(config->walltime_nsecs_stats) / NSEC_PER_SEC;
	FILE *output = config->output;

	if (!config->null_run)
		fprintf(output, "\n");

	if (config->run_count == 1) {
		fprintf(output, " %17.9f seconds time elapsed", avg);

		if (config->ru_display) {
			double ru_utime = timeval2double(&config->ru_data.ru_utime);
			double ru_stime = timeval2double(&config->ru_data.ru_stime);

			fprintf(output, "\n\n");
			fprintf(output, " %17.9f seconds user\n", ru_utime);
			fprintf(output, " %17.9f seconds sys\n", ru_stime);
		}
	} else {
		double sd = stddev_stats(config->walltime_nsecs_stats) / NSEC_PER_SEC;
		/*
		 * Display at most 2 more significant
		 * digits than the stddev inaccuracy.
		 */
		int precision = get_precision(sd) + 2;

		if (config->walltime_run_table)
			print_table(config, output, precision, avg);

		fprintf(output, " %17.*f +- %.*f seconds time elapsed",
			precision, avg, precision, sd);

		print_noise_pct(config, sd, avg);
	}

	fprintf(output, "\n\n");
2020-02-24 13:59:22 -08:00
	if (config->print_free_counters_hint && sysctl__nmi_watchdog_enabled())
2018-08-30 08:32:52 +02:00
		fprintf(output,
"Some events weren't counted. Try disabling the NMI watchdog:\n"
"	echo 0 > /proc/sys/kernel/nmi_watchdog\n"
"	perf stat ...\n"
"	echo 1 > /proc/sys/kernel/nmi_watchdog\n");

	if (config->print_mixed_hw_group_error)
		fprintf(output,
			"The events in group usually have to be from "
			"the same PMU. Try reorganizing the group.\n");
}
2020-02-14 16:04:52 +08:00
static void print_percore_thread(struct perf_stat_config *config,
				 struct evsel *counter, char *prefix)
{
2020-11-26 16:13:20 +02:00
	int s;
	struct aggr_cpu_id s2, id;
2020-02-14 16:04:52 +08:00
	bool first = true;
2020-04-29 15:45:09 -03:00
	for (int i = 0; i < evsel__nr_cpus(counter); i++) {
2020-02-14 16:04:52 +08:00
		s2 = config->aggr_get_id(config, evsel__cpus(counter), i);
		for (s = 0; s < config->aggr_map->nr; s++) {
2020-11-26 16:13:23 +02:00
			id = config->aggr_map->map[s];
2020-11-26 16:13:20 +02:00
			if (cpu_map__compare_aggr_cpu_id(s2, id))
2020-02-14 16:04:52 +08:00
				break;
		}

		print_counter_aggrdata(config, counter, s,
				       prefix, false,
				       &first, i);
	}
}
perf stat: Support 'percore' event qualifier
With this patch, we can use the 'percore' event qualifier in perf-stat.
root@skl:/tmp# perf stat -e cpu/event=0,umask=0x3,percore=1/,cpu/event=0,umask=0x3/ -a -A -I1000
1.000773050 S0-C0 98,352,832 cpu/event=0,umask=0x3,percore=1/ (50.01%)
1.000773050 S0-C1 103,763,057 cpu/event=0,umask=0x3,percore=1/ (50.02%)
1.000773050 S0-C2 196,776,995 cpu/event=0,umask=0x3,percore=1/ (50.02%)
1.000773050 S0-C3 176,493,779 cpu/event=0,umask=0x3,percore=1/ (50.02%)
1.000773050 CPU0 47,699,641 cpu/event=0,umask=0x3/ (50.02%)
1.000773050 CPU1 49,052,451 cpu/event=0,umask=0x3/ (49.98%)
1.000773050 CPU2 102,771,422 cpu/event=0,umask=0x3/ (49.98%)
1.000773050 CPU3 100,784,662 cpu/event=0,umask=0x3/ (49.98%)
1.000773050 CPU4 43,171,342 cpu/event=0,umask=0x3/ (49.98%)
1.000773050 CPU5 54,152,158 cpu/event=0,umask=0x3/ (49.98%)
1.000773050 CPU6 93,618,410 cpu/event=0,umask=0x3/ (49.98%)
1.000773050 CPU7 74,477,589 cpu/event=0,umask=0x3/ (49.99%)
In this example, we count the event 'ref-cycles' per-core and per-CPU in
one perf stat command-line. From the output, we can see:
S0-C0 = CPU0 + CPU4
S0-C1 = CPU1 + CPU5
S0-C2 = CPU2 + CPU6
S0-C3 = CPU3 + CPU7
So the result is expected (tiny difference is ignored).
Note that the 'percore' event qualifier needs to be used with option '-A'.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1555077590-27664-4-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-04-12 21:59:49 +08:00
static void print_percore(struct perf_stat_config *config,
2019-07-21 13:23:51 +02:00
			  struct evsel *counter, char *prefix)
2019-04-12 21:59:49 +08:00
{
	bool metric_only = config->metric_only;
	FILE *output = config->output;
	int s;
	bool first = true;
2020-06-05 17:17:40 +08:00
	if (!config->aggr_map || !config->aggr_get_id)
2019-04-12 21:59:49 +08:00
		return;
2020-02-14 16:04:52 +08:00
	if (config->percore_show_thread)
		return print_percore_thread(config, counter, prefix);
2019-04-12 21:59:49 +08:00
	for (s = 0; s < config->aggr_map->nr; s++) {
		if (prefix && metric_only)
			fprintf(output, "%s", prefix);

		print_counter_aggrdata(config, counter, s,
				       prefix, metric_only,
perf stat: Show percore counts in per CPU output
We have supported the event modifier "percore" which sums up the event
counts for all hardware threads in a core and show the counts per core.
For example,
# perf stat -e cpu/event=cpu-cycles,percore/ -a -A -- sleep 1
Performance counter stats for 'system wide':
S0-D0-C0 395,072 cpu/event=cpu-cycles,percore/
S0-D0-C1 851,248 cpu/event=cpu-cycles,percore/
S0-D0-C2 954,226 cpu/event=cpu-cycles,percore/
S0-D0-C3 1,233,659 cpu/event=cpu-cycles,percore/
This patch provides a new option, "--percore-show-thread". It is used
together with the event modifier "percore" to sum up the event counts for
all hardware threads in a core but show the counts per hardware thread.
This is essentially a replacement for the any bit (which is gone in
Icelake). Per core counts are useful for some formulas, e.g. CoreIPC.
The original percore version was inconvenient to post-process. This
variant matches the output of the any bit.
With this patch, for example,
# perf stat -e cpu/event=cpu-cycles,percore/ -a -A --percore-show-thread -- sleep 1
Performance counter stats for 'system wide':
CPU0 2,453,061 cpu/event=cpu-cycles,percore/
CPU1 1,823,921 cpu/event=cpu-cycles,percore/
CPU2 1,383,166 cpu/event=cpu-cycles,percore/
CPU3 1,102,652 cpu/event=cpu-cycles,percore/
CPU4 2,453,061 cpu/event=cpu-cycles,percore/
CPU5 1,823,921 cpu/event=cpu-cycles,percore/
CPU6 1,383,166 cpu/event=cpu-cycles,percore/
CPU7 1,102,652 cpu/event=cpu-cycles,percore/
We can see counts are duplicated in CPU pairs (CPU0/CPU4, CPU1/CPU5,
CPU2/CPU6, CPU3/CPU7).
The interval mode also works. For example,
# perf stat -e cpu/event=cpu-cycles,percore/ -a -A --percore-show-thread -I 1000
# time CPU counts unit events
1.000425421 CPU0 925,032 cpu/event=cpu-cycles,percore/
1.000425421 CPU1 430,202 cpu/event=cpu-cycles,percore/
1.000425421 CPU2 436,843 cpu/event=cpu-cycles,percore/
1.000425421 CPU3 1,192,504 cpu/event=cpu-cycles,percore/
1.000425421 CPU4 925,032 cpu/event=cpu-cycles,percore/
1.000425421 CPU5 430,202 cpu/event=cpu-cycles,percore/
1.000425421 CPU6 436,843 cpu/event=cpu-cycles,percore/
1.000425421 CPU7 1,192,504 cpu/event=cpu-cycles,percore/
If we offline CPU5, the result is:
# perf stat -e cpu/event=cpu-cycles,percore/ -a -A --percore-show-thread -- sleep 1
Performance counter stats for 'system wide':
CPU0 2,752,148 cpu/event=cpu-cycles,percore/
CPU1 1,009,312 cpu/event=cpu-cycles,percore/
CPU2 2,784,072 cpu/event=cpu-cycles,percore/
CPU3 2,427,922 cpu/event=cpu-cycles,percore/
CPU4 2,752,148 cpu/event=cpu-cycles,percore/
CPU6 2,784,072 cpu/event=cpu-cycles,percore/
CPU7 2,427,922 cpu/event=cpu-cycles,percore/
1.001416041 seconds time elapsed
v4:
---
Ravi Bangoria reported an issue in v3: once a CPU is offlined,
the output is not correct. The fix is to use the cpu index in
print_percore_thread rather than the cpu value.
v3:
---
1. Fix the interval mode output error
2. Use cpu value (not cpu index) in config->aggr_get_id().
3. Refine the code according to Jiri's comments.
v2:
---
Add the explanation to the changelog. This is essentially a replacement
for the any bit. No code change.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20200214080452.26402-1-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-02-14 16:04:52 +08:00
                                       &first, -1);
perf stat: Support 'percore' event qualifier
With this patch, we can use the 'percore' event qualifier in perf-stat.
root@skl:/tmp# perf stat -e cpu/event=0,umask=0x3,percore=1/,cpu/event=0,umask=0x3/ -a -A -I1000
1.000773050 S0-C0 98,352,832 cpu/event=0,umask=0x3,percore=1/ (50.01%)
1.000773050 S0-C1 103,763,057 cpu/event=0,umask=0x3,percore=1/ (50.02%)
1.000773050 S0-C2 196,776,995 cpu/event=0,umask=0x3,percore=1/ (50.02%)
1.000773050 S0-C3 176,493,779 cpu/event=0,umask=0x3,percore=1/ (50.02%)
1.000773050 CPU0 47,699,641 cpu/event=0,umask=0x3/ (50.02%)
1.000773050 CPU1 49,052,451 cpu/event=0,umask=0x3/ (49.98%)
1.000773050 CPU2 102,771,422 cpu/event=0,umask=0x3/ (49.98%)
1.000773050 CPU3 100,784,662 cpu/event=0,umask=0x3/ (49.98%)
1.000773050 CPU4 43,171,342 cpu/event=0,umask=0x3/ (49.98%)
1.000773050 CPU5 54,152,158 cpu/event=0,umask=0x3/ (49.98%)
1.000773050 CPU6 93,618,410 cpu/event=0,umask=0x3/ (49.98%)
1.000773050 CPU7 74,477,589 cpu/event=0,umask=0x3/ (49.99%)
In this example, we count the event 'ref-cycles' per-core and per-CPU in
one perf stat command-line. From the output, we can see:
S0-C0 = CPU0 + CPU4
S0-C1 = CPU1 + CPU5
S0-C2 = CPU2 + CPU6
S0-C3 = CPU3 + CPU7
So the result is as expected (the tiny difference can be ignored).
Note that the 'percore' event qualifier needs to be used together with the '-A' option.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1555077590-27664-4-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-04-12 21:59:49 +08:00
        }

        if (metric_only)
                fputc('\n', output);
}
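The "--percore-show-thread" behaviour described in the message above keeps the per-core summation but emits one line per hardware thread, which is why SMT siblings (CPU0/CPU4, CPU1/CPU5, ...) print identical values. A minimal sketch under the same hypothetical topology as before, with made-up counts:

#include <stdio.h>

#define NR_CPUS  8
#define NR_CORES 4

int main(void)
{
        long long cpu_counts[NR_CPUS] = { 500, 200, 250, 600, 425, 230, 186, 592 };
        long long core_counts[NR_CORES] = { 0 };

        /* Sum SMT siblings per core exactly as with plain percore... */
        for (int cpu = 0; cpu < NR_CPUS; cpu++)
                core_counts[cpu % NR_CORES] += cpu_counts[cpu];

        /* ...but print one line per CPU, each carrying its core's sum. */
        for (int cpu = 0; cpu < NR_CPUS; cpu++)
                printf("CPU%d  %lld\n", cpu, core_counts[cpu % NR_CORES]);
        return 0;
}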
2020-11-30 14:55:12 -03:00
void evlist__print_counters(struct evlist *evlist, struct perf_stat_config *config,
                            struct target *_target, struct timespec *ts, int argc, const char **argv)
2018-08-30 08:32:52 +02:00
{
        bool metric_only = config->metric_only;
        int interval = config->interval;
2019-07-21 13:23:51 +02:00
        struct evsel *counter;
2018-08-30 08:32:52 +02:00
        char buf[64], *prefix = NULL;
2021-04-19 12:41:44 +03:00
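        /* iostat mode: make the first event the selected one. */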
        if (config->iostat_run)
                evlist->selected = evlist__first(evlist);
2018-08-30 08:32:52 +02:00
        if (interval)
                print_interval(config, evlist, prefix = buf, ts);
        else
                print_header(config, _target, argc, argv);

        if (metric_only) {
                static int num_print_iv;

                if (num_print_iv == 0 && !interval)
                        print_metric_headers(config, evlist, prefix, false);
                if (num_print_iv++ == 25)
                        num_print_iv = 0;
2021-04-19 12:41:44 +03:00
                if (config->aggr_mode == AGGR_GLOBAL && prefix && !config->iostat_run)
2018-08-30 08:32:52 +02:00
                        fprintf(config->output, "%s", prefix);
        }
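        /* Dispatch on the requested aggregation mode. */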
        switch (config->aggr_mode) {
        case AGGR_CORE:
2019-06-04 15:50:42 -07:00
        case AGGR_DIE:
2018-08-30 08:32:52 +02:00
        case AGGR_SOCKET:
perf stat: Add --per-node aggregation support
Add a new --per-node option to aggregate counts per NUMA
node for system-wide measurements.
You can specify --per-node in live mode:
# perf stat -a -I 1000 -e cycles --per-node
# time node cpus counts unit events
1.000542550 N0 20 6,202,097 cycles
1.000542550 N1 20 639,559 cycles
2.002040063 N0 20 7,412,495 cycles
2.002040063 N1 20 2,185,577 cycles
3.003451699 N0 20 6,508,917 cycles
3.003451699 N1 20 765,607 cycles
...
Or in the record/report stat session:
# perf stat record -a -I 1000 -e cycles
# time counts unit events
1.000536937 10,008,468 cycles
2.002090152 9,578,539 cycles
3.003625233 7,647,869 cycles
4.005135036 7,032,086 cycles
^C 4.340902364 3,923,893 cycles
# perf stat report --per-node
# time node cpus counts unit events
1.000536937 N0 20 9,355,086 cycles
1.000536937 N1 20 653,382 cycles
2.002090152 N0 20 7,712,838 cycles
2.002090152 N1 20 1,865,701 cycles
3.003625233 N0 20 6,604,441 cycles
3.003625233 N1 20 1,043,428 cycles
4.005135036 N0 20 6,350,522 cycles
4.005135036 N1 20 681,564 cycles
4.340902364 N0 20 3,403,188 cycles
4.340902364 N1 20 520,705 cycles
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20190904073415.723-4-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-28 10:17:43 +02:00
        case AGGR_NODE:
2018-08-30 08:32:52 +02:00
                print_aggr(config, evlist, prefix);
                break;
        case AGGR_THREAD:
                evlist__for_each_entry(evlist, counter) {
                        print_aggr_thread(config, _target, counter, prefix);
                }
                break;
        case AGGR_GLOBAL:
2021-04-19 12:41:44 +03:00
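                /* iostat mode: delegate printing to iostat_print_counters(). */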
                if (config->iostat_run)
                        iostat_print_counters(evlist, config, ts, prefix = buf,
                                              print_counter_aggr);
                else {
                        evlist__for_each_entry(evlist, counter) {
                                print_counter_aggr(config, counter, prefix);
                        }
                        if (metric_only)
                                fputc('\n', config->output);
2018-08-30 08:32:52 +02:00
                }
                break;
        case AGGR_NONE:
                if (metric_only)
                        print_no_aggr_metric(config, evlist, prefix);
                else {
                        evlist__for_each_entry(evlist, counter) {
perf stat: Support 'percore' event qualifier
With this patch, we can use the 'percore' event qualifier in perf-stat.
root@skl:/tmp# perf stat -e cpu/event=0,umask=0x3,percore=1/,cpu/event=0,umask=0x3/ -a -A -I1000
1.000773050 S0-C0 98,352,832 cpu/event=0,umask=0x3,percore=1/ (50.01%)
1.000773050 S0-C1 103,763,057 cpu/event=0,umask=0x3,percore=1/ (50.02%)
1.000773050 S0-C2 196,776,995 cpu/event=0,umask=0x3,percore=1/ (50.02%)
1.000773050 S0-C3 176,493,779 cpu/event=0,umask=0x3,percore=1/ (50.02%)
1.000773050 CPU0 47,699,641 cpu/event=0,umask=0x3/ (50.02%)
1.000773050 CPU1 49,052,451 cpu/event=0,umask=0x3/ (49.98%)
1.000773050 CPU2 102,771,422 cpu/event=0,umask=0x3/ (49.98%)
1.000773050 CPU3 100,784,662 cpu/event=0,umask=0x3/ (49.98%)
1.000773050 CPU4 43,171,342 cpu/event=0,umask=0x3/ (49.98%)
1.000773050 CPU5 54,152,158 cpu/event=0,umask=0x3/ (49.98%)
1.000773050 CPU6 93,618,410 cpu/event=0,umask=0x3/ (49.98%)
1.000773050 CPU7 74,477,589 cpu/event=0,umask=0x3/ (49.99%)
In this example, we count the event 'ref-cycles' per-core and per-CPU in
one perf stat command-line. From the output, we can see:
S0-C0 = CPU0 + CPU4
S0-C1 = CPU1 + CPU5
S0-C2 = CPU2 + CPU6
S0-C3 = CPU3 + CPU7
So the result is as expected (the tiny difference can be ignored).
Note that the 'percore' event qualifier needs to be used together with the '-A' option.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1555077590-27664-4-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-04-12 21:59:49 +08:00
                                if (counter->percore)
                                        print_percore(config, counter, prefix);
                                else
                                        print_counter(config, counter, prefix);
2018-08-30 08:32:52 +02:00
                        }
                }
                break;
        case AGGR_UNSET:
        default:
                break;
        }

        if (!interval && !config->csv_output)
                print_footer(config);

        fflush(config->output);
}
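The --per-node mode described in the commit message further up groups counts by NUMA node (the N0/N1 lines in its example). A minimal sketch of that bucketing, using a hypothetical cpu-to-node mapping and made-up counts rather than perf's real topology lookup:

#include <stdio.h>

#define NR_CPUS  8
#define NR_NODES 2

/* Hypothetical topology: the first half of the CPUs sits on node 0,
 * the second half on node 1. */
static int node_of(int cpu)
{
        return cpu < NR_CPUS / 2 ? 0 : 1;
}

int main(void)
{
        long long cpu_counts[NR_CPUS] = { 310, 280, 305, 290, 40, 35, 50, 45 };
        long long node_counts[NR_NODES] = { 0 };
        int node_cpus[NR_NODES] = { 0 };

        for (int cpu = 0; cpu < NR_CPUS; cpu++) {
                node_counts[node_of(cpu)] += cpu_counts[cpu];
                node_cpus[node_of(cpu)]++;
        }

        /* One line per node: node name, number of aggregated CPUs, summed
         * count, mirroring the "N0 20 6,202,097 cycles" style shown above. */
        for (int n = 0; n < NR_NODES; n++)
                printf("N%d  %d cpus  %lld\n", n, node_cpus[n], node_counts[n]);
        return 0;
}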