License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boilerplate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information in it,
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information.
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to apply to
a file was done in a spreadsheet of side-by-side results from the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few thousand files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging were:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, the file was
considered to have no license information in it, and the top-level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier                             # files
---------------------------------------------------|-------
GPL-2.0                                               11139
and resulted in the first patch in this series.
If that file was a */uapi/* one, it was tagged "GPL-2.0 WITH
Linux-syscall-note", otherwise "GPL-2.0". The results were:
SPDX license identifier                             # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note                         930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier                             # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note                         270
GPL-2.0+ WITH Linux-syscall-note                        169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)      21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)      17
LGPL-2.1+ WITH Linux-syscall-note                        15
GPL-1.0+ WITH Linux-syscall-note                         14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)      5
LGPL-2.0+ WITH Linux-syscall-note                         4
LGPL-2.1 WITH Linux-syscall-note                          3
((GPL-2.0 WITH Linux-syscall-note) OR MIT)                3
((GPL-2.0 WITH Linux-syscall-note) AND MIT)               1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
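The decision rules in the list above can be condensed into a small function. This is a minimal sketch, not the tooling that was actually used: the names `conclude_spdx_id` and `struct scan_result`, and the convention that an empty string means "no license detected", are hypothetical. The manual inspection and lawyer confirmation steps are reduced to a returned status, and the rule that adds Linux-syscall-note to uapi files with an existing GPL-family license is left out for brevity.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical outcome of the per-file decision. */
enum spdx_decision {
	SPDX_CONCLUDED,		/* identifier chosen from the scanner results */
	SPDX_MANUAL_REVIEW,	/* scanners disagreed: a human must inspect the file */
};

/* Hypothetical result of one scanner run; "" means no license detected. */
struct scan_result {
	const char *license;
};

/* True if the path lies under a uapi/ directory. */
static bool is_uapi_path(const char *path)
{
	return strstr(path, "/uapi/") != NULL;
}

/*
 * Decide the SPDX identifier for one file from two independent scans.
 * On SPDX_CONCLUDED, *out points to the identifier to apply.
 */
static enum spdx_decision conclude_spdx_id(const char *path,
					   struct scan_result a,
					   struct scan_result b,
					   const char **out)
{
	bool a_found = a.license[0] != '\0';
	bool b_found = b.license[0] != '\0';

	if (!a_found && !b_found) {
		/* No license traces: the top-level COPYING license applies. */
		*out = is_uapi_path(path) ? "GPL-2.0 WITH Linux-syscall-note"
					  : "GPL-2.0";
		return SPDX_CONCLUDED;
	}

	if (a_found && b_found && strcmp(a.license, b.license) == 0) {
		/* Both scanners agree: that becomes the concluded license. */
		*out = a.license;
		return SPDX_CONCLUDED;
	}

	/* One scanner found a license and the other did not, or they differ. */
	*out = NULL;
	return SPDX_MANUAL_REVIEW;
}
```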
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there were any new insights.
The Windriver scanner is based in part on an older version of FOSSology,
so they are related.
Thomas did random spot checks of about 500 files from the spreadsheets
for the uapi headers and agreed with the SPDX license identifiers in the
files he inspected. For the non-uapi files Thomas did random spot checks
of about 15000 files.
In the initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types). Finally Greg ran the script using the .csv files to
generate the patches.
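The header-versus-source distinction follows the kernel's license rules: C source files carry the tag in a `//` comment, C headers in a `/* */` comment, and Makefiles or scripts in a `#` comment. The sketch below shows only that choice; `format_spdx_line` is a hypothetical helper, and falling back to `#` for every other file type is a simplifying assumption (the real script was refined to detect more file types).

```c
#include <stdio.h>
#include <string.h>

/* True if name ends with the given suffix. */
static int has_suffix(const char *name, const char *suffix)
{
	size_t n = strlen(name), s = strlen(suffix);

	return n >= s && strcmp(name + n - s, suffix) == 0;
}

/*
 * Hypothetical helper: write the SPDX tag line for a file into buf,
 * using the comment style that file type expects. Returns the
 * snprintf() result.
 */
static int format_spdx_line(char *buf, size_t size,
			    const char *filename, const char *id)
{
	if (has_suffix(filename, ".c"))
		return snprintf(buf, size, "// SPDX-License-Identifier: %s", id);
	if (has_suffix(filename, ".h"))
		return snprintf(buf, size, "/* SPDX-License-Identifier: %s */", id);
	/* Assumption: everything else (Makefile, Kconfig, scripts) uses '#'. */
	return snprintf(buf, size, "# SPDX-License-Identifier: %s", id);
}
```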
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
// SPDX-License-Identifier: GPL-2.0
/*
 * This is a rewrite of the original c2c tool introduced here:
 *   http://lwn.net/Articles/588866/
 *
 * The original tool was changed to fit the current state of perf.
 *
 * Original authors:
 *   Don Zickus <dzickus@redhat.com>
 *   Dick Fowles <fowles@inreach.com>
 *   Joe Mario <jmario@redhat.com>
 */
#include <errno.h>
#include <inttypes.h>
#include <linux/compiler.h>
#include <linux/err.h>
#include <linux/kernel.h>
#include <linux/stringify.h>
#include <linux/zalloc.h>
#include <asm/bug.h>
#include <sys/param.h>
#include "debug.h"
#include "builtin.h"
#include <perf/cpumap.h>
#include <subcmd/pager.h>
#include <subcmd/parse-options.h>
#include "map_symbol.h"
perf c2c: Add record subcommand
Add the c2c record subcommand. It sets up options related to HITM
cacheline analysis and calls the standard perf record command.
$ sudo perf c2c record -v -- -a
calling: record -W -d --sample-cpu -e cpu/mem-loads,ldlat=30/P -e cpu/mem-stores/P -a
...
It produces perf.data, to be reported by perf c2c report, which
comes in the following patches.
Details are described in the man page, which is added in one of the
following patches.
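As the `calling:` line above shows, `perf c2c record` prepends a fixed set of options and events to whatever follows `--` and hands the result to perf record. A minimal sketch of that argument assembly follows; the helper `build_record_argv` is hypothetical, and the fixed event list is copied from the example output rather than from the actual builtin-c2c.c code, which may choose events differently.

```c
#include <stddef.h>

/* Fixed prefix taken from the "calling:" line in the example above. */
static const char *const record_prefix[] = {
	"record", "-W", "-d", "--sample-cpu",
	"-e", "cpu/mem-loads,ldlat=30/P",
	"-e", "cpu/mem-stores/P",
};

#define PREFIX_LEN (sizeof(record_prefix) / sizeof(record_prefix[0]))

/*
 * Hypothetical helper: copy the fixed prefix, then the user's extra
 * arguments, into out[] and NULL-terminate it. Returns the number of
 * arguments written, or -1 if out[] (cap slots) is too small.
 */
static int build_record_argv(const char **out, size_t cap,
			     int user_argc, const char **user_argv)
{
	size_t n = 0;

	if (PREFIX_LEN + (size_t)user_argc + 1 > cap)
		return -1;

	for (size_t i = 0; i < PREFIX_LEN; i++)
		out[n++] = record_prefix[i];
	for (int i = 0; i < user_argc; i++)
		out[n++] = user_argv[i];
	out[n] = NULL;

	return (int)n;
}
```

With `-a` as the only user argument, this yields the nine-element vector from the example, ending in `-a`.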
Committer notes:
Testing it:
# perf c2c record -a sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 5.050 MB perf.data (412 samples) ]
# ls -la perf.data
-rw-------. 1 root root 5301752 Oct 4 13:32 perf.data
# perf evlist
cpu/mem-loads,ldlat=30/P
cpu/mem-stores/P
# perf evlist -v
cpu/mem-loads,ldlat=30/P: type: 4, size: 112, config: 0x1cd, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|ID|CPU|PERIOD|DATA_SRC|WEIGHT, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, mmap_data: 1, sample_id_all: 1, mmap2: 1, comm_exec: 1, { bp_addr, config1 }: 0x1f
cpu/mem-stores/P: type: 4, size: 112, config: 0x82d0, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|ID|CPU|PERIOD|DATA_SRC|WEIGHT, read_format: ID, disabled: 1, inherit: 1, freq: 1, precise_ip: 3, sample_id_all: 1
#
# perf report --stdio
<SNIP>
# Total Lost Samples: 14
# Samples: 216 of event 'cpu/mem-loads,ldlat=30/P'
# Event count (approx.): 15207
# Overhead Symbol Shared Object
# ........ ..................................... ............................
10.32% [k] update_blocked_averages [kernel.vmlinux]
3.43% [.] 0x00000000001a2122 qemu-system-x86_64 (deleted)
2.52% [k] enqueue_entity [kernel.vmlinux]
1.88% [.] g_main_context_query libglib-2.0.so.0.4800.2
1.86% [k] __schedule [kernel.vmlinux]
<SNIP>
# Samples: 196 of event 'cpu/mem-stores/P'
# Event count (approx.): 14771346
# Overhead Symbol Shared Object
# ........ ................................... ............................
13.91% [k] intel_idle [kernel.vmlinux]
3.02% [.] 0x00000000022f06ea chrome
2.94% [.] 0x00000000001a1b4c qemu-system-x86_64 (deleted)
2.94% [.] 0x000000000019d8e4 qemu-system-x86_64 (deleted)
2.38% [.] 0x00000000001a1c52 qemu-system-x86_64 (deleted)
<SNIP>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1474558645-19956-12-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
#include "mem-events.h"
#include "session.h"
#include "hist.h"
#include "sort.h"
#include "tool.h"
#include "cacheline.h"
#include "data.h"
#include "event.h"
#include "evlist.h"
#include "evsel.h"
#include "ui/browsers/hists.h"
#include "thread.h"
#include "mem2node.h"
#include "symbol.h"
#include "ui/ui.h"
#include "ui/progress.h"
#include "pmu.h"
#include "pmu-hybrid.h"
#include "string2.h"
#include "util/util.h"

struct c2c_hists {
	struct hists hists;
	struct perf_hpp_list list;
	struct c2c_stats stats;
};

struct compute_stats {
	struct stats lcl_hitm;
	struct stats rmt_hitm;
	struct stats lcl_peer;
	struct stats rmt_peer;
	struct stats load;
};

struct c2c_hist_entry {
	struct c2c_hists *hists;
	struct c2c_stats stats;
	unsigned long *cpuset;
	unsigned long *nodeset;
	struct c2c_stats *node_stats;
	unsigned int cacheline_idx;

	struct compute_stats cstats;

	unsigned long paddr;
	unsigned long paddr_cnt;
	bool paddr_zero;
	char *nodestr;

	/*
	 * must be at the end,
	 * because of its callchain dynamic entry
	 */
	struct hist_entry he;
};

static char const *coalesce_default = "iaddr";

struct perf_c2c {
	struct perf_tool tool;
	struct c2c_hists hists;
	struct mem2node mem2node;

	unsigned long **nodes;
	int nodes_cnt;
	int cpus_cnt;
	int *cpu2node;
	int node_info;

	bool show_src;
	bool show_all;
	bool use_stdio;
	bool stats_only;
	bool symbol_full;
	bool stitch_lbr;

	/* Shared cache line stats */
	struct c2c_stats shared_clines_stats;
	int shared_clines;

	int display;

	const char *coalesce;
	char *cl_sort;
	char *cl_resort;
	char *cl_output;
};

enum {
	DISPLAY_LCL_HITM,
	DISPLAY_RMT_HITM,
	DISPLAY_TOT_HITM,
	DISPLAY_SNP_PEER,
	DISPLAY_MAX,
};

static const char *display_str[DISPLAY_MAX] = {
	[DISPLAY_LCL_HITM] = "Local HITMs",
	[DISPLAY_RMT_HITM] = "Remote HITMs",
	[DISPLAY_TOT_HITM] = "Total HITMs",
	[DISPLAY_SNP_PEER] = "Peer Snoop",
};

static const struct option c2c_options[] = {
	OPT_INCR('v', "verbose", &verbose, "be more verbose (show counter open errors, etc)"),
	OPT_END()
};

static struct perf_c2c c2c;

static void *c2c_he_zalloc(size_t size)
{
	struct c2c_hist_entry *c2c_he;

	c2c_he = zalloc(size + sizeof(*c2c_he));
	if (!c2c_he)
		return NULL;

	c2c_he->cpuset = bitmap_zalloc(c2c.cpus_cnt);
	if (!c2c_he->cpuset)
		goto out_free;

	c2c_he->nodeset = bitmap_zalloc(c2c.nodes_cnt);
	if (!c2c_he->nodeset)
		goto out_free;

	c2c_he->node_stats = zalloc(c2c.nodes_cnt * sizeof(*c2c_he->node_stats));
	if (!c2c_he->node_stats)
		goto out_free;

	init_stats(&c2c_he->cstats.lcl_hitm);
	init_stats(&c2c_he->cstats.rmt_hitm);
	init_stats(&c2c_he->cstats.lcl_peer);
	init_stats(&c2c_he->cstats.rmt_peer);
	init_stats(&c2c_he->cstats.load);

	return &c2c_he->he;

out_free:
	zfree(&c2c_he->nodeset);
	zfree(&c2c_he->cpuset);
	free(c2c_he);
	return NULL;
}

static void c2c_he_free(void *he)
{
	struct c2c_hist_entry *c2c_he;

	c2c_he = container_of(he, struct c2c_hist_entry, he);
	if (c2c_he->hists) {
		hists__delete_entries(&c2c_he->hists->hists);
		zfree(&c2c_he->hists);
	}

	zfree(&c2c_he->cpuset);
	zfree(&c2c_he->nodeset);
	zfree(&c2c_he->nodestr);
	zfree(&c2c_he->node_stats);
	free(c2c_he);
}

static struct hist_entry_ops c2c_entry_ops = {
	.new	= c2c_he_zalloc,
	.free	= c2c_he_free,
};
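`c2c_he_zalloc()` hands the generic histogram code a pointer to the embedded `he` member, and `c2c_he_free()` recovers the enclosing `c2c_hist_entry` with `container_of()`. The self-contained sketch below shows the same pattern with hypothetical types (`struct base`, `struct wrapper`); `container_of` is redefined locally with `offsetof`, the allocation over-sizes by `extra` trailing bytes the way `zalloc(size + sizeof(*c2c_he))` does, and the goto label mirrors the `out_free` cleanup path.

```c
#include <stddef.h>
#include <stdlib.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical generic entry that the core code knows about. */
struct base {
	int id;
};

/* Hypothetical wrapper adding private data; base must stay last. */
struct wrapper {
	unsigned long *cpuset;
	struct base base;
};

/* Allocate wrapper plus 'extra' trailing bytes; return the embedded base. */
static struct base *wrapper_new(size_t extra, size_t nr_words)
{
	struct wrapper *w = calloc(1, sizeof(*w) + extra);

	if (!w)
		return NULL;

	w->cpuset = calloc(nr_words, sizeof(*w->cpuset));
	if (!w->cpuset)
		goto out_free;

	return &w->base;

out_free:
	free(w);
	return NULL;
}

/* Recover the wrapper from the embedded member and release everything. */
static void wrapper_free(struct base *b)
{
	struct wrapper *w = container_of(b, struct wrapper, base);

	free(w->cpuset);
	free(w);
}
```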

static int c2c_hists__init(struct c2c_hists *hists,
			   const char *sort,
			   int nr_header_lines);

static struct c2c_hists *
he__get_c2c_hists(struct hist_entry *he,
		  const char *sort,
		  int nr_header_lines)
{
	struct c2c_hist_entry *c2c_he;
	struct c2c_hists *hists;
	int ret;

	c2c_he = container_of(he, struct c2c_hist_entry, he);
	if (c2c_he->hists)
		return c2c_he->hists;

	hists = c2c_he->hists = zalloc(sizeof(*hists));
	if (!hists)
		return NULL;

	ret = c2c_hists__init(hists, sort, nr_header_lines);
	if (ret) {
		/* Also clear the cached pointer so it does not dangle. */
		zfree(&c2c_he->hists);
		return NULL;
	}

	return hists;
}

static void c2c_he__set_cpu(struct c2c_hist_entry *c2c_he,
			    struct perf_sample *sample)
{
	if (WARN_ONCE(sample->cpu == (unsigned int) -1,
		      "WARNING: no sample cpu value"))
		return;

	__set_bit(sample->cpu, c2c_he->cpuset);
}

static void c2c_he__set_node(struct c2c_hist_entry *c2c_he,
			     struct perf_sample *sample)
{
	int node;

	if (!sample->phys_addr) {
		c2c_he->paddr_zero = true;
		return;
	}

	node = mem2node__node(&c2c.mem2node, sample->phys_addr);
	if (WARN_ONCE(node < 0, "WARNING: failed to find node\n"))
		return;

	__set_bit(node, c2c_he->nodeset);

	if (c2c_he->paddr != sample->phys_addr) {
		c2c_he->paddr_cnt++;
		c2c_he->paddr = sample->phys_addr;
	}
}
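Each entry records which CPUs and NUMA nodes touched it as bitmaps (`cpuset`, `nodeset`), allocated with `bitmap_zalloc()` and updated with `__set_bit()`. Those helpers come from perf's own headers, so the sketch below re-implements the idea with hypothetical `bm_*` functions over `unsigned long` words; counting the set bits gives the number of distinct CPUs (or nodes) that hit an entry, and setting the same bit twice has no further effect.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

#define BM_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define BM_LONGS(nbits) (((nbits) + BM_BITS_PER_LONG - 1) / BM_BITS_PER_LONG)

/* Zeroed bitmap large enough for nbits bits (like bitmap_zalloc()). */
static unsigned long *bm_zalloc(size_t nbits)
{
	return calloc(BM_LONGS(nbits), sizeof(unsigned long));
}

/* Non-atomic set, like __set_bit(). */
static void bm_set(unsigned long *bm, size_t bit)
{
	bm[bit / BM_BITS_PER_LONG] |= 1UL << (bit % BM_BITS_PER_LONG);
}

static bool bm_test(const unsigned long *bm, size_t bit)
{
	return (bm[bit / BM_BITS_PER_LONG] >> (bit % BM_BITS_PER_LONG)) & 1UL;
}

/* Number of set bits among the first nbits. */
static size_t bm_weight(const unsigned long *bm, size_t nbits)
{
	size_t count = 0;

	for (size_t i = 0; i < nbits; i++)
		count += bm_test(bm, i);
	return count;
}
```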

static void compute_stats(struct c2c_hist_entry *c2c_he,
			  struct c2c_stats *stats,
			  u64 weight)
{
	struct compute_stats *cstats = &c2c_he->cstats;

	if (stats->rmt_hitm)
		update_stats(&cstats->rmt_hitm, weight);
	else if (stats->lcl_hitm)
		update_stats(&cstats->lcl_hitm, weight);
	else if (stats->rmt_peer)
		update_stats(&cstats->rmt_peer, weight);
	else if (stats->lcl_peer)
		update_stats(&cstats->lcl_peer, weight);
	else if (stats->load)
		update_stats(&cstats->load, weight);
}
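`compute_stats()` files each sample's weight under exactly one class, checking remote HITM first, then local HITM, remote peer, local peer and finally plain loads. The sketch below reproduces that priority with a hypothetical `struct running_stats` that keeps only a count and an incrementally updated mean; perf's own `struct stats` and `update_stats()` also track variance, minimum and maximum, which are omitted here.

```c
#include <stdint.h>

/* Hypothetical, reduced stand-in for perf's struct stats. */
struct running_stats {
	double n;
	double mean;
};

static void running_update(struct running_stats *s, uint64_t val)
{
	s->n += 1;
	s->mean += ((double)val - s->mean) / s->n;	/* incremental mean */
}

/* Hypothetical per-sample flags, one counter per access class. */
struct sample_class {
	unsigned int rmt_hitm, lcl_hitm, rmt_peer, lcl_peer, load;
};

struct class_stats {
	struct running_stats rmt_hitm, lcl_hitm, rmt_peer, lcl_peer, load;
};

/* Attribute the weight to the highest-priority class that is set. */
static void classify_weight(struct class_stats *cs,
			    const struct sample_class *sc, uint64_t weight)
{
	if (sc->rmt_hitm)
		running_update(&cs->rmt_hitm, weight);
	else if (sc->lcl_hitm)
		running_update(&cs->lcl_hitm, weight);
	else if (sc->rmt_peer)
		running_update(&cs->rmt_peer, weight);
	else if (sc->lcl_peer)
		running_update(&cs->lcl_peer, weight);
	else if (sc->load)
		running_update(&cs->load, weight);
}
```

A sample flagged as both a local HITM and a load is counted only as a local HITM, which keeps the per-class averages disjoint.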

static int process_sample_event(struct perf_tool *tool __maybe_unused,
				union perf_event *event,
				struct perf_sample *sample,
				struct evsel *evsel,
				struct machine *machine)
{
	struct c2c_hists *c2c_hists = &c2c.hists;
	struct c2c_hist_entry *c2c_he;
	struct c2c_stats stats = { .nr_entries = 0, };
	struct hist_entry *he;
	struct addr_location al;
	struct mem_info *mi, *mi_dup;
	int ret;

	if (machine__resolve(machine, &al, sample) < 0) {
		pr_debug("problem processing %d event, skipping it.\n",
			 event->header.type);
		return -1;
	}

	if (c2c.stitch_lbr)
		al.thread->lbr_stitch_enable = true;
perf c2c report: Allow to report callchains
Add a --call-graph option to properly set up the callchain code, and add
default settings to display callchains whenever they are stored in
perf.data.
Committer Notes:
Testing it:
[root@jouet ~]# perf c2c record -a -g sleep 5
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 5.331 MB perf.data (4263 samples) ]
[root@jouet ~]# perf evlist -v
cpu/mem-loads,ldlat=30/P: type: 4, size: 112, config: 0x1cd, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|CALLCHAIN|ID|CPU|PERIOD|DATA_SRC|WEIGHT, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, mmap_data: 1, sample_id_all: 1, mmap2: 1, comm_exec: 1, { bp_addr, config1 }: 0x1f
cpu/mem-stores/P: type: 4, size: 112, config: 0x82d0, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|CALLCHAIN|ID|CPU|PERIOD|DATA_SRC|WEIGHT, read_format: ID, disabled: 1, inherit: 1, freq: 1, precise_ip: 3, sample_id_all: 1
[root@jouet ~]# perf c2c report --stats
=================================================
Trace Event Information
=================================================
Total records : 4263
Locked Load/Store Operations : 220
Load Operations : 2130
Loads - uncacheable : 1
Loads - IO : 7
Loads - Miss : 86
Loads - no mapping : 5
Load Fill Buffer Hit : 609
Load L1D hit : 612
Load L2D hit : 27
Load LLC hit : 607
Load Local HITM : 15
Load Remote HITM : 0
Load Remote HIT : 0
Load Local DRAM : 176
Load Remote DRAM : 0
Load MESI State Exclusive : 176
Load MESI State Shared : 0
Load LLC Misses : 176
LLC Misses to Local DRAM : 100.0%
LLC Misses to Remote DRAM : 0.0%
LLC Misses to Remote cache (HIT) : 0.0%
LLC Misses to Remote cache (HITM) : 0.0%
Store Operations : 2133
Store - uncacheable : 0
Store - no mapping : 1
Store L1D Hit : 1967
Store L1D Miss : 165
No Page Map Rejects : 145
Unable to parse data source : 0
=================================================
Global Shared Cache Line Event Information
=================================================
Total Shared Cache Lines : 15
Load HITs on shared lines : 26
Fill Buffer Hits on shared lines : 7
L1D hits on shared lines : 3
L2D hits on shared lines : 0
LLC hits on shared lines : 16
Locked Access on shared lines : 2
Store HITs on shared lines : 8
Store L1D hits on shared lines : 7
Total Merged records : 23
=================================================
c2c details
=================================================
Events : cpu/mem-loads,ldlat=30/P
: cpu/mem-stores/P
[root@jouet ~]#
[root@jouet ~]# perf c2c report
Shared Data Cache Line Table (2378 entries)
Total --- LLC Load Hitm -- -- Store Reference - - Load Dram - LLC Total - Core Load Hit -
Cacheline records %hitm Total Lcl Rmt Total L1Hit L1Miss Lcl Rmt Ld Miss Loads FB L1 L2
- 0xffff880024380c00 10 0.00% 0 0 0 6 6 0 0 0 0 4 1 3 0
- 0.13% _raw_spin_lock_irqsave
- 0.07% ep_poll
sys_epoll_wait
do_syscall_64
return_from_SYSCALL_64
+ 0x103573
- 0.05% ep_poll_callback
__wake_up_common
- __wake_up_sync_key
- 0.02% pipe_read
__vfs_read
vfs_read
sys_read
do_syscall_64
return_from_SYSCALL_64
0xfdad
+ 0.02% sock_def_readable
+ 0.02% ep_scan_ready_list.constprop.12
+ 0.00% mutex_lock
+ 0.00% __wake_up_common
+ 0xffff880024380c40 1 0.00% 0 0 0 1 1 0 0 0 0 0 0 0 0
+ 0xffff880024380c80 1 0.00% 0 0 0 0 0 0 0 0 0 1 0 0 0
- 0xffff8800243e9f00 1 0.00% 0 0 0 1 1 0 0 0 0 0 0 0 0
enqueue_entity
enqueue_task_fair
activate_task
ttwu_do_activate
try_to_wake_up
wake_up_process
hrtimer_wakeup
__hrtimer_run_queues
hrtimer_interrupt
local_apic_timer_interrupt
smp_apic_timer_interrupt
apic_timer_interrupt
cpuidle_enter
call_cpuidle
help
-------------
And when pressing 'd' to see the cacheline details:
Cacheline 0xffff880024380c00
----- HITM ----- -- Store Refs -- --------- cycles ----- cpu
Rmt Lcl L1 Hit L1 Miss Off Pid Tid rmt hitm lcl hitm load cnt Symbol
- 0.00% 0.00% 100.00% 0.00% 0x0 1473 1474:Chrome_ChildIOT 0 0 41 2 [k] _raw_spin_lock_irqsave [kernel]
- _raw_spin_lock_irqsave
- 51.52% ep_poll
sys_epoll_wait
do_syscall_64
return_from_SYSCALL_64
- 0x103573
47.19% 0
4.33% 0xc30bd
- 35.93% ep_poll_callback
__wake_up_common
- __wake_up_sync_key
- 18.20% pipe_read
__vfs_read
vfs_read
sys_read
do_syscall_64
return_from_SYSCALL_64
0xfdad
- 17.73% sock_def_readable
unix_stream_sendmsg
sock_sendmsg
___sys_sendmsg
__sys_sendmsg
sys_sendmsg
do_syscall_64
return_from_SYSCALL_64
__GI___libc_sendmsg
0x12c036af1fc0
0x16a4050
0x894928ec83485354
+ 12.45% ep_scan_ready_list.constprop.12
+ 0.00% 0.00% 0.00% 0.00% 0x8 1473 1474:Chrome_ChildIOT 0 0 102 1 [k] mutex_lock [kernel]
+ 0.00% 0.00% 0.00% 0.00% 0x38 1473 1473:chrome 0 0 88 1 [k] __wake_up_common [kernel]
help
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-inykbom2f19difvsu1e18avr@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

	ret = sample__resolve_callchain(sample, &callchain_cursor, NULL,
					evsel, &al, sysctl_perf_event_max_stack);
	if (ret)
		goto out;

	mi = sample__resolve_mem(sample, &al);
	if (mi == NULL) {
		/* Go through 'out' so the references held in 'al' are dropped. */
		ret = -ENOMEM;
		goto out;
	}

	/*
	 * The mi object is released in hists__add_entry_ops,
	 * if it gets sorted out into existing data, so we need
	 * to take the copy now.
	 */
	mi_dup = mem_info__get(mi);

	c2c_decode_stats(&stats, mi);

	he = hists__add_entry_ops(&c2c_hists->hists, &c2c_entry_ops,
				  &al, NULL, NULL, mi, NULL,
				  sample, true);
	if (he == NULL)
		goto free_mi;

	c2c_he = container_of(he, struct c2c_hist_entry, he);
	c2c_add_stats(&c2c_he->stats, &stats);
	c2c_add_stats(&c2c_hists->stats, &stats);

	c2c_he__set_cpu(c2c_he, sample);
	c2c_he__set_node(c2c_he, sample);

	hists__inc_nr_samples(&c2c_hists->hists, he->filtered);
	ret = hist_entry__append_callchain(he, sample);

	if (!ret) {
		/*
		 * There has already been a warning about the missing
		 * sample cpu value. Account everything to node 0 in
		 * this case, without any further warning.
		 *
		 * Node stats are computed only for single callchain data.
		 */
		int cpu = sample->cpu == (unsigned int) -1 ? 0 : sample->cpu;
		int node = c2c.cpu2node[cpu];

		mi = mi_dup;

		c2c_hists = he__get_c2c_hists(he, c2c.cl_sort, 2);
		if (!c2c_hists)
			goto free_mi;

		he = hists__add_entry_ops(&c2c_hists->hists, &c2c_entry_ops,
					  &al, NULL, NULL, mi, NULL,
					  sample, true);
		if (he == NULL)
			goto free_mi;

		c2c_he = container_of(he, struct c2c_hist_entry, he);
		c2c_add_stats(&c2c_he->stats, &stats);
		c2c_add_stats(&c2c_hists->stats, &stats);
		c2c_add_stats(&c2c_he->node_stats[node], &stats);

		compute_stats(c2c_he, &stats, sample->weight);

		c2c_he__set_cpu(c2c_he, sample);
		c2c_he__set_node(c2c_he, sample);

		hists__inc_nr_samples(&c2c_hists->hists, he->filtered);
		ret = hist_entry__append_callchain(he, sample);
	}

out:
	addr_location__put(&al);
	return ret;

free_mi:
	mem_info__put(mi_dup);
	mem_info__put(mi);
	ret = -ENOMEM;
	goto out;
}
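The comment in `process_sample_event()` explains why `mem_info__get()` is called before the first `hists__add_entry_ops()`: that call may drop the caller's reference to `mi`, so a second reference is taken up front for reuse in the per-cacheline histogram. A minimal single-threaded sketch of that get/put discipline follows; `struct obj`, `obj_get()`, `obj_put()` and the `consume()` stand-in are hypothetical, and a plain `int` counter replaces the atomic `refcount_t` that perf uses.

```c
#include <stdlib.h>

/* Hypothetical refcounted object. */
struct obj {
	int refcnt;
	int value;
};

static struct obj *obj_new(int value)
{
	struct obj *o = malloc(sizeof(*o));

	if (o) {
		o->refcnt = 1;
		o->value = value;
	}
	return o;
}

static struct obj *obj_get(struct obj *o)
{
	if (o)
		o->refcnt++;
	return o;
}

/* Drop one reference; free on the last one. Returns 1 if freed. */
static int obj_put(struct obj *o)
{
	if (o && --o->refcnt == 0) {
		free(o);
		return 1;
	}
	return 0;
}

/*
 * Stand-in for a consumer that, like hists__add_entry_ops() merging into
 * an existing entry, takes over the caller's reference and drops it.
 */
static void consume(struct obj *o)
{
	obj_put(o);
}
```

Without the `obj_get()` taken before `consume()`, the object would already be freed when the second histogram needs it.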

static struct perf_c2c c2c = {
	.tool = {
		.sample		= process_sample_event,
		.mmap		= perf_event__process_mmap,
		.mmap2		= perf_event__process_mmap2,
		.comm		= perf_event__process_comm,
		.exit		= perf_event__process_exit,
		.fork		= perf_event__process_fork,
		.lost		= perf_event__process_lost,
		.attr		= perf_event__process_attr,
		.auxtrace_info	= perf_event__process_auxtrace_info,
		.auxtrace	= perf_event__process_auxtrace,
		.auxtrace_error	= perf_event__process_auxtrace_error,
		.ordered_events	= true,
		.ordering_requires_timestamps = true,
	},
};

static const char * const c2c_usage[] = {
	"perf c2c {record|report}",
	NULL
};

static const char * const __usage_report[] = {
	"perf c2c report",
	NULL
};

static const char * const *report_c2c_usage = __usage_report;

#define C2C_HEADER_MAX 2

struct c2c_header {
	struct {
		const char *text;
		int span;
	} line[C2C_HEADER_MAX];
};

struct c2c_dimension {
	struct c2c_header header;
	const char *name;
	int width;
	struct sort_entry *se;

	int64_t (*cmp)(struct perf_hpp_fmt *fmt,
		       struct hist_entry *, struct hist_entry *);
	int (*entry)(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
		     struct hist_entry *he);
	int (*color)(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
		     struct hist_entry *he);
};

struct c2c_fmt {
	struct perf_hpp_fmt fmt;
	struct c2c_dimension *dim;
};

#define SYMBOL_WIDTH 30

static struct c2c_dimension dim_symbol;
static struct c2c_dimension dim_srcline;

static int symbol_width(struct hists *hists, struct sort_entry *se)
{
	int width = hists__col_len(hists, se->se_width_idx);

	if (!c2c.symbol_full)
		width = MIN(width, SYMBOL_WIDTH);

	return width;
}
2016-09-22 18:36:41 +03:00
static int c2c_width ( struct perf_hpp_fmt * fmt ,
struct perf_hpp * hpp __maybe_unused ,
2016-12-12 16:52:10 +03:00
struct hists * hists )
2016-09-22 18:36:41 +03:00
{
struct c2c_fmt * c2c_fmt ;
2016-09-22 18:36:42 +03:00
struct c2c_dimension * dim ;
2016-09-22 18:36:41 +03:00
c2c_fmt = container_of ( fmt , struct c2c_fmt , fmt ) ;
2016-09-22 18:36:42 +03:00
dim = c2c_fmt - > dim ;
2016-07-10 17:25:15 +03:00
if ( dim = = & dim_symbol | | dim = = & dim_srcline )
return symbol_width ( hists , dim - > se ) ;
2016-09-22 18:36:42 +03:00
return dim - > se ? hists__col_len ( hists , dim - > se - > se_width_idx ) :
c2c_fmt - > dim - > width ;
2016-09-22 18:36:41 +03:00
}
static int c2c_header ( struct perf_hpp_fmt * fmt , struct perf_hpp * hpp ,
2016-09-22 18:36:42 +03:00
struct hists * hists , int line , int * span )
2016-09-22 18:36:41 +03:00
{
2016-09-22 18:36:42 +03:00
struct perf_hpp_list * hpp_list = hists - > hpp_list ;
2016-09-22 18:36:41 +03:00
struct c2c_fmt * c2c_fmt ;
struct c2c_dimension * dim ;
2016-09-22 18:36:42 +03:00
const char * text = NULL ;
int width = c2c_width ( fmt , hpp , hists ) ;
2016-09-22 18:36:41 +03:00
c2c_fmt = container_of ( fmt , struct c2c_fmt , fmt ) ;
dim = c2c_fmt - > dim ;
2016-09-22 18:36:42 +03:00
	if (dim->se) {
		text = dim->header.line[line].text;
		/* Use the last line from sort_entry if not defined. */
		if (!text && (line == hpp_list->nr_header_lines - 1))
			text = dim->se->se_header;
2016-09-22 18:36:41 +03:00
	} else {
2016-09-22 18:36:42 +03:00
		text = dim->header.line[line].text;

		if (*span) {
			(*span)--;
			return 0;
		} else {
			*span = dim->header.line[line].span;
		}
2016-09-22 18:36:41 +03:00
	}
2016-09-22 18:36:42 +03:00
	if (text == NULL)
		text = "";

	return scnprintf(hpp->buf, hpp->size, "%*s", width, text);
2016-09-22 18:36:41 +03:00
}
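For dimensions without a sort entry, `c2c_header()` lets one header cell cover several columns. The column that prints a cell stores `header.line[line].span` in `*span`. Each following call then decrements `*span` and prints nothing until the counter reaches zero. The sketch below replays only that bookkeeping over a hypothetical array of span values. It assumes the caller shares one `span` counter across the columns of a header line. `sketch_header_cells()` is illustrative and not part of perf.

```c
/*
 * Hypothetical replay of the span bookkeeping in c2c_header().
 * spans[i] is the span value of column i's header text; printed[i] is set
 * to 1 when column i would print a header cell, 0 when an earlier cell
 * covers it. Returns the number of cells printed.
 */
static int sketch_header_cells(const int *spans, int ncols, int *printed)
{
	int span = 0;   /* shared counter, like *span in c2c_header() */
	int count = 0;

	for (int i = 0; i < ncols; i++) {
		if (span) {
			span--;         /* covered by an earlier header cell */
			printed[i] = 0;
			continue;
		}
		span = spans[i];        /* this cell covers the next 'span' columns */
		printed[i] = 1;
		count++;
	}
	return count;
}
```

With spans `{2, 0, 0, 0, 1, 0}`, the first cell covers columns 1 and 2, column 3 prints its own cell, and column 4's cell covers column 5.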
2016-09-22 18:36:48 +03:00
#define HEX_STR(__s, __v) \
({ \
	scnprintf(__s, sizeof(__s), "0x%" PRIx64, __v); \
	__s; \
})

static int64_t
dcacheline_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
	       struct hist_entry *left, struct hist_entry *right)
{
	return sort__dcacheline_cmp(left, right);
}

static int dcacheline_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
			    struct hist_entry *he)
{
	uint64_t addr = 0;
	int width = c2c_width(fmt, hpp, he->hists);
	char buf[20];

	if (he->mem_info)
perf c2c: Add report option to show false sharing in adjacent cachelines
Many platforms have an adjacent-cacheline prefetch feature. When it is
enabled, data in RAM is fetched at a granularity of 2 cachelines (2N and
2N+1): if one is fetched into the cache, the other is likely to be
fetched too. This effectively doubles the cacheline size, so false
sharing can also happen between adjacent cachelines.
0Day has captured performance changes related to this [1], and some
commercial software explicitly aligns its hot global variables to 128
bytes (2 cache lines) to avoid this kind of extended false sharing.
So add a "--double-cl" option to 'perf c2c report' to show false
sharing at double-cacheline granularity, which behaves as if the
cacheline size were doubled. There is no change to 'perf c2c record':
the hardware events for a shared cacheline are still per cacheline, and
this option only changes the granularity at which events are grouped
and displayed.
In the 'perf c2c report' output below (will-it-scale's 'pagefault2' case
on an old kernel):
----------------------------------------------------------------------
26 31 2 0 0 0 0xffff888103ec6000
----------------------------------------------------------------------
35.48% 50.00% 0.00% 0.00% 0.00% 0x10 0 1 0xffffffff8133148b 1153 66 971 3748 74 [k] get_mem_cgroup_from_mm
6.45% 0.00% 0.00% 0.00% 0.00% 0x10 0 1 0xffffffff813396e4 570 0 1531 879 75 [k] mem_cgroup_charge
25.81% 50.00% 0.00% 0.00% 0.00% 0x54 0 1 0xffffffff81331472 949 70 593 3359 74 [k] get_mem_cgroup_from_mm
19.35% 0.00% 0.00% 0.00% 0.00% 0x54 0 1 0xffffffff81339686 1352 0 1073 1022 74 [k] mem_cgroup_charge
9.68% 0.00% 0.00% 0.00% 0.00% 0x54 0 1 0xffffffff813396d6 1401 0 863 768 74 [k] mem_cgroup_charge
3.23% 0.00% 0.00% 0.00% 0.00% 0x54 0 1 0xffffffff81333106 618 0 804 11 9 [k] uncharge_batch
Offsets 0x10 and 0x54 used to be displayed in 2 groups; now they are
listed together to give users a hint of extended false sharing.
[1]. https://lore.kernel.org/lkml/20201102091543.GM31092@shao2-debian/
Committer notes:
Link: https://lore.kernel.org/r/Y+wvVNWqXb70l4uy@feng-clx
Removed -a, leaving just --double-cl, as this is probably not used that
frequently and may even be auto-detected if we manage to record the
MSR where this is configured.
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Feng Tang <feng.tang@intel.com>
Tested-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Joe Mario <jmario@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tim Chen <tim.c.chen@intel.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20230214075823.246414-1-feng.tang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-02-14 10:58:23 +03:00
		addr = cl_address(he->mem_info->daddr.addr, chk_double_cl);
2016-09-22 18:36:48 +03:00
	return scnprintf(hpp->buf, hpp->size, "%*s", width, HEX_STR(buf, addr));
}
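The `--double-cl` grouping described in the commit message above comes down to masking the data address with a line size that is doubled when the option is set. Below is a minimal sketch that assumes a fixed 64-byte cacheline and a power-of-two line size. `sketch_cl_address()` and `sketch_cl_offset()` are hypothetical stand-ins for the `cl_address()`/`cl_offset()` helpers used here, not their actual source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption for this sketch only: 64-byte cachelines. */
#define SKETCH_CACHELINE_SIZE 64ULL

/* Line size used for grouping: doubled when double_cl is set. */
static uint64_t sketch_line_size(bool double_cl)
{
	return double_cl ? 2 * SKETCH_CACHELINE_SIZE : SKETCH_CACHELINE_SIZE;
}

/* Start address of the (possibly doubled) line containing addr. */
static uint64_t sketch_cl_address(uint64_t addr, bool double_cl)
{
	return addr & ~(sketch_line_size(double_cl) - 1);
}

/* Offset of addr inside that (possibly doubled) line. */
static uint64_t sketch_cl_offset(uint64_t addr, bool double_cl)
{
	return addr & (sketch_line_size(double_cl) - 1);
}
```

With 64-byte lines, `0xffff888103ec6010` and `0xffff888103ec6054` fall in different lines: `0xffff888103ec6000` at offset 0x10 and `0xffff888103ec6040` at offset 0x14. With the size doubled, both map to `0xffff888103ec6000` at offsets 0x10 and 0x54, which is how the report output in the commit message lists them.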
2018-03-09 13:14:40 +03:00
static int
dcacheline_node_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
		      struct hist_entry *he)
{
	struct c2c_hist_entry *c2c_he;
	int width = c2c_width(fmt, hpp, he->hists);

	c2c_he = container_of(he, struct c2c_hist_entry, he);
	if (WARN_ON_ONCE(!c2c_he->nodestr))
		return 0;

	return scnprintf(hpp->buf, hpp->size, "%*s", width, c2c_he->nodestr);
}
2018-03-09 13:14:42 +03:00
static int
dcacheline_node_count(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
		      struct hist_entry *he)
{
	struct c2c_hist_entry *c2c_he;
	int width = c2c_width(fmt, hpp, he->hists);

	c2c_he = container_of(he, struct c2c_hist_entry, he);
	return scnprintf(hpp->buf, hpp->size, "%*lu", width, c2c_he->paddr_cnt);
}
2016-04-29 15:37:06 +03:00
static int offset_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
			struct hist_entry *he)
{
	uint64_t addr = 0;
	int width = c2c_width(fmt, hpp, he->hists);
	char buf[20];

	if (he->mem_info)
2023-02-14 10:58:23 +03:00
		addr = cl_offset(he->mem_info->daddr.al_addr, chk_double_cl);
2016-04-29 15:37:06 +03:00
	return scnprintf(hpp->buf, hpp->size, "%*s", width, HEX_STR(buf, addr));
}

static int64_t
offset_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
	   struct hist_entry *left, struct hist_entry *right)
{
	uint64_t l = 0, r = 0;

	if (left->mem_info)
2023-02-14 10:58:23 +03:00
		l = cl_offset(left->mem_info->daddr.addr, chk_double_cl);
2016-04-29 15:37:06 +03:00
	if (right->mem_info)
2023-02-14 10:58:23 +03:00
		r = cl_offset(right->mem_info->daddr.addr, chk_double_cl);
2016-04-29 15:37:06 +03:00
	return (int64_t)(r - l);
}
2016-05-03 22:48:56 +03:00
static int
iaddr_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
	    struct hist_entry *he)
{
	uint64_t addr = 0;
	int width = c2c_width(fmt, hpp, he->hists);
	char buf[20];

	if (he->mem_info)
		addr = he->mem_info->iaddr.addr;

	return scnprintf(hpp->buf, hpp->size, "%*s", width, HEX_STR(buf, addr));
}

static int64_t
iaddr_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
	  struct hist_entry *left, struct hist_entry *right)
{
	return sort__iaddr_cmp(left, right);
}
2016-05-23 17:20:14 +03:00
static int
tot_hitm_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
	       struct hist_entry *he)
{
	struct c2c_hist_entry *c2c_he;
	int width = c2c_width(fmt, hpp, he->hists);
	unsigned int tot_hitm;

	c2c_he = container_of(he, struct c2c_hist_entry, he);
	tot_hitm = c2c_he->stats.lcl_hitm + c2c_he->stats.rmt_hitm;

	return scnprintf(hpp->buf, hpp->size, "%*u", width, tot_hitm);
}

static int64_t
tot_hitm_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
	     struct hist_entry *left, struct hist_entry *right)
{
	struct c2c_hist_entry *c2c_left;
	struct c2c_hist_entry *c2c_right;
2020-01-09 07:30:30 +03:00
	uint64_t tot_hitm_left;
	uint64_t tot_hitm_right;
2016-05-23 17:20:14 +03:00
	c2c_left  = container_of(left, struct c2c_hist_entry, he);
	c2c_right = container_of(right, struct c2c_hist_entry, he);

	tot_hitm_left  = c2c_left->stats.lcl_hitm + c2c_left->stats.rmt_hitm;
	tot_hitm_right = c2c_right->stats.lcl_hitm + c2c_right->stats.rmt_hitm;

	return tot_hitm_left - tot_hitm_right;
}

#define STAT_FN_ENTRY(__f) \
static int \
__f##_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp, \
	    struct hist_entry *he) \
{ \
	struct c2c_hist_entry *c2c_he; \
	int width = c2c_width(fmt, hpp, he->hists); \
	\
	c2c_he = container_of(he, struct c2c_hist_entry, he); \
	return scnprintf(hpp->buf, hpp->size, "%*u", width, \
			 c2c_he->stats.__f); \
}

#define STAT_FN_CMP(__f) \
static int64_t \
__f##_cmp(struct perf_hpp_fmt *fmt __maybe_unused, \
	  struct hist_entry *left, struct hist_entry *right) \
{ \
	struct c2c_hist_entry *c2c_left, *c2c_right; \
	\
	c2c_left = container_of(left, struct c2c_hist_entry, he); \
	c2c_right = container_of(right, struct c2c_hist_entry, he); \
2020-01-09 07:30:30 +03:00
	return (uint64_t) c2c_left->stats.__f - \
	       (uint64_t) c2c_right->stats.__f; \
2016-05-23 17:20:14 +03:00
}
#define STAT_FN(__f) \
	STAT_FN_ENTRY(__f) \
	STAT_FN_CMP(__f)
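`STAT_FN_CMP` widens both counters to `uint64_t` before subtracting. The sketch below shows why that matters. It assumes 32-bit unsigned counters (as the `u32` parameters of `percent()` in this file suggest) and a 32-bit `unsigned int`. It also assumes the compiler converts out-of-range unsigned values to `int64_t` by wrapping modulo 2^64; ISO C leaves this implementation-defined, and GCC and Clang behave this way. Both function names are hypothetical.

```c
#include <stdint.h>

/* Narrow version: subtract in 32-bit unsigned arithmetic, then widen. */
static int64_t naive_cmp(uint32_t left, uint32_t right)
{
	/* 1 - 2 wraps to 4294967295u before the conversion to int64_t */
	return left - right;
}

/* Widen first, as STAT_FN_CMP does, then subtract in 64 bits. */
static int64_t widened_cmp(uint32_t left, uint32_t right)
{
	/* 1 - 2 wraps to 2^64 - 1, which converts back to -1 */
	return (uint64_t)left - (uint64_t)right;
}
```

With the narrow subtraction, an entry that has fewer events than its neighbour compares as larger. Widening first gives the expected negative result.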
STAT_FN(rmt_hitm)
STAT_FN(lcl_hitm)
2022-08-11 09:24:42 +03:00
STAT_FN(rmt_peer)
STAT_FN(lcl_peer)
STAT_FN(tot_peer)
2016-05-04 11:10:11 +03:00
STAT_FN(store)
STAT_FN(st_l1hit)
STAT_FN(st_l1miss)
2022-05-18 08:57:20 +03:00
STAT_FN(st_na)
2016-05-04 11:18:24 +03:00
STAT_FN(ld_fbhit)
STAT_FN(ld_l1hit)
STAT_FN(ld_l2hit)
2016-05-04 11:27:51 +03:00
STAT_FN(ld_llchit)
STAT_FN(rmt_hit)
2016-05-23 17:20:14 +03:00
2022-09-06 06:29:05 +03:00
static uint64_t get_load_llc_misses(struct c2c_stats *stats)
2016-05-04 11:35:29 +03:00
{
2022-09-06 06:29:05 +03:00
	return stats->lcl_dram +
	       stats->rmt_dram +
	       stats->rmt_hitm +
	       stats->rmt_hit;
}
2016-05-04 11:35:29 +03:00
2022-09-06 06:29:05 +03:00
static uint64_t get_load_cache_hits(struct c2c_stats *stats)
{
	return stats->ld_fbhit +
	       stats->ld_l1hit +
	       stats->ld_l2hit +
	       stats->ld_llchit +
	       stats->lcl_hitm;
}
2016-05-04 11:35:29 +03:00
2022-09-06 06:29:05 +03:00
static uint64_t get_stores(struct c2c_stats *stats)
{
	return stats->st_l1hit +
	       stats->st_l1miss +
	       stats->st_na;
}
2016-05-04 11:35:29 +03:00
2022-09-06 06:29:05 +03:00
static uint64_t total_records(struct c2c_stats *stats)
{
	return get_load_llc_misses(stats) +
	       get_load_cache_hits(stats) +
	       get_stores(stats);
2016-05-04 11:35:29 +03:00
}
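Together, the helpers above split every record into three groups: loads that missed the LLC, loads that hit in a cache, and stores. `total_records()` is their sum. The sketch restates that arithmetic on a cut-down, hypothetical `struct mini_stats` that holds only the counters those helpers read; the real `struct c2c_stats` has more fields. Unlike the original, it casts the first operand so that each sum is computed in 64-bit arithmetic.

```c
#include <stdint.h>

/* Hypothetical subset of the counters read by the helpers above. */
struct mini_stats {
	uint32_t lcl_dram, rmt_dram, rmt_hitm, rmt_hit;             /* LLC-missing loads */
	uint32_t ld_fbhit, ld_l1hit, ld_l2hit, ld_llchit, lcl_hitm; /* cache-hitting loads */
	uint32_t st_l1hit, st_l1miss, st_na;                        /* stores */
};

static uint64_t mini_llc_misses(const struct mini_stats *s)
{
	return (uint64_t)s->lcl_dram + s->rmt_dram + s->rmt_hitm + s->rmt_hit;
}

static uint64_t mini_cache_hits(const struct mini_stats *s)
{
	return (uint64_t)s->ld_fbhit + s->ld_l1hit + s->ld_l2hit +
	       s->ld_llchit + s->lcl_hitm;
}

static uint64_t mini_stores(const struct mini_stats *s)
{
	return (uint64_t)s->st_l1hit + s->st_l1miss + s->st_na;
}

/* Loads plus stores, following the same split as total_records(). */
static uint64_t mini_total(const struct mini_stats *s)
{
	return mini_llc_misses(s) + mini_cache_hits(s) + mini_stores(s);
}
```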
static int
tot_recs_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
	       struct hist_entry *he)
{
	struct c2c_hist_entry *c2c_he;
	int width = c2c_width(fmt, hpp, he->hists);
	uint64_t tot_recs;

	c2c_he = container_of(he, struct c2c_hist_entry, he);
	tot_recs = total_records(&c2c_he->stats);

	return scnprintf(hpp->buf, hpp->size, "%*" PRIu64, width, tot_recs);
}

static int64_t
tot_recs_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
	     struct hist_entry *left, struct hist_entry *right)
{
	struct c2c_hist_entry *c2c_left;
	struct c2c_hist_entry *c2c_right;
	uint64_t tot_recs_left;
	uint64_t tot_recs_right;

	c2c_left  = container_of(left, struct c2c_hist_entry, he);
	c2c_right = container_of(right, struct c2c_hist_entry, he);

	tot_recs_left  = total_records(&c2c_left->stats);
	tot_recs_right = total_records(&c2c_right->stats);

	return tot_recs_left - tot_recs_right;
}
2016-05-19 10:52:37 +03:00
static uint64_t total_loads(struct c2c_stats *stats)
{
2022-09-06 06:29:05 +03:00
	return get_load_llc_misses(stats) +
	       get_load_cache_hits(stats);
2016-05-19 10:52:37 +03:00
}
static int
tot_loads_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
		struct hist_entry *he)
{
	struct c2c_hist_entry *c2c_he;
	int width = c2c_width(fmt, hpp, he->hists);
	uint64_t tot_recs;

	c2c_he = container_of(he, struct c2c_hist_entry, he);
	tot_recs = total_loads(&c2c_he->stats);

	return scnprintf(hpp->buf, hpp->size, "%*" PRIu64, width, tot_recs);
}

static int64_t
tot_loads_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
	      struct hist_entry *left, struct hist_entry *right)
{
	struct c2c_hist_entry *c2c_left;
	struct c2c_hist_entry *c2c_right;
	uint64_t tot_recs_left;
	uint64_t tot_recs_right;

	c2c_left  = container_of(left, struct c2c_hist_entry, he);
	c2c_right = container_of(right, struct c2c_hist_entry, he);

	tot_recs_left  = total_loads(&c2c_left->stats);
	tot_recs_right = total_loads(&c2c_right->stats);

	return tot_recs_left - tot_recs_right;
}
2016-05-04 11:50:09 +03:00
typedef double (get_percent_cb)(struct c2c_hist_entry *);

static int
percent_color(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
	      struct hist_entry *he, get_percent_cb get_percent)
{
	struct c2c_hist_entry *c2c_he;
	int width = c2c_width(fmt, hpp, he->hists);
	double per;

	c2c_he = container_of(he, struct c2c_hist_entry, he);
	per = get_percent(c2c_he);
2016-01-06 18:59:02 +03:00
#ifdef HAVE_SLANG_SUPPORT
	if (use_browser)
		return __hpp__slsmg_color_printf(hpp, "%*.2f%%", width - 1, per);
#endif
2016-05-04 11:50:09 +03:00
	return hpp_color_scnprintf(hpp, "%*.2f%%", width - 1, per);
}
2022-08-11 09:24:46 +03:00
static double percent_costly_snoop(struct c2c_hist_entry *c2c_he)
2016-05-04 11:50:09 +03:00
{
	struct c2c_hists *hists;
	struct c2c_stats *stats;
	struct c2c_stats *total;
2016-05-29 11:21:45 +03:00
	int tot = 0, st = 0;
2016-05-04 11:50:09 +03:00
	double p;

	hists = container_of(c2c_he->he.hists, struct c2c_hists, hists);
	stats = &c2c_he->stats;
	total = &hists->stats;
2016-05-29 11:21:45 +03:00
	switch (c2c.display) {
2022-08-11 09:24:45 +03:00
	case DISPLAY_RMT_HITM:
2016-05-29 11:21:45 +03:00
		st = stats->rmt_hitm;
		tot = total->rmt_hitm;
		break;
2022-08-11 09:24:45 +03:00
	case DISPLAY_LCL_HITM:
2016-05-29 11:21:45 +03:00
		st = stats->lcl_hitm;
		tot = total->lcl_hitm;
2016-11-22 00:33:30 +03:00
		break;
2022-08-11 09:24:45 +03:00
	case DISPLAY_TOT_HITM:
2016-11-22 00:33:30 +03:00
		st = stats->tot_hitm;
		tot = total->tot_hitm;
2022-08-11 09:24:49 +03:00
		break;
	case DISPLAY_SNP_PEER:
		st = stats->tot_peer;
		tot = total->tot_peer;
		break;
2016-05-29 11:21:45 +03:00
	default:
		break;
	}
2016-05-04 11:50:09 +03:00
	p = tot ? (double) st / tot : 0;
	return 100 * p;
}
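`percent_costly_snoop()` picks one counter pair according to `c2c.display` and returns the entry's share of the global total. It falls back to 0 when the total is zero, which includes unhandled modes. The sketch below is a hypothetical mirror of that selection. The enum and struct names are made up and stand in for the `DISPLAY_*` constants and `struct c2c_stats`.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the DISPLAY_* modes handled above. */
enum sketch_display { SK_RMT_HITM, SK_LCL_HITM, SK_TOT_HITM, SK_SNP_PEER };

/* Hypothetical subset of the counters the switch reads. */
struct sketch_counts {
	uint32_t rmt_hitm, lcl_hitm, tot_hitm, tot_peer;
};

/* Select the counter that matches the display mode. */
static uint32_t sketch_pick(const struct sketch_counts *c, enum sketch_display d)
{
	switch (d) {
	case SK_RMT_HITM: return c->rmt_hitm;
	case SK_LCL_HITM: return c->lcl_hitm;
	case SK_TOT_HITM: return c->tot_hitm;
	case SK_SNP_PEER: return c->tot_peer;
	}
	return 0;
}

/* Entry's share of the total for the selected mode, in percent; 0 if the total is 0. */
static double sketch_costly_snoop(const struct sketch_counts *entry,
				  const struct sketch_counts *total,
				  enum sketch_display d)
{
	uint32_t st = sketch_pick(entry, d);
	uint32_t tot = sketch_pick(total, d);

	return tot ? 100.0 * st / tot : 0.0;
}
```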
#define PERC_STR(__s, __v) \
({ \
	scnprintf(__s, sizeof(__s), "%.2F%%", __v); \
	__s; \
})
static int
2022-08-11 09:24:46 +03:00
percent_costly_snoop_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
			   struct hist_entry *he)
2016-05-04 11:50:09 +03:00
{
	struct c2c_hist_entry *c2c_he;
	int width = c2c_width(fmt, hpp, he->hists);
	char buf[10];
	double per;

	c2c_he = container_of(he, struct c2c_hist_entry, he);
2022-08-11 09:24:46 +03:00
	per = percent_costly_snoop(c2c_he);
2016-05-04 11:50:09 +03:00
	return scnprintf(hpp->buf, hpp->size, "%*s", width, PERC_STR(buf, per));
}
static int
2022-08-11 09:24:46 +03:00
percent_costly_snoop_color(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
			   struct hist_entry *he)
2016-05-04 11:50:09 +03:00
{
2022-08-11 09:24:46 +03:00
	return percent_color(fmt, hpp, he, percent_costly_snoop);
2016-05-04 11:50:09 +03:00
}
static int64_t
2022-08-11 09:24:46 +03:00
percent_costly_snoop_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
			 struct hist_entry *left, struct hist_entry *right)
2016-05-04 11:50:09 +03:00
{
	struct c2c_hist_entry *c2c_left;
	struct c2c_hist_entry *c2c_right;
	double per_left;
	double per_right;

	c2c_left  = container_of(left, struct c2c_hist_entry, he);
	c2c_right = container_of(right, struct c2c_hist_entry, he);
2022-08-11 09:24:46 +03:00
	per_left  = percent_costly_snoop(c2c_left);
	per_right = percent_costly_snoop(c2c_right);
2016-05-04 11:50:09 +03:00
	return per_left - per_right;
}
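The percentage `*_cmp` callbacks in this file return a `double` difference through an `int64_t` return type, and C converts that value by truncating toward zero. The short sketch below shows the consequence and relies only on standard conversion rules; `sketch_percent_cmp()` is hypothetical. Entries whose percentages differ by less than one percentage point compare as equal.

```c
#include <stdint.h>

/* Same shape as the percent *_cmp callbacks: double difference, int64_t result. */
static int64_t sketch_percent_cmp(double per_left, double per_right)
{
	return per_left - per_right;  /* implicit conversion truncates toward zero */
}
```

This affects only how close entries are ordered relative to each other. The printed percentages still show two decimal places.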
2016-05-04 13:16:50 +03:00
static struct c2c_stats *he_stats(struct hist_entry *he)
{
	struct c2c_hist_entry *c2c_he;

	c2c_he = container_of(he, struct c2c_hist_entry, he);
	return &c2c_he->stats;
}

static struct c2c_stats *total_stats(struct hist_entry *he)
{
	struct c2c_hists *hists;

	hists = container_of(he->hists, struct c2c_hists, hists);
	return &hists->stats;
}
2021-01-14 18:46:44 +03:00
static double percent(u32 st, u32 tot)
2016-05-04 13:16:50 +03:00
{
	return tot ? 100. * (double) st / (double) tot : 0;
}

#define PERCENT(__h, __f) percent(he_stats(__h)->__f, total_stats(__h)->__f)

#define PERCENT_FN(__f) \
static double percent_##__f(struct c2c_hist_entry *c2c_he) \
{ \
	struct c2c_hists *hists; \
	\
	hists = container_of(c2c_he->he.hists, struct c2c_hists, hists); \
	return percent(c2c_he->stats.__f, hists->stats.__f); \
}

PERCENT_FN(rmt_hitm)
PERCENT_FN(lcl_hitm)
2022-08-11 09:24:43 +03:00
PERCENT_FN(rmt_peer)
PERCENT_FN(lcl_peer)
2016-05-04 13:16:50 +03:00
PERCENT_FN(st_l1hit)
PERCENT_FN(st_l1miss)
2022-05-18 08:57:20 +03:00
PERCENT_FN(st_na)
2016-05-04 13:16:50 +03:00
static int
percent_rmt_hitm_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
		       struct hist_entry *he)
{
	int width = c2c_width(fmt, hpp, he->hists);
	double per = PERCENT(he, rmt_hitm);
	char buf[10];

	return scnprintf(hpp->buf, hpp->size, "%*s", width, PERC_STR(buf, per));
}

static int
percent_rmt_hitm_color(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
		       struct hist_entry *he)
{
	return percent_color(fmt, hpp, he, percent_rmt_hitm);
}

static int64_t
percent_rmt_hitm_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
		     struct hist_entry *left, struct hist_entry *right)
{
	double per_left;
	double per_right;
2022-05-30 11:42:53 +03:00
	per_left  = PERCENT(left, rmt_hitm);
	per_right = PERCENT(right, rmt_hitm);
2016-05-04 13:16:50 +03:00
	return per_left - per_right;
}

static int
percent_lcl_hitm_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
		       struct hist_entry *he)
{
	int width = c2c_width(fmt, hpp, he->hists);
	double per = PERCENT(he, lcl_hitm);
	char buf[10];

	return scnprintf(hpp->buf, hpp->size, "%*s", width, PERC_STR(buf, per));
}

static int
percent_lcl_hitm_color(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
		       struct hist_entry *he)
{
	return percent_color(fmt, hpp, he, percent_lcl_hitm);
}

static int64_t
percent_lcl_hitm_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
		     struct hist_entry *left, struct hist_entry *right)
{
	double per_left;
	double per_right;

	per_left  = PERCENT(left, lcl_hitm);
	per_right = PERCENT(right, lcl_hitm);

	return per_left - per_right;
}
2022-08-11 09:24:43 +03:00
static int
percent_lcl_peer_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
		       struct hist_entry *he)
{
	int width = c2c_width(fmt, hpp, he->hists);
	double per = PERCENT(he, lcl_peer);
	char buf[10];

	return scnprintf(hpp->buf, hpp->size, "%*s", width, PERC_STR(buf, per));
}

static int
percent_lcl_peer_color(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
		       struct hist_entry *he)
{
	return percent_color(fmt, hpp, he, percent_lcl_peer);
}

static int64_t
percent_lcl_peer_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
		     struct hist_entry *left, struct hist_entry *right)
{
	double per_left;
	double per_right;

	per_left  = PERCENT(left, lcl_peer);
	per_right = PERCENT(right, lcl_peer);

	return per_left - per_right;
}

static int
percent_rmt_peer_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
		       struct hist_entry *he)
{
	int width = c2c_width(fmt, hpp, he->hists);
	double per = PERCENT(he, rmt_peer);
	char buf[10];

	return scnprintf(hpp->buf, hpp->size, "%*s", width, PERC_STR(buf, per));
}

static int
percent_rmt_peer_color(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
		       struct hist_entry *he)
{
	return percent_color(fmt, hpp, he, percent_rmt_peer);
}

static int64_t
percent_rmt_peer_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
		     struct hist_entry *left, struct hist_entry *right)
{
	double per_left;
	double per_right;

	per_left  = PERCENT(left, rmt_peer);
	per_right = PERCENT(right, rmt_peer);

	return per_left - per_right;
}
2016-05-04 13:16:50 +03:00
static int
percent_stores_l1hit_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
			   struct hist_entry *he)
{
	int width = c2c_width(fmt, hpp, he->hists);
	double per = PERCENT(he, st_l1hit);
	char buf[10];

	return scnprintf(hpp->buf, hpp->size, "%*s", width, PERC_STR(buf, per));
}

static int
percent_stores_l1hit_color(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
			   struct hist_entry *he)
{
	return percent_color(fmt, hpp, he, percent_st_l1hit);
}

static int64_t
percent_stores_l1hit_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
			 struct hist_entry *left, struct hist_entry *right)
{
	double per_left;
	double per_right;

	per_left  = PERCENT(left, st_l1hit);
	per_right = PERCENT(right, st_l1hit);

	return per_left - per_right;
}

static int
percent_stores_l1miss_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
			    struct hist_entry *he)
{
	int width = c2c_width(fmt, hpp, he->hists);
	double per = PERCENT(he, st_l1miss);
	char buf[10];

	return scnprintf(hpp->buf, hpp->size, "%*s", width, PERC_STR(buf, per));
}

static int
percent_stores_l1miss_color(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
			    struct hist_entry *he)
{
	return percent_color(fmt, hpp, he, percent_st_l1miss);
}

static int64_t
percent_stores_l1miss_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
			  struct hist_entry *left, struct hist_entry *right)
{
	double per_left;
	double per_right;

	per_left  = PERCENT(left, st_l1miss);
	per_right = PERCENT(right, st_l1miss);

	return per_left - per_right;
}
2022-05-18 08:57:20 +03:00
static int
percent_stores_na_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
			struct hist_entry *he)
{
	int width = c2c_width(fmt, hpp, he->hists);
	double per = PERCENT(he, st_na);
	char buf[10];

	return scnprintf(hpp->buf, hpp->size, "%*s", width, PERC_STR(buf, per));
}

static int
percent_stores_na_color(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
			struct hist_entry *he)
{
	return percent_color(fmt, hpp, he, percent_st_na);
}

static int64_t
percent_stores_na_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
		      struct hist_entry *left, struct hist_entry *right)
{
	double per_left;
	double per_right;

	per_left  = PERCENT(left, st_na);
	per_right = PERCENT(right, st_na);

	return per_left - per_right;
}
2016-05-28 13:30:13 +03:00
STAT_FN(lcl_dram)
STAT_FN(rmt_dram)
2016-05-24 14:09:47 +03:00
static int
pid_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
	  struct hist_entry *he)
{
	int width = c2c_width(fmt, hpp, he->hists);

	return scnprintf(hpp->buf, hpp->size, "%*d", width, he->thread->pid_);
}

static int64_t
pid_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
	struct hist_entry *left, struct hist_entry *right)
{
	return left->thread->pid_ - right->thread->pid_;
}
2016-06-03 16:40:28 +03:00
static int64_t
empty_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
	  struct hist_entry *left __maybe_unused,
	  struct hist_entry *right __maybe_unused)
{
	return 0;
}
2021-01-14 18:46:45 +03:00
static int display_metrics(struct perf_hpp *hpp, u32 val, u32 sum)
{
	int ret;

	if (sum != 0)
		ret = scnprintf(hpp->buf, hpp->size, "%5.1f%% ",
				percent(val, sum));
	else
		ret = scnprintf(hpp->buf, hpp->size, "%6s ", "n/a");

	return ret;
}

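`display_metrics()` always emits a 7-character cell. It prints either a right-aligned percentage followed by a space, or a padded `n/a` when the total is zero, which keeps the per-node columns aligned. The sketch below reproduces that logic in standalone C with two assumptions:

- It uses `snprintf()` instead of perf's `scnprintf()`, which returns the number of bytes actually written rather than the would-be length. The two agree when the buffer is large enough.
- It assumes `percent()` computes `100 * val / sum`.

`format_metric()` is a hypothetical name.

```c
#include <stdio.h>

/* Assumption: mirrors perf's percent(), i.e. 100 * val / sum, 0 if sum is 0. */
static double percent(unsigned int val, unsigned int sum)
{
	return sum ? 100.0 * val / sum : 0.0;
}

/*
 * Standalone version of display_metrics(): every result is 7 characters,
 * a %5.1f percentage plus "% ", or "n/a" right-aligned in 6 columns plus " ".
 */
static int format_metric(char *buf, size_t size, unsigned int val,
			 unsigned int sum)
{
	if (sum != 0)
		return snprintf(buf, size, "%5.1f%% ", percent(val, sum));

	return snprintf(buf, size, "%6s ", "n/a");
}
```

Because both branches produce the same width, a node whose total is zero still lines up with its neighbours.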
static int
node_entry(struct perf_hpp_fmt *fmt __maybe_unused, struct perf_hpp *hpp,
	   struct hist_entry *he)
{
	struct c2c_hist_entry *c2c_he;
	bool first = true;
	int node;
	int ret = 0;

	c2c_he = container_of(he, struct c2c_hist_entry, he);

	for (node = 0; node < c2c.nodes_cnt; node++) {
		DECLARE_BITMAP(set, c2c.cpus_cnt);

		bitmap_zero(set, c2c.cpus_cnt);
		bitmap_and(set, c2c_he->cpuset, c2c.nodes[node], c2c.cpus_cnt);

		if (bitmap_empty(set, c2c.cpus_cnt)) {
			if (c2c.node_info == 1) {
				ret = scnprintf(hpp->buf, hpp->size, "%21s", " ");
				advance_hpp(hpp, ret);
			}
			continue;
		}

		if (!first) {
			ret = scnprintf(hpp->buf, hpp->size, " ");
			advance_hpp(hpp, ret);
		}

		switch (c2c.node_info) {
		case 0:
			ret = scnprintf(hpp->buf, hpp->size, "%2d", node);
			advance_hpp(hpp, ret);
			break;
		case 1:
		{
			int num = bitmap_weight(set, c2c.cpus_cnt);
			struct c2c_stats *stats = &c2c_he->node_stats[node];

			ret = scnprintf(hpp->buf, hpp->size, "%2d{%2d ", node, num);
			advance_hpp(hpp, ret);

			switch (c2c.display) {
			case DISPLAY_RMT_HITM:
				ret = display_metrics(hpp, stats->rmt_hitm,
						      c2c_he->stats.rmt_hitm);
				break;
			case DISPLAY_LCL_HITM:
				ret = display_metrics(hpp, stats->lcl_hitm,
						      c2c_he->stats.lcl_hitm);
				break;
			case DISPLAY_TOT_HITM:
				ret = display_metrics(hpp, stats->tot_hitm,
						      c2c_he->stats.tot_hitm);
				break;
			case DISPLAY_SNP_PEER:
				ret = display_metrics(hpp, stats->tot_peer,
						      c2c_he->stats.tot_peer);
				break;
			default:
				break;
			}

			advance_hpp(hpp, ret);

			if (c2c_he->stats.store > 0) {
				ret = scnprintf(hpp->buf, hpp->size, "%5.1f%%}",
						percent(stats->store, c2c_he->stats.store));
			} else {
				ret = scnprintf(hpp->buf, hpp->size, "%6s}", "n/a");
			}

			advance_hpp(hpp, ret);
			break;
		}
		case 2:
			ret = scnprintf(hpp->buf, hpp->size, "%2d{", node);
			advance_hpp(hpp, ret);

			ret = bitmap_scnprintf(set, c2c.cpus_cnt, hpp->buf, hpp->size);
			advance_hpp(hpp, ret);

			ret = scnprintf(hpp->buf, hpp->size, "}");
			advance_hpp(hpp, ret);
			break;
		default:
			break;
		}

		first = false;
	}

	return 0;
}

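For each NUMA node, `node_entry()` works in three steps:

- It intersects the CPUs that touched the cacheline with the node's CPU mask (`bitmap_and()`).
- It skips the node if the intersection is empty.
- Otherwise it reports how many CPUs remain (`bitmap_weight()`).

The sketch below assumes at most 64 CPUs so that a single `uint64_t` can stand in for the kernel-style bitmap. `node_cpu_count()` is a hypothetical name:

```c
#include <stdint.h>

/*
 * Simplified model (assumption): at most 64 CPUs, so one uint64_t stands
 * in for the bitmaps used by node_entry().
 */
static int node_cpu_count(uint64_t entry_cpus, uint64_t node_cpus)
{
	uint64_t set = entry_cpus & node_cpus;	/* like bitmap_and() */
	int count = 0;

	while (set) {				/* like bitmap_weight() */
		set &= set - 1;			/* clear the lowest set bit */
		count++;
	}

	return count;	/* 0 corresponds to the bitmap_empty() skip */
}
```

For example, with node 0 owning CPUs 0 to 3 (`0x0F`) and a cacheline touched by CPUs 1, 2 and 5 (`0x26`), node 0 reports 2 CPUs.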
static int
mean_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
	   struct hist_entry *he, double mean)
{
	int width = c2c_width(fmt, hpp, he->hists);
	char buf[10];

	scnprintf(buf, 10, "%6.0f", mean);
	return scnprintf(hpp->buf, hpp->size, "%*s", width, buf);
}

#define MEAN_ENTRY(__func, __val)						\
static int									\
__func(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp, struct hist_entry *he)	\
{										\
	struct c2c_hist_entry *c2c_he;						\
	c2c_he = container_of(he, struct c2c_hist_entry, he);			\
	return mean_entry(fmt, hpp, he, avg_stats(&c2c_he->cstats.__val));	\
}

MEAN_ENTRY(mean_rmt_entry,  rmt_hitm);
MEAN_ENTRY(mean_lcl_entry,  lcl_hitm);
MEAN_ENTRY(mean_load_entry, load);
MEAN_ENTRY(mean_rmt_peer_entry, rmt_peer);
MEAN_ENTRY(mean_lcl_peer_entry, lcl_peer);

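`MEAN_ENTRY()` generates one callback per statistic. The callbacks differ only in which `cstats` member is averaged. The macro argument is substituted directly as a member name, so no `##` token pasting is needed.

The same pattern is shown below with hypothetical types. `struct mean_stats` and `avg()` stand in for perf's `struct stats` and `avg_stats()`, and `avg()` is assumed to return a plain arithmetic mean:

```c
/* Hypothetical stand-in for perf's struct stats. */
struct mean_stats {
	double sum;
	unsigned long n;
};

/* Assumed to behave like avg_stats(): the arithmetic mean, 0 when empty. */
static double avg(const struct mean_stats *s)
{
	return s->n ? s->sum / s->n : 0.0;
}

/* Hypothetical per-cacheline statistics, one member per column. */
struct cl_cstats {
	struct mean_stats rmt_hitm;
	struct mean_stats lcl_hitm;
	struct mean_stats load;
};

/* Like MEAN_ENTRY(): "field" is substituted directly as a member name. */
#define MEAN_GETTER(fn, field)				\
static double fn(const struct cl_cstats *cs)		\
{							\
	return avg(&cs->field);				\
}

MEAN_GETTER(mean_rmt, rmt_hitm)
MEAN_GETTER(mean_lcl, lcl_hitm)
MEAN_GETTER(mean_load, load)
```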
static int
cpucnt_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
	     struct hist_entry *he)
{
	struct c2c_hist_entry *c2c_he;
	int width = c2c_width(fmt, hpp, he->hists);
	char buf[10];

	c2c_he = container_of(he, struct c2c_hist_entry, he);

	scnprintf(buf, 10, "%d", bitmap_weight(c2c_he->cpuset, c2c.cpus_cnt));
	return scnprintf(hpp->buf, hpp->size, "%*s", width, buf);
}

static int
cl_idx_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
	     struct hist_entry *he)
{
	struct c2c_hist_entry *c2c_he;
	int width = c2c_width(fmt, hpp, he->hists);
	char buf[10];

	c2c_he = container_of(he, struct c2c_hist_entry, he);

	scnprintf(buf, 10, "%u", c2c_he->cacheline_idx);
	return scnprintf(hpp->buf, hpp->size, "%*s", width, buf);
}

static int
cl_idx_empty_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
		   struct hist_entry *he)
{
	int width = c2c_width(fmt, hpp, he->hists);

	return scnprintf(hpp->buf, hpp->size, "%*s", width, " ");
}

#define HEADER_LOW(__h)			\
	{				\
		.line[1] = {		\
			.text = __h,	\
		},			\
	}

#define HEADER_BOTH(__h0, __h1)		\
	{				\
		.line[0] = {		\
			.text = __h0,	\
		},			\
		.line[1] = {		\
			.text = __h1,	\
		},			\
	}

#define HEADER_SPAN(__h0, __h1, __s)	\
	{				\
		.line[0] = {		\
			.text = __h0,	\
			.span = __s,	\
		},			\
		.line[1] = {		\
			.text = __h1,	\
		},			\
	}

#define HEADER_SPAN_LOW(__h)		\
	{				\
		.line[1] = {		\
			.text = __h,	\
		},			\
	}

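The `HEADER_*` macros fill a two-row column header using designated initializers. `span` appears to record how many of the following columns the top-row text also covers. That is why `dim_tot_hitm` uses `HEADER_SPAN(..., 2)` and the two Hitm columns after it use `HEADER_SPAN_LOW()`. Below is a self-contained sketch; `struct hdr`, `HDR_LOW()` and `HDR_SPAN()` are hypothetical names:

```c
#include <stddef.h>

/* Hypothetical mirror of one header row: its text and how far it spans. */
struct hdr_line {
	const char *text;
	int span;	/* number of following columns the text also covers */
};

struct hdr {
	struct hdr_line line[2];	/* line[0] = top row, line[1] = bottom */
};

/* Members that are not named are zero-initialized (NULL text, span 0). */
#define HDR_LOW(h)		{ .line[1] = { .text = h } }
#define HDR_SPAN(h0, h1, s)	{ .line[0] = { .text = h0, .span = s }, \
				  .line[1] = { .text = h1 } }

static struct hdr hdr_total = HDR_SPAN("--- Load Hitm ---", "Total", 2);
static struct hdr hdr_lcl = HDR_LOW("LclHitm");
```

In the original, `HEADER_LOW()` and `HEADER_SPAN_LOW()` expand identically. The separate name only documents that a column sits under a spanning banner.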
static struct c2c_dimension dim_dcacheline = {
	.header		= HEADER_SPAN("--- Cacheline ----", "Address", 2),
	.name		= "dcacheline",
	.cmp		= dcacheline_cmp,
	.entry		= dcacheline_entry,
	.width		= 18,
};

static struct c2c_dimension dim_dcacheline_node = {
	.header		= HEADER_LOW("Node"),
	.name		= "dcacheline_node",
	.cmp		= empty_cmp,
	.entry		= dcacheline_node_entry,
	.width		= 4,
};

static struct c2c_dimension dim_dcacheline_count = {
	.header		= HEADER_LOW("PA cnt"),
	.name		= "dcacheline_count",
	.cmp		= empty_cmp,
	.entry		= dcacheline_node_count,
	.width		= 6,
};

static struct c2c_header header_offset_tui = HEADER_SPAN("-----", "Off", 2);

static struct c2c_dimension dim_offset = {
	.header		= HEADER_SPAN("--- Data address -", "Offset", 2),
	.name		= "offset",
	.cmp		= offset_cmp,
	.entry		= offset_entry,
	.width		= 18,
};

static struct c2c_dimension dim_offset_node = {
	.header		= HEADER_LOW("Node"),
	.name		= "offset_node",
	.cmp		= empty_cmp,
	.entry		= dcacheline_node_entry,
	.width		= 4,
};

static struct c2c_dimension dim_iaddr = {
	.header		= HEADER_LOW("Code address"),
	.name		= "iaddr",
	.cmp		= iaddr_cmp,
	.entry		= iaddr_entry,
	.width		= 18,
};

static struct c2c_dimension dim_tot_hitm = {
	.header		= HEADER_SPAN("------- Load Hitm -------", "Total", 2),
	.name		= "tot_hitm",
	.cmp		= tot_hitm_cmp,
	.entry		= tot_hitm_entry,
	.width		= 7,
};

static struct c2c_dimension dim_lcl_hitm = {
	.header		= HEADER_SPAN_LOW("LclHitm"),
	.name		= "lcl_hitm",
	.cmp		= lcl_hitm_cmp,
	.entry		= lcl_hitm_entry,
	.width		= 7,
};

static struct c2c_dimension dim_rmt_hitm = {
	.header		= HEADER_SPAN_LOW("RmtHitm"),
	.name		= "rmt_hitm",
	.cmp		= rmt_hitm_cmp,
	.entry		= rmt_hitm_entry,
	.width		= 7,
};

static struct c2c_dimension dim_tot_peer = {
	.header		= HEADER_SPAN("------- Load Peer -------", "Total", 2),
	.name		= "tot_peer",
	.cmp		= tot_peer_cmp,
	.entry		= tot_peer_entry,
	.width		= 7,
};

static struct c2c_dimension dim_lcl_peer = {
	.header		= HEADER_SPAN_LOW("Local"),
	.name		= "lcl_peer",
	.cmp		= lcl_peer_cmp,
	.entry		= lcl_peer_entry,
	.width		= 7,
};

static struct c2c_dimension dim_rmt_peer = {
	.header		= HEADER_SPAN_LOW("Remote"),
	.name		= "rmt_peer",
	.cmp		= rmt_peer_cmp,
	.entry		= rmt_peer_entry,
	.width		= 7,
};

static struct c2c_dimension dim_cl_rmt_hitm = {
	.header		= HEADER_SPAN("----- HITM -----", "Rmt", 1),
	.name		= "cl_rmt_hitm",
	.cmp		= rmt_hitm_cmp,
	.entry		= rmt_hitm_entry,
	.width		= 7,
};

static struct c2c_dimension dim_cl_lcl_hitm = {
	.header		= HEADER_SPAN_LOW("Lcl"),
	.name		= "cl_lcl_hitm",
	.cmp		= lcl_hitm_cmp,
	.entry		= lcl_hitm_entry,
	.width		= 7,
};

static struct c2c_dimension dim_cl_rmt_peer = {
	.header		= HEADER_SPAN("----- Peer -----", "Rmt", 1),
	.name		= "cl_rmt_peer",
	.cmp		= rmt_peer_cmp,
	.entry		= rmt_peer_entry,
	.width		= 7,
};

static struct c2c_dimension dim_cl_lcl_peer = {
	.header		= HEADER_SPAN_LOW("Lcl"),
	.name		= "cl_lcl_peer",
	.cmp		= lcl_peer_cmp,
	.entry		= lcl_peer_entry,
	.width		= 7,
};

static struct c2c_dimension dim_tot_stores = {
	.header		= HEADER_BOTH("Total", "Stores"),
	.name		= "tot_stores",
	.cmp		= store_cmp,
	.entry		= store_entry,
	.width		= 7,
};

static struct c2c_dimension dim_stores_l1hit = {
	.header		= HEADER_SPAN("--------- Stores --------", "L1Hit", 2),
	.name		= "stores_l1hit",
	.cmp		= st_l1hit_cmp,
	.entry		= st_l1hit_entry,
	.width		= 7,
};

static struct c2c_dimension dim_stores_l1miss = {
	.header		= HEADER_SPAN_LOW("L1Miss"),
	.name		= "stores_l1miss",
	.cmp		= st_l1miss_cmp,
	.entry		= st_l1miss_entry,
	.width		= 7,
};

static struct c2c_dimension dim_stores_na = {
	.header		= HEADER_SPAN_LOW("N/A"),
	.name		= "stores_na",
	.cmp		= st_na_cmp,
	.entry		= st_na_entry,
	.width		= 7,
};

static struct c2c_dimension dim_cl_stores_l1hit = {
	.header		= HEADER_SPAN("------- Store Refs ------", "L1 Hit", 2),
	.name		= "cl_stores_l1hit",
	.cmp		= st_l1hit_cmp,
	.entry		= st_l1hit_entry,
	.width		= 7,
};

static struct c2c_dimension dim_cl_stores_l1miss = {
	.header		= HEADER_SPAN_LOW("L1 Miss"),
	.name		= "cl_stores_l1miss",
	.cmp		= st_l1miss_cmp,
	.entry		= st_l1miss_entry,
	.width		= 7,
};

static struct c2c_dimension dim_cl_stores_na = {
	.header		= HEADER_SPAN_LOW("N/A"),
	.name		= "cl_stores_na",
	.cmp		= st_na_cmp,
	.entry		= st_na_entry,
	.width		= 7,
};

static struct c2c_dimension dim_ld_fbhit = {
	.header		= HEADER_SPAN("----- Core Load Hit -----", "FB", 2),
	.name		= "ld_fbhit",
	.cmp		= ld_fbhit_cmp,
	.entry		= ld_fbhit_entry,
	.width		= 7,
};

static struct c2c_dimension dim_ld_l1hit = {
	.header		= HEADER_SPAN_LOW("L1"),
	.name		= "ld_l1hit",
	.cmp		= ld_l1hit_cmp,
	.entry		= ld_l1hit_entry,
	.width		= 7,
};

static struct c2c_dimension dim_ld_l2hit = {
	.header		= HEADER_SPAN_LOW("L2"),
	.name		= "ld_l2hit",
	.cmp		= ld_l2hit_cmp,
	.entry		= ld_l2hit_entry,
	.width		= 7,
};

static struct c2c_dimension dim_ld_llchit = {
	.header		= HEADER_SPAN("- LLC Load Hit --", "LclHit", 1),
	.name		= "ld_lclhit",
	.cmp		= ld_llchit_cmp,
	.entry		= ld_llchit_entry,
	.width		= 8,
};

static struct c2c_dimension dim_ld_rmthit = {
	.header		= HEADER_SPAN("- RMT Load Hit --", "RmtHit", 1),
	.name		= "ld_rmthit",
	.cmp		= rmt_hit_cmp,
	.entry		= rmt_hit_entry,
	.width		= 8,
};

static struct c2c_dimension dim_tot_recs = {
	.header		= HEADER_BOTH("Total", "records"),
	.name		= "tot_recs",
	.cmp		= tot_recs_cmp,
	.entry		= tot_recs_entry,
	.width		= 7,
};

static struct c2c_dimension dim_tot_loads = {
	.header		= HEADER_BOTH("Total", "Loads"),
	.name		= "tot_loads",
	.cmp		= tot_loads_cmp,
	.entry		= tot_loads_entry,
	.width		= 7,
};

static struct c2c_header percent_costly_snoop_header[] = {
	[DISPLAY_LCL_HITM] = HEADER_BOTH("Lcl", "Hitm"),
	[DISPLAY_RMT_HITM] = HEADER_BOTH("Rmt", "Hitm"),
	[DISPLAY_TOT_HITM] = HEADER_BOTH("Tot", "Hitm"),
	[DISPLAY_SNP_PEER] = HEADER_BOTH("Peer", "Snoop"),
};

static struct c2c_dimension dim_percent_costly_snoop = {
	.name		= "percent_costly_snoop",
	.cmp		= percent_costly_snoop_cmp,
	.entry		= percent_costly_snoop_entry,
	.color		= percent_costly_snoop_color,
	.width		= 7,
};

static struct c2c_dimension dim_percent_rmt_hitm = {
	.header		= HEADER_SPAN("----- HITM -----", "RmtHitm", 1),
	.name		= "percent_rmt_hitm",
	.cmp		= percent_rmt_hitm_cmp,
	.entry		= percent_rmt_hitm_entry,
	.color		= percent_rmt_hitm_color,
	.width		= 7,
};

static struct c2c_dimension dim_percent_lcl_hitm = {
	.header		= HEADER_SPAN_LOW("LclHitm"),
	.name		= "percent_lcl_hitm",
	.cmp		= percent_lcl_hitm_cmp,
	.entry		= percent_lcl_hitm_entry,
	.color		= percent_lcl_hitm_color,
	.width		= 7,
};

static struct c2c_dimension dim_percent_rmt_peer = {
	.header		= HEADER_SPAN("-- Peer Snoop --", "Rmt", 1),
	.name		= "percent_rmt_peer",
	.cmp		= percent_rmt_peer_cmp,
	.entry		= percent_rmt_peer_entry,
	.color		= percent_rmt_peer_color,
	.width		= 7,
};

static struct c2c_dimension dim_percent_lcl_peer = {
	.header		= HEADER_SPAN_LOW("Lcl"),
	.name		= "percent_lcl_peer",
	.cmp		= percent_lcl_peer_cmp,
	.entry		= percent_lcl_peer_entry,
	.color		= percent_lcl_peer_color,
	.width		= 7,
};

static struct c2c_dimension dim_percent_stores_l1hit = {
	.header		= HEADER_SPAN("------- Store Refs ------", "L1 Hit", 2),
	.name		= "percent_stores_l1hit",
	.cmp		= percent_stores_l1hit_cmp,
	.entry		= percent_stores_l1hit_entry,
	.color		= percent_stores_l1hit_color,
	.width		= 7,
};

static struct c2c_dimension dim_percent_stores_l1miss = {
	.header		= HEADER_SPAN_LOW("L1 Miss"),
	.name		= "percent_stores_l1miss",
	.cmp		= percent_stores_l1miss_cmp,
	.entry		= percent_stores_l1miss_entry,
	.color		= percent_stores_l1miss_color,
	.width		= 7,
};

static struct c2c_dimension dim_percent_stores_na = {
	.header		= HEADER_SPAN_LOW("N/A"),
	.name		= "percent_stores_na",
	.cmp		= percent_stores_na_cmp,
	.entry		= percent_stores_na_entry,
	.color		= percent_stores_na_color,
	.width		= 7,
};

static struct c2c_dimension dim_dram_lcl = {
	.header		= HEADER_SPAN("--- Load Dram ----", "Lcl", 1),
	.name		= "dram_lcl",
	.cmp		= lcl_dram_cmp,
	.entry		= lcl_dram_entry,
	.width		= 8,
};

static struct c2c_dimension dim_dram_rmt = {
	.header		= HEADER_SPAN_LOW("Rmt"),
	.name		= "dram_rmt",
	.cmp		= rmt_dram_cmp,
	.entry		= rmt_dram_entry,
	.width		= 8,
};

static struct c2c_dimension dim_pid = {
	.header		= HEADER_LOW("Pid"),
	.name		= "pid",
	.cmp		= pid_cmp,
	.entry		= pid_entry,
	.width		= 7,
};

static struct c2c_dimension dim_tid = {
	.header		= HEADER_LOW("Tid"),
	.name		= "tid",
	.se		= &sort_thread,
};

static struct c2c_dimension dim_symbol = {
	.name		= "symbol",
	.se		= &sort_sym,
};

static struct c2c_dimension dim_dso = {
	.header		= HEADER_BOTH("Shared", "Object"),
	.name		= "dso",
	.se		= &sort_dso,
};

static struct c2c_dimension dim_node = {
	.name		= "node",
	.cmp		= empty_cmp,
	.entry		= node_entry,
	.width		= 4,
};

static struct c2c_dimension dim_mean_rmt = {
	.header		= HEADER_SPAN("---------- cycles ----------", "rmt hitm", 2),
	.name		= "mean_rmt",
	.cmp		= empty_cmp,
	.entry		= mean_rmt_entry,
	.width		= 8,
};

static struct c2c_dimension dim_mean_lcl = {
	.header		= HEADER_SPAN_LOW("lcl hitm"),
	.name		= "mean_lcl",
	.cmp		= empty_cmp,
	.entry		= mean_lcl_entry,
	.width		= 8,
};

static struct c2c_dimension dim_mean_load = {
	.header		= HEADER_SPAN_LOW("load"),
	.name		= "mean_load",
	.cmp		= empty_cmp,
	.entry		= mean_load_entry,
	.width		= 8,
};

static struct c2c_dimension dim_mean_rmt_peer = {
	.header		= HEADER_SPAN("---------- cycles ----------", "rmt peer", 2),
	.name		= "mean_rmt_peer",
	.cmp		= empty_cmp,
	.entry		= mean_rmt_peer_entry,
	.width		= 8,
};

static struct c2c_dimension dim_mean_lcl_peer = {
	.header		= HEADER_SPAN_LOW("lcl peer"),
	.name		= "mean_lcl_peer",
	.cmp		= empty_cmp,
	.entry		= mean_lcl_peer_entry,
	.width		= 8,
};

static struct c2c_dimension dim_cpucnt = {
	.header		= HEADER_BOTH("cpu", "cnt"),
	.name		= "cpucnt",
	.cmp		= empty_cmp,
	.entry		= cpucnt_entry,
	.width		= 8,
};

static struct c2c_dimension dim_srcline = {
	.name		= "cl_srcline",
	.se		= &sort_srcline,
};

static struct c2c_dimension dim_dcacheline_idx = {
	.header		= HEADER_LOW("Index"),
	.name		= "cl_idx",
	.cmp		= empty_cmp,
	.entry		= cl_idx_entry,
	.width		= 5,
};

static struct c2c_dimension dim_dcacheline_num = {
	.header		= HEADER_LOW("Num"),
	.name		= "cl_num",
	.cmp		= empty_cmp,
	.entry		= cl_idx_entry,
	.width		= 5,
};

static struct c2c_dimension dim_dcacheline_num_empty = {
	.header		= HEADER_LOW("Num"),
	.name		= "cl_num_empty",
	.cmp		= empty_cmp,
	.entry		= cl_idx_empty_entry,
	.width		= 5,
};

static struct c2c_dimension *dimensions[] = {
	&dim_dcacheline,
	&dim_dcacheline_node,
	&dim_dcacheline_count,
	&dim_offset,
	&dim_offset_node,
	&dim_iaddr,
	&dim_tot_hitm,
	&dim_lcl_hitm,
	&dim_rmt_hitm,
	&dim_tot_peer,
	&dim_lcl_peer,
	&dim_rmt_peer,
	&dim_cl_lcl_hitm,
	&dim_cl_rmt_hitm,
	&dim_cl_lcl_peer,
	&dim_cl_rmt_peer,
	&dim_tot_stores,
	&dim_stores_l1hit,
	&dim_stores_l1miss,
	&dim_stores_na,
	&dim_cl_stores_l1hit,
	&dim_cl_stores_l1miss,
	&dim_cl_stores_na,
	&dim_ld_fbhit,
	&dim_ld_l1hit,
	&dim_ld_l2hit,
	&dim_ld_llchit,
	&dim_ld_rmthit,
	&dim_tot_recs,
	&dim_tot_loads,
	&dim_percent_costly_snoop,
	&dim_percent_rmt_hitm,
	&dim_percent_lcl_hitm,
	&dim_percent_rmt_peer,
	&dim_percent_lcl_peer,
	&dim_percent_stores_l1hit,
	&dim_percent_stores_l1miss,
	&dim_percent_stores_na,
	&dim_dram_lcl,
	&dim_dram_rmt,
	&dim_pid,
	&dim_tid,
	&dim_symbol,
	&dim_dso,
	&dim_node,
	&dim_mean_rmt,
	&dim_mean_lcl,
	&dim_mean_rmt_peer,
	&dim_mean_lcl_peer,
	&dim_mean_load,
	&dim_cpucnt,
	&dim_srcline,
	&dim_dcacheline_idx,
	&dim_dcacheline_num,
	&dim_dcacheline_num_empty,
	NULL,
};

static void fmt_free(struct perf_hpp_fmt *fmt)
{
	struct c2c_fmt *c2c_fmt;

	c2c_fmt = container_of(fmt, struct c2c_fmt, fmt);
	free(c2c_fmt);
}

static bool fmt_equal(struct perf_hpp_fmt *a, struct perf_hpp_fmt *b)
{
	struct c2c_fmt *c2c_a = container_of(a, struct c2c_fmt, fmt);
	struct c2c_fmt *c2c_b = container_of(b, struct c2c_fmt, fmt);

	return c2c_a->dim == c2c_b->dim;
}

static struct c2c_dimension *get_dimension(const char *name)
{
	unsigned int i;

	for (i = 0; dimensions[i]; i++) {
		struct c2c_dimension *dim = dimensions[i];

		if (!strcmp(dim->name, name))
			return dim;
	}

	return NULL;
}

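`get_dimension()` walks the NULL-terminated `dimensions[]` table and matches on `name` with `strcmp()`. A requested key must therefore equal a dimension name exactly, and an unmatched key yields NULL. Below is a reduced sketch with hypothetical two-field entries (`struct dim`, `lookup_dim()`):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical trimmed-down dimension: just a name and a column width. */
struct dim {
	const char *name;
	int width;
};

static struct dim dim_pid_ex = { .name = "pid", .width = 7 };
static struct dim dim_node_ex = { .name = "node", .width = 4 };

/* The NULL sentinel ends the table, so no separate length is stored. */
static struct dim *dims[] = {
	&dim_pid_ex,
	&dim_node_ex,
	NULL,
};

static struct dim *lookup_dim(const char *name)
{
	for (size_t i = 0; dims[i]; i++) {
		if (!strcmp(dims[i]->name, name))
			return dims[i];
	}

	return NULL;	/* caller decides on a fallback */
}
```

In the original, a NULL result from `get_format()` sends the key to perf's generic `output_field_add()` or `sort_dimension__add()` instead.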
static int c2c_se_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
			struct hist_entry *he)
{
	struct c2c_fmt *c2c_fmt = container_of(fmt, struct c2c_fmt, fmt);
	struct c2c_dimension *dim = c2c_fmt->dim;
	size_t len = fmt->user_len;

	if (!len) {
		len = hists__col_len(he->hists, dim->se->se_width_idx);

		if (dim == &dim_symbol || dim == &dim_srcline)
			len = symbol_width(he->hists, dim->se);
	}

	return dim->se->se_snprintf(he, hpp->buf, hpp->size, len);
}

static int64_t c2c_se_cmp(struct perf_hpp_fmt *fmt,
			  struct hist_entry *a, struct hist_entry *b)
{
	struct c2c_fmt *c2c_fmt = container_of(fmt, struct c2c_fmt, fmt);
	struct c2c_dimension *dim = c2c_fmt->dim;

	return dim->se->se_cmp(a, b);
}

static int64_t c2c_se_collapse(struct perf_hpp_fmt *fmt,
			       struct hist_entry *a, struct hist_entry *b)
{
	struct c2c_fmt *c2c_fmt = container_of(fmt, struct c2c_fmt, fmt);
	struct c2c_dimension *dim = c2c_fmt->dim;
	int64_t (*collapse_fn)(struct hist_entry *, struct hist_entry *);

	collapse_fn = dim->se->se_collapse ?: dim->se->se_cmp;
	return collapse_fn(a, b);
}

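`c2c_se_collapse()` relies on the GNU C `?:` extension. `a ?: b` yields `a` when it is non-zero and `b` otherwise, so a sort entry without a dedicated collapse callback falls back to its compare callback. Below is a portable sketch of the same fallback; `struct sort_ops`, `pick_collapse()` and the two comparators are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

typedef int64_t (*cmp_fn)(int a, int b);

/* Hypothetical stand-in for a sort entry with an optional callback. */
struct sort_ops {
	cmp_fn cmp;
	cmp_fn collapse;	/* may be NULL */
};

static int64_t by_value(int a, int b)
{
	return (int64_t)a - b;
}

/* Coarser comparison: values in the same block of ten compare equal. */
static int64_t by_bucket(int a, int b)
{
	return (int64_t)(a / 10) - (b / 10);
}

/* Portable spelling of `ops->collapse ?: ops->cmp`. */
static cmp_fn pick_collapse(const struct sort_ops *ops)
{
	return ops->collapse ? ops->collapse : ops->cmp;
}
```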
static struct c2c_fmt *get_format(const char *name)
{
	struct c2c_dimension *dim = get_dimension(name);
	struct c2c_fmt *c2c_fmt;
	struct perf_hpp_fmt *fmt;

	if (!dim)
		return NULL;

	c2c_fmt = zalloc(sizeof(*c2c_fmt));
	if (!c2c_fmt)
		return NULL;

	c2c_fmt->dim = dim;

	fmt = &c2c_fmt->fmt;
	INIT_LIST_HEAD(&fmt->list);
	INIT_LIST_HEAD(&fmt->sort_list);

	fmt->cmp	= dim->se ? c2c_se_cmp   : dim->cmp;
	fmt->sort	= dim->se ? c2c_se_cmp   : dim->cmp;
	fmt->color	= dim->se ? NULL	 : dim->color;
	fmt->entry	= dim->se ? c2c_se_entry : dim->entry;
	fmt->header	= c2c_header;
	fmt->width	= c2c_width;
	fmt->collapse	= dim->se ? c2c_se_collapse : dim->cmp;
	fmt->equal	= fmt_equal;
	fmt->free	= fmt_free;

	return c2c_fmt;
}

static int c2c_hists__init_output(struct perf_hpp_list *hpp_list, char *name)
{
	struct c2c_fmt *c2c_fmt = get_format(name);

	if (!c2c_fmt) {
		reset_dimensions();
		return output_field_add(hpp_list, name);
	}

	perf_hpp_list__column_register(hpp_list, &c2c_fmt->fmt);
	return 0;
}

static int c2c_hists__init_sort(struct perf_hpp_list *hpp_list, char *name)
{
	struct c2c_fmt *c2c_fmt = get_format(name);
	struct c2c_dimension *dim;

	if (!c2c_fmt) {
		reset_dimensions();
		return sort_dimension__add(hpp_list, name, NULL, 0);
	}

	dim = c2c_fmt->dim;
	if (dim == &dim_dso)
		hpp_list->dso = 1;

	perf_hpp_list__register_sort_field(hpp_list, &c2c_fmt->fmt);
	return 0;
}

#define PARSE_LIST(_list, _fn)							\
	do {									\
		char *tmp, *tok;						\
		ret = 0;							\
										\
		if (!_list)							\
			break;							\
										\
		for (tok = strtok_r((char *)_list, ", ", &tmp);			\
				tok; tok = strtok_r(NULL, ", ", &tmp)) {	\
			ret = _fn(hpp_list, tok);				\
			if (ret == -EINVAL) {					\
				pr_err("Invalid --fields key: `%s'", tok);	\
				break;						\
			} else if (ret == -ESRCH) {				\
				pr_err("Unknown --fields key: `%s'", tok);	\
				break;						\
			}							\
		}								\
	} while (0)

static int hpp_list__parse(struct perf_hpp_list *hpp_list,
			   const char *output_,
			   const char *sort_)
{
	char *output = output_ ? strdup(output_) : NULL;
	char *sort   = sort_   ? strdup(sort_) : NULL;
	int ret;

	PARSE_LIST(output, c2c_hists__init_output);
	PARSE_LIST(sort,   c2c_hists__init_sort);

	/* copy sort keys to output fields */
	perf_hpp__setup_output_field(hpp_list);

	/*
	 * We don't need any sorting keys other than the ones we
	 * already specified. Appending every output field as a sort
	 * key also slows processing down considerably when there are
	 * many output fields, so this is switched off for c2c.
	 */
#if 0
	/* and then copy output fields to sort keys */
	perf_hpp__append_sort_keys(&hists->list);
#endif

	free(output);
	free(sort);
	return ret;
}

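`PARSE_LIST()` walks a comma-separated key list with `strtok_r()` and stops at the first key its handler rejects. Because `strtok_r()` writes NUL bytes into the string it scans, `hpp_list__parse()` passes the macro `strdup()`ed copies rather than the caller's strings.

The sketch below shows the same flow as a plain function instead of a macro. `add_key()`, `parse_keys()` and the accepted key list are hypothetical, and only `,` is used as a delimiter here:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-token handler: accepts only a few known keys. */
static int add_key(const char *tok)
{
	static const char *known[] = { "offset", "pid", "tid", NULL };

	for (int i = 0; known[i]; i++)
		if (!strcmp(known[i], tok))
			return 0;

	return -ESRCH;
}

/*
 * Same shape as PARSE_LIST(): duplicate the input (strtok_r() modifies
 * it), walk the tokens, and stop at the first error.
 */
static int parse_keys(const char *list)
{
	char *copy, *tmp, *tok;
	int ret = 0;

	if (!list)
		return 0;

	copy = strdup(list);
	if (!copy)
		return -ENOMEM;

	for (tok = strtok_r(copy, ",", &tmp); tok;
	     tok = strtok_r(NULL, ",", &tmp)) {
		ret = add_key(tok);
		if (ret)
			break;
	}

	free(copy);
	return ret;
}
```

Empty fields such as the one in `"pid,,tid"` are skipped, because `strtok_r()` treats a run of delimiters as a single separator.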
static int c2c_hists__init(struct c2c_hists *hists,
			   const char *sort,
			   int nr_header_lines)
{
	__hists__init(&hists->hists, &hists->list);

	/*
	 * Initialize only with sort fields; we need to resort
	 * later anyway, and that's where output fields are added
	 * as well.
	 */
	perf_hpp_list__init(&hists->list);

	/* Overload number of header lines. */
	hists->list.nr_header_lines = nr_header_lines;

	return hpp_list__parse(&hists->list, NULL, sort);
}

static int c2c_hists__reinit(struct c2c_hists *c2c_hists,
			     const char *output,
			     const char *sort)
{
	perf_hpp__reset_output_field(&c2c_hists->list);
	return hpp_list__parse(&c2c_hists->list, output, sort);
}

#define DISPLAY_LINE_LIMIT  0.001

static u8 filter_display(u32 val, u32 sum)
{
	if (sum == 0 || ((double)val / sum) < DISPLAY_LINE_LIMIT)
		return HIST_FILTER__C2C;

	return 0;
}

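`filter_display()` hides an entry whose share of the selected total is below `DISPLAY_LINE_LIMIT` (0.001, that is 0.1%). It also hides everything when the total is zero, which avoids dividing by zero. Below is a standalone sketch; `LINE_LIMIT`, `FILTERED` and `filter_share()` are hypothetical stand-ins for `DISPLAY_LINE_LIMIT`, `HIST_FILTER__C2C` and `filter_display()`:

```c
#include <stdint.h>

#define LINE_LIMIT	0.001	/* same value as DISPLAY_LINE_LIMIT */
#define FILTERED	1	/* hypothetical stand-in for HIST_FILTER__C2C */

/* Hide an entry whose share of the total is below 0.1% (or total is 0). */
static uint8_t filter_share(uint32_t val, uint32_t sum)
{
	if (sum == 0 || ((double)val / sum) < LINE_LIMIT)
		return FILTERED;

	return 0;
}
```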
static bool he__display(struct hist_entry *he, struct c2c_stats *stats)
{
	struct c2c_hist_entry *c2c_he;

	if (c2c.show_all)
		return true;

	c2c_he = container_of(he, struct c2c_hist_entry, he);

	switch (c2c.display) {
	case DISPLAY_LCL_HITM:
		he->filtered = filter_display(c2c_he->stats.lcl_hitm,
					      stats->lcl_hitm);
		break;
	case DISPLAY_RMT_HITM:
		he->filtered = filter_display(c2c_he->stats.rmt_hitm,
					      stats->rmt_hitm);
		break;
	case DISPLAY_TOT_HITM:
		he->filtered = filter_display(c2c_he->stats.tot_hitm,
					      stats->tot_hitm);
		break;
	case DISPLAY_SNP_PEER:
		he->filtered = filter_display(c2c_he->stats.tot_peer,
					      stats->tot_peer);
		break;
	default:
		break;
	}

	return he->filtered == 0;
}

static inline bool is_valid_hist_entry(struct hist_entry *he)
{
	struct c2c_hist_entry *c2c_he;
	bool has_record = false;

	c2c_he = container_of(he, struct c2c_hist_entry, he);

	/* It's a valid entry if it contains stores */
	if (c2c_he->stats.store)
		return true;

	switch (c2c.display) {
	case DISPLAY_LCL_HITM:
		has_record = !!c2c_he->stats.lcl_hitm;
		break;
	case DISPLAY_RMT_HITM:
		has_record = !!c2c_he->stats.rmt_hitm;
		break;
	case DISPLAY_TOT_HITM:
		has_record = !!c2c_he->stats.tot_hitm;
		break;
	case DISPLAY_SNP_PEER:
		has_record = !!c2c_he->stats.tot_peer;
		break;
	default:
		break;
	}

	return has_record;
}

static void set_node_width(struct c2c_hist_entry *c2c_he, int len)
{
	struct c2c_dimension *dim;

	dim = &c2c.hists == c2c_he->hists ?
	      &dim_dcacheline_node : &dim_offset_node;

	if (len > dim->width)
		dim->width = len;
}

static int set_nodestr(struct c2c_hist_entry *c2c_he)
{
	char buf[30];
	int len;

	if (c2c_he->nodestr)
		return 0;

	if (!bitmap_empty(c2c_he->nodeset, c2c.nodes_cnt)) {
		len = bitmap_scnprintf(c2c_he->nodeset, c2c.nodes_cnt,
				      buf, sizeof(buf));
	} else {
		len = scnprintf(buf, sizeof(buf), "N/A");
	}

	set_node_width(c2c_he, len);
	c2c_he->nodestr = strdup(buf);
	return c2c_he->nodestr ? 0 : -ENOMEM;
}

static void calc_width(struct c2c_hist_entry *c2c_he)
{
	struct c2c_hists *c2c_hists;

	c2c_hists = container_of(c2c_he->he.hists, struct c2c_hists, hists);
	hists__calc_col_len(&c2c_hists->hists, &c2c_he->he);
	set_nodestr(c2c_he);
}

static int filter_cb(struct hist_entry *he, void *arg __maybe_unused)
{
	struct c2c_hist_entry *c2c_he;

	c2c_he = container_of(he, struct c2c_hist_entry, he);

	if (c2c.show_src && !he->srcline)
		he->srcline = hist_entry__srcline(he);

	calc_width(c2c_he);

	if (!is_valid_hist_entry(he))
		he->filtered = HIST_FILTER__C2C;

	return 0;
}

static int resort_cl_cb(struct hist_entry *he, void *arg __maybe_unused)
{
	struct c2c_hist_entry *c2c_he;
	struct c2c_hists *c2c_hists;
	bool display = he__display(he, &c2c.shared_clines_stats);

	c2c_he = container_of(he, struct c2c_hist_entry, he);
	c2c_hists = c2c_he->hists;

	if (display && c2c_hists) {
		static unsigned int idx;

		c2c_he->cacheline_idx = idx++;
		calc_width(c2c_he);

		c2c_hists__reinit(c2c_hists, c2c.cl_output, c2c.cl_resort);
perf c2c report: Set final resort fields
Set resort/display fields for both cachelines and single cacheline
displays.
Cachelines are sorted on:
rmt_hitm
This will be made configurable in follow-up patches.
The following fields are displayed for cachelines:
dcacheline
tot_recs
percent_hitm
tot_hitm,lcl_hitm,rmt_hitm
stores,stores_l1hit,stores_l1miss
dram_lcl,dram_rmt
ld_llcmiss
tot_loads
ld_fbhit,ld_l1hit,ld_l2hit
ld_lclhit,ld_rmthit
The single cacheline is sorted by:
  offset,rmt_hitm,lcl_hitm
This will be made configurable in follow-up patches.
The following fields are displayed for each cacheline:
percent_rmt_hitm
percent_lcl_hitm
percent_stores_l1hit
percent_stores_l1miss
offset
pid
tid
mean_rmt
mean_lcl
mean_load
cpucnt
symbol
dso
node
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-0rclftliywdq9qr2sjbugb6b@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-10 15:08:29 +03:00
2016-09-22 18:36:45 +03:00
		hists__collapse_resort(&c2c_hists->hists, NULL);
		hists__output_resort_cb(&c2c_hists->hists, NULL, filter_cb);
	}
	return 0;
}
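The commit message above gives the single-cacheline sort order as offset, rmt_hitm, lcl_hitm. As a standalone illustration of that kind of multi-key ordering, the comparator below sorts a hypothetical `struct cl_offset_entry` array with `qsort()`. It assumes offsets ascend while the HITM counts descend; it is a sketch of the ordering idea, not the hist-entry sorting code perf actually uses.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-offset record; field names mirror the sort keys. */
struct cl_offset_entry {
	uint64_t offset;	/* byte offset within the cacheline */
	unsigned int rmt_hitm;	/* remote HITM count */
	unsigned int lcl_hitm;	/* local HITM count */
};

/* qsort() comparator: offset ascending, then rmt_hitm and lcl_hitm descending. */
static int cmp_offset_entry(const void *pa, const void *pb)
{
	const struct cl_offset_entry *a = pa, *b = pb;

	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	if (a->rmt_hitm != b->rmt_hitm)
		return a->rmt_hitm > b->rmt_hitm ? -1 : 1;
	if (a->lcl_hitm != b->lcl_hitm)
		return a->lcl_hitm > b->lcl_hitm ? -1 : 1;
	return 0;
}

static void sort_offset_entries(struct cl_offset_entry *e, size_t n)
{
	qsort(e, n, sizeof(*e), cmp_offset_entry);
}
```

Each key is compared only when all earlier keys tie, so adding a further tie-breaker means appending one more `if` before the final `return 0`.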
2022-08-11 09:24:47 +03:00
static struct c2c_header header_node_0 = HEADER_LOW("Node");
2022-08-11 09:24:49 +03:00
static struct c2c_header header_node_1_hitms_stores =
	HEADER_LOW("Node{cpus %hitms %stores}");
static struct c2c_header header_node_1_peers_stores =
	HEADER_LOW("Node{cpus %peers %stores}");
2022-08-11 09:24:47 +03:00
static struct c2c_header header_node_2 = HEADER_LOW("Node{cpu list}");
2016-06-03 16:40:28 +03:00
static void setup_nodes_header(void)
{
2022-08-11 09:24:47 +03:00
	switch (c2c.node_info) {
	case 0:
		dim_node.header = header_node_0;
		break;
	case 1:
2022-08-11 09:24:49 +03:00
		if (c2c.display == DISPLAY_SNP_PEER)
			dim_node.header = header_node_1_peers_stores;
		else
			dim_node.header = header_node_1_hitms_stores;
2022-08-11 09:24:47 +03:00
		break;
	case 2:
		dim_node.header = header_node_2;
		break;
	default:
		break;
	}
	return;
2016-06-03 16:40:28 +03:00
}
static int setup_nodes(struct perf_session *session)
{
	struct numa_node *n;
	unsigned long **nodes;
2022-01-05 09:13:51 +03:00
	int node, idx;
	struct perf_cpu cpu;
2016-06-03 16:40:28 +03:00
	int *cpu2node;
	if (c2c.node_info > 2)
		c2c.node_info = 2;
	c2c.nodes_cnt = session->header.env.nr_numa_nodes;
2019-08-22 11:50:45 +03:00
	c2c.cpus_cnt = session->header.env.nr_cpus_avail;
2016-06-03 16:40:28 +03:00
	n = session->header.env.numa_nodes;
	if (!n)
		return -EINVAL;
	nodes = zalloc(sizeof(unsigned long *) * c2c.nodes_cnt);
	if (!nodes)
		return -ENOMEM;
	c2c.nodes = nodes;
	cpu2node = zalloc(sizeof(int) * c2c.cpus_cnt);
	if (!cpu2node)
		return -ENOMEM;
2022-01-05 09:13:51 +03:00
	for (idx = 0; idx < c2c.cpus_cnt; idx++)
		cpu2node[idx] = -1;
2016-06-03 16:40:28 +03:00
	c2c.cpu2node = cpu2node;
	for (node = 0; node < c2c.nodes_cnt; node++) {
2019-07-21 14:23:49 +03:00
		struct perf_cpu_map *map = n[node].map;
2016-06-03 16:40:28 +03:00
		unsigned long *set;
2021-09-08 05:59:35 +03:00
		set = bitmap_zalloc(c2c.cpus_cnt);
2016-06-03 16:40:28 +03:00
		if (!set)
			return -ENOMEM;
2019-03-05 18:25:29 +03:00
		nodes[node] = set;
		/* empty node, skip */
2019-08-22 14:11:39 +03:00
		if (perf_cpu_map__empty(map))
2019-03-05 18:25:29 +03:00
			continue;
2022-01-05 09:13:48 +03:00
		perf_cpu_map__for_each_cpu(cpu, idx, map) {
2022-11-19 04:34:46 +03:00
			__set_bit(cpu.cpu, set);
2016-06-03 16:40:28 +03:00
2022-01-05 09:13:51 +03:00
			if (WARN_ONCE(cpu2node[cpu.cpu] != -1, "node/cpu topology bug"))
2016-06-03 16:40:28 +03:00
				return -EINVAL;
2022-01-05 09:13:51 +03:00
			cpu2node[cpu.cpu] = node;
2016-06-03 16:40:28 +03:00
		}
	}
	setup_nodes_header();
	return 0;
}
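setup_nodes() initializes cpu2node[] to -1 and then stamps each CPU with its node, failing if a CPU appears under two nodes. The sketch below models only that mapping step with plain arrays. `build_cpu2node()` and its array-of-arrays input are hypothetical stand-ins for perf's `perf_cpu_map` and the session header. The sketch also omits the per-node bitmaps, and because it owns its table it frees it on error.

```c
#include <stdlib.h>

/*
 * Hypothetical, simplified model of the cpu2node table built by setup_nodes():
 * node_cpus[n] lists the CPUs of node n, node_nr[n] is that list's length.
 * Returns a malloc'd array of nr_cpus entries (-1 = CPU not in any node),
 * or NULL on allocation failure, an out-of-range CPU, or a CPU listed
 * under two nodes (the "node/cpu topology bug" case).
 */
static int *build_cpu2node(const int *const *node_cpus, const int *node_nr,
			   int nr_nodes, int nr_cpus)
{
	int *cpu2node;

	if (nr_cpus <= 0)
		return NULL;
	cpu2node = malloc(sizeof(int) * (size_t)nr_cpus);
	if (!cpu2node)
		return NULL;
	for (int cpu = 0; cpu < nr_cpus; cpu++)
		cpu2node[cpu] = -1;

	for (int node = 0; node < nr_nodes; node++) {
		for (int i = 0; i < node_nr[node]; i++) {
			int cpu = node_cpus[node][i];

			/* reject bad CPU ids and CPUs already claimed by a node */
			if (cpu < 0 || cpu >= nr_cpus || cpu2node[cpu] != -1) {
				free(cpu2node);
				return NULL;
			}
			cpu2node[cpu] = node;
		}
	}
	return cpu2node;
}
```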
2016-07-01 12:12:11 +03:00
#define HAS_HITMS(__h) ((__h)->stats.lcl_hitm || (__h)->stats.rmt_hitm)
2022-08-11 09:24:49 +03:00
#define HAS_PEER(__h) ((__h)->stats.lcl_peer || (__h)->stats.rmt_peer)
2016-07-01 12:12:11 +03:00
2021-01-14 18:46:41 +03:00
static int resort_shared_cl_cb(struct hist_entry *he, void *arg __maybe_unused)
2016-07-01 12:12:11 +03:00
{
	struct c2c_hist_entry *c2c_he;
	c2c_he = container_of(he, struct c2c_hist_entry, he);
2022-08-11 09:24:49 +03:00
	if (HAS_HITMS(c2c_he) || HAS_PEER(c2c_he)) {
2016-07-01 12:12:11 +03:00
		c2c.shared_clines++;
2021-01-14 18:46:41 +03:00
		c2c_add_stats(&c2c.shared_clines_stats, &c2c_he->stats);
2016-07-01 12:12:11 +03:00
	}
	return 0;
}
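resort_shared_cl_cb() counts a cacheline as shared when it has any local or remote HITM or peer hit, and adds its counters to the shared totals. A minimal model of that filter-and-sum pass over an array follows. It assumes a hypothetical `struct cl_stats` that keeps only a few of the `c2c_stats` counters; `cl_is_shared()` and `count_shared()` are illustrative names.

```c
#include <stddef.h>

/* Hypothetical subset of per-cacheline counters (names follow c2c_stats). */
struct cl_stats {
	unsigned int lcl_hitm, rmt_hitm;
	unsigned int lcl_peer, rmt_peer;
	unsigned int load, store;
};

/* Same condition as HAS_HITMS(c2c_he) || HAS_PEER(c2c_he). */
static int cl_is_shared(const struct cl_stats *s)
{
	return s->lcl_hitm || s->rmt_hitm || s->lcl_peer || s->rmt_peer;
}

/* Returns the number of shared cachelines and sums their counters into *total. */
static int count_shared(const struct cl_stats *cls, size_t n, struct cl_stats *total)
{
	int shared = 0;

	*total = (struct cl_stats){ 0 };
	for (size_t i = 0; i < n; i++) {
		if (!cl_is_shared(&cls[i]))
			continue;
		shared++;
		total->lcl_hitm += cls[i].lcl_hitm;
		total->rmt_hitm += cls[i].rmt_hitm;
		total->lcl_peer += cls[i].lcl_peer;
		total->rmt_peer += cls[i].rmt_peer;
		total->load += cls[i].load;
		total->store += cls[i].store;
	}
	return shared;
}
```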
static int hists__iterate_cb(struct hists *hists, hists__resort_cb_t cb)
{
2018-12-06 22:18:18 +03:00
	struct rb_node *next = rb_first_cached(&hists->entries);
2016-07-01 12:12:11 +03:00
	int ret = 0;
	while (next) {
		struct hist_entry *he;
		he = rb_entry(next, struct hist_entry, rb_node);
2019-02-04 17:18:06 +03:00
		ret = cb(he, NULL);
2016-07-01 12:12:11 +03:00
		if (ret)
			break;
		next = rb_next(&he->rb_node);
	}
	return ret;
}
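hists__iterate_cb() visits the entries in order and stops at the first callback that returns non-zero, returning that value. The same early-exit contract is sketched below on a hypothetical singly linked `struct item` list instead of perf's rb-tree. Unlike hists__iterate_cb(), which always passes NULL, the sketch forwards a caller-supplied `arg` so the callback can keep state.

```c
#include <stddef.h>

/* Hypothetical list node standing in for a hist_entry in the rb-tree. */
struct item {
	int value;
	struct item *next;
};

typedef int (*item_cb_t)(struct item *it, void *arg);

/* Visits items in order; stops at, and returns, the first non-zero result. */
static int list_iterate_cb(struct item *head, item_cb_t cb, void *arg)
{
	int ret = 0;

	for (struct item *it = head; it; it = it->next) {
		ret = cb(it, arg);
		if (ret)
			break;
	}
	return ret;
}

/* Example callback: counts visits and fails with -1 on a negative value. */
static int count_until_negative(struct item *it, void *arg)
{
	int *visited = arg;

	(*visited)++;
	return it->value < 0 ? -1 : 0;
}
```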
2016-05-02 21:01:59 +03:00
static void print_c2c__display_stats(FILE *out)
{
	int llc_misses;
	struct c2c_stats *stats = &c2c.hists.stats;
2022-09-06 06:29:05 +03:00
	llc_misses = get_load_llc_misses(stats);
2016-05-02 21:01:59 +03:00
	fprintf(out, "=================================================\n");
	fprintf(out, "Trace Event Information\n");
	fprintf(out, "=================================================\n");
	fprintf(out, "Total records : %10d\n", stats->nr_entries);
	fprintf(out, "Locked Load/Store Operations : %10d\n", stats->locks);
	fprintf(out, "Load Operations : %10d\n", stats->load);
	fprintf(out, "Loads - uncacheable : %10d\n", stats->ld_uncache);
	fprintf(out, "Loads - IO : %10d\n", stats->ld_io);
	fprintf(out, "Loads - Miss : %10d\n", stats->ld_miss);
	fprintf(out, "Loads - no mapping : %10d\n", stats->ld_noadrs);
	fprintf(out, "Load Fill Buffer Hit : %10d\n", stats->ld_fbhit);
	fprintf(out, "Load L1D hit : %10d\n", stats->ld_l1hit);
	fprintf(out, "Load L2D hit : %10d\n", stats->ld_l2hit);
	fprintf(out, "Load LLC hit : %10d\n", stats->ld_llchit + stats->lcl_hitm);
	fprintf(out, "Load Local HITM : %10d\n", stats->lcl_hitm);
	fprintf(out, "Load Remote HITM : %10d\n", stats->rmt_hitm);
	fprintf(out, "Load Remote HIT : %10d\n", stats->rmt_hit);
	fprintf(out, "Load Local DRAM : %10d\n", stats->lcl_dram);
	fprintf(out, "Load Remote DRAM : %10d\n", stats->rmt_dram);
	fprintf(out, "Load MESI State Exclusive : %10d\n", stats->ld_excl);
	fprintf(out, "Load MESI State Shared : %10d\n", stats->ld_shared);
	fprintf(out, "Load LLC Misses : %10d\n", llc_misses);
2021-02-02 23:09:08 +03:00
	fprintf(out, "Load access blocked by data : %10d\n", stats->blk_data);
	fprintf(out, "Load access blocked by address : %10d\n", stats->blk_addr);
2022-08-11 09:24:41 +03:00
	fprintf(out, "Load HIT Local Peer : %10d\n", stats->lcl_peer);
	fprintf(out, "Load HIT Remote Peer : %10d\n", stats->rmt_peer);
2016-05-02 21:01:59 +03:00
	fprintf(out, "LLC Misses to Local DRAM : %10.1f%%\n", ((double)stats->lcl_dram / (double)llc_misses) * 100.);
	fprintf(out, "LLC Misses to Remote DRAM : %10.1f%%\n", ((double)stats->rmt_dram / (double)llc_misses) * 100.);
	fprintf(out, "LLC Misses to Remote cache (HIT) : %10.1f%%\n", ((double)stats->rmt_hit / (double)llc_misses) * 100.);
	fprintf(out, "LLC Misses to Remote cache (HITM) : %10.1f%%\n", ((double)stats->rmt_hitm / (double)llc_misses) * 100.);
	fprintf(out, "Store Operations : %10d\n", stats->store);
	fprintf(out, "Store - uncacheable : %10d\n", stats->st_uncache);
	fprintf(out, "Store - no mapping : %10d\n", stats->st_noadrs);
	fprintf(out, "Store L1D Hit : %10d\n", stats->st_l1hit);
	fprintf(out, "Store L1D Miss : %10d\n", stats->st_l1miss);
2022-05-18 08:57:20 +03:00
	fprintf(out, "Store No available memory level : %10d\n", stats->st_na);
2016-05-02 21:01:59 +03:00
	fprintf(out, "No Page Map Rejects : %10d\n", stats->nomap);
	fprintf(out, "Unable to parse data source : %10d\n", stats->noparse);
}
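The four "LLC Misses to ..." lines divide by llc_misses without a check. With IEEE doubles, a trace with no LLC misses therefore prints NaN (0/0) or infinity for those percentages. A guarded helper is sketched below as a possible alternative. `llc_miss_pct()` is a hypothetical name, and reporting 0.0 for the no-miss case is an assumption about the preferred output.

```c
/*
 * Hypothetical helper: share of LLC misses served by one source, in percent.
 * Returns 0.0 when there were no LLC misses instead of dividing by zero.
 */
static double llc_miss_pct(int part, int llc_misses)
{
	if (llc_misses <= 0)
		return 0.0;
	return (double)part / (double)llc_misses * 100.0;
}
```

A call site would then read, for example, `llc_miss_pct(stats->lcl_dram, llc_misses)` in place of the inline expression.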
2016-07-01 12:12:11 +03:00
static void print_shared_cacheline_info(FILE *out)
{
2021-01-14 18:46:41 +03:00
	struct c2c_stats *stats = &c2c.shared_clines_stats;
2016-07-01 12:12:11 +03:00
	int hitm_cnt = stats->lcl_hitm + stats->rmt_hitm;
	fprintf(out, "=================================================\n");
	fprintf(out, "Global Shared Cache Line Event Information\n");
	fprintf(out, "=================================================\n");
	fprintf(out, "Total Shared Cache Lines : %10d\n", c2c.shared_clines);
	fprintf(out, "Load HITs on shared lines : %10d\n", stats->load);
	fprintf(out, "Fill Buffer Hits on shared lines : %10d\n", stats->ld_fbhit);
	fprintf(out, "L1D hits on shared lines : %10d\n", stats->ld_l1hit);
	fprintf(out, "L2D hits on shared lines : %10d\n", stats->ld_l2hit);
	fprintf(out, "LLC hits on shared lines : %10d\n", stats->ld_llchit + stats->lcl_hitm);
2022-08-11 09:24:41 +03:00
	fprintf(out, "Load hits on peer cache or nodes : %10d\n", stats->lcl_peer + stats->rmt_peer);
2016-07-01 12:12:11 +03:00
	fprintf(out, "Locked Access on shared lines : %10d\n", stats->locks);
2021-02-02 23:09:08 +03:00
	fprintf(out, "Blocked Access on shared lines : %10d\n", stats->blk_data + stats->blk_addr);
2016-07-01 12:12:11 +03:00
	fprintf(out, "Store HITs on shared lines : %10d\n", stats->store);
	fprintf(out, "Store L1D hits on shared lines : %10d\n", stats->st_l1hit);
2022-05-18 08:57:20 +03:00
	fprintf(out, "Store No available memory level : %10d\n", stats->st_na);
2016-07-01 12:12:11 +03:00
	fprintf(out, "Total Merged records : %10d\n", hitm_cnt + stats->store);
}
2016-05-03 15:32:56 +03:00
static void print_cacheline(struct c2c_hists *c2c_hists,
			    struct hist_entry *he_cl,
			    struct perf_hpp_list *hpp_list,
			    FILE *out)
{
	char bf[1000];
	struct perf_hpp hpp = {
		.buf = bf,
		.size = 1000,
	};
	static bool once;
	if (!once) {
		hists__fprintf_headers(&c2c_hists->hists, out);
		once = true;
	} else {
		fprintf(out, "\n");
	}
2022-05-18 08:57:20 +03:00
	fprintf(out, "----------------------------------------------------------------------\n");
2016-05-03 15:32:56 +03:00
	__hist_entry__snprintf(he_cl, &hpp, hpp_list);
	fprintf(out, "%s\n", bf);
2022-05-18 08:57:20 +03:00
	fprintf(out, "----------------------------------------------------------------------\n");
2016-05-03 15:32:56 +03:00
2018-06-20 21:58:20 +03:00
	hists__fprintf(&c2c_hists->hists, false, 0, 0, 0, out, false);
2016-05-03 15:32:56 +03:00
}
static void print_pareto(FILE *out)
{
	struct perf_hpp_list hpp_list;
	struct rb_node *nd;
	int ret;
2021-01-14 18:46:46 +03:00
	const char *cl_output;
2022-08-11 09:24:49 +03:00
	if (c2c.display != DISPLAY_SNP_PEER)
		cl_output = "cl_num,"
			    "cl_rmt_hitm,"
			    "cl_lcl_hitm,"
			    "cl_stores_l1hit,"
			    "cl_stores_l1miss,"
			    "cl_stores_na,"
			    "dcacheline";
	else
		cl_output = "cl_num,"
			    "cl_rmt_peer,"
			    "cl_lcl_peer,"
			    "cl_stores_l1hit,"
			    "cl_stores_l1miss,"
			    "cl_stores_na,"
			    "dcacheline";
2016-05-03 15:32:56 +03:00
	perf_hpp_list__init(&hpp_list);
2021-01-14 18:46:46 +03:00
	ret = hpp_list__parse(&hpp_list, cl_output, NULL);
2016-05-03 15:32:56 +03:00
	if (WARN_ONCE(ret, "failed to setup sort entries\n"))
		return;
2018-12-06 22:18:18 +03:00
	nd = rb_first_cached(&c2c.hists.hists.entries);
2016-05-03 15:32:56 +03:00
	for (; nd; nd = rb_next(nd)) {
		struct hist_entry *he = rb_entry(nd, struct hist_entry, rb_node);
		struct c2c_hist_entry *c2c_he;
		if (he->filtered)
			continue;
		c2c_he = container_of(he, struct c2c_hist_entry, he);
		print_cacheline(c2c_he->hists, he, &hpp_list, out);
	}
}
2016-08-27 12:40:23 +03:00
static void print_c2c_info(FILE *out, struct perf_session *session)
{
2019-07-21 14:23:52 +03:00
	struct evlist *evlist = session->evlist;
2019-07-21 14:23:51 +03:00
	struct evsel *evsel;
2016-08-27 12:40:23 +03:00
	bool first = true;
	fprintf(out, "=================================================\n");
	fprintf(out, "c2c details\n");
	fprintf(out, "=================================================\n");
	evlist__for_each_entry(evlist, evsel) {
2020-04-29 22:07:09 +03:00
		fprintf(out, "%-36s: %s\n", first ? "Events" : "", evsel__name(evsel));
2016-08-27 12:40:23 +03:00
		first = false;
	}
2022-08-11 09:24:48 +03:00
	fprintf(out, "Cachelines sort on : %s\n",
2016-11-22 00:33:30 +03:00
		display_str[c2c.display]);
2016-05-24 15:14:38 +03:00
	fprintf(out, "Cacheline data grouping : %s\n", c2c.cl_sort);
2016-08-27 12:40:23 +03:00
}
static void perf_c2c__hists_fprintf(FILE *out, struct perf_session *session)
2016-05-03 15:32:56 +03:00
{
	setup_pager();
2016-05-02 21:01:59 +03:00
	print_c2c__display_stats(out);
2016-07-01 12:12:11 +03:00
	fprintf(out, "\n");
	print_shared_cacheline_info(out);
2016-08-27 12:40:23 +03:00
	fprintf(out, "\n");
	print_c2c_info(out, session);
2016-05-02 21:01:59 +03:00
	if (c2c.stats_only)
		return;
2016-05-03 15:32:56 +03:00
	fprintf(out, "\n");
	fprintf(out, "=================================================\n");
	fprintf(out, "Shared Data Cache Line Table\n");
	fprintf(out, "=================================================\n");
	fprintf(out, "#\n");
2018-06-20 21:58:20 +03:00
	hists__fprintf(&c2c.hists.hists, true, 0, 0, 0, stdout, true);
2016-05-03 15:32:56 +03:00
	fprintf(out, "\n");
	fprintf(out, "=================================================\n");
	fprintf(out, "Shared Cache Line Distribution Pareto\n");
	fprintf(out, "=================================================\n");
	fprintf(out, "#\n");
	print_pareto(out);
}
2016-06-03 16:40:28 +03:00
2016-01-06 18:59:02 +03:00
#ifdef HAVE_SLANG_SUPPORT
static void c2c_browser__update_nr_entries(struct hist_browser *hb)
{
	u64 nr_entries = 0;
2018-12-06 22:18:18 +03:00
	struct rb_node *nd = rb_first_cached(&hb->hists->entries);
2016-01-06 18:59:02 +03:00
	while (nd) {
		struct hist_entry *he = rb_entry(nd, struct hist_entry, rb_node);
		if (!he->filtered)
			nr_entries++;
		nd = rb_next(nd);
	}
	hb->nr_non_filtered_entries = nr_entries;
}
2016-05-02 19:30:44 +03:00
struct c2c_cacheline_browser {
	struct hist_browser hb;
	struct hist_entry *he;
};
static int
perf_c2c_cacheline_browser__title(struct hist_browser *browser,
				  char *bf, size_t size)
{
	struct c2c_cacheline_browser *cl_browser;
	struct hist_entry *he;
	uint64_t addr = 0;
	cl_browser = container_of(browser, struct c2c_cacheline_browser, hb);
	he = cl_browser->he;
	if (he->mem_info)
perf c2c: Add report option to show false sharing in adjacent cachelines
Many platforms have an adjacent-cacheline prefetch feature. When it is
enabled, data in RAM is handled at a granularity of 2 cachelines (2N and
2N+1): if one is fetched into the cache, the other is likely to be fetched
too, which effectively doubles the cacheline size, so false sharing can
happen across adjacent cachelines.
0Day has captured performance changes related to this [1], and some
commercial software explicitly aligns its hot global variables to 128
bytes (2 cache lines) to avoid this kind of extended false sharing.
So add an option "--double-cl" for 'perf c2c report' to show false
sharing at double-cacheline granularity, which acts as if the cacheline
size were doubled. There is no change to 'perf c2c record'. The hardware
events of a shared cacheline are still per cacheline; this option only
changes the granularity at which events are grouped and displayed.
In the 'perf c2c report' output below (will-it-scale's 'pagefault2' case
on an old kernel):
----------------------------------------------------------------------
26 31 2 0 0 0 0xffff888103ec6000
----------------------------------------------------------------------
35.48% 50.00% 0.00% 0.00% 0.00% 0x10 0 1 0xffffffff8133148b 1153 66 971 3748 74 [k] get_mem_cgroup_from_mm
6.45% 0.00% 0.00% 0.00% 0.00% 0x10 0 1 0xffffffff813396e4 570 0 1531 879 75 [k] mem_cgroup_charge
25.81% 50.00% 0.00% 0.00% 0.00% 0x54 0 1 0xffffffff81331472 949 70 593 3359 74 [k] get_mem_cgroup_from_mm
19.35% 0.00% 0.00% 0.00% 0.00% 0x54 0 1 0xffffffff81339686 1352 0 1073 1022 74 [k] mem_cgroup_charge
9.68% 0.00% 0.00% 0.00% 0.00% 0x54 0 1 0xffffffff813396d6 1401 0 863 768 74 [k] mem_cgroup_charge
3.23% 0.00% 0.00% 0.00% 0.00% 0x54 0 1 0xffffffff81333106 618 0 804 11 9 [k] uncharge_batch
Offsets 0x10 and 0x54 used to be displayed in 2 groups; now they are
listed together to give users a hint of extended false sharing.
[1]. https://lore.kernel.org/lkml/20201102091543.GM31092@shao2-debian/
Committer notes:
Link: https://lore.kernel.org/r/Y+wvVNWqXb70l4uy@feng-clx
Removed -a, leaving just --double-cl, as this is probably not used so
frequently and may even be auto-detected if we manage to record
the MSR where this is configured.
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Feng Tang <feng.tang@intel.com>
Tested-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Joe Mario <jmario@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tim Chen <tim.c.chen@intel.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20230214075823.246414-1-feng.tang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-02-14 10:58:23 +03:00
		addr = cl_address(he->mem_info->daddr.addr, chk_double_cl);
2016-05-02 19:30:44 +03:00
	scnprintf(bf, size, "Cacheline 0x%lx", addr);
	return 0;
}
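With --double-cl, the address shown in the title is rounded down to a 2-cacheline boundary rather than a single-cacheline one. The sketch below shows that masking. It assumes a fixed 64-byte line size for illustration, whereas perf uses the size it detects at runtime. `sketch_cl_address()` is a hypothetical name, not perf's `cl_address()`. With the addresses from the commit message's example, offsets 0x10 and 0x54 fall into different 64-byte lines but the same 128-byte block, which is why they end up grouped together.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed cacheline size for this sketch only. */
#define SKETCH_CL_SIZE 64ULL

/* Rounds addr down to its cacheline (or double-cacheline) boundary. */
static uint64_t sketch_cl_address(uint64_t addr, bool double_cl)
{
	uint64_t size = double_cl ? 2 * SKETCH_CL_SIZE : SKETCH_CL_SIZE;

	/* size is a power of two, so ~(size - 1) clears the offset bits */
	return addr & ~(size - 1);
}
```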
static struct c2c_cacheline_browser *
c2c_cacheline_browser__new(struct hists *hists, struct hist_entry *he)
{
	struct c2c_cacheline_browser *browser;
	browser = zalloc(sizeof(*browser));
	if (browser) {
		hist_browser__init(&browser->hb, hists);
		browser->hb.c2c_filter = true;
		browser->hb.title = perf_c2c_cacheline_browser__title;
		browser->he = he;
	}
	return browser;
}
static int perf_c2c__browse_cacheline(struct hist_entry *he)
{
	struct c2c_hist_entry *c2c_he;
	struct c2c_hists *c2c_hists;
	struct c2c_cacheline_browser *cl_browser;
	struct hist_browser *browser;
	int key = -1;
perf tools: Replace automatic const char[] variables by statics
An automatic const char[] variable gets initialized at runtime, just
like any other automatic variable. For long strings, that uses a lot of
stack and wastes time building the string; e.g. for the "No %s
allocation events..." case one has:
444516: 48 b8 4e 6f 20 25 73 20 61 6c movabs $0x6c61207325206f4e,%rax # "No %s al"
...
444674: 48 89 45 80 mov %rax,-0x80(%rbp)
444678: 48 b8 6c 6f 63 61 74 69 6f 6e movabs $0x6e6f697461636f6c,%rax # "location"
444682: 48 89 45 88 mov %rax,-0x78(%rbp)
444686: 48 b8 20 65 76 65 6e 74 73 20 movabs $0x2073746e65766520,%rax # " events "
444690: 66 44 89 55 c4 mov %r10w,-0x3c(%rbp)
444695: 48 89 45 90 mov %rax,-0x70(%rbp)
444699: 48 b8 66 6f 75 6e 64 2e 20 20 movabs $0x20202e646e756f66,%rax
Make them all static so that the compiler just references objects in .rodata.
Committer testing:
Ok, using dwarves's codiff tool:
$ codiff --functions /tmp/perf.before ~/bin/perf
builtin-sched.c:
cmd_sched | -48
1 function changed, 48 bytes removed, diff: -48
builtin-report.c:
cmd_report | -32
1 function changed, 32 bytes removed, diff: -32
builtin-kmem.c:
cmd_kmem | -64
build_alloc_func_list | -50
2 functions changed, 114 bytes removed, diff: -114
builtin-c2c.c:
perf_c2c__report | -390
1 function changed, 390 bytes removed, diff: -390
ui/browsers/header.c:
tui__header_window | -104
1 function changed, 104 bytes removed, diff: -104
/home/acme/bin/perf:
9 functions changed, 688 bytes removed, diff: -688
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20181102230624.20064-1-linux@rasmusvillemoes.dk
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-03 02:06:23 +03:00
	static const char help[] =
2017-11-15 00:04:47 +03:00
	"ENTER Toggle callchains (if present)\n"
	"n Toggle Node details info\n"
	"s Toggle full length of symbol and source line columns\n"
2016-08-17 16:54:58 +03:00
	"q Return back to cacheline list\n";
2018-07-24 09:20:08 +03:00
	if (!he)
		return 0;
2016-05-02 19:30:44 +03:00
2016-07-10 17:25:15 +03:00
	/* Display compact version first. */
	c2c.symbol_full = false;
2016-05-02 19:30:44 +03:00
	c2c_he = container_of(he, struct c2c_hist_entry, he);
	c2c_hists = c2c_he->hists;
	cl_browser = c2c_cacheline_browser__new(&c2c_hists->hists, he);
	if (cl_browser == NULL)
		return -1;
	browser = &cl_browser->hb;
	/* reset abort key so that it can get Ctrl-C as a key */
	SLang_reset_tty();
	SLang_init_tty(0, 0, 0);
	c2c_browser__update_nr_entries(browser);
	while (1) {
2019-12-12 21:31:40 +03:00
		key = hist_browser__run(browser, "? - help", true, 0);
2016-05-02 19:30:44 +03:00
		switch (key) {
2016-07-10 17:25:15 +03:00
		case 's':
			c2c.symbol_full = !c2c.symbol_full;
			break;
2016-07-10 17:30:27 +03:00
		case 'n':
			c2c.node_info = (c2c.node_info + 1) % 3;
			setup_nodes_header();
			break;
2016-05-02 19:30:44 +03:00
		case 'q':
			goto out;
2016-08-17 16:54:58 +03:00
		case '?':
			ui_browser__help_window(&browser->b, help);
			break;
2016-05-02 19:30:44 +03:00
		default:
			break;
		}
	}
out:
	free(cl_browser);
	return 0;
}
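The "Replace automatic const char[] variables by statics" commit explains why the help[] tables are declared static. An automatic array is typically rebuilt on the stack on each call, while a static one is a single object that the compiler can place in .rodata. One visible consequence is sketched below with a hypothetical `cacheline_help()` and placeholder text: a pointer to the static array stays valid after the function returns and is the same on every call.

```c
/*
 * Hypothetical helper with placeholder text. help[] has static storage
 * duration, so it is one object (typically placed in .rodata) rather than
 * a stack copy rebuilt on every call, and returning it is safe.
 */
static const char *cacheline_help(void)
{
	static const char help[] =
		"ENTER  Toggle callchains (if present)\n"
		"q      Return back to cacheline list\n";

	return help;
}
```

Dropping `static` here would make the returned pointer dangle, because an automatic array's lifetime ends when the function returns.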
2016-01-06 18:59:02 +03:00
static int perf_c2c_browser__title(struct hist_browser *browser,
				   char *bf, size_t size)
{
	scnprintf(bf, size,
2016-05-29 11:21:45 +03:00
		  "Shared Data Cache Line Table "
2022-08-11 09:24:48 +03:00
		  "(%lu entries, sorted on %s)",
2016-05-29 11:21:45 +03:00
		  browser->nr_non_filtered_entries,
2016-11-22 00:33:30 +03:00
		  display_str[c2c.display]);
2016-01-06 18:59:02 +03:00
	return 0;
}
static struct hist_browser *
perf_c2c_browser__new(struct hists *hists)
{
	struct hist_browser *browser = hist_browser__new(hists);
	if (browser) {
		browser->title = perf_c2c_browser__title;
		browser->c2c_filter = true;
	}
	return browser;
}
static int perf_c2c__hists_browse(struct hists *hists)
{
	struct hist_browser *browser;
	int key = -1;
2018-11-03 02:06:23 +03:00
	static const char help[] =
2016-08-17 16:54:58 +03:00
	"d Display cacheline details\n"
2017-11-15 00:04:47 +03:00
	"ENTER Toggle callchains (if present)\n"
2016-08-17 16:54:58 +03:00
	"q Quit\n";
2016-01-06 18:59:02 +03:00
	browser = perf_c2c_browser__new(hists);
	if (browser == NULL)
		return -1;
	/* reset abort key so that it can get Ctrl-C as a key */
	SLang_reset_tty();
	SLang_init_tty(0, 0, 0);
	c2c_browser__update_nr_entries(browser);
	while (1) {
2019-12-12 21:31:40 +03:00
		key = hist_browser__run(browser, "? - help", true, 0);
2016-01-06 18:59:02 +03:00
		switch (key) {
		case 'q':
			goto out;
2016-05-02 19:30:44 +03:00
		case 'd':
			perf_c2c__browse_cacheline(browser->he_selection);
			break;
2016-08-17 16:54:58 +03:00
		case '?':
			ui_browser__help_window(&browser->b, help);
			break;
2016-01-06 18:59:02 +03:00
		default:
			break;
		}
	}
out:
	hist_browser__delete(browser);
	return 0;
}
2016-08-27 12:40:23 +03:00
static void perf_c2c_display(struct perf_session *session)
2016-01-06 18:59:02 +03:00
{
2017-03-07 18:08:33 +03:00
	if (use_browser == 0)
2016-08-27 12:40:23 +03:00
		perf_c2c__hists_fprintf(stdout, session);
2016-01-06 18:59:02 +03:00
	else
		perf_c2c__hists_browse(&c2c.hists.hists);
}
#else
2016-08-27 12:40:23 +03:00
static void perf_c2c_display(struct perf_session *session)
2016-01-06 18:59:02 +03:00
{
	use_browser = 0;
2016-08-27 12:40:23 +03:00
	perf_c2c__hists_fprintf(stdout, session);
2016-01-06 18:59:02 +03:00
}
#endif /* HAVE_SLANG_SUPPORT */
2018-03-09 13:14:41 +03:00
static char *fill_line(const char *orig, int len)
2016-01-06 18:59:02 +03:00
{
2018-03-09 13:14:41 +03:00
	int i, j, olen = strlen(orig);
	char *buf;
	buf = zalloc(len + 1);
	if (!buf)
		return NULL;
	j = len / 2 - olen / 2;
	for (i = 0; i < j - 1; i++)
		buf[i] = '-';
	buf[i++] = ' ';
	strcpy(buf + i, orig);
	i += olen;
	buf[i++] = ' ';
	for (; i < len; i++)
		buf[i] = '-';
	return buf;
}
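fill_line() centers `orig`, padded by one space on each side, in a run of '-' of width `len`. It relies on the caller passing `len >= strlen(orig) + 2`; with a smaller width it either overwrites the terminating NUL or writes past the `len + 1` bytes it allocates. A hypothetical bounds-checked variant, `fill_line_checked()`, returns NULL in that case and otherwise follows the same centering arithmetic:

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical bounds-checked variant of fill_line(): centers " orig " in a
 * run of '-' of total width len. Returns NULL if the label plus its two
 * padding spaces does not fit, or on allocation failure.
 */
static char *fill_line_checked(const char *orig, int len)
{
	int olen = (int)strlen(orig);
	int i = 0, j;
	char *buf;

	if (len < olen + 2)
		return NULL;
	buf = calloc((size_t)len + 1, 1);	/* zeroed, so buf[len] is the NUL */
	if (!buf)
		return NULL;

	j = len / 2 - olen / 2;
	for (; i < j - 1; i++)
		buf[i] = '-';
	buf[i++] = ' ';
	memcpy(buf + i, orig, (size_t)olen);
	i += olen;
	buf[i++] = ' ';
	for (; i < len; i++)
		buf[i] = '-';
	return buf;
}
```

For example, `fill_line_checked("CL", 10)` yields `"--- CL ---"`, and the caller frees the result as with fill_line().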
static int ui_quirks(void)
{
	const char *nodestr = "Data address";
	char *buf;
2016-01-06 18:59:02 +03:00
	if (!c2c.use_stdio) {
		dim_offset.width = 5;
		dim_offset.header = header_offset_tui;
2023-02-14 10:58:23 +03:00
		nodestr = chk_double_cl ? "Double-CL" : "CL";
2016-01-06 18:59:02 +03:00
	}
2016-05-29 11:21:45 +03:00
2022-08-11 09:24:46 +03:00
	dim_percent_costly_snoop.header = percent_costly_snoop_header[c2c.display];
2018-03-09 13:14:41 +03:00
/* Fix the zero line for dcacheline column. */
2023-02-14 10:58:23 +03:00
buf = fill_line(chk_double_cl ? "Double-Cacheline" : "Cacheline",
dim_dcacheline.width +
dim_dcacheline_node.width +
dim_dcacheline_count.width + 4);
2018-03-09 13:14:41 +03:00
if (!buf)
return -ENOMEM;
dim_dcacheline.header.line[0].text = buf;
/* Fix the zero line for offset column. */
buf = fill_line(nodestr, dim_offset.width +
2018-03-09 13:14:42 +03:00
dim_offset_node.width +
dim_dcacheline_count.width + 4);
2018-03-09 13:14:41 +03:00
if (!buf)
return -ENOMEM;
dim_offset.header.line[0].text = buf;
return 0;
2016-01-06 18:59:02 +03:00
}
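The header code above calls fill_line() to get a string of a given width that carries a title such as "Cacheline" or "Double-Cacheline". The body of fill_line() is not shown here. As a stand-in, the following hypothetical `make_banner()` centers the title between runs of '-'. Like the code above, it treats a NULL return as an allocation failure:

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical header "fill line": returns a malloc'd string of
 * exactly width characters with " text " centered and the remainder
 * padded with '-'. If width is too small, the result is just " text "
 * and is longer than width. Returns NULL if allocation fails.
 */
static char *make_banner(const char *text, size_t width)
{
	size_t tlen = strlen(text) + 2;  /* text plus one space on each side */
	size_t total = tlen > width ? tlen : width;
	size_t left = (total - tlen) / 2;
	char *buf = malloc(total + 1);

	if (!buf)
		return NULL;

	memset(buf, '-', total);
	buf[left] = ' ';
	memcpy(buf + left + 1, text, tlen - 2);
	buf[left + tlen - 1] = ' ';
	buf[total] = '\0';
	return buf;
}
```

When the padding is odd, the extra '-' goes on the right. The caller owns the returned buffer, just as the header code keeps `buf` in `header.line[0].text`.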
perf c2c report: Allow to report callchains
Add the --call-graph option to properly set up the callchain code, and add
default settings to display callchains whenever they are stored in the
perf.data file.
Committer Notes:
Testing it:
[root@jouet ~]# perf c2c record -a -g sleep 5
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 5.331 MB perf.data (4263 samples) ]
[root@jouet ~]# perf evlist -v
cpu/mem-loads,ldlat=30/P: type: 4, size: 112, config: 0x1cd, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|CALLCHAIN|ID|CPU|PERIOD|DATA_SRC|WEIGHT, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, mmap_data: 1, sample_id_all: 1, mmap2: 1, comm_exec: 1, { bp_addr, config1 }: 0x1f
cpu/mem-stores/P: type: 4, size: 112, config: 0x82d0, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|CALLCHAIN|ID|CPU|PERIOD|DATA_SRC|WEIGHT, read_format: ID, disabled: 1, inherit: 1, freq: 1, precise_ip: 3, sample_id_all: 1
[root@jouet ~]# perf c2c report --stats
=================================================
Trace Event Information
=================================================
Total records : 4263
Locked Load/Store Operations : 220
Load Operations : 2130
Loads - uncacheable : 1
Loads - IO : 7
Loads - Miss : 86
Loads - no mapping : 5
Load Fill Buffer Hit : 609
Load L1D hit : 612
Load L2D hit : 27
Load LLC hit : 607
Load Local HITM : 15
Load Remote HITM : 0
Load Remote HIT : 0
Load Local DRAM : 176
Load Remote DRAM : 0
Load MESI State Exclusive : 176
Load MESI State Shared : 0
Load LLC Misses : 176
LLC Misses to Local DRAM : 100.0%
LLC Misses to Remote DRAM : 0.0%
LLC Misses to Remote cache (HIT) : 0.0%
LLC Misses to Remote cache (HITM) : 0.0%
Store Operations : 2133
Store - uncacheable : 0
Store - no mapping : 1
Store L1D Hit : 1967
Store L1D Miss : 165
No Page Map Rejects : 145
Unable to parse data source : 0
=================================================
Global Shared Cache Line Event Information
=================================================
Total Shared Cache Lines : 15
Load HITs on shared lines : 26
Fill Buffer Hits on shared lines : 7
L1D hits on shared lines : 3
L2D hits on shared lines : 0
LLC hits on shared lines : 16
Locked Access on shared lines : 2
Store HITs on shared lines : 8
Store L1D hits on shared lines : 7
Total Merged records : 23
=================================================
c2c details
=================================================
Events : cpu/mem-loads,ldlat=30/P
: cpu/mem-stores/P
[root@jouet ~]#
[root@jouet ~]# perf c2c report
Shared Data Cache Line Table (2378 entries)
Total --- LLC Load Hitm -- -- Store Reference - - Load Dram - LLC Total - Core Load Hit -
Cacheline records %hitm Total Lcl Rmt Total L1Hit L1Miss Lcl Rmt Ld Miss Loads FB L1 L2
- 0xffff880024380c00 10 0.00% 0 0 0 6 6 0 0 0 0 4 1 3 0
- 0.13% _raw_spin_lock_irqsave
- 0.07% ep_poll
sys_epoll_wait
do_syscall_64
return_from_SYSCALL_64
+ 0x103573
- 0.05% ep_poll_callback
__wake_up_common
- __wake_up_sync_key
- 0.02% pipe_read
__vfs_read
vfs_read
sys_read
do_syscall_64
return_from_SYSCALL_64
0xfdad
+ 0.02% sock_def_readable
+ 0.02% ep_scan_ready_list.constprop.12
+ 0.00% mutex_lock
+ 0.00% __wake_up_common
+ 0xffff880024380c40 1 0.00% 0 0 0 1 1 0 0 0 0 0 0 0 0
+ 0xffff880024380c80 1 0.00% 0 0 0 0 0 0 0 0 0 1 0 0 0
- 0xffff8800243e9f00 1 0.00% 0 0 0 1 1 0 0 0 0 0 0 0 0
enqueue_entity
enqueue_task_fair
activate_task
ttwu_do_activate
try_to_wake_up
wake_up_process
hrtimer_wakeup
__hrtimer_run_queues
hrtimer_interrupt
local_apic_timer_interrupt
smp_apic_timer_interrupt
apic_timer_interrupt
cpuidle_enter
call_cpuidle
help
-------------
And when pressing 'd' to see the cacheline details:
Cacheline 0xffff880024380c00
----- HITM ----- -- Store Refs -- --------- cycles ----- cpu
Rmt Lcl L1 Hit L1 Miss Off Pid Tid rmt hitm lcl hitm load cnt Symbol
- 0.00% 0.00% 100.00% 0.00% 0x0 1473 1474:Chrome_ChildIOT 0 0 41 2 [k] _raw_spin_lock_irqsave [kernel]
- _raw_spin_lock_irqsave
- 51.52% ep_poll
sys_epoll_wait
do_syscall_64
return_from_SYSCALL_64
- 0x103573
47.19% 0
4.33% 0xc30bd
- 35.93% ep_poll_callback
__wake_up_common
- __wake_up_sync_key
- 18.20% pipe_read
__vfs_read
vfs_read
sys_read
do_syscall_64
return_from_SYSCALL_64
0xfdad
- 17.73% sock_def_readable
unix_stream_sendmsg
sock_sendmsg
___sys_sendmsg
__sys_sendmsg
sys_sendmsg
do_syscall_64
return_from_SYSCALL_64
__GI___libc_sendmsg
0x12c036af1fc0
0x16a4050
0x894928ec83485354
+ 12.45% ep_scan_ready_list.constprop.12
+ 0.00% 0.00% 0.00% 0.00% 0x8 1473 1474:Chrome_ChildIOT 0 0 102 1 [k] mutex_lock [kernel]
+ 0.00% 0.00% 0.00% 0.00% 0x38 1473 1473:chrome 0 0 88 1 [k] __wake_up_common [kernel]
help
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-inykbom2f19difvsu1e18avr@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
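This commit displays callchains whenever they are stored in perf.data. The sketch below shows how a reader could pick an unwinding mode from the recorded sample_type bits. The first two tests follow the order of setup_callchain() further down: user registers plus a user stack copy select DWARF, and otherwise a branch stack is checked. The LBR and frame-pointer outcomes are assumptions, because those branches are not shown here. `pick_mode()`, `enum cg_mode` and the `SAMPLE_*` macros are hypothetical, and the bit values are assumed to mirror `PERF_SAMPLE_*` in <linux/perf_event.h>:

```c
#include <stdint.h>

/* Assumed to mirror the PERF_SAMPLE_* bits in <linux/perf_event.h>. */
#define SAMPLE_CALLCHAIN    (1ULL << 5)
#define SAMPLE_BRANCH_STACK (1ULL << 11)
#define SAMPLE_REGS_USER    (1ULL << 12)
#define SAMPLE_STACK_USER   (1ULL << 13)

/* Hypothetical stand-in for perf's enum perf_call_graph_mode. */
enum cg_mode { CG_NONE, CG_FP, CG_DWARF, CG_LBR };

static enum cg_mode pick_mode(uint64_t sample_type)
{
	/* DWARF unwinding needs both user registers and a user stack copy. */
	if ((sample_type & SAMPLE_REGS_USER) &&
	    (sample_type & SAMPLE_STACK_USER))
		return CG_DWARF;
	/* Assumed: a recorded branch stack means LBR-based callchains. */
	if (sample_type & SAMPLE_BRANCH_STACK)
		return CG_LBR;
	/* Assumed: a plain kernel-provided callchain means frame pointers. */
	if (sample_type & SAMPLE_CALLCHAIN)
		return CG_FP;
	return CG_NONE;
}
```

The `perf evlist -v` output in the commit message shows `sample_type: IP|TID|TIME|ADDR|CALLCHAIN|...` with neither REGS_USER nor STACK_USER, which this sketch would map to `CG_FP`.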
2016-05-11 19:23:48 +03:00
#define CALLCHAIN_DEFAULT_OPT "graph,0.5,caller,function,percent"
const char callchain_help[] = "Display call graph (stack chain/backtrace):\n\n"
CALLCHAIN_REPORT_HELP
"\n\t\t\t\tDefault: " CALLCHAIN_DEFAULT_OPT;
static int
parse_callchain_opt(const struct option *opt, const char *arg, int unset)
{
struct callchain_param *callchain = opt->value;
callchain->enabled = !unset;
/*
* --no-call-graph
*/
if (unset) {
symbol_conf.use_callchain = false;
callchain->mode = CHAIN_NONE;
return 0;
}
return parse_callchain_report_opt(arg);
}
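CALLCHAIN_DEFAULT_OPT packs several settings into one comma-separated string, "graph,0.5,caller,function,percent". The real parsing is done by parse_callchain_report_opt(), whose body is not shown here. The hypothetical splitter below only illustrates how such a string breaks down into fields. The field names (print type, threshold, order, sort key, value type) are assumptions based on the perf report documentation:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed form of a "graph,0.5,caller,function,percent" string. */
struct cg_opts {
	char print_type[16];
	double threshold;
	char order[16];
	char key[16];
	char value[16];
};

/* Copy at most dst_size - 1 bytes of src into dst and NUL-terminate. */
static void copy_field(char *dst, size_t dst_size, const char *src)
{
	strncpy(dst, src, dst_size - 1);
	dst[dst_size - 1] = '\0';
}

/*
 * Split exactly five comma-separated fields positionally.
 * Returns 0 on success, -1 on a missing or extra field or a
 * non-numeric threshold. Unlike perf's parser, optional fields and
 * arbitrary ordering are not supported.
 */
static int parse_cg_opts(const char *arg, struct cg_opts *out)
{
	char buf[128];
	char *fields[5];
	char *tok, *end;
	int n = 0;

	if (strlen(arg) >= sizeof(buf))
		return -1;
	strcpy(buf, arg);

	for (tok = strtok(buf, ","); tok && n < 5; tok = strtok(NULL, ","))
		fields[n++] = tok;
	if (n != 5 || tok != NULL)
		return -1;

	out->threshold = strtod(fields[1], &end);
	if (end == fields[1] || *end != '\0')
		return -1;

	copy_field(out->print_type, sizeof(out->print_type), fields[0]);
	copy_field(out->order, sizeof(out->order), fields[2]);
	copy_field(out->key, sizeof(out->key), fields[3]);
	copy_field(out->value, sizeof(out->value), fields[4]);
	return 0;
}
```

The parser uses standard C `strtok()`, which keeps hidden state and is therefore not reentrant. That is acceptable for one-shot option parsing.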
2019-07-21 14:23:52 +03:00
static int setup_callchain(struct evlist *evlist)
2016-05-11 19:23:48 +03:00
{
2020-06-17 15:24:21 +03:00
u64 sample_type = evlist__combined_sample_type(evlist);
2016-05-11 19:23:48 +03:00
enum perf_call_graph_mode mode = CALLCHAIN_NONE;
if ((sample_type & PERF_SAMPLE_REGS_USER) &&
perf unwind: Do not look just at the global callchain_param.record_mode
When setting up DWARF callchains on specific events, without using
'record' or 'trace' --call-graph, but instead doing it like:
perf trace -e cycles/call-graph=dwarf/
The unwind__prepare_access() call in thread__insert_map(), made when we
process PERF_RECORD_MMAP(2) metadata events, was not being performed,
precluding us from using per-event DWARF callchains; they were handled only
when we asked for all events to be DWARF, using "--call-graph dwarf".
We do it in the PERF_RECORD_MMAP because we have to look at one of the
executable maps to figure out the executable type (64-bit, 32-bit) of
the DSO laid out in that mmap. We also need to look at the architecture
where the perf.data file was recorded.
All this probably should be deferred to when we process a sample for
some thread that has callchains, so that we do this processing only for
the threads with samples, not for all of them.
For now, fix using DWARF on specific events.
Before:
# perf trace --no-syscalls -e probe_libc:inet_pton/call-graph=dwarf/ ping -6 -c 1 ::1
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.048 ms
--- ::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.048/0.048/0.048/0.000 ms
0.000 probe_libc:inet_pton:(7fe9597bb350))
Problem processing probe_libc:inet_pton callchain, skipping...
#
After:
# perf trace --no-syscalls -e probe_libc:inet_pton/call-graph=dwarf/ ping -6 -c 1 ::1
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.060 ms
--- ::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.060/0.060/0.060/0.000 ms
0.000 probe_libc:inet_pton:(7fd4aa930350))
__inet_pton (inlined)
gaih_inet.constprop.7 (/usr/lib64/libc-2.26.so)
__GI_getaddrinfo (inlined)
[0xffffaa804e51af3f] (/usr/bin/ping)
__libc_start_main (/usr/lib64/libc-2.26.so)
[0xffffaa804e51b379] (/usr/bin/ping)
#
# perf trace --call-graph=dwarf --no-syscalls -e probe_libc:inet_pton/call-graph=dwarf/ ping -6 -c 1 ::1
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.057 ms
--- ::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.057/0.057/0.057/0.000 ms
0.000 probe_libc:inet_pton:(7f9363b9e350))
__inet_pton (inlined)
gaih_inet.constprop.7 (/usr/lib64/libc-2.26.so)
__GI_getaddrinfo (inlined)
[0xffffa9e8a14e0f3f] (/usr/bin/ping)
__libc_start_main (/usr/lib64/libc-2.26.so)
[0xffffa9e8a14e1379] (/usr/bin/ping)
#
# perf trace --call-graph=fp --no-syscalls -e probe_libc:inet_pton/call-graph=dwarf/ ping -6 -c 1 ::1
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.077 ms
--- ::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.077/0.077/0.077/0.000 ms
0.000 probe_libc:inet_pton:(7f4947e1c350))
__inet_pton (inlined)
gaih_inet.constprop.7 (/usr/lib64/libc-2.26.so)
__GI_getaddrinfo (inlined)
[0xffffaa716d88ef3f] (/usr/bin/ping)
__libc_start_main (/usr/lib64/libc-2.26.so)
[0xffffaa716d88f379] (/usr/bin/ping)
#
# perf trace --no-syscalls -e probe_libc:inet_pton/call-graph=fp/ ping -6 -c 1 ::1
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.078 ms
--- ::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.078/0.078/0.078/0.000 ms
0.000 probe_libc:inet_pton:(7fa157696350))
__GI___inet_pton (/usr/lib64/libc-2.26.so)
getaddrinfo (/usr/lib64/libc-2.26.so)
[0xffffa9ba39c74f40] (/usr/bin/ping)
#
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Hendrick Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/r/20180116182650.GE16107@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
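The message above says the unwinder must determine the executable type (64-bit or 32-bit) of the DSO behind an mmap. perf has its own helpers for this, and they are not shown here. The sketch below illustrates only the underlying check. The hypothetical `elf_class_of()` reads the ELF identification bytes: the magic "\x7fELF", then the EI_CLASS byte at offset 4, which per the ELF specification is 1 for 32-bit and 2 for 64-bit objects:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result type; values follow the ELF EI_CLASS byte. */
enum elf_kind { ELF_KIND_UNKNOWN = 0, ELF_KIND_32 = 1, ELF_KIND_64 = 2 };

/*
 * Classify an ELF image from its first bytes: check the 4-byte magic
 * 0x7f 'E' 'L' 'F', then the EI_CLASS byte at offset 4.
 */
static enum elf_kind elf_class_of(const unsigned char *ident, size_t len)
{
	static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };

	if (len < 5 || memcmp(ident, magic, sizeof(magic)) != 0)
		return ELF_KIND_UNKNOWN;

	switch (ident[4]) {
	case 1:  /* ELFCLASS32 */
		return ELF_KIND_32;
	case 2:  /* ELFCLASS64 */
		return ELF_KIND_64;
	default:
		return ELF_KIND_UNKNOWN;
	}
}
```

Reading those bytes from the mapped file is left out. The sketch takes them as an in-memory buffer.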
2018-01-15 22:48:46 +03:00
(sample_type & PERF_SAMPLE_STACK_USER)) {
2016-05-11 19:23:48 +03:00
mode = CALLCHAIN_DWARF;
2018-01-15 22:48:46 +03:00
		dwarf_callchain_users = true;
	} else if (sample_type & PERF_SAMPLE_BRANCH_STACK)
perf c2c report: Allow to report callchains
Add a --call-graph option to properly set up the callchain code, with
default settings that display callchains whenever they are stored in
the perf.data file.
Committer Notes:
Testing it:
[root@jouet ~]# perf c2c record -a -g sleep 5
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 5.331 MB perf.data (4263 samples) ]
[root@jouet ~]# perf evlist -v
cpu/mem-loads,ldlat=30/P: type: 4, size: 112, config: 0x1cd, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|CALLCHAIN|ID|CPU|PERIOD|DATA_SRC|WEIGHT, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, mmap_data: 1, sample_id_all: 1, mmap2: 1, comm_exec: 1, { bp_addr, config1 }: 0x1f
cpu/mem-stores/P: type: 4, size: 112, config: 0x82d0, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|CALLCHAIN|ID|CPU|PERIOD|DATA_SRC|WEIGHT, read_format: ID, disabled: 1, inherit: 1, freq: 1, precise_ip: 3, sample_id_all: 1
[root@jouet ~]# perf c2c report --stats
=================================================
Trace Event Information
=================================================
Total records : 4263
Locked Load/Store Operations : 220
Load Operations : 2130
Loads - uncacheable : 1
Loads - IO : 7
Loads - Miss : 86
Loads - no mapping : 5
Load Fill Buffer Hit : 609
Load L1D hit : 612
Load L2D hit : 27
Load LLC hit : 607
Load Local HITM : 15
Load Remote HITM : 0
Load Remote HIT : 0
Load Local DRAM : 176
Load Remote DRAM : 0
Load MESI State Exclusive : 176
Load MESI State Shared : 0
Load LLC Misses : 176
LLC Misses to Local DRAM : 100.0%
LLC Misses to Remote DRAM : 0.0%
LLC Misses to Remote cache (HIT) : 0.0%
LLC Misses to Remote cache (HITM) : 0.0%
Store Operations : 2133
Store - uncacheable : 0
Store - no mapping : 1
Store L1D Hit : 1967
Store L1D Miss : 165
No Page Map Rejects : 145
Unable to parse data source : 0
=================================================
Global Shared Cache Line Event Information
=================================================
Total Shared Cache Lines : 15
Load HITs on shared lines : 26
Fill Buffer Hits on shared lines : 7
L1D hits on shared lines : 3
L2D hits on shared lines : 0
LLC hits on shared lines : 16
Locked Access on shared lines : 2
Store HITs on shared lines : 8
Store L1D hits on shared lines : 7
Total Merged records : 23
=================================================
c2c details
=================================================
Events : cpu/mem-loads,ldlat=30/P
: cpu/mem-stores/P
[root@jouet ~]#
[root@jouet ~]# perf c2c report
Shared Data Cache Line Table (2378 entries)
Total --- LLC Load Hitm -- -- Store Reference - - Load Dram - LLC Total - Core Load Hit -
Cacheline records %hitm Total Lcl Rmt Total L1Hit L1Miss Lcl Rmt Ld Miss Loads FB L1 L2
- 0xffff880024380c00 10 0.00% 0 0 0 6 6 0 0 0 0 4 1 3 0
- 0.13% _raw_spin_lock_irqsave
- 0.07% ep_poll
sys_epoll_wait
do_syscall_64
return_from_SYSCALL_64
+ 0x103573
- 0.05% ep_poll_callback
__wake_up_common
- __wake_up_sync_key
- 0.02% pipe_read
__vfs_read
vfs_read
sys_read
do_syscall_64
return_from_SYSCALL_64
0xfdad
+ 0.02% sock_def_readable
+ 0.02% ep_scan_ready_list.constprop.12
+ 0.00% mutex_lock
+ 0.00% __wake_up_common
+ 0xffff880024380c40 1 0.00% 0 0 0 1 1 0 0 0 0 0 0 0 0
+ 0xffff880024380c80 1 0.00% 0 0 0 0 0 0 0 0 0 1 0 0 0
- 0xffff8800243e9f00 1 0.00% 0 0 0 1 1 0 0 0 0 0 0 0 0
enqueue_entity
enqueue_task_fair
activate_task
ttwu_do_activate
try_to_wake_up
wake_up_process
hrtimer_wakeup
__hrtimer_run_queues
hrtimer_interrupt
local_apic_timer_interrupt
smp_apic_timer_interrupt
apic_timer_interrupt
cpuidle_enter
call_cpuidle
-------------
And when pressing 'd' to see the cacheline details:
Cacheline 0xffff880024380c00
----- HITM ----- -- Store Refs -- --------- cycles ----- cpu
Rmt Lcl L1 Hit L1 Miss Off Pid Tid rmt hitm lcl hitm load cnt Symbol
- 0.00% 0.00% 100.00% 0.00% 0x0 1473 1474:Chrome_ChildIOT 0 0 41 2 [k] _raw_spin_lock_irqsave [kernel]
- _raw_spin_lock_irqsave
- 51.52% ep_poll
sys_epoll_wait
do_syscall_64
return_from_SYSCALL_64
- 0x103573
47.19% 0
4.33% 0xc30bd
- 35.93% ep_poll_callback
__wake_up_common
- __wake_up_sync_key
- 18.20% pipe_read
__vfs_read
vfs_read
sys_read
do_syscall_64
return_from_SYSCALL_64
0xfdad
- 17.73% sock_def_readable
unix_stream_sendmsg
sock_sendmsg
___sys_sendmsg
__sys_sendmsg
sys_sendmsg
do_syscall_64
return_from_SYSCALL_64
__GI___libc_sendmsg
0x12c036af1fc0
0x16a4050
0x894928ec83485354
+ 12.45% ep_scan_ready_list.constprop.12
+ 0.00% 0.00% 0.00% 0.00% 0x8 1473 1474:Chrome_ChildIOT 0 0 102 1 [k] mutex_lock [kernel]
+ 0.00% 0.00% 0.00% 0.00% 0x38 1473 1473:chrome 0 0 88 1 [k] __wake_up_common [kernel]
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-inykbom2f19difvsu1e18avr@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-11 19:23:48 +03:00
		mode = CALLCHAIN_LBR;
	else if (sample_type & PERF_SAMPLE_CALLCHAIN)
		mode = CALLCHAIN_FP;
	if (!callchain_param.enabled &&
	    callchain_param.mode != CHAIN_NONE &&
	    mode != CALLCHAIN_NONE) {
		symbol_conf.use_callchain = true;
		if (callchain_register_param(&callchain_param) < 0) {
			ui__error("Can't register callchain params.\n");
			return -EINVAL;
		}
	}
2020-03-19 23:25:16 +03:00
	if (c2c.stitch_lbr && (mode != CALLCHAIN_LBR)) {
		ui__warning("Can't find LBR callchain. Switch off --stitch-lbr.\n"
			    "Please apply --call-graph lbr when recording.\n");
		c2c.stitch_lbr = false;
	}
2016-05-11 19:23:48 +03:00
	callchain_param.record_mode = mode;
	callchain_param.min_percent = 0;
	return 0;
}
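setup_callchain() picks the callchain record mode from the sample_type bits stored in perf.data. It prefers DWARF, then LBR, then frame pointers, and it turns --stitch-lbr off unless the mode is LBR. The sketch below restates that priority in isolation. It assumes, as in upstream perf, that DWARF is chosen when both the user registers and the user stack were sampled; that condition is not visible in this excerpt. The SK_* constants and sk_* functions are hypothetical stand-ins, and the constants mirror the PERF_SAMPLE_* bit positions in linux/perf_event.h.

```c
#include <stdint.h>

/* Hypothetical stand-ins mirroring PERF_SAMPLE_* bits in <linux/perf_event.h>. */
#define SK_SAMPLE_CALLCHAIN	(1ULL << 5)
#define SK_SAMPLE_BRANCH_STACK	(1ULL << 11)
#define SK_SAMPLE_REGS_USER	(1ULL << 12)
#define SK_SAMPLE_STACK_USER	(1ULL << 13)

/* Hypothetical mirror of the callchain record modes. */
enum sk_callchain_mode {
	SK_CALLCHAIN_NONE,
	SK_CALLCHAIN_FP,
	SK_CALLCHAIN_DWARF,
	SK_CALLCHAIN_LBR,
};

/* Pick the richest callchain source the recorded sample_type allows:
 * DWARF needs both user regs and user stack, then LBR, then FP. */
static enum sk_callchain_mode sk_pick_mode(uint64_t sample_type)
{
	if ((sample_type & SK_SAMPLE_REGS_USER) &&
	    (sample_type & SK_SAMPLE_STACK_USER))
		return SK_CALLCHAIN_DWARF;
	if (sample_type & SK_SAMPLE_BRANCH_STACK)
		return SK_CALLCHAIN_LBR;
	if (sample_type & SK_SAMPLE_CALLCHAIN)
		return SK_CALLCHAIN_FP;
	return SK_CALLCHAIN_NONE;
}

/* Mirrors the --stitch-lbr check: stitching only applies to LBR callchains. */
static int sk_stitch_lbr_allowed(enum sk_callchain_mode mode)
{
	return mode == SK_CALLCHAIN_LBR;
}
```

Take the 'perf c2c record -a -g' session in the committer notes above. Its sample_type contains CALLCHAIN but none of REGS_USER, STACK_USER or BRANCH_STACK, so this logic would select frame-pointer mode.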
2016-05-29 11:21:45 +03:00
static int setup_display(const char *str)
{
2022-08-11 09:24:50 +03:00
	const char *display = str;
2016-05-29 11:21:45 +03:00
2016-11-22 00:33:30 +03:00
	if (!strcmp(display, "tot"))
2022-08-11 09:24:45 +03:00
		c2c.display = DISPLAY_TOT_HITM;
2016-11-22 00:33:30 +03:00
	else if (!strcmp(display, "rmt"))
2022-08-11 09:24:45 +03:00
		c2c.display = DISPLAY_RMT_HITM;
2016-05-29 11:21:45 +03:00
	else if (!strcmp(display, "lcl"))
2022-08-11 09:24:45 +03:00
		c2c.display = DISPLAY_LCL_HITM;
2022-08-11 09:24:49 +03:00
	else if (!strcmp(display, "peer"))
		c2c.display = DISPLAY_SNP_PEER;
2016-05-29 11:21:45 +03:00
	else {
		pr_err("failed: unknown display type: %s\n", str);
		return -1;
	}
	return 0;
}
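setup_display() maps the --display argument onto a c2c.display value with a chain of strcmp() calls. The sketch below is an equivalent table-driven version. The enum and function names are hypothetical mirrors of the DISPLAY_* values. Unlike the original, it returns the mode through a pointer instead of writing a global.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical mirror of the c2c display modes. */
enum sk_display {
	SK_DISPLAY_TOT_HITM,
	SK_DISPLAY_RMT_HITM,
	SK_DISPLAY_LCL_HITM,
	SK_DISPLAY_SNP_PEER,
};

/* One row per accepted --display argument. */
static const struct {
	const char *name;
	enum sk_display display;
} sk_display_table[] = {
	{ "tot",  SK_DISPLAY_TOT_HITM },
	{ "rmt",  SK_DISPLAY_RMT_HITM },
	{ "lcl",  SK_DISPLAY_LCL_HITM },
	{ "peer", SK_DISPLAY_SNP_PEER },
};

/* Returns 0 and stores the mode in *out on success, -1 for an unknown name. */
static int sk_setup_display(const char *str, enum sk_display *out)
{
	size_t i;

	for (i = 0; i < sizeof(sk_display_table) / sizeof(sk_display_table[0]); i++) {
		if (!strcmp(str, sk_display_table[i].name)) {
			*out = sk_display_table[i].display;
			return 0;
		}
	}
	fprintf(stderr, "failed: unknown display type: %s\n", str);
	return -1;
}
```

With a table, a new mode such as 'peer' needs only one new row. The lookup loop and the error path stay unchanged.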
2016-05-24 15:14:38 +03:00
#define for_each_token(__tok, __buf, __sep, __tmp)		\
	for (__tok = strtok_r(__buf, __sep, &__tmp); __tok;	\
	     __tok = strtok_r(NULL, __sep, &__tmp))
2016-10-11 14:39:47 +03:00
static int build_cl_output(char *cl_sort, bool no_source)
2016-05-24 15:14:38 +03:00
{
	char *tok, *tmp, *buf = strdup(cl_sort);
	bool add_pid = false;
	bool add_tid = false;
	bool add_iaddr = false;
	bool add_sym = false;
	bool add_dso = false;
	bool add_src = false;
2019-10-15 05:54:14 +03:00
	int ret = 0;
2016-05-24 15:14:38 +03:00
	if (!buf)
		return -ENOMEM;
	for_each_token(tok, buf, ",", tmp) {
		if (!strcmp(tok, "tid")) {
			add_tid = true;
		} else if (!strcmp(tok, "pid")) {
			add_pid = true;
		} else if (!strcmp(tok, "iaddr")) {
			add_iaddr = true;
			add_sym = true;
			add_dso = true;
2016-10-11 14:39:47 +03:00
			add_src = no_source ? false : true;
2016-05-24 15:14:38 +03:00
		} else if (!strcmp(tok, "dso")) {
			add_dso = true;
		} else if (strcmp(tok, "offset")) {
			pr_err("unrecognized sort token: %s\n", tok);
2019-10-15 05:54:14 +03:00
			ret = -EINVAL;
			goto err;
2016-05-24 15:14:38 +03:00
		}
	}
	if (asprintf(&c2c.cl_output,
2022-08-11 09:24:49 +03:00
		"%s%s%s%s%s%s%s%s%s%s%s%s",
2016-07-06 16:40:09 +03:00
		c2c.use_stdio ? "cl_num_empty," : "",
2022-08-11 09:24:49 +03:00
		c2c.display == DISPLAY_SNP_PEER ? "percent_rmt_peer,"
						  "percent_lcl_peer," :
						  "percent_rmt_hitm,"
						  "percent_lcl_hitm,",
2016-05-24 15:14:38 +03:00
		"percent_stores_l1hit,"
		"percent_stores_l1miss,"
2022-05-18 08:57:20 +03:00
		"percent_stores_na,"
2018-03-09 13:14:42 +03:00
		"offset,offset_node,dcacheline_count,",
2016-05-24 15:14:38 +03:00
		add_pid ? "pid," : "",
		add_tid ? "tid," : "",
		add_iaddr ? "iaddr," : "",
2022-08-11 09:24:49 +03:00
		c2c.display == DISPLAY_SNP_PEER ? "mean_rmt_peer,"
						  "mean_lcl_peer," :
						  "mean_rmt,"
						  "mean_lcl,",
2016-05-24 15:14:38 +03:00
		"mean_load,"
2017-01-20 12:20:31 +03:00
		"tot_recs,"
2016-05-24 15:14:38 +03:00
		"cpucnt,",
		add_sym ? "symbol," : "",
		add_dso ? "dso," : "",
		add_src ? "cl_srcline," : "",
2019-10-15 05:54:14 +03:00
		"node") < 0) {
		ret = -ENOMEM;
		goto err;
	}
2016-05-24 15:14:38 +03:00
	c2c.show_src = add_src;
2019-10-15 05:54:14 +03:00
err:
2016-05-24 15:14:38 +03:00
	free(buf);
2019-10-15 05:54:14 +03:00
	return ret;
2016-05-24 15:14:38 +03:00
}
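build_cl_output() splits the comma-separated sort string with the for_each_token() macro, a thin wrapper around POSIX strtok_r(). It turns each token into a flag and rejects anything other than the known keys and "offset". The simplified sketch below copies the macro so that it compiles on its own. struct sk_coalesce and sk_parse_coalesce() are hypothetical. Unlike the real function, the sketch does not derive the symbol, dso and source columns from "iaddr".

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Same shape as the macro above: iterate over __sep-separated tokens. */
#define for_each_token(__tok, __buf, __sep, __tmp)		\
	for (__tok = strtok_r(__buf, __sep, &__tmp); __tok;	\
	     __tok = strtok_r(NULL, __sep, &__tmp))

/* Hypothetical set of coalesce flags. */
struct sk_coalesce {
	bool pid, tid, iaddr, dso;
};

/* Parse a "pid,tid,iaddr,dso"-style list; returns -1 on an unknown token. */
static int sk_parse_coalesce(const char *spec, struct sk_coalesce *out)
{
	char *tok, *tmp, *buf = strdup(spec);	/* strtok_r() writes into its input */
	int ret = 0;

	if (!buf)
		return -1;
	memset(out, 0, sizeof(*out));
	for_each_token(tok, buf, ",", tmp) {
		if (!strcmp(tok, "pid")) {
			out->pid = true;
		} else if (!strcmp(tok, "tid")) {
			out->tid = true;
		} else if (!strcmp(tok, "iaddr")) {
			out->iaddr = true;
		} else if (!strcmp(tok, "dso")) {
			out->dso = true;
		} else if (strcmp(tok, "offset")) {
			ret = -1;	/* neither a known key nor "offset" */
			break;
		}
	}
	free(buf);
	return ret;
}
```

strtok_r() writes NUL bytes into the buffer it scans, so the sketch copies its input first, as build_cl_output() does with strdup().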
2016-10-11 14:39:47 +03:00
static int setup_coalesce(const char *coalesce, bool no_source)
2016-05-24 15:14:38 +03:00
{
	const char *c = coalesce ?: coalesce_default;
2022-08-11 09:24:49 +03:00
	const char *sort_str = NULL;
2016-05-24 15:14:38 +03:00
	if (asprintf(&c2c.cl_sort, "offset,%s", c) < 0)
		return -ENOMEM;
2016-10-11 14:39:47 +03:00
	if (build_cl_output(c2c.cl_sort, no_source))
2016-05-24 15:14:38 +03:00
		return -1;
2022-08-11 09:24:49 +03:00
	if (c2c.display == DISPLAY_TOT_HITM)
		sort_str = "tot_hitm";
	else if (c2c.display == DISPLAY_RMT_HITM)
		sort_str = "rmt_hitm,lcl_hitm";
	else if (c2c.display == DISPLAY_LCL_HITM)
		sort_str = "lcl_hitm,rmt_hitm";
	else if (c2c.display == DISPLAY_SNP_PEER)
		sort_str = "tot_peer";
	if (asprintf(&c2c.cl_resort, "offset,%s", sort_str) < 0)
2016-05-24 15:14:38 +03:00
		return -ENOMEM;
	pr_debug("coalesce sort fields: %s\n", c2c.cl_sort);
	pr_debug("coalesce resort fields: %s\n", c2c.cl_resort);
	pr_debug("coalesce output fields: %s\n", c2c.cl_output);
	return 0;
}
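setup_coalesce() then picks the resort key from the display mode and prefixes it with "offset,". The sketch below shows that mapping using snprintf() into a caller-supplied buffer instead of asprintf(). The names are hypothetical. One difference is deliberate: the sketch reports an error for an unknown mode. In the original if/else chain, such a mode would leave sort_str NULL and pass it to asprintf() as a "%s" argument.

```c
#include <stdio.h>

/* Hypothetical mirror of the c2c display modes. */
enum sk_display {
	SK_DISPLAY_TOT_HITM,
	SK_DISPLAY_RMT_HITM,
	SK_DISPLAY_LCL_HITM,
	SK_DISPLAY_SNP_PEER,
};

/* Resort keys per display mode, matching the if/else chain above. */
static const char *sk_resort_key(enum sk_display d)
{
	switch (d) {
	case SK_DISPLAY_TOT_HITM:
		return "tot_hitm";
	case SK_DISPLAY_RMT_HITM:
		return "rmt_hitm,lcl_hitm";
	case SK_DISPLAY_LCL_HITM:
		return "lcl_hitm,rmt_hitm";
	case SK_DISPLAY_SNP_PEER:
		return "tot_peer";
	}
	return NULL;
}

/* Write "offset,<keys>" into buf; returns -1 on unknown mode or truncation. */
static int sk_build_resort(enum sk_display d, char *buf, size_t len)
{
	const char *key = sk_resort_key(d);
	int n;

	if (!key)
		return -1;
	n = snprintf(buf, len, "offset,%s", key);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```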
2016-09-22 18:36:40 +03:00
static int perf_c2c__report(int argc, const char **argv)
{
2020-11-06 12:48:52 +03:00
	struct itrace_synth_opts itrace_synth_opts = {
		.set = true,
		.mem = true,	/* Only enable memory event */
		.default_no_sample = true,
	};
2016-09-22 18:36:40 +03:00
	struct perf_session *session;
2016-09-22 18:36:44 +03:00
	struct ui_progress prog;
2017-01-24 00:07:59 +03:00
	struct perf_data data = {
2016-09-22 18:36:40 +03:00
		.mode = PERF_DATA_MODE_READ,
	};
2016-05-11 19:23:48 +03:00
	char callchain_default_opt[] = CALLCHAIN_DEFAULT_OPT;
2016-05-29 11:21:45 +03:00
	const char *display = NULL;
2016-05-24 15:14:38 +03:00
	const char *coalesce = NULL;
2016-10-11 14:39:47 +03:00
	bool no_source = false;
2016-11-22 00:33:31 +03:00
	const struct option options[] = {
2016-09-22 18:36:40 +03:00
	OPT_STRING('k', "vmlinux", &symbol_conf.vmlinux_name,
		   "file", "vmlinux pathname"),
	OPT_STRING('i', "input", &input_name, "file",
		   "the input file to process"),
2016-06-03 16:40:28 +03:00
	OPT_INCR('N', "node-info", &c2c.node_info,
		 "show extra node info in report (repeat for more info)"),
2016-01-06 18:59:02 +03:00
	OPT_BOOLEAN(0, "stdio", &c2c.use_stdio, "Use the stdio interface"),
2016-05-02 21:01:59 +03:00
	OPT_BOOLEAN(0, "stats", &c2c.stats_only,
2017-03-07 18:08:32 +03:00
		    "Display only statistic tables (implies --stdio)"),
2016-07-10 17:25:15 +03:00
	OPT_BOOLEAN(0, "full-symbols", &c2c.symbol_full,
		    "Display full length of symbols"),
2016-10-11 14:39:47 +03:00
	OPT_BOOLEAN(0, "no-source", &no_source,
		    "Do not display Source Line column"),
2016-10-11 14:52:05 +03:00
	OPT_BOOLEAN(0, "show-all", &c2c.show_all,
		    "Show all captured HITM lines."),
2016-05-11 19:23:48 +03:00
	OPT_CALLBACK_DEFAULT('g', "call-graph", &callchain_param,
			     "print_type,threshold[,print_limit],order,sort_key[,branch],value",
			     callchain_help, &parse_callchain_opt,
			     callchain_default_opt),
2022-08-11 09:24:49 +03:00
	OPT_STRING('d', "display", &display, "Switch HITM output type", "tot,lcl,rmt,peer"),
2016-05-24 15:14:38 +03:00
	OPT_STRING('c', "coalesce", &coalesce, "coalesce fields",
		   "coalesce fields: pid,tid,iaddr,dso"),
2016-11-22 00:33:28 +03:00
	OPT_BOOLEAN('f', "force", &symbol_conf.force, "don't complain, do it"),
2020-03-19 23:25:16 +03:00
	OPT_BOOLEAN(0, "stitch-lbr", &c2c.stitch_lbr,
		    "Enable LBR callgraph stitching approach"),
perf c2c: Add report option to show false sharing in adjacent cachelines
Many platforms have an adjacent-cacheline prefetch feature. When it is
enabled, data in RAM is fetched at a granularity of 2 cachelines (2N and
2N+1): if one is fetched into the cache, the other is likely to be
fetched too. This effectively doubles the cacheline size, so false
sharing can also happen across adjacent cachelines.
0Day has captured performance changes related to this [1], and some
commercial software explicitly aligns its hot global variables to 128
bytes (2 cache lines) to avoid this kind of extended false sharing.
So add a "--double-cl" option to 'perf c2c report' to show false
sharing at double-cacheline granularity, as if the cacheline size were
doubled. There is no change to 'perf c2c record': the hardware events
for shared cachelines are still per cacheline, and this option only
changes the granularity at which events are grouped and displayed.
In the 'perf c2c report' output below (will-it-scale's 'pagefault2' case
on an old kernel):
----------------------------------------------------------------------
26 31 2 0 0 0 0xffff888103ec6000
----------------------------------------------------------------------
35.48% 50.00% 0.00% 0.00% 0.00% 0x10 0 1 0xffffffff8133148b 1153 66 971 3748 74 [k] get_mem_cgroup_from_mm
6.45% 0.00% 0.00% 0.00% 0.00% 0x10 0 1 0xffffffff813396e4 570 0 1531 879 75 [k] mem_cgroup_charge
25.81% 50.00% 0.00% 0.00% 0.00% 0x54 0 1 0xffffffff81331472 949 70 593 3359 74 [k] get_mem_cgroup_from_mm
19.35% 0.00% 0.00% 0.00% 0.00% 0x54 0 1 0xffffffff81339686 1352 0 1073 1022 74 [k] mem_cgroup_charge
9.68% 0.00% 0.00% 0.00% 0.00% 0x54 0 1 0xffffffff813396d6 1401 0 863 768 74 [k] mem_cgroup_charge
3.23% 0.00% 0.00% 0.00% 0.00% 0x54 0 1 0xffffffff81333106 618 0 804 11 9 [k] uncharge_batch
The offsets 0x10 and 0x54 used to be displayed in 2 groups; now they are
listed together to give users a hint of extended false sharing.
[1]. https://lore.kernel.org/lkml/20201102091543.GM31092@shao2-debian/
Committer notes:
Link: https://lore.kernel.org/r/Y+wvVNWqXb70l4uy@feng-clx
Removed -a, leaving just --double-cl, as this is probably not used so
frequently and may even be auto-detected if we manage to record the
MSR where this is configured.
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Feng Tang <feng.tang@intel.com>
Tested-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Joe Mario <jmario@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tim Chen <tim.c.chen@intel.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20230214075823.246414-1-feng.tang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-02-14 10:58:23 +03:00
OPT_BOOLEAN ( 0 , " double-cl " , & chk_double_cl , " Detect adjacent cacheline false sharing " ) ,
2016-11-22 00:33:31 +03:00
OPT_PARENT ( c2c_options ) ,
2016-09-22 18:36:40 +03:00
OPT_END ( )
} ;
int err = 0 ;
2021-01-14 18:46:46 +03:00
const char * output_str , * sort_str = NULL ;
2016-09-22 18:36:40 +03:00
2016-11-22 00:33:31 +03:00
argc = parse_options ( argc , argv , options , report_c2c_usage ,
2016-09-22 18:36:40 +03:00
PARSE_OPT_STOP_AT_NON_OPTION ) ;
2016-09-22 18:36:44 +03:00
if ( argc )
2016-11-22 00:33:31 +03:00
usage_with_options ( report_c2c_usage , options ) ;
2016-09-22 18:36:40 +03:00
2022-05-26 17:54:00 +03:00
# ifndef HAVE_SLANG_SUPPORT
c2c . use_stdio = true ;
# endif
2016-05-02 21:01:59 +03:00
if ( c2c . stats_only )
c2c . use_stdio = true ;
2021-10-18 16:48:42 +03:00
err = symbol__validate_sym_arguments ( ) ;
if ( err )
goto out ;
2016-09-22 18:36:44 +03:00
if ( ! input_name || ! strlen ( input_name ) )
input_name = " perf.data " ;
2019-02-21 12:41:30 +03:00
data . path = input_name ;
data . force = symbol_conf . force ;
2016-09-22 18:36:40 +03:00
2022-08-11 09:24:50 +03:00
session = perf_session__new ( & data , & c2c . tool ) ;
if ( IS_ERR ( session ) ) {
err = PTR_ERR ( session ) ;
pr_debug ( " Error creating perf session \n " ) ;
goto out ;
}
/*
 * Use 'tot' as the default display type if the user doesn't specify one;
 * since the Arm64 platform doesn't support the HITM flag, use 'peer' as
 * the default display type there.
 */
if ( ! display ) {
if ( ! strcmp ( perf_env__arch ( & session -> header . env ) , "arm64" ) )
display = " peer " ;
else
display = " tot " ;
}
2016-05-29 11:21:45 +03:00
err = setup_display ( display ) ;
if ( err )
2022-08-11 09:24:50 +03:00
goto out_session ;
2016-05-29 11:21:45 +03:00
2016-10-11 14:39:47 +03:00
err = setup_coalesce ( coalesce , no_source ) ;
2016-05-24 15:14:38 +03:00
if ( err ) {
pr_debug ( " Failed to initialize hists \n " ) ;
2022-08-11 09:24:50 +03:00
goto out_session ;
2016-05-24 15:14:38 +03:00
}
2016-05-24 11:12:31 +03:00
err = c2c_hists__init ( & c2c . hists , " dcacheline " , 2 ) ;
2016-09-22 18:36:41 +03:00
if ( err ) {
pr_debug ( " Failed to initialize hists \n " ) ;
2022-08-11 09:24:50 +03:00
goto out_session ;
2016-09-22 18:36:40 +03:00
}
2016-11-22 00:33:27 +03:00
2020-11-06 12:48:52 +03:00
session -> itrace_synth_opts = & itrace_synth_opts ;
2016-06-03 16:40:28 +03:00
err = setup_nodes ( session ) ;
if ( err ) {
pr_err ( " Failed setup nodes \n " ) ;
2022-08-11 09:24:50 +03:00
goto out_session ;
2016-06-03 16:40:28 +03:00
}
2016-09-22 18:36:40 +03:00
2018-03-09 13:14:40 +03:00
err = mem2node__init ( & c2c . mem2node , & session -> header . env ) ;
2016-05-11 19:23:48 +03:00
if ( err )
goto out_session ;
2018-03-09 13:14:40 +03:00
err = setup_callchain ( session -> evlist ) ;
if ( err )
goto out_mem2node ;
2016-09-22 18:36:40 +03:00
if ( symbol__init ( & session -> header . env ) < 0 )
2018-03-09 13:14:40 +03:00
goto out_mem2node ;
2016-09-22 18:36:40 +03:00
/* No pipe support at the moment. */
2017-01-24 00:07:59 +03:00
if ( perf_data__is_pipe ( session -> data ) ) {
2016-09-22 18:36:40 +03:00
pr_debug ( " No pipe support at the moment. \n " ) ;
2018-03-09 13:14:40 +03:00
goto out_mem2node ;
2016-09-22 18:36:40 +03:00
}
2016-11-22 00:33:27 +03:00
if ( c2c . use_stdio )
use_browser = 0 ;
else
use_browser = 1 ;
setup_browser ( false ) ;
2016-09-22 18:36:44 +03:00
err = perf_session__process_events ( session ) ;
if ( err ) {
pr_err ( " failed to process sample \n " ) ;
2018-03-09 13:14:40 +03:00
goto out_mem2node ;
2016-09-22 18:36:44 +03:00
}
2022-08-11 09:24:49 +03:00
if ( c2c . display != DISPLAY_SNP_PEER )
output_str = " cl_idx, "
" dcacheline, "
" dcacheline_node, "
" dcacheline_count, "
" percent_costly_snoop, "
" tot_hitm,lcl_hitm,rmt_hitm, "
" tot_recs, "
" tot_loads, "
" tot_stores, "
" stores_l1hit,stores_l1miss,stores_na, "
" ld_fbhit,ld_l1hit,ld_l2hit, "
" ld_lclhit,lcl_hitm, "
" ld_rmthit,rmt_hitm, "
" dram_lcl,dram_rmt " ;
else
output_str = " cl_idx, "
" dcacheline, "
" dcacheline_node, "
" dcacheline_count, "
" percent_costly_snoop, "
" tot_peer,lcl_peer,rmt_peer, "
" tot_recs, "
" tot_loads, "
" tot_stores, "
" stores_l1hit,stores_l1miss,stores_na, "
" ld_fbhit,ld_l1hit,ld_l2hit, "
" ld_lclhit,lcl_hitm, "
" ld_rmthit,rmt_hitm, "
" dram_lcl,dram_rmt " ;
2021-01-14 18:46:46 +03:00
2022-08-11 09:24:45 +03:00
if ( c2c . display == DISPLAY_TOT_HITM )
2021-01-14 18:46:46 +03:00
sort_str = " tot_hitm " ;
2022-08-11 09:24:45 +03:00
else if ( c2c . display == DISPLAY_RMT_HITM )
2021-01-14 18:46:46 +03:00
sort_str = " rmt_hitm " ;
2022-08-11 09:24:45 +03:00
else if ( c2c . display == DISPLAY_LCL_HITM )
2021-01-14 18:46:46 +03:00
sort_str = " lcl_hitm " ;
2022-08-11 09:24:49 +03:00
else if ( c2c . display == DISPLAY_SNP_PEER )
sort_str = " tot_peer " ;
2021-01-14 18:46:46 +03:00
c2c_hists__reinit ( & c2c . hists , output_str , sort_str ) ;
perf c2c report: Set final resort fields
Set resort/display fields for both cachelines and single cacheline
displays.
Cachelines are sorted on:
rmt_hitm
This will be made configurable in the following patches.
The following fields are displayed for cachelines:
dcacheline
tot_recs
percent_hitm
tot_hitm,lcl_hitm,rmt_hitm
stores,stores_l1hit,stores_l1miss
dram_lcl,dram_rmt
ld_llcmiss
tot_loads
ld_fbhit,ld_l1hit,ld_l2hit
ld_lclhit,ld_rmthit
The single cacheline view is sorted by:
offset,rmt_hitm,lcl_hitm
This will be made configurable in the following patches.
The following fields are displayed for each cacheline:
percent_rmt_hitm
percent_lcl_hitm
percent_stores_l1hit
percent_stores_l1miss
offset
pid
tid
mean_rmt
mean_lcl
mean_load
cpucnt
symbol
dso
node
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-0rclftliywdq9qr2sjbugb6b@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-10 15:08:29 +03:00
2016-09-22 18:36:44 +03:00
ui_progress__init ( & prog , c2c . hists . hists . nr_entries , " Sorting... " ) ;
hists__collapse_resort ( & c2c . hists . hists , NULL ) ;
2021-01-14 18:46:41 +03:00
hists__output_resort_cb ( & c2c . hists . hists , & prog , resort_shared_cl_cb ) ;
2016-07-01 12:12:11 +03:00
hists__iterate_cb ( & c2c . hists . hists , resort_cl_cb ) ;
2016-09-22 18:36:44 +03:00
ui_progress__finish ( ) ;
2018-03-09 13:14:41 +03:00
if ( ui_quirks ( ) ) {
pr_err ( " failed to setup UI \n " ) ;
goto out_mem2node ;
}
2016-01-06 18:59:02 +03:00
2016-08-27 12:40:23 +03:00
perf_c2c_display ( session ) ;
2016-05-03 15:32:56 +03:00
2018-03-09 13:14:40 +03:00
out_mem2node :
mem2node__exit ( & c2c . mem2node ) ;
2016-09-22 18:36:40 +03:00
out_session :
perf_session__delete ( session ) ;
out :
return err ;
}
2016-12-12 16:52:10 +03:00
static int parse_record_events ( const struct option * opt ,
perf c2c: Add record subcommand
Add the c2c record subcommand. It sets up the options related to HITM
cacheline analysis and calls the standard perf record command.
$ sudo perf c2c record -v -- -a
calling: record -W -d --sample-cpu -e cpu/mem-loads,ldlat=30/P -e cpu/mem-stores/P -a
...
It produces perf.data, which is to be reported by 'perf c2c report',
coming in the following patches.
Details are described in the man page, which is added in one of the
following patches.
Committer notes:
Testing it:
# perf c2c record -a sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 5.050 MB perf.data (412 samples) ]
# ls -la perf.data
-rw-------. 1 root root 5301752 Oct 4 13:32 perf.data
# perf evlist
cpu/mem-loads,ldlat=30/P
cpu/mem-stores/P
# perf evlist -v
cpu/mem-loads,ldlat=30/P: type: 4, size: 112, config: 0x1cd, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|ID|CPU|PERIOD|DATA_SRC|WEIGHT, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, mmap_data: 1, sample_id_all: 1, mmap2: 1, comm_exec: 1, { bp_addr, config1 }: 0x1f
cpu/mem-stores/P: type: 4, size: 112, config: 0x82d0, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|ID|CPU|PERIOD|DATA_SRC|WEIGHT, read_format: ID, disabled: 1, inherit: 1, freq: 1, precise_ip: 3, sample_id_all: 1
#
# perf report --stdio
<SNIP>
# Total Lost Samples: 14
# Samples: 216 of event 'cpu/mem-loads,ldlat=30/P'
# Event count (approx.): 15207
# Overhead Symbol Shared Object
# ........ ..................................... ............................
10.32% [k] update_blocked_averages [kernel.vmlinux]
3.43% [.] 0x00000000001a2122 qemu-system-x86_64 (deleted)
2.52% [k] enqueue_entity [kernel.vmlinux]
1.88% [.] g_main_context_query libglib-2.0.so.0.4800.2
1.86% [k] __schedule [kernel.vmlinux]
<SNIP>
# Samples: 196 of event 'cpu/mem-stores/P'
# Event count (approx.): 14771346
# Overhead Symbol Shared Object
# ........ ................................... ............................
13.91% [k] intel_idle [kernel.vmlinux]
3.02% [.] 0x00000000022f06ea chrome
2.94% [.] 0x00000000001a1b4c qemu-system-x86_64 (deleted)
2.94% [.] 0x000000000019d8e4 qemu-system-x86_64 (deleted)
2.38% [.] 0x00000000001a1c52 qemu-system-x86_64 (deleted)
<SNIP>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1474558645-19956-12-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-22 18:36:39 +03:00
const char * str , int unset __maybe_unused )
{
bool * event_set = ( bool * ) opt -> value ;
2020-05-08 01:06:04 +03:00
if ( ! strcmp ( str , " list " ) ) {
perf_mem_events__list ( ) ;
exit ( 0 ) ;
}
if ( perf_mem_events__parse ( str ) )
exit ( - 1 ) ;
2016-09-22 18:36:39 +03:00
* event_set = true ;
2020-05-08 01:06:04 +03:00
return 0 ;
2016-09-22 18:36:39 +03:00
}
static const char * const __usage_record [ ] = {
" perf c2c record [<options>] [<command>] " ,
" perf c2c record [<options>] -- <command> [<options>] " ,
NULL
} ;
static const char * const * record_mem_usage = __usage_record ;
static int perf_c2c__record ( int argc , const char * * argv )
{
2021-05-27 03:16:10 +03:00
int rec_argc , i = 0 , j , rec_tmp_nr = 0 ;
2016-09-22 18:36:39 +03:00
const char * * rec_argv ;
2021-05-27 03:16:10 +03:00
char * * rec_tmp ;
2016-09-22 18:36:39 +03:00
int ret ;
bool all_user = false , all_kernel = false ;
bool event_set = false ;
2020-11-06 12:48:46 +03:00
struct perf_mem_event * e ;
perf c2c: Add record subcommand
Adding c2c record subcommand. It setups options related to HITM
cacheline analysis and calls standard perf record command.
$ sudo perf c2c record -v -- -a
calling: record -W -d --sample-cpu -e cpu/mem-loads,ldlat=30/P -e cpu/mem-stores/P -a
...
It produces perf.data, which is to be reported by perf c2c report, that
comes in following patches.
Details are described in the man page, which is added in one of the
following patches.
Committer notes:
Testing it:
# perf c2c record -a sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 5.050 MB perf.data (412 samples) ]
# ls -la perf.data
-rw-------. 1 root root 5301752 Oct 4 13:32 perf.data
# perf evlist
cpu/mem-loads,ldlat=30/P
cpu/mem-stores/P
# perf evlist -v
cpu/mem-loads,ldlat=30/P: type: 4, size: 112, config: 0x1cd, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|ID|CPU|PERIOD|DATA_SRC|WEIGHT, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, mmap_data: 1, sample_id_all: 1, mmap2: 1, comm_exec: 1, { bp_addr, config1 }: 0x1f
cpu/mem-stores/P: type: 4, size: 112, config: 0x82d0, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|ID|CPU|PERIOD|DATA_SRC|WEIGHT, read_format: ID, disabled: 1, inherit: 1, freq: 1, precise_ip: 3, sample_id_all: 1
#
# perf report --stdio
<SNIP>
# Total Lost Samples: 14
# Samples: 216 of event 'cpu/mem-loads,ldlat=30/P'
# Event count (approx.): 15207
# Overhead Symbol Shared Object
# ........ ..................................... ............................
10.32% [k] update_blocked_averages [kernel.vmlinux]
3.43% [.] 0x00000000001a2122 qemu-system-x86_64 (deleted)
2.52% [k] enqueue_entity [kernel.vmlinux]
1.88% [.] g_main_context_query libglib-2.0.so.0.4800.2
1.86% [k] __schedule [kernel.vmlinux]
<SNIP>
# Samples: 196 of event 'cpu/mem-stores/P'
# Event count (approx.): 14771346
# Overhead Symbol Shared Object
# ........ ................................... ............................
13.91% [k] intel_idle [kernel.vmlinux]
3.02% [.] 0x00000000022f06ea chrome
2.94% [.] 0x00000000001a1b4c qemu-system-x86_64 (deleted)
2.94% [.] 0x000000000019d8e4 qemu-system-x86_64 (deleted)
2.38% [.] 0x00000000001a1c52 qemu-system-x86_64 (deleted)
<SNIP>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1474558645-19956-12-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
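The perf evlist -v attributes above can be decoded by hand. The sketch below is illustrative and not part of perf: it assumes the PERF_SAMPLE_* bit values from the uapi header <linux/perf_event.h> (copied into a local table so the example compiles on its own) and the Intel core PMU raw encoding, where config bits 0-7 hold the event select and bits 8-15 the unit mask. The helpers parse_sample_type(), raw_event() and raw_umask() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* PERF_SAMPLE_* values from <linux/perf_event.h>, copied to stay self-contained. */
static const struct {
	const char *name;
	uint64_t bit;
} sample_bits[] = {
	{ "IP",        1ULL << 0 },
	{ "TID",       1ULL << 1 },
	{ "TIME",      1ULL << 2 },
	{ "ADDR",      1ULL << 3 },
	{ "READ",      1ULL << 4 },
	{ "CALLCHAIN", 1ULL << 5 },
	{ "ID",        1ULL << 6 },
	{ "CPU",       1ULL << 7 },
	{ "PERIOD",    1ULL << 8 },
	{ "STREAM_ID", 1ULL << 9 },
	{ "RAW",       1ULL << 10 },
	{ "WEIGHT",    1ULL << 14 },
	{ "DATA_SRC",  1ULL << 15 },
};

/*
 * Turn a '|'-separated list such as "IP|TID|TIME" into a sample_type mask.
 * Returns 0 if any name is not in the table above.
 */
uint64_t parse_sample_type(const char *s)
{
	uint64_t mask = 0;

	assert(s != NULL);
	while (*s) {
		size_t len = strcspn(s, "|");
		size_t k;
		int found = 0;

		for (k = 0; k < sizeof(sample_bits) / sizeof(sample_bits[0]); k++) {
			if (strlen(sample_bits[k].name) == len &&
			    strncmp(s, sample_bits[k].name, len) == 0) {
				mask |= sample_bits[k].bit;
				found = 1;
				break;
			}
		}
		if (!found)
			return 0;
		s += len;
		if (*s == '|')
			s++;	/* skip the separator */
	}
	return mask;
}

/* Intel raw encoding (assumed): event select in bits 0-7, unit mask in bits 8-15. */
static unsigned int raw_event(uint64_t config) { return config & 0xff; }
static unsigned int raw_umask(uint64_t config) { return (config >> 8) & 0xff; }
```

Under those assumptions the sample_type shared by both events is 0xc1cf, config 0x1cd splits into event 0xcd with umask 0x01, and config 0x82d0 into event 0xd0 with umask 0x82.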
2016-09-22 18:36:39 +03:00
	struct option options[] = {
		OPT_CALLBACK('e', "event", &event_set, "event",
2020-10-11 15:10:22 +03:00
			"event selector. Use 'perf c2c record -e list' to list available events",
2016-09-22 18:36:39 +03:00
			parse_record_events),
		OPT_BOOLEAN('u', "all-user", &all_user, "collect only user level data"),
		OPT_BOOLEAN('k', "all-kernel", &all_kernel, "collect only kernel level data"),
		OPT_UINTEGER('l', "ldlat", &perf_mem_events__loads_ldlat, "setup mem-loads latency"),
2016-11-22 00:33:31 +03:00
		OPT_PARENT(c2c_options),
2016-09-22 18:36:39 +03:00
		OPT_END()
	};
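Of the options above, -l/--ldlat appears to change only the latency threshold carried in the load event name; the default of 30 gives the cpu/mem-loads,ldlat=30/P event listed by perf evlist in the commit message. A minimal sketch, assuming the name is produced from a printf-style template of that shape; LOAD_EVENT_TEMPLATE and format_load_event() are hypothetical and not taken from perf's mem-events code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical template mirroring the event name shown by perf evlist. */
#define LOAD_EVENT_TEMPLATE "cpu/mem-loads,ldlat=%u/P"

/*
 * Format the load event name for a latency threshold into buf.
 * Returns 0 on success, -1 if the name does not fit.
 */
int format_load_event(char *buf, size_t len, unsigned int ldlat)
{
	int n;

	assert(buf != NULL && len > 0);
	n = snprintf(buf, len, LOAD_EVENT_TEMPLATE, ldlat);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```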
	if (perf_mem_events__init()) {
		pr_err("failed: memory events not supported\n");
		return -1;
	}
	argc = parse_options(argc, argv, options, record_mem_usage,
			     PARSE_OPT_KEEP_UNKNOWN);
2021-05-27 03:16:10 +03:00
	if (!perf_pmu__has_hybrid())
		rec_argc = argc + 11; /* max number of arguments */
	else
		rec_argc = argc + 11 * perf_pmu__hybrid_pmu_num();
2016-09-22 18:36:39 +03:00
	rec_argv = calloc(rec_argc + 1, sizeof(char *));
	if (!rec_argv)
		return -1;
2021-05-27 03:16:10 +03:00
	rec_tmp = calloc(rec_argc + 1, sizeof(char *));
	if (!rec_tmp) {
		free(rec_argv);
		return -1;
	}
2016-09-22 18:36:39 +03:00
	rec_argv[i++] = "record";
	if (!event_set) {
2020-11-06 12:48:48 +03:00
		e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD_STORE);
		/*
		 * The load and store operations are required, use the event
		 * PERF_MEM_EVENTS__LOAD_STORE if it is supported.
		 */
		if (e->tag) {
			e->record = true;
2022-10-06 18:39:42 +03:00
			rec_argv[i++] = "-W";
2020-11-06 12:48:48 +03:00
		} else {
			e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD);
			e->record = true;
2020-11-06 12:48:46 +03:00
2020-11-06 12:48:48 +03:00
			e = perf_mem_events__ptr(PERF_MEM_EVENTS__STORE);
			e->record = true;
		}
2016-09-22 18:36:39 +03:00
	}
2020-11-06 12:48:46 +03:00
	e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD);
	if (e->record)
2016-09-22 18:36:39 +03:00
		rec_argv[i++] = "-W";
	rec_argv[i++] = "-d";
2018-03-09 13:14:37 +03:00
	rec_argv[i++] = "--phys-data";
2016-09-22 18:36:39 +03:00
	rec_argv[i++] = "--sample-cpu";
2021-05-27 03:16:10 +03:00
	ret = perf_mem_events__record_args(rec_argv, &i, rec_tmp, &rec_tmp_nr);
	if (ret)
		goto out;
2016-09-22 18:36:39 +03:00
	if (all_user)
		rec_argv[i++] = "--all-user";
	if (all_kernel)
		rec_argv[i++] = "--all-kernel";
	for (j = 0; j < argc; j++, i++)
		rec_argv[i] = argv[j];
	if (verbose > 0) {
		pr_debug("calling: ");
		j = 0;
		while (rec_argv[j]) {
			pr_debug("%s ", rec_argv[j]);
			j++;
		}
		pr_debug("\n");
	}
2017-03-27 17:47:20 +03:00
	ret = cmd_record(i, rec_argv);
2021-05-27 03:16:10 +03:00
out:
	for (i = 0; i < rec_tmp_nr; i++)
		free(rec_tmp[i]);
	free(rec_tmp);
2016-09-22 18:36:39 +03:00
	free(rec_argv);
	return ret;
}
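The record function above follows a common pattern for wrapping another perf subcommand: size an argv array for the worst case (argc plus a fixed budget of 11; the listing multiplies that budget by the number of hybrid PMUs), append the fixed options, the selected memory events and the user's pass-through arguments, then hand the NULL-terminated array to cmd_record(). Judging by the out: label, rec_argv only borrows its strings, while heap-allocated ones are presumably also stored in rec_tmp so they can be freed. The standalone sketch below reproduces that flow under simplifying assumptions: it ignores hybrid PMUs and a user-supplied -e, takes event names as plain strings, and uses hypothetical names (struct mem_choice, build_c2c_record_argv()) that do not exist in perf.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical summary of the memory events chosen for this run. */
struct mem_choice {
	const char *load_store; /* combined load/store event, NULL if unsupported */
	const char *load;       /* e.g. "cpu/mem-loads,ldlat=30/P" */
	const char *store;      /* e.g. "cpu/mem-stores/P" */
};

/*
 * Worst case of fixed arguments: record, -W, -d, --phys-data, --sample-cpu,
 * two "-e <event>" pairs, --all-user and --all-kernel.
 */
#define C2C_FIXED_ARGS 11

/*
 * Build a NULL-terminated argument vector for "perf record". The array only
 * borrows its strings; the caller frees the array itself. Returns NULL on
 * allocation failure; *out_n receives the number of arguments.
 */
const char **build_c2c_record_argv(const struct mem_choice *mc,
				   bool all_user, bool all_kernel,
				   int argc, const char **argv, int *out_n)
{
	int cap = argc + C2C_FIXED_ARGS;
	/* calloc zero-fills, so the extra slot stays NULL as the terminator. */
	const char **rec = calloc(cap + 1, sizeof(*rec));
	int i = 0, j;

	if (!rec)
		return NULL;

	rec[i++] = "record";
	/* Without -e, both event branches in the listing end up adding -W. */
	rec[i++] = "-W";
	rec[i++] = "-d";
	rec[i++] = "--phys-data";
	rec[i++] = "--sample-cpu";

	/* Prefer the combined event; otherwise record loads and stores. */
	if (mc->load_store) {
		rec[i++] = "-e";
		rec[i++] = mc->load_store;
	} else {
		rec[i++] = "-e";
		rec[i++] = mc->load;
		rec[i++] = "-e";
		rec[i++] = mc->store;
	}

	if (all_user)
		rec[i++] = "--all-user";
	if (all_kernel)
		rec[i++] = "--all-kernel";

	/* Options the parser did not recognise are passed through unchanged. */
	for (j = 0; j < argc; j++)
		rec[i++] = argv[j];

	/* Sanity check: the fixed budget covers every branch above. */
	assert(i <= cap);
	*out_n = i;
	return rec;
}
```

Called with separate load and store events and a pass-through -a, the sketch yields record -W -d --phys-data --sample-cpu -e cpu/mem-loads,ldlat=30/P -e cpu/mem-stores/P -a, which is the calling: line quoted in the commit message plus the --phys-data option from the 2018-03-09 change.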
2017-03-27 17:47:20 +03:00
int cmd_c2c(int argc, const char **argv)
2016-09-22 18:36:38 +03:00
{
	argc = parse_options(argc, argv, c2c_options, c2c_usage,
			     PARSE_OPT_STOP_AT_NON_OPTION);
2016-09-22 18:36:39 +03:00
	if (!argc)
		usage_with_options(c2c_usage, c2c_options);
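Note the argument order in the subcommand check that follows: perf's strstarts(str, prefix) tests whether str begins with prefix, so strstarts("record", argv[0]) asks whether the user's word is a prefix of "record". Combined with strlen(argv[0]) > 2, this accepts abbreviations from "rec" up to "record". A minimal sketch; starts_with() is a local stand-in written with strncmp(), and is_record_cmd() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Local stand-in with the same semantics as strstarts(str, prefix). */
static bool starts_with(const char *str, const char *prefix)
{
	return strncmp(str, prefix, strlen(prefix)) == 0;
}

/* True if arg abbreviates "record" with at least three characters. */
bool is_record_cmd(const char *arg)
{
	assert(arg != NULL);
	return strlen(arg) > 2 && starts_with("record", arg);
}
```

"re" is rejected as too short, and "recordx" is rejected because "record" does not start with it.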
2022-03-25 12:20:32 +03:00
	if (strlen(argv[0]) > 2 && strstarts("record", argv[0])) {
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1474558645-19956-12-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-22 18:36:39 +03:00
		return perf_c2c__record(argc, argv);
2022-03-25 12:20:32 +03:00
	} else if (strlen(argv[0]) > 2 && strstarts("report", argv[0])) {
2016-09-22 18:36:40 +03:00
		return perf_c2c__report(argc, argv);
2016-09-22 18:36:39 +03:00
	} else {
		usage_with_options(c2c_usage, c2c_options);
	}
2016-09-22 18:36:38 +03:00
	return 0;
}