linux/tools/perf
Feng Tang 1470a108a6 perf c2c: Add report option to show false sharing in adjacent cachelines
Many platforms have feature of adjacent cachelines prefetch, when it is
enabled, for data in RAM of 2 cachelines (2N and 2N+1) granularity, if
one is fetched to cache, the other one could likely be fetched too,
which sort of extends the cacheline size to double, thus the false
sharing could happens in adjacent cachelines.

0Day has captured performance changed related with this [1], and some
commercial software explicitly makes its hot global variables 128 bytes
aligned (2 cache lines) to avoid this kind of extended false sharing.

So add an option "--double-cl" for 'perf c2c report' to show false
sharing in double cache line granularity, which acts just like the
cacheline size is doubled. There is no change to c2c record. The
hardware events of shared cacheline are still per cacheline, and this
option just changes the granularity of how events are grouped and
displayed.

In the 'perf c2c report' output below (will-it-scale's 'pagefault2' case
on old kernel):

  ----------------------------------------------------------------------
     26       31        2        0        0        0  0xffff888103ec6000
  ----------------------------------------------------------------------
   35.48%   50.00%    0.00%    0.00%    0.00%   0x10     0       1  0xffffffff8133148b   1153   66    971   3748   74  [k] get_mem_cgroup_from_mm
    6.45%    0.00%    0.00%    0.00%    0.00%   0x10     0       1  0xffffffff813396e4    570    0   1531    879   75  [k] mem_cgroup_charge
   25.81%   50.00%    0.00%    0.00%    0.00%   0x54     0       1  0xffffffff81331472    949   70    593   3359   74  [k] get_mem_cgroup_from_mm
   19.35%    0.00%    0.00%    0.00%    0.00%   0x54     0       1  0xffffffff81339686   1352    0   1073   1022   74  [k] mem_cgroup_charge
    9.68%    0.00%    0.00%    0.00%    0.00%   0x54     0       1  0xffffffff813396d6   1401    0    863    768   74  [k] mem_cgroup_charge
    3.23%    0.00%    0.00%    0.00%    0.00%   0x54     0       1  0xffffffff81333106    618    0    804     11    9  [k] uncharge_batch

The offset 0x10 and 0x54 used to displayed in 2 groups, and now they are
listed together to give users a hint of extended false sharing.

[1]. https://lore.kernel.org/lkml/20201102091543.GM31092@shao2-debian/

Committer notes:

Link: https://lore.kernel.org/r/Y+wvVNWqXb70l4uy@feng-clx

Removed -a, leaving just as --double-cl, as this probably is not used so
frequently and perhaps will be even auto-detected if we manage to record
the MSR where this is configured.

Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Feng Tang <feng.tang@intel.com>
Tested-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Joe Mario <jmario@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tim Chen <tim.c.chen@intel.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20230214075823.246414-1-feng.tang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-02-16 09:33:45 -03:00
..
arch perf event x86: Add retire_lat when synthesizing PERF_SAMPLE_WEIGHT_STRUCT 2023-02-06 14:56:22 -03:00
bench perf bench syscall: Add execve syscall benchmark 2023-02-02 16:32:19 -03:00
dlfilters perf tools: Fix usage of the verbose variable 2022-12-20 15:16:33 -03:00
Documentation perf c2c: Add report option to show false sharing in adjacent cachelines 2023-02-16 09:33:45 -03:00
examples/bpf perf trace: Remove unused bpf map 'syscalls' 2022-11-23 10:30:00 -03:00
include/perf perf bpf: Remove now unused BPF headers 2022-11-04 11:41:48 -03:00
jvmti
pmu-events perf jevents: Run metric_test.py at compile-time 2023-02-03 17:11:39 -03:00
python
scripts perf script flamegraph: Avoid d3-flame-graph package dependency 2023-01-19 09:29:58 -03:00
tests perf test bpf: Skip test if kernel-debuginfo is not present 2023-02-06 15:01:23 -03:00
trace perf beauty: Update copy of linux/socket.h with the kernel sources 2023-01-18 10:12:23 -03:00
ui perf tools: Fix "kernel lock contention analysis" test by not printing warnings in quiet mode 2022-10-27 16:37:26 -03:00
util perf c2c: Add report option to show false sharing in adjacent cachelines 2023-02-16 09:33:45 -03:00
.gitignore perf jevents: Run metric_test.py at compile-time 2023-02-03 17:11:39 -03:00
Build perf build: Use libtraceevent from the system 2022-12-14 11:16:12 -03:00
builtin-annotate.c perf build: Use libtraceevent from the system 2022-12-14 11:16:12 -03:00
builtin-bench.c perf bench syscall: Add execve syscall benchmark 2023-02-02 16:32:19 -03:00
builtin-buildid-cache.c
builtin-buildid-list.c
builtin-c2c.c perf c2c: Add report option to show false sharing in adjacent cachelines 2023-02-16 09:33:45 -03:00
builtin-config.c
builtin-daemon.c perf daemon: Use sig_atomic_t to avoid UB 2022-11-03 09:35:44 -03:00
builtin-data.c perf build: Use libtraceevent from the system 2022-12-14 11:16:12 -03:00
builtin-diff.c perf tools: Make quiet mode consistent between tools 2022-10-27 16:37:26 -03:00
builtin-evlist.c
builtin-ftrace.c perf ftrace: Use sig_atomic_t to avoid UB 2022-11-03 09:36:09 -03:00
builtin-help.c
builtin-inject.c perf inject: Use perf_data__read() for auxtrace 2023-02-01 19:22:07 -03:00
builtin-kallsyms.c
builtin-kmem.c perf kmem: Support field "node" in evsel__process_alloc_event() coping with recent tracepoint restructuring 2023-01-10 10:52:49 -03:00
builtin-kvm.c perf build: Use libtraceevent from the system 2022-12-14 11:16:12 -03:00
builtin-kwork.c perf build: Use libtraceevent from the system 2022-12-14 11:16:12 -03:00
builtin-list.c perf pmu-events: Remove now unused event and metric variables 2023-02-03 13:54:21 -03:00
builtin-lock.c perf lock contention: Add -o/--lock-owner option 2023-02-08 10:33:32 -03:00
builtin-mem.c perf tools: Move 'struct perf_sample' to a separate header file to disentangle headers 2022-10-31 11:06:41 -03:00
builtin-probe.c perf probe: Fix usage when libtraceevent is missing 2023-02-02 16:32:19 -03:00
builtin-record.c perf record: Fix segfault with --overwrite and --max-size 2023-02-15 09:58:55 -03:00
builtin-report.c perf build: Use libtraceevent from the system 2022-12-14 11:16:12 -03:00
builtin-sched.c perf tools: Use dedicated non-atomic clear/set bit helpers 2022-12-05 09:29:06 -03:00
builtin-script.c perf script: Support Retire Latency 2023-02-03 17:26:40 -03:00
builtin-stat.c perf stat: Remove evsel metric_name/expr 2023-02-03 13:54:21 -03:00
builtin-timechart.c perf build: Use libtraceevent from the system 2022-12-14 11:16:12 -03:00
builtin-top.c perf evlist: Remove group option. 2022-12-14 15:28:18 -03:00
builtin-trace.c perf trace: Reduce #ifdefs for TEP_FIELD_IS_RELATIVE 2023-01-19 13:26:28 -03:00
builtin-version.c perf build: Use libtraceevent from the system 2022-12-14 11:16:12 -03:00
builtin.h
check-headers.sh tools headers: Update the copy of x86's memcpy_64.S used in 'perf bench' 2022-10-25 17:40:48 -03:00
command-list.txt perf help: Use HAVE_LIBTRACEEVENT to filter out unsupported commands 2023-01-02 11:51:53 -03:00
CREDITS
design.txt
Makefile perf tools: Use "grep -E" instead of "egrep" 2022-12-14 15:28:19 -03:00
Makefile.config perf tools: Remove HAVE_LIBTRACEEVENT_TEP_FIELD_IS_RELATIVE 2023-01-19 13:24:56 -03:00
Makefile.perf perf jevents: Run metric_test.py at compile-time 2023-02-03 17:11:39 -03:00
MANIFEST tools lib traceevent: Remove libtraceevent 2022-12-14 11:16:12 -03:00
perf-archive.sh
perf-completion.sh perf tools: Fix auto-complete on aarch64 2023-02-08 10:38:10 -03:00
perf-iostat.sh
perf-read-vdso.c
perf-sys.h
perf.c perf build: Use libtraceevent from the system 2022-12-14 11:16:12 -03:00
perf.h