linux

Feng Tang 1470a108a6 perf c2c: Add report option to show false sharing in adjacent cachelines

Many platforms have an adjacent-cacheline prefetch feature. When it is enabled, data in RAM is handled at a granularity of 2 cachelines (2N and 2N+1): if one is fetched into the cache, the other is likely to be fetched too. This effectively doubles the cacheline size, so false sharing can also happen across adjacent cachelines.

0Day has captured performance changes related to this [1], and some commercial software explicitly aligns its hot global variables to 128 bytes (2 cachelines) to avoid this kind of extended false sharing.

So add an option "--double-cl" for 'perf c2c report' to show false sharing at double-cacheline granularity, which acts just as if the cacheline size were doubled. There is no change to 'perf c2c record'. The hardware events of a shared cacheline are still per cacheline; this option only changes the granularity at which events are grouped and displayed.

In the 'perf c2c report' output below (will-it-scale's 'pagefault2' case on an old kernel):

    ----------------------------------------------------------------------
          26      31       2       0       0       0  0xffff888103ec6000
    ----------------------------------------------------------------------
      35.48%  50.00%   0.00%   0.00%   0.00%  0x10  0  1  0xffffffff8133148b  1153    66   971  3748    74  [k] get_mem_cgroup_from_mm
       6.45%   0.00%   0.00%   0.00%   0.00%  0x10  0  1  0xffffffff813396e4   570     0  1531   879    75  [k] mem_cgroup_charge
      25.81%  50.00%   0.00%   0.00%   0.00%  0x54  0  1  0xffffffff81331472   949    70   593  3359    74  [k] get_mem_cgroup_from_mm
      19.35%   0.00%   0.00%   0.00%   0.00%  0x54  0  1  0xffffffff81339686  1352     0  1073  1022    74  [k] mem_cgroup_charge
       9.68%   0.00%   0.00%   0.00%   0.00%  0x54  0  1  0xffffffff813396d6  1401     0   863   768    74  [k] mem_cgroup_charge
       3.23%   0.00%   0.00%   0.00%   0.00%  0x54  0  1  0xffffffff81333106   618     0   804    11     9  [k] uncharge_batch

The offsets 0x10 and 0x54 used to be displayed in 2 groups; now they are listed together to give users a hint of extended false sharing.

[1] https://lore.kernel.org/lkml/20201102091543.GM31092@shao2-debian/

Committer notes:

Link: https://lore.kernel.org/r/Y+wvVNWqXb70l4uy@feng-clx

Removed -a, leaving just --double-cl, as this is probably not used very frequently and may even be auto-detected if we manage to record the MSR where this is configured.

Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Feng Tang <feng.tang@intel.com>
Tested-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Joe Mario <jmario@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tim Chen <tim.c.chen@intel.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20230214075823.246414-1-feng.tang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

2023-02-16 09:33:45 -03:00
arch	perf event x86: Add retire_lat when synthesizing PERF_SAMPLE_WEIGHT_STRUCT	2023-02-06 14:56:22 -03:00
bench	perf bench syscall: Add execve syscall benchmark	2023-02-02 16:32:19 -03:00
dlfilters	perf tools: Fix usage of the verbose variable	2022-12-20 15:16:33 -03:00
Documentation	perf c2c: Add report option to show false sharing in adjacent cachelines	2023-02-16 09:33:45 -03:00
examples/bpf	perf trace: Remove unused bpf map 'syscalls'	2022-11-23 10:30:00 -03:00
include/perf	perf bpf: Remove now unused BPF headers	2022-11-04 11:41:48 -03:00
jvmti
pmu-events	perf jevents: Run metric_test.py at compile-time	2023-02-03 17:11:39 -03:00
python
scripts	perf script flamegraph: Avoid d3-flame-graph package dependency	2023-01-19 09:29:58 -03:00
tests	perf test bpf: Skip test if kernel-debuginfo is not present	2023-02-06 15:01:23 -03:00
trace	perf beauty: Update copy of linux/socket.h with the kernel sources	2023-01-18 10:12:23 -03:00
ui	perf tools: Fix "kernel lock contention analysis" test by not printing warnings in quiet mode	2022-10-27 16:37:26 -03:00
util	perf c2c: Add report option to show false sharing in adjacent cachelines	2023-02-16 09:33:45 -03:00
.gitignore	perf jevents: Run metric_test.py at compile-time	2023-02-03 17:11:39 -03:00
Build	perf build: Use libtraceevent from the system	2022-12-14 11:16:12 -03:00
builtin-annotate.c	perf build: Use libtraceevent from the system	2022-12-14 11:16:12 -03:00
builtin-bench.c	perf bench syscall: Add execve syscall benchmark	2023-02-02 16:32:19 -03:00
builtin-buildid-cache.c
builtin-buildid-list.c
builtin-c2c.c	perf c2c: Add report option to show false sharing in adjacent cachelines	2023-02-16 09:33:45 -03:00
builtin-config.c
builtin-daemon.c	perf daemon: Use sig_atomic_t to avoid UB	2022-11-03 09:35:44 -03:00
builtin-data.c	perf build: Use libtraceevent from the system	2022-12-14 11:16:12 -03:00
builtin-diff.c	perf tools: Make quiet mode consistent between tools	2022-10-27 16:37:26 -03:00
builtin-evlist.c
builtin-ftrace.c	perf ftrace: Use sig_atomic_t to avoid UB	2022-11-03 09:36:09 -03:00
builtin-help.c
builtin-inject.c	perf inject: Use perf_data__read() for auxtrace	2023-02-01 19:22:07 -03:00
builtin-kallsyms.c
builtin-kmem.c	perf kmem: Support field "node" in evsel__process_alloc_event() coping with recent tracepoint restructuring	2023-01-10 10:52:49 -03:00
builtin-kvm.c	perf build: Use libtraceevent from the system	2022-12-14 11:16:12 -03:00
builtin-kwork.c	perf build: Use libtraceevent from the system	2022-12-14 11:16:12 -03:00
builtin-list.c	perf pmu-events: Remove now unused event and metric variables	2023-02-03 13:54:21 -03:00
builtin-lock.c	perf lock contention: Add -o/--lock-owner option	2023-02-08 10:33:32 -03:00
builtin-mem.c	perf tools: Move 'struct perf_sample' to a separate header file to disentangle headers	2022-10-31 11:06:41 -03:00
builtin-probe.c	perf probe: Fix usage when libtraceevent is missing	2023-02-02 16:32:19 -03:00
builtin-record.c	perf record: Fix segfault with --overwrite and --max-size	2023-02-15 09:58:55 -03:00
builtin-report.c	perf build: Use libtraceevent from the system	2022-12-14 11:16:12 -03:00
builtin-sched.c	perf tools: Use dedicated non-atomic clear/set bit helpers	2022-12-05 09:29:06 -03:00
builtin-script.c	perf script: Support Retire Latency	2023-02-03 17:26:40 -03:00
builtin-stat.c	perf stat: Remove evsel metric_name/expr	2023-02-03 13:54:21 -03:00
builtin-timechart.c	perf build: Use libtraceevent from the system	2022-12-14 11:16:12 -03:00
builtin-top.c	perf evlist: Remove group option.	2022-12-14 15:28:18 -03:00
builtin-trace.c	perf trace: Reduce #ifdefs for TEP_FIELD_IS_RELATIVE	2023-01-19 13:26:28 -03:00
builtin-version.c	perf build: Use libtraceevent from the system	2022-12-14 11:16:12 -03:00
builtin.h
check-headers.sh	tools headers: Update the copy of x86's memcpy_64.S used in 'perf bench'	2022-10-25 17:40:48 -03:00
command-list.txt	perf help: Use HAVE_LIBTRACEEVENT to filter out unsupported commands	2023-01-02 11:51:53 -03:00
CREDITS
design.txt
Makefile	perf tools: Use "grep -E" instead of "egrep"	2022-12-14 15:28:19 -03:00
Makefile.config	perf tools: Remove HAVE_LIBTRACEEVENT_TEP_FIELD_IS_RELATIVE	2023-01-19 13:24:56 -03:00
Makefile.perf	perf jevents: Run metric_test.py at compile-time	2023-02-03 17:11:39 -03:00
MANIFEST	tools lib traceevent: Remove libtraceevent	2022-12-14 11:16:12 -03:00
perf-archive.sh
perf-completion.sh	perf tools: Fix auto-complete on aarch64	2023-02-08 10:38:10 -03:00
perf-iostat.sh
perf-read-vdso.c
perf-sys.h
perf.c	perf build: Use libtraceevent from the system	2022-12-14 11:16:12 -03:00
perf.h