linux

iv/linux

Ian Rogers	2069425eb3	perf synthetic events: Remove use of sscanf from /proc reading	2020-04-30 10:48:29 -03:00

The synthesize benchmark, run on a single process and thread, shows
perf_event__synthesize_mmap_events as the hottest function with fgets
and sscanf taking the majority of execution time.

fscanf performs similarly well. Replace the scanf call with manual
reading of each field of the /proc/pid/maps line, and remove some
unnecessary buffering.

This change also addresses potential, but unlikely, buffer overruns for
the string values read by scanf.

Performance before is:

  $ sudo perf bench internals synthesize -m 16 -M 16 -s -t
  \# Running 'internals/synthesize' benchmark:
  Computing performance of single threaded perf event synthesis by
  synthesizing events on the perf process itself:
    Average synthesis took: 102.810 usec (+- 0.027 usec)
    Average num. events: 17.000 (+- 0.000)
    Average time per event 6.048 usec
    Average data synthesis took: 106.325 usec (+- 0.018 usec)
    Average num. events: 89.000 (+- 0.000)
    Average time per event 1.195 usec
  Computing performance of multi threaded perf event synthesis by
  synthesizing events on CPU 0:
    Number of synthesis threads: 16
      Average synthesis took: 68103.100 usec (+- 441.234 usec)
      Average num. events: 30703.000 (+- 0.730)
      Average time per event 2.218 usec

And after is:

  $ sudo perf bench internals synthesize -m 16 -M 16 -s -t
  \# Running 'internals/synthesize' benchmark:
  Computing performance of single threaded perf event synthesis by
  synthesizing events on the perf process itself:
    Average synthesis took: 50.388 usec (+- 0.031 usec)
    Average num. events: 17.000 (+- 0.000)
    Average time per event 2.964 usec
    Average data synthesis took: 52.693 usec (+- 0.020 usec)
    Average num. events: 89.000 (+- 0.000)
    Average time per event 0.592 usec
  Computing performance of multi threaded perf event synthesis by
  synthesizing events on CPU 0:
    Number of synthesis threads: 16
      Average synthesis took: 45022.400 usec (+- 552.740 usec)
      Average num. events: 30624.200 (+- 10.037)
      Average time per event 1.470 usec

On a Intel Xeon 6154 compiling with Debian gcc 9.2.1.

Committer testing:

On a AMD Ryzen 5 3600X 6-Core Processor:

Before:

  # perf bench internals synthesize --min-threads 12 --max-threads 12 --st --mt
  # Running 'internals/synthesize' benchmark:
  Computing performance of single threaded perf event synthesis by
  synthesizing events on the perf process itself:
    Average synthesis took: 267.491 usec (+- 0.176 usec)
    Average num. events: 56.000 (+- 0.000)
    Average time per event 4.777 usec
    Average data synthesis took: 277.257 usec (+- 0.169 usec)
    Average num. events: 287.000 (+- 0.000)
    Average time per event 0.966 usec
  Computing performance of multi threaded perf event synthesis by
  synthesizing events on CPU 0:
    Number of synthesis threads: 12
      Average synthesis took: 81599.500 usec (+- 346.315 usec)
      Average num. events: 36096.100 (+- 2.523)
      Average time per event 2.261 usec
  #

After:

  # perf bench internals synthesize --min-threads 12 --max-threads 12 --st --mt
  # Running 'internals/synthesize' benchmark:
  Computing performance of single threaded perf event synthesis by
  synthesizing events on the perf process itself:
    Average synthesis took: 110.125 usec (+- 0.080 usec)
    Average num. events: 56.000 (+- 0.000)
    Average time per event 1.967 usec
    Average data synthesis took: 118.518 usec (+- 0.057 usec)
    Average num. events: 287.000 (+- 0.000)
    Average time per event 0.413 usec
  Computing performance of multi threaded perf event synthesis by
  synthesizing events on CPU 0:
    Number of synthesis threads: 12
      Average synthesis took: 43490.700 usec (+- 284.527 usec)
      Average num. events: 37028.500 (+- 0.563)
      Average time per event 1.175 usec
  #

Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andrey Zhizhikin <andrey.z@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lore.kernel.org/lkml/20200415054050.31645-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
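
The commit message describes replacing a scanf-style call with manual, field-by-field reading of each /proc/pid/maps line. The following is a minimal sketch of that general idea in plain standard C, not the code from the commit: the struct maps_entry type and the get_hex(), get_dec() and parse_maps_line() helpers are hypothetical names chosen for illustration, and the sketch parses an in-memory string rather than reading from a file. It assumes the usual maps layout of "start-end perms offset major:minor inode [path]", with the addresses, offset and device numbers in hexadecimal and the inode in decimal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record for one maps line; field names are illustrative only. */
struct maps_entry {
	uint64_t start, end, pgoff, inode;
	unsigned int maj, min;
	char perms[5];   /* four permission characters plus NUL */
	char path[256];  /* truncated if longer, never overrun */
};

/*
 * Accumulate hex digits at *p into *val. Returns the delimiter that ended
 * the number (consumed unless it is the terminating NUL), or -1 if no
 * digit was found.
 */
static int get_hex(const char **p, uint64_t *val)
{
	bool any = false;

	*val = 0;
	for (;; (*p)++) {
		int c = (unsigned char)**p, d;

		if (c >= '0' && c <= '9')
			d = c - '0';
		else if (c >= 'a' && c <= 'f')
			d = c - 'a' + 10;
		else if (c >= 'A' && c <= 'F')
			d = c - 'A' + 10;
		else {
			if (!any)
				return -1;
			if (c)
				(*p)++;
			return c;
		}
		*val = (*val << 4) | (uint64_t)d;
		any = true;
	}
}

/* Same contract as get_hex(), but for decimal digits. */
static int get_dec(const char **p, uint64_t *val)
{
	bool any = false;

	*val = 0;
	for (;; (*p)++) {
		int c = (unsigned char)**p;

		if (c < '0' || c > '9') {
			if (!any)
				return -1;
			if (c)
				(*p)++;
			return c;
		}
		*val = *val * 10 + (uint64_t)(c - '0');
		any = true;
	}
}

/* Parse one maps line without scanf; returns false on malformed input. */
static bool parse_maps_line(const char *line, struct maps_entry *out)
{
	const char *p = line;
	uint64_t v;
	size_t i;
	int delim;

	assert(line && out);
	memset(out, 0, sizeof(*out));

	if (get_hex(&p, &out->start) != '-' || get_hex(&p, &out->end) != ' ')
		return false;

	/* Exactly four permission characters followed by a space. */
	for (i = 0; i < 4; i++) {
		if (p[i] == '\0')
			return false;
		out->perms[i] = p[i];
	}
	if (p[4] != ' ')
		return false;
	p += 5;

	if (get_hex(&p, &out->pgoff) != ' ')
		return false;
	if (get_hex(&p, &v) != ':')
		return false;
	out->maj = (unsigned int)v;
	if (get_hex(&p, &v) != ' ')
		return false;
	out->min = (unsigned int)v;

	delim = get_dec(&p, &out->inode);
	if (delim != ' ' && delim != '\n' && delim != '\0')
		return false;
	if (delim != ' ')
		return true;  /* anonymous mapping: no path column */

	/* Skip column padding, then copy the path with an explicit bound. */
	while (*p == ' ')
		p++;
	for (i = 0; i < sizeof(out->path) - 1 && p[i] && p[i] != '\n'; i++)
		out->path[i] = p[i];
	out->path[i] = '\0';
	return true;
}
```

A caller would pass each line read from /proc/<pid>/maps to parse_maps_line() and skip lines for which it returns false. The fixed-size perms and path buffers are filled with explicit bounds checks, which is one way to avoid the kind of string overrun the commit message mentions for scanf.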
arch	tools headers: Update x86's syscall_64.tbl with the kernel sources	2020-04-14 11:02:52 -03:00
bench	perf bench: Add a multi-threaded synthesize benchmark	2020-04-30 10:48:25 -03:00
Documentation	perf record: Add num-synthesize-threads option	2020-04-23 11:10:41 -03:00
examples/bpf	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next	2020-01-28 16:02:33 -08:00
include/bpf	perf bpf: Remove bpf/ subdir from bpf.h headers used to build bpf events	2020-02-18 10:13:28 -03:00
jvmti	perf jvmti: Link against tools/lib/ctype.h to have weak strlcpy()	2019-10-15 11:47:38 -03:00
pmu-events	perf pmu-events x86: Use CPU_CLK_UNHALTED.THREAD in Kernel_Utilization metric	2020-04-03 09:37:56 -03:00
python	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 407	2019-06-05 17:37:14 +02:00
scripts	perf script: Add flamegraph.py script	2020-04-16 12:19:14 -03:00
tests	tools api: Add a lightweight buffered reading api	2020-04-30 10:48:28 -03:00
trace	tools headers UAPI: Sync linux/mman.h with the kernel	2020-04-14 09:04:53 -03:00
ui	perf report/top TUI: Fix title line formatting	2020-04-03 09:37:55 -03:00
util	perf synthetic events: Remove use of sscanf from /proc reading	2020-04-30 10:48:29 -03:00
.gitignore	.gitignore: add SPDX License Identifier	2020-03-25 11:50:48 +01:00
Build
builtin-annotate.c	perf annotate: Prefer cmdline option over default config	2020-02-27 10:45:08 -03:00
builtin-bench.c	perf bench: Add event synthesis benchmark	2020-04-16 12:19:12 -03:00
builtin-buildid-cache.c	perf session: Return error code for perf_session__new() function on failure	2019-09-20 15:58:11 -03:00
builtin-buildid-list.c	perf session: Return error code for perf_session__new() function on failure	2019-09-20 15:58:11 -03:00
builtin-c2c.c	perf c2c: Add option to enable the LBR stitching approach	2020-04-18 09:05:01 -03:00
builtin-config.c	perf tools: Remove util.h from where it is not needed	2019-09-20 09:19:20 -03:00
builtin-data.c	perf debug: Remove needless include directives from debug.h	2019-08-31 19:10:19 -03:00
builtin-diff.c	perf tools: Basic support for CGROUP event	2020-04-03 09:37:55 -03:00
builtin-evlist.c	perf evsel: Introduce evsel_fprintf.h	2019-09-25 16:26:34 -03:00
builtin-ftrace.c	perf tools: Support CAP_PERFMON capability	2020-04-16 12:19:08 -03:00
builtin-help.c	perf debug: Remove needless include directives from debug.h	2019-08-31 19:10:19 -03:00
builtin-inject.c	perf inject: Fix processing of ID index for injected instruction tracing	2019-12-04 12:39:53 -03:00
builtin-kallsyms.c	perf dsos: Move the dsos struct and its methods to separate source files	2019-08-31 22:24:10 -03:00
builtin-kmem.c	perf callchain: Use 'struct map_symbol' in 'struct callchain_cursor_node'	2019-11-12 08:20:53 -03:00
builtin-kvm.c	perf kvm: Use evlist layer api when possible	2019-11-06 15:43:05 -03:00
builtin-list.c	perf list: Hide deprecated events by default	2019-10-19 15:35:01 -03:00
builtin-lock.c	perf session: Return error code for perf_session__new() function on failure	2019-09-20 15:58:11 -03:00
builtin-mem.c	perf session: Return error code for perf_session__new() function on failure	2019-09-20 15:58:11 -03:00
builtin-probe.c	perf probe: Check return value of strlist__add() for -ENOMEM	2020-02-27 11:03:13 -03:00
builtin-record.c	perf record: Add num-synthesize-threads option	2020-04-23 11:10:41 -03:00
builtin-report.c	perf report: Add option to enable the LBR stitching approach	2020-04-18 09:05:01 -03:00
builtin-sched.c	perf sched timehist: Add support for filtering on CPU	2020-01-06 11:46:09 -03:00
builtin-script.c	perf script: Add option to enable the LBR stitching approach	2020-04-18 09:05:01 -03:00
builtin-stat.c	perf stat: Improve runtime stat for interval mode	2020-04-23 11:03:46 -03:00
builtin-timechart.c	perf session: Return error code for perf_session__new() function on failure	2019-09-20 15:58:11 -03:00
builtin-top.c	perf top: Add option to enable the LBR stitching approach	2020-04-18 09:05:01 -03:00
builtin-trace.c	perf trace: Resolve prctl's 'option' arg strings to numbers	2020-02-11 16:41:50 -03:00
builtin-version.c	perf symbols: Move mem_info and branch_info out of symbol.h	2019-08-31 22:27:48 -03:00
builtin.h	perf tools: Remove needless util.h include from builtin.h	2019-08-28 17:19:34 -03:00
check-headers.sh	tools headers: Synchronize linux/bits.h with the kernel sources	2020-04-14 11:40:05 -03:00
command-list.txt
CREDITS
design.txt	perf tools: Support CAP_PERFMON capability	2020-04-16 12:19:08 -03:00
Makefile	tools: Let O= makes handle a relative path with -C option	2020-03-06 17:08:28 -03:00
Makefile.config	perf tools: Support Python 3.8+ in Makefile	2020-04-03 10:03:44 -03:00
Makefile.perf	perf: Normalize gcc parameter when generating arch errno table	2020-03-26 11:04:01 -03:00
MANIFEST	libperf: Move to tools/lib/perf	2020-01-06 11:46:09 -03:00
perf-archive.sh
perf-completion.sh
perf-read-vdso.c
perf-sys.h	perf tools: Make usage of test_attr__* optional for perf-sys.h	2019-10-31 21:38:41 +01:00
perf-with-kcore.sh	Merge branch 'x86/cpu' into perf/core, to pick up dependent changes	2019-06-17 12:29:16 +02:00
perf.c	libperf: Merge libperf_set_print() into libperf_init()	2019-09-25 09:51:49 -03:00
perf.h	perf time-utils: Adopt rdclock() from perf.h	2019-08-29 17:38:32 -03:00