IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
tldr; Just FYI, I'm carrying this on the perf tools tree.
Full explanation:
There used to be no copies, with tools/ code using kernel headers
directly. From time to time tools/perf/ broke due to legitimate kernel
hacking. At some point Linus complained about such direct usage. Then we
adopted the current model.
The way these headers are used in perf are not restricted to just
including them to compile something.
There are sometimes used in scripts that convert defines into string
tables, etc, so some change may break one of these scripts, or new MSRs
may use some different #define pattern, etc.
E.g.:
$ ls -1 tools/perf/trace/beauty/*.sh | head -5
tools/perf/trace/beauty/arch_errno_names.sh
tools/perf/trace/beauty/drm_ioctl.sh
tools/perf/trace/beauty/fadvise.sh
tools/perf/trace/beauty/fsconfig.sh
tools/perf/trace/beauty/fsmount.sh
$
$ tools/perf/trace/beauty/fadvise.sh
static const char *fadvise_advices[] = {
[0] = "NORMAL",
[1] = "RANDOM",
[2] = "SEQUENTIAL",
[3] = "WILLNEED",
[4] = "DONTNEED",
[5] = "NOREUSE",
};
$
The tools/perf/check-headers.sh script, part of the tools/ build
process, points out changes in the original files.
So its important not to touch the copies in tools/ when doing changes in
the original kernel headers, that will be done later, when
check-headers.sh inform about the change to the perf tools hackers.
Cc: netdev@vger.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20231121225650.390246-7-namhyung@kernel.org
Fix a build error on 32-bit system:
util/bpf_lock_contention.c: In function 'lock_contention_get_name':
util/bpf_lock_contention.c:253:50: error: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'u64 {aka long long unsigned int}' [-Werror=format=]
snprintf(name_buf, sizeof(name_buf), "cgroup:%lu", cgrp_id);
~~^
%llu
cc1: all warnings being treated as errors
Fixes: d0c502e46e ("perf lock contention: Prepare to handle cgroups")
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: avagin@google.com
Cc: daniel.diaz@linaro.org
Link: https://lore.kernel.org/r/20231118024858.1567039-3-yangjihong1@huawei.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
lkft reported a build error for 32-bit system:
builtin-kwork.c: In function 'top_print_work':
builtin-kwork.c:1646:28: error: format '%ld' expects argument of
type 'long int', but argument 3 has type 'u64' {aka 'long long
unsigned int'} [-Werror=format=]
1646 | ret += printf(" %*ld ", PRINT_PID_WIDTH, work->id);
| ~~~^ ~~~~~~~~
| | |
| long int u64
{aka long long unsigned int}
| %*lld
cc1: all warnings being treated as errors
make[3]: *** [/builds/linux/tools/build/Makefile.build:106:
/home/tuxbuild/.cache/tuxmake/builds/1/build/builtin-kwork.o] Error 1
Fix it.
Fixes: 55c40e5052 ("perf kwork top: Introduce new top utility")
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: avagin@google.com
Cc: daniel.diaz@linaro.org
Link: https://lore.kernel.org/r/20231118024858.1567039-2-yangjihong1@huawei.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Now it has a feature check for the dwarf_getcfi(), use it and convert
the code to check HAVE_DWARF_CFI_SUPPORT definition.
Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: linux-toolchains@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Link: https://lore.kernel.org/r/20231110000012.3538610-10-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The die_find_variable_by_reg() will search for a variable or a parameter
sub-DIE in the given scope DIE where the location matches to the given
register.
For the simplest and most common case, memory access usually happens
with a base register and an offset to the field so the register holds a
pointer in a variable or function parameter. Then we can find one if it
has a location expression at the (instruction) address. This function
only handles such a simple case for now.
In this case, the expression has a DW_OP_regN operation where N < 32.
If the register index (N) is greater than or equal to 32, DW_OP_regx
operation with an operand which saves the value for the N would be used.
It rejects expressions with more operations.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: linux-toolchains@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Link: https://lore.kernel.org/r/20231110000012.3538610-8-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The die_get_scopes() returns the number of enclosing DIEs for the given
address and it fills an array of DIEs like dwarf_getscopes(). But it
doesn't follow the abstract origin of inlined functions as we want
information of the concrete instance. This is needed to check the
location of parameters and local variables properly. Users can check
the origin separately if needed.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: linux-toolchains@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Link: https://lore.kernel.org/r/20231110000012.3538610-7-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It's a usual convention that the conditional code is handled in a header
file. As I'm planning to add some more of them, let's move the current
code to the header first.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: linux-toolchains@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Link: https://lore.kernel.org/r/20231110000012.3538610-6-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The die_get_typename() is to return a C-like type name from DWARF debug
entry and it follows data type if the target entry is a pointer type.
But I found that void pointers don't have the type attribute to follow
and then the function returns an error for that case. This results in a
broken type string for void pointer types.
For example, the following type entries are pointer types.
<1><48c>: Abbrev Number: 4 (DW_TAG_pointer_type)
<48d> DW_AT_byte_size : 8
<48d> DW_AT_type : <0x481>
<1><491>: Abbrev Number: 211 (DW_TAG_pointer_type)
<493> DW_AT_byte_size : 8
<1><494>: Abbrev Number: 4 (DW_TAG_pointer_type)
<495> DW_AT_byte_size : 8
<495> DW_AT_type : <0x49e>
The first one at offset 48c and the third one at offset 494 have type
information. Then they are pointer types for the referenced types. But
the second one at offset 491 doesn't have the type attribute.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: linux-toolchains@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Link: https://lore.kernel.org/r/20231110000012.3538610-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Split debuginfo data structure and related functions into a separate
file so that it can be used by other components than the probe-finder.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: linux-toolchains@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Link: https://lore.kernel.org/r/20231110000012.3538610-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Thoese two fields are used only for the jump_ops, so move them into the
union to save some bytes. Also add jump__delete() callback not to free
the fields as they didn't allocate new strings.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: WANG Rui <wangrui@loongson.cn>
Cc: linux-toolchains@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Link: https://lore.kernel.org/r/20231110000012.3538610-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The "-l" option is to print line numbers in the objdump output. perf
annotate TUI only can show the line numbers later but it causes big slow
downs for the kernel binary.
Similarly, showing source code also takes a long time and it already has
an option to control it.
$ time objdump ... -d -S -C vmlinux > /dev/null
real 0m3.474s
user 0m3.047s
sys 0m0.428s
$ time objdump ... -d -l -C vmlinux > /dev/null
real 0m1.796s
user 0m1.459s
sys 0m0.338s
$ time objdump ... -d -C vmlinux > /dev/null
real 0m0.051s
user 0m0.036s
sys 0m0.016s
As it's not needed for data type profiling, let's make it conditional so
that it can skip the unnecessary work.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: linux-toolchains@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Link: https://lore.kernel.org/r/20231110000012.3538610-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
x86 core PMU exposes supported maximum precision level via max_precise
PMU capability. Although, AMD core PMU does not support precise mode,
certain core PMU events with precise_ip > 0 are allowed and forwarded to
IBS OP PMU.
Display a note about this in the 'perf report' header output and
document the details in the perf-list man page.
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ross Zwisler <zwisler@chromium.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Santosh Shukla <santosh.shukla@amd.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Link: https://lore.kernel.org/r/20231107083331.901-2-ravi.bangoria@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
If BPF sideband events are disabled on the command line, don't
synthesize BPF events too.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Song Liu <song@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Dmitrii Dolgov <9erthalion6@gmail.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Li Dong <lidong@vivo.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Vincent Whitchurch <vincent.whitchurch@axis.com>
Cc: Wenyu Liu <liuwenyu7@huawei.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Link: https://lore.kernel.org/r/20231102175735.2272696-13-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
All of the other Perf subcommands that use objdump have an option to
specify the binary, so add the same option to 'perf test'.
This is useful if you have built the kernel with a different toolchain
to the system one, where the system objdump may fail to disassemble
vmlinux.
Now this can be fixed with something like this:
$ perf test --objdump llvm-objdump "object code reading"
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Fangrui Song <maskray@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Tom Rix <trix@redhat.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: llvm@lists.linux.dev
Link: https://lore.kernel.org/r/20231106151051.129440-2-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
On s390 using linux-next the test case:
87: perf record offcpu profiling tests
fails. The root cause is this command
# ./perf record --off-cpu -e dummy -- ./perf bench sched messaging -l 10
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run
Total time: 0.231 [sec]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.077 MB perf.data (401 samples) ]
#
It does not generate 800+ sample entries, on s390 usually around
40[1-9], sometimes a few more, but never more than 450. The higher the
number of CPUs the lower the number of samples.
Looking at function chain:
bench_sched_messaging()
+--> group()
the senders and receiver threads are created. The senders and receivers
call function ready() which writes one bytes and wait for a reply using
poll system() call.
As context switches are counted, the function ready() will trigger a
context switch when no input data is available after the write system
call. The write system call does not trigger context switches when the
data size is small. And writing 1000 bytes (10 iterations with
100 bytes) is not much and certainly won't block.
The 400+ context switch on s390 occur when the some receiver/sender
threads call ready() and wait for the response from function
bench_sched_messaging() being kicked off.
Lower the number of expected context switches to 400 to succeed on s390.
Suggested-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Co-developed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20231106091627.2022530-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
`python_ext_build` is the build directory for python.so, ignore it for
cleaner git status.
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20231030111438.1357962-2-yangjihong1@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
There is a spelling mistake, Please fix it.
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: zhaimingbing <zhaimingbing@cmss.chinamobile.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kernel-janitors@vger.kernel.org
Link: https://lore.kernel.org/r/20231030075825.3701-1-zhaimingbing@cmss.chinamobile.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The offsets array keeps pointers to 'struct annotation_line' entries
which are available in the 'struct annotated_source'. Let's move it to
there.
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20231103191907.54531-6-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Some fields in the 'struct annotation' are only used with 'struct
annotated_source' so better to be moved there in order to reduce memory
consumption for other symbols.
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20231103191907.54531-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The max_coverage field is only used when branch stack info is available
so it'd be natural to move to 'struct annotated_branch'.
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20231103191907.54531-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The cycles info is only meaningful when sample has branch stacks. To
save the memory for normal cases, move those fields to a new 'struct
annotated_branch' and dynamically allocate it when needed. Also move
cycles_hist from annotated_source as it's related here.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20231103191907.54531-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The cycles info is used only when branch stack is provided. Separate
them from 'struct annotation_line' into a separate struct and lazy
allocate them to save some memory.
Committer notes:
Make annotation__compute_ipc() check if the lazy allocation works,
bailing out if so, its callers already do error checking and
propagation.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20231103191907.54531-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It tries cycles (or cpu-clock on s390) event with exclude_kernel bit to
open. But other arch on a VM can fail with the hardware event and need
to fallback to the software event in the same way.
So let's get rid of the cpuid check and use generic fallback mechanism
using an array of event candidates. Now event in the odd index excludes
the kernel so use that for the return value.
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: James Clark <james.clark@arm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Link: https://lore.kernel.org/r/20231103195541.67788-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
'struct thread' values hold onto references to mmaps, DSOs, etc. When a
thread exits it is necessary to clean all of this memory up by removing
the thread from the machine's threads. Some tools require this doesn't
happen, such as auxtrace events, 'perf report' if offcpu events exist or
if a task list is being generated, so add a 'struct symbol_conf' member
to make the behavior optional. When an exited thread is left in the
machine's threads, mark it as exited.
This change relates to commit 40826c45eb ("perf thread: Remove
notion of dead threads") . Dead threads were removed as they had a
reference count of 0 and were difficult to reason about with the
reference count checker. Here a thread is removed from threads when it
exits, unless via symbol_conf the exited thread isn't remove and is
marked as exited. Reference counting behaves as it normally does.
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Dmitrii Dolgov <9erthalion6@gmail.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Li Dong <lidong@vivo.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Vincent Whitchurch <vincent.whitchurch@axis.com>
Cc: Wenyu Liu <liuwenyu7@huawei.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Link: https://lore.kernel.org/r/20231102175735.2272696-6-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Commit 5b7ba82a75 ("perf symbols: Load kernel maps before using")
changed it so that loading a kernel DSO would cause the symbols for the
DSO to be eagerly loaded.
For 'perf record' this is overhead as the symbols won't be used. Add a
field to 'struct symbol_conf' to control the behavior and disable it for
'perf record' and 'perf inject'.
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Dmitrii Dolgov <9erthalion6@gmail.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Li Dong <lidong@vivo.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Vincent Whitchurch <vincent.whitchurch@axis.com>
Cc: Wenyu Liu <liuwenyu7@huawei.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Link: https://lore.kernel.org/r/20231102175735.2272696-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
There are spelling mistakes in comments and a pr_debug message. Fix them.
Reviewed-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kernel-janitors@vger.kernel.org
Link: https://lore.kernel.org/r/20231003074911.220216-1-colin.i.king@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add a new branch filter, "counter", for the branch counter option. It is
used to mark the events which should be logged in the branch. If it is
applied with the -j option, the counters of all the events should be
logged in the branch. If the legacy kernel doesn't support the new
branch sample type, switching off the branch counter filter.
The stored counter values in each branch are displayed right after the
regular branch stack information via perf report -D.
Usage examples:
# perf record -e "{branch-instructions,branch-misses}:S" -j any,counter
Only the first event, branch-instructions, collect the LBR. Both
branch-instructions and branch-misses are marked as logged events. The
occurrences information of them can be found in the branch stack
extension space of each branch.
# perf record -e "{cpu/branch-instructions,branch_type=any/,cpu/branch-misses,branch_type=counter/}"
Only the first event, branch-instructions, collect the LBR. Only the
branch-misses event is marked as a logged event.
Committer notes:
I noticed 'perf test "Sample parsing"' failing, reported to the list and
Kan provided a patch that checks if the evsel has a leader and that
evsel->evlist is set, the comment in the source code further explains
it.
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tinghao Zhang <tinghao.zhang@intel.com>
Link: https://lore.kernel.org/r/20231025201626.3000228-8-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To support the branch counters feature, the information of the maximum
number of supported counters and the width of the counters is exposed in
the sysfs caps folder. The perf tool can use the information to parse
the logged counters in each branch.
Store the information in the perf_env for later usage.
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tinghao Zhang <tinghao.zhang@intel.com>
Link: https://lore.kernel.org/r/20231025201626.3000228-7-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Build
-----
* Compile BPF programs by default if clang (>= 12.0.1) is available to
enable more features like kernel lock contention, off-cpu profiling,
kwork, sample filtering and so on. It can be disabled by passing
BUILD_BPF_SKEL=0 to make.
* Produce better error messages for bison on debug build (make DEBUG=1)
by defining YYDEBUG symbol internally.
perf record
-----------
* Track sideband events (like FORK/MMAP) from all CPUs even if perf record
targets a subset of CPUs only (using -C option). Otherwise it may lose
some information happened on a CPU out of the target list.
* Fix checking raw sched_switch tracepoint argument using system BTF.
This affects off-cpu profiling which attaches a BPF program to the raw
tracepoint.
perf lock contention
--------------------
* Add --lock-cgroup option to see contention by cgroups. This should be
used with BPF only (using -b option).
$ sudo perf lock con -ab --lock-cgroup -- sleep 1
contended total wait max wait avg wait cgroup
835 14.06 ms 41.19 us 16.83 us /system.slice/led.service
25 122.38 us 13.77 us 4.89 us /
44 23.73 us 3.87 us 539 ns /user.slice/user-657345.slice/session-c4.scope
1 491 ns 491 ns 491 ns /system.slice/connectd.service
* Add -G/--cgroup-filter option to see contention only for given cgroups.
This can be useful when you identified a cgroup in the above command and
want to investigate more on it. It also works with other output options
like -t/--threads and -l/--lock-addr.
$ sudo perf lock con -ab -G /user.slice/user-657345.slice/session-c4.scope -- sleep 1
contended total wait max wait avg wait type caller
8 77.11 us 17.98 us 9.64 us spinlock futex_wake+0xc8
2 24.56 us 14.66 us 12.28 us spinlock tick_do_update_jiffies64+0x25
1 4.97 us 4.97 us 4.97 us spinlock futex_q_lock+0x2a
* Use per-cpu array for better spinlock tracking. This is to improve
performance of the BPF program and to avoid nested contention on a lock
in the BPF hash map.
* Update callstack check for PowerPC. To find a representative caller of a
lock, it needs to look up the call stacks. It ends the lookup when it sees
0 in the call stack buffer. However, PowerPC call stacks can have 0 values
in the beginning so skip them when it expects valid call stacks after.
perf kwork
----------
* Support 'sched' class (for -k option) so that it can see task scheduling
event (using sched_switch tracepoint) as well as irq and workqueue items.
* Add perf kwork top subcommand to show more accurate cpu utilization with
sched class above. It works both with a recorded data (using perf kwork
record command) and BPF (using -b option). Unlike perf top command, it
does not support interactive mode (yet).
$ sudo perf kwork top -b -k sched
Starting trace, Hit <Ctrl+C> to stop and report
^C
Total : 160702.425 ms, 8 cpus
%Cpu(s): 36.00% id, 0.00% hi, 0.00% si
%Cpu0 [|||||||||||||||||| 61.66%]
%Cpu1 [|||||||||||||||||| 61.27%]
%Cpu2 [||||||||||||||||||| 66.40%]
%Cpu3 [|||||||||||||||||| 61.28%]
%Cpu4 [|||||||||||||||||| 61.82%]
%Cpu5 [||||||||||||||||||||||| 77.41%]
%Cpu6 [|||||||||||||||||| 61.73%]
%Cpu7 [|||||||||||||||||| 63.25%]
PID SPID %CPU RUNTIME COMMMAND
-------------------------------------------------------------
0 0 38.72 8089.463 ms [swapper/1]
0 0 38.71 8084.547 ms [swapper/3]
0 0 38.33 8007.532 ms [swapper/0]
0 0 38.26 7992.985 ms [swapper/6]
0 0 38.17 7971.865 ms [swapper/4]
0 0 36.74 7447.765 ms [swapper/7]
0 0 33.59 6486.942 ms [swapper/2]
0 0 22.58 3771.268 ms [swapper/5]
9545 9351 2.48 447.136 ms sched-messaging
9574 9351 2.09 418.583 ms sched-messaging
9724 9351 2.05 372.407 ms sched-messaging
9531 9351 2.01 368.804 ms sched-messaging
9512 9351 2.00 362.250 ms sched-messaging
9514 9351 1.95 357.767 ms sched-messaging
9538 9351 1.86 384.476 ms sched-messaging
9712 9351 1.84 386.490 ms sched-messaging
9723 9351 1.83 380.021 ms sched-messaging
9722 9351 1.82 382.738 ms sched-messaging
9517 9351 1.81 354.794 ms sched-messaging
9559 9351 1.79 344.305 ms sched-messaging
9725 9351 1.77 365.315 ms sched-messaging
<SNIP>
* Add hard/soft-irq statistics to perf kwork top. This will show the
total CPU utilization with IRQ stats like below:
$ sudo perf kwork top -b -k sched,irq,softirq
Starting trace, Hit <Ctrl+C> to stop and report
^C
Total : 12554.889 ms, 8 cpus
%Cpu(s): 96.23% id, 0.10% hi, 0.19% si <---- here
%Cpu0 [| 4.60%]
%Cpu1 [| 4.59%]
%Cpu2 [ 2.73%]
%Cpu3 [| 3.81%]
<SNIP>
perf bench
----------
* Add -G/--cgroups option to perf bench sched pipe. The pipe bench is
good to measure context switch overhead. With this option, it puts
the reader and writer tasks in separate cgroups to enforce context
switch between two different cgroups.
Also it needs to set CPU affinity of the tasks in a CPU to accurately
measure the impact of cgroup context switches.
$ sudo perf stat -e context-switches,cgroup-switches -- \
> taskset -c 0 perf bench sched pipe -l 100000
# Running 'sched/pipe' benchmark:
# Executed 100000 pipe operations between two processes
Total time: 0.307 [sec]
3.078180 usecs/op
324867 ops/sec
Performance counter stats for 'taskset -c 0 perf bench sched pipe -l 100000':
200,026 context-switches
63 cgroup-switches
0.321637922 seconds time elapsed
You can see small number of cgroup-switches because both write and read
tasks are in the same cgroup.
$ sudo mkdir /sys/fs/cgroup/{AAA,BBB}
$ sudo perf stat -e context-switches,cgroup-switches -- \
> taskset -c 0 perf bench sched pipe -l 100000 -G AAA,BBB
# Running 'sched/pipe' benchmark:
# Executed 100000 pipe operations between two processes
Total time: 0.351 [sec]
3.512990 usecs/op
284657 ops/sec
Performance counter stats for 'taskset -c 0 perf bench sched pipe -l 100000 -G AAA,BBB':
200,020 context-switches
200,019 cgroup-switches
0.365034567 seconds time elapsed
Now context-switches and cgroup-switches are almost same. And you can
see the pipe operation took little more.
* Kill child processes when perf bench sched messaging exited abnormally.
Otherwise it'd leave the child doing unnecessary work.
perf test
---------
* Fix various shellcheck issues on the tests written in shell script.
* Skip tests when condition is not satisfied:
- object code reading test for non-text section addresses.
- CoreSight test if cs_etm// event is not available.
- lock contention test if not enough CPUs.
Event parsing
-------------
* Make PMU alias name loading lazy to reduce the startup time in the
event parsing code for perf record, stat and others in the general
case.
* Lazily compute PMU default config. In the same sense, delay PMU
initialization until it's really needed to reduce the startup cost.
* Fix event term values that are raw events. The event specification
can have several terms including event name. But sometimes it clashes
with raw event encoding which starts with 'r' and has hex-digits.
For example, an event named 'read' should be processed as a normal
event but it was mis-treated as a raw encoding and caused a failure.
$ perf stat -e 'uncore_imc_free_running/event=read/' -a sleep 1
event syntax error: '..nning/event=read/'
\___ parser error
Run 'perf list' for a list of valid events
Usage: perf stat [<options>] [<command>]
-e, --event <event> event selector. use 'perf list' to list available events
Event metrics
-------------
* Add "Compat" regex to match event with multiple identifiers.
* Usual updates for Intel, Power10, Arm telemetry/CMN and AmpereOne.
Misc
----
* Assorted memory leak fixes and footprint reduction.
* Add "bpf_skeletons" to perf version --build-options so that users can
check whether their perf tools have BPF support easily.
* Fix unaligned access in Intel-PT packet decoder found by undefined-behavior
sanitizer.
* Avoid frequency mode for the dummy event. Surprisingly it'd impact
kernel timer tick handler performance by force iterating all PMU events.
* Update bash shell completion for events and metrics.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQSo2x5BnqMqsoHtzsmMstVUGiXMgwUCZUMg7wAKCRCMstVUGiXM
g8FvAQC9KED6H8rlH7UTvxE6fM947EJbldwGrNA1zGx++Ucd3gD/ewA2A6SUcIh6
Tua/XovmYOQbuDYOwlRHe+sdDag0sgg=
=GrCE
-----END PGP SIGNATURE-----
Merge tag 'perf-tools-for-v6.7-1-2023-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tools updates from Namhyung Kim:
"Build:
- Compile BPF programs by default if clang (>= 12.0.1) is available
to enable more features like kernel lock contention, off-cpu
profiling, kwork, sample filtering and so on.
This can be disabled by passing BUILD_BPF_SKEL=0 to make.
- Produce better error messages for bison on debug build (make
DEBUG=1) by defining YYDEBUG symbol internally.
perf record:
- Track sideband events (like FORK/MMAP) from all CPUs even if perf
record targets a subset of CPUs only (using -C option). Otherwise
it may lose some information happened on a CPU out of the target
list.
- Fix checking raw sched_switch tracepoint argument using system BTF.
This affects off-cpu profiling which attaches a BPF program to the
raw tracepoint.
perf lock contention:
- Add --lock-cgroup option to see contention by cgroups. This should
be used with BPF only (using -b option).
$ sudo perf lock con -ab --lock-cgroup -- sleep 1
contended total wait max wait avg wait cgroup
835 14.06 ms 41.19 us 16.83 us /system.slice/led.service
25 122.38 us 13.77 us 4.89 us /
44 23.73 us 3.87 us 539 ns /user.slice/user-657345.slice/session-c4.scope
1 491 ns 491 ns 491 ns /system.slice/connectd.service
- Add -G/--cgroup-filter option to see contention only for given
cgroups.
This can be useful when you identified a cgroup in the above
command and want to investigate more on it. It also works with
other output options like -t/--threads and -l/--lock-addr.
$ sudo perf lock con -ab -G /user.slice/user-657345.slice/session-c4.scope -- sleep 1
contended total wait max wait avg wait type caller
8 77.11 us 17.98 us 9.64 us spinlock futex_wake+0xc8
2 24.56 us 14.66 us 12.28 us spinlock tick_do_update_jiffies64+0x25
1 4.97 us 4.97 us 4.97 us spinlock futex_q_lock+0x2a
- Use per-cpu array for better spinlock tracking. This is to improve
performance of the BPF program and to avoid nested contention on a
lock in the BPF hash map.
- Update callstack check for PowerPC. To find a representative caller
of a lock, it needs to look up the call stacks. It ends the lookup
when it sees 0 in the call stack buffer. However, PowerPC call
stacks can have 0 values in the beginning so skip them when it
expects valid call stacks after.
perf kwork:
- Support 'sched' class (for -k option) so that it can see task
scheduling event (using sched_switch tracepoint) as well as irq and
workqueue items.
- Add perf kwork top subcommand to show more accurate cpu utilization
with sched class above. It works both with a recorded data (using
perf kwork record command) and BPF (using -b option). Unlike perf
top command, it does not support interactive mode (yet).
$ sudo perf kwork top -b -k sched
Starting trace, Hit <Ctrl+C> to stop and report
^C
Total : 160702.425 ms, 8 cpus
%Cpu(s): 36.00% id, 0.00% hi, 0.00% si
%Cpu0 [|||||||||||||||||| 61.66%]
%Cpu1 [|||||||||||||||||| 61.27%]
%Cpu2 [||||||||||||||||||| 66.40%]
%Cpu3 [|||||||||||||||||| 61.28%]
%Cpu4 [|||||||||||||||||| 61.82%]
%Cpu5 [||||||||||||||||||||||| 77.41%]
%Cpu6 [|||||||||||||||||| 61.73%]
%Cpu7 [|||||||||||||||||| 63.25%]
PID SPID %CPU RUNTIME COMMMAND
-------------------------------------------------------------
0 0 38.72 8089.463 ms [swapper/1]
0 0 38.71 8084.547 ms [swapper/3]
0 0 38.33 8007.532 ms [swapper/0]
0 0 38.26 7992.985 ms [swapper/6]
0 0 38.17 7971.865 ms [swapper/4]
0 0 36.74 7447.765 ms [swapper/7]
0 0 33.59 6486.942 ms [swapper/2]
0 0 22.58 3771.268 ms [swapper/5]
9545 9351 2.48 447.136 ms sched-messaging
9574 9351 2.09 418.583 ms sched-messaging
9724 9351 2.05 372.407 ms sched-messaging
9531 9351 2.01 368.804 ms sched-messaging
9512 9351 2.00 362.250 ms sched-messaging
9514 9351 1.95 357.767 ms sched-messaging
9538 9351 1.86 384.476 ms sched-messaging
9712 9351 1.84 386.490 ms sched-messaging
9723 9351 1.83 380.021 ms sched-messaging
9722 9351 1.82 382.738 ms sched-messaging
9517 9351 1.81 354.794 ms sched-messaging
9559 9351 1.79 344.305 ms sched-messaging
9725 9351 1.77 365.315 ms sched-messaging
<SNIP>
- Add hard/soft-irq statistics to perf kwork top. This will show the
total CPU utilization with IRQ stats like below:
$ sudo perf kwork top -b -k sched,irq,softirq
Starting trace, Hit <Ctrl+C> to stop and report
^C
Total : 12554.889 ms, 8 cpus
%Cpu(s): 96.23% id, 0.10% hi, 0.19% si <---- here
%Cpu0 [| 4.60%]
%Cpu1 [| 4.59%]
%Cpu2 [ 2.73%]
%Cpu3 [| 3.81%]
<SNIP>
perf bench:
- Add -G/--cgroups option to perf bench sched pipe. The pipe bench is
good to measure context switch overhead. With this option, it puts
the reader and writer tasks in separate cgroups to enforce context
switch between two different cgroups.
Also it needs to set CPU affinity of the tasks in a CPU to
accurately measure the impact of cgroup context switches.
$ sudo perf stat -e context-switches,cgroup-switches -- \
> taskset -c 0 perf bench sched pipe -l 100000
# Running 'sched/pipe' benchmark:
# Executed 100000 pipe operations between two processes
Total time: 0.307 [sec]
3.078180 usecs/op
324867 ops/sec
Performance counter stats for 'taskset -c 0 perf bench sched pipe -l 100000':
200,026 context-switches
63 cgroup-switches
0.321637922 seconds time elapsed
You can see small number of cgroup-switches because both write and
read tasks are in the same cgroup.
$ sudo mkdir /sys/fs/cgroup/{AAA,BBB}
$ sudo perf stat -e context-switches,cgroup-switches -- \
> taskset -c 0 perf bench sched pipe -l 100000 -G AAA,BBB
# Running 'sched/pipe' benchmark:
# Executed 100000 pipe operations between two processes
Total time: 0.351 [sec]
3.512990 usecs/op
284657 ops/sec
Performance counter stats for 'taskset -c 0 perf bench sched pipe -l 100000 -G AAA,BBB':
200,020 context-switches
200,019 cgroup-switches
0.365034567 seconds time elapsed
Now context-switches and cgroup-switches are almost same. And you
can see the pipe operation took little more.
- Kill child processes when perf bench sched messaging exited
abnormally. Otherwise it'd leave the child doing unnecessary work.
perf test:
- Fix various shellcheck issues on the tests written in shell script.
- Skip tests when condition is not satisfied:
- object code reading test for non-text section addresses.
- CoreSight test if cs_etm// event is not available.
- lock contention test if not enough CPUs.
Event parsing:
- Make PMU alias name loading lazy to reduce the startup time in the
event parsing code for perf record, stat and others in the general
case.
- Lazily compute PMU default config. In the same sense, delay PMU
initialization until it's really needed to reduce the startup cost.
- Fix event term values that are raw events. The event specification
can have several terms including event name. But sometimes it
clashes with raw event encoding which starts with 'r' and has
hex-digits.
For example, an event named 'read' should be processed as a normal
event but it was mis-treated as a raw encoding and caused a
failure.
$ perf stat -e 'uncore_imc_free_running/event=read/' -a sleep 1
event syntax error: '..nning/event=read/'
\___ parser error
Run 'perf list' for a list of valid events
Usage: perf stat [<options>] [<command>]
-e, --event <event> event selector. use 'perf list' to list available events
Event metrics:
- Add "Compat" regex to match event with multiple identifiers.
- Usual updates for Intel, Power10, Arm telemetry/CMN and AmpereOne.
Misc:
- Assorted memory leak fixes and footprint reduction.
- Add "bpf_skeletons" to perf version --build-options so that users
can check whether their perf tools have BPF support easily.
- Fix unaligned access in Intel-PT packet decoder found by
undefined-behavior sanitizer.
- Avoid frequency mode for the dummy event. Surprisingly it'd impact
kernel timer tick handler performance by force iterating all PMU
events.
- Update bash shell completion for events and metrics"
* tag 'perf-tools-for-v6.7-1-2023-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (187 commits)
perf vendor events intel: Update tsx_cycles_per_elision metrics
perf vendor events intel: Update bonnell version number to v5
perf vendor events intel: Update westmereex events to v4
perf vendor events intel: Update meteorlake events to v1.06
perf vendor events intel: Update knightslanding events to v16
perf vendor events intel: Add typo fix for ivybridge FP
perf vendor events intel: Update a spelling in haswell/haswellx
perf vendor events intel: Update emeraldrapids to v1.01
perf vendor events intel: Update alderlake/alderlake events to v1.23
perf build: Disable BPF skeletons if clang version is < 12.0.1
perf callchain: Fix spelling mistake "statisitcs" -> "statistics"
perf report: Fix spelling mistake "heirachy" -> "hierarchy"
perf python: Fix binding linkage due to rename and move of evsel__increase_rlimit()
perf tests: test_arm_coresight: Simplify source iteration
perf vendor events intel: Add tigerlake two metrics
perf vendor events intel: Add broadwellde two metrics
perf vendor events intel: Fix broadwellde tma_info_system_dram_bw_use metric
perf mem_info: Add and use map_symbol__exit and addr_map_symbol__exit
perf callchain: Minor layout changes to callchain_list
perf callchain: Make brtype_stat in callchain_list optional
...
As libelf is a requirement for libbpf if it is not available, as in some
container build tests where NO_LIBELF=1 is used, then better warn about
the most basic library first.
Ditto for libz, check its availability before libbpf too.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZUEehyDk0FkPnvMR@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
One last case, caught while testing with amazonlinux:2, centos:stream,
etc:
4 7.28 amazonlinux:2 : FAIL egrep: warning: egrep is obsolescent; using grep -E
gcc version 7.3.1 20180712 (Red Hat 7.3.1-17) (GCC)
8 13.87 centos:stream : FAIL egrep: warning: egrep is obsolescent; using grep -E
Reviewed-by: Guilherme Amadio <amadio@gentoo.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZUEdtblE8qDAQkBK@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* Generalized infrastructure for 'writable' ID registers, effectively
allowing userspace to opt-out of certain vCPU features for its guest
* Optimization for vSGI injection, opportunistically compressing MPIDR
to vCPU mapping into a table
* Improvements to KVM's PMU emulation, allowing userspace to select
the number of PMCs available to a VM
* Guest support for memory operation instructions (FEAT_MOPS)
* Cleanups to handling feature flags in KVM_ARM_VCPU_INIT, squashing
bugs and getting rid of useless code
* Changes to the way the SMCCC filter is constructed, avoiding wasted
memory allocations when not in use
* Load the stage-2 MMU context at vcpu_load() for VHE systems, reducing
the overhead of errata mitigations
* Miscellaneous kernel and selftest fixes
LoongArch:
* New architecture. The hardware uses the same model as x86, s390
and RISC-V, where guest/host mode is orthogonal to supervisor/user
mode. The virtualization extensions are very similar to MIPS,
therefore the code also has some similarities but it's been cleaned
up to avoid some of the historical bogosities that are found in
arch/mips. The kernel emulates MMU, timer and CSR accesses, while
interrupt controllers are only emulated in userspace, at least for
now.
RISC-V:
* Support for the Smstateen and Zicond extensions
* Support for virtualizing senvcfg
* Support for virtualized SBI debug console (DBCN)
S390:
* Nested page table management can be monitored through tracepoints
and statistics
x86:
* Fix incorrect handling of VMX posted interrupt descriptor in KVM_SET_LAPIC,
which could result in a dropped timer IRQ
* Avoid WARN on systems with Intel IPI virtualization
* Add CONFIG_KVM_MAX_NR_VCPUS, to allow supporting up to 4096 vCPUs without
forcing more common use cases to eat the extra memory overhead.
* Add virtualization support for AMD SRSO mitigation (IBPB_BRTYPE and
SBPB, aka Selective Branch Predictor Barrier).
* Fix a bug where restoring a vCPU snapshot that was taken within 1 second of
creating the original vCPU would cause KVM to try to synchronize the vCPU's
TSC and thus clobber the correct TSC being set by userspace.
* Compute guest wall clock using a single TSC read to avoid generating an
inaccurate time, e.g. if the vCPU is preempted between multiple TSC reads.
* "Virtualize" HWCR.TscFreqSel to make Linux guests happy, which complain
about a "Firmware Bug" if the bit isn't set for select F/M/S combos.
Likewise "virtualize" (ignore) MSR_AMD64_TW_CFG to appease Windows Server
2022.
* Don't apply side effects to Hyper-V's synthetic timer on writes from
userspace to fix an issue where the auto-enable behavior can trigger
spurious interrupts, i.e. do auto-enabling only for guest writes.
* Remove an unnecessary kick of all vCPUs when synchronizing the dirty log
without PML enabled.
* Advertise "support" for non-serializing FS/GS base MSR writes as appropriate.
* Harden the fast page fault path to guard against encountering an invalid
root when walking SPTEs.
* Omit "struct kvm_vcpu_xen" entirely when CONFIG_KVM_XEN=n.
* Use the fast path directly from the timer callback when delivering Xen
timer events, instead of waiting for the next iteration of the run loop.
This was not done so far because previously proposed code had races,
but now care is taken to stop the hrtimer at critical points such as
restarting the timer or saving the timer information for userspace.
* Follow the lead of upstream Xen and ignore the VCPU_SSHOTTMR_future flag.
* Optimize injection of PMU interrupts that are simultaneous with NMIs.
* Usual handful of fixes for typos and other warts.
x86 - MTRR/PAT fixes and optimizations:
* Clean up code that deals with honoring guest MTRRs when the VM has
non-coherent DMA and host MTRRs are ignored, i.e. EPT is enabled.
* Zap EPT entries when non-coherent DMA assignment stops/start to prevent
using stale entries with the wrong memtype.
* Don't ignore guest PAT for CR0.CD=1 && KVM_X86_QUIRK_CD_NW_CLEARED=y.
This was done as a workaround for virtual machine BIOSes that did not
bother to clear CR0.CD (because ancient KVM/QEMU did not bother to
set it, in turn), and there's zero reason to extend the quirk to
also ignore guest PAT.
x86 - SEV fixes:
* Report KVM_EXIT_SHUTDOWN instead of EINVAL if KVM intercepts SHUTDOWN while
running an SEV-ES guest.
* Clean up the recognition of emulation failures on SEV guests, when KVM would
like to "skip" the instruction but it had already been partially emulated.
This makes it possible to drop a hack that second guessed the (insufficient)
information provided by the emulator, and just do the right thing.
Documentation:
* Various updates and fixes, mostly for x86
* MTRR and PAT fixes and optimizations:
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmVBZc0UHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroP1LQf+NgsmZ1lkGQlKdSdijoQ856w+k0or
l2SV1wUwiEdFPSGK+RTUlHV5Y1ni1dn/CqCVIJZKEI3ZtZ1m9/4HKIRXvbMwFHIH
hx+E4Lnf8YUjsGjKTLd531UKcpphztZavQ6pXLEwazkSkDEra+JIKtooI8uU+9/p
bd/eF1V+13a8CHQf1iNztFJVxqBJbVlnPx4cZDRQQvewskIDGnVDtwbrwCUKGtzD
eNSzhY7si6O2kdQNkuA8xPhg29dYX9XLaCK2K1l8xOUm8WipLdtF86GAKJ5BVuOL
6ek/2QCYjZ7a+coAZNfgSEUi8JmFHEqCo7cnKmWzPJp+2zyXsdudqAhT1g==
=UIxm
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"ARM:
- Generalized infrastructure for 'writable' ID registers, effectively
allowing userspace to opt-out of certain vCPU features for its
guest
- Optimization for vSGI injection, opportunistically compressing
MPIDR to vCPU mapping into a table
- Improvements to KVM's PMU emulation, allowing userspace to select
the number of PMCs available to a VM
- Guest support for memory operation instructions (FEAT_MOPS)
- Cleanups to handling feature flags in KVM_ARM_VCPU_INIT, squashing
bugs and getting rid of useless code
- Changes to the way the SMCCC filter is constructed, avoiding wasted
memory allocations when not in use
- Load the stage-2 MMU context at vcpu_load() for VHE systems,
reducing the overhead of errata mitigations
- Miscellaneous kernel and selftest fixes
LoongArch:
- New architecture for kvm.
The hardware uses the same model as x86, s390 and RISC-V, where
guest/host mode is orthogonal to supervisor/user mode. The
virtualization extensions are very similar to MIPS, therefore the
code also has some similarities but it's been cleaned up to avoid
some of the historical bogosities that are found in arch/mips. The
kernel emulates MMU, timer and CSR accesses, while interrupt
controllers are only emulated in userspace, at least for now.
RISC-V:
- Support for the Smstateen and Zicond extensions
- Support for virtualizing senvcfg
- Support for virtualized SBI debug console (DBCN)
S390:
- Nested page table management can be monitored through tracepoints
and statistics
x86:
- Fix incorrect handling of VMX posted interrupt descriptor in
KVM_SET_LAPIC, which could result in a dropped timer IRQ
- Avoid WARN on systems with Intel IPI virtualization
- Add CONFIG_KVM_MAX_NR_VCPUS, to allow supporting up to 4096 vCPUs
without forcing more common use cases to eat the extra memory
overhead.
- Add virtualization support for AMD SRSO mitigation (IBPB_BRTYPE and
SBPB, aka Selective Branch Predictor Barrier).
- Fix a bug where restoring a vCPU snapshot that was taken within 1
second of creating the original vCPU would cause KVM to try to
synchronize the vCPU's TSC and thus clobber the correct TSC being
set by userspace.
- Compute guest wall clock using a single TSC read to avoid
generating an inaccurate time, e.g. if the vCPU is preempted
between multiple TSC reads.
- "Virtualize" HWCR.TscFreqSel to make Linux guests happy, which
complain about a "Firmware Bug" if the bit isn't set for select
F/M/S combos. Likewise "virtualize" (ignore) MSR_AMD64_TW_CFG to
appease Windows Server 2022.
- Don't apply side effects to Hyper-V's synthetic timer on writes
from userspace to fix an issue where the auto-enable behavior can
trigger spurious interrupts, i.e. do auto-enabling only for guest
writes.
- Remove an unnecessary kick of all vCPUs when synchronizing the
dirty log without PML enabled.
- Advertise "support" for non-serializing FS/GS base MSR writes as
appropriate.
- Harden the fast page fault path to guard against encountering an
invalid root when walking SPTEs.
- Omit "struct kvm_vcpu_xen" entirely when CONFIG_KVM_XEN=n.
- Use the fast path directly from the timer callback when delivering
Xen timer events, instead of waiting for the next iteration of the
run loop. This was not done so far because previously proposed code
had races, but now care is taken to stop the hrtimer at critical
points such as restarting the timer or saving the timer information
for userspace.
- Follow the lead of upstream Xen and ignore the VCPU_SSHOTTMR_future
flag.
- Optimize injection of PMU interrupts that are simultaneous with
NMIs.
- Usual handful of fixes for typos and other warts.
x86 - MTRR/PAT fixes and optimizations:
- Clean up code that deals with honoring guest MTRRs when the VM has
non-coherent DMA and host MTRRs are ignored, i.e. EPT is enabled.
- Zap EPT entries when non-coherent DMA assignment stops/start to
prevent using stale entries with the wrong memtype.
- Don't ignore guest PAT for CR0.CD=1 && KVM_X86_QUIRK_CD_NW_CLEARED=y
This was done as a workaround for virtual machine BIOSes that did
not bother to clear CR0.CD (because ancient KVM/QEMU did not bother
to set it, in turn), and there's zero reason to extend the quirk to
also ignore guest PAT.
x86 - SEV fixes:
- Report KVM_EXIT_SHUTDOWN instead of EINVAL if KVM intercepts
SHUTDOWN while running an SEV-ES guest.
- Clean up the recognition of emulation failures on SEV guests, when
KVM would like to "skip" the instruction but it had already been
partially emulated. This makes it possible to drop a hack that
second guessed the (insufficient) information provided by the
emulator, and just do the right thing.
Documentation:
- Various updates and fixes, mostly for x86
- MTRR and PAT fixes and optimizations"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (164 commits)
KVM: selftests: Avoid using forced target for generating arm64 headers
tools headers arm64: Fix references to top srcdir in Makefile
KVM: arm64: Add tracepoint for MMIO accesses where ISV==0
KVM: arm64: selftest: Perform ISB before reading PAR_EL1
KVM: arm64: selftest: Add the missing .guest_prepare()
KVM: arm64: Always invalidate TLB for stage-2 permission faults
KVM: x86: Service NMI requests after PMI requests in VM-Enter path
KVM: arm64: Handle AArch32 SPSR_{irq,abt,und,fiq} as RAZ/WI
KVM: arm64: Do not let a L1 hypervisor access the *32_EL2 sysregs
KVM: arm64: Refine _EL2 system register list that require trap reinjection
arm64: Add missing _EL2 encodings
arm64: Add missing _EL12 encodings
KVM: selftests: aarch64: vPMU test for validating user accesses
KVM: selftests: aarch64: vPMU register test for unimplemented counters
KVM: selftests: aarch64: vPMU register test for implemented counters
KVM: selftests: aarch64: Introduce vpmu_counter_access test
tools: Import arm_pmuv3.h
KVM: arm64: PMU: Allow userspace to limit PMCR_EL0.N for the guest
KVM: arm64: Sanitize PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR} before first run
KVM: arm64: Add {get,set}_user for PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR}
...
The ia64 architecture gets its well-earned retirement as planned,
now that there is one last (mostly) working release that will
be maintained as an LTS kernel.
The architecture specific system call tables are updated for
the added map_shadow_stack() syscall and to remove references
to the long-gone sys_lookup_dcookie() syscall.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmVC40IACgkQYKtH/8kJ
Uidhmw/9EX+aWSXGoObJ3fngaNSMw+PmrEuP8qEKBHxfKHcCdX3hc451Oh4GlhaQ
tru91pPwgNvN2/rfoKusxT+V4PemGIzfNni/04rp+P0kvmdw5otQ2yNhsQNsfVmq
XGWvkxF4P2GO6bkjjfR/1dDq7GtlyXtwwPDKeLbYb6TnJOZjtx+EAN27kkfSn1Ms
R4Sa3zJ+DfHUmHL5S9g+7UD/CZ5GfKNmIskI4Mz5GsfoUz/0iiU+Bge/9sdcdSJQ
kmbLy5YnVzfooLZ3TQmBFsO3iAMWb0s/mDdtyhqhTVmTUshLolkPYyKnPFvdupyv
shXcpEST2XJNeaDRnL2K4zSCdxdbnCZHDpjfl9wfioBg7I8NfhXKpf1jYZHH1de4
LXq8ndEFEOVQw/zSpYWfQq1sux8Jiqr+UK/ukbVeFWiGGIUs91gEWtPAf8T0AZo9
ujkJvaWGl98O1g5wmBu0/dAR6QcFJMDfVwbmlIFpU8O+MEaz6X8mM+O5/T0IyTcD
eMbAUjj4uYcU7ihKzHEv/0SS9Of38kzff67CLN5k8wOP/9NlaGZ78o1bVle9b52A
BdhrsAefFiWHp1jT6Y9Rg4HOO/TguQ9e6EWSKOYFulsiLH9LEFaB9RwZLeLytV0W
vlAgY9rUW77g1OJcb7DoNv33nRFuxsKqsnz3DEIXtgozo9CzbYI=
=H1vH
-----END PGP SIGNATURE-----
Merge tag 'asm-generic-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull ia64 removal and asm-generic updates from Arnd Bergmann:
- The ia64 architecture gets its well-earned retirement as planned,
now that there is one last (mostly) working release that will be
maintained as an LTS kernel.
- The architecture specific system call tables are updated for the
added map_shadow_stack() syscall and to remove references to the
long-gone sys_lookup_dcookie() syscall.
* tag 'asm-generic-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
hexagon: Remove unusable symbols from the ptrace.h uapi
asm-generic: Fix spelling of architecture
arch: Reserve map_shadow_stack() syscall number for all architectures
syscalls: Cleanup references to sys_lookup_dcookie()
Documentation: Drop or replace remaining mentions of IA64
lib/raid6: Drop IA64 support
Documentation: Drop IA64 from feature descriptions
kernel: Drop IA64 support from sig_fault handlers
arch: Remove Itanium (IA-64) architecture
- Generalized infrastructure for 'writable' ID registers, effectively
allowing userspace to opt-out of certain vCPU features for its guest
- Optimization for vSGI injection, opportunistically compressing MPIDR
to vCPU mapping into a table
- Improvements to KVM's PMU emulation, allowing userspace to select
the number of PMCs available to a VM
- Guest support for memory operation instructions (FEAT_MOPS)
- Cleanups to handling feature flags in KVM_ARM_VCPU_INIT, squashing
bugs and getting rid of useless code
- Changes to the way the SMCCC filter is constructed, avoiding wasted
memory allocations when not in use
- Load the stage-2 MMU context at vcpu_load() for VHE systems, reducing
the overhead of errata mitigations
- Miscellaneous kernel and selftest fixes
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQSNXHjWXuzMZutrKNKivnWIJHzdFgUCZUFJRgAKCRCivnWIJHzd
FtgYAP9cMsc1Mhlw3jNQnTc6q0cbTulD/SoEDPUat1dXMqjs+gEAnskwQTrTX834
fgGQeCAyp7Gmar+KeP64H0xm8kPSpAw=
=R4M7
-----END PGP SIGNATURE-----
Merge tag 'kvmarm-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for 6.7
- Generalized infrastructure for 'writable' ID registers, effectively
allowing userspace to opt-out of certain vCPU features for its guest
- Optimization for vSGI injection, opportunistically compressing MPIDR
to vCPU mapping into a table
- Improvements to KVM's PMU emulation, allowing userspace to select
the number of PMCs available to a VM
- Guest support for memory operation instructions (FEAT_MOPS)
- Cleanups to handling feature flags in KVM_ARM_VCPU_INIT, squashing
bugs and getting rid of useless code
- Changes to the way the SMCCC filter is constructed, avoiding wasted
memory allocations when not in use
- Load the stage-2 MMU context at vcpu_load() for VHE systems, reducing
the overhead of errata mitigations
- Miscellaneous kernel and selftest fixes
Noticed on fedora 38, the extended regexp that so far was ok for both
grep and sed now gets complaints by grep, that says '/' doesn't need to
be escaped with '\'.
So stop using '/' in sed, use '%' instead and remove the \ before / in
the common extended regexp.
Link: https://x.com/SMT_Solvers/status/1710380010098344192?s=20
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZUEddFPTJHVLhH%2F6@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
- Fix regression in reading scale and unit files from sysfs for PMU
events, so that we can use that info to pretty print instead of
printing raw numbers:
# perf stat -e power/energy-ram/,power/energy-gpu/ sleep 2
Performance counter stats for 'system wide':
1.64 Joules power/energy-ram/
0.20 Joules power/energy-gpu/
2.001228914 seconds time elapsed
#
# grep -m1 "model name" /proc/cpuinfo
model name : Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
#
- The small llvm.cpp file used to check if the llvm devel files are present was
incorrectly deleted when removing the BPF event in 'perf trace', put it back
as it is also used by tools/bpf/bpftool, that uses llvm routines to do
disassembly of BPF object files.
- Fix use of addr_location__exit() in dlfilter__object_code(), making sure that
it is only used to pair a previous addr_location__init() call.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCZTKh5AAKCRCyPKLppCJ+
J/g/AP0f6SNyHJz21JzDTzyjXAeSdMzKwic0LXv+kATQy31HJAD+Kf7UKQieUeZB
fxvp60aKyFN8IVIgpYiAjZMS3k9XPAY=
=N7Gv
-----END PGP SIGNATURE-----
Merge tag 'perf-tools-fixes-for-v6.6-2-2023-10-20' into perf-tools-next
To get the latest fixes in the perf tools including perf stat output,
dlfilter and LLVM feature detection.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Update tsx_cycles_per_elision as per:
https://github.com/intel/perfmon/pull/116
Prefer the el-start event rather than cycles-t for detecting whether
the metric will work as HLE may be disabled. Remove the metric from
sapphirerapids that has no el-start event.
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Zhengjun Xing <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20231026003149.3287633-9-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Spelling fixes were already incorporated in the Linux perf tree,
update the version number to reflect this.
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Zhengjun Xing <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20231026003149.3287633-8-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The spelling of "in-flight" was switched to "inflight".
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Zhengjun Xing <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20231026003149.3287633-3-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Update alderlake and alderlaken events from v1.21 to v1.23 adding the
changes from:
8df4db9433846bd247c6
The tsx_cycles_per_elision metric is updated from PR:
https://github.com/intel/perfmon/pull/116
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Zhengjun Xing <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20231026003149.3287633-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
While building on a wide range of distros and clang versions it was
noticed that at least version 12.0.1 (noticed on Alpine 3.15 with
"Alpine clang version 12.0.1") is needed to not fail with BTF generation
errors such as:
Debian:10
Debian clang version 11.0.1-2~deb10u1:
CLANG /tmp/build/perf/util/bpf_skel/.tmp/sample_filter.bpf.o
<SNIP>
GENSKEL /tmp/build/perf/util/bpf_skel/sample_filter.skel.h
libbpf: failed to find BTF for extern 'bpf_cast_to_kern_ctx' [21] section: -2
Error: failed to open BPF object file: No such file or directory
make[2]: *** [Makefile.perf:1121: /tmp/build/perf/util/bpf_skel/sample_filter.skel.h] Error 254
make[2]: *** Deleting file '/tmp/build/perf/util/bpf_skel/sample_filter.skel.h'
Amazon Linux 2:
clang version 11.1.0 (Amazon Linux 2 11.1.0-1.amzn2.0.2)
GENSKEL /tmp/build/perf/util/bpf_skel/sample_filter.skel.h
libbpf: elf: skipping unrecognized data section(18) .eh_frame
libbpf: elf: skipping relo section(19) .rel.eh_frame for section(18) .eh_frame
libbpf: failed to find BTF for extern 'bpf_cast_to_kern_ctx' [21] section: -2
Error: failed to open BPF object file: No such file or directory
make[2]: *** [/tmp/build/perf/util/bpf_skel/sample_filter.skel.h] Error 254
make[2]: *** Deleting file `/tmp/build/perf/util/bpf_skel/sample_filter.skel.h'
Ubuntu 20.04:
clang version 10.0.0-4ubuntu1
CLANG /tmp/build/perf/util/bpf_skel/.tmp/augmented_raw_syscalls.bpf.o
GENSKEL /tmp/build/perf/util/bpf_skel/bench_uprobe.skel.h
GENSKEL /tmp/build/perf/util/bpf_skel/bperf_leader.skel.h
libbpf: sec '.reluprobe': corrupted symbol #27 pointing to invalid section #65522 for relo #0
GENSKEL /tmp/build/perf/util/bpf_skel/bperf_follower.skel.h
Error: failed to open BPF object file: BPF object format invalid
make[2]: *** [Makefile.perf:1121: /tmp/build/perf/util/bpf_skel/bench_uprobe.skel.h] Error 95
make[2]: *** Deleting file '/tmp/build/perf/util/bpf_skel/bench_uprobe.skel.h'
So check if the version is at least 12.0.1 otherwise disable building
BPF skels and provide a message about it, continuing the build.
The message, when running on amazonlinux:2:
Makefile.config:698: Warning: Disabled BPF skeletons as reliable BTF generation needs at least clang version 12.0.1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/ZTvGx/Ou6BVnYBqi@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The changes in ("perf evsel: Rename evsel__increase_rlimit to
rlimit__increase_nofile") ended up breaking the python binding that now
references the rlimit__increase_nofile function, add the util/rlimit.o
to the tools/perf/util/python-ext-sources to cure that.
This was detected by the 'perf test python' regression test:
$ perf test python
14: 'import perf' in python : FAILED!
$ perf test -v python
Couldn't bump rlimit(MEMLOCK), failures may take place when creating BPF maps, etc
14: 'import perf' in python :
--- start ---
test child forked, pid 2912462
python usage test: "echo "import sys ; sys.path.insert(0, '/tmp/build/perf-tools-next/python'); import perf" | '/usr/bin/python3' "
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: /tmp/build/perf-tools-next/python/perf.cpython-311-x86_64-linux-gnu.so: undefined symbol: rlimit__increase_nofile
test child finished with -1
---- end ----
'import perf' in python: FAILED!
$
Fixes: e093a222d7 ("perf evsel: Rename evsel__increase_rlimit to rlimit__increase_nofile")
Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Yang Jihong <yangjihong1@huawei.com>
Link: https://lore.kernel.org/lkml/ZTrCS5Z3PZAmfPdV@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
There are two reasons to do this, firstly there is a shellcheck warning
in cs_etm_dev_name(), which can be completely deleted. And secondly the
current iteration method doesn't support systems with both ETE and ETM
because it picks one or the other. There isn't a known system with this
configuration, but it could happen in the future.
Iterating over all the sources for each CPU can be done by going through
/sys/bus/event_source/devices/cs_etm/cpu* and following the symlink back
to the Coresight device in /sys/bus/coresight/devices. This will work
whether the device is ETE, ETM or any future name, and is much simpler
and doesn't require any hard coded version numbers
Suggested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: James Clark <james.clark@arm.com>
Acked-by: Ian Rogers <irogers@google.com>
Tested-by: Leo Yan <leo.yan@linaro.org>
Cc: tianruidong@linux.alibaba.com
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Anushree Mathur <anushree.mathur@linux.vnet.ibm.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: atrajeev@linux.vnet.ibm.com
Cc: coresight@lists.linaro.org
Link: https://lore.kernel.org/r/20231023131550.487760-1-james.clark@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Add tma_info_system_socket_clks and uncore_freq metrics.
The associated converter script fix is in:
https://github.com/intel/perfmon/pull/112
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Link: https://lore.kernel.org/r/20230926205948.1399594-3-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Add tma_info_system_socket_clks and uncore_freq metrics that require a
broadwellx style uncore event for UNC_CLOCK.
The associated converter script fix is in:
https://github.com/intel/perfmon/pull/112
Fixes: 7d124303d6 ("perf vendor events intel: Update broadwell variant events/metrics")
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Link: https://lore.kernel.org/r/20230926205948.1399594-2-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Broadwell-de has a consumer core and server uncore. The uncore_arb PMU
isn't present and the broadwellx style cbox PMU should be used
instead. Fix the tma_info_system_dram_bw_use metric to use the server
metric rather than client.
The associated converter script fix is in:
https://github.com/intel/perfmon/pull/111
Fixes: 7d124303d6 ("perf vendor events intel: Update broadwell variant events/metrics")
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Link: https://lore.kernel.org/r/20230926031034.1201145-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Fix leak where mem_info__put wouldn't release the maps/map as used by
perf mem. Add exit functions and use elsewhere that the maps and map
are released.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: liuwenyu <liuwenyu7@huawei.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20231024222353.3024098-12-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Display code doesn't modify the branch_type_stat so switch uses to
const. This is done to aid refactoring struct callchain_list where
current the branch_type_stat is embedded even if not used.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: liuwenyu <liuwenyu7@huawei.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20231024222353.3024098-9-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Caught by address/leak sanitizer.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: liuwenyu <liuwenyu7@huawei.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20231024222353.3024098-8-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Commit 40826c45eb ("perf thread: Remove notion of dead threads")
removed dead threads but the list head wasn't removed. Remove it here.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: liuwenyu <liuwenyu7@huawei.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20231024222353.3024098-7-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Caught using reference count checking on perf top with
"--call-graph=lbr". After this no memory leaks were detected.
Fixes: 57849998e2 ("perf report: Add processing for cycle histograms")
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: liuwenyu <liuwenyu7@huawei.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20231024222353.3024098-6-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Comparing pointers with reference count checking is tricky to avoid a
SEGV. Add a convenience macro to simplify and use.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: liuwenyu <liuwenyu7@huawei.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20231024222353.3024098-5-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Running perf top with address sanitizer and "--call-graph=lbr" fails
due to reading sample 0 when no samples exist. Add a guard to prevent
this.
Fixes: e2b23483eb ("perf machine: Factor out lbr_callchain_add_lbr_ip()")
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: liuwenyu <liuwenyu7@huawei.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20231024222353.3024098-3-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Mutex error check will capture trying to take the lock recursively and
other problems that rwlock won't. At the expense of concurrency, adda
debug mode that uses a mutex in place of a rwsem.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: liuwenyu <liuwenyu7@huawei.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20231024222353.3024098-2-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
To address this grep 3.8 warning:
grep: warning: stray \ before #
We needed to remove the '' around the grep expression and keep the \
before # so that it is escaped by the $(shell grep ...) and thus doesn't
get to grep.
We need that \ before the #, otherwise we get this:
Makefile.perf:364: *** unterminated call to function 'shell': missing ')'. Stop.
As everything after the # will be considered a comment.
Removing the single quotes needs some more escaping so that _some_ of
the escaped chars gets to grep, like the '\|' that becomes '\\\|´.
Running on debian:10, where there is no libtraceevent-devel available,
we get:
Makefile.perf:367: *** PYTHON_EXT_SRCS= util/python.c ../lib/ctype.c util/cap.c util/evlist.c util/evsel.c util/evsel_fprintf.c util/perf_event_attr_fprintf.c util/cpumap.c util/memswap.c util/mmap.c util/namespaces.c ../lib/bitmap.c ../lib/find_bit.c ../lib/list_sort.c ../lib/hweight.c ../lib/string.c ../lib/vsprintf.c util/thread_map.c util/util.c util/cgroup.c util/parse-branch-options.c util/rblist.c util/counts.c util/print_binary.c util/strlist.c ../lib/rbtree.c util/string.c util/symbol_fprintf.c util/units.c util/affinity.c util/rwsem.c util/hashmap.c util/perf_regs.c util/fncache.c util/perf-regs-arch/perf_regs_aarch64.c util/perf-regs-arch/perf_regs_arm.c util/perf-regs-arch/perf_regs_csky.c util/perf-regs-arch/perf_regs_loongarch.c util/perf-regs-arch/perf_regs_mips.c util/perf-regs-arch/perf_regs_powerpc.c util/perf-regs-arch/perf_regs_riscv.c util/perf-regs-arch/perf_regs_s390.c util/perf-regs-arch/perf_regs_x86.c. Stop.
make[1]: *** [Makefile.perf:242: sub-make] Error 2
I.e. both the comments and the util/trace-event.c were removed.
When using:
msg := $(error PYTHON_EXT_SRCS=$(PYTHON_EXT_SRCS))
While on the more recent fedora:38, with the new grep and make packages
and libtraceevent-devel installed:
Makefile.perf:367: *** PYTHON_EXT_SRCS= util/python.c ../lib/ctype.c util/cap.c util/evlist.c util/evsel.c util/evsel_fprintf.c util/perf_event_attr_fprintf.c util/cpumap.c util/memswap.c util/mmap.c util/namespaces.c ../lib/bitmap.c ../lib/find_bit.c ../lib/list_sort.c ../lib/hweight.c ../lib/string.c ../lib/vsprintf.c util/thread_map.c util/util.c util/cgroup.c util/parse-branch-options.c util/rblist.c util/counts.c util/print_binary.c util/strlist.c util/trace-event.c ../lib/rbtree.c util/string.c util/symbol_fprintf.c util/units.c util/affinity.c util/rwsem.c util/hashmap.c util/perf_regs.c util/fncache.c util/perf-regs-arch/perf_regs_aarch64.c util/perf-regs-arch/perf_regs_arm.c util/perf-regs-arch/perf_regs_csky.c util/perf-regs-arch/perf_regs_loongarch.c util/perf-regs-arch/perf_regs_mips.c util/perf-regs-arch/perf_regs_powerpc.c util/perf-regs-arch/perf_regs_riscv.c util/perf-regs-arch/perf_regs_s390.c util/perf-regs-arch/perf_regs_x86.c. Stop.
make[1]: *** [Makefile.perf:242: sub-make] Error 2
make: *** [Makefile:113: install-bin] Error 2
make: Leaving directory '/home/acme/git/perf-tools-next/tools/perf'
$
I.e. only the comments were removed.
If we build it on the same fedora:38 system, but using NO_LIBTRACEEVENT=1
$ make NO_LIBTRACEEVENT=1 CORESIGHT=1 O=/tmp/build/$(basename $PWD) -C tools/perf install-bin
Makefile.perf:367: *** PYTHON_EXT_SRCS= util/python.c ../lib/ctype.c util/cap.c util/evlist.c util/evsel.c util/evsel_fprintf.c util/perf_event_attr_fprintf.c util/cpumap.c util/memswap.c util/mmap.c util/namespaces.c ../lib/bitmap.c ../lib/find_bit.c ../lib/list_sort.c ../lib/hweight.c ../lib/string.c ../lib/vsprintf.c util/thread_map.c util/util.c util/cgroup.c util/parse-branch-options.c util/rblist.c util/counts.c util/print_binary.c util/strlist.c ../lib/rbtree.c util/string.c util/symbol_fprintf.c util/units.c util/affinity.c util/rwsem.c util/hashmap.c util/perf_regs.c util/fncache.c util/perf-regs-arch/perf_regs_aarch64.c util/perf-regs-arch/perf_regs_arm.c util/perf-regs-arch/perf_regs_csky.c util/perf-regs-arch/perf_regs_loongarch.c util/perf-regs-arch/perf_regs_mips.c util/perf-regs-arch/perf_regs_powerpc.c util/perf-regs-arch/perf_regs_riscv.c util/perf-regs-arch/perf_regs_s390.c util/perf-regs-arch/perf_regs_x86.c. Stop.
make[1]: *** [Makefile.perf:242: sub-make] Error 2
make: *** [Makefile:113: install-bin] Error 2
make: Leaving directory '/home/acme/git/perf-tools-next/tools/perf'
$
Both comments and the util/trace-event.c file removed.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/ZTj6mfM9UqY2DggC@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The hierarchy mode needs to setup output formats for each evsel.
Normally setup_sorting() handles this at the beginning, but it cannot
do that if data comes from a pipe since there's no evsel info before
reading the data. And then perf report cannot process the samples
in hierarchy mode and think as if there's no sample.
Let's check the condition and setup the output formats after reading
data so that it can find evsels.
Before:
$ ./perf record -o- true | ./perf report -i- --hierarchy -q
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.000 MB - ]
Error:
The - data has no samples!
After:
$ ./perf record -o- true | ./perf report -i- --hierarchy -q
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.000 MB - ]
94.76% true
94.76% [kernel.kallsyms]
94.76% [k] filemap_fault
5.24% perf-ex
5.24% [kernel.kallsyms]
5.06% [k] __memset
0.18% [k] native_write_msr
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20231025003121.2811738-1-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Currently lock contention timestamp is maintained in a hash map keyed by
pid. That means it needs to get and release a map element (which is
proctected by spinlock!) on each contention begin and end pair. This
can impact on performance if there are a lot of contention (usually from
spinlocks).
It used to go with task local storage but it had an issue on memory
allocation in some critical paths. Although it's addressed in recent
kernels IIUC, the tool should support old kernels too. So it cannot
simply switch to the task local storage at least for now.
As spinlocks create lots of contention and they disabled preemption
during the spinning, it can use per-cpu array to keep the timestamp to
avoid overhead in hashmap update and delete.
In contention_begin, it's easy to check the lock types since it can see
the flags. But contention_end cannot see it. So let's try to per-cpu
array first (unconditionally) if it has an active element (lock != 0).
Then it should be used and per-task tstamp map should not be used until
the per-cpu array element is cleared which means nested spinlock
contention (if any) was finished and it nows see (the outer) lock.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20231020204741.1869520-3-namhyung@kernel.org
When pelem is NULL, it'd create a new entry with zero data. But it
might be preempted by IRQ/NMI just before calling bpf_map_update_elem()
then there's a chance to call it twice for the same pid. So it'd be
better to use BPF_NOEXIST flag and check the return value to prevent
the race.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20231020204741.1869520-2-namhyung@kernel.org
It checks the current lock to calculated the delta of contention time.
The address is saved in the tstamp map which is allocated at begining of
contention and released at end of contention.
But it's possible for bpf_map_delete_elem() to fail. In that case, the
element in the tstamp map kept for the current lock and it makes the
next contention for the same lock tracked incorrectly. Specificially
the next contention begin will see the existing element for the task and
it'd just return. Then the next contention end will see the element and
calculate the time using the timestamp for the previous begin.
This can result in a large value for two small contentions happened from
time to time. Let's clear the lock address so that it can be updated
next time even if the bpf_map_delete_elem() failed.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20231020204741.1869520-1-namhyung@kernel.org
evsel__increase_rlimit() helper does nothing with evsel, and description
of the functionality is inaccurate, rename it and move to util/rlimit.c.
By the way, fix a checkppatch warning about misplaced license tag:
WARNING: Misplaced SPDX-License-Identifier tag - use line 1 instead
#160: FILE: tools/perf/util/rlimit.h:3:
/* SPDX-License-Identifier: LGPL-2.1 */
No functional change.
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Link: https://lore.kernel.org/r/20231023033144.1011896-1-yangjihong1@huawei.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The -G/--cgroups option is to put sender and receiver in different
cgroups in order to measure cgroup context switch overheads.
Users need to make sure the cgroups exist and accessible. The following
example should the effect of this change. Please don't forget taskset
before the perf bench to measure cgroup switches properly. Otherwise
each task would run on a different CPU and generate cgroup switches
regardless of this change.
# perf stat -e context-switches,cgroup-switches \
> taskset -c 0 perf bench sched pipe -l 10000 > /dev/null
Performance counter stats for 'taskset -c 0 perf bench sched pipe -l 10000':
20,001 context-switches
2 cgroup-switches
0.053449651 seconds time elapsed
0.011286000 seconds user
0.041869000 seconds sys
# perf stat -e context-switches,cgroup-switches \
> taskset -c 0 perf bench sched pipe -l 10000 -G AAA,BBB > /dev/null
Performance counter stats for 'taskset -c 0 perf bench sched pipe -l 10000 -G AAA,BBB':
20,001 context-switches
20,001 cgroup-switches
0.052768627 seconds time elapsed
0.006284000 seconds user
0.046266000 seconds sys
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/20231017202342.1353124-1-namhyung@kernel.org
CoreSight might be not available, in such case, skip the tests.
Signed-off-by: Michael Petlan <mpetlan@redhat.com>
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Carsten Haitzler <carsten.haitzler@arm.com>
Cc: vmolnaro@redhat.com
Link: https://lore.kernel.org/r/20231019091137.22525-1-mpetlan@redhat.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
If using parallel threads to collect data, perf record needs at least 6 fds
per CPU. (one for sys_perf_event_open, four for pipe msg and ack of the
pipe, see record__thread_data_open_pipes(), and one for open perf.data.XXX)
For an environment with more than 100 cores, if perf record uses both
`-a` and `--threads` options, it is easy to exceed the upper limit of the
file descriptor number, when we run out of them try to increase the limits.
Before:
$ ulimit -n
1024
$ lscpu | grep 'On-line CPU(s)'
On-line CPU(s) list: 0-159
$ perf record --threads -a sleep 1
Failed to create data directory: Too many open files
After:
$ ulimit -n
1024
$ lscpu | grep 'On-line CPU(s)'
On-line CPU(s) list: 0-159
$ perf record --threads -a sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.394 MB perf.data (1576 samples) ]
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20231013075945.698874-1-yangjihong1@huawei.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The CPI_STALL_RATIO metric group can be used to present the high
level CPI stall breakdown metrics in powerpc, which will show:
- DISPATCH_STALL_CPI ( Dispatch stall cycles per insn )
- ISSUE_STALL_CPI ( Issue stall cycles per insn )
- EXECUTION_STALL_CPI ( Execution stall cycles per insn )
- COMPLETION_STALL_CPI ( Completion stall cycles per insn )
Commit cf26e043c2 ("perf vendor events power10: Add JSON
metric events to present CPI stall cycles in powerpc)" which added
the CPI_STALL_RATIO metric group, also modified
the PMC value used in PM_RUN_INST_CMPL event from PMC4 to PMC5,
to avoid multiplexing of events.
But that got revert in recent changes. Fix this issue by changing
back the PMC value used in PM_RUN_INST_CMPL to PMC5.
Result with the fix:
./perf stat --metric-no-group -M CPI_STALL_RATIO <workload>
Performance counter stats for 'workload':
68,745,426 PM_CMPL_STALL # 0.21 COMPLETION_STALL_CPI
7,692,827 PM_ISSUE_STALL # 0.02 ISSUE_STALL_CPI
322,638,223 PM_RUN_INST_CMPL # 0.05 DISPATCH_STALL_CPI
# 0.48 EXECUTION_STALL_CPI
16,858,553 PM_DISP_STALL_CYC
153,880,133 PM_EXEC_STALL
0.089774592 seconds time elapsed
"--metric-no-group" is used for forcing PM_RUN_INST_CMPL to be scheduled
in all group for more accuracy.
Fixes: 7d473f475b ("perf vendor events: Move JSON/events to appropriate files for power10 platform")
Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com>
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Disha Goel<disgoel@linux.ibm.com>
Cc: maddy@linux.ibm.com
Link: https://lore.kernel.org/r/20231016143110.244255-1-kjain@linux.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Start generating sysreg-defs.h in anticipation of updating sysreg.h to a
version that needs the generated output.
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231011195740.3349631-3-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
The recent change made it possible to generate vmlinux.h from BTF and
to ignore the file. But we also have a minimal vmlinux.h that will be
used by default. It should not be ignored by GIT.
Fixes: b7a2d774c9 ("perf build: Add ability to build with a generated vmlinux.h")
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202310110451.rvdUZJEY-lkp@intel.com/
Cc: oe-kbuild-all@lists.linux.dev
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Running shellcheck on stat_all_metricgroups.sh reports
below warning:
In ./tests/shell/stat_all_metricgroups.sh line 7:
function ParanoidAndNotRoot()
^-- SC2112: 'function' keyword is non-standard. Delete it.
As per the format, "function" is a non-standard keyword that
can be used to declare functions. Fix this by removing the
"function" keyword from ParanoidAndNotRoot function
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: kjain@linux.ibm.com
Cc: maddy@linux.ibm.com
Cc: disgoel@linux.vnet.ibm.com
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20231013073021.99794-4-atrajeev@linux.vnet.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Running shellcheck on record_sideband.sh throws below
warning:
In tests/shell/record_sideband.sh line 25:
if ! perf record -o ${perfdata} -BN --no-bpf-event -C $1 true 2>&1 >/dev/null
^--^ SC2069: To redirect stdout+stderr, 2>&1 must be last (or use '{ cmd > file; } 2>&1' to clarify).
This shows shellcheck warning SC2069 where the redirection
order needs to be fixed. Use "cmd > /dev/null 2>&1" to fix
the redirection of perf record output
Fixes: 23b97c7ee9 ("perf test: Add test case for record sideband events")
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: disgoel@linux.vnet.ibm.com
Link: https://lore.kernel.org/r/20231013073021.99794-3-atrajeev@linux.vnet.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Running shellcheck on lock_contention.sh generates below
warning
In tests/shell/lock_contention.sh line 36:
if [ `nproc` -lt 4 ]; then
^-----^ SC2046: Quote this to prevent word splitting.
Here since nproc will generate a single word output
and there is no possibility of word splitting, this
warning can be ignored. Use exception for this with
"disable" option in shellcheck. This warning is observed
after commit:
"commit 29441ab3a3 ("perf test lock_contention.sh: Skip test
if not enough CPUs")"
Fixes: 29441ab3a3 ("perf test lock_contention.sh: Skip test if not enough CPUs")
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: maddy@linux.ibm.com
Cc: disgoel@linux.vnet.ibm.com
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20231013073021.99794-2-atrajeev@linux.vnet.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Simple expression parser test fails in powerpc as below:
4: Simple expression parser
test child forked, pid 170385
Using CPUID 004e2102
division by zero
syntax error
syntax error
FAILED tests/expr.c:65 parse test failed
test child finished with -1
Simple expression parser: FAILED!
This is observed after commit:
'commit 9d5da30e4a ("perf jevents: Add a new expression builtin strcmp_cpuid_str()")'
With this commit, a new expression builtin strcmp_cpuid_str
got added. This function takes an 'ID' type value, which is
a string. So expression parse for strcmp_cpuid_str expects
const char * as cpuid value type. In case of powerpc, CPU IDs
are numbers. Hence it doesn't get interpreted correctly by
bison parser. Example in case of power9, cpuid string returns
as: 004e2102
cpuid of string type is expected in two cases:
1. char *get_cpuid_str(struct perf_pmu *pmu __maybe_unused);
Testcase "tests/expr.c" uses "perf_pmu__getcpuid" which calls
get_cpuid_str to get the cpuid string.
2. cpuid field in :struct pmu_events_map
struct pmu_events_map {
const char *arch;
const char *cpuid;
Here cpuid field is used in "perf_pmu__find_events_table"
function as "strcmp_cpuid_str(map->cpuid, cpuid)". The
value for cpuid field is picked from mapfile.csv.
Fix the mapfile.csv and get_cpuid_str function to prefix
cpuid with 0x so that it gets correctly interpreted by
the bison parser
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Disha Goel<disgoel@linux.ibm.com>
Cc: kjain@linux.ibm.com
Cc: maddy@linux.ibm.com
Cc: disgoel@linux.vnet.ibm.com
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20231009050052.64935-1-atrajeev@linux.vnet.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
When users pass the option '--timestamp' or '-T' in the record command,
all events will set the PERF_SAMPLE_TIME bit in the attribution. In
this case, the AUX event will record the kernel timestamp, but it
doesn't mean Arm CoreSight enables timestamp packets in its hardware
tracing.
If the option '--timestamp' or '-T' is set, this patch always enables
Arm CoreSight timestamp, as a result, the bit 28 in event's config is to
be set.
Before:
# perf record -e cs_etm// --per-thread --timestamp -- ls
# perf script --header-only
...
# event : name = cs_etm//, , id = { 69 }, type = 12, size = 136,
config = 0, { sample_period, sample_freq } = 1,
sample_type = IP|TID|TIME|CPU|IDENTIFIER, read_format = ID|LOST,
disabled = 1, enable_on_exec = 1, sample_id_all = 1, exclude_guest = 1
...
After:
# perf record -e cs_etm// --per-thread --timestamp -- ls
# perf script --header-only
...
# event : name = cs_etm//, , id = { 49 }, type = 12, size = 136,
config = 0x10000000, { sample_period, sample_freq } = 1,
sample_type = IP|TID|TIME|CPU|IDENTIFIER, read_format = ID|LOST,
disabled = 1, enable_on_exec = 1, sample_id_all = 1, exclude_guest = 1
...
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: James Clark <james.clark@arm.com>
Acked-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: coresight@lists.linaro.org
Link: https://lore.kernel.org/r/20231014074159.1667880-3-leo.yan@linaro.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
So far, it's impossible to validate timestamp trace in Arm CoreSight when
the perf is in the per-thread mode. E.g. for the command:
perf record -e cs_etm/timestamp/ --per-thread -- ls
The command enables config 'timestamp' for 'cs_etm' event in the
per-thread mode. In this case, the function cs_etm_validate_config()
directly bails out and skips validation.
Given profiled process can be scheduled on any CPUs in the per-thread
mode, this patch validates timestamp tracing for all CPUs when detect
the CPU map is empty.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: James Clark <james.clark@arm.com>
Acked-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: coresight@lists.linaro.org
Link: https://lore.kernel.org/r/20231014074159.1667880-2-leo.yan@linaro.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The default config is computed during creation of the PMU and may do
things like scanning sysfs, when the PMU may just be used as part of
scanning. Change default_config to perf_event_attr_init_default, a
callback that is used when a default config needs initializing. This
avoids holding onto the memory for a perf_event_attr and copying.
On a tigerlake laptop running the pmu-scan benchmark:
Before:
Running 'internals/pmu-scan' benchmark:
Computing performance of sysfs PMU event scan for 100 times
Average core PMU scanning took: 28.780 usec (+- 0.503 usec)
Average PMU scanning took: 283.480 usec (+- 18.471 usec)
Number of openat syscalls: 30,227
After:
Running 'internals/pmu-scan' benchmark:
Computing performance of sysfs PMU event scan for 100 times
Average core PMU scanning took: 27.880 usec (+- 0.169 usec)
Average PMU scanning took: 245.260 usec (+- 15.758 usec)
Number of openat syscalls: 28,914
Over 3 runs it is a nearly 12% reduction in execution time and a 4.3%
of openat calls.
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: James Clark <james.clark@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Will Deacon <will@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: coresight@lists.linaro.org
Link: https://lore.kernel.org/r/20231012175645.1849503-8-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
strcmp_cpuid_str performs regular expression comparisons and so per
CPUID linear searches over the perf_events_map are expensive. Add a
helper function called map_for_pmu that does the search but also
caches the map specific to a PMU. As the PMU may differ, also cache
the CPUID string so that PMUs with the same CPUID string don't require
the linear search and regular expression comparisons. This speeds
loading PMUs as the search is done once per PMU to find the
appropriate tables.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Yang Jihong <yangjihong1@huawei.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: James Clark <james.clark@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: coresight@lists.linaro.org
Link: https://lore.kernel.org/r/20231012175645.1849503-7-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Add const to related APIs, this is so they can be used to default
initialize a perf_event_attr from a const pmu.
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: James Clark <james.clark@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Will Deacon <will@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: coresight@lists.linaro.org
Link: https://lore.kernel.org/r/20231012175645.1849503-6-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
File APIs don't alter the struct pmu so allow const ones to be passed.
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: James Clark <james.clark@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Will Deacon <will@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: coresight@lists.linaro.org
Link: https://lore.kernel.org/r/20231012175645.1849503-5-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Avoid setting PMU values in arm_spe_pmu_default_config, move to
perf_pmu__arch_init.
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Leo Yan <leo.yan@linaro.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: James Clark <james.clark@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: coresight@lists.linaro.org
Link: https://lore.kernel.org/r/20231012175645.1849503-4-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Avoid setting PMU values in intel_pt_pmu_default_config, move to
perf_pmu__arch_init.
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: James Clark <james.clark@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Will Deacon <will@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: coresight@lists.linaro.org
Link: https://lore.kernel.org/r/20231012175645.1849503-3-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Assign default_config as part of the init. perf_pmu__get_default_config
was doing more than just getting the default config and so this is
intended to better align with the code.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: James Clark <james.clark@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Will Deacon <will@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: coresight@lists.linaro.org
Link: https://lore.kernel.org/r/20231012175645.1849503-2-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Use get_unaligned_le64() instead of memcpy_le64(..., 8) because it produces
simpler code.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20231005190451.175568-6-adrian.hunter@intel.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Avoid unaligned access by using get_unaligned_le16(), get_unaligned_le32()
and get_unaligned_le64().
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20231005190451.175568-5-adrian.hunter@intel.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Add get_unaligned_le16(), get_unaligned_le32 and get_unaligned_le64, same
as include/asm-generic/unaligned.h. And add include/asm-generic/unaligned.h
to check-headers.sh bringing tools/include/asm-generic/unaligned.h up to
date so that the kernel and tools versions match.
Use diagnostic pragmas to ignore -Wpacked used by perf build.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20231005190451.175568-2-adrian.hunter@intel.com
Link: https://lore.kernel.org/r/20231010142234.20061-1-adrian.hunter@intel.com
[ squashed check-header.sh addition ]
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Memory leaks were detected by clang-tidy.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: llvm@lists.linux.dev
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Tom Rix <trix@redhat.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20231009183920.200859-20-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Memory leaks were detected by clang-tidy.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: llvm@lists.linux.dev
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Tom Rix <trix@redhat.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20231009183920.200859-19-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
If opendir failed then closedir was passed NULL which is
erroneous. Caught by clang-tidy.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: llvm@lists.linux.dev
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Tom Rix <trix@redhat.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20231009183920.200859-18-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Add missing free on an error path as detected by clang-tidy.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: llvm@lists.linux.dev
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Tom Rix <trix@redhat.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20231009183920.200859-16-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
If a memory allocation fails then the strdup-ed string needs
freeing. Detected by clang-tidy.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: llvm@lists.linux.dev
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Tom Rix <trix@redhat.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20231009183920.200859-15-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
On success path the sib_core and sib_thr values weren't being
freed. Detected by clang-tidy.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: llvm@lists.linux.dev
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Tom Rix <trix@redhat.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20231009183920.200859-14-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
On other code paths browser->he_selection is NULL checked, add a
missing case reported by clang-tidy.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: llvm@lists.linux.dev
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Tom Rix <trix@redhat.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20231009183920.200859-13-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
In the unlikely case of having a symbol without a mapping, avoid a
NULL dereference that clang-tidy warns about.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: llvm@lists.linux.dev
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Tom Rix <trix@redhat.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20231009183920.200859-11-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
pmu should be initialized to NULL before perf_pmus__scan loop. Fix and
shrink the scope of pmu at the same time. Issue detected by clang-tidy.
Fixes: 5752c20f37 ("perf mem: Scan all PMUs instead of just core ones")
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: llvm@lists.linux.dev
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Tom Rix <trix@redhat.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20231009183920.200859-10-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
jit_repipe_unwinding_info is called in a loop by jit_process_dump,
avoid leaking unwinding_data by free-ing before overwriting. Error
detected by clang-tidy.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: llvm@lists.linux.dev
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Tom Rix <trix@redhat.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20231009183920.200859-9-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
clang-tidy was warning:
```
util/env.c:334:23: warning: Access to field 'nr_pmu_mappings' results in a dereference of a null pointer (loaded from variable 'env') [clang-analyzer-core.NullDereference]
env->nr_pmu_mappings = pmu_num;
```
As functions are called potentially when !env was true. This condition
could never be true as it would produce a segv, so remove the
unnecessary NULL tests and silence clang-tidy.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: llvm@lists.linux.dev
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Tom Rix <trix@redhat.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20231009183920.200859-8-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The buildid filename is first determined and then from this the
buildid read. If getting the filename fails then the buildid will be
used for a later memcmp uninitialized. Detected by clang-tidy.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: llvm@lists.linux.dev
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Tom Rix <trix@redhat.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20231009183920.200859-7-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Found by clang-tidy:
```
bench/uprobe.c:98:3: warning: Use of memory after it is freed [clang-analyzer-unix.Malloc]
bench_uprobe_bpf__destroy(skel);
```
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: llvm@lists.linux.dev
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Tom Rix <trix@redhat.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20231009183920.200859-6-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Raw events can be strings like 'r0xead' but the 0x is optional so they
can also be 'read'. On IcelakeX uncore_imc_free_running has an event
called 'read' which may be programmed as:
```
$ perf stat -e 'uncore_imc_free_running/event=read/' -a sleep 1
```
However, the PE_RAW type isn't allowed on the right of a term, even
though in this case we just want to interpret it as a string. This
leads to the following error on IcelakeX:
```
$ perf stat -e 'uncore_imc_free_running/event=read/' -a sleep 1
event syntax error: '..nning/event=read/'
\___ parser error
Run 'perf list' for a list of valid events
Usage: perf stat [<options>] [<command>]
-e, --event <event> event selector. use 'perf list' to list available events
```
Fix this by allowing raw types on the right of terms and treat them as
strings, just as is already done for PE_LEGACY_CACHE. Make this
consistent by just entirely removing name_or_legacy and always using
name_or_raw that covers all three cases.
Fixes: 6fd1e51915 ("perf parse-events: Support PMUs for legacy cache events")
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@arm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20230928004431.1926969-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
By default perf will fail the build if the development files for
libtraceevent are not available.
To build perf without libtraceevent support, disabling several features
such as 'perf trace', one needs to add NO_LIBTRACEVENT=1 to the make
command line.
Add the missing comments about that to the tools/perf/Makefile.perf
file, just like all the other such command line toggles.
Fixes: 378ef0f5d9 ("perf build: Use libtraceevent from the system")
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/ZR6+MhXtLnv6ow6E@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
This is a longstanding to do list entry: we need a way to see that a
sample took place while in idle state, as the current way to do it is
to infer that by the name of the functions that in such state have
more samples, IOW: a hack.
Maybe we can do flip a bit in samples that take place inside the
enter/exit idle section in do_idle()?
But till then, add one more :-\
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/ZR66Qgbcltt+zG7F@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
We specify that a "num_hex" comprises 1 or more digits, however, that
allows strtoull to fail with ERANGE. Limit the number of hex digits to
being between 1 and 16.
Before:
```
$ perf stat -e 'cpu/rE7574c47490475745/' true
perf: util/parse-events.c:215: fix_raw: Assertion `errno == 0' failed.
Aborted (core dumped)
```
After:
```
$ perf stat -e 'cpu/rE7574c47490475745/' true
event syntax error: 'cpu/rE7574c47490475745/'
\___ Bad event or PMU
Unable to find PMU or event on a PMU of 'cpu'
Initial error:
event syntax error: 'cpu/rE7574c47490475745/'
\___ unknown term 'rE7574c47490475745' for pmu 'cpu'
valid terms: event,pc,edge,offcore_rsp,ldlat,inv,umask,frontend,cmask,config,config1,config2,config3,name,period,percore,metric-id
Run 'perf list' for a list of valid events
Usage: perf stat [<options>] [<command>]
-e, --event <event> event selector. use 'perf list' to list available events
```
Issue found through fuzz testing.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20230907210533.3712979-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Build:
- Update header files in the tools/**/include directory to sync with
the kernel sources as usual.
- Remove unused bpf-prologue files. While it's not strictly a fix,
but the functionality was removed in this cycle so better to get
rid of the code together.
- Other minor build fixes.
Misc:
- Fix uninitialized memory access in PMU parsing code
- Fix segfaults on software event
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQSo2x5BnqMqsoHtzsmMstVUGiXMgwUCZRIFKAAKCRCMstVUGiXM
g/pXAP9HLB2s+beBTK5iQU4/NfqmAVSl303QCoR9xLByo38vfAEAlLiRIh061pTi
PRlXVuY9bUQPyCSYsiBHv/fmLqdQdwU=
=ti6G
-----END PGP SIGNATURE-----
Merge tag 'perf-tools-fixes-for-v6.6-1-2023-09-25' into perf-tools-next
To pick up the 'perf bench sched-seccomp-notify' changes to allow us to
continue build testing perf-tools-next with the set of distro
containers, where some older ones don't have a recent enough seccomp.h
UAPI header that contains defines needed by this new 'perf bench'
workload.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The perf test named "kernel lock contention analysis test"
fails in powerpc system with below error:
[command]# ./perf test 81 -vv
81: kernel lock contention analysis test :
--- start ---
test child forked, pid 2140
Testing perf lock record and perf lock contention
Testing perf lock contention --use-bpf
[Skip] No BPF support
Testing perf lock record and perf lock contention at the same time
Testing perf lock contention --threads
Testing perf lock contention --lock-addr
Testing perf lock contention --type-filter (w/ spinlock)
Testing perf lock contention --lock-filter (w/ tasklist_lock)
Testing perf lock contention --callstack-filter (w/ unix_stream)
[Fail] Recorded result should have a lock from unix_stream:
test child finished with -1
---- end ----
kernel lock contention analysis test: FAILED!
The test is failing because we get an address entry with 0 in
perf lock samples for powerpc, and code for lock contention
option "--callstack-filter" will not check further entries after
address 0.
Below are some of the samples from test generated perf.data file, which
have 0 address in the 2nd entry of callstack:
--------
sched-messaging 3409 [001] 7152.904029: lock:contention_begin: 0xc00000c80904ef00 (flags=SPIN)
c0000000001e926c __traceiter_contention_begin+0x6c ([kernel.kallsyms])
0 [unknown] ([unknown])
c000000000f8a178 native_queued_spin_lock_slowpath+0x1f8 ([kernel.kallsyms])
c000000000f89f44 _raw_spin_lock_irqsave+0x84 ([kernel.kallsyms])
c0000000001d9fd0 prepare_to_wait+0x50 ([kernel.kallsyms])
c000000000c80f50 sock_alloc_send_pskb+0x1b0 ([kernel.kallsyms])
c000000000e82298 unix_stream_sendmsg+0x2b8 ([kernel.kallsyms])
c000000000c78980 sock_sendmsg+0x80 ([kernel.kallsyms])
sched-messaging 3408 [005] 7152.904036: lock:contention_begin: 0xc00000c80904ef00 (flags=SPIN)
c0000000001e926c __traceiter_contention_begin+0x6c ([kernel.kallsyms])
0 [unknown] ([unknown])
c000000000f8a178 native_queued_spin_lock_slowpath+0x1f8 ([kernel.kallsyms])
c000000000f89f44 _raw_spin_lock_irqsave+0x84 ([kernel.kallsyms])
c0000000001d9fd0 prepare_to_wait+0x50 ([kernel.kallsyms])
c000000000c80f50 sock_alloc_send_pskb+0x1b0 ([kernel.kallsyms])
c000000000e82298 unix_stream_sendmsg+0x2b8 ([kernel.kallsyms])
c000000000c78980 sock_sendmsg+0x80 ([kernel.kallsyms])
--------
Based on commit 20002ded4d ("perf_counter: powerpc: Add callchain support"),
incase of powerpc, the callchain saved by kernel always includes first
three entries as the NIP (next instruction pointer), LR (link register), and
the contents of LR save area in the second stack frame. In certain scenarios
its possible to have invalid kernel instruction addresses in either of LR or the
second stack frame's LR. In that case, kernel will store the address as zer0.
Hence, its possible to have 2nd or 3rd callstack entry as 0.
As per the current code in match_callstack_filter function, we skip the callstack
check incase we get 0 address. And hence the test case is failing in powerpc.
Fix this issue by updating the check in match_callstack_filter function,
to not skip callstack check if the 2nd or 3rd entry have 0 address
for powerpc.
Result in powerpc after patch changes:
[command]# ./perf test 81 -vv
81: kernel lock contention analysis test :
--- start ---
test child forked, pid 4570
Testing perf lock record and perf lock contention
Testing perf lock contention --use-bpf
[Skip] No BPF support
Testing perf lock record and perf lock contention at the same time
Testing perf lock contention --threads
Testing perf lock contention --lock-addr
Testing perf lock contention --type-filter (w/ spinlock)
Testing perf lock contention --lock-filter (w/ tasklist_lock)
[Skip] Could not find 'tasklist_lock'
Testing perf lock contention --callstack-filter (w/ unix_stream)
Testing perf lock contention --callstack-filter with task aggregation
Testing perf lock contention CSV output
[Skip] No BPF support
test child finished with 0
---- end ----
kernel lock contention analysis test: Ok
Fixes: ebab291641 ("perf lock contention: Support filters for different aggregation")
Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com>
Tested-by: Disha Goel <disgoel@linux.ibm.com>
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Cc: maddy@linux.ibm.com
Cc: atrajeev@linux.vnet.ibm.com
Link: https://lore.kernel.org/r/20231003092113.252380-1-kjain@linux.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The testcase "Object code reading" fails in somecases
for "fs_something" sub test as below:
Reading object code for memory address: 0xc008000007f0142c
File is: /lib/modules/6.5.0-rc3+/kernel/fs/xfs/xfs.ko
On file address is: 0x1114cc
Objdump command is: objdump -z -d --start-address=0x11142c --stop-address=0x1114ac /lib/modules/6.5.0-rc3+/kernel/fs/xfs/xfs.ko
objdump read too few bytes: 128
test child finished with -1
This can alo be reproduced when running perf record with
workload that exercises fs_something() code. In the test
setup, this is exercising xfs code since root is xfs.
# perf record ./a.out
# perf report -v |grep "xfs.ko"
0.76% a.out /lib/modules/6.5.0-rc3+/kernel/fs/xfs/xfs.ko 0xc008000007de5efc B [k] xlog_cil_commit
0.74% a.out /lib/modules/6.5.0-rc3+/kernel/fs/xfs/xfs.ko 0xc008000007d5ae18 B [k] xfs_btree_key_offset
0.74% a.out /lib/modules/6.5.0-rc3+/kernel/fs/xfs/xfs.ko 0xc008000007e11fd4 B [k] 0x0000000000112074
Here addr "0xc008000007e11fd4" is not resolved. since this is a
kernel module, its offset is from the DSO. Xfs module is loaded
at 0xc008000007d00000
# cat /proc/modules | grep xfs
xfs 2228224 3 - Live 0xc008000007d00000
And size is 0x220000. So its loaded between 0xc008000007d00000
and 0xc008000007f20000. From objdump, text section is:
text 0010f7bc 0000000000000000 0000000000000000 000000a0 2**4
Hence perf captured ip maps to 0x112074 which is:
( ip - start of module ) + a0
This offset 0x112074 falls out .text section which is up to 0x10f7bc
In this case for module, the address 0xc008000007e11fd4 is pointing
to stub instructions. This address range represents the module stubs
which is allocated on module load and hence is not part of DSO offset.
To address this issue in "object code reading", skip the sample if
address falls out of text section and is within the module end.
Use the "text_end" member of "struct dso" to do this check.
To address this issue in "perf report", exploring an option of
having stubs range as part of the /proc/kallsyms, so that perf
report can resolve addresses in stubs range
However this patch uses text_end to skip the stub range for
Object code reading testcase.
Reported-by: Disha Goel <disgoel@linux.ibm.com>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Disha Goel<disgoel@linux.ibm.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Cc: maddy@linux.ibm.com
Cc: disgoel@linux.vnet.ibm.com
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20230928075213.84392-3-atrajeev@linux.vnet.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Update "struct dso" to include new member "is_kmod".
This new field will determine if the file is a kernel
module or not.
To resolve the address from a sample, perf looks at the
DSO maps. In case of address from a kernel module, there
were some address found to be not resolved. This was
observed while running perf test for "Object code reading".
Though the ip falls beteen the start address of the loaded
module (perf map->start ) and end address ( perf map->end),
it was unresolved.
This was happening because in some cases for kernel
modules, address from sample points to stub instructions.
To identify if the DSO is a kernel module, the new field
"is_kmod" is added to "struct dso".
Reported-by: Disha Goel <disgoel@linux.ibm.com>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: kjain@linux.ibm.com
Cc: maddy@linux.ibm.com
Cc: disgoel@linux.vnet.ibm.com
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20230928075213.84392-2-atrajeev@linux.vnet.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Update "struct dso" to include new member "text_end".
This new field will represent the offset for end of text
section for a dso. For elf, this value is derived as:
sh_size (Size of section in byes) + sh_offset (Section file
offst) of the elf header for text.
For bfd, this value is derived as:
1. For PE file,
section->size + ( section->vma - dso->text_offset)
2. Other cases:
section->filepos (file position) + section->size (size of
section)
To resolve the address from a sample, perf looks at the
DSO maps. In case of address from a kernel module, there
were some address found to be not resolved. This was
observed while running perf test for "Object code reading".
Though the ip falls beteen the start address of the loaded
module (perf map->start ) and end address ( perf map->end),
it was unresolved.
Example:
Reading object code for memory address: 0xc008000007f0142c
File is: /lib/modules/6.5.0-rc3+/kernel/fs/xfs/xfs.ko
On file address is: 0x1114cc
Objdump command is: objdump -z -d --start-address=0x11142c --stop-address=0x1114ac /lib/modules/6.5.0-rc3+/kernel/fs/xfs/xfs.ko
objdump read too few bytes: 128
test child finished with -1
Here, module is loaded at:
# cat /proc/modules | grep xfs
xfs 2228224 3 - Live 0xc008000007d00000
From objdump for xfs module, text section is:
text 0010f7bc 0000000000000000 0000000000000000 000000a0 2**4
Here the offset for 0xc008000007f0142c ie 0x112074 falls out
.text section which is up to 0x10f7bc.
In this case for module, the address 0xc008000007e11fd4 is pointing
to stub instructions. This address range represents the module stubs
which is allocated on module load and hence is not part of DSO offset.
To identify such address, which falls out of text
section and within module end, added the new field "text_end" to
"struct dso".
Reported-by: Disha Goel <disgoel@linux.ibm.com>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: maddy@linux.ibm.com
Cc: disgoel@linux.vnet.ibm.com
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20230928075213.84392-1-atrajeev@linux.vnet.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Switch the test program to sleep that makes more sense for system wide
events. Only enable system wide when root or not paranoid. This avoids
failures under some testing conditions like ARM cloud.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20230930060206.2353141-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
In the previous code, there was a memory leak issue where the previously
allocated memory was not freed upon a failed lseek operation. This patch
addresses the problem by releasing the old memory before returning -errno
in case of a lseek failure. This ensures that memory is properly managed
and avoids potential memory leaks.
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: yangyicong@hisilicon.com
Cc: jonathan.cameron@huawei.com
Link: https://lore.kernel.org/r/20230930072719.1267784-1-visitorckw@gmail.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
commit 'be65de6b03aa ("fs: Remove dcookies support")' removed the
syscall definition for lookup_dcookie. However, syscall tables still
point to the old sys_lookup_dcookie() definition. Update syscall tables
of all architectures to directly point to sys_ni_syscall() instead.
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Namhyung Kim <namhyung@kernel.org> # for perf
Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
PMU alias names were computed when the first perf_pmu is created,
scanning all PMUs in event sources for a file called alias that
generally doesn't exist. Switch to trying to load the file when all
PMU related files are loaded in lookup. This would cause a PMU name
lookup of an alias name to fail if no PMUs were loaded, so in that
case all PMUs are loaded and the find repeated. The overhead is
similar but in the (very) general case not all PMUs are scanned for
the alias file.
As the overhead occurs once per invocation it doesn't show in perf
bench internals pmu-scan. On a tigerlake machine, the number of openat
system calls for an event of cpu/cycles/ with perf stat reduces from
94 to 69 (ie 25 fewer openat calls).
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: James Clark <james.clark@arm.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20230925062323.840799-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Testcase "Parsing of all PMU events from sysfs" parse events for
all PMUs, and not just cpu. In case of powerpc, the PowerVM
environment supports events from hv_24x7 and hv_gpci PMU which
is of example format like below:
- hv_24x7/CPM_ADJUNCT_INST,domain=?,core=?/
- hv_gpci/event,partition_id=?/
The value for "?" needs to be filled in depending on system
configuration. It is better to skip these parametrized events
in this test as it is done in:
'commit b50d691e50 ("perf test: Fix "all PMU test" to skip
parametrized events")' which handled a simialr instance with
"all PMU test".
Fix parse-events test to skip parametrized events since
it needs proper setup of the parameters.
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Ian Rogers <irogers@google.com>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Cc: maddy@linux.ibm.com
Cc: disgoel@linux.vnet.ibm.com
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20230927181703.80936-1-atrajeev@linux.vnet.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Add JSON metrics for Arm CMN. Currently just add part of CMN PMU
metrics which are general and compatible for any SoC with CMN-ANY.
Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Tested-by: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Shuai Xue <xueshuai@linux.alibaba.com>
Cc: Zhuo Song <zhuo.song@linux.alibaba.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Link: https://lore.kernel.org/r/1695794391-34817-8-git-send-email-renyu.zj@linux.alibaba.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Add new event test for uncore system event which is used to verify the
functionality of "Compat" matching multiple identifiers and the new event
fields "EventidCode" and "NodeType".
Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Tested-by: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Shuai Xue <xueshuai@linux.alibaba.com>
Cc: Zhuo Song <zhuo.song@linux.alibaba.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Link: https://lore.kernel.org/r/1695794391-34817-6-git-send-email-renyu.zj@linux.alibaba.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The perf_pmu_test_event.matching_pmu didn't work. No matter what its
value is, it does not affect the test results. So let matching_pmu be
used for matching perf_pmu_test_pmu.pmu.name.
Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Tested-by: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Shuai Xue <xueshuai@linux.alibaba.com>
Cc: Zhuo Song <zhuo.song@linux.alibaba.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Link: https://lore.kernel.org/r/1695794391-34817-5-git-send-email-renyu.zj@linux.alibaba.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The previous code assumes an event has either an "event=" or "config"
field at the beginning. For CMN neither of these may be present, as an
event is typically "type=xx,eventid=xxx".
So add EventidCode and NodeType to support CMN event description.
I compared pmu_event.c before and after compiling with JEVENT_ARCH=all,
they are consistent.
Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Tested-by: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Shuai Xue <xueshuai@linux.alibaba.com>
Cc: Zhuo Song <zhuo.song@linux.alibaba.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Link: https://lore.kernel.org/r/1695794391-34817-4-git-send-email-renyu.zj@linux.alibaba.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The jevent "Compat" is used for uncore PMU alias or metric definitions.
The same PMU driver has different PMU identifiers due to different
hardware versions and types, but they may have some common PMU metric.
Since a Compat value can only match one identifier, when adding the
same metric to PMUs with different identifiers, each identifier needs
to be defined once, which is not streamlined enough.
So let "Compat" support using regular expression to match multiple
identifiers for uncore PMU metric.
Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Tested-by: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Shuai Xue <xueshuai@linux.alibaba.com>
Cc: Zhuo Song <zhuo.song@linux.alibaba.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Link: https://lore.kernel.org/r/1695794391-34817-3-git-send-email-renyu.zj@linux.alibaba.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The jevent "Compat" is used for uncore PMU alias or metric definitions.
The same PMU driver has different PMU identifiers due to different
hardware versions and types, but they may have some common PMU event.
Since a Compat value can only match one identifier, when adding the
same event alias to PMUs with different identifiers, each identifier
needs to be defined once, which is not streamlined enough.
So let "Compat" support using regular expression to match identifiers
for uncore PMU alias. For example, if the "Compat" value is set to
"43401|43c01", it would be able to match PMU identifiers such as "43401"
or "43c01", which correspond to CMN600_r0p0 or CMN700_r0p0.
Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Tested-by: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Shuai Xue <xueshuai@linux.alibaba.com>
Cc: Zhuo Song <zhuo.song@linux.alibaba.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Link: https://lore.kernel.org/r/1695794391-34817-2-git-send-email-renyu.zj@linux.alibaba.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The BTF func proto for a tracepoint has one more argument than the
actual tracepoint function since it has a context argument at the
begining. So it should compare to 5 when the tracepoint has 4
arguments.
typedef void (*btf_trace_sched_switch)(void *, bool, struct task_struct *, struct task_struct *, unsigned int);
Also, recent change in the perf tool would use a hand-written minimal
vmlinux.h to generate BTF in the skeleton. So it won't have the info
of the tracepoint. Anyway it should use the kernel's vmlinux BTF to
check the type in the kernel.
Fixes: b36888f71c ("perf record: Handle argument change in sched_switch")
Reviewed-by: Ian Rogers <irogers@google.com>
Acked-by: Song Liu <song@kernel.org>
Cc: Hao Luo <haoluo@google.com>
CC: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230922234444.3115821-1-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
To save pid of child processes when creating worker:
1. The messaging worker is changed to `union` type to store thread id and
process pid.
2. Save child process pid in create_process_worker().
3. Rename `pth_tab` as `work_tab`.
Test result:
# perf bench sched messaging
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run
Total time: 6.744 [sec]
# perf bench sched messaging -t
# Running 'sched/messaging' benchmark:
# 20 sender and receiver threads per group
# 10 groups == 400 threads run
Total time: 5.788 [sec]
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20230923093037.961232-4-yangjihong1@huawei.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Refactor the create_worker() helper:
1. Modify the return value and use pthread pointer as a parameter to
facilitate value assignment in create_worker().
2. The thread worker creation and process worker creation are abstracted
into independent helpers.
No functional change.
Test result:
# perf bench sched messaging
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run
Total time: 6.332 [sec]
# perf bench sched messaging -t
# Running 'sched/messaging' benchmark:
# 20 sender and receiver threads per group
# 10 groups == 400 threads run
Total time: 5.545 [sec]
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20230923093037.961232-3-yangjihong1@huawei.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Fixed several code style issues in sched-messaging:
1. Use one space around "-" and "+" operators.
2. When a long line is broken, the operator is at the end of the line.
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20230923093037.961232-2-yangjihong1@huawei.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Running shellcheck on some of the shell scripts, throws
below warning on shellcheck v0.6. Example:
In tests/shell/coresight/asm_pure_loop.sh line 14:
DATA="$DATD/perf-$TEST-$DATV.data"
^---^ SC2153: Possible misspelling: DATD may not be assigned, but DATA is.
Here, DATD is exported from "lib/coresight.sh" and this
warning can be ignored. Use "shellcheck disable=" to ignore
this check.
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Cc: maddy@linux.ibm.com
Cc: disgoel@linux.vnet.ibm.com
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20230907171540.36736-4-atrajeev@linux.vnet.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Running shellcheck on stat+shadow_stat.sh generates below
warning
In tests/shell/stat+csv_summary.sh line 26:
while read _num _event _run _pct
^--^ SC2034: _num appears unused. Verify use (or export if used externally).
^----^ SC2034: _event appears unused. Verify use (or export if used externally).
^--^ SC2034: _run appears unused. Verify use (or export if used externally).
^--^ SC2034: _pct appears unused. Verify use (or export if used externally).
This variable is intentionally unused since it is
needed to parse through the output. commit used "_"
as a prefix for this throw away variable. But this
stil shows warning with shellcheck v0.6. Fix this
by only using "_" instead of prefix and variable name.
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Cc: maddy@linux.ibm.com
Cc: disgoel@linux.vnet.ibm.com
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20230907171540.36736-3-atrajeev@linux.vnet.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Running shellcheck on some of the shell scripts throws
below error:
In tests/shell/coresight/unroll_loop_thread_10.sh line 8:
. "$(dirname $0)"/../lib/coresight.sh
^-- SC1090: Can't follow non-constant source. Use a directive to specify location.
This happens on shellcheck version "0.6.0". Fix shellcheck
warning for SC1090 using "shellcheck source="i option to mention
the location of sourced files.
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Cc: maddy@linux.ibm.com
Cc: disgoel@linux.vnet.ibm.com
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20230907171540.36736-2-atrajeev@linux.vnet.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Dummy events are created with an attribute where the period and freq
are zero. evsel__config will then see the uninitialized values and
initialize them in evsel__default_freq_period. As fequency mode is
used by default the dummy event would be set to use frequency
mode. However, this has no effect on the dummy event but does cause
unnecessary timers/interrupts. Avoid this overhead by setting the
period to 1 for dummy events.
evlist__add_aux_dummy calls evlist__add_dummy then sets freq=0 and
period=1. This isn't necessary after this change and so the setting is
removed.
From Stephane:
The dummy event is not counting anything. It is used to collect mmap
records and avoid a race condition during the synthesize mmap phase of
perf record. As such, it should not cause any overhead during active
profiling. Yet, it did. Because of a bug the dummy event was
programmed as a sampling event in frequency mode. Events in that mode
incur more kernel overheads because on timer tick, the kernel has to
look at the number of samples for each event and potentially adjust
the sampling period to achieve the desired frequency. The dummy event
was therefore adding a frequency event to task and ctx contexts we may
otherwise not have any, e.g.,
perf record -a -e cpu/event=0x3c,period=10000000/.
On each timer tick the perf_adjust_freq_unthr_context() is invoked and
if ctx->nr_freq is non-zero, then the kernel will loop over ALL the
events of the context looking for frequency mode ones. In doing, so it
locks the context, and enable/disable the PMU of each hw event. If all
the events of the context are in period mode, the kernel will have to
traverse the list for nothing incurring overhead. The overhead is
multiplied by a very large factor when this happens in a guest kernel.
There is no need for the dummy event to be in frequency mode, it does
not count anything and therefore should not cause extra overhead for
no reason.
Fixes: 5bae025023 ("perf evlist: Introduce perf_evlist__new_dummy constructor")
Reported-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20230916035640.1074422-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The perf_pmu__parse_* functions for the sysfs files of pmu event’s
scale, unit, per-pkg and snapshot were updated in commit 7b723dbb96
("perf pmu: Be lazy about loading event info files from sysfs").
However, the paths for these sysfs files were incorrect. This resulted
in perf stat reporting values with wrong scaling and missing units. This
is fixed by correcting the paths for these sysfs files.
Before this fix:
$sudo perf stat -e power/energy-pkg/ -- sleep 2
Performance counter stats for 'system wide':
351,217,188,864 power/energy-pkg/
2.004127961 seconds time elapsed
After this fix:
$sudo perf stat -e power/energy-pkg/ -- sleep 2
Performance counter stats for 'system wide':
80.58 Joules power/energy-pkg/
2.004009749 seconds time elapsed
Fixes: 7b723dbb96 ("perf pmu: Be lazy about loading event info files from sysfs")
Signed-off-by: Wyes Karny <wyes.karny@amd.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: ravi.bangoria@amd.com
Cc: sandipan.das@amd.com
Cc: james.clark@arm.com
Cc: kan.liang@linux.intel.com
Link: https://lore.kernel.org/r/20230920122349.418673-1-wyes.karny@amd.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Machines with less then 4 CPUs weren't consistently triggering lock
events required for the test.
Skip the test on those machines. The limit of 4 CPUs is set as it
generates around 100 lock events for a test.
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com>
Acked-by: Michael Petlan <mpetlan@redhat.com>
Link: https://lore.kernel.org/r/20230919150419.23193-2-vmolnaro@redhat.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The test was failing in specific scenarios due to imperfection of FP
arithmetics. The `bc` command wasn't correctly rounding the result of
division causing the failure.
Replace the `bc` with `awk` which should work with more decimal places
and add a threshold to catch any possible rounding errors. The
acceptable rounding error is set to 0.01 when the test passes with a
warning message.
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com>
Acked-by: Michael Petlan <mpetlan@redhat.com>
Link: https://lore.kernel.org/r/20230919150419.23193-1-vmolnaro@redhat.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The struct "pmu_events_table" has been changed after commit
2e255b4f9f (perf jevents: Group events by PMU, 2023-08-23).
So there doesn't exist 'entries' in pmu_events_table anymore.
This will align the members with that commit. Othewise, below
errors will be printed when run jevent.py:
pmu-events/pmu-events.c:5485:26: error: ‘struct pmu_metrics_table’ has no member named ‘entries’
5485 | .entries = pmu_metrics__freescale_imx8dxl_sys,
Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Tested-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20230919080929.3807123-1-xu.yang_2@nxp.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Fuzzing found that an invalid tracepoint name would create a memory
leak with an address sanitizer build:
```
$ perf stat -e '*:o/' true
event syntax error: '*:o/'
\___ parser error
Run 'perf list' for a list of valid events
Usage: perf stat [<options>] [<command>]
-e, --event <event> event selector. use 'perf list' to list available events
=================================================================
==59380==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 4 byte(s) in 2 object(s) allocated from:
#0 0x7f38ac07077b in __interceptor_strdup ../../../../src/libsanitizer/asan/asan_interceptors.cpp:439
#1 0x55f2f41be73b in str util/parse-events.l:49
#2 0x55f2f41d08e8 in parse_events_lex util/parse-events.l:338
#3 0x55f2f41dc3b1 in parse_events_parse util/parse-events-bison.c:1464
#4 0x55f2f410b8b3 in parse_events__scanner util/parse-events.c:1822
#5 0x55f2f410d1b9 in __parse_events util/parse-events.c:2094
#6 0x55f2f410e57f in parse_events_option util/parse-events.c:2279
#7 0x55f2f4427b56 in get_value tools/lib/subcmd/parse-options.c:251
#8 0x55f2f4428d98 in parse_short_opt tools/lib/subcmd/parse-options.c:351
#9 0x55f2f4429d80 in parse_options_step tools/lib/subcmd/parse-options.c:539
#10 0x55f2f442acb9 in parse_options_subcommand tools/lib/subcmd/parse-options.c:654
#11 0x55f2f3ec99fc in cmd_stat tools/perf/builtin-stat.c:2501
#12 0x55f2f4093289 in run_builtin tools/perf/perf.c:322
#13 0x55f2f40937f5 in handle_internal_command tools/perf/perf.c:375
#14 0x55f2f4093bbd in run_argv tools/perf/perf.c:419
#15 0x55f2f409412b in main tools/perf/perf.c:535
SUMMARY: AddressSanitizer: 4 byte(s) leaked in 2 allocation(s).
```
Fix by adding the missing destructor.
Fixes: 865582c3f4 ("perf tools: Adds the tracepoint name parsing support")
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: He Kuang <hekuang@huawei.com>
Link: https://lore.kernel.org/r/20230914164028.363220-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Use perf version to detect whether BPF skeletons were enabled in a
build rather than a failing perf record.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Namhyung Kim <namhyung@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Patrice Duroux <patrice.duroux@gmail.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Tom Rix <trix@redhat.com>
Cc: llvm@lists.linux.dev
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230914211948.814999-6-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Fix a target name and set BUILD_BPF_SKEL to 0 rather than 1.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Namhyung Kim <namhyung@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Patrice Duroux <patrice.duroux@gmail.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Tom Rix <trix@redhat.com>
Cc: llvm@lists.linux.dev
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230914211948.814999-4-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
LIBBPF is dependent on zlib so move the NO_ZLIB and feature check
early to avoid statically building when zlib is disabled. This avoids
a linkage failure with perf and static libbpf when zlib isn't
specified.
Move BUILD_BPF_SKEL logic to one place and if not defined set
BUILD_BPF_SKEL to 1. Detect dependencies of building with BPF
skeletons and warn/disable if the dependencies aren't present.
Change Makefile.perf to contain BPF skeleton logic dependent on the
Makefile.config result and refresh the comment about BUILD_BPF_SKEL.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Namhyung Kim <namhyung@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Patrice Duroux <patrice.duroux@gmail.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Tom Rix <trix@redhat.com>
Cc: llvm@lists.linux.dev
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230914211948.814999-3-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Running commands such as
# ./perf stat -e cs -- true
Segmentation fault (core dumped)
# ./perf stat -e cpu-clock-- true
Segmentation fault (core dumped)
#
dump core. This should not happen as these events are defined
even when no hardware PMU is available.
Debugging this reveals this call chain:
perf_pmus__find_by_type(type=1)
+--> pmu_read_sysfs(core_only=false)
+--> perf_pmu__find2(dirfd=3, name=0x152a113 "software")
+--> perf_pmu__lookup(pmus=0x14f0568 <other_pmus>, dirfd=3,
lookup_name=0x152a113 "software")
+--> perf_pmu__find_events_table (pmu=0x1532130)
Now the pmu is "software" and it tries to find a proper table
generated by the pmu-event generation process for s390:
# cd pmu-events/
# ./jevents.py s390 all /root/linux/tools/perf/pmu-events/arch |\
grep -E '^const struct pmu_table_entry'
const struct pmu_table_entry pmu_events__cf_z10[] = {
const struct pmu_table_entry pmu_events__cf_z13[] = {
const struct pmu_table_entry pmu_metrics__cf_z13[] = {
const struct pmu_table_entry pmu_events__cf_z14[] = {
const struct pmu_table_entry pmu_metrics__cf_z14[] = {
const struct pmu_table_entry pmu_events__cf_z15[] = {
const struct pmu_table_entry pmu_metrics__cf_z15[] = {
const struct pmu_table_entry pmu_events__cf_z16[] = {
const struct pmu_table_entry pmu_metrics__cf_z16[] = {
const struct pmu_table_entry pmu_events__cf_z196[] = {
const struct pmu_table_entry pmu_events__cf_zec12[] = {
const struct pmu_table_entry pmu_metrics__cf_zec12[] = {
const struct pmu_table_entry pmu_events__test_soc_cpu[] = {
const struct pmu_table_entry pmu_metrics__test_soc_cpu[] = {
const struct pmu_table_entry pmu_events__test_soc_sys[] = {
#
However event "software" is not listed, as can be seen in the
generated const struct pmu_events_map pmu_events_map[].
So in function perf_pmu__find_events_table(), the variable
table is initialized to NULL, but never set to a proper
value. The function scans all generated &pmu_events_map[]
tables, but no table matches, because the tables are
s390 CPU Measurement unit specific:
i = 0;
for (;;) {
const struct pmu_events_map *map = &pmu_events_map[i++];
if (!map->arch)
break;
--> the maps are there because the build generated them
if (!strcmp_cpuid_str(map->cpuid, cpuid)) {
table = &map->event_table;
break;
}
--> Since no matching CPU string the table var remains 0x0
}
free(cpuid);
if (!pmu)
return table;
--> The pmu is "software" so it exists and no return
--> and here perf dies because table is 0x0
for (i = 0; i < table->num_pmus; i++) {
...
}
return NULL;
Fix this and do not access the table variable. Instead return 0x0
which is the same return code when the for-loop was not successful.
Output after:
# ./perf stat -e cs -- true
Performance counter stats for 'true':
0 cs
0.000853105 seconds time elapsed
0.000061000 seconds user
0.000827000 seconds sys
# ./perf stat -e cpu-clock -- true
Performance counter stats for 'true':
0.25 msec cpu-clock # 0.341 CPUs utilized
0.000728383 seconds time elapsed
0.000055000 seconds user
0.000706000 seconds sys
# ./perf stat -e cycles -- true
Performance counter stats for 'true':
<not supported> cycles
0.000767298 seconds time elapsed
0.000055000 seconds user
0.000739000 seconds sys
#
Fixes: 7c52f10c0d ("perf pmu: Cache JSON events table")
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: dengler@linux.ibm.com
Cc: gor@linux.ibm.com
Cc: hca@linux.ibm.com
Cc: sumanthk@linux.ibm.com
Cc: svens@linux.ibm.com
Link: https://lore.kernel.org/r/20230913125157.2790375-1-tmricht@linux.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Fix an error detected by memory sanitizer:
```
==4033==WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x55fb0fbedfc7 in read_alias_info tools/perf/util/pmu.c:457:6
#1 0x55fb0fbea339 in check_info_data tools/perf/util/pmu.c:1434:2
#2 0x55fb0fbea339 in perf_pmu__check_alias tools/perf/util/pmu.c:1504:9
#3 0x55fb0fbdca85 in parse_events_add_pmu tools/perf/util/parse-events.c:1429:32
#4 0x55fb0f965230 in parse_events_parse tools/perf/util/parse-events.y:299:6
#5 0x55fb0fbdf6b2 in parse_events__scanner tools/perf/util/parse-events.c:1822:8
#6 0x55fb0fbdf8c1 in __parse_events tools/perf/util/parse-events.c:2094:8
#7 0x55fb0fa8ffa9 in parse_events tools/perf/util/parse-events.h:41:9
#8 0x55fb0fa8ffa9 in test_event tools/perf/tests/parse-events.c:2393:8
#9 0x55fb0fa8f458 in test__pmu_events tools/perf/tests/parse-events.c:2551:15
#10 0x55fb0fa6d93f in run_test tools/perf/tests/builtin-test.c:242:9
#11 0x55fb0fa6d93f in test_and_print tools/perf/tests/builtin-test.c:271:8
#12 0x55fb0fa6d082 in __cmd_test tools/perf/tests/builtin-test.c:442:5
#13 0x55fb0fa6d082 in cmd_test tools/perf/tests/builtin-test.c:564:9
#14 0x55fb0f942720 in run_builtin tools/perf/perf.c:322:11
#15 0x55fb0f942486 in handle_internal_command tools/perf/perf.c:375:8
#16 0x55fb0f941dab in run_argv tools/perf/perf.c:419:2
#17 0x55fb0f941dab in main tools/perf/perf.c:535:3
```
Fixes: 7b723dbb96 ("perf pmu: Be lazy about loading event info files from sysfs")
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@arm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20230914022425.1489035-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The parser wraps all strings as Events, so the input is an
Event. Using a string would be bad as functions like Simplify are
called on the arguments, which wouldn't be present on a string.
Fixes: 9d5da30e4a ("perf jevents: Add a new expression builtin strcmp_cpuid_str()")
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@arm.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20230914022204.1488383-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Make part of an existing TODO conditional to avoid the following build
error:
```
tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c:26:14: error: cannot combine with previous 'char' declaration specifier
26 | typedef char bool;
| ^
include/stdbool.h:20:14: note: expanded from macro 'bool'
20 | #define bool _Bool
| ^
tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c:26:1: error: typedef requires a name [-Werror,-Wmissing-declarations]
26 | typedef char bool;
| ^~~~~~~~~~~~~~~~~
2 errors generated.
```
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230913184957.230076-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Instructions with sign- and zero- extention like movsbl and movzwq were
not handled properly. As it can check different size suffix (-b, -w, -l
or -q) we can omit that and add the common parts even though some
combinations are not possible.
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20230908052216.566148-1-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
pmu_events_table__find() is no longer used so remove it and its Arm
specific version.
Signed-off-by: James Clark <james.clark@arm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Haixin Yu <yuhaixin.yhx@linux.alibaba.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230913153355.138331-4-james.clark@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Currently the while loop always either exits on the first iteration with
a core PMU, or exits with NULL on heterogeneous systems or when not all
CPUs are online.
Both of the latter behaviors are undesirable for platforms other than
Arm so simplify it to always return the first core PMU, or NULL if none
exist.
This behavior was depended on by the Arm version of
pmu_metrics_table__find(), so the logic has been moved there instead.
Signed-off-by: James Clark <james.clark@arm.com>
Suggested-by: Ian Rogers <irogers@google.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Haixin Yu <yuhaixin.yhx@linux.alibaba.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230913153355.138331-3-james.clark@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
pmu__find_core_pmu() more logically belongs in pmus.c because it
iterates over all PMUs, so move it to pmus.c
At the same time rename it to perf_pmus__find_core_pmu() to match the
naming convention in this file.
list_prepare_entry() can't be used in perf_pmus__scan_core() anymore now
that it's called from the same compilation unit. This is with -O2
(specifically -O1 -ftree-vrp -finline-functions
-finline-small-functions) which allow the bounds of the array
access to be determined at compile time. list_prepare_entry() subtracts
the offset of the 'list' member in struct perf_pmu from &core_pmus,
which isn't a struct perf_pmu. The compiler sees that pmu results in
&core_pmus - 8 and refuses to compile. At runtime this works because
list_for_each_entry_continue() always adds the offset back again before
dereferencing ->next, but it's technically undefined behavior. With
-fsanitize=undefined an additional warning is generated.
Using list_first_entry_or_null() to get the first entry here avoids
doing &core_pmus - 8 but has the same result and fixes both the compile
warning and the undefined behavior warning. There are other uses of
list_prepare_entry() in pmus.c, but the compiler doesn't seem to be
able to see that they can also be called with &core_pmus, so I won't
change any at this time.
Signed-off-by: James Clark <james.clark@arm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Haixin Yu <yuhaixin.yhx@linux.alibaba.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230913153355.138331-2-james.clark@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The node (nd) may be NULL and pointer arithmetic on NULL is undefined
behavior. Move the computation of next below the NULL check on the
node.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@arm.com>
Link: https://lore.kernel.org/r/20230914044233.1550195-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
To keep perf building in systems where types and defines used in this
new benchmark are not available, such as:
12 13.46 centos:stream : FAIL gcc version 8.5.0 20210514 (Red Hat 8.5.0-20) (GCC)
bench/sched-seccomp-notify.c: In function 'user_notif_syscall':
bench/sched-seccomp-notify.c:55:27: error: 'SECCOMP_RET_USER_NOTIF' undeclared (first use in this function); did you mean 'SECCOMP_RET_ERRNO'?
BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_USER_NOTIF),
^~~~~~~~~~~~~~~~~~~~~~
/git/perf-6.6.0-rc1/tools/include/uapi/linux/filter.h:49:59: note: in definition of macro 'BPF_STMT'
#define BPF_STMT(code, k) { (unsigned short)(code), 0, 0, k }
^
bench/sched-seccomp-notify.c:55:27: note: each undeclared identifier is reported only once for each function it appears in
BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_USER_NOTIF),
^~~~~~~~~~~~~~~~~~~~~~
/git/perf-6.6.0-rc1/tools/include/uapi/linux/filter.h:49:59: note: in definition of macro 'BPF_STMT'
#define BPF_STMT(code, k) { (unsigned short)(code), 0, 0, k }
^
bench/sched-seccomp-notify.c:55:3: error: missing initializer for field 'k' of 'struct sock_filter' [-Werror=missing-field-initializers]
BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_USER_NOTIF),
^~~~~~~~
In file included from bench/sched-seccomp-notify.c:5:
/git/perf-6.6.0-rc1/tools/include/uapi/linux/filter.h:28:8: note: 'k' declared here
__u32 k; /* Generic multiuse field */
^
bench/sched-seccomp-notify.c: In function 'user_notification_sync_loop':
bench/sched-seccomp-notify.c:70:28: error: storage size of 'resp' isn't known
struct seccomp_notif_resp resp;
^~~~
bench/sched-seccomp-notify.c:71:23: error: storage size of 'req' isn't known
struct seccomp_notif req;
^~~
bench/sched-seccomp-notify.c:76:23: error: 'SECCOMP_IOCTL_NOTIF_RECV' undeclared (first use in this function); did you mean 'SECCOMP_MODE_STRICT'?
if (ioctl(listener, SECCOMP_IOCTL_NOTIF_RECV, &req))
^~~~~~~~~~~~~~~~~~~~~~~~
SECCOMP_MODE_STRICT
bench/sched-seccomp-notify.c:86:23: error: 'SECCOMP_IOCTL_NOTIF_SEND' undeclared (first use in this function); did you mean 'SECCOMP_RET_ACTION'?
if (ioctl(listener, SECCOMP_IOCTL_NOTIF_SEND, &resp))
^~~~~~~~~~~~~~~~~~~~~~~~
SECCOMP_RET_ACTION
bench/sched-seccomp-notify.c:71:23: error: unused variable 'req' [-Werror=unused-variable]
struct seccomp_notif req;
^~~
bench/sched-seccomp-notify.c:70:28: error: unused variable 'resp' [-Werror=unused-variable]
struct seccomp_notif_resp resp;
^~~~
14 11.31 debian:10 : FAIL gcc version 8.3.0 (Debian 8.3.0-6)
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andrei Vagin <avagin@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kees Kook <keescook@chromium.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/lkml/ZQGhjaojgOGtSNk6@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The new 'perf bench' for sched-seccomp-notify uses defines and types not
available in older systems where we want to have perf available, so grab
a copy of this UAPI from the kernel sources to allow that.
This will be checked in the future for drift from the original when we
build the perf tool, that will warn when that happens like:
make: Entering directory '/var/home/acme/git/perf-tools/tools/perf'
BUILD: Doing 'make -j32' parallel build
Warning: Kernel ABI header differences:
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andrei Vagin <avagin@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kees Kook <keescook@chromium.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/lkml/ZQGhMXtwX7RvV3ya@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
YYDEBUG enables line numbers and other error helpers in the generated
bpf-filter-bison.c. Conditionally enabled only for debug builds.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Gaosheng Cui <cuigaosheng1@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rob Herring <robh@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230911170559.4037734-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
YYDEBUG enables line numbers and other error helpers in the generated
pmu-bison.c. Conditionally enabled only for debug builds.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Gaosheng Cui <cuigaosheng1@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rob Herring <robh@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230911170559.4037734-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
YYDEBUG enables line numbers and other error helpers in the generated
expr-bison.c. These shouldn't be generated when debugging
isn't enabled.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Gaosheng Cui <cuigaosheng1@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rob Herring <robh@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230911170559.4037734-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
YYDEBUG enables line numbers and other error helpers in the generated
parse-events-bison.c. These shouldn't be generated when debugging
isn't enabled.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Gaosheng Cui <cuigaosheng1@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rob Herring <robh@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230911170559.4037734-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The fnmatch header is now used in the PMU matching logic in pmu.c.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Gaosheng Cui <cuigaosheng1@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rob Herring <robh@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230911170559.4037734-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Makefile.perf uses "CONFIG_*" checks in the code. Example the config for
libtraceevent is used to set PYTHON_EXT_SRCS
ifeq ($(CONFIG_LIBTRACEEVENT),y)
PYTHON_EXT_SRCS := $(shell grep -v ^\# util/python-ext-sources)
else
PYTHON_EXT_SRCS := $(shell grep -v '^\#\|util/trace-event.c' util/python-ext-sources)
endif
But this is not picking the value for CONFIG_LIBTRACEEVENT that is set
using the settings in Makefile.config. Include the file
".config-detected" so that make will use the system detected
configuration in the CONFIG checks.
This will fix isues that could arise when other "CONFIG_*" checks are
added to Makefile.perf in future as well.
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20230912063807.74250-1-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Metrics for V1 weren't previously included in the Perf Jsons, so add
them using the telemetry source [1].
After generation any parts identical to the default metrics in sbsa.json
were manually removed.
[1]: https://gitlab.arm.com/telemetry-solution/telemetry-solution/-/blob/main/data/pmu/cpu/neoverse/neoverse-v1.json
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Forrington <nick.forrington@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230831161618.134738-3-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The new data [1] includes descriptions that may have product specific
details and new groupings that will be consistent with other products.
The following command was used to generate the jsons:
$ telemetry-solution/tools/perf_json_generator/generate.py \
linux/tools/perf/ --telemetry-files \
telemetry-solution/data/pmu/cpu/neoverse/neoverse-v1.json
[1]: https://gitlab.arm.com/telemetry-solution/telemetry-solution/-/blob/main/data/pmu/cpu/neoverse/neoverse-v1.json
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Forrington <nick.forrington@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230831161618.134738-2-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Test that the new expression builtin returns a match when the current
escaped CPU ID is given, and that it doesn't match when "0x0" is given.
The CPU ID in test__expr() has to be changed to perf_pmu__getcpuid()
which returns the CPU ID string, rather than the raw CPU ID that
get_cpuid() returns because that can't be used with strcmp_cpuid_str().
It doesn't affect the is_intel test because both versions contain
"Intel".
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Chen Zhongjin <chenzhongjin@huawei.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Haixin Yu <yuhaixin.yhx@linux.alibaba.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230904095104.1162928-5-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It finds all occurrences of a single character and replaces them with
a multi character string. This will be used in a test in a following
commit.
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Chen Zhongjin <chenzhongjin@huawei.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Haixin Yu <yuhaixin.yhx@linux.alibaba.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230904095104.1162928-4-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
'cpuid_not_more_than' was the working title of the new
'strcmp_cpuid_str' keyword and was accidentally left in. It was never
used so tidying it up has no effect.
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Chen Zhongjin <chenzhongjin@huawei.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Haixin Yu <yuhaixin.yhx@linux.alibaba.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230904095104.1162928-3-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently the function always returns 0, so even when the has_event()
test fails, the test still passes. Fix it by returning ret instead.
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Chen Zhongjin <chenzhongjin@huawei.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Haixin Yu <yuhaixin.yhx@linux.alibaba.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230904095104.1162928-2-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
With paranoia set at 2 evsel__open will fail with EACCES for non-root
users. To avoid this stopping libpfm4 events from being printed, retry
with exclude_kernel enabled - copying the regular is_event_supported
test.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kang Minchul <tegongkang@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20230906234416.3472339-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Use the first core PMU instead.
On a Raspberry Pi, before:
$ perf list
...
cpu/t1=v1[,t2=v2,t3 ...]/modifier [Raw hardware event descriptor]
[(see 'man perf-list' on how to encode it)]
...
After:
$ perf list
...
armv8_cortex_a72/t1=v1[,t2=v2,t3 ...]/modifier [Raw hardware event descriptor]
[(see 'man perf-list' on how to encode it)]
...
```
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kang Minchul <tegongkang@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20230906234416.3472339-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The -G/--cgroup-filter is to limit lock contention collection on the
tasks in the specific cgroups only.
$ sudo ./perf lock con -abt -G /user.slice/.../vte-spawn-52221fb8-b33f-4a52-b5c3-e35d1e6fc0e0.scope \
./perf bench sched messaging
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run
Total time: 0.174 [sec]
contended total wait max wait avg wait pid comm
4 114.45 us 60.06 us 28.61 us 214847 sched-messaging
2 111.40 us 60.84 us 55.70 us 214848 sched-messaging
2 106.09 us 59.42 us 53.04 us 214837 sched-messaging
1 81.70 us 81.70 us 81.70 us 214709 sched-messaging
68 78.44 us 6.83 us 1.15 us 214633 sched-messaging
69 73.71 us 2.69 us 1.07 us 214632 sched-messaging
4 72.62 us 60.83 us 18.15 us 214850 sched-messaging
2 71.75 us 67.60 us 35.88 us 214840 sched-messaging
2 69.29 us 67.53 us 34.65 us 214804 sched-messaging
2 69.00 us 68.23 us 34.50 us 214826 sched-messaging
...
Export cgroup__new() function as it's needed from outside.
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230906174903.346486-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The --lock-cgroup option shows lock contention stats break down by
cgroups.
Add LOCK_AGGR_CGROUP mode and use it instead of use_cgroup field.
$ sudo ./perf lock con -ab --lock-cgroup sleep 1
contended total wait max wait avg wait cgroup
8 15.70 us 6.34 us 1.96 us /
2 1.48 us 747 ns 738 ns /user.slice/.../app.slice/app-gnome-google\x2dchrome-6442.scope
1 848 ns 848 ns 848 ns /user.slice/.../session.slice/org.gnome.Shell@x11.service
1 220 ns 220 ns 220 ns /user.slice/.../session.slice/pipewire-pulse.service
For now, the cgroup mode only works with BPF (-b).
Committer notes:
Remove -g as it is used in the other tools with a clear meaning of
collect/show callchains. As agreed with Namhyung off list.
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230906174903.346486-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Save cgroup info and display cgroup names if requested. This is a
preparation for the next patch.
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230906174903.346486-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The read_all_cgroups() is to build a tree of cgroups in the system and
users can look up a cgroup using __cgroup_find().
Committer notes:
Had to do this to cover that #else block:
-static inline u64 __read_cgroup_id(const char *path) { return -1ULL; }
+static inline u64 __read_cgroup_id(const char *path __maybe_unused) { return -1ULL; }
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230906174903.346486-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>