IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
This is the first patch of libbpf. The goal of libbpf is to create a
standard way for accessing eBPF object files. This patch creates
'Makefile' and 'Build' for it, allows 'make' to build libbpf.a and
libbpf.so, 'make install' to put them into proper directories.
Most part of Makefile is borrowed from traceevent.
Before building, it checks the existence of libelf in Makefile, and deny
to build if not found. Instead of throwing an error if libelf not found,
the error raises in a phony target "elfdep". This design is to ensure
'make clean' still workable even if libelf is not found.
Because libbpf requires 'kern_version' field set for 'union bpf_attr'
(bpfdep" is used for that dependency), Kernel BPF API is also checked
by intruducing a new feature check 'bpf' into tools/build/feature,
which checks the existence and version of linux/bpf.h. When building
libbpf, it searches that file from include/uapi/linux in kernel source
tree (controlled by FEATURE_CHECK_CFLAGS-bpf). Since it searches kernel
source tree it reside, installing of newest kernel headers is not
required, except we are trying to port these files to an old kernel.
To avoid checking that file when perf building, the newly introduced
'bpf' feature check doesn't added into FEATURE_TESTS and
FEATURE_DISPLAY by default in tools/build/Makefile.feature, but added
into libbpf's specific.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Bcc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-4-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
User visible changes:
- IPC and cycle accounting in 'perf annotate'. (Andi Kleen)
- Display cycles in branch sort mode in 'perf report'. (Andi Kleen)
- Add total time column to 'perf trace' syscall stats summary. (Milian Woff)
Infrastructure changes:
- PMU helpers to use in Intel PT. (Adrian Hunter)
- Fix perf-with-kcore script not to split args with spaces. (Adrian Hunter)
- Add empty Build files for some more architectures. (Ben Hutchings)
- Move 'perf stat' config variables to a struct to allow using some
of its functions in more places. (Jiri Olsa)
- Add DWARF register names for 'xtensa' arch. (Max Filippov)
- Implement BPF programs attached to uprobes. (Wang Nan)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently the value of a PMU config term is silently truncated if it is
too big. This is an impediment to validating the value for other
criteria later on i.e. the user provides an invalid value that gets
truncated to a valid one.
The maximum value validation is only done for the parser where the error
is passed back to the user. In other cases the silent truncation
continues so as not to affect tools that perhaps rely on it.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1437150840-31811-16-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Now that we can process branch data in annotate it makes sense to
support enabling branch recording from top too. Most of the code needed
for this is already in shared code with report. But we need to add:
- The option parsing code (using shared code from the previous patch)
- Document the options
- Set up the IPC/cycles accounting state in the top session
- Call the accounting code in the hist iter callback
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1437233094-12844-8-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add two new columns to the annotate display and display the average
cycles and the compute IPC if available.
When the LBR was not in any branch mode the IPC computation is
automatically disabled. We still display the cycle information.
Example output (with made up numbers):
The second column is the IPC and third average cycles.
│ __attribute__((noinline)) f2()
│ {
5.15 0.07 │ push %rbp
0.01 0.07 │ mov %rsp,%rbp
│ c = a / b;
9.87 0.07 │ mov a,%eax
0.07 │ mov b,%ecx
0.07 │ cltd
4.92 0.07 123│ idiv %ecx
70.79 0.07 │ mov %eax,__TMC_END__
│ }
9.25 0.07 │ pop %rbp
0.01 0.07 123│ ← retq
v2: Fix display problems.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1437233094-12844-7-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Compute the IPC and the basic block cycles for the annotate display.
IPC is computed by counting the instructions, and then dividing the
accounted cycles by that count.
The actual IPC computation can only be done at annotate time, because we
need to parse the objdump output first to know the number of
instructions in the basic block.
The cycles/IPC are also put into the perf function annotation so that
the display code can show them.
Again basic block overlaps are not handled, with the longest winning,
but there are some heuristics to hide the IPC when the longest is not
the most common.
v2: Compute IPC correctly.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1437233094-12844-6-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This adds the basic infrastructure to keep track of cycle counts per
basic block for annotate. We allocate an array similar to the normal
accounting, and then account branch cycles there.
We handle two cases:
cycles per basic block with start and cycles per branch (these are later
used for either IPC or just cycles per BB)
In the start case we cannot handle overlaps, so always the longest basic
block wins.
For the cycles per branch case everything is accurately accounted.
v2: Remove unnecessary checks. Slight restructure. Move
symbol__get_annotation to another patch. Move histogram allocation.
v3: Merged with current tree
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1437233094-12844-4-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
cycles is a new branch_info field available on some CPUs that indicates
the time deltas between branches in the LBR.
Add a sort key and output code for the cycles to allow to display the
basic block cycles individually in perf report.
We also pass in the cycles for weight when LBRs are processed, which
allows to get global and local weight, to get an estimate of the total
cost.
And also print the cycles information for perf report -D. I also added
printing for the previously missing LBR flags (mispredict etc.)
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1437233094-12844-2-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf currently fails to build on MIPS as there is no
tools/perf/arch/mips/Build file. Adding an empty file fixes this as
there are no MIPS-specific sources to build.
It looks like the same is needed for Alpha and PA-RISC, though I
haven't been able to test those.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Fixes: 5e8c0fb6a9 ("perf build: Add arch x86 objects building")
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1438704627.7315.2.camel@decadent.org.uk
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Commit e1abf2cc8d ("bpf: Fix the build on
BPF_SYSCALL=y && !CONFIG_TRACING kernels, make it more configurable")
updated the building condition of bpf_trace.o from CONFIG_BPF_SYSCALL
to CONFIG_BPF_EVENTS, but the corresponding #ifdef controller in
trace_events.h for trace_call_bpf() was not changed. Which, in theory,
is incorrect.
With current Kconfigs, we can create a .config with CONFIG_BPF_SYSCALL=y
and CONFIG_BPF_EVENTS=n by unselecting CONFIG_KPROBE_EVENT and
selecting CONFIG_BPF_SYSCALL. With these options, trace_call_bpf() will
be defined as an extern function, but if anyone calls it a symbol missing
error will be triggered since bpf_trace.o was not built.
This patch changes the #ifdef controller for trace_call_bpf() from
CONFIG_BPF_SYSCALL to CONFIG_BPF_EVENTS. I'll show its correctness:
Before this patch:
BPF_SYSCALL BPF_EVENTS trace_call_bpf bpf_trace.o
y y normal compiled
n n inline not compiled
y n normal not compiled (incorrect)
n y impossible (BPF_EVENTS depends on BPF_SYSCALL)
After this patch:
BPF_SYSCALL BPF_EVENTS trace_call_bpf bpf_trace.o
y y normal compiled
n n inline not compiled
y n inline not compiled (fixed)
n y impossible (BPF_EVENTS depends on BPF_SYSCALL)
So this patch doesn't break anything. QED.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-2-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
New features:
- Deref sys_enter pointer args with contents from probe:vfs_getname, showing
pathnames instead of pointers in many syscalls in 'perf trace'. (Arnaldo Carvalho de Melo)
- Make 'perf trace' write to stderr by default, just like 'strace'. (Milian Woff)
Infrastructure changes:
- color_vfprintf() fixes. (Andi Kleen, Jiri Olsa)
- Allow enabling/disabling PERF_SAMPLE_TIME per event. (Kan Liang)
- Fix build errors with mipsel-linux-uclibc compiler. (Petri Gynther)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Without this patch, it is cumbersome to read the trace output but
ignoring the normal, potentially verbose, output of the debuggee. One
common example is doing something like the following:
perf trace -s find /tmp > /dev/null
Without this patch, the trace summary will be lost. Now, it will still
be printed at the end. This behavior is also applied by strace.
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: David Ahern <dsahern@gmail.com>
Link: http://lkml.kernel.org/n/tip-tqnks6y2cnvm5f9g2dsfr7zl@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To work like strace and dereference syscall pointer args we need to
insert probes (or tracepoints) right after we copy those bytes from
userspace.
Since we're formatting the syscall args at raw_syscalls:sys_enter time,
we need to have a formatter that just stores the position where, later,
when we get the probe:vfs_getname, we can insert the pointer contents.
Now, if a probe:vfs_getname with this format is in place:
# perf probe -l
probe:vfs_getname (on getname_flags:72@/home/git/linux/fs/namei.c with pathname)
That was, in this case, put in place with:
# perf probe 'vfs_getname=getname_flags:72 pathname=filename:string'
Added new event:
probe:vfs_getname (on getname_flags:72 with pathname=filename:string)
You can now use it in all perf tools, such as:
perf record -e probe:vfs_getname -aR sleep 1
#
Then 'perf trace' will notice that and do the pointer -> contents
expansion:
# trace -e open touch /tmp/bla
0.165 (0.010 ms): touch/17752 open(filename: /etc/ld.so.cache, flags: CLOEXEC) = 3
0.195 (0.011 ms): touch/17752 open(filename: /lib64/libc.so.6, flags: CLOEXEC) = 3
0.512 (0.012 ms): touch/17752 open(filename: /usr/lib/locale/locale-archive, flags: CLOEXEC) = 3
0.582 (0.012 ms): touch/17752 open(filename: /tmp/bla, flags: CREAT|NOCTTY|NONBLOCK|WRONLY, mode: 438) = 3
#
Roughly equivalent to strace's output:
# strace -rT -e open touch /tmp/bla
0.000000 open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3 <0.000039>
0.000317 open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3 <0.000102>
0.001461 open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3 <0.000072>
0.000405 open("/tmp/bla", O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK, 0666) = 3 <0.000055>
0.000641 +++ exited with 0 +++
#
Now we need to either look for at all syscalls that are marked as
pointers and have some well known names ("filename", "pathname", etc)
and set the arg formatter to the one used for the "open" syscall in this
patch.
This implementation works for syscalls with just a string being copied
from userspace, for matching syscalls with more than one string being
copied via the same probe/trace point (vfs_getname) we need to extend
the vfs_getname probe spec to include the pointer too, but there are
some problems with that in 'perf probe' or the kernel kprobes code, need
to investigate before considering supporting multiple strings per
syscall.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Milian Wolff <mail@milianw.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-xvuwx6nuj8cf389kf9s2ue2s@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Vince Weaver and Stephane Eranian reported warnings in the PEBS
code when running the perf fuzzer. Stephane wrote:
> I can reproduce the problem on my HSW running the fuzzer.
>
> I can see why this could be happening if you are mixing PEBS and non PEBS events
> in the bottom 4 counters. I suspect:
> for (bit = 0; bit < x86_pmu.max_pebs_events; bit++) {
> if ((counts[bit] == 0) && (error[bit] == 0))
> continue;
>
> This test is not correct when you have non-PEBS events mixed with
> PEBS events and they overflow at the same time. They will have
> counts[i] != 0 but error[i] == 0, and thus you fall thru the loop
> and hit the assert. Or it is something along those lines.
The only way I can make this work is if ->status only has !PEBS events
set, because if it has both set we'll take that slow path which masks
out the !PEBS bits.
After masking there are 3 options:
- there is one bit set, and its @bit, we increment counts[bit].
- there are multiple bits set, we increment error[] for each set bit,
we do not increment counts[].
- there are no bits set, we do nothing.
The intent was to never increment counts[] for !PEBS events.
Now if we start out with only a single !PEBS event set, we'll pass the
test and increment counts[] for a !PEBS and hit the warn.
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Reported-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>