2379 Commits

Author SHA1 Message Date
Miguel Ojeda
b8a94bfb33 kallsyms: increase maximum kernel symbol length to 512
Rust symbols can become quite long due to namespacing introduced
by modules, types, traits, generics, etc. For instance,
the following code:

    pub mod my_module {
        pub struct MyType;
        pub struct MyGenericType<T>(T);

        pub trait MyTrait {
            fn my_method() -> u32;
        }

        impl MyTrait for MyGenericType<MyType> {
            fn my_method() -> u32 {
                42
            }
        }
    }

generates a symbol of length 96 when using the upcoming v0 mangling scheme:

    _RNvXNtCshGpAVYOtgW1_7example9my_moduleINtB2_13MyGenericTypeNtB2_6MyTypeENtB2_7MyTrait9my_method

At the moment, Rust symbols may reach up to 300 in length.
Setting 512 as the maximum seems like a reasonable choice to
keep some headroom.

Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Co-developed-by: Alex Gaynor <alex.gaynor@gmail.com>
Signed-off-by: Alex Gaynor <alex.gaynor@gmail.com>
Co-developed-by: Wedson Almeida Filho <wedsonaf@google.com>
Signed-off-by: Wedson Almeida Filho <wedsonaf@google.com>
Co-developed-by: Gary Guo <gary@garyguo.net>
Signed-off-by: Gary Guo <gary@garyguo.net>
Co-developed-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2022-09-28 08:56:25 +02:00
Andrii Nakryiko
87dbdc230d libbpf: Don't require full struct enum64 in UAPI headers
Drop the requirement for system-wide kernel UAPI headers to provide full
struct btf_enum64 definition. This is an unexpected requirement that
slipped in libbpf 1.0 and put unnecessary pressure ([0]) on users to have
a bleeding-edge kernel UAPI header from unreleased Linux 6.0.

To achieve this, we forward declare struct btf_enum64. But that's not
enough as there is btf_enum64_value() helper that expects to know the
layout of struct btf_enum64. So we get a bit creative with
reinterpreting memory layout as array of __u32 and accesing lo32/hi32
fields as array elements. Alternative way would be to have a local
pointer variable for anonymous struct with exactly the same layout as
struct btf_enum64, but that gets us into C++ compiler errors complaining
about invalid type casts. So play it safe, if ugly.

  [0] Closes: https://github.com/libbpf/libbpf/issues/562

Fixes: d90ec262b35b ("libbpf: Add enum64 support for btf_dump")
Reported-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Link: https://lore.kernel.org/bpf/20220927042940.147185-1-andrii@kernel.org
2022-09-27 20:45:17 +02:00
Jon Doron
6a4ab8869d libbpf: Fix the case of running as non-root with capabilities
When running rootless with special capabilities like:
FOWNER / DAC_OVERRIDE / DAC_READ_SEARCH

The "access" API will not make the proper check if there is really
access to a file or not.

>From the access man page:
"
The check is done using the calling process's real UID and GID, rather
than the effective IDs as is done when actually attempting an operation
(e.g., open(2)) on the file.  Similarly, for the root user, the check
uses the set of permitted capabilities  rather than the set of effective
capabilities; ***and for non-root users, the check uses an empty set of
capabilities.***
"

What that means is that for non-root user the access API will not do the
proper validation if the process really has permission to a file or not.

To resolve this this patch replaces all the access API calls with
faccessat with AT_EACCESS flag.

Signed-off-by: Jon Doron <jond@wiz.io>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220925070431.1313680-1-arilou@gmail.com
2022-09-26 21:38:32 -07:00
Andrii Nakryiko
dbdea9b36f libbpf: restore memory layout of bpf_object_open_opts
When attach_prog_fd field was removed in libbpf 1.0 and replaced with
`long: 0` placeholder, it actually shifted all the subsequent fields by
8 byte. This is due to `long: 0` promising to adjust next field's offset
to long-aligned offset. But in this case we were already long-aligned
as pin_root_path is a pointer. So `long: 0` had no effect, and thus
didn't feel the gap created by removed attach_prog_fd.

Non-zero bitfield should have been used instead. I validated using
pahole. Originally kconfig field was at offset 40. With `long: 0` it's
at offset 32, which is wrong. With this change it's back at offset 40.

While technically libbpf 1.0 is allowed to break backwards
compatibility and applications should have been recompiled against
libbpf 1.0 headers, but given how trivial it is to preserve memory
layout, let's fix this.

Reported-by: Grant Seltzer Richman <grantseltzer@gmail.com>
Fixes: 146bf811f5ac ("libbpf: remove most other deprecated high-level APIs")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220923230559.666608-1-andrii@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2022-09-23 16:19:37 -07:00
Wang Yufen
e588c116df libbpf: Add pathname_concat() helper
Move snprintf and len check to common helper pathname_concat() to make the
code simpler.

Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1663828124-10437-1-git-send-email-wangyufen@huawei.com
2022-09-23 14:18:03 -07:00
Jakub Kicinski
0140a7168f Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/net/ethernet/freescale/fec.h
  7b15515fc1ca ("Revert "fec: Restart PPS after link state change"")
  40c79ce13b03 ("net: fec: add stop mode support for imx8 platform")
https://lore.kernel.org/all/20220921105337.62b41047@canb.auug.org.au/

drivers/pinctrl/pinctrl-ocelot.c
  c297561bc98a ("pinctrl: ocelot: Fix interrupt controller")
  181f604b33cd ("pinctrl: ocelot: add ability to be used in a non-mmio configuration")
https://lore.kernel.org/all/20220921110032.7cd28114@canb.auug.org.au/

tools/testing/selftests/drivers/net/bonding/Makefile
  bbb774d921e2 ("net: Add tests for bonding and team address list management")
  152e8ec77640 ("selftests/bonding: add a test for bonding lladdr target")
https://lore.kernel.org/all/20220921110437.5b7dbd82@canb.auug.org.au/

drivers/net/can/usb/gs_usb.c
  5440428b3da6 ("can: gs_usb: gs_can_open(): fix race dev->can.state condition")
  45dfa45f52e6 ("can: gs_usb: add RX and TX hardware timestamp support")
https://lore.kernel.org/all/84f45a7d-92b6-4dc5-d7a1-072152fab6ff@tessares.net/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-22 13:02:10 -07:00
Tao Chen
01f2e36c95 libbpf: Support raw BTF placed in the default search path
Currently, the default vmlinux files at '/boot/vmlinux-*',
'/lib/modules/*/vmlinux-*' etc. are parsed with 'btf__parse_elf()' to
extract BTF. It is possible that these files are actually raw BTF files
similar to /sys/kernel/btf/vmlinux. So parse these files with
'btf__parse' which tries both raw format and ELF format.

This might be useful in some scenarios where users put their custom BTF
into known locations and don't want to specify btf_custom_path option.

Signed-off-by: Tao Chen <chentao.kernel@linux.alibaba.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/3f59fb5a345d2e4f10e16fe9e35fbc4c03ecaa3e.1662999860.git.chentao.kernel@linux.alibaba.com
2022-09-21 17:26:16 -07:00
Yonghong Song
9f2f5d7830 libbpf: Improve BPF_PROG2 macro code quality and description
Commit 34586d29f8df ("libbpf: Add new BPF_PROG2 macro") added BPF_PROG2
macro for trampoline based programs with struct arguments. Andrii
made a few suggestions to improve code quality and description.
This patch implemented these suggestions including better internal
macro name, consistent usage pattern for __builtin_choose_expr(),
simpler macro definition for always-inline func arguments and
better macro description.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20220910025214.1536510-1-yhs@fb.com
2022-09-21 17:05:31 -07:00
David Vernet
b66ccae01f bpf: Add libbpf logic for user-space ring buffer
Now that all of the logic is in place in the kernel to support user-space
produced ring buffers, we can add the user-space logic to libbpf. This
patch therefore adds the following public symbols to libbpf:

struct user_ring_buffer *
user_ring_buffer__new(int map_fd,
		      const struct user_ring_buffer_opts *opts);
void *user_ring_buffer__reserve(struct user_ring_buffer *rb, __u32 size);
void *user_ring_buffer__reserve_blocking(struct user_ring_buffer *rb,
                                         __u32 size, int timeout_ms);
void user_ring_buffer__submit(struct user_ring_buffer *rb, void *sample);
void user_ring_buffer__discard(struct user_ring_buffer *rb,
void user_ring_buffer__free(struct user_ring_buffer *rb);

A user-space producer must first create a struct user_ring_buffer * object
with user_ring_buffer__new(), and can then reserve samples in the
ring buffer using one of the following two symbols:

void *user_ring_buffer__reserve(struct user_ring_buffer *rb, __u32 size);
void *user_ring_buffer__reserve_blocking(struct user_ring_buffer *rb,
                                         __u32 size, int timeout_ms);

With user_ring_buffer__reserve(), a pointer to a 'size' region of the ring
buffer will be returned if sufficient space is available in the buffer.
user_ring_buffer__reserve_blocking() provides similar semantics, but will
block for up to 'timeout_ms' in epoll_wait if there is insufficient space
in the buffer. This function has the guarantee from the kernel that it will
receive at least one event-notification per invocation to
bpf_ringbuf_drain(), provided that at least one sample is drained, and the
BPF program did not pass the BPF_RB_NO_WAKEUP flag to bpf_ringbuf_drain().

Once a sample is reserved, it must either be committed to the ring buffer
with user_ring_buffer__submit(), or discarded with
user_ring_buffer__discard().

Signed-off-by: David Vernet <void@manifault.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220920000100.477320-4-void@manifault.com
2022-09-21 16:25:03 -07:00
David Vernet
583c1f4201 bpf: Define new BPF_MAP_TYPE_USER_RINGBUF map type
We want to support a ringbuf map type where samples are published from
user-space, to be consumed by BPF programs. BPF currently supports a
kernel -> user-space circular ring buffer via the BPF_MAP_TYPE_RINGBUF
map type.  We'll need to define a new map type for user-space -> kernel,
as none of the helpers exported for BPF_MAP_TYPE_RINGBUF will apply
to a user-space producer ring buffer, and we'll want to add one or
more helper functions that would not apply for a kernel-producer
ring buffer.

This patch therefore adds a new BPF_MAP_TYPE_USER_RINGBUF map type
definition. The map type is useless in its current form, as there is no
way to access or use it for anything until we one or more BPF helpers. A
follow-on patch will therefore add a new helper function that allows BPF
programs to run callbacks on samples that are published to the ring
buffer.

Signed-off-by: David Vernet <void@manifault.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220920000100.477320-2-void@manifault.com
2022-09-21 16:24:17 -07:00
Yury Norov
6333cb31a7 tools: sync find_bit() implementation
Sync find_first_bit() and find_next_bit() implementation with the
mother kernel.

Also, drop unused find_last_bit() and find_next_clump8().

Signed-off-by: Yury Norov <yury.norov@gmail.com>
2022-09-21 12:21:44 -07:00
Adrian Hunter
6cc4479645 libperf evlist: Fix polling of system-wide events
Originally, (refer commit f90d194a867a5a1d ("perf evlist: Do not poll
events that use the system_wide flag") there wasn't much reason to poll
system-wide events because:

 1. The mmaps get "merged" via set-output anyway (the per-cpu case)
 2. perf reads all mmaps when any event is woken
 3. system-wide mmaps do not fill up as fast as the mmaps for user
    selected events

But there was 1 reason not to poll which was that it prevented correct
termination due to POLLHUP on all user selected events.  That issue is
now easily resolved by using fdarray_flag__nonfilterable.

With the advent of commit ae4f8ae16a078964 ("libperf evlist: Allow
mixing per-thread and per-cpu mmaps"), system-wide mmaps can be used
also in the per-thread case where reason 1 does not apply.

Fix the omission of system-wide events from polling by using the
fdarray_flag__nonfilterable flag.

Example:

 Before:

    $ perf record --no-bpf-event -vvv -e intel_pt// --per-thread uname 2>err.txt
    Linux
    $ grep 'sys_perf_event_open.*=\|pollfd' err.txt
    sys_perf_event_open: pid 155076  cpu -1  group_fd -1  flags 0x8 = 5
    sys_perf_event_open: pid 155076  cpu -1  group_fd -1  flags 0x8 = 6
    sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 7
    sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 9
    sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8 = 10
    sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8 = 11
    sys_perf_event_open: pid -1  cpu 4  group_fd -1  flags 0x8 = 12
    sys_perf_event_open: pid -1  cpu 5  group_fd -1  flags 0x8 = 13
    sys_perf_event_open: pid -1  cpu 6  group_fd -1  flags 0x8 = 14
    sys_perf_event_open: pid -1  cpu 7  group_fd -1  flags 0x8 = 15
    thread_data[0x55fb43c29e80]: pollfd[0] <- event_fd=5
    thread_data[0x55fb43c29e80]: pollfd[1] <- event_fd=6
    thread_data[0x55fb43c29e80]: pollfd[2] <- non_perf_event fd=4

 After:

    $ perf record --no-bpf-event -vvv -e intel_pt// --per-thread uname 2>err.txt
    Linux
    $ grep 'sys_perf_event_open.*=\|pollfd' err.txt
    sys_perf_event_open: pid 156316  cpu -1  group_fd -1  flags 0x8 = 5
    sys_perf_event_open: pid 156316  cpu -1  group_fd -1  flags 0x8 = 6
    sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 7
    sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 9
    sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8 = 10
    sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8 = 11
    sys_perf_event_open: pid -1  cpu 4  group_fd -1  flags 0x8 = 12
    sys_perf_event_open: pid -1  cpu 5  group_fd -1  flags 0x8 = 13
    sys_perf_event_open: pid -1  cpu 6  group_fd -1  flags 0x8 = 14
    sys_perf_event_open: pid -1  cpu 7  group_fd -1  flags 0x8 = 15
    thread_data[0x55cc19e58e80]: pollfd[0] <- event_fd=5
    thread_data[0x55cc19e58e80]: pollfd[1] <- event_fd=6
    thread_data[0x55cc19e58e80]: pollfd[2] <- event_fd=7
    thread_data[0x55cc19e58e80]: pollfd[3] <- event_fd=9
    thread_data[0x55cc19e58e80]: pollfd[4] <- event_fd=10
    thread_data[0x55cc19e58e80]: pollfd[5] <- event_fd=11
    thread_data[0x55cc19e58e80]: pollfd[6] <- event_fd=12
    thread_data[0x55cc19e58e80]: pollfd[7] <- event_fd=13
    thread_data[0x55cc19e58e80]: pollfd[8] <- event_fd=14
    thread_data[0x55cc19e58e80]: pollfd[9] <- event_fd=15
    thread_data[0x55cc19e58e80]: pollfd[10] <- non_perf_event fd=4

Fixes: ae4f8ae16a078964 ("libperf evlist: Allow mixing per-thread and per-cpu mmaps")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220915122612.81738-3-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-09-21 16:08:00 -03:00
Xin Liu
7620bffbf7 libbpf: Fix NULL pointer exception in API btf_dump__dump_type_data
We found that function btf_dump__dump_type_data can be called by the
user as an API, but in this function, the `opts` parameter may be used
as a null pointer.This causes `opts->indent_str` to trigger a NULL
pointer exception.

Fixes: 2ce8450ef5a3 ("libbpf: add bpf_object__open_{file, mem} w/ extensible opts")
Signed-off-by: Xin Liu <liuxin350@huawei.com>
Signed-off-by: Weibin Kong <kongweibin2@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220917084809.30770-1-liuxin350@huawei.com
2022-09-20 17:34:09 -07:00
Xin Liu
dc567045f1 libbpf: Clean up legacy bpf maps declaration in bpf_helpers
Legacy BPF map declarations are no longer supported in libbpf v1.0 [0].
Only BTF-defined maps are supported starting from v1.0, so it is time to
remove the definition of bpf_map_def in bpf_helpers.h.

  [0] https://github.com/libbpf/libbpf/wiki/Libbpf:-the-road-to-v1.0

Signed-off-by: Xin Liu <liuxin350@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/bpf/20220913073643.19960-1-liuxin350@huawei.com
2022-09-16 22:56:09 +02:00
Andrii Nakryiko
749c202cb6 libbpf: Fix crash if SEC("freplace") programs don't have attach_prog_fd set
Fix SIGSEGV caused by libbpf trying to find attach type in vmlinux BTF
for freplace programs. It's wrong to search in vmlinux BTF and libbpf
doesn't even mark vmlinux BTF as required for freplace programs. So
trying to search anything in obj->vmlinux_btf might cause NULL
dereference if nothing else in BPF object requires vmlinux BTF.

Instead, error out if freplace (EXT) program doesn't specify
attach_prog_fd during at the load time.

Fixes: 91abb4a6d79d ("libbpf: Support attachment of BPF tracing programs to kernel modules")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220909193053.577111-3-andrii@kernel.org
2022-09-16 22:39:37 +02:00
Daniel Borkmann
665f5d3577 libbpf: Remove gcc support for bpf_tail_call_static for now
This reverts commit 14e5ce79943a ("libbpf: Add GCC support for
bpf_tail_call_static"). Reason is that gcc invented their own BPF asm
which is not conform with LLVM one, and going forward this would be
more painful to maintain here and in other areas of the library. Thus
remove it; ask to gcc folks is to align with LLVM one to use exact
same syntax.

Fixes: 14e5ce79943a ("libbpf: Add GCC support for bpf_tail_call_static")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: James Hilliard <james.hilliard1@gmail.com>
Cc: Jose E. Marchesi <jose.marchesi@oracle.com>
2022-09-09 16:24:51 +02:00
Adrian Hunter
7864d8f7c0 libperf evlist: Fix per-thread mmaps for multi-threaded targets
The offending commit removed mmap_per_thread(), which did not consider
the different set-output rules for per-thread mmaps i.e. in the per-thread
case set-output is used for file descriptors of the same thread not the
same cpu.

This was not immediately noticed because it only happens with
multi-threaded targets and we do not have a test for that yet.

Reinstate mmap_per_thread() expanding it to cover also system-wide per-cpu
events i.e. to continue to allow the mixing of per-thread and per-cpu
mmaps.

Debug messages (with -vv) show the file descriptors that are opened with
sys_perf_event_open. New debug messages are added (needs -vvv) that show
also which file descriptors are mmapped and which are redirected with
set-output.

In the per-cpu case (cpu != -1) file descriptors for the same CPU are
set-output to the first file descriptor for that CPU.

In the per-thread case (cpu == -1) file descriptors for the same thread are
set-output to the first file descriptor for that thread.

Example (process 17489 has 2 threads):

 Before (but with new debug prints):

   $ perf record --no-bpf-event -vvv --per-thread -p 17489
   <SNIP>
   sys_perf_event_open: pid 17489  cpu -1  group_fd -1  flags 0x8 = 5
   sys_perf_event_open: pid 17490  cpu -1  group_fd -1  flags 0x8 = 6
   <SNIP>
   libperf: idx 0: mmapping fd 5
   libperf: idx 0: set output fd 6 -> 5
   failed to mmap with 22 (Invalid argument)

 After:

   $ perf record --no-bpf-event -vvv --per-thread -p 17489
   <SNIP>
   sys_perf_event_open: pid 17489  cpu -1  group_fd -1  flags 0x8 = 5
   sys_perf_event_open: pid 17490  cpu -1  group_fd -1  flags 0x8 = 6
   <SNIP>
   libperf: mmap_per_thread: nr cpu values (may include -1) 1 nr threads 2
   libperf: idx 0: mmapping fd 5
   libperf: idx 1: mmapping fd 6
   <SNIP>
   [ perf record: Woken up 2 times to write data ]
   [ perf record: Captured and wrote 0.018 MB perf.data (15 samples) ]

Per-cpu example (process 20341 has 2 threads, same as above):

   $ perf record --no-bpf-event -vvv -p 20341
   <SNIP>
   sys_perf_event_open: pid 20341  cpu 0  group_fd -1  flags 0x8 = 5
   sys_perf_event_open: pid 20342  cpu 0  group_fd -1  flags 0x8 = 6
   sys_perf_event_open: pid 20341  cpu 1  group_fd -1  flags 0x8 = 7
   sys_perf_event_open: pid 20342  cpu 1  group_fd -1  flags 0x8 = 8
   sys_perf_event_open: pid 20341  cpu 2  group_fd -1  flags 0x8 = 9
   sys_perf_event_open: pid 20342  cpu 2  group_fd -1  flags 0x8 = 10
   sys_perf_event_open: pid 20341  cpu 3  group_fd -1  flags 0x8 = 11
   sys_perf_event_open: pid 20342  cpu 3  group_fd -1  flags 0x8 = 12
   sys_perf_event_open: pid 20341  cpu 4  group_fd -1  flags 0x8 = 13
   sys_perf_event_open: pid 20342  cpu 4  group_fd -1  flags 0x8 = 14
   sys_perf_event_open: pid 20341  cpu 5  group_fd -1  flags 0x8 = 15
   sys_perf_event_open: pid 20342  cpu 5  group_fd -1  flags 0x8 = 16
   sys_perf_event_open: pid 20341  cpu 6  group_fd -1  flags 0x8 = 17
   sys_perf_event_open: pid 20342  cpu 6  group_fd -1  flags 0x8 = 18
   sys_perf_event_open: pid 20341  cpu 7  group_fd -1  flags 0x8 = 19
   sys_perf_event_open: pid 20342  cpu 7  group_fd -1  flags 0x8 = 20
   <SNIP>
   libperf: mmap_per_cpu: nr cpu values 8 nr threads 2
   libperf: idx 0: mmapping fd 5
   libperf: idx 0: set output fd 6 -> 5
   libperf: idx 1: mmapping fd 7
   libperf: idx 1: set output fd 8 -> 7
   libperf: idx 2: mmapping fd 9
   libperf: idx 2: set output fd 10 -> 9
   libperf: idx 3: mmapping fd 11
   libperf: idx 3: set output fd 12 -> 11
   libperf: idx 4: mmapping fd 13
   libperf: idx 4: set output fd 14 -> 13
   libperf: idx 5: mmapping fd 15
   libperf: idx 5: set output fd 16 -> 15
   libperf: idx 6: mmapping fd 17
   libperf: idx 6: set output fd 18 -> 17
   libperf: idx 7: mmapping fd 19
   libperf: idx 7: set output fd 20 -> 19
   <SNIP>
   [ perf record: Woken up 7 times to write data ]
   [ perf record: Captured and wrote 0.020 MB perf.data (17 samples) ]

Fixes: ae4f8ae16a078964 ("libperf evlist: Allow mixing per-thread and per-cpu mmaps")
Reported-by: Tomáš Trnka <trnka@scm.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=216441
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220905114209.8389-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-09-08 12:17:22 -03:00
Yonghong Song
34586d29f8 libbpf: Add new BPF_PROG2 macro
To support struct arguments in trampoline based programs,
existing BPF_PROG doesn't work any more since
the type size is needed to find whether a parameter
takes one or two registers. So this patch added a new
BPF_PROG2 macro to support such trampoline programs.

The idea is suggested by Andrii. For example, if the
to-be-traced function has signature like
  typedef struct {
       void *x;
       int t;
  } sockptr;
  int blah(sockptr x, char y);

In the new BPF_PROG2 macro, the argument can be
represented as
  __bpf_prog_call(
     ({ union {
          struct { __u64 x, y; } ___z;
          sockptr x;
        } ___tmp = { .___z = { ctx[0], ctx[1] }};
        ___tmp.x;
     }),
     ({ union {
          struct { __u8 x; } ___z;
          char y;
        } ___tmp = { .___z = { ctx[2] }};
        ___tmp.y;
     }));
In the above, the values stored on the stack are properly
assigned to the actual argument type value by using 'union'
magic. Note that the macro also works even if no arguments
are with struct types.

Note that new BPF_PROG2 works for both llvm16 and pre-llvm16
compilers where llvm16 supports bpf target passing value
with struct up to 16 byte size and pre-llvm16 will pass
by reference by storing values on the stack. With static functions
with struct argument as always inline, the compiler is able
to optimize and remove additional stack saving of struct values.

Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220831152707.2079473-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-06 19:51:14 -07:00
Paolo Abeni
2786bcff28 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2022-09-05

The following pull-request contains BPF updates for your *net-next* tree.

We've added 106 non-merge commits during the last 18 day(s) which contain
a total of 159 files changed, 5225 insertions(+), 1358 deletions(-).

There are two small merge conflicts, resolve them as follows:

1) tools/testing/selftests/bpf/DENYLIST.s390x

  Commit 27e23836ce22 ("selftests/bpf: Add lru_bug to s390x deny list") in
  bpf tree was needed to get BPF CI green on s390x, but it conflicted with
  newly added tests on bpf-next. Resolve by adding both hunks, result:

  [...]
  lru_bug                                  # prog 'printk': failed to auto-attach: -524
  setget_sockopt                           # attach unexpected error: -524                                               (trampoline)
  cb_refs                                  # expected error message unexpected error: -524                               (trampoline)
  cgroup_hierarchical_stats                # JIT does not support calling kernel function                                (kfunc)
  htab_update                              # failed to attach: ERROR: strerror_r(-524)=22                                (trampoline)
  [...]

2) net/core/filter.c

  Commit 1227c1771dd2 ("net: Fix data-races around sysctl_[rw]mem_(max|default).")
  from net tree conflicts with commit 29003875bd5b ("bpf: Change bpf_setsockopt(SOL_SOCKET)
  to reuse sk_setsockopt()") from bpf-next tree. Take the code as it is from
  bpf-next tree, result:

  [...]
	if (getopt) {
		if (optname == SO_BINDTODEVICE)
			return -EINVAL;
		return sk_getsockopt(sk, SOL_SOCKET, optname,
				     KERNEL_SOCKPTR(optval),
				     KERNEL_SOCKPTR(optlen));
	}

	return sk_setsockopt(sk, SOL_SOCKET, optname,
			     KERNEL_SOCKPTR(optval), *optlen);
  [...]

The main changes are:

1) Add any-context BPF specific memory allocator which is useful in particular for BPF
   tracing with bonus of performance equal to full prealloc, from Alexei Starovoitov.

2) Big batch to remove duplicated code from bpf_{get,set}sockopt() helpers as an effort
   to reuse the existing core socket code as much as possible, from Martin KaFai Lau.

3) Extend BPF flow dissector for BPF programs to just augment the in-kernel dissector
   with custom logic. In other words, allow for partial replacement, from Shmulik Ladkani.

4) Add a new cgroup iterator to BPF with different traversal options, from Hao Luo.

5) Support for BPF to collect hierarchical cgroup statistics efficiently through BPF
   integration with the rstat framework, from Yosry Ahmed.

6) Support bpf_{g,s}et_retval() under more BPF cgroup hooks, from Stanislav Fomichev.

7) BPF hash table and local storages fixes under fully preemptible kernel, from Hou Tao.

8) Add various improvements to BPF selftests and libbpf for compilation with gcc BPF
   backend, from James Hilliard.

9) Fix verifier helper permissions and reference state management for synchronous
   callbacks, from Kumar Kartikeya Dwivedi.

10) Add support for BPF selftest's xskxceiver to also be used against real devices that
    support MAC loopback, from Maciej Fijalkowski.

11) Various fixes to the bpf-helpers(7) man page generation script, from Quentin Monnet.

12) Document BPF verifier's tnum_in(tnum_range(), ...) gotchas, from Shung-Hsi Yu.

13) Various minor misc improvements all over the place.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (106 commits)
  bpf: Optimize rcu_barrier usage between hash map and bpf_mem_alloc.
  bpf: Remove usage of kmem_cache from bpf_mem_cache.
  bpf: Remove prealloc-only restriction for sleepable bpf programs.
  bpf: Prepare bpf_mem_alloc to be used by sleepable bpf programs.
  bpf: Remove tracing program restriction on map types
  bpf: Convert percpu hash map to per-cpu bpf_mem_alloc.
  bpf: Add percpu allocation support to bpf_mem_alloc.
  bpf: Batch call_rcu callbacks instead of SLAB_TYPESAFE_BY_RCU.
  bpf: Adjust low/high watermarks in bpf_mem_cache
  bpf: Optimize call_rcu in non-preallocated hash map.
  bpf: Optimize element count in non-preallocated hash map.
  bpf: Relax the requirement to use preallocated hash maps in tracing progs.
  samples/bpf: Reduce syscall overhead in map_perf_test.
  selftests/bpf: Improve test coverage of test_maps
  bpf: Convert hash map to bpf_mem_alloc.
  bpf: Introduce any context BPF specific memory allocator.
  selftest/bpf: Add test for bpf_getsockopt()
  bpf: Change bpf_getsockopt(SOL_IPV6) to reuse do_ipv6_getsockopt()
  bpf: Change bpf_getsockopt(SOL_IP) to reuse do_ip_getsockopt()
  bpf: Change bpf_getsockopt(SOL_TCP) to reuse do_tcp_getsockopt()
  ...
====================

Link: https://lore.kernel.org/r/20220905161136.9150-1-daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-06 23:21:18 +02:00
James Hilliard
14e5ce7994 libbpf: Add GCC support for bpf_tail_call_static
The bpf_tail_call_static function is currently not defined unless
using clang >= 8.

To support bpf_tail_call_static on GCC we can check if __clang__ is
not defined to enable bpf_tail_call_static.

We need to use GCC assembly syntax when the compiler does not define
__clang__ as LLVM inline assembly is not fully compatible with GCC.

Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220829210546.755377-1-james.hilliard1@gmail.com
2022-08-31 20:54:23 +02:00
Benjamin Tissoires
343949e107 libbpf: add map_get_fd_by_id and map_delete_elem in light skeleton
This allows to have a better control over maps from the kernel when
preloading eBPF programs.

Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Link: https://lore.kernel.org/r/20220824134055.1328882-8-benjamin.tissoires@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-25 18:52:29 -07:00
Jakub Kicinski
880b0dd94f Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
  21234e3a84c7 ("net/mlx5e: Fix use after free in mlx5e_fs_init()")
  c7eafc5ed068 ("net/mlx5e: Convert ethtool_steering member of flow_steering struct to pointer")
https://lore.kernel.org/all/20220825104410.67d4709c@canb.auug.org.au/
https://lore.kernel.org/all/20220823055533.334471-1-saeed@kernel.org/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-25 16:07:42 -07:00
Namhyung Kim
6d395a5135 libperf: Add a test case for read formats
It checks a various combination of the read format settings and verify
it return the value in a proper position.  The test uses task-clock
software events to guarantee it's always active and sets enabled/running
time.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220819003644.508916-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-19 15:56:44 -03:00
Namhyung Kim
89e3106fa2 libperf: Handle read format in perf_evsel__read()
The perf_counts_values should be increased to read the new lost data.
Also adjust values after read according the read format.

This supports PERF_FORMAT_GROUP which has a different data format but
it's only available for leader events.  Currently it doesn't have an API
to read sibling (member) events in the group.  But users may read the
sibling event directly.

Also reading from mmap would be disabled when the read format has ID or
LOST bit as it's not exposed via mmap.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220819003644.508916-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-19 15:56:27 -03:00
Ian Rogers
b2f10cd4e8 perf cpumap: Fix alignment for masks in event encoding
A mask encoding of a cpu map is laid out as:

  u16 nr
  u16 long_size
  unsigned long mask[];

However, the mask may be 8-byte aligned meaning there is a 4-byte pad
after long_size. This means 32-bit and 64-bit builds see the mask as
being at different offsets. On top of this the structure is in the byte
data[] encoded as:

  u16 type
  char data[]

This means the mask's struct isn't the required 4 or 8 byte aligned, but
is offset by 2. Consequently the long reads and writes are causing
undefined behavior as the alignment is broken.

Fix the mask struct by creating explicit 32 and 64-bit variants, use a
union to avoid data[] and casts; the struct must be packed so the
layout matches the existing perf.data layout. Taking an address of a
member of a packed struct breaks alignment so pass the packed
perf_record_cpu_map_data to functions, so they can access variables with
the right alignment.

As the 64-bit version has 4 bytes of padding, optimizing writing to only
write the 32-bit version.

Committer notes:

Disable warnings about 'packed' that break the build in some arches like
riscv64, but just around that specific struct.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Colin Ian King <colin.king@intel.com>
Cc: Dave Marchevsky <davemarchevsky@fb.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Kees Kook <keescook@chromium.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20220614143353.1559597-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-19 15:30:28 -03:00
Ian Rogers
e989bc3d0f perf cpumap: Const map for max()
Allows max() to be used with 'const struct perf_cpu_maps *'.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Colin Ian King <colin.king@intel.com>
Cc: Dave Marchevsky <davemarchevsky@fb.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Kees Kook <keescook@chromium.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20220614143353.1559597-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-19 12:26:58 -03:00
Jakub Kicinski
3f5f728a72 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Andrii Nakryiko says:

====================
bpf-next 2022-08-17

We've added 45 non-merge commits during the last 14 day(s) which contain
a total of 61 files changed, 986 insertions(+), 372 deletions(-).

The main changes are:

1) New bpf_ktime_get_tai_ns() BPF helper to access CLOCK_TAI, from Kurt
   Kanzenbach and Jesper Dangaard Brouer.

2) Few clean ups and improvements for libbpf 1.0, from Andrii Nakryiko.

3) Expose crash_kexec() as kfunc for BPF programs, from Artem Savkov.

4) Add ability to define sleepable-only kfuncs, from Benjamin Tissoires.

5) Teach libbpf's bpf_prog_load() and bpf_map_create() to gracefully handle
   unsupported names on old kernels, from Hangbin Liu.

6) Allow opting out from auto-attaching BPF programs by libbpf's BPF skeleton,
   from Hao Luo.

7) Relax libbpf's requirement for shared libs to be marked executable, from
   Henqgi Chen.

8) Improve bpf_iter internals handling of error returns, from Hao Luo.

9) Few accommodations in libbpf to support GCC-BPF quirks, from James Hilliard.

10) Fix BPF verifier logic around tracking dynptr ref_obj_id, from Joanne Koong.

11) bpftool improvements to handle full BPF program names better, from Manu
    Bretelle.

12) bpftool fixes around libcap use, from Quentin Monnet.

13) BPF map internals clean ups and improvements around memory allocations,
    from Yafang Shao.

14) Allow to use cgroup_get_from_file() on cgroupv1, allowing BPF cgroup
    iterator to work on cgroupv1, from Yosry Ahmed.

15) BPF verifier internal clean ups, from Dave Marchevsky and Joanne Koong.

16) Various fixes and clean ups for selftests/bpf and vmtest.sh, from Daniel
    Xu, Artem Savkov, Joanne Koong, Andrii Nakryiko, Shibin Koikkara Reeny.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (45 commits)
  selftests/bpf: Few fixes for selftests/bpf built in release mode
  libbpf: Clean up deprecated and legacy aliases
  libbpf: Streamline bpf_attr and perf_event_attr initialization
  libbpf: Fix potential NULL dereference when parsing ELF
  selftests/bpf: Tests libbpf autoattach APIs
  libbpf: Allows disabling auto attach
  selftests/bpf: Fix attach point for non-x86 arches in test_progs/lsm
  libbpf: Making bpf_prog_load() ignore name if kernel doesn't support
  selftests/bpf: Update CI kconfig
  selftests/bpf: Add connmark read test
  selftests/bpf: Add existing connection bpf_*_ct_lookup() test
  bpftool: Clear errno after libcap's checks
  bpf: Clear up confusion in bpf_skb_adjust_room()'s documentation
  bpftool: Fix a typo in a comment
  libbpf: Add names for auxiliary maps
  bpf: Use bpf_map_area_alloc consistently on bpf map creation
  bpf: Make __GFP_NOWARN consistent in bpf map creation
  bpf: Use bpf_map_area_free instread of kvfree
  bpf: Remove unneeded memset in queue_stack_map creation
  libbpf: preserve errno across pr_warn/pr_info/pr_debug
  ...
====================

Link: https://lore.kernel.org/r/20220817215656.1180215-1-andrii@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-17 20:29:36 -07:00
Andrii Nakryiko
abf84b64e3 libbpf: Clean up deprecated and legacy aliases
Remove three missed deprecated APIs that were aliased to new APIs:
bpf_object__unload, bpf_prog_attach_xattr and btf__load.

Also move legacy API libbpf_find_kernel_btf (aliased to
btf__load_vmlinux_btf) into libbpf_legacy.h.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Hao Luo <haoluo@google.com>
Link: https://lore.kernel.org/bpf/20220816001929.369487-4-andrii@kernel.org
2022-08-17 22:42:56 +02:00
Andrii Nakryiko
813847a314 libbpf: Streamline bpf_attr and perf_event_attr initialization
Make sure that entire libbpf code base is initializing bpf_attr and
perf_event_attr with memset(0). Also for bpf_attr make sure we
clear and pass to kernel only relevant parts of bpf_attr. bpf_attr is
a huge union of independent sub-command attributes, so there is no need
to clear and pass entire union bpf_attr, which over time grows quite
a lot and for most commands this growth is completely irrelevant.

Few cases where we were relying on compiler initialization of BPF UAPI
structs (like bpf_prog_info, bpf_map_info, etc) with `= {};` were
switched to memset(0) pattern for future-proofing.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Hao Luo <haoluo@google.com>
Link: https://lore.kernel.org/bpf/20220816001929.369487-3-andrii@kernel.org
2022-08-17 22:42:10 +02:00
Andrii Nakryiko
d4e6d684f3 libbpf: Fix potential NULL dereference when parsing ELF
Fix if condition filtering empty ELF sections to prevent NULL
dereference.

Fixes: 47ea7417b074 ("libbpf: Skip empty sections in bpf_object__init_global_data_maps")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Hao Luo <haoluo@google.com>
Link: https://lore.kernel.org/bpf/20220816001929.369487-2-andrii@kernel.org
2022-08-17 22:42:10 +02:00
Hao Luo
43cb8cbadf libbpf: Allows disabling auto attach
Adds libbpf APIs for disabling auto-attach for individual functions.
This is motivated by the use case of cgroup iter [1]. Some iter
types require their parameters to be non-zero, therefore applying
auto-attach on them will fail. With these two new APIs, users who
want to use auto-attach and these types of iters can disable
auto-attach on the program and perform manual attach.

[1] https://lore.kernel.org/bpf/CAEf4BzZ+a2uDo_t6kGBziqdz--m2gh2_EUwkGLDtMd65uwxUjA@mail.gmail.com/

Signed-off-by: Hao Luo <haoluo@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220816234012.910255-1-haoluo@google.com
2022-08-17 09:40:47 -07:00
Hangbin Liu
1f235777c3 libbpf: Making bpf_prog_load() ignore name if kernel doesn't support
Similar with commit 10b62d6a38f7 ("libbpf: Add names for auxiliary maps"),
let's make bpf_prog_load() also ignore name if kernel doesn't support
program name.

To achieve this, we need to call sys_bpf_prog_load() directly in
probe_kern_prog_name() to avoid circular dependency. sys_bpf_prog_load()
also need to be exported in the libbpf_internal.h file.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20220813000936.6464-1-liuhangbin@gmail.com
2022-08-15 14:58:20 -07:00
Hangbin Liu
10b62d6a38 libbpf: Add names for auxiliary maps
The bpftool self-created maps can appear in final map show output due to
deferred removal in kernel. These maps don't have a name, which would make
users confused about where it comes from.

With a libbpf_ prefix name, users could know who created these maps.
It also could make some tests (like test_offload.py, which skip base maps
without names as a workaround) filter them out.

Kernel adds bpf prog/map name support in the same merge
commit fadad670a8ab ("Merge branch 'bpf-extend-info'"). So we can also use
kernel_supports(NULL, FEAT_PROG_NAME) to check if kernel supports map name.

As discussed [1], Let's make bpf_map_create accept non-null
name string, and silently ignore the name if kernel doesn't support.

  [1] https://lore.kernel.org/bpf/CAEf4BzYL1TQwo1231s83pjTdFPk9XWWhfZC5=KzkU-VO0k=0Ug@mail.gmail.com/

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220811034020.529685-1-liuhangbin@gmail.com
2022-08-11 15:12:38 -07:00
Linus Torvalds
7ebfc85e2c Including fixes from bluetooth, bpf, can and netfilter.
A little longer PR than usual but it's all fixes, no late features.
 It's long partially because of timing, and partially because of
 follow ups to stuff that got merged a week or so before the merge
 window and wasn't as widely tested. Maybe the Bluetooth fixes are
 a little alarming so we'll address that, but the rest seems okay
 and not scary.
 
 Notably we're including a fix for the netfilter Kconfig [1], your
 WiFi warning [2] and a bluetooth fix which should unblock syzbot [3].
 
 Current release - regressions:
 
  - Bluetooth:
    - don't try to cancel uninitialized works [3]
    - L2CAP: fix use-after-free caused by l2cap_chan_put
 
  - tls: rx: fix device offload after recent rework
 
  - devlink: fix UAF on failed reload and leftover locks in mlxsw
 
 Current release - new code bugs:
 
  - netfilter:
    - flowtable: fix incorrect Kconfig dependencies [1]
    - nf_tables: fix crash when nf_trace is enabled
 
  - bpf:
    - use proper target btf when exporting attach_btf_obj_id
    - arm64: fixes for bpf trampoline support
 
  - Bluetooth:
    - ISO: unlock on error path in iso_sock_setsockopt()
    - ISO: fix info leak in iso_sock_getsockopt()
    - ISO: fix iso_sock_getsockopt for BT_DEFER_SETUP
    - ISO: fix memory corruption on iso_pinfo.base
    - ISO: fix not using the correct QoS
    - hci_conn: fix updating ISO QoS PHY
 
  - phy: dp83867: fix get nvmem cell fail
 
 Previous releases - regressions:
 
  - wifi: cfg80211: fix validating BSS pointers in
    __cfg80211_connect_result [2]
 
  - atm: bring back zatm uAPI after ATM had been removed
 
  - properly fix old bug making bonding ARP monitor mode not being
    able to work with software devices with lockless Tx
 
  - tap: fix null-deref on skb->dev in dev_parse_header_protocol
 
  - revert "net: usb: ax88179_178a needs FLAG_SEND_ZLP" it helps
    some devices and breaks others
 
  - netfilter:
    - nf_tables: many fixes rejecting cross-object linking
      which may lead to UAFs
    - nf_tables: fix null deref due to zeroed list head
    - nf_tables: validate variable length element extension
 
  - bgmac: fix a BUG triggered by wrong bytes_compl
 
  - bcmgenet: indicate MAC is in charge of PHY PM
 
 Previous releases - always broken:
 
  - bpf:
    - fix bad pointer deref in bpf_sys_bpf() injected via test infra
    - disallow non-builtin bpf programs calling the prog_run command
    - don't reinit map value in prealloc_lru_pop
    - fix UAFs during the read of map iterator fd
    - fix invalidity check for values in sk local storage map
    - reject sleepable program for non-resched map iterator
 
  - mptcp:
    - move subflow cleanup in mptcp_destroy_common()
    - do not queue data on closed subflows
 
  - virtio_net: fix memory leak inside XDP_TX with mergeable
 
  - vsock: fix memory leak when multiple threads try to connect()
 
  - rework sk_user_data sharing to prevent psock leaks
 
  - geneve: fix TOS inheriting for ipv4
 
  - tunnels & drivers: do not use RT_TOS for IPv6 flowlabel
 
  - phy: c45 baset1: do not skip aneg configuration if clock role
    is not specified
 
  - rose: avoid overflow when /proc displays timer information
 
  - x25: fix call timeouts in blocking connects
 
  - can: mcp251x: fix race condition on receive interrupt
 
  - can: j1939:
    - replace user-reachable WARN_ON_ONCE() with netdev_warn_once()
    - fix memory leak of skbs in j1939_session_destroy()
 
 Misc:
 
  - docs: bpf: clarify that many things are not uAPI
 
  - seg6: initialize induction variable to first valid array index
    (to silence clang vs objtool warning)
 
  - can: ems_usb: fix clang 14's -Wunaligned-access warning
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmL1TtkACgkQMUZtbf5S
 Iruz8Q/+O5xFFsjxuyZD0Mw9d3Jeo3ZI9PeeDvcYl5dZXVegpxqorujTFntxv1Ad
 JC8o5qqms3kO51d+W/yai6iDacEHX2YcJrupZve+vGvpOEVmBRY5O0E1AckJ18+u
 ItmjSVESkybUP5P08/An7Y0dMmj9Xb2z84dGkLe+n8lg6/fimo6Ki6yZjcOBOALu
 AYquMXUcnwztRMbTFjscbJjBd4xFMKZEtthljYtPdIReIN976wmMNYYx+jcPK7ha
 g39Kv6maklp4euerkGIJ/AMnOWHaOGCFjIaz7rr4444NDfrKdt/jeirUXJaz77Jo
 TJM2UOwgOeg6WZkSa3cmdq6UdjdkJ6LTe2CJFf1wJ1qfhAi+s8yWoszsM2Enp+66
 c/mo9jTCMAjmgEJF11idZuz2S697/5j0hvbfM3ZPgNyNBgn8qxz/Z56fNOisx95u
 TkoKKFnGH+mcm/et+omBcyLBtBVK2+/6B6mpl6btf4DOkPn5KFYWHV67uV3ksHzQ
 ye+pnzidoIG0yKbRM2EQKXk7ELKROpl52xUHko93ZinMJt0Q7jBm7tZhJozNFEzi
 hWgUvpmNXgawzLYQcJ9jJmKw3PmYZnRhvYZB/1r91YamM28Hd58k9WfpWtUtjYJN
 N0X58L6JSnKPqzR70pcFppz6iBlh0tHdcEQGWhhKU5ScS3FDxGc=
 =C5Ck
 -----END PGP SIGNATURE-----

Merge tag 'net-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from bluetooth, bpf, can and netfilter.

  A little larger than usual but it's all fixes, no late features. It's
  large partially because of timing, and partially because of follow ups
  to stuff that got merged a week or so before the merge window and
  wasn't as widely tested. Maybe the Bluetooth fixes are a little
  alarming so we'll address that, but the rest seems okay and not scary.

  Notably we're including a fix for the netfilter Kconfig [1], your WiFi
  warning [2] and a bluetooth fix which should unblock syzbot [3].

  Current release - regressions:

   - Bluetooth:
      - don't try to cancel uninitialized works [3]
      - L2CAP: fix use-after-free caused by l2cap_chan_put

   - tls: rx: fix device offload after recent rework

   - devlink: fix UAF on failed reload and leftover locks in mlxsw

  Current release - new code bugs:

   - netfilter:
      - flowtable: fix incorrect Kconfig dependencies [1]
      - nf_tables: fix crash when nf_trace is enabled

   - bpf:
      - use proper target btf when exporting attach_btf_obj_id
      - arm64: fixes for bpf trampoline support

   - Bluetooth:
      - ISO: unlock on error path in iso_sock_setsockopt()
      - ISO: fix info leak in iso_sock_getsockopt()
      - ISO: fix iso_sock_getsockopt for BT_DEFER_SETUP
      - ISO: fix memory corruption on iso_pinfo.base
      - ISO: fix not using the correct QoS
      - hci_conn: fix updating ISO QoS PHY

   - phy: dp83867: fix get nvmem cell fail

  Previous releases - regressions:

   - wifi: cfg80211: fix validating BSS pointers in
     __cfg80211_connect_result [2]

   - atm: bring back zatm uAPI after ATM had been removed

   - properly fix old bug making bonding ARP monitor mode not being able
     to work with software devices with lockless Tx

   - tap: fix null-deref on skb->dev in dev_parse_header_protocol

   - revert "net: usb: ax88179_178a needs FLAG_SEND_ZLP" it helps some
     devices and breaks others

   - netfilter:
      - nf_tables: many fixes rejecting cross-object linking which may
        lead to UAFs
      - nf_tables: fix null deref due to zeroed list head
      - nf_tables: validate variable length element extension

   - bgmac: fix a BUG triggered by wrong bytes_compl

   - bcmgenet: indicate MAC is in charge of PHY PM

  Previous releases - always broken:

   - bpf:
      - fix bad pointer deref in bpf_sys_bpf() injected via test infra
      - disallow non-builtin bpf programs calling the prog_run command
      - don't reinit map value in prealloc_lru_pop
      - fix UAFs during the read of map iterator fd
      - fix invalidity check for values in sk local storage map
      - reject sleepable program for non-resched map iterator

   - mptcp:
      - move subflow cleanup in mptcp_destroy_common()
      - do not queue data on closed subflows

   - virtio_net: fix memory leak inside XDP_TX with mergeable

   - vsock: fix memory leak when multiple threads try to connect()

   - rework sk_user_data sharing to prevent psock leaks

   - geneve: fix TOS inheriting for ipv4

   - tunnels & drivers: do not use RT_TOS for IPv6 flowlabel

   - phy: c45 baset1: do not skip aneg configuration if clock role is
     not specified

   - rose: avoid overflow when /proc displays timer information

   - x25: fix call timeouts in blocking connects

   - can: mcp251x: fix race condition on receive interrupt

   - can: j1939:
      - replace user-reachable WARN_ON_ONCE() with netdev_warn_once()
      - fix memory leak of skbs in j1939_session_destroy()

  Misc:

   - docs: bpf: clarify that many things are not uAPI

   - seg6: initialize induction variable to first valid array index (to
     silence clang vs objtool warning)

   - can: ems_usb: fix clang 14's -Wunaligned-access warning"

* tag 'net-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (117 commits)
  net: atm: bring back zatm uAPI
  dpaa2-eth: trace the allocated address instead of page struct
  net: add missing kdoc for struct genl_multicast_group::flags
  nfp: fix use-after-free in area_cache_get()
  MAINTAINERS: use my korg address for mt7601u
  mlxsw: minimal: Fix deadlock in ports creation
  bonding: fix reference count leak in balance-alb mode
  net: usb: qmi_wwan: Add support for Cinterion MV32
  bpf: Shut up kern_sys_bpf warning.
  net/tls: Use RCU API to access tls_ctx->netdev
  tls: rx: device: don't try to copy too much on detach
  tls: rx: device: bound the frag walk
  net_sched: cls_route: remove from list when handle is 0
  selftests: forwarding: Fix failing tests with old libnet
  net: refactor bpf_sk_reuseport_detach()
  net: fix refcount bug in sk_psock_get (2)
  selftests/bpf: Ensure sleepable program is rejected by hash map iter
  selftests/bpf: Add write tests for sk local storage map iterator
  selftests/bpf: Add tests for reading a dangling map iter fd
  bpf: Only allow sleepable program for resched-able iterator
  ...
2022-08-11 13:45:37 -07:00
Andrii Nakryiko
d7c5802faf libbpf: preserve errno across pr_warn/pr_info/pr_debug
As suggested in [0], make sure that libbpf_print saves and restored
errno and as such guaranteed that no matter what actual print callback
user installs, macros like pr_warn/pr_info/pr_debug are completely
transparent as far as errno goes.

While libbpf code is pretty careful about not clobbering important errno
values accidentally with pr_warn(), it's a trivial change to make sure
that pr_warn can be used anywhere without a risk of clobbering errno.

No functional changes, just future proofing.

  [0] https://github.com/libbpf/libbpf/pull/536

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Daniel Müller <deso@posteo.net>
Link: https://lore.kernel.org/r/20220810183425.1998735-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-10 11:47:29 -07:00
Alexei Starovoitov
86f44fcec2 bpf: Disallow bpf programs call prog_run command.
The verifier cannot perform sufficient validation of bpf_attr->test.ctx_in
pointer, therefore bpf programs should not be allowed to call BPF_PROG_RUN
command from within the program.
To fix this issue split bpf_sys_bpf() bpf helper into normal kern_sys_bpf()
kernel function that can only be used by the kernel light skeleton directly.

Reported-by: YiFei Zhu <zhuyifei@google.com>
Fixes: b1d18a7574d0 ("bpf: Extend sys_bpf commands for bpf_syscall programs.")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-10 09:43:07 -07:00
Hengqi Chen
9e32084ef1 libbpf: Do not require executable permission for shared libraries
Currently, resolve_full_path() requires executable permission for both
programs and shared libraries. This causes failures on distos like Debian
since the shared libraries are not installed executable and Linux is not
requiring shared libraries to have executable permissions. Let's remove
executable permission check for shared libraries.

Reported-by: Goro Fuji <goro@fastly.com>
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220806102021.3867130-1-hengqi.chen@gmail.com
2022-08-08 15:07:40 -07:00
Andrii Nakryiko
e19db6762c libbpf: Reject legacy 'maps' ELF section
Add explicit error message if BPF object file is still using legacy BPF
map definitions in SEC("maps"). Before this change, if BPF object file
is still using legacy map definition user will see a bit confusing:

  libbpf: elf: skipping unrecognized data section(4) maps
  libbpf: prog 'handler': bad map relo against 'server_map' in section 'maps'

Now libbpf will be explicit about rejecting "maps" ELF section:

  libbpf: elf: legacy map definitions in 'maps' section are not supported by libbpf v1.0+

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220803214202.23750-1-andrii@kernel.org
2022-08-08 15:44:11 +02:00
Linus Torvalds
4e23eeebb2 Bitmap patches for v6.0-rc1
This branch consists of:
 
 Qu Wenruo:
 lib: bitmap: fix the duplicated comments on bitmap_to_arr64()
 https://lore.kernel.org/lkml/0d85e1dbad52ad7fb5787c4432bdb36cbd24f632.1656063005.git.wqu@suse.com/
 
 Alexander Lobakin:
 bitops: let optimize out non-atomic bitops on compile-time constants
 https://lore.kernel.org/lkml/20220624121313.2382500-1-alexandr.lobakin@intel.com/T/
 
 Yury Norov:
 lib: cleanup bitmap-related headers
 https://lore.kernel.org/linux-arm-kernel/YtCVeOGLiQ4gNPSf@yury-laptop/T/#m305522194c4d38edfdaffa71fcaaf2e2ca00a961
 
 Alexander Lobakin:
 x86/olpc: fix 'logical not is only applied to the left hand side'
 https://www.spinics.net/lists/kernel/msg4440064.html
 
 Yury Norov:
 lib/nodemask: inline wrappers around bitmap
 https://lore.kernel.org/all/20220723214537.2054208-1-yury.norov@gmail.com/
 -----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEEi8GdvG6xMhdgpu/4sUSA/TofvsgFAmLpVvwACgkQsUSA/Tof
 vsiAHgwAwS9pl8GJ+fKYnue2CYo9349d2oT6BBUs/Rv8uqYEa4QkpYsR7NS733TG
 pos0hhoRvSOzrUP4qppXUjfJ+NkzLgpnKFOeWfFoNAKlHuaaMRvF3Y0Q/P8g0/Kg
 HPWcCQLHyCH9Wjs3e2TTgRjxTrHuruD2VJ401/PX/lw0DicUhmev5mUFa10uwFkP
 ZJRprjoFn9HJ0Hk16pFZDi36d3YumhACOcWRiJdoBDrEPV3S6lm9EeOy/yHBNp5k
 9bKj+RboeT2t70KaZcKv+M5j1nu0cAhl7kRkjcxcmGyimI0l82Vgq9yFxhGqvWg8
 RnCrJ5EaO08FGCAKG9GEwzdiNa24Gdq5XZSpQA7JZHmhmchpnnlNenJicyv0gOQi
 abChZeWSEsyA+78l2+kk9nezfVKUOnKDEZQxBVTOyWsmZYxHZV94oam340VjQDaY
 4/fETdOy/qqPIxnpxAeFGWxZjcVaYiYPLj7KLPMsB0aAAF7pZrem465vSfgbrE81
 +gCdqrWd
 =4dTW
 -----END PGP SIGNATURE-----

Merge tag 'bitmap-6.0-rc1' of https://github.com/norov/linux

Pull bitmap updates from Yury Norov:

 - fix the duplicated comments on bitmap_to_arr64() (Qu Wenruo)

 - optimize out non-atomic bitops on compile-time constants (Alexander
   Lobakin)

 - cleanup bitmap-related headers (Yury Norov)

 - x86/olpc: fix 'logical not is only applied to the left hand side'
   (Alexander Lobakin)

 - lib/nodemask: inline wrappers around bitmap (Yury Norov)

* tag 'bitmap-6.0-rc1' of https://github.com/norov/linux: (26 commits)
  lib/nodemask: inline next_node_in() and node_random()
  powerpc: drop dependency on <asm/machdep.h> in archrandom.h
  x86/olpc: fix 'logical not is only applied to the left hand side'
  lib/cpumask: move some one-line wrappers to header file
  headers/deps: mm: align MANITAINERS and Docs with new gfp.h structure
  headers/deps: mm: Split <linux/gfp_types.h> out of <linux/gfp.h>
  headers/deps: mm: Optimize <linux/gfp.h> header dependencies
  lib/cpumask: move trivial wrappers around find_bit to the header
  lib/cpumask: change return types to unsigned where appropriate
  cpumask: change return types to bool where appropriate
  lib/bitmap: change type of bitmap_weight to unsigned long
  lib/bitmap: change return types to bool where appropriate
  arm: align find_bit declarations with generic kernel
  iommu/vt-d: avoid invalid memory access via node_online(NUMA_NO_NODE)
  lib/test_bitmap: test the tail after bitmap_to_arr64()
  lib/bitmap: fix off-by-one in bitmap_to_arr64()
  lib: test_bitmap: add compile-time optimization/evaluations assertions
  bitmap: don't assume compiler evaluates small mem*() builtins calls
  net/ice: fix initializing the bitmap in the switch code
  bitops: let optimize out non-atomic bitops on compile-time constants
  ...
2022-08-07 17:52:35 -07:00
Linus Torvalds
48a577dc1b perf tools changes for v6.0: 1st batch
- Introduce 'perf lock contention' subtool, using new lock contention
   tracepoints and using BPF for in kernel aggregation and then userspace
   processing using the perf tooling infrastructure for resolving symbols, target
   specification, etc.
 
   Since the new lock contention tracepoints don't provide lock names, get up to
   8 stack traces and display the first non-lock function symbol name as a caller:
 
   $ perf lock report -F acquired,contended,avg_wait,wait_total
 
                   Name   acquired  contended     avg wait    total wait
 
    update_blocked_a...         40         40      3.61 us     144.45 us
    kernfs_fop_open+...          5          5      3.64 us      18.18 us
     _nohz_idle_balance          3          3      2.65 us       7.95 us
    tick_do_update_j...          1          1      6.04 us       6.04 us
     ep_scan_ready_list          1          1      3.93 us       3.93 us
 
   Supports the usual 'perf record' + 'perf report' workflow as well as a
   BCC/bpftrace like mode where you start the tool and then press control+C to get
   results:
 
    $ sudo perf lock contention -b
   ^C
    contended   total wait     max wait     avg wait         type   caller
 
           42    192.67 us     13.64 us      4.59 us     spinlock   queue_work_on+0x20
           23     85.54 us     10.28 us      3.72 us     spinlock   worker_thread+0x14a
            6     13.92 us      6.51 us      2.32 us        mutex   kernfs_iop_permission+0x30
            3     11.59 us     10.04 us      3.86 us        mutex   kernfs_dop_revalidate+0x3c
            1      7.52 us      7.52 us      7.52 us     spinlock   kthread+0x115
            1      7.24 us      7.24 us      7.24 us     rwlock:W   sys_epoll_wait+0x148
            2      7.08 us      3.99 us      3.54 us     spinlock   delayed_work_timer_fn+0x1b
            1      6.41 us      6.41 us      6.41 us     spinlock   idle_balance+0xa06
            2      2.50 us      1.83 us      1.25 us        mutex   kernfs_iop_lookup+0x2f
            1      1.71 us      1.71 us      1.71 us        mutex   kernfs_iop_getattr+0x2c
   ...
 
 - Add new 'perf kwork' tool to trace time properties of kernel work (such as
   softirq, and workqueue), uses eBPF skeletons to collect info in kernel space,
   aggregating data that then gets processed by the userspace tool, e.g.:
 
   # perf kwork report
 
    Kwork Name      | Cpu | Total Runtime | Count | Max runtime | Max runtime start | Max runtime end |
   ----------------------------------------------------------------------------------------------------
    nvme0q5:130     | 004 |      1.101 ms |    49 |    0.051 ms |    26035.056403 s |  26035.056455 s |
    amdgpu:162      | 002 |      0.176 ms |     9 |    0.046 ms |    26035.268020 s |  26035.268066 s |
    nvme0q24:149    | 023 |      0.161 ms |    55 |    0.009 ms |    26035.655280 s |  26035.655288 s |
    nvme0q20:145    | 019 |      0.090 ms |    33 |    0.014 ms |    26035.939018 s |  26035.939032 s |
    nvme0q31:156    | 030 |      0.075 ms |    21 |    0.010 ms |    26035.052237 s |  26035.052247 s |
    nvme0q8:133     | 007 |      0.062 ms |    12 |    0.021 ms |    26035.416840 s |  26035.416861 s |
    nvme0q6:131     | 005 |      0.054 ms |    22 |    0.010 ms |    26035.199919 s |  26035.199929 s |
    nvme0q19:144    | 018 |      0.052 ms |    14 |    0.010 ms |    26035.110615 s |  26035.110625 s |
    nvme0q7:132     | 006 |      0.049 ms |    13 |    0.007 ms |    26035.125180 s |  26035.125187 s |
    nvme0q18:143    | 017 |      0.033 ms |    14 |    0.007 ms |    26035.169698 s |  26035.169705 s |
    nvme0q17:142    | 016 |      0.013 ms |     1 |    0.013 ms |    26035.565147 s |  26035.565160 s |
    enp5s0-rx-0:164 | 006 |      0.004 ms |     4 |    0.002 ms |    26035.928882 s |  26035.928884 s |
    enp5s0-tx-0:166 | 008 |      0.003 ms |     3 |    0.002 ms |    26035.870923 s |  26035.870925 s |
   --------------------------------------------------------------------------------------------------------
 
   See commit log messages for more examples with extra options to limit the events time window, etc.
 
 - Add support for new AMD IBS (Instruction Based Sampling) features:
 
   With the DataSrc extensions, the source of data can be decoded among:
  - Local L3 or other L1/L2 in CCX.
  - A peer cache in a near CCX.
  - Data returned from DRAM.
  - A peer cache in a far CCX.
  - DRAM address map with "long latency" bit set.
  - Data returned from MMIO/Config/PCI/APIC.
  - Extension Memory (S-Link, GenZ, etc - identified by the CS target
     and/or address map at DF's choice).
  - Peer Agent Memory.
 
 - Support hardware tracing with Intel PT on guest machines, combining the
   traces with the ones in the host machine.
 
 - Add a "-m" option to 'perf buildid-list' to show kernel and modules
   build-ids, to display all of the information needed to do external
   symbolization of kernel stack traces, such as those collected by
   bpf_get_stackid().
 
 - Add arch TSC frequency information to perf.data file headers.
 
 - Handle changes in the binutils disassembler function signatures in
   perf, bpftool and bpf_jit_disasm (Acked by the bpftool maintainer).
 
 - Fix building the perf perl binding with the newest gcc in distros such
   as fedora rawhide, where some new warnings were breaking the build as
   perf uses -Werror.
 
 - Add 'perf test' entry for branch stack sampling.
 
 - Add ARM SPE system wide 'perf test' entry.
 
 - Add user space counter reading tests to 'perf test'.
 
 - Build with python3 by default, if available.
 
 - Add python converter script for the vendor JSON event files.
 
 - Update vendor JSON files for alderlake, bonnell, broadwell, broadwellde,
   broadwellx, cascadelakex, elkhartlake, goldmont, goldmontplus, haswell,
   haswellx, icelake, icelakex, ivybridge, ivytown, jaketown, knightslanding,
   nehalemep, nehalemex, sandybridge, sapphirerapids, silvermont, skylake,
   skylakex, snowridgex, tigerlake, westmereep-dp, westmereep-sp and westmereex.
 
 - Add vendor JSON File for Intel meteorlake.
 
 - Add Arm Cortex-A78C and X1C JSON vendor event files.
 
 - Add workaround to symbol address reading from ELF files without phdr,
   falling back to the previoous equation.
 
 - Convert legacy map definition to BTF-defined in the perf BPF script test.
 
 - Rework prologue generation code to stop using libbpf deprecated APIs.
 
 - Add default hybrid events for 'perf stat' on x86.
 
 - Add topdown metrics in the default 'perf stat' on the hybrid machines (big/little cores).
 
 - Prefer sampled CPU when exporting JSON in 'perf data convert'
 
 - Fix ('perf stat CSV output linter') and ("Check branch stack sampling") 'perf test' entries on s390.
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCYuw6gwAKCRCyPKLppCJ+
 J5+iAP0RL6sKMhzdkRjRYfG8CluJ401YaPHadzv5jxP8gOZz2gEAsuYDrMF9t1zB
 4DqORfobdX9UQEJjP9oRltU73GM0swI=
 =2/M0
 -----END PGP SIGNATURE-----

Merge tag 'perf-tools-for-v6.0-2022-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tools updates from Arnaldo Carvalho de Melo:

 - Introduce 'perf lock contention' subtool, using new lock contention
   tracepoints and using BPF for in kernel aggregation and then
   userspace processing using the perf tooling infrastructure for
   resolving symbols, target specification, etc.

   Since the new lock contention tracepoints don't provide lock names,
   get up to 8 stack traces and display the first non-lock function
   symbol name as a caller:

    $ perf lock report -F acquired,contended,avg_wait,wait_total

                    Name   acquired  contended     avg wait    total wait

     update_blocked_a...         40         40      3.61 us     144.45 us
     kernfs_fop_open+...          5          5      3.64 us      18.18 us
      _nohz_idle_balance          3          3      2.65 us       7.95 us
     tick_do_update_j...          1          1      6.04 us       6.04 us
      ep_scan_ready_list          1          1      3.93 us       3.93 us

   Supports the usual 'perf record' + 'perf report' workflow as well as
   a BCC/bpftrace like mode where you start the tool and then press
   control+C to get results:

     $ sudo perf lock contention -b
    ^C
    contended   total wait     max wait     avg wait         type   caller

            42    192.67 us     13.64 us      4.59 us     spinlock   queue_work_on+0x20
            23     85.54 us     10.28 us      3.72 us     spinlock   worker_thread+0x14a
             6     13.92 us      6.51 us      2.32 us        mutex   kernfs_iop_permission+0x30
             3     11.59 us     10.04 us      3.86 us        mutex   kernfs_dop_revalidate+0x3c
             1      7.52 us      7.52 us      7.52 us     spinlock   kthread+0x115
             1      7.24 us      7.24 us      7.24 us     rwlock:W   sys_epoll_wait+0x148
             2      7.08 us      3.99 us      3.54 us     spinlock   delayed_work_timer_fn+0x1b
             1      6.41 us      6.41 us      6.41 us     spinlock   idle_balance+0xa06
             2      2.50 us      1.83 us      1.25 us        mutex   kernfs_iop_lookup+0x2f
             1      1.71 us      1.71 us      1.71 us        mutex   kernfs_iop_getattr+0x2c
    ...

 - Add new 'perf kwork' tool to trace time properties of kernel work
   (such as softirq, and workqueue), uses eBPF skeletons to collect info
   in kernel space, aggregating data that then gets processed by the
   userspace tool, e.g.:

    # perf kwork report

     Kwork Name      | Cpu | Total Runtime | Count | Max runtime | Max runtime start | Max runtime end |
    ----------------------------------------------------------------------------------------------------
     nvme0q5:130     | 004 |      1.101 ms |    49 |    0.051 ms |    26035.056403 s |  26035.056455 s |
     amdgpu:162      | 002 |      0.176 ms |     9 |    0.046 ms |    26035.268020 s |  26035.268066 s |
     nvme0q24:149    | 023 |      0.161 ms |    55 |    0.009 ms |    26035.655280 s |  26035.655288 s |
     nvme0q20:145    | 019 |      0.090 ms |    33 |    0.014 ms |    26035.939018 s |  26035.939032 s |
     nvme0q31:156    | 030 |      0.075 ms |    21 |    0.010 ms |    26035.052237 s |  26035.052247 s |
     nvme0q8:133     | 007 |      0.062 ms |    12 |    0.021 ms |    26035.416840 s |  26035.416861 s |
     nvme0q6:131     | 005 |      0.054 ms |    22 |    0.010 ms |    26035.199919 s |  26035.199929 s |
     nvme0q19:144    | 018 |      0.052 ms |    14 |    0.010 ms |    26035.110615 s |  26035.110625 s |
     nvme0q7:132     | 006 |      0.049 ms |    13 |    0.007 ms |    26035.125180 s |  26035.125187 s |
     nvme0q18:143    | 017 |      0.033 ms |    14 |    0.007 ms |    26035.169698 s |  26035.169705 s |
     nvme0q17:142    | 016 |      0.013 ms |     1 |    0.013 ms |    26035.565147 s |  26035.565160 s |
     enp5s0-rx-0:164 | 006 |      0.004 ms |     4 |    0.002 ms |    26035.928882 s |  26035.928884 s |
     enp5s0-tx-0:166 | 008 |      0.003 ms |     3 |    0.002 ms |    26035.870923 s |  26035.870925 s |
    --------------------------------------------------------------------------------------------------------

   See commit log messages for more examples with extra options to limit
   the events time window, etc.

 - Add support for new AMD IBS (Instruction Based Sampling) features:

   With the DataSrc extensions, the source of data can be decoded among:
     - Local L3 or other L1/L2 in CCX.
     - A peer cache in a near CCX.
     - Data returned from DRAM.
     - A peer cache in a far CCX.
     - DRAM address map with "long latency" bit set.
     - Data returned from MMIO/Config/PCI/APIC.
     - Extension Memory (S-Link, GenZ, etc - identified by the CS target
       and/or address map at DF's choice).
     - Peer Agent Memory.

 - Support hardware tracing with Intel PT on guest machines, combining
   the traces with the ones in the host machine.

 - Add a "-m" option to 'perf buildid-list' to show kernel and modules
   build-ids, to display all of the information needed to do external
   symbolization of kernel stack traces, such as those collected by
   bpf_get_stackid().

 - Add arch TSC frequency information to perf.data file headers.

 - Handle changes in the binutils disassembler function signatures in
   perf, bpftool and bpf_jit_disasm (Acked by the bpftool maintainer).

 - Fix building the perf perl binding with the newest gcc in distros
   such as fedora rawhide, where some new warnings were breaking the
   build as perf uses -Werror.

 - Add 'perf test' entry for branch stack sampling.

 - Add ARM SPE system wide 'perf test' entry.

 - Add user space counter reading tests to 'perf test'.

 - Build with python3 by default, if available.

 - Add python converter script for the vendor JSON event files.

 - Update vendor JSON files for most Intel cores.

 - Add vendor JSON File for Intel meteorlake.

 - Add Arm Cortex-A78C and X1C JSON vendor event files.

 - Add workaround to symbol address reading from ELF files without phdr,
   falling back to the previoous equation.

 - Convert legacy map definition to BTF-defined in the perf BPF script
   test.

 - Rework prologue generation code to stop using libbpf deprecated APIs.

 - Add default hybrid events for 'perf stat' on x86.

 - Add topdown metrics in the default 'perf stat' on the hybrid machines
   (big/little cores).

 - Prefer sampled CPU when exporting JSON in 'perf data convert'

 - Fix ('perf stat CSV output linter') and ("Check branch stack
   sampling") 'perf test' entries on s390.

* tag 'perf-tools-for-v6.0-2022-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (169 commits)
  perf stat: Refactor __run_perf_stat() common code
  perf lock: Print the number of lost entries for BPF
  perf lock: Add --map-nr-entries option
  perf lock: Introduce struct lock_contention
  perf scripting python: Do not build fail on deprecation warnings
  genelf: Use HAVE_LIBCRYPTO_SUPPORT, not the never defined HAVE_LIBCRYPTO
  perf build: Suppress openssl v3 deprecation warnings in libcrypto feature test
  perf parse-events: Break out tracepoint and printing
  perf parse-events: Don't #define YY_EXTRA_TYPE
  tools bpftool: Don't display disassembler-four-args feature test
  tools bpftool: Fix compilation error with new binutils
  tools bpf_jit_disasm: Don't display disassembler-four-args feature test
  tools bpf_jit_disasm: Fix compilation error with new binutils
  tools perf: Fix compilation error with new binutils
  tools include: add dis-asm-compat.h to handle version differences
  tools build: Don't display disassembler-four-args feature test
  tools build: Add feature test for init_disassemble_info API changes
  perf test: Add ARM SPE system wide test
  perf tools: Rework prologue generation code
  perf bpf: Convert legacy map definition to BTF-defined
  ...
2022-08-06 09:36:08 -07:00
James Hilliard
d25f40ff68 libbpf: Ensure functions with always_inline attribute are inline
GCC expects the always_inline attribute to only be set on inline
functions, as such we should make all functions with this attribute
use the __always_inline macro which makes the function inline and
sets the attribute.

Fixes errors like:
/home/buildroot/bpf-next/tools/testing/selftests/bpf/tools/include/bpf/bpf_tracing.h:439:1: error: ‘always_inline’ function might not be inlinable [-Werror=attributes]
  439 | ____##name(unsigned long long *ctx, ##args)
      | ^~~~

Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20220803151403.793024-1-james.hilliard1@gmail.com
2022-08-04 14:43:41 -07:00
Florian Fainelli
3045f42a64 libbpf: Initialize err in probe_map_create
GCC-11 warns about the possibly unitialized err variable in
probe_map_create:

libbpf_probes.c: In function 'probe_map_create':
libbpf_probes.c:361:38: error: 'err' may be used uninitialized in this function [-Werror=maybe-uninitialized]
  361 |                 return fd < 0 && err == exp_err ? 1 : 0;
      |                                  ~~~~^~~~~~~~~~

Fixes: 878d8def0603 ("libbpf: Rework feature-probing APIs")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20220801025109.1206633-1-f.fainelli@gmail.com
2022-08-04 14:40:00 -07:00
James Hilliard
47ea7417b0 libbpf: Skip empty sections in bpf_object__init_global_data_maps
The GNU assembler generates an empty .bss section. This is a well
established behavior in GAS that happens in all supported targets.

The LLVM assembler doesn't generate an empty .bss section.

bpftool chokes on the empty .bss section.

Additionally in bpf_object__elf_collect the sec_desc->data is not
initialized when a section is not recognized. In this case, this
happens with .comment.

So we must check that sec_desc->data is initialized before checking
if the size is 0.

Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20220731232649.4668-1-james.hilliard1@gmail.com
2022-08-04 14:39:07 -07:00
Joe Burton
395fc4fa33 libbpf: Add bpf_obj_get_opts()
Add an extensible variant of bpf_obj_get() capable of setting the
`file_flags` parameter.

This parameter is needed to enable unprivileged access to BPF maps.
Without a method like this, users must manually make the syscall.

Signed-off-by: Joe Burton <jevburton@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220729202727.3311806-1-jevburton.kernel@gmail.com
2022-07-29 15:30:06 -07:00
Daniel Müller
64893e83f9 libbpf: Support PPC in arch_specific_syscall_pfx
Commit 708ac5bea0ce ("libbpf: add ksyscall/kretsyscall sections support
for syscall kprobes") added the arch_specific_syscall_pfx() function,
which returns a string representing the architecture in use. As it turns
out this function is currently not aware of Power PC, where NULL is
returned. That's being flagged by the libbpf CI system, which builds for
ppc64le and the compiler sees a NULL pointer being passed in to a %s
format string.
With this change we add representations for two more architectures, for
Power PC and Power PC 64, and also adjust the string format logic to
handle NULL pointers gracefully, in an attempt to prevent similar issues
with other architectures in the future.

Fixes: 708ac5bea0ce ("libbpf: add ksyscall/kretsyscall sections support for syscall kprobes")
Signed-off-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220728222345.3125975-1-deso@posteo.net
2022-07-28 16:11:18 -07:00
Ilya Leoshkevich
2d369b4b00 libbpf: Extend BPF_KSYSCALL documentation
Explicitly list known quirks. Mention that socket-related syscalls can be
invoked via socketcall().

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20220726134008.256968-2-iii@linux.ibm.com
2022-07-26 16:27:21 +02:00
Dan Carpenter
14229b8153 libbpf: Fix str_has_sfx()'s return value
The return from strcmp() is inverted so it wrongly returns true instead
of false and vice versa.

Fixes: a1c9d61b19cb ("libbpf: Improve library identification for uprobe binary path resolution")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Cc: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/YtZ+/dAA195d99ak@kili
2022-07-21 14:30:25 +02:00
Dan Carpenter
c6018fc6e7 libbpf: Fix sign expansion bug in btf_dump_get_enum_value()
The code here is supposed to take a signed int and store it in a signed
long long. Unfortunately, the way that the type promotion works with
this conditional statement is that it takes a signed int, type promotes
it to a __u32, and then stores that as a signed long long. The result is
never negative.

This is from static analysis, but I made a little test program just to
test it before I sent the patch:

  #include <stdio.h>

  int main(void)
  {
        unsigned long long src = -1ULL;
        signed long long dst1, dst2;
        int is_signed = 1;

        dst1 = is_signed ? *(int *)&src : *(unsigned int *)0;
        dst2 = is_signed ? (signed long long)*(int *)&src : *(unsigned int *)0;

        printf("%lld\n", dst1);
        printf("%lld\n", dst2);

        return 0;
  }

Fixes: d90ec262b35b ("libbpf: Add enum64 support for btf_dump")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/YtZ+LpgPADm7BeEd@kili
2022-07-21 14:24:18 +02:00
Adrian Hunter
7151c1d178 perf auxtrace: Add machine_pid and vcpu to auxtrace_error
Add machine_pid and vcpu to struct perf_record_auxtrace_error. The existing
fmt member is used to identify the new format.

The new members make it possible to easily differentiate errors from guest
machines.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: kvm@vger.kernel.org
Link: https://lore.kernel.org/r/20220711093218.10967-18-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-07-20 11:08:17 -03:00
Adrian Hunter
b47bb18661 perf tools: Add machine_pid and vcpu to id_index
When injecting events from a guest perf.data file, the events will have
separate sample ID numbers. These ID numbers can then be used to determine
which machine an event belongs to. To facilitate that, add machine_pid and
vcpu to id_index records. For backward compatibility, these are added at
the end of the record, and the length of the record is used to determine
if they are present or not.

Note, this is needed because the events from a guest perf.data file contain
the pid/tid of the process running at that time inside the VM not the
pid/tid of the (QEMU) hypervisor thread. So a way is needed to relate
guest events back to the guest machine and VCPU, and using sample ID
numbers for that is relatively simple and convenient.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: kvm@vger.kernel.org
Link: https://lore.kernel.org/r/20220711093218.10967-11-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-07-20 11:07:58 -03:00