IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Now that the 32bit UP oddity is gone and 32bit uses always a sequence
count, there is no need for the fetch_irq() variants anymore.
Convert to the regular interface.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/bpf/20221026123110.331690-1-bigeasy@linutronix.de
For maps of type BPF_MAP_TYPE_CPUMAP memory is allocated first before
checking the max_entries argument. If then max_entries is greater than
NR_CPUS additional work needs to be done to free allocated memory before
an error is returned.
This changes moves the check on max_entries before the allocation
happens.
Signed-off-by: Florian Lehner <dev@der-flo.net>
Link: https://lore.kernel.org/r/20221028183405.59554-1-dev@der-flo.net
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
There is a typo in comment for DFS algorithm in bpf/verifier.c. The top
element should not be popped until all its neighbors have been checked.
Fix it.
Fixes: 475fb78fbf48 ("bpf: verifier (add branch/goto checks)")
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221027034458.2925218-1-xukuohai@huaweicloud.com
There is a spelling mistake in an error message. Fix it.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/r/20221026081645.3186878-1-colin.i.king@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
kernel-patches/bpf failed with error:
Running bpftool checks...
Comparing /data/users/ast/net-next/tools/include/uapi/linux/bpf.h (bpf_map_type) and
/data/users/ast/net-next/tools/bpf/bpftool/map.c (do_help() TYPE):
{'cgroup_storage_deprecated', 'cgroup_storage'}
Comparing /data/users/ast/net-next/tools/include/uapi/linux/bpf.h (bpf_map_type) and
/data/users/ast/net-next/tools/bpf/bpftool/Documentation/bpftool-map.rst (TYPE):
{'cgroup_storage_deprecated', 'cgroup_storage'}
The selftests/bpf/test_bpftool_synctypes.py runs checking in the above.
The failure is introduced by Commit c4bcfb38a95e("bpf: Implement cgroup storage available
to non-cgroup-attached bpf progs"). The commit introduced BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED
which has the same enum value as BPF_MAP_TYPE_CGROUP_STORAGE.
In test_bpftool_synctypes.py, one test is to compare uapi bpf.h map types and
bpftool supported maps. The tool picks 'cgroup_storage_deprecated' from bpf.h
while bpftool supported map is displayed as 'cgroup_storage'. The test failure
can be fixed by explicitly replacing 'cgroup_storage_deprecated' with 'cgroup_storage'
in uapi bpf.h map types.
Signed-off-by: Yonghong Song <yhs@fb.com>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/r/20221026163014.470732-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
When running tests, we should probably accept any help we can get when
it comes to detecting issues early or making them more debuggable. We
have seen a few cases where a test_progs_noalu32 run, for example,
encountered a soft lockup and stopped making progress. It was only
interrupted once we hit the overall test timeout [0]. We can not and do
not want to necessarily rely on test timeouts, because those rely on
infrastructure provided by the environment we run in (and which is not
present in tools/testing/selftests/bpf/vmtest.sh, for example).
To that end, let's enable panics on soft as well as hard lockups to fail
fast should we encounter one. That's happening in the configuration
indented to be used for selftests (including when using vmtest.sh or
when running in BPF CI).
[0] https://github.com/kernel-patches/bpf/runs/7844499997
Signed-off-by: Daniel Müller <deso@posteo.net>
Link: https://lore.kernel.org/r/20221025231546.811766-1-deso@posteo.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Yonghong Song says:
====================
There already exists a local storage implementation for cgroup-attached
bpf programs. See map type BPF_MAP_TYPE_CGROUP_STORAGE and helper
bpf_get_local_storage(). But there are use cases such that non-cgroup
attached bpf progs wants to access cgroup local storage data. For example,
tc egress prog has access to sk and cgroup. It is possible to use
sk local storage to emulate cgroup local storage by storing data in socket.
But this is a waste as it could be lots of sockets belonging to a particular
cgroup. Alternatively, a separate map can be created with cgroup id as the key.
But this will introduce additional overhead to manipulate the new map.
A cgroup local storage, similar to existing sk/inode/task storage,
should help for this use case.
This patch implemented new cgroup local storage available to
non-cgroup-attached bpf programs. In the patch series, Patches 1 and 2
are preparation patches. Patch 3 implemented new cgroup local storage
kernel support. Patches 4 and 5 implemented libbpf and bpftool support.
Patches 6-8 fixed one existing test and added four new tests to validate
kernel/libbpf implementations. Patch 9 added documentation for new
BPF_MAP_TYPE_CGRP_STORAGE map type and comparison of the old and new
cgroup local storage maps.
Changelogs:
v5 -> v6:
. fix selftest test_libbpf_str/bpf_map_type_str due to marking
BPF_MAP_TYPE_CGROUP_STORAGE as deprecated.
. add cgrp_local_storage test in s390x denylist since the test
has some fentry/fexit programs.
v4 -> v5:
. additional refactoring in patch 2
. fix the call site for bpf_cgrp_storage_free() in kernel/cgroup/cgroup.c.
. add a test for progs attaching to cgroups
. add a negative test (the helper key is a task instead of expected cgroup)
. some spelling fixes
v3 -> v4:
. fix a config guarding problem in kernel/cgroup/cgroup.c when
cgrp_storage is deleted (CONFIG_CGROUP_BPF => CONFIG_BPF_SYSCALL).
. rename selftest from cgroup_local_storage.c to cgrp_local_storage.c
so the name can better align with map name.
. fix a few misspellings.
v2 -> v3:
. fix a config caused kernel test complaint.
. better description/comments in uapi bpf.h and bpf_cgrp_storage.c.
. factor code for better resue for map_alloc/map_free.
. improved explanation in map documentation.
v1 -> v2:
. change map name from BPF_MAP_TYPE_CGROUP_LOCAL_STORAGE to
BPF_MAP_TYPE_CGRP_STORAGE.
. removed support of sleepable programs.
. changed the place of freeing cgrp local storage from put_css_set_locked()
to css_free_rwork_fn().
. added map documentation.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add some descriptions and examples for BPF_MAP_TYPE_CGRP_STORAGE.
Also illustate the major difference between BPF_MAP_TYPE_CGRP_STORAGE
and BPF_MAP_TYPE_CGROUP_STORAGE and recommend to use
BPF_MAP_TYPE_CGRP_STORAGE instead of BPF_MAP_TYPE_CGROUP_STORAGE
in the end.
Acked-by: David Vernet <void@manifault.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221026042922.676383-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Test cgrp_local_storage have some programs utilizing trampoline.
Arch s390x does not support trampoline so add the test to
the corresponding DENYLIST file.
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221026042917.675685-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add four tests for new cgroup local storage, (1) testing bpf program helpers
and user space map APIs, (2) testing recursive fentry triggering won't deadlock,
(3) testing progs attached to cgroups, and (4) a negative test if the
bpf_cgrp_storage_get() helper key is not a cgroup btf id.
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221026042911.675546-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Previous bpf patch made a change to uapi bpf.h like
@@ -922,7 +922,14 @@ enum bpf_map_type {
BPF_MAP_TYPE_SOCKHASH,
- BPF_MAP_TYPE_CGROUP_STORAGE,
+ BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED,
+ BPF_MAP_TYPE_CGROUP_STORAGE = BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED,
BPF_MAP_TYPE_REUSEPORT_SOCKARRAY,
where BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED and BPF_MAP_TYPE_CGROUP_STORAGE
have the same enum value. This will cause selftest test_libbpf_str/bpf_map_type_str
failing. This patch fixed the issue by avoid the check for
BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED in the test.
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221026042906.674830-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add support for new cgroup local storage
Acked-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221026042901.674177-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add support for new cgroup local storage.
Acked-by: David Vernet <void@manifault.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221026042856.673989-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Similar to sk/inode/task storage, implement similar cgroup local storage.
There already exists a local storage implementation for cgroup-attached
bpf programs. See map type BPF_MAP_TYPE_CGROUP_STORAGE and helper
bpf_get_local_storage(). But there are use cases such that non-cgroup
attached bpf progs wants to access cgroup local storage data. For example,
tc egress prog has access to sk and cgroup. It is possible to use
sk local storage to emulate cgroup local storage by storing data in socket.
But this is a waste as it could be lots of sockets belonging to a particular
cgroup. Alternatively, a separate map can be created with cgroup id as the key.
But this will introduce additional overhead to manipulate the new map.
A cgroup local storage, similar to existing sk/inode/task storage,
should help for this use case.
The life-cycle of storage is managed with the life-cycle of the
cgroup struct. i.e. the storage is destroyed along with the owning cgroup
with a call to bpf_cgrp_storage_free() when cgroup itself
is deleted.
The userspace map operations can be done by using a cgroup fd as a key
passed to the lookup, update and delete operations.
Typically, the following code is used to get the current cgroup:
struct task_struct *task = bpf_get_current_task_btf();
... task->cgroups->dfl_cgrp ...
and in structure task_struct definition:
struct task_struct {
....
struct css_set __rcu *cgroups;
....
}
With sleepable program, accessing task->cgroups is not protected by rcu_read_lock.
So the current implementation only supports non-sleepable program and supporting
sleepable program will be the next step together with adding rcu_read_lock
protection for rcu tagged structures.
Since map name BPF_MAP_TYPE_CGROUP_STORAGE has been used for old cgroup local
storage support, the new map name BPF_MAP_TYPE_CGRP_STORAGE is used
for cgroup storage available to non-cgroup-attached bpf programs. The old
cgroup storage supports bpf_get_local_storage() helper to get the cgroup data.
The new cgroup storage helper bpf_cgrp_storage_get() can provide similar
functionality. While old cgroup storage pre-allocates storage memory, the new
mechanism can also pre-allocate with a user space bpf_map_update_elem() call
to avoid potential run-time memory allocation failure.
Therefore, the new cgroup storage can provide all functionality w.r.t.
the old one. So in uapi bpf.h, the old BPF_MAP_TYPE_CGROUP_STORAGE is alias to
BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED to indicate the old cgroup storage can
be deprecated since the new one can provide the same functionality.
Acked-by: David Vernet <void@manifault.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221026042850.673791-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Refactor codes so that inode/task/sk storage implementation
can maximally share the same code. I also added some comments
in new function bpf_local_storage_unlink_nolock() to make
codes easy to understand. There is no functionality change.
Acked-by: David Vernet <void@manifault.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221026042845.672944-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Make struct cgroup btf id global so later patch can reuse
the same btf id.
Acked-by: David Vernet <void@manifault.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221026042840.672602-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Martin KaFai Lau says:
====================
From: Martin KaFai Lau <martin.lau@kernel.org>
The commit bc235cdb423a ("bpf: Prevent deadlock from recursive bpf_task_storage_[get|delete]")
added deadlock detection to avoid a tracing program from recurring
on the bpf_task_storage_{get,delete}() helpers. These helpers acquire
a spin lock and it will lead to deadlock.
It is unnecessary for the bpf_lsm and bpf_iter programs which do
not recur. The situation is the same as the existing
bpf_pid_task_storage_{lookup,delete}_elem() which are
used in the syscall and they also do not have deadlock detection.
This set is to add new bpf_task_storage_{get,delete}() helper proto
without the deadlock detection. The set also removes the prog->active
check from the bpf_lsm and bpf_iter program. Please see the individual
patch for details.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch modifies the task_ls_recursion test to check that
the first bpf_task_storage_get(&map_a, ...) in BPF_PROG(on_update)
can still do the lockless lookup even it cannot acquire the percpu
busy lock. If the lookup succeeds, it will increment the value
by 1 and the value in the task storage map_a will become 200+1=201.
After that, BPF_PROG(on_update) tries to delete from map_a and
should get -EBUSY because it cannot acquire the percpu busy lock
after finding the data.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20221025184524.3526117-10-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch adds a test to check for deadlock failure
in bpf_task_storage_{get,delete} when called by a sleepable bpf_lsm prog.
It also checks if the prog_info.recursion_misses is non zero.
The test starts with 32 threads and they are affinitized to one cpu.
In my qemu setup, with CONFIG_PREEMPT=y, I can reproduce it within
one second if it is run without the previous patches of this set.
Here is the test error message before adding the no deadlock detection
version of the bpf_task_storage_{get,delete}:
test_nodeadlock:FAIL:bpf_task_storage_get busy unexpected bpf_task_storage_get busy: actual 2 != expected 0
test_nodeadlock:FAIL:bpf_task_storage_delete busy unexpected bpf_task_storage_delete busy: actual 2 != expected 0
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20221025184524.3526117-9-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The bpf_lsm and bpf_iter do not recur that will cause a deadlock.
The situation is similar to the bpf_pid_task_storage_delete_elem()
which is called from the syscall map_delete_elem. It does not need
deadlock detection. Otherwise, it will cause unnecessary failure
when calling the bpf_task_storage_delete() helper.
This patch adds bpf_task_storage_delete proto that does not do deadlock
detection. It will be used by bpf_lsm and bpf_iter program.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20221025184524.3526117-8-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Similar to the earlier change in bpf_task_storage_get_recur.
This patch changes bpf_task_storage_delete_recur such that it
does the lookup first. It only returns -EBUSY if it needs to
take the spinlock to do the deletion when potential deadlock
is detected.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20221025184524.3526117-7-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The bpf_lsm and bpf_iter do not recur that will cause a deadlock.
The situation is similar to the bpf_pid_task_storage_lookup_elem()
which is called from the syscall map_lookup_elem. It does not need
deadlock detection. Otherwise, it will cause unnecessary failure
when calling the bpf_task_storage_get() helper.
This patch adds bpf_task_storage_get proto that does not do deadlock
detection. It will be used by bpf_lsm and bpf_iter programs.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20221025184524.3526117-6-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
bpf_task_storage_get() does a lookup and optionally inserts
new data if BPF_LOCAL_STORAGE_GET_F_CREATE is present.
During lookup, it will cache the lookup result and caching requires to
acquire a spinlock. When potential deadlock is detected (by the
bpf_task_storage_busy pcpu-counter added in
commit bc235cdb423a ("bpf: Prevent deadlock from recursive bpf_task_storage_[get|delete]")),
the current behavior is returning NULL immediately to avoid deadlock. It is
too pessimistic. This patch will go ahead to do a lookup (which is a
lockless operation) but it will avoid caching it in order to avoid
acquiring the spinlock.
When lookup fails to find the data and BPF_LOCAL_STORAGE_GET_F_CREATE
is set, an insertion is needed and this requires acquiring a spinlock.
This patch will still return NULL when a potential deadlock is detected.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20221025184524.3526117-5-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch creates a new function __bpf_task_storage_get() and
moves the core logic of the existing bpf_task_storage_get()
into this new function. This new function will be shared
by another new helper proto in the latter patch.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20221025184524.3526117-4-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch adds the "_recur" naming to the bpf_task_storage_{get,delete}
proto. In a latter patch, they will only be used by the tracing
programs that requires a deadlock detection because a tracing
prog may use bpf_task_storage_{get,delete} recursively and cause a
deadlock.
Another following patch will add a different helper proto for the non
tracing programs because they do not need the deadlock prevention.
This patch does this rename to prepare for this future proto
additions.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20221025184524.3526117-3-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The commit 64696c40d03c ("bpf: Add __bpf_prog_{enter,exit}_struct_ops for struct_ops trampoline")
removed prog->active check for struct_ops prog. The bpf_lsm
and bpf_iter is also using trampoline. Like struct_ops, the bpf_lsm
and bpf_iter have fixed hooks for the prog to attach. The
kernel does not call the same hook in a recursive way.
This patch also removes the prog->active check for
bpf_lsm and bpf_iter.
A later patch has a test to reproduce the recursion issue
for a sleepable bpf_lsm program.
This patch appends the '_recur' naming to the existing
enter and exit functions that track the prog->active counter.
New __bpf_prog_{enter,exit}[_sleepable] function are
added to skip the prog->active tracking. The '_struct_ops'
version is also removed.
It also moves the decision on picking the enter and exit function to
the new bpf_trampoline_{enter,exit}(). It returns the '_recur' ones
for all tracing progs to use. For bpf_lsm, bpf_iter,
struct_ops (no prog->active tracking after 64696c40d03c), and
bpf_lsm_cgroup (no prog->active tracking after 69fd337a975c7),
it will return the functions that don't track the prog->active.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20221025184524.3526117-2-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
When examining module BTF, it is common to see core kernel structures
such as sk_buff, net_device duplicated in the module. After adding
debug messaging to BTF it turned out that much of the problem
was down to the identical struct test failing during deduplication;
sometimes the compiler adds identical structs. However
it turns out sometimes that type ids of identical struct members
can also differ, even when the containing structs are still identical.
To take an example, for struct sk_buff, debug messaging revealed
that the identical struct matching was failing for the anon
struct "headers"; specifically for the first field:
__u8 __pkt_type_offset[0]; /* 128 0 */
Looking at the code in BTF deduplication, we have code that guards
against the possibility of identical struct definitions, down to
type ids, and identical array definitions. However in this case
we have a struct which is being defined twice but does not have
identical type ids since each duplicate struct has separate type
ids for the above array member. A similar problem (though not
observed) could occur for struct-in-struct.
The solution is to make the "identical struct" test check members
not just for matching ids, but to also check if they in turn are
identical structs or arrays.
The results of doing this are quite dramatic (for some modules
at least); I see the number of type ids drop from around 10000
to just over 1000 in one module for example.
For testing use latest pahole or apply [1], otherwise dedups
can fail for the reasons described there.
Also fix return type of btf_dedup_identical_arrays() as
suggested by Andrii to match boolean return type used
elsewhere.
Fixes: efdd3eb8015e ("libbpf: Accommodate DWARF/compiler bug with duplicated structs")
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1666622309-22289-1-git-send-email-alan.maguire@oracle.com
[1] https://lore.kernel.org/bpf/1666364523-9648-1-git-send-email-alan.maguire
Jiri Olsa says:
====================
hi,
Martynas reported kprobe _multi link does not resolve symbols
from kernel modules, which attach by address works.
In addition while fixing that I realized we do not take module
reference if the module has kprobe_multi link on top of it and
can be removed.
There's mo crash related to this, it will silently disappear from
ftrace tables, while kprobe_multi link stays up with no data.
This patchset has fixes for both issues.
v3 changes:
- reorder fields in struct bpf_kprobe_multi_link [Andrii]
- added ack [Andrii]
v2 changes:
- added acks (Song)
- added comment to kallsyms_callback (Song)
- change module_callback realloc logic (Andrii)
- get rid of macros in tests (Andrii)
thanks,
jirka
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Adding kprobe_multi kmod attach api tests that attach bpf_testmod
functions via bpf_program__attach_kprobe_multi_opts.
Running it as serial test, because we don't want other tests to
reload bpf_testmod while it's running.
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20221025134148.3300700-9-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Adding test that makes sure the kernel module won't be removed
if there's kprobe multi link defined on top of it.
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20221025134148.3300700-8-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Adding 3 bpf_testmod_fentry_* functions to have a way to test
kprobe multi link on kernel module. They follow bpf_fentry_test*
functions prototypes/code.
Adding equivalent functions to all bpf_fentry_test* does not
seems necessary at the moment, could be added later.
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20221025134148.3300700-7-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Adding load_kallsyms_refresh function to re-read symbols from
/proc/kallsyms file.
This will be needed to get proper functions addresses from
bpf_testmod.ko module, which is loaded/unloaded several times
during the tests run, so symbols might be already old when
we need to use them.
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20221025134148.3300700-6-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Currently we allow to create kprobe multi link on function from kernel
module, but we don't take the module reference to ensure it's not
unloaded while we are tracing it.
The multi kprobe link is based on fprobe/ftrace layer which takes
different approach and releases ftrace hooks when module is unloaded
even if there's tracer registered on top of it.
Adding code that gathers all the related modules for the link and takes
their references before it's attached. All kernel module references are
released after link is unregistered.
Note that we do it the same way already for trampoline probes
(but for single address).
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20221025134148.3300700-5-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Renaming __bpf_kprobe_multi_cookie_cmp to bpf_kprobe_multi_addrs_cmp,
because it's more suitable to current and upcoming code.
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20221025134148.3300700-4-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Currently ftrace_lookup_symbols iterates only over core symbols,
adding module_kallsyms_on_each_symbol call to check on modules
symbols as well.
Also removing 'args.found == args.cnt' condition, because it's
already checked in kallsyms_callback function.
Also removing 'err < 0' check, because both *kallsyms_on_each_symbol
functions do not return error.
Reported-by: Martynas Pumputis <m@lambda.lt>
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20221025134148.3300700-3-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Making module_kallsyms_on_each_symbol generally available, so it
can be used outside CONFIG_LIVEPATCH option in following changes.
Rather than adding another ifdef option let's make the function
generally available (when CONFIG_KALLSYMS and CONFIG_MODULES
options are defined).
Cc: Christoph Hellwig <hch@lst.de>
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20221025134148.3300700-2-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Quentin Monnet says:
====================
To disassemble instructions for JIT-ed programs, bpftool has relied on the
libbfd library. This has been problematic in the past: libbfd's interface
is not meant to be stable and has changed several times, hence the
detection of the two related features from the Makefile
(disassembler-four-args and disassembler-init-styled). When it comes to
shipping bpftool, this has also caused issues with several distribution
maintainers unwilling to support the feature (for example, Debian's page
for binutils-dev, libbfd's package, says: "Note that building Debian
packages which depend on the shared libbfd is Not Allowed.").
This patchset adds support for LLVM as the primary library for
disassembling instructions for JIT-ed programs.
We keep libbfd as a fallback. One reason for this is that currently it
works well, we have all we need in terms of features detection in the
Makefile, so it provides a fallback for disassembling JIT-ed programs if
libbfd is installed but LLVM is not. The other reason is that libbfd
supports nfp instruction for Netronome's SmartNICs and can be used to
disassemble offloaded programs, something that LLVM cannot do (Niklas
confirmed that the feature is still in use). However, if libbfd's interface
breaks again in the future, we might reconsider keeping support for it.
v4:
- Rebase to address a conflict with commit 2c76238eaddd ("bpftool: Add
"bootstrap" feature to version output").
v3:
- Extend commit description (patch 6) with notes on llvm-dev and LLVM's
disassembler stability.
v2:
- Pass callback when creating the LLVM disassembler, so that the
branch targets are printed as addresses (instead of byte offsets).
- Add last commit to "support" other arch with LLVM, although we don't
know any supported triple yet.
- Use $(LLVM_CONFIG) instead of llvm-config in Makefile.
- Pass components to llvm-config --libs to limit the number of
libraries to pass on the command line, in Makefile.
- Rebase split of FEATURE_TESTS and FEATURE_DISPLAY in Makefile.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Similarly to "libbfd", add a "llvm" feature to the output of command
"bpftool version" to indicate that LLVM is used for disassembling JIT-ed
programs. This feature is mutually exclusive (from Makefile definitions)
with "libbfd".
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Tested-by: Niklas Söderlund <niklas.soderlund@corigine.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221025150329.97371-9-quentin@isovalent.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
For offloaded BPF programs, instead of failing to create the
LLVM disassembler without even looking for a triple at all, do run the
function that attempts to retrieve a valid architecture name for the
device.
It will still fail for the LLVM disassembler, because currently we have
no valid triple to return (NFP disassembly is not supported by LLVM).
But failing in that function is more logical than to assume in
jit_disasm.c that passing an "arch" name is simply not supported.
Suggested-by: Song Liu <song@kernel.org>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/r/20221025150329.97371-8-quentin@isovalent.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
To disassemble instructions for JIT-ed programs, bpftool has relied on
the libbfd library. This has been problematic in the past: libbfd's
interface is not meant to be stable and has changed several times. For
building bpftool, we have to detect how the libbfd version on the system
behaves, which is why we have to handle features disassembler-four-args
and disassembler-init-styled in the Makefile. When it comes to shipping
bpftool, this has also caused issues with several distribution
maintainers unwilling to support the feature (see for example Debian's
page for binutils-dev, which ships libbfd: "Note that building Debian
packages which depend on the shared libbfd is Not Allowed." [0]).
For these reasons, we add support for LLVM as an alternative to libbfd
for disassembling instructions of JIT-ed programs. Thanks to the
preparation work in the previous commits, it's easy to add the library
by passing the relevant compilation options in the Makefile, and by
adding the functions for setting up the LLVM disassembler in file
jit_disasm.c.
The LLVM disassembler requires the LLVM development package (usually
llvm-dev or llvm-devel).
The expectation is that the interface for this disassembler will be more
stable. There is a note in LLVM's Developer Policy [1] stating that the
stability for the C API is "best effort" and not guaranteed, but at
least there is some effort to keep compatibility when possible (which
hasn't really been the case for libbfd so far). Furthermore, the Debian
page for the related LLVM package does not caution against linking to
the lib, as binutils-dev page does.
Naturally, the display of disassembled instructions comes with a few
minor differences. Here is a sample output with libbfd (already
supported before this patch):
# bpftool prog dump jited id 56
bpf_prog_6deef7357e7b4530:
0: nopl 0x0(%rax,%rax,1)
5: xchg %ax,%ax
7: push %rbp
8: mov %rsp,%rbp
b: push %rbx
c: push %r13
e: push %r14
10: mov %rdi,%rbx
13: movzwq 0xb4(%rbx),%r13
1b: xor %r14d,%r14d
1e: or $0x2,%r14d
22: mov $0x1,%eax
27: cmp $0x2,%r14
2b: jne 0x000000000000002f
2d: xor %eax,%eax
2f: pop %r14
31: pop %r13
33: pop %rbx
34: leave
35: ret
LLVM supports several variants that we could set when initialising the
disassembler, for example with:
LLVMSetDisasmOptions(*ctx,
LLVMDisassembler_Option_AsmPrinterVariant);
but the default printer is used for now. Here is the output with LLVM:
# bpftool prog dump jited id 56
bpf_prog_6deef7357e7b4530:
0: nopl (%rax,%rax)
5: nop
7: pushq %rbp
8: movq %rsp, %rbp
b: pushq %rbx
c: pushq %r13
e: pushq %r14
10: movq %rdi, %rbx
13: movzwq 180(%rbx), %r13
1b: xorl %r14d, %r14d
1e: orl $2, %r14d
22: movl $1, %eax
27: cmpq $2, %r14
2b: jne 0x2f
2d: xorl %eax, %eax
2f: popq %r14
31: popq %r13
33: popq %rbx
34: leave
35: retq
The LLVM disassembler comes as the default choice, with libbfd as a
fall-back.
Of course, we could replace libbfd entirely and avoid supporting two
different libraries. One reason for keeping libbfd is that, right now,
it works well, we have all we need in terms of features detection in the
Makefile, so it provides a fallback for disassembling JIT-ed programs if
libbfd is installed but LLVM is not. The other motivation is that libbfd
supports nfp instruction for Netronome's SmartNICs and can be used to
disassemble offloaded programs, something that LLVM cannot do. If
libbfd's interface breaks again in the future, we might reconsider
keeping support for it.
[0] https://packages.debian.org/buster/binutils-dev
[1] https://llvm.org/docs/DeveloperPolicy.html#c-api-changes
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Tested-by: Niklas Söderlund <niklas.soderlund@corigine.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221025150329.97371-7-quentin@isovalent.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Refactor disasm_print_insn() to extract the code specific to libbfd and
move it to dedicated functions. There is no functional change. This is
in preparation for supporting an alternative library for disassembling
the instructions.
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Tested-by: Niklas Söderlund <niklas.soderlund@corigine.com>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20221025150329.97371-6-quentin@isovalent.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Bpftool uses libbfd for disassembling JIT-ed programs. But the feature
is optional, and the tool can be compiled without libbfd support. The
Makefile sets the relevant variables accordingly. It also sets variables
related to libbfd's interface, given that it has changed over time.
Group all those libbfd-related definitions so that it's easier to
understand what we are testing for, and only use variables related to
libbfd's interface if we need libbfd in the first place.
In addition to make the Makefile clearer, grouping the definitions
related to disassembling JIT-ed programs will help support alternatives
to libbfd.
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Tested-by: Niklas Söderlund <niklas.soderlund@corigine.com>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20221025150329.97371-5-quentin@isovalent.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Make FEATURE_TESTS and FEATURE_DISPLAY easier to read and less likely to
be subject to conflicts on updates by having one feature per line.
Suggested-by: Andres Freund <andres@anarazel.de>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Tested-by: Niklas Söderlund <niklas.soderlund@corigine.com>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20221025150329.97371-4-quentin@isovalent.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The JIT disassembler in bpftool is the only components (with the JSON
writer) using asserts to check the return values of functions. But it
does not do so in a consistent way, and diasm_print_insn() returns no
value, although sometimes the operation failed.
Remove the asserts, and instead check the return values, print messages
on errors, and propagate the error to the caller from prog.c.
Remove the inclusion of assert.h from jit_disasm.c, and also from map.c
where it is unused.
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Tested-by: Niklas Söderlund <niklas.soderlund@corigine.com>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20221025150329.97371-3-quentin@isovalent.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
_GNU_SOURCE is defined in several source files for bpftool, but only one
of them takes the precaution of checking whether the value is already
defined. Add #ifndef for other occurrences too.
This is in preparation for the support of disassembling JIT-ed programs
with LLVM, with $(llvm-config --cflags) passing -D_GNU_SOURCE as a
compilation argument.
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Tested-by: Niklas Söderlund <niklas.soderlund@corigine.com>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20221025150329.97371-2-quentin@isovalent.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Modify iter prog in existing bpf_iter_bpf_array_map.c, which currently
dumps arraymap key/val, to also do a write of (val, key) into a
newly-added hashmap. Confirm that the write succeeds as expected by
modifying the userspace runner program.
Before a change added in an earlier commit - considering PTR_TO_BUF reg
a valid input to helpers which expect MAP_{KEY,VAL} - the verifier
would've rejected this prog change due to type mismatch. Since using
current iter's key/val to access a separate map is a reasonable usecase,
let's add support for it.
Note that the test prog cannot directly write (val, key) into hashmap
via bpf_map_update_elem when both come from iter context because key is
marked MEM_RDONLY. This is due to bpf_map_update_elem - and other basic
map helpers - taking ARG_PTR_TO_MAP_{KEY,VALUE} w/o MEM_RDONLY type
flag. bpf_map_{lookup,update,delete}_elem don't modify their
input key/val so it should be possible to tag their args READONLY, but
due to the ubiquitous use of these helpers and verifier checks for
type == MAP_VALUE, such a change is nontrivial and seems better to
address in a followup series.
Also fixup some 'goto's in test runner's map checking loop.
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20221020160721.4030492-4-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add a test_ringbuf_map_key test prog, borrowing heavily from extant
test_ringbuf.c. The program tries to use the result of
bpf_ringbuf_reserve as map_key, which was not possible before previouis
commits in this series. The test runner added to prog_tests/ringbuf.c
verifies that the program loads and does basic sanity checks to confirm
that it runs as expected.
Also, refactor test_ringbuf such that runners for existing test_ringbuf
and newly-added test_ringbuf_map_key are subtests of 'ringbuf' top-level
test.
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20221020160721.4030492-3-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
After the previous patch, which added PTR_TO_MEM | MEM_ALLOC type
map_key_value_types, the only difference between map_key_value_types and
mem_types sets is PTR_TO_BUF and PTR_TO_MEM, which are in the latter set
but not the former.
Helpers which expect ARG_PTR_TO_MAP_KEY or ARG_PTR_TO_MAP_VALUE
already effectively expect a valid blob of arbitrary memory that isn't
necessarily explicitly associated with a map. When validating a
PTR_TO_MAP_{KEY,VALUE} arg, the verifier expects meta->map_ptr to have
already been set, either by an earlier ARG_CONST_MAP_PTR arg, or custom
logic like that in process_timer_func or process_kptr_func.
So let's get rid of map_key_value_types and just use mem_types for those
args.
This has the effect of adding PTR_TO_BUF and PTR_TO_MEM to the set of
compatible types for ARG_PTR_TO_MAP_KEY and ARG_PTR_TO_MAP_VALUE.
PTR_TO_BUF is used by various bpf_iter implementations to represent a
chunk of valid r/w memory in ctx args for iter prog.
PTR_TO_MEM is used by networking, tracing, and ringbuf helpers to
represent a chunk of valid memory. The PTR_TO_MEM | MEM_ALLOC
type added in previous commit is specific to ringbuf helpers.
Presence or absence of MEM_ALLOC doesn't change the validity of using
PTR_TO_MEM as a map_{key,val} input.
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20221020160721.4030492-2-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch adds support for the following pattern:
struct some_data *data = bpf_ringbuf_reserve(&ringbuf, sizeof(struct some_data, 0));
if (!data)
return;
bpf_map_lookup_elem(&another_map, &data->some_field);
bpf_ringbuf_submit(data);
Currently the verifier does not consider bpf_ringbuf_reserve's
PTR_TO_MEM | MEM_ALLOC ret type a valid key input to bpf_map_lookup_elem.
Since PTR_TO_MEM is by definition a valid region of memory, it is safe
to use it as a key for lookups.
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20221020160721.4030492-1-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Manu Bretelle says:
====================
This patchset adds initial support for running BPF's vmtest on aarch64
architecture.
It includes a `config.aarch64` heavily based on `config.s390x`
Makes vmtest.sh handle aarch64 and set QEMU variables to values that
works on that arch.
Finally, it provides a DENYLIST.aarch64 that takes care of currently
broken tests on aarch64 so the vmtest run passes.
This was tested by running:
LLVM_STRIP=llvm-strip-16 CLANG=clang-16 \
tools/testing/selftests/bpf/vmtest.sh -- \
./test_progs -d \
\"$(cat tools/testing/selftests/bpf/DENYLIST{,.aarch64} \
| cut -d'#' -f1 \
| sed -e 's/^[[:space:]]*//' \
-e 's/[[:space:]]*$//' \
| tr -s '\n' ','\
)\"
on an aarch64 host.
====================
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>