BPF object files are, in a way, the final artifact produced as part of
the ahead-of-time compilation process. That makes them somewhat special
compared to "regular" object files, which are a intermediate build
artifacts that can typically be removed safely. As such, it can make
sense to name them differently to make it easier to spot this difference
at a glance.
Among others, libbpf-bootstrap [0] has established the extension .bpf.o
for BPF object files. It seems reasonable to follow this example and
establish the same naming convention for selftest build artifacts. To
that end, this change adjusts the corresponding part of the build system
and the test programs loading BPF object files to work with .bpf.o files.
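As a hedged illustration (not code from this patch), a test program
opening the renamed artifact via libbpf simply references the new name;
"test_example.bpf.o" below is a placeholder:

  #include <bpf/libbpf.h>

  int open_and_load(void)
  {
          struct bpf_object *obj;

          /* .bpf.o marks this as a final BPF artifact rather than a
           * removable intermediate object file
           */
          obj = bpf_object__open_file("test_example.bpf.o", NULL);
          if (!obj)
                  return -1;

          if (bpf_object__load(obj)) {
                  bpf_object__close(obj);
                  return -1;
          }

          bpf_object__close(obj);
          return 0;
  }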
[0] https://github.com/libbpf/libbpf-bootstrap
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220901222253.1199242-1-deso@posteo.net
Introduce a new mode to xdpxceiver responsible for testing AF_XDP
zero-copy support of the driver that serves the underlying physical
device. When setting up the test suite, determine whether the driver has
ZC support by trying to bind an XSK ZC socket to the interface. If the
bind succeeds, interpret it as ZC support being in place and run the
softirq and busy-poll tests in zero-copy mode.
Note that the Rx dropped tests are skipped since the ZC path does not
touch the rx_dropped stat at all.
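A hedged sketch of the probing idea, using the xsk.h helper API the
selftests build on; the function name and ring defaults here are
illustrative assumptions:

  #include <linux/if_xdp.h>
  #include "xsk.h"      /* selftests' local xsk helper API */

  static bool probe_zc_support(const char *ifname, struct xsk_umem *umem,
                               struct xsk_ring_cons *rx,
                               struct xsk_ring_prod *tx)
  {
          struct xsk_socket_config cfg = {
                  .rx_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
                  .tx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
                  .bind_flags = XDP_ZEROCOPY,   /* request ZC explicitly */
          };
          struct xsk_socket *xsk;

          /* if the driver cannot do ZC, binding with XDP_ZEROCOPY fails */
          if (xsk_socket__create(&xsk, ifname, 0, umem, rx, tx, &cfg))
                  return false;

          xsk_socket__delete(xsk);      /* probe only; tear the socket down */
          return true;
  }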
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20220901114813.16275-7-maciej.fijalkowski@intel.com
For single-threaded poll tests, call pthread_kill() from the main thread
so that we are sure the worker thread has finished its job and it is
possible to proceed with the next test types from the test suite. It was
observed that on some platforms it takes a bit longer for the worker
thread to exit, in which case the next test case sees the device as busy.
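The synchronization pattern boils down to something like the following
sketch (the signal choice and helper name are assumptions, not the exact
diff):

  #include <pthread.h>
  #include <signal.h>

  static void stop_worker(pthread_t worker)
  {
          /* wake the worker out of poll() */
          pthread_kill(worker, SIGUSR1);
          /* only move on to the next test once the worker has really exited */
          pthread_join(worker, NULL);
  }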
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20220901114813.16275-6-maciej.fijalkowski@intel.com
Currently, the architecture of xdpxceiver is designed strictly for
conducting veth based tests. A veth pair is created together with a
network namespace and one of the veth interfaces is moved to the
mentioned netns. Then, separate threads for Tx and Rx are spawned which
utilize the described setup.
The infrastructure described in the paragraph above can not be used for
testing AF_XDP support on physical devices. That testing will be
conducted on a single network interface and the same queue. Xskxceiver
needs to be extended to distinguish between veth tests and physical
interface tests.
Since the same iface/queue id pair will be used by both Tx/Rx threads
for physical device testing, the Tx thread, which happens to run after
the Rx thread, is going to create its XSK socket with the shared umem
flag. In order to track this setting throughout the lifetime of the
spawned threads, introduce a 'shared_umem' boolean variable to struct
ifobject and set it to true when xdpxceiver is run against a physical
device. In that case, the UMEM size needs to be doubled, so that half of
it will be used by the Rx thread and the other half by the Tx thread.
For two-step test types, the value of the XSKMAP element under key 0 has
to be updated as there is now another socket for the second step. Also,
to avoid race conditions when destroying XSK resources, move this
activity to the main thread, after the spawned Rx and Tx threads have
finished their jobs. This way it is possible to gracefully remove the
shared umem without introducing synchronization mechanisms.
To run the xsk selftests suite on a physical device, append "-i $IFACE"
when invoking test_xsk.sh. For veth based tests, simply skip it. When
"-i $IFACE" is in place, under the hood test_xsk.sh will use $IFACE for
both interfaces supplied to xdpxceiver, which in turn will interpret
this execution of the test suite as being for a physical device.
Note that currently this makes it possible only to test SKB and DRV mode
(in case the underlying device has native XDP support). ZC testing
support is added in a later patch.
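In rough outline, the shared-umem handling looks like this (a sketch,
not the exact diff; DEFAULT_UMEM_SIZE is a placeholder name):

  /* in struct ifobject (abridged): */
  bool shared_umem;     /* true when run against a physical device */

  /* the socket created second on the same iface/queue binds with the
   * shared flag, and the UMEM is doubled so each thread gets one half
   */
  cfg.bind_flags = ifobject->shared_umem ? XDP_SHARED_UMEM : 0;
  umem_sz = ifobject->shared_umem ? 2 * DEFAULT_UMEM_SIZE : DEFAULT_UMEM_SIZE;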
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20220901114813.16275-5-maciej.fijalkowski@intel.com
So that "enp240s0f0" or such name can be used against xskxceiver.
While at it, also extend character count for netns name.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20220901114813.16275-4-maciej.fijalkowski@intel.com
In order to prepare xdpxceiver for physical device testing, let us
introduce a default Rx pkt stream. The reason for doing so is that
physical device testing will use a UMEM with a doubled size, where half
of it will be used by Tx and the other half by Rx. This means that pkt
addresses will differ for the Tx and Rx streams. The Rx thread will
initialize the xsk_umem_info::base_addr that is added here so that
pkt_set(), when working on the Rx UMEM, will add this offset and the
second half of the UMEM space will be used. Note that currently
base_addr is 0 on both sides; a future commit will do the mentioned
initialization.
Previously, veth based testing worked on separate UMEMs, so a single
default stream was fine.
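The offset handling amounts to something like the following sketch
(struct layout abridged, helper shape assumed):

  struct xsk_umem_info {
          /* ... rings, umem handle, buffer ... */
          u64 base_addr;        /* 0 for now on both sides; the Rx thread
                                 * will later set it to the UMEM's second
                                 * half */
  };

  static void pkt_set(struct xsk_umem_info *umem, struct pkt *pkt,
                      u64 addr, u32 len)
  {
          pkt->addr = addr + umem->base_addr;   /* shift into this side's half */
          pkt->len = len;
  }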
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20220901114813.16275-3-maciej.fijalkowski@intel.com
Currently, xdpxceiver assumes that the underlying device supports XDP in
native mode - this is fine for now, since tests can run only on a veth
pair. A future commit is going to allow running the test suite against
physical devices, so let us query the device to see whether it is
capable of running XDP programs in native mode. This way xdpxceiver will
not try to run TEST_MODE_DRV if the device being tested does not support
it.
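One way to probe for native support, sketched under the assumption that
a trivial "return XDP_PASS" program has already been loaded into
prog_fd:

  #include <bpf/libbpf.h>
  #include <linux/if_link.h>

  static bool xdp_drv_supported(int ifindex, int prog_fd)
  {
          /* try a native-mode attach; drivers without XDP refuse it */
          if (bpf_xdp_attach(ifindex, prog_fd, XDP_FLAGS_DRV_MODE, NULL))
                  return false;

          /* probe only; detach again right away */
          bpf_xdp_detach(ifindex, XDP_FLAGS_DRV_MODE, NULL);
          return true;
  }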
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20220901114813.16275-2-maciej.fijalkowski@intel.com
Get the tunnel flags in {ipv6}vxlan_get_tunnel_src and ensure they are
aligned with tunnel params set at {ipv6}vxlan_set_tunnel_dst.
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220831144010.174110-2-shmulik.ladkani@gmail.com
Under a fully preemptible kernel, task local storage lookup operations
on the same CPU may update per-cpu bpf_task_storage_busy concurrently.
If the update of bpf_task_storage_busy is not preemption safe, its final
value may become non-zero forever and bpf_task_storage_trylock() will
always fail. So add a test case to ensure the update of
bpf_task_storage_busy is preemption safe.
The test case is skipped when CONFIG_PREEMPT is disabled, and it can
only reproduce the problem probabilistically. By increasing
TASK_STORAGE_MAP_NR_LOOP and running it under an ARM64 VM with 4 CPUs,
it takes about four rounds to reproduce:
> test_maps is modified to only run test_task_storage_map_stress_lookup()
$ export TASK_STORAGE_MAP_NR_THREAD=256
$ export TASK_STORAGE_MAP_NR_LOOP=81920
$ export TASK_STORAGE_MAP_PIN_CPU=1
$ time ./test_maps
test_task_storage_map_stress_lookup(135):FAIL:bad bpf_task_storage_busy got -2
real 0m24.743s
user 0m6.772s
sys 0m17.966s
Signed-off-by: Hou Tao <houtao1@huawei.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20220901061938.3789460-5-houtao@huaweicloud.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
sys_pidfd_open() is defined twice, in both test_bprm_opts.c and
test_local_storage.c, so move it to a common header file. It will be
used in map_tests as well.
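The wrapper presumably has the usual shape for a syscall that glibc does
not expose:

  #include <sys/syscall.h>
  #include <sys/types.h>
  #include <unistd.h>

  static inline int sys_pidfd_open(pid_t pid, unsigned int flags)
  {
          /* no glibc stub for pidfd_open(); go through syscall(2) */
          return syscall(__NR_pidfd_open, pid, flags);
  }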
Signed-off-by: Hou Tao <houtao1@huawei.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20220901061938.3789460-4-houtao@huaweicloud.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
One test demonstrates that a reentrant update of a hash map on the same
bucket should fail, and another one shows that concurrent updates of the
same hash map bucket should succeed and not be rejected by the
reentrancy check for the bucket lock.
There is no trampoline support on s390x, so move htab_update to
denylist.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20220831042629.130006-4-houtao@huaweicloud.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
This patch adds a test to ensure that bpf_setsockopt(TCP_CONGESTION,
"not_exist") will not trigger kernel module autoloading.
Before the fix:
[ 40.535829] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:274
[...]
[ 40.552134] tcp_ca_find_autoload.constprop.0+0xcb/0x200
[ 40.552689] tcp_set_congestion_control+0x99/0x7b0
[ 40.553203] do_tcp_setsockopt+0x3ed/0x2240
[...]
[ 40.556041] __bpf_setsockopt+0x124/0x640
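In sketch form (not the verbatim selftest), the check is that setting a
nonexistent congestion control from BPF context fails cleanly rather
than sleeping in request_module(); sk is assumed to be a valid socket
pointer in the program's context:

  static int try_bogus_cc(void *sk)
  {
          /* expected after the fix: a plain error return, no autoload */
          return bpf_setsockopt(sk, SOL_TCP, TCP_CONGESTION,
                                "not_exist", sizeof("not_exist"));
  }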
Signed-off-by: Martin KaFai Lau <martin.lau@linux.dev>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220830231953.792412-1-martin.lau@linux.dev
There is a potential for us to hit a type conflict when including
netinet/tcp.h and sys/socket.h; we can replace both of these includes
with linux/tcp.h and bpf_tcp_helpers.h to avoid the conflict.
This fixes errors like the following when compiling with the GCC BPF
backend:
In file included from /usr/include/netinet/tcp.h:91,
from progs/connect4_prog.c:11:
/home/buildroot/opt/cross/lib/gcc/bpf/13.0.0/include/stdint.h:34:23: error: conflicting types for 'int8_t'; have 'char'
34 | typedef __INT8_TYPE__ int8_t;
| ^~~~~~
In file included from /usr/include/x86_64-linux-gnu/sys/types.h:155,
from /usr/include/x86_64-linux-gnu/bits/socket.h:29,
from /usr/include/x86_64-linux-gnu/sys/socket.h:33,
from progs/connect4_prog.c:10:
/usr/include/x86_64-linux-gnu/bits/stdint-intn.h:24:18: note: previous declaration of 'int8_t' with type 'int8_t' {aka 'signed char'}
24 | typedef __int8_t int8_t;
| ^~~~~~
/home/buildroot/opt/cross/lib/gcc/bpf/13.0.0/include/stdint.h:43:24: error: conflicting types for 'int64_t'; have 'long int'
43 | typedef __INT64_TYPE__ int64_t;
| ^~~~~~~
/usr/include/x86_64-linux-gnu/bits/stdint-intn.h:27:19: note: previous declaration of 'int64_t' with type 'int64_t' {aka 'long long int'}
27 | typedef __int64_t int64_t;
| ^~~~~~~
Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220829154710.3870139-1-james.hilliard1@gmail.com
There is a potential for us to hit a type conflict when including
netinet/tcp.h with sys/socket.h; we can remove these includes, as they
are not actually needed.
This fixes errors like the following when compiling with the GCC BPF
backend:
In file included from /usr/include/netinet/tcp.h:91,
from progs/bind4_prog.c:10:
/home/buildroot/opt/cross/lib/gcc/bpf/13.0.0/include/stdint.h:34:23: error: conflicting types for 'int8_t'; have 'char'
34 | typedef __INT8_TYPE__ int8_t;
| ^~~~~~
In file included from /usr/include/x86_64-linux-gnu/sys/types.h:155,
from /usr/include/x86_64-linux-gnu/bits/socket.h:29,
from /usr/include/x86_64-linux-gnu/sys/socket.h:33,
from progs/bind4_prog.c:9:
/usr/include/x86_64-linux-gnu/bits/stdint-intn.h:24:18: note: previous declaration of 'int8_t' with type 'int8_t' {aka 'signed char'}
24 | typedef __int8_t int8_t;
| ^~~~~~
/home/buildroot/opt/cross/lib/gcc/bpf/13.0.0/include/stdint.h:43:24: error: conflicting types for 'int64_t'; have 'long int'
43 | typedef __INT64_TYPE__ int64_t;
| ^~~~~~~
/usr/include/x86_64-linux-gnu/bits/stdint-intn.h:27:19: note: previous declaration of 'int64_t' with type 'int64_t' {aka 'long long int'}
27 | typedef __int64_t int64_t;
| ^~~~~~~
make: *** [Makefile:537: /home/buildroot/bpf-next/tools/testing/selftests/bpf/bpf_gcc/bind4_prog.o] Error 1
Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220826052925.980431-1-james.hilliard1@gmail.com
Because bpf_map_lookup_elem is declared static, we need to also declare
subprog_noise as static.
This fixes the following error:
progs/tailcall_bpf2bpf4.c:26:9: error: 'bpf_map_lookup_elem' is static but used in inline function 'subprog_noise' which is not static [-Werror]
26 | bpf_map_lookup_elem(&nop_table, &key);
| ^~~~~~~~~~~~~~~~~~~
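The shape of the fix, sketched (the function signature is assumed; the
nop_table map and key come from the prog source quoted in the error):

  static __noinline void subprog_noise(void)
  {
          __u32 key = 0;

          /* bpf_map_lookup_elem() is declared static in bpf_helpers.h,
           * so GCC requires its inline callers to be static as well
           */
          bpf_map_lookup_elem(&nop_table, &key);
  }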
Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/bpf/20220826035141.737919-1-james.hilliard1@gmail.com
The sys/socket.h header isn't required to build test_tc_dtime and may
cause a type conflict.
Fixes the following error:
In file included from /usr/include/x86_64-linux-gnu/sys/types.h:155,
from /usr/include/x86_64-linux-gnu/bits/socket.h:29,
from /usr/include/x86_64-linux-gnu/sys/socket.h:33,
from progs/test_tc_dtime.c:18:
/usr/include/x86_64-linux-gnu/bits/stdint-intn.h:24:18: error: conflicting types for 'int8_t'; have '__int8_t' {aka 'signed char'}
24 | typedef __int8_t int8_t;
| ^~~~~~
In file included from progs/test_tc_dtime.c:5:
/home/buildroot/opt/cross/lib/gcc/bpf/13.0.0/include/stdint.h:34:23: note: previous declaration of 'int8_t' with type 'int8_t' {aka 'char'}
34 | typedef __INT8_TYPE__ int8_t;
| ^~~~~~
/usr/include/x86_64-linux-gnu/bits/stdint-intn.h:27:19: error: conflicting types for 'int64_t'; have '__int64_t' {aka 'long long int'}
27 | typedef __int64_t int64_t;
| ^~~~~~~
/home/buildroot/opt/cross/lib/gcc/bpf/13.0.0/include/stdint.h:43:24: note: previous declaration of 'int64_t' with type 'int64_t' {aka 'long int'}
43 | typedef __INT64_TYPE__ int64_t;
| ^~~~~~~
make: *** [Makefile:537: /home/buildroot/bpf-next/tools/testing/selftests/bpf/bpf_gcc/test_tc_dtime.o] Error 1
Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
Link: https://lore.kernel.org/r/20220826050703.869571-1-james.hilliard1@gmail.com
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
bpf_cgroup_iter_order is globally visible, but its entries do not have a
CGROUP prefix. As requested by Andrii, put CGROUP into the names in
bpf_cgroup_iter_order.
This patch fixes two previous commits: one introduced the API and the
other uses the API in a BPF selftest (namely, the
cgroup_hierarchical_stats selftest).
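The renamed entries, as they are expected to appear in the resulting
uapi enum, read:

  enum bpf_cgroup_iter_order {
          BPF_CGROUP_ITER_ORDER_UNSPEC = 0,
          BPF_CGROUP_ITER_SELF_ONLY,            /* process only a single object */
          BPF_CGROUP_ITER_DESCENDANTS_PRE,      /* walk descendants in pre-order */
          BPF_CGROUP_ITER_DESCENDANTS_POST,     /* walk descendants in post-order */
          BPF_CGROUP_ITER_ANCESTORS_UP,         /* walk ancestors upward */
  };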
I tested this patch via the following command:
test_progs -t cgroup,iter,btf_dump
Fixes: d4ccaf58a847 ("bpf: Introduce cgroup iter")
Fixes: 88886309d2e8 ("selftests/bpf: add a selftest for cgroup hierarchical stats collection")
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Hao Luo <haoluo@google.com>
Link: https://lore.kernel.org/r/20220825223936.1865810-1-haoluo@google.com
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Add a selftest that tests the whole workflow for collecting,
aggregating (flushing), and displaying cgroup hierarchical stats.
TL;DR:
- Userspace program creates a cgroup hierarchy and induces memcg reclaim
in parts of it.
- Whenever reclaim happens, vmscan_start and vmscan_end update
per-cgroup percpu readings, and tell rstat which (cgroup, cpu) pairs
have updates.
- When userspace tries to read the stats, vmscan_dump calls rstat to flush
the stats, and outputs the stats in text format to userspace (similar
to cgroupfs stats).
- rstat calls vmscan_flush once for every (cgroup, cpu) pair that has
updates, vmscan_flush aggregates cpu readings and propagates updates
to parents.
- Userspace program makes sure the stats are aggregated and read
correctly.
Detailed explanation:
- The test loads tracing bpf programs, vmscan_start and vmscan_end, to
measure the latency of cgroup reclaim. Per-cgroup readings are stored in
percpu maps for efficiency. When a cgroup reading is updated on a cpu,
cgroup_rstat_updated(cgroup, cpu) is called to add the cgroup to the
rstat updated tree on that cpu.
- A cgroup_iter program, vmscan_dump, is loaded and pinned to a file, for
each cgroup. Reading this file invokes the program, which calls
cgroup_rstat_flush(cgroup) to ask rstat to propagate the updates for all
cpus and cgroups that have updates in this cgroup's subtree. Afterwards,
the stats are exposed to the user. vmscan_dump returns 1 to terminate
iteration early, so that we only expose stats for one cgroup per read.
- An ftrace program, vmscan_flush, is also loaded and attached to
bpf_rstat_flush. When rstat flushing is ongoing, vmscan_flush is invoked
once for each (cgroup, cpu) pair that has updates. cgroups are popped
from the rstat tree in a bottom-up fashion, so calls will always be
made for cgroups that have updates before their parents. The program
aggregates percpu readings to a total per-cgroup reading, and also
propagates them to the parent cgroup. After rstat flushing is over, all
cgroups will have correct updated hierarchical readings (including all
cpus and all their descendants).
- Finally, the test creates a cgroup hierarchy and induces memcg reclaim
in parts of it, and makes sure that the stats collection, aggregation,
and reading workflow works as expected.
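For orientation, the flusher's shape is roughly the following (a sketch;
the argument list mirrors bpf_rstat_flush as described above, the body
is elided, and vmlinux.h plus bpf_tracing.h are assumed):

  SEC("fentry/bpf_rstat_flush")
  int BPF_PROG(vmscan_flush, struct cgroup *cgrp, struct cgroup *parent,
               int cpu)
  {
          /* fold this cpu's delta into cgrp's total and, because
           * children are flushed before their parents, propagate the
           * reading up to the parent cgroup
           */
          return 0;
  }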
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Hao Luo <haoluo@google.com>
Link: https://lore.kernel.org/r/20220824233117.1312810-6-haoluo@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch extends the BPF selftest cgroup_helpers in various ways:
- Add enable_controllers() that allows tests to enable all or a
subset of controllers for a specific cgroup.
- Add join_cgroup_parent(). The cgroup workdir is based on the pid,
therefore a spawned child cannot join the same cgroup hierarchy of the
test through join_cgroup(). join_cgroup_parent() is used in child
processes to join a cgroup under the parent's workdir.
- Add write_cgroup_file() and write_cgroup_file_parent() (similar to
join_cgroup_parent() above).
- Add get_root_cgroup() for tests that need to do checks on the root cgroup.
- Distinguish relative and absolute cgroup paths in function arguments.
Now relative paths are called relative_path, and absolute paths are
called cgroup_path.
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Hao Luo <haoluo@google.com>
Link: https://lore.kernel.org/r/20220824233117.1312810-5-haoluo@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add a selftest for cgroup_iter. The selftest creates a mini cgroup tree
of the following structure:
          ROOT (working cgroup)
           |
         PARENT
        /      \
   CHILD1      CHILD2
and tests the following scenarios:
- invalid cgroup fd.
- pre-order walk over descendants from PARENT.
- post-order walk over descendants from PARENT.
- walk of ancestors from PARENT.
- process only a single object (i.e. PARENT).
- early termination.
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Hao Luo <haoluo@google.com>
Link: https://lore.kernel.org/r/20220824233117.1312810-3-haoluo@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cgroup_iter is a type of bpf_iter. It walks over cgroups in four modes:
- walking a cgroup's descendants in pre-order.
- walking a cgroup's descendants in post-order.
- walking a cgroup's ancestors.
- processing only the given cgroup.
When attaching cgroup_iter, one can set a cgroup to the iter_link
created from attaching. This cgroup is passed as a file descriptor
or cgroup id and serves as the starting point of the walk. If no
cgroup is specified, the starting point will be the root cgroup v2.
For walking descendants, one can specify the order: either pre-order or
post-order. For walking ancestors, the walk starts at the specified
cgroup and ends at the root.
One can also terminate the walk early by returning 1 from the iter
program.
Note that because walking cgroup hierarchy holds cgroup_mutex, the iter
program is called with cgroup_mutex held.
Currently only one session is supported, which means that, depending on
the volume of data the BPF program intends to send to user space, the
number of cgroups that can be walked is limited. For example, given that
the current buffer size is 8 * PAGE_SIZE, if the program sends 64B of
data for each cgroup, and assuming PAGE_SIZE is 4KB, the total number of
cgroups that can be walked is 512. This is a limitation of cgroup_iter.
If the output data is larger than the kernel buffer size, then after all
data in the kernel buffer is consumed by user space, the subsequent
read() syscall will signal EOPNOTSUPP. To work around this, the user may
have to update their program to reduce the volume of data sent to the
output, for example by skipping some uninteresting cgroups. In the
future, we may extend bpf_iter flags to allow customizing the buffer
size.
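From userspace, parameterizing the walk at attach time looks roughly
like this (a sketch; cgroup_fd and prog are assumed to be set up
already):

  #include <bpf/libbpf.h>
  #include <linux/bpf.h>

  union bpf_iter_link_info linfo = {};
  DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
  struct bpf_link *link;

  linfo.cgroup.cgroup_fd = cgroup_fd;   /* alternatively: linfo.cgroup.cgroup_id */
  linfo.cgroup.order = BPF_CGROUP_ITER_DESCENDANTS_PRE;
  opts.link_info = &linfo;
  opts.link_info_len = sizeof(linfo);

  link = bpf_program__attach_iter(prog, &opts); /* NULL on failure */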
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Hao Luo <haoluo@google.com>
Link: https://lore.kernel.org/r/20220824233117.1312810-2-haoluo@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
sizeof(new_cc) is not the real size of the memory that new_cc points to;
introduce a new_cc_len to store the size and then pass it to
bpf_setsockopt().
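The bug in miniature (an illustrative helper, not the selftest's exact
code):

  static int set_cc(void *sk, char *new_cc, int new_cc_len)
  {
          /* wrong: sizeof(new_cc) is sizeof(char *), the pointer width */
          /* return bpf_setsockopt(sk, SOL_TCP, TCP_CONGESTION,
           *                       new_cc, sizeof(new_cc));
           */

          /* right: pass the tracked buffer length */
          return bpf_setsockopt(sk, SOL_TCP, TCP_CONGESTION,
                                new_cc, new_cc_len);
  }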
Fixes: 31123c0360e0 ("selftests/bpf: bpf_setsockopt tests")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220824013907.380448-1-yangyingliang@huawei.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The cb_refs BPF selftest is failing execution on s390x machines. This is
a newly added test that requires a feature not presently supported on
this architecture.
Denylist the test for this architecture.
Fixes: 3cf7e7d8685c ("selftests/bpf: Add tests for reference state fixes for callbacks")
Signed-off-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220824163906.1186832-1-deso@posteo.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
These are regression tests to ensure we don't end up in invalid runtime
state for helpers that execute callbacks multiple times. It exercises
the fixes to verifier callback handling for reference state in previous
patches.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220823013226.24988-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
For each hook, have a simple bpf_set_retval(bpf_get_retval()) program
and make sure it loads for the hooks we want. The exceptions are
the hooks which don't propagate the error to the callers:
- sockops
- recvmsg
- getpeername
- getsockname
- cg_skb ingress and egress
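Each probe program is minimal; for the bind4 hook, for example, a sketch
would be (program name assumed):

  SEC("cgroup/bind4")
  int get_set_retval(struct bpf_sock_addr *ctx)
  {
          /* loads only on hooks that propagate the retval to callers */
          bpf_set_retval(bpf_get_retval());
          return 1;
  }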
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20220823222555.523590-6-sdf@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The dissector program returns BPF_FLOW_DISSECTOR_CONTINUE (and avoids
setting skb->flow_keys or the last_dissection map) in case it encounters
IP packets whose (outer) source address is 127.0.0.127.
An additional test is added to prog_tests/flow_dissector.c which sets
this address as the test's pkt.iph.saddr, with an expected retval of
BPF_FLOW_DISSECTOR_CONTINUE.
Also, the legacy test_flow_dissector.sh was similarly augmented.
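The bypass condition amounts to the following check (a sketch; iph is
assumed to point at the outer IPv4 header inside the dissector program):

  /* 127.0.0.127 is 0x7f00007f; compare in network byte order */
  if (iph->saddr == bpf_htonl(0x7f00007f))
          return BPF_FLOW_DISSECTOR_CONTINUE;   /* fall back to in-kernel
                                                 * dissection */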
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20220821113519.116765-5-shmulik.ladkani@gmail.com
Formerly, a boolean denoting whether bpf_flow_dissect returned BPF_OK
was set in 'bpf_attr.test.retval'.
Augment this so that users can check the actual return code of the
dissector program under test.
Existing prog_tests/flow_dissector*.c tests were changed correspondingly
to check against each test's expected retval.
Also, tests' resulting 'flow_keys' are verified only in case the
expected retval is BPF_OK. This allows adding new tests that expect a
non-BPF_OK retval.
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20220821113519.116765-4-shmulik.ladkani@gmail.com
Trampoline is not supported on s390.
Fixes: 31123c0360e0 ("selftests/bpf: bpf_setsockopt tests")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/r/20220819192155.91713-1-kafai@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
There is a spelling mistake in an ASSERT_OK literal string. Fix it.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Acked-by: Mykola Lysenko <mykolal@fb.com>
Link: https://lore.kernel.org/r/20220817213242.101277-1-colin.i.king@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch adds tests to exercise optnames that are allowed
in bpf_setsockopt().
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/r/20220817061847.4182339-1-kafai@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Andrii Nakryiko says:
====================
bpf-next 2022-08-17
We've added 45 non-merge commits during the last 14 day(s) which contain
a total of 61 files changed, 986 insertions(+), 372 deletions(-).
The main changes are:
1) New bpf_ktime_get_tai_ns() BPF helper to access CLOCK_TAI, from Kurt
Kanzenbach and Jesper Dangaard Brouer.
2) Few clean ups and improvements for libbpf 1.0, from Andrii Nakryiko.
3) Expose crash_kexec() as kfunc for BPF programs, from Artem Savkov.
4) Add ability to define sleepable-only kfuncs, from Benjamin Tissoires.
5) Teach libbpf's bpf_prog_load() and bpf_map_create() to gracefully handle
unsupported names on old kernels, from Hangbin Liu.
6) Allow opting out from auto-attaching BPF programs by libbpf's BPF skeleton,
from Hao Luo.
7) Relax libbpf's requirement for shared libs to be marked executable, from
Hengqi Chen.
8) Improve bpf_iter internals handling of error returns, from Hao Luo.
9) Few accommodations in libbpf to support GCC-BPF quirks, from James Hilliard.
10) Fix BPF verifier logic around tracking dynptr ref_obj_id, from Joanne Koong.
11) bpftool improvements to handle full BPF program names better, from Manu
Bretelle.
12) bpftool fixes around libcap use, from Quentin Monnet.
13) BPF map internals clean ups and improvements around memory allocations,
from Yafang Shao.
14) Allow to use cgroup_get_from_file() on cgroupv1, allowing BPF cgroup
iterator to work on cgroupv1, from Yosry Ahmed.
15) BPF verifier internal clean ups, from Dave Marchevsky and Joanne Koong.
16) Various fixes and clean ups for selftests/bpf and vmtest.sh, from Daniel
Xu, Artem Savkov, Joanne Koong, Andrii Nakryiko, Shibin Koikkara Reeny.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (45 commits)
selftests/bpf: Few fixes for selftests/bpf built in release mode
libbpf: Clean up deprecated and legacy aliases
libbpf: Streamline bpf_attr and perf_event_attr initialization
libbpf: Fix potential NULL dereference when parsing ELF
selftests/bpf: Tests libbpf autoattach APIs
libbpf: Allows disabling auto attach
selftests/bpf: Fix attach point for non-x86 arches in test_progs/lsm
libbpf: Making bpf_prog_load() ignore name if kernel doesn't support
selftests/bpf: Update CI kconfig
selftests/bpf: Add connmark read test
selftests/bpf: Add existing connection bpf_*_ct_lookup() test
bpftool: Clear errno after libcap's checks
bpf: Clear up confusion in bpf_skb_adjust_room()'s documentation
bpftool: Fix a typo in a comment
libbpf: Add names for auxiliary maps
bpf: Use bpf_map_area_alloc consistently on bpf map creation
bpf: Make __GFP_NOWARN consistent in bpf map creation
bpf: Use bpf_map_area_free instread of kvfree
bpf: Remove unneeded memset in queue_stack_map creation
libbpf: preserve errno across pr_warn/pr_info/pr_debug
...
====================
Link: https://lore.kernel.org/r/20220817215656.1180215-1-andrii@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix a few issues found when building and running test_progs in release
mode.
First, force-initialize the potentially uninitialized idx variable in
xskxceiver to zero to satisfy the compiler.
A few instances of defining uprobe trigger functions break in release
mode unless marked as noinline, due to being static. Add noinline to
make sure everything works.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Hao Luo <haoluo@google.com>
Link: https://lore.kernel.org/bpf/20220816001929.369487-5-andrii@kernel.org
Use SYS_PREFIX macro from bpf_misc.h instead of hard-coded '__x64_'
prefix for sys_setdomainname attach point in lsm test.
Signed-off-by: Artem Savkov <asavkov@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220816055231.717006-1-asavkov@redhat.com
Test that the prog can read from the connection mark. This test is nice
because it ensures progs can interact with the netfilter subsystem
correctly.
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/bpf/d3bc620a491e4c626c20d80631063922cbe13e2b.1660254747.git.dxu@dxuuu.xyz
Add a test where we do a conntrack lookup on an existing connection.
This is nice because it's a more realistic test than artificially
creating a ct entry and looking it up afterwards.
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/bpf/de5a617832f38f8b5631cc87e2a836da7c94d497.1660254747.git.dxu@dxuuu.xyz
Add a test to validate the overwrite of a sock local storage map value
in the map iterator, and another one to ensure that out-of-bound value
writing is rejected.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/r/20220810080538.1845898-9-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
After closing both the related link fd and map fd, read the map iterator
fd to ensure it is OK to do so.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220810080538.1845898-8-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add a test checking that programs calling destructive kfuncs can only do
so if they have the CAP_SYS_BOOT capability.
Signed-off-by: Artem Savkov <asavkov@redhat.com>
Link: https://lore.kernel.org/r/20220810065905.475418-4-asavkov@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The poll test case was not testing all the functionality of the poll
feature in the test suite. This patch updates the poll test case, which
contains 2 test cases to test the RX and TX poll functionality, and adds
2 more test cases to check the timeout feature of the poll event.
The poll test suite has 4 test cases:
1. TEST_TYPE_RX_POLL: Check if the RX path POLLIN function works as
   expected. The TX path can use any method to send the traffic.
2. TEST_TYPE_TX_POLL: Check if the TX path POLLOUT function works as
   expected. The RX path can use any method to receive the traffic.
3. TEST_TYPE_POLL_RXQ_EMPTY: Calling the poll function with parameter
   POLLIN on an empty RX queue will cause a timeout. If it times out,
   the test case passes.
4. TEST_TYPE_POLL_TXQ_FULL: When the TX queue is filled and packets are
   not cleaned by the kernel, invoking the poll function with POLLOUT
   should trigger a timeout.
Signed-off-by: Shibin Koikkara Reeny <shibin.koikkara.reeny@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20220803144354.98122-1-shibin.koikkara.reeny@intel.com
Add a regression test to check against an invalid
check_and_init_map_value call inside prealloc_lru_pop.
The kptr should not be reset to NULL once we set it after deleting the
map element. Hence, we trigger a program that updates the element,
causing its reuse, and check whether the unreferenced kptr is reset. If
it is, prealloc_lru_pop did an incorrect check_and_init_map_value call
and the test fails.
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220809213033.24147-4-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add an additional test, "data_slice_use_after_release2", for ensuring
that data slices are correctly invalidated by the verifier after the
dynptr whose ref obj id they track is released. In particular, this
tests data slice invalidation for dynptrs located at a non-zero offset
from the frame pointer.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/r/20220809214055.4050604-2-joannelkoong@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Before, you could see the following errors:
$ ./vmtest.sh -j
./vmtest.sh: option requires an argument -- j
./vmtest.sh: line 357: OPTARG: unbound variable
$ ./vmtest.sh -z
./vmtest.sh: illegal option -- z
./vmtest.sh: line 357: OPTARG: unbound variable
Fix by adding ':' as first character of optstring. Reason is that getopts
requires ':' as the first character for OPTARG to be set in the `?` and `:`
error cases.
Note that the ':' as the first character of the optstring switches getopts
to silent mode. The desire to run in this mode seems to have been there all
along, as the script takes care of reporting errors.
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Müller <deso@posteo.net>
Link: https://lore.kernel.org/bpf/0f93b56198328b6b4da7b4cf4662d05c3edb5fd2.1660064925.git.dxu@dxuuu.xyz
Set the exit trap only after argument parsing is done. This way an
argument parsing failure or `-h` will not require sudo.
Reasoning is that it's confusing that a help message would require root
access.
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Müller <deso@posteo.net>
Link: https://lore.kernel.org/bpf/6a802aa37758e5a7e6aa5de294634f5518005e2b.1660064925.git.dxu@dxuuu.xyz
Add a BPF-helper test case for CLOCK_TAI access. The added test verifies that:
* Timestamps are generated
* Timestamps are moving forward
* Timestamps are reasonable
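A sketch of the first two checks using the new helper (the program shape
is an assumption; the "reasonable" check is done by comparing against
clock_gettime(CLOCK_TAI) readings from userspace):

  #include <linux/pkt_cls.h>
  #include <bpf/bpf_helpers.h>

  SEC("tc")
  int time_tai(struct __sk_buff *skb)
  {
          __u64 t1 = bpf_ktime_get_tai_ns();
          __u64 t2 = bpf_ktime_get_tai_ns();

          /* generated at all, and moving forward */
          if (!t1 || t2 < t1)
                  return TC_ACT_SHOT;

          return TC_ACT_OK;
  }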
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Link: https://lore.kernel.org/r/20220809060803.5773-3-kurt@linutronix.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Apparently, no existing selftest covers this. Add a new one where we
load a cgroup/bind4 program and attach fentry to it. Calling
bpf_obj_get_info_by_fd on the fentry program should return a non-zero
btf_id/btf_obj_id instead of crashing the kernel.
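The core of the check, sketched (field names follow the attach_btf
fields this series exposes in bpf_prog_info; fentry_fd is assumed to be
the attached fentry program's fd):

  struct bpf_prog_info info = {};
  __u32 info_len = sizeof(info);

  if (!bpf_obj_get_info_by_fd(fentry_fd, &info, &info_len)) {
          /* with the fix: both are non-zero, and no kernel crash */
          printf("btf_obj_id %u btf_id %u\n",
                 info.attach_btf_obj_id, info.attach_btf_id);
  }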
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220804201140.1340684-2-sdf@google.com
This patch cleans up a few things:
* dynptr_fail.c:
There is no sys_nanosleep tracepoint. dynptr_fail only tests
that the prog load fails, so just SEC("?raw_tp") suffices here.
* test_bpf_cookie:
There is no sys_nanosleep kprobe. The prog is loaded in
userspace through bpf_program__attach_kprobe_opts passing in
SYS_NANOSLEEP_KPROBE_NAME, so just SEC("k{ret}probe") suffices here.
* test_helper_restricted:
There is no sys_nanosleep kprobe. test_helper_restricted only tests
that the prog load fails, so just SEC("?kprobe") suffices here.
There are no functional changes.
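For reference, the "?" prefix marks a program as non-autoloaded, so a
load-failure test can use it without naming a real attach target; a
minimal illustration (program name assumed):

  SEC("?raw_tp")
  int load_fail_case(void *ctx)
  {
          /* loaded explicitly by the test; expected to fail verification */
          return 0;
  }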
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220805171405.2272103-1-joannelkoong@gmail.com
The send_signal/send_signal_tracepoint test is pretty flaky, with at
least one failure in every ten runs across the attempts I've made:
> test_send_signal_common:PASS:pipe_c2p 0 nsec
> test_send_signal_common:PASS:pipe_p2c 0 nsec
> test_send_signal_common:PASS:fork 0 nsec
> test_send_signal_common:PASS:skel_open_and_load 0 nsec
> test_send_signal_common:PASS:skel_attach 0 nsec
> test_send_signal_common:PASS:pipe_read 0 nsec
> test_send_signal_common:PASS:pipe_write 0 nsec
> test_send_signal_common:PASS:reading pipe 0 nsec
> test_send_signal_common:PASS:reading pipe error: size 0 0 nsec
> test_send_signal_common:FAIL:incorrect result unexpected incorrect result: actual 48 != expected 50
> test_send_signal_common:PASS:pipe_write 0 nsec
> #139/1 send_signal/send_signal_tracepoint:FAIL
The reason does not appear to be a correctness issue in the strict
sense. Rather, we merely do not receive the signal we are waiting for
within the provided timeout.
Let's bump the timeout by a factor of ten. With that change I have not
been able to reproduce the failure in 150+ iterations. I am also sneaking
in a small simplification to the test_progs test selection logic.
Signed-off-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220727182955.4044988-1-deso@posteo.net