Author SHA1 Message Date
Martin KaFai Lau
31de4105f0 bpf: Add BPF_FIB_LOOKUP_SKIP_NEIGH for bpf_fib_lookup
The bpf_fib_lookup() helper also looks up the neigh table.
This behavior predates the addition of bpf_redirect_neigh().

In a use case that does not manage the neigh table
and only needs bpf_fib_lookup() to look up a fib to
decide whether to redirect or not, the bpf prog can
rely solely on bpf_redirect_neigh() to look up the
neigh. That also keeps the neigh entries fresh and connected.

This patch adds a bpf_fib_lookup flag, SKIP_NEIGH, to avoid
the double neigh lookup when the bpf prog always calls
bpf_redirect_neigh() to do the neigh lookup. The params->smac
output is also skipped when SKIP_NEIGH is set, because
bpf_redirect_neigh() will figure out the smac as well.
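A minimal usage sketch from a tc prog's point of view (illustrative
only; error handling elided):

  struct bpf_fib_lookup params = {};
  int rc;

  /* fib lookup only; the neigh lookup is left to bpf_redirect_neigh() */
  rc = bpf_fib_lookup(skb, &params, sizeof(params),
                      BPF_FIB_LOOKUP_SKIP_NEIGH);
  if (rc == BPF_FIB_LKUP_RET_SUCCESS)
          return bpf_redirect_neigh(params.ifindex, NULL, 0, 0);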

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230217205515.3583372-1-martin.lau@linux.dev
2023-02-17 22:12:04 +01:00
Pu Lehui
49b5e77ae3 riscv, bpf: Add bpf trampoline support for RV64
BPF trampoline is the critical infrastructure of the BPF subsystem, acting
as a mediator between kernel functions and BPF programs. Numerous important
features, such as using BPF programs for zero-overhead kernel introspection,
rely on this key component. We can't wait to support bpf trampoline on RV64.
The related tests have passed, as well as the test_verifier with no new
failure cases.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Acked-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/bpf/20230215135205.1411105-5-pulehui@huaweicloud.com
2023-02-17 21:45:30 +01:00
Pu Lehui
596f2e6f9c riscv, bpf: Add bpf_arch_text_poke support for RV64
Implement bpf_arch_text_poke for RV64. For the call scenario, to make BPF
trampoline compatible with the kernel and BPF contexts, we follow the
framework of RV64 ftrace and reserve 4 nops as the function entry of BPF
programs, then use auipc+jalr instructions for the function call. However,
since the auipc+jalr call sequence cannot be patched atomically, we need
stop-machine to make sure the instruction patching happens in an atomic
context. The jump scenario likewise uses an auipc+jalr pair and needs to
patch in stop-machine context.
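Roughly, the function entry transitions as below (illustrative only;
the actual register choice and offsets are decided at patch time):

  /* before poke */      /* after poking a call */
  nop                    auipc t0, hi20(target - pc)
  nop                    jalr  t0, lo12(target - pc)(t0)
  nop                    nop
  nop                    nop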

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Acked-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/bpf/20230215135205.1411105-4-pulehui@huaweicloud.com
2023-02-17 21:45:30 +01:00
Pu Lehui
0fd1fd0104 riscv, bpf: Factor out emit_call for kernel and bpf context
The current emit_call function is not suitable for kernel function calls as
it stores the return value in the bpf R0 register. We can separate it out
for common use. Meanwhile, simplify the judgment logic: a fixed function
address can use jal or auipc+jalr, while an unfixed one can only use
auipc+jalr.
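The resulting selection logic, as pseudocode (helper names are
illustrative):

  /* rvoff = target address - current pc */
  if (fixed_addr && in_jal_range(rvoff))
          emit_jal(rvoff, ctx);         /* single jal, +-1MB range */
  else
          emit_auipc_jalr(rvoff, ctx);  /* full-range auipc+jalr pair */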

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Acked-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/bpf/20230215135205.1411105-3-pulehui@huaweicloud.com
2023-02-17 21:45:30 +01:00
Pu Lehui
5e57fb7b0b riscv: Extend patch_text for multiple instructions
Extend patch_text for multiple instructions. This is the preparation for
multiple-instruction text patching in the riscv BPF trampoline, and may be
useful for other scenarios.
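The extended interface implied by this change is a sketch along the
lines of:

  /* patch ninsns instructions at addr, serialized via stop_machine() */
  int patch_text(void *addr, u32 *insns, int ninsns);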

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/bpf/20230215135205.1411105-2-pulehui@huaweicloud.com
2023-02-17 21:45:30 +01:00
Martin KaFai Lau
181127fb76 Revert "bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES"
This reverts commit 6c20822fada1b8adb77fa450d03a0d449686a4a9.

The build bot failed on an arch with a different cache line size:
https://lore.kernel.org/bpf/50c35055-afa9-d01e-9a05-ea5351280e4f@intel.com/

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-02-17 12:24:33 -08:00
Andrii Nakryiko
e2b5cfc978 selftests/bpf: Add global subprog context passing tests
Add tests validating that it's possible to pass context arguments into
global subprogs for various types of programs, including the particularly
tricky KPROBE programs (which cover kprobes, uprobes, and USDTs, a vast
and important class of programs).

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20230216045954.3002473-4-andrii@kernel.org
2023-02-17 21:21:50 +01:00
Andrii Nakryiko
95ebb37617 selftests/bpf: Convert test_global_funcs test to test_loader framework
Convert 17 test_global_funcs subtests to the test_loader framework for
easier maintenance and a more declarative way to define expected
failures/successes.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20230216045954.3002473-3-andrii@kernel.org
2023-02-17 21:20:44 +01:00
Andrii Nakryiko
d384dce281 bpf: Fix global subprog context argument resolution logic
KPROBE program's user-facing context type is defined as the typedef
bpf_user_pt_regs_t. This leads to a problem when trying to pass a
kprobe/uprobe/usdt context argument into a global subprog, as the kernel
always strips away mods and typedefs of the user-supplied type, but takes
the expected type from bpf_ctx_convert as is, which causes a mismatch.

The current way to work around this is to define a fake struct with the
same name as the expected typedef:

  struct bpf_user_pt_regs_t {};

  __noinline int my_global_subprog(struct bpf_user_pt_regs_t *ctx) { ... }

This patch fixes the issue by resolving the expected type, if it's not
a struct. It still leaves the above workaround working for backwards
compatibility.
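With the fix, the natural form is accepted without the fake struct
(sketch; names are illustrative):

  __noinline int my_global_subprog(bpf_user_pt_regs_t *ctx)
  {
      return !ctx;
  }

  SEC("kprobe")
  int kprobe_prog(bpf_user_pt_regs_t *ctx)
  {
      return my_global_subprog(ctx);
  }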

Fixes: 91cc1a99740e ("bpf: Annotate context types")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20230216045954.3002473-2-andrii@kernel.org
2023-02-17 21:20:44 +01:00
Hengqi Chen
64f50f6575 LoongArch, bpf: Use 4 instructions for function address in JIT
This patch fixes the following issue with function calls in JIT:

  [   29.346981] multi-func JIT bug 105 != 103

The issue can be reproduced by running the "inline simple bpf_loop call"
verifier test.

This is because we are emitting 2-4 instructions for 64-bit immediate moves.
During the first pass of JIT, the placeholder address is zero, so two
instructions are emitted for it. In the extra pass, the function address is
in XKVRANGE, so four instructions are emitted for it. This changes the
instruction indices in the JIT context. Let's always use 4 instructions for
a function address in JIT, so that the instruction sequence doesn't change
between the first pass and the extra pass for function calls.
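For reference, the full 4-instruction move of a 64-bit address looks
like this (illustrative, with t1 as the scratch register):

  lu12i.w  t1, imm[31:12]       /* bits 31..12 */
  ori      t1, t1, imm[11:0]    /* bits 11..0  */
  lu32i.d  t1, imm[51:32]       /* bits 51..32 */
  lu52i.d  t1, t1, imm[63:52]   /* bits 63..52 */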

Fixes: 5dc615520c4d ("LoongArch: Add BPF JIT support")
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Link: https://lore.kernel.org/bpf/20230214152633.2265699-1-hengqi.chen@gmail.com
2023-02-17 17:43:07 +01:00
Martin KaFai Lau
1fe4850b34 bpf: bpf_fib_lookup should not return neigh in NUD_FAILED state
The bpf_fib_lookup() helper does not only look up the fib (i.e. route),
but also looks up the neigh. Before returning the neigh, the helper
does not check for NUD_VALID. When a neigh state (neigh->nud_state)
is NUD_FAILED, its dmac (neigh->ha) could be all zeros. The helper
still returns SUCCESS instead of NO_NEIGH in this case. Because of the
SUCCESS return value, the bpf prog directly uses the returned dmac
and ends up filling all zeros into the eth header.

This patch checks for NUD_VALID and returns NO_NEIGH if the neigh is
not valid.
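A sketch of the added check (simplified):

  if (!neigh || !(READ_ONCE(neigh->nud_state) & NUD_VALID))
          return BPF_FIB_LKUP_RET_NO_NEIGH;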

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230217004150.2980689-3-martin.lau@linux.dev
2023-02-17 16:19:42 +01:00
Martin KaFai Lau
af2d0d09ea bpf: Disable bh in bpf_test_run for xdp and tc prog
Some of the bpf helpers require bh to be disabled, e.g. the bpf_fib_lookup
helper that will be used in a later selftest. In particular, it
calls ___neigh_lookup_noref, which expects bh to be disabled.

This patch disables bh before calling bpf_prog_run[_xdp], so
the testing prog can also use those helpers.
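Conceptually the pattern is (sketch, not the exact diff):

  local_bh_disable();
  ret = bpf_prog_run(prog, ctx);
  local_bh_enable();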

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230217004150.2980689-2-martin.lau@linux.dev
2023-02-17 16:19:23 +01:00
Maciej Fijalkowski
1596dae2f1 xsk: check IFF_UP earlier in Tx path
Xsk Tx can be triggered via either the sendmsg() or poll() syscall. These
two paths share a call to the common function xsk_xmit(), which has two
sanity checks within. A pseudo-code example showing the two paths:

__xsk_sendmsg() :                       xsk_poll():
if (unlikely(!xsk_is_bound(xs)))        if (unlikely(!xsk_is_bound(xs)))
    return -ENXIO;                          return mask;
if (unlikely(need_wait))                (...)
    return -EOPNOTSUPP;                 xsk_xmit()
mark napi id
(...)
xsk_xmit()

xsk_xmit():
if (unlikely(!(xs->dev->flags & IFF_UP)))
	return -ENETDOWN;
if (unlikely(!xs->tx))
	return -ENOBUFS;

As can be observed above, in sendmsg() the napi id can be marked on an
interface that was not brought up, and this causes a NULL ptr
dereference:

[31757.505631] BUG: kernel NULL pointer dereference, address: 0000000000000018
[31757.512710] #PF: supervisor read access in kernel mode
[31757.517936] #PF: error_code(0x0000) - not-present page
[31757.523149] PGD 0 P4D 0
[31757.525726] Oops: 0000 [#1] PREEMPT SMP NOPTI
[31757.530154] CPU: 26 PID: 95641 Comm: xdpsock Not tainted 6.2.0-rc5+ #40
[31757.536871] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
[31757.547457] RIP: 0010:xsk_sendmsg+0xde/0x180
[31757.551799] Code: 00 75 a2 48 8b 00 a8 04 75 9b 84 d2 74 69 8b 85 14 01 00 00 85 c0 75 1b 48 8b 85 28 03 00 00 48 8b 80 98 00 00 00 48 8b 40 20 <8b> 40 18 89 85 14 01 00 00 8b bd 14 01 00 00 81 ff 00 01 00 00 0f
[31757.570840] RSP: 0018:ffffc90034f27dc0 EFLAGS: 00010246
[31757.576143] RAX: 0000000000000000 RBX: ffffc90034f27e18 RCX: 0000000000000000
[31757.583389] RDX: 0000000000000001 RSI: ffffc90034f27e18 RDI: ffff88984cf3c100
[31757.590631] RBP: ffff88984714a800 R08: ffff88984714a800 R09: 0000000000000000
[31757.597877] R10: 0000000000000001 R11: 0000000000000000 R12: 00000000fffffffa
[31757.605123] R13: 0000000000000000 R14: 0000000000000003 R15: 0000000000000000
[31757.612364] FS:  00007fb4c5931180(0000) GS:ffff88afdfa00000(0000) knlGS:0000000000000000
[31757.620571] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[31757.626406] CR2: 0000000000000018 CR3: 000000184b41c003 CR4: 00000000007706e0
[31757.633648] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[31757.640894] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[31757.648139] PKRU: 55555554
[31757.650894] Call Trace:
[31757.653385]  <TASK>
[31757.655524]  sock_sendmsg+0x8f/0xa0
[31757.659077]  ? sockfd_lookup_light+0x12/0x70
[31757.663416]  __sys_sendto+0xfc/0x170
[31757.667051]  ? do_sched_setscheduler+0xdb/0x1b0
[31757.671658]  __x64_sys_sendto+0x20/0x30
[31757.675557]  do_syscall_64+0x38/0x90
[31757.679197]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[31757.687969] Code: 8e f6 ff 44 8b 4c 24 2c 4c 8b 44 24 20 41 89 c4 44 8b 54 24 28 48 8b 54 24 18 b8 2c 00 00 00 48 8b 74 24 10 8b 7c 24 08 0f 05 <48> 3d 00 f0 ff ff 77 3a 44 89 e7 48 89 44 24 08 e8 b5 8e f6 ff 48
[31757.707007] RSP: 002b:00007ffd49c73c70 EFLAGS: 00000293 ORIG_RAX: 000000000000002c
[31757.714694] RAX: ffffffffffffffda RBX: 000055a996565380 RCX: 00007fb4c5727c16
[31757.721939] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003
[31757.729184] RBP: 0000000000000040 R08: 0000000000000000 R09: 0000000000000000
[31757.736429] R10: 0000000000000040 R11: 0000000000000293 R12: 0000000000000000
[31757.743673] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[31757.754940]  </TASK>

To fix this, let's make xsk_xmit a function that is responsible for
generic Tx, where RCU is handled accordingly, and pull the sanity checks
and the xs->zc handling out of it. Move the sanity checks into
__xsk_sendmsg() and xsk_poll().

Fixes: ca2e1a627035 ("xsk: Mark napi_id on sendmsg()")
Fixes: 18b1ab7aa76b ("xsk: Fix race at socket teardown")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://lore.kernel.org/r/20230215143309.13145-1-maciej.fijalkowski@intel.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2023-02-17 16:05:28 +01:00
Taichi Nishimura
df71a42cc3 Fix typos in selftest/bpf files
Ran a spell checker on files in selftests/bpf and fixed the typos found.

Signed-off-by: Taichi Nishimura <awkrail01@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/bpf/20230216085537.519062-1-awkrail01@gmail.com
2023-02-16 16:56:17 -08:00
Ilya Leoshkevich
c5a237a4db selftests/bpf: Use bpf_{btf,link,map,prog}_get_info_by_fd()
Use the new type-safe wrappers around bpf_obj_get_info_by_fd().
Fix a prog/map mixup in prog_holds_map().

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230214231221.249277-6-iii@linux.ibm.com
2023-02-16 15:32:46 -08:00
Ilya Leoshkevich
c0ca277bb8 samples/bpf: Use bpf_{btf,link,map,prog}_get_info_by_fd()
Use the new type-safe wrappers around bpf_obj_get_info_by_fd().

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230214231221.249277-5-iii@linux.ibm.com
2023-02-16 15:32:45 -08:00
Ilya Leoshkevich
38f0408ef7 bpftool: Use bpf_{btf,link,map,prog}_get_info_by_fd()
Use the new type-safe wrappers around bpf_obj_get_info_by_fd().

Split the bpf_obj_get_info_by_fd() call in build_btf_type_table() in
two, since knowing the type helps with the Memory Sanitizer.

Improve map_parse_fd_and_info() type safety by using
struct bpf_map_info * instead of void * for info.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20230214231221.249277-4-iii@linux.ibm.com
2023-02-16 15:32:45 -08:00
Ilya Leoshkevich
629dfc660c libbpf: Use bpf_{btf,link,map,prog}_get_info_by_fd()
Use the new type-safe wrappers around bpf_obj_get_info_by_fd().

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230214231221.249277-3-iii@linux.ibm.com
2023-02-16 15:32:45 -08:00
Ilya Leoshkevich
55a9ed0e16 libbpf: Introduce bpf_{btf,link,map,prog}_get_info_by_fd()
These are type-safe wrappers around bpf_obj_get_info_by_fd(). They
found one problem in selftests, and are also useful for adding
Memory Sanitizer annotations.
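For instance, the map flavor is a thin typed wrapper (sketch; see bpf.h
for the actual prototypes):

  int bpf_map_get_info_by_fd(int map_fd, struct bpf_map_info *info,
                             __u32 *info_len)
  {
          return bpf_obj_get_info_by_fd(map_fd, info, info_len);
  }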

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230214231221.249277-2-iii@linux.ibm.com
2023-02-16 15:32:36 -08:00
Alexander Lobakin
6c20822fad bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES
&xdp_buff and &xdp_frame are bound in a way that

xdp_buff->data_hard_start == xdp_frame

It's always the case and e.g. xdp_convert_buff_to_frame() relies on
this.
IOW, the following:

	for (u32 i = 0; i < 0xdead; i++) {
		xdpf = xdp_convert_buff_to_frame(&xdp);
		xdp_convert_frame_to_buff(xdpf, &xdp);
	}

shouldn't ever modify @xdpf's contents or the pointer itself.
However, "live packet" code wrongly treats &xdp_frame as part of its
context placed *before* the data_hard_start. With such flow,
data_hard_start is sizeof(*xdpf) off to the right and no longer points
to the XDP frame.

Instead of replacing `sizeof(ctx)` with `offsetof(ctx, xdpf)` in several
places and praying that there are no more miscalcs left somewhere in the
code, unionize ::frm with ::data in a flex array, so that both start
pointing to the actual data_hard_start and the XDP frame actually becomes
a part of it, i.e. a part of the headroom, not the context.
A nice side effect is that the maximum frame size for this mode gets
increased by 40 bytes, as xdp_buff::frame_sz includes everything from
data_hard_start (-> includes xdpf already) to the end of the XDP/skb
shared info.
Also update %MAX_PKT_SIZE accordingly in the selftests code. Leave it
hardcoded for 64 bit && 4k pages, it can be made more flexible later on.

Minor: align `&head->data` with how `head->frm` is assigned for
consistency.
Minor #2: rename 'frm' to 'frame' in &xdp_page_head while at it for
clarity.

(was found while testing XDP traffic generator on ice, which calls
 xdp_convert_frame_to_buff() for each XDP frame)
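The resulting layout, roughly (sketch of the union described above):

	struct xdp_page_head {
		struct xdp_buff orig_ctx;
		struct xdp_buff ctx;
		union {
			/* ::data_hard_start starts here */
			DECLARE_FLEX_ARRAY(struct xdp_frame, frame);
			DECLARE_FLEX_ARRAY(u8, data);
		};
	};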

Fixes: b530e9e1063e ("bpf: Add "live packet" mode for XDP in BPF_PROG_RUN")
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://lore.kernel.org/r/20230215185440.4126672-1-aleksander.lobakin@intel.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-02-15 17:39:36 -08:00
Andrii Nakryiko
d964f09af4 Merge branch 'New benchmark for hashmap lookups'
Anton Protopopov says:

====================

Add a new benchmark for hashmap lookups and fix several typos.

In commit 3 I've patched the bench utility so that command line options
can now be reused by different benchmarks.

The benchmark itself is added in the last commit 7. I was using this benchmark
to test map lookup performance when using a different hash function [1]. When
run with --quiet, the results can be easily plotted [2]. The results provided
by the benchmark look reasonable and match the results of my other
benchmarks (which required patching the kernel to get actual statistics on
map lookups).

Links:
  [1] https://fosdem.org/2023/schedule/event/bpf_hashing/
  [2] https://github.com/aspsk/bpf-bench/tree/master/hashmap-bench

Changes,
v1->v2:
- percpu_times_index[] is of wrong size (Martin)
- use base 0 for strtol (Andrii)
- just use -q without argument (Andrii)
- use less hacks when parsing arguments (Andrii)
====================

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-02-15 16:29:32 -08:00
Anton Protopopov
f371f2dc53 selftest/bpf/benchs: Add benchmark for hashmap lookups
Add a new benchmark which measures hashmap lookup operation speed. A user can
control the following parameters of the benchmark:

    * key_size (max 1024): the key size to use
    * max_entries: the hashmap max entries
    * nr_entries: the number of entries to insert/lookup
    * nr_loops: the number of loops for the benchmark
    * map_flags: the hashmap flags passed to BPF_MAP_CREATE

The BPF program performing the benchmarks makes two nested bpf_loop calls:

    bpf_loop(nr_loops/nr_entries)
            bpf_loop(nr_entries)
                     bpf_map_lookup()

So the nr_loops determines the number of actual map lookups. All lookups are
successful.

Example (the output is generated on an AMD Ryzen 9 3950X machine):

    for nr_entries in `seq 4096 4096 65536`; do echo -n "$((nr_entries*100/65536))% full: "; sudo ./bench -d2 -a bpf-hashmap-lookup --key_size=4 --nr_entries=$nr_entries --max_entries=65536 --nr_loops=1000000 --map_flags=0x40 | grep cpu; done
    6% full: cpu01: lookup 50.739M ± 0.018M events/sec (approximated from 32 samples of ~19ms)
    12% full: cpu01: lookup 47.751M ± 0.015M events/sec (approximated from 32 samples of ~20ms)
    18% full: cpu01: lookup 45.153M ± 0.013M events/sec (approximated from 32 samples of ~22ms)
    25% full: cpu01: lookup 43.826M ± 0.014M events/sec (approximated from 32 samples of ~22ms)
    31% full: cpu01: lookup 41.971M ± 0.012M events/sec (approximated from 32 samples of ~23ms)
    37% full: cpu01: lookup 41.034M ± 0.015M events/sec (approximated from 32 samples of ~24ms)
    43% full: cpu01: lookup 39.946M ± 0.012M events/sec (approximated from 32 samples of ~25ms)
    50% full: cpu01: lookup 38.256M ± 0.014M events/sec (approximated from 32 samples of ~26ms)
    56% full: cpu01: lookup 36.580M ± 0.018M events/sec (approximated from 32 samples of ~27ms)
    62% full: cpu01: lookup 36.252M ± 0.012M events/sec (approximated from 32 samples of ~27ms)
    68% full: cpu01: lookup 35.200M ± 0.012M events/sec (approximated from 32 samples of ~28ms)
    75% full: cpu01: lookup 34.061M ± 0.009M events/sec (approximated from 32 samples of ~29ms)
    81% full: cpu01: lookup 34.374M ± 0.010M events/sec (approximated from 32 samples of ~29ms)
    87% full: cpu01: lookup 33.244M ± 0.011M events/sec (approximated from 32 samples of ~30ms)
    93% full: cpu01: lookup 32.182M ± 0.013M events/sec (approximated from 32 samples of ~31ms)
    100% full: cpu01: lookup 31.497M ± 0.016M events/sec (approximated from 32 samples of ~31ms)

Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230213091519.1202813-8-aspsk@isovalent.com
2023-02-15 16:29:31 -08:00
Anton Protopopov
a237dda05e selftest/bpf/benchs: Print less if the quiet option is set
The bench utility will print

    Setting up benchmark '<bench-name>'...
    Benchmark '<bench-name>' started.

on startup to stdout. Suppress this output if the --quiet option is given.
This makes it simpler for a script to parse the benchmark output.

Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230213091519.1202813-7-aspsk@isovalent.com
2023-02-15 16:29:31 -08:00
Anton Protopopov
90c22503cd selftest/bpf/benchs: Make quiet option common
The "local-storage-tasks-trace" benchmark has a `--quiet` option. Move it to
the list of common options, so that the main code and other benchmarks can use
(new) env.quiet variable. Patch the run_bench_local_storage_rcu_tasks_trace.sh
helper script accordingly.

Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230213091519.1202813-6-aspsk@isovalent.com
2023-02-15 16:29:31 -08:00
Anton Protopopov
9644546260 selftest/bpf/benchs: Remove an unused header
The benchs/bench_bpf_hashmap_full_update.c doesn't set a custom argp,
so it shouldn't include the <argp.h> header.

Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230213091519.1202813-5-aspsk@isovalent.com
2023-02-15 16:29:31 -08:00
Anton Protopopov
22ff7aeaa9 selftest/bpf/benchs: Enhance argp parsing
To parse the command line, the bench utility uses the argp_parse() function.
This function takes as an argument a parent 'struct argp' structure which
defines common command line options, plus an array of children 'struct argp'
structures which define additional command line options for particular
benchmarks. This implementation doesn't allow benchmarks to share option
names: e.g., if two benchmarks want to use, say, the --option option, then
only one of them will succeed (the first one encountered in the array). It
would be convenient if the same option names could be used in different
benchmarks (with the same semantics, e.g., --nr_loops=N).

Fix this by calling the argp_parse() function twice. The first call is the
same as before, with all children argps, and helps to find the benchmark name
and to print a combined help message if anything is wrong. Given the name, we
can call argp_parse a second time, but now the children array points only
to the chosen benchmark, thus always calling the correct parsers. (If there
is no benchmark-specific list of arguments, then only one call to argp_parse
is done.)
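In pseudocode, the resulting flow looks like (names illustrative):

    /* pass 1: all benchmark argps attached; finds the benchmark name
     * and can print a combined --help */
    argp.children = all_bench_argps;
    argp_parse(&argp, argc, argv, 0, NULL, NULL);

    /* pass 2: only the chosen benchmark's argp attached */
    argp.children = argps_for(env.bench_name);  /* hypothetical helper */
    argp_parse(&argp, argc, argv, 0, NULL, NULL);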

Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230213091519.1202813-4-aspsk@isovalent.com
2023-02-15 16:29:31 -08:00
Anton Protopopov
2f1c59637f selftest/bpf/benchs: Make a function static in bpf_hashmap_full_update
The hashmap_report_final callback function defined in the
benchs/bench_bpf_hashmap_full_update.c file should be static.

Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230213091519.1202813-3-aspsk@isovalent.com
2023-02-15 16:29:31 -08:00
Anton Protopopov
4db98ab445 selftest/bpf/benchs: Fix a typo in bpf_hashmap_full_update
To call the bpf_hashmap_full_update benchmark, one should say:

    bench bpf-hashmap-ful-update

The patch adds a missing 'l' to the benchmark name.

Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230213091519.1202813-2-aspsk@isovalent.com
2023-02-15 16:29:31 -08:00
Alexei Starovoitov
3538a0fbbd Merge branch 'Use __GFP_ZERO in bpf memory allocator'
Hou Tao says:

====================

From: Hou Tao <houtao1@huawei.com>

Hi,

The patchset tries to fix the hard lock-up problem found when checking how
htab handles element reuse in the bpf memory allocator. The immediate reuse
of freed elements will reinitialize special fields (e.g., bpf_spin_lock) in
the htab map value, and it may corrupt a lookup procedure with the BPF_F_LOCK
flag, which acquires bpf_spin_lock during value copying, and lead to a hard
lock-up as shown in patch #2. Patch #1 fixes it by using __GFP_ZERO when
allocating the object from slab; the behavior is similar to the preallocated
hash-table case. Please see the individual patches for more details. And
comments are always welcome.

Regards,

Change Log:
v1:
  * Use __GFP_ZERO instead of ctor to avoid retpoline overhead (from Alexei)
  * Add comments for check_and_init_map_value() (from Alexei)
  * split __GFP_ZERO patches out of the original patchset to unblock
    the development work of others.

RFC: https://lore.kernel.org/bpf/20221230041151.1231169-1-houtao@huaweicloud.com
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-15 15:40:06 -08:00
Hou Tao
f88da2d46c selftests/bpf: Add test case for element reuse in htab map
The reinitialization of the spin-lock in a map value after immediate reuse
may corrupt lookups with the BPF_F_LOCK flag and result in a hard lock-up,
so add one test case to demonstrate the problem.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20230215082132.3856544-3-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-15 15:40:06 -08:00
Hou Tao
997849c4b9 bpf: Zeroing allocated object from slab in bpf memory allocator
Currently the freed element in bpf memory allocator may be immediately
reused, for htab map the reuse will reinitialize special fields in map
value (e.g., bpf_spin_lock), but lookup procedure may still access
these special fields, and it may lead to hard-lockup as shown below:

 NMI backtrace for cpu 16
 CPU: 16 PID: 2574 Comm: htab.bin Tainted: G             L     6.1.0+ #1
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
 RIP: 0010:queued_spin_lock_slowpath+0x283/0x2c0
 ......
 Call Trace:
  <TASK>
  copy_map_value_locked+0xb7/0x170
  bpf_map_copy_value+0x113/0x3c0
  __sys_bpf+0x1c67/0x2780
  __x64_sys_bpf+0x1c/0x20
  do_syscall_64+0x30/0x60
  entry_SYSCALL_64_after_hwframe+0x46/0xb0
 ......
  </TASK>

For htab map, just like the preallocated case, there is no need to
initialize these special fields in the map value again once these fields
have been initialized. For a preallocated htab map, these fields are
initialized through __GFP_ZERO in bpf_map_area_alloc(), so do the
similar thing for non-preallocated htab in the bpf memory allocator. And
there is no need to use __GFP_ZERO for the per-cpu bpf memory allocator,
because __alloc_percpu_gfp() does it implicitly.
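The core of the change is adding the flag at the slab allocation site
(sketch; the actual site is in the bpf memory allocator):

 obj = kmalloc_node(c->unit_size, gfp_flags | __GFP_ZERO, node);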

Fixes: 0fd7c5d43339 ("bpf: Optimize call_rcu in non-preallocated hash map.")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20230215082132.3856544-2-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-15 15:40:06 -08:00
Alexei Starovoitov
b2d9002ee9 Merge branch 'Improvements for BPF_ST tracking by verifier '
Eduard Zingerman says:

====================

This patch-set is a part of preparation work for -mcpu=v4 option for
BPF C compiler (discussed in [1]). Among other things -mcpu=v4 should
enable generation of BPF_ST instruction by the compiler.

- Patches #1,2 adjust verifier to track values of constants written to
  stack using BPF_ST. Currently these are tracked imprecisely, unlike
  the writes using BPF_STX, e.g.:

    fp[-8] = 42;   currently verifier assumes that fp[-8]=mmmmmmmm
                   after such instruction, where m stands for "misc",
                   just a note that something is written at fp[-8].

    r1 = 42;       verifier tracks r1=42 after this instruction.
    fp[-8] = r1;   verifier tracks fp[-8]=42 after this instruction.

  This patch makes both cases equivalent.

- Patches #3,4 adjust verifier.c:check_stack_write_fixed_off() to
  preserve STACK_ZERO marks when BPF_ST writes zero. Currently these
  are replaced by STACK_MISC, unlike zero writes using BPF_STX, e.g.:

    ... stack range [X,Y] is marked as STACK_ZERO ...
    r0 = ... variable offset pointer to stack with range [X,Y] ...

    fp[r0] = 0;    currently verifier marks range [X,Y] as
                   STACK_MISC for such instructions.

    r1 = 0;
    fp[r0] = r1;   verifier keeps STACK_ZERO marks for range [X,Y].

  This patch makes both cases equivalent.

Motivating example for patch #1 could be found at [3].

Previous version of the patch-set is here [2], the changes are:
- Explicit initialization of fake register parent link is removed from
  verifier.c:check_stack_write_fixed_off() as parent links are now
  correctly handled by verifier.c:save_register_state().
- Original patch #1 is split in patches #1 & #3.
- Missing test case added for patch #3
  verifier.c:check_stack_write_fixed_off() adjustment.
- Test cases are updated to use .prog_type = BPF_PROG_TYPE_SK_LOOKUP,
  which requires return value to be in the range [0,1] (original test
  cases assumed that such range is always required, which is not true).
- Original patch #3 with changes allowing BPF_ST writes to context is
  withheld for now, w/o compiler support for BPF_ST it requires some
  creative testing.
- Original patch #5 is removed from the patch-set. This patch
  contained adjustments to expected verifier error messages in some
  tests, necessary when C compiler generates BPF_ST instruction
  instead of BPF_STX (changes to expected instruction indices). These
  changes are not necessary yet.

[1] https://lore.kernel.org/bpf/01515302-c37d-2ee5-c950-2f556a4caad0@meta.com/
[2] https://lore.kernel.org/bpf/20221231163122.1360813-1-eddyz87@gmail.com/
[3] https://lore.kernel.org/bpf/f1e4282bf00aa21a72fc5906f8c3be1ae6c94a5e.camel@gmail.com/
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-15 11:48:48 -08:00
Eduard Zingerman
2a33c5a25e selftests/bpf: check if BPF_ST with variable offset preserves STACK_ZERO
A test case to verify that a variable offset BPF_ST instruction
preserves STACK_ZERO marks when it writes zeros, e.g. in the following
situation:

  *(u64*)(r10 - 8) = 0   ; STACK_ZERO marks for fp[-8]
  r0 = random(-7, -1)    ; some random number in range of [-7, -1]
  r0 += r10              ; r0 is now variable offset pointer to stack
  *(u8*)(r0) = 0         ; BPF_ST writing zero, STACK_ZERO mark for
                         ; fp[-8] should be preserved.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230214232030.1502829-5-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-15 11:48:48 -08:00
Eduard Zingerman
31ff213512 bpf: BPF_ST with variable offset should preserve STACK_ZERO marks
The BPF_STX instruction preserves STACK_ZERO marks for variable offset
writes in situations like the one below:

  *(u64*)(r10 - 8) = 0   ; STACK_ZERO marks for fp[-8]
  r0 = random(-7, -1)    ; some random number in range of [-7, -1]
  r0 += r10              ; r0 is now a variable offset pointer to stack
  r1 = 0
  *(u8*)(r0) = r1        ; BPF_STX writing zero, STACK_ZERO mark for
                         ; fp[-8] is preserved

This commit updates verifier.c:check_stack_write_var_off() to process
BPF_ST in a similar manner, e.g. the following example:

  *(u64*)(r10 - 8) = 0   ; STACK_ZERO marks for fp[-8]
  r0 = random(-7, -1)    ; some random number in range of [-7, -1]
  r0 += r10              ; r0 is now variable offset pointer to stack
  *(u8*)(r0) = 0         ; BPF_ST writing zero, STACK_ZERO mark for
                         ; fp[-8] is preserved

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230214232030.1502829-4-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-15 11:48:47 -08:00
Eduard Zingerman
1a24af65bb selftests/bpf: check if verifier tracks constants spilled by BPF_ST_MEM
Check that verifier tracks the value of 'imm' spilled to stack by
BPF_ST_MEM instruction. Cover the following cases:
- write of non-zero constant to stack;
- write of a zero constant to stack.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230214232030.1502829-3-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-15 11:48:47 -08:00
Eduard Zingerman
ecdf985d76 bpf: track immediate values written to stack by BPF_ST instruction
For aligned stack writes using the BPF_ST instruction, track stored values
the same way BPF_STX is handled, e.g. make sure that the following
commands produce similar verifier knowledge:

  fp[-8] = 42;             r1 = 42;
                           fp[-8] = r1;

This covers two cases:
 - non-null values written to stack are stored as spill of fake
   registers;
 - null values written to stack are stored as STACK_ZERO marks.

Previously both cases above used STACK_MISC marks instead.

Some verifier test cases relied on the old logic to obtain STACK_MISC
marks for some stack values. These test cases are updated in the same
commit to avoid failures during bisect.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230214232030.1502829-2-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-15 11:48:47 -08:00
Alexei Starovoitov
62d101d5f4 selftests/bpf: Fix map_kptr test.
The compiler is optimizing out the majority of unref_ptr reads/writes, so the
test wasn't testing much. For example, one could delete the '__kptr' tag from
'struct prog_test_ref_kfunc __kptr *unref_ptr;' and the test would still
"pass".

Convert it to volatile stores. Confirmed by comparing the bpf asm
before/after.
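The usual pattern for forcing such stores (sketch; the test may spell
it slightly differently):

  /* prevent the compiler from eliding the kptr write */
  #define WRITE_ONCE(x, val) ((*(volatile typeof(x) *)&(x)) = (val))

  WRITE_ONCE(v->unref_ptr, p);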

Fixes: 2cbc469a6fc3 ("selftests/bpf: Add C tests for kptr")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230214235051.22938-1-alexei.starovoitov@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-02-15 09:24:59 -08:00
Björn Töpel
5e53e5c7ed selftests/bpf: Cross-compile bpftool
When the BPF selftests are cross-compiled, only a host version of
bpftool is built. This version of bpftool is used on the host side to
generate various intermediates, e.g., skeletons.

The test runners also use bpftool, so the Makefile will symlink
bpftool into the selftest/bpf root, where the test runners will look
for the tool:
  | $(Q)ln -sf $(if $2,..,.)/tools/build/bpftool/bootstrap/bpftool \
  |    $(OUTPUT)/$(if $2,$2/)bpftool

There are two problems for cross-compilation builds:

 1. There is no native (cross-compilation target) build of bpftool
 2. The bootstrap/bpftool is never cross-compiled (by design)

Make sure that a native/cross-compiled version of bpftool is built,
and if CROSS_COMPILE is set, symlink the native/non-bootstrap version.

Acked-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20230214161253.183458-1-bjorn@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-15 08:50:20 -08:00
David Vernet
213aacb8a2 bpf, docs: Add myself to BPF docs MAINTAINERS entry
In commit 7e2a9ebe8126 ("docs, bpf: Ensure IETF's BPF mailing list gets
copied for ISA doc changes"), a new MAINTAINERS entry was added for any
BPF IETF documentation updates for the ongoing standardization process.
I've been making it a point to try and review as many BPF documentation
patches as possible, and have made a commitment to Alexei to
consistently review BPF standardization patches going forward. This
patch adds my name as a reviewer to the MAINTAINERS entry for the
standardization effort.

Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230214223553.78353-1-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-15 08:48:43 -08:00
Tiezhu Yang
524581d121 selftests/bpf: Fix build error for LoongArch
A build error occurs when running make -C tools/testing/selftests/bpf/
on LoongArch:

  BINARY   test_verifier
In file included from test_verifier.c:27:
tools/include/uapi/linux/bpf_perf_event.h:14:28: error: field 'regs' has incomplete type
   14 |         bpf_user_pt_regs_t regs;
      |                            ^~~~
make: *** [Makefile:577: tools/testing/selftests/bpf/test_verifier] Error 1
make: Leaving directory 'tools/testing/selftests/bpf'

Add the missing uapi header for LoongArch so that the following definition
can be used:
typedef struct user_pt_regs bpf_user_pt_regs_t;

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Link: https://lore.kernel.org/r/1676458867-22052-1-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-15 08:47:53 -08:00
Bagas Sanjaya
e2d323a1f0 Documentation: bpf: Add missing line break separator in node_data struct code block
Stephen Rothwell reported an htmldocs warning when merging the bpf-next
tree, which was the same warning as reported by the kernel test robot:

Documentation/bpf/graph_ds_impl.rst:62: ERROR: Error in "code-block" directive:
maximum 1 argument(s) allowed, 12 supplied.

The error is due to Sphinx confusing the node_data struct declaration with
code-block directive options.

Fix the warning by separating the code-block marker from the node_data
struct declaration.

Link: https://lore.kernel.org/linux-next/20230215144505.4751d823@canb.auug.org.au/
Link: https://lore.kernel.org/linux-doc/202302151123.wUE5FYFx-lkp@intel.com/
Fixes: c31315c3aa0929 ("bpf, documentation: Add graph documentation for non-owning refs")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://lore.kernel.org/r/20230215123253.41552-3-bagasdotme@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-15 08:43:51 -08:00
Alexei Starovoitov
1f5dfcc78a Revert "bpf: Add --skip_encoding_btf_inconsistent_proto, --btf_gen_optimized to pahole flags for v1.25"
This reverts commit 0243d3dfe274832aa0a16214499c208122345173.

pahole 1.25 is too aggressive in removing functions.
With a clang-compiled kernel the following is seen:
WARN: resolve_btfids: unresolved symbol tcp_reno_cong_avoid
WARN: resolve_btfids: unresolved symbol dctcp_update_alpha
WARN: resolve_btfids: unresolved symbol cubictcp_cong_avoid
WARN: resolve_btfids: unresolved symbol bpf_xdp_metadata_rx_timestamp
WARN: resolve_btfids: unresolved symbol bpf_xdp_metadata_rx_hash
WARN: resolve_btfids: unresolved symbol bpf_task_kptr_get
WARN: resolve_btfids: unresolved symbol bpf_task_acquire_not_zero
WARN: resolve_btfids: unresolved symbol bpf_rdonly_cast
WARN: resolve_btfids: unresolved symbol bpf_kfunc_call_test_static_unused_arg
WARN: resolve_btfids: unresolved symbol bpf_kfunc_call_test_ref
WARN: resolve_btfids: unresolved symbol bpf_kfunc_call_test_pass_ctx
WARN: resolve_btfids: unresolved symbol bpf_kfunc_call_test_pass2
WARN: resolve_btfids: unresolved symbol bpf_kfunc_call_test_pass1
WARN: resolve_btfids: unresolved symbol bpf_kfunc_call_test_mem_len_pass1
WARN: resolve_btfids: unresolved symbol bpf_kfunc_call_test_mem_len_fail2
WARN: resolve_btfids: unresolved symbol bpf_kfunc_call_test_mem_len_fail1
WARN: resolve_btfids: unresolved symbol bpf_kfunc_call_test_kptr_get
WARN: resolve_btfids: unresolved symbol bpf_kfunc_call_test_fail3
WARN: resolve_btfids: unresolved symbol bpf_kfunc_call_test_fail2
WARN: resolve_btfids: unresolved symbol bpf_kfunc_call_test_acquire
WARN: resolve_btfids: unresolved symbol bpf_kfunc_call_test2
WARN: resolve_btfids: unresolved symbol bpf_kfunc_call_test1
WARN: resolve_btfids: unresolved symbol bpf_kfunc_call_memb_release
WARN: resolve_btfids: unresolved symbol bpf_kfunc_call_memb1_release
WARN: resolve_btfids: unresolved symbol bpf_kfunc_call_int_mem_release

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-13 22:03:25 -08:00
Joanne Koong
50a7cedb15 selftests/bpf: Clean up dynptr prog_tests
Clean up prog_tests/dynptr.c by removing the unneeded "expected_err_msg"
in the dynptr_tests struct, which is a remnant from converting the fail
test cases to use the generic verification tester.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Link: https://lore.kernel.org/r/20230214051332.4007131-2-joannelkoong@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-13 21:42:29 -08:00
Joanne Koong
8032cad103 selftests/bpf: Clean up user_ringbuf, cgrp_kfunc, kfunc_dynptr_param tests
Clean up user_ringbuf, cgrp_kfunc, and kfunc_dynptr_param tests to use
the generic verification tester for checking verifier rejections.
The generic verification tester uses btf_decl_tag-based annotations
for verifying that the tests fail with the expected log messages.
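The annotations look roughly like this (illustrative; the __failure/__msg
macros come from progs/bpf_misc.h):

  SEC("?raw_tp")
  __failure __msg("Unreleased reference")
  int user_ringbuf_fail_example(void *ctx)
  {
      /* prog body elided */
      return 0;
  }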

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Acked-by: David Vernet <void@manifault.com>
Reviewed-by: Roberto Sassu <roberto.sassu@huawei.com>
Link: https://lore.kernel.org/r/20230214051332.4007131-1-joannelkoong@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-13 21:42:29 -08:00
Alexei Starovoitov
c8ea09974f Merge branch 'BPF rbtree next-gen datastructure'
Dave Marchevsky says:

====================
This series adds a rbtree datastructure following the "next-gen
datastructure" precedent set by recently-added linked-list [0]. This is
a reimplementation of previous rbtree RFC [1] to use kfunc + kptr
instead of adding a new map type. This series adds a smaller set of API
functions than that RFC - just the minimum needed to support the current
cgfifo example scheduler in the ongoing sched_ext effort [2], namely:

  bpf_rbtree_add
  bpf_rbtree_remove
  bpf_rbtree_first

The meat of this series is bugfixes and verifier infra work to support
these API functions. Adding more rbtree kfuncs in future patches should
be straightforward as a result.

First, the series refactors and extends linked_list's release_on_unlock
logic. The concept of "reference to node that was added to data
structure" is formalized as "non-owning reference". From linked_list's
perspective, this non-owning reference after
linked_list_push_{front,back} has the same semantics as release_on_unlock,
with the addition of writes to such references being valid in the
critical section. Such references are no longer marked PTR_UNTRUSTED.
Patches 2 and 13 go into more detail.

The series then adds rbtree API kfuncs and necessary verifier support
for them - namely support for callback args to kfuncs and some
non-owning reference interactions that linked_list didn't need.

BPF rbtree uses struct rb_root_cached + existing rbtree lib under the
hood. From the BPF program writer's perspective, a BPF rbtree is very
similar to existing linked list. Consider the following example:

  struct node_data {
    long key;
    long data;
    struct bpf_rb_node node;
  }

  static bool less(struct bpf_rb_node *a, const struct bpf_rb_node *b)
  {
    struct node_data *node_a;
    struct node_data *node_b;

    node_a = container_of(a, struct node_data, node);
    node_b = container_of(b, struct node_data, node);

    return node_a->key < node_b->key;
  }

  private(A) struct bpf_spin_lock glock;
  private(A) struct bpf_rb_root groot __contains(node_data, node);

  /* ... in BPF program */
  struct node_data *n, *m;
  struct bpf_rb_node *res;

  n = bpf_obj_new(typeof(*n));
  if (!n)
    /* skip */
  n->key = 5;
  n->data = 10;

  bpf_spin_lock(&glock);
  bpf_rbtree_add(&groot, &n->node, less);
  bpf_spin_unlock(&glock);

  bpf_spin_lock(&glock);
  res = bpf_rbtree_first(&groot);
  if (!res)
    /* skip */
  res = bpf_rbtree_remove(&groot, res);
  if (!res)
    /* skip */
  bpf_spin_unlock(&glock);

  m = container_of(res, struct node_data, node);
  bpf_obj_drop(m);

Some obvious similarities:

  * Special bpf_rb_root and bpf_rb_node types have same semantics
    as bpf_list_head and bpf_list_node, respectively
  * __contains is used to associate a node type with a root
  * The spin_lock associated with a rbtree must be held when using
    rbtree API kfuncs
  * Nodes are allocated via bpf_obj_new and dropped via bpf_obj_drop
  * Rbtree takes ownership of node lifetime when a node is added.
    Removing a node gives ownership back to the program, requiring a
    bpf_obj_drop before program exit

Some new additions as well:

  * Support for callbacks in kfunc args is added to enable 'less'
    callback use above
  * bpf_rbtree_first is the first graph API function to return a
    non-owning reference instead of converting an arg from own->non-own
  * Because all references to nodes already added to the rbtree are
    non-owning, bpf_rbtree_remove must accept such a reference in order
    to remove it from the tree

Summary of patches:
  Patches 1 - 5 implement the meat of rbtree-specific support in this
  series, gradually building up to implemented kfuncs that verify as
  expected.

  Patch 6 adds the bpf_rbtree_{add,first,remove} to bpf_experimental.h.

  Patch 7 adds tests, Patch 9 adds documentation.

  [0]: lore.kernel.org/bpf/20221118015614.2013203-1-memxor@gmail.com
  [1]: lore.kernel.org/bpf/20220830172759.4069786-1-davemarchevsky@fb.com
  [2]: lore.kernel.org/bpf/20221130082313.3241517-1-tj@kernel.org

Changelog:

v5 -> v6: lore.kernel.org/bpf/20230212092715.1422619-1-davemarchevsky@fb.com/

Patch #'s below refer to the patch's number in v5 unless otherwise stated.

* General / Patch 1
  * Rebase onto latest bpf-next: "bpf: Migrate release_on_unlock logic to non-owning ref semantics"
  * This was Patch 1 of v4, was applied, not included in v6

* Patch 3 - "bpf: Add bpf_rbtree_{add,remove,first} kfuncs"
  * Use bpf_callback_t instead of plain-C fn ptr for bpf_rbtree_add. This
    necessitated having bpf_rbtree_add duplicate rbtree_add's functionality.
    Wrapper function was used w/ internal __bpf_rbtree_add helper so that
    bpf_experimental.h proto could continue to use plain-C fn ptr so BPF progs
    could benefit from typechecking (Alexei)

v4 -> v5: lore.kernel.org/bpf/20230209174144.3280955-1-davemarchevsky@fb.com/

Patch #'s below refer to the patch's number in v4 unless otherwise stated.

* General
  * Rebase onto latest bpf-next: "Merge branch 'bpf, mm: introduce cgroup.memory=nobpf'"

* Patches 1-3 are squashed into "bpf: Migrate release_on_unlock logic to non-owning ref semantics".
  * Added type_is_non_owning_ref helper (Alexei)
  * Use a NON_OWN_REF type flag instead of separate bool (Alexei)

* Patch 8 - "bpf: Special verifier handling for bpf_rbtree_{remove, first}"
  * When doing btf_parse_fields, reject structs with both bpf_list_node and
    bpf_rb_node fields. This is a temporary measure that can be removed after
    "collection identity" followup. See comment added in btf_parse_fields for
    more detail (Kumar, Alexei)
  * Add linked_list BTF test exercising check added to btf_parse_fields
  * Minor changes and moving around of some reg type checks due to NON_OWN_REF type flag
    introduction

* Patch 10 - "selftests/bpf: Add rbtree selftests"
  * Migrate failure tests to RUN_TESTS, __failure, __msg() framework (Alexei)

v3 -> v4: lore.kernel.org/bpf/20230131180016.3368305-1-davemarchevsky@fb.com/

Patch #'s below refer to the patch's number in v3 unless otherwise stated.

* General
  * Don't base this series on "bpf: Refactor release_regno searching logic",
    which was submitted separately as a refactor.
  * Rebase onto latest bpf-next: "samples/bpf: Add openat2() enter/exit tracepoint to syscall_tp sample"

* Patch 2 - "bpf: Improve bpf_reg_state space usage for non-owning ref lock"
  * print_verifier_state change was adding redundant comma after "non_own_ref",
    fix it to put comma in correct place
  * invalidate_non_owning_refs no longer needs to take bpf_active_lock param,
    since any non-owning ref reg in env's cur_state is assumed to use that
    state's active_lock (Alexei)
  * invalidate_non_owning_refs' reg loop should check that the reg being
    inspected is a PTR_TO_BTF_ID before checking reg->non_owning_ref_lock,
    since that field is part of a union and may be filled w/ meaningless bytes
    if reg != PTR_TO_BTF_ID (Alexei)

* Patch 3 - "selftests/bpf: Update linked_list tests for non-owning ref semantics"
  * Change the string searched for by the following tests:
    * linked_list/incorrect_node_off1
    * linked_list/double_push_front
    * linked_list/double_push_back

    necessary due to rebase / dropping of "release_regno searching logic" patch
    (see "General" changes)

* Patch 8 - "bpf: Special verifier handling for bpf_rbtree_{remove, first}"
  * Just call invalidate_non_owning_refs w/ env instead of env, lock. (see
    Patch 2 changes)

* Patch 11 - "bpf, documentation: Add graph documentation for non-owning refs"
  * Fix documentation formatting and improve content (David)
  * v3's version of patch 11 was missing some changes, v4's patch 11 is still
    addressing David's feedback from v2

v2 -> v3: lore.kernel.org/bpf/20221217082506.1570898-1-davemarchevsky@fb.com/

Patch #'s below refer to the patch's number in v2 unless otherwise stated.

* Patch 1 - "bpf: Support multiple arg regs w/ ref_obj_id for kfuncs"
  * No longer needed as v3 doesn't have multiple ref_obj_id arg regs
  * The refactoring pieces were submitted separately
    (https://lore.kernel.org/bpf/20230121002417.1684602-1-davemarchevsky@fb.com/)

* Patch 2 - "bpf: Migrate release_on_unlock logic to non-owning ref semantics"
  * Remove KF_RELEASE_NON_OWN flag from list API push methods, just match
    against specific kfuncs for now (Alexei, David)
  * Separate "release non owning reference" logic from KF_RELEASE logic
    (Alexei, David)
  * reg_find_field_offset now correctly tests 'rec' instead of 'reg' after
    calling reg_btf_record (Dan Carpenter)

* New patch added after Patch 2 - "bpf: Improve bpf_reg_state space usage for non-owning ref lock"
  * Eliminates extra bpf_reg_state memory usage by using a bool instead of
    copying lock identity

* Patch 4 - "bpf: rename list_head -> graph_root in field info types"
  * v2's version was applied to bpf-next, not including in respins

* Patch 6 - "bpf: Add bpf_rbtree_{add,remove,first} kfuncs"
  * Remove KF_RELEASE_NON_OWN flag from rbtree_add, just add it to specific
    kfunc matching (Alexei, David)

* Patch 9 - "bpf: Special verifier handling for bpf_rbtree_{remove, first}"
  * Remove KF_INVALIDATE_NON_OWN kfunc flag, just match against specific kfunc
    for now (Alexei, David)

* Patch 11 - "libbpf: Make BTF mandatory if program BTF has spin_lock or alloc_obj type"
  * Drop for now, will submit separately

* Patch 12 - "selftests/bpf: Add rbtree selftests"
  * Some expected-failure tests have different error messages due to "release
    non-owning reference logic" being separated from KF_RELEASE logic in Patch
    2 changes

* Patch 13 - "bpf, documentation: Add graph documentation for non-owning refs"
  * Fix documentation formatting and improve content (David)

v1 -> v2: lore.kernel.org/bpf/20221206231000.3180914-1-davemarchevsky@fb.com/

Series-wide changes:
  * Rename datastructure_{head,node,api} -> graph_{root,node,api} (Alexei)
  * "graph datastructure" in patch summaries to refer to linked_list + rbtree
    instead of "next-gen datastructure" (Alexei)
  * Move from hacky marking of non-owning references as PTR_UNTRUSTED to
    cleaner implementation (Alexei)
  * Add invalidation of non-owning refs to rbtree_remove (Kumar, Alexei)

Patch #'s below refer to the patch's number in v1 unless otherwise stated.

Note that in v1 most of the meaty verifier changes were in the latter half
of the series. Here, about half of that complexity has been moved to
"bpf: Migrate release_on_unlock logic to non-owning ref semantics" - was Patch
3 in v1.

* Patch 1 - "bpf: Loosen alloc obj test in verifier's reg_btf_record"
  * Was applied, dropped from further iterations

* Patch 2 - "bpf: map_check_btf should fail if btf_parse_fields fails"
  * Dropped in favor of verifier check-on-use: when some normal verifier
    checking expects the map to have btf_fields correctly parsed, it won't
    find any and verification will fail

* New patch added before Patch 3 - "bpf: Support multiple arg regs w/ ref_obj_id for kfuncs"
  * Addition of KF_RELEASE_NON_OWN flag, which requires KF_RELEASE, and tagging
    of bpf_list_push_{front,back} KF_RELEASE | KF_RELEASE_NON_OWN, means that
    list-in-list push_{front,back} will trigger "only one ref_obj_id arg reg"
    logic. This is because "head" arg to those functions can be a list-in-list,
    which itself can be an owning reference with ref_obj_id. So need to
    support multiple ref_obj_id for release kfuncs.

* Patch 3 - "bpf: Minor refactor of ref_set_release_on_unlock"
  * Now a major refactor w/ a rename to reflect this
    * "bpf: Migrate release_on_unlock logic to non-owning ref semantics"
  * Replaces release_on_unlock with active_lock logic as discussed in v1

* New patch added after Patch 3 - "selftests/bpf: Update linked_list tests for non_owning_ref logic"
  * Removes "write after push" linked_list failure tests - no longer failure
    scenarios.

* Patch 4 - "bpf: rename list_head -> datastructure_head in field info types"
  * rename to graph_root instead. Similar renamings across the series - see
    series-wide changes.

* Patch 5 - "bpf: Add basic bpf_rb_{root,node} support"
  * OWNER_FIELD_MASK -> GRAPH_ROOT_MASK, OWNEE_FIELD_MASK -> GRAPH_NODE_MASK,
    and change of "owner"/"ownee" in big btf_check_and_fixup_fields comment to
    "root"/"node" (Alexei)

* Patch 6 - "bpf: Add bpf_rbtree_{add,remove,first} kfuncs"
  * bpf_rbtree_remove can no longer return NULL. v2 continues v1's "use type
    system to prevent remove of node that isn't in a datastructure" approach,
    so rbtree_remove should never have been able to return NULL

* Patch 7 - "bpf: Add support for bpf_rb_root and bpf_rb_node in kfunc args"
  * is_bpf_datastructure_api_kfunc -> is_bpf_graph_api_kfunc (Alexei)

* Patch 8 - "bpf: Add callback validation to kfunc verifier logic"
  * Explicitly disallow rbtree_remove in rbtree callback
  * Explicitly disallow bpf_spin_{lock,unlock} call in rbtree callback,
    preventing possibility of "unbalanced" unlock (Alexei)

* Patch 10 - "bpf, x86: BPF_PROBE_MEM handling for insn->off < 0"
  * Now that non-owning refs aren't marked PTR_UNTRUSTED it's not necessary to
    include this patch as part of the series
  * After conversation w/ Alexei, did another pass and submitted as an
    independent series (lore.kernel.org/bpf/20221213182726.325137-1-davemarchevsky@fb.com/)

* Patch 13 - "selftests/bpf: Add rbtree selftests"
  * Since bpf_rbtree_remove can no longer return null, remove null checks
  * Remove test confirming that rbtree_first isn't allowed in callback. We want
    this to be possible
  * Add failure test confirming that rbtree_remove's new non-owning reference
    invalidation behavior behaves as expected
  * Add SEC("license") to rbtree_btf_fail__* progs. They were previously
    failing due to lack of this section. Now they're failing for correct
    reasons.
  * rbtree_btf_fail__add_wrong_type.c - add locking around rbtree_add, rename
    the bpf prog to something reasonable

* New patch added after patch 13 - "bpf, documentation: Add graph documentation for non-owning refs"
  * Summarizes details of owning and non-owning refs which we hashed out in
    v1
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-13 19:40:58 -08:00
Dave Marchevsky
c31315c3aa bpf, documentation: Add graph documentation for non-owning refs
It is difficult to intuit the semantics of owning and non-owning
references from verifier code. In order to keep the high-level details
from being lost in the mailing list, this patch adds documentation
explaining semantics and details.

The target audience of doc added in this patch is folks working on BPF
internals, as there's focus on "what should the verifier do here". Via
reorganization or copy-and-paste, much of the content can probably be
repurposed for BPF program writer audience as well.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20230214004017.2534011-9-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-13 19:40:53 -08:00
Dave Marchevsky
215249f6ad selftests/bpf: Add rbtree selftests
This patch adds selftests exercising the logic changed/added in the
previous patches in the series. A variety of successful and unsuccessful
rbtree usages are validated:

Success:
  * Add some nodes, let map_value bpf_rbtree_root destructor clean them
    up
  * Add some nodes, remove one using the non-owning ref leftover by
    successful rbtree_add() call
  * Add some nodes, remove one using the non-owning ref returned by
    rbtree_first() call

Failure:
  * BTF where bpf_rb_root owns bpf_list_node should fail to load
  * BTF where node of type X is added to tree containing nodes of type Y
    should fail to load
  * No calling rbtree api functions in 'less' callback for rbtree_add
  * No releasing lock in 'less' callback for rbtree_add
  * No removing a node which hasn't been added to any tree
  * No adding a node which has already been added to a tree
  * No escaping of non-owning references past their lock's
    critical section
  * No escaping of non-owning references past other invalidation points
    (rbtree_remove)

These tests mostly focus on rbtree-specific additions, but some of the
failure cases revalidate scenarios common to both linked_list and rbtree
which are covered in the former's tests. Better to be a bit redundant in
case linked_list and rbtree semantics deviate over time.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20230214004017.2534011-8-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-13 19:40:53 -08:00
Dave Marchevsky
c834df847e bpf: Add bpf_rbtree_{add,remove,first} decls to bpf_experimental.h
These kfuncs will be used by selftests in the following patches.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20230214004017.2534011-7-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-13 19:40:53 -08:00
Dave Marchevsky
a40d363243 bpf: Special verifier handling for bpf_rbtree_{remove, first}
Newly-added bpf_rbtree_{remove,first} kfuncs have some special properties
that require handling in the verifier:

  * both bpf_rbtree_remove and bpf_rbtree_first return the type containing
    the bpf_rb_node field, with the offset set to that field's offset,
    instead of a struct bpf_rb_node *
    * mark_reg_graph_node helper added in previous patch generalizes
      this logic, use it

  * bpf_rbtree_remove's node input is a node that's been inserted
    in the tree - a non-owning reference.

  * bpf_rbtree_remove must invalidate non-owning references in order to
    avoid aliasing issues. Use the previously-added
    invalidate_non_owning_refs helper to mark this function as a
    non-owning ref invalidation point.

  * Unlike other functions, which convert one of their input arg regs to
    non-owning reference, bpf_rbtree_first takes no arguments and just
    returns a non-owning reference (possibly null)
    * For now verifier logic for this is special-cased instead of
      adding new kfunc flag.

This patch, along with the previous one, complete special verifier
handling for all rbtree API functions added in this series.

With functional verifier handling of rbtree_remove, under current
non-owning reference scheme, a node type with both bpf_{list,rb}_node
fields could cause the verifier to accept programs which remove such
nodes from collections they haven't been added to.

In order to prevent this, this patch adds a check to btf_parse_fields
which rejects structs with both bpf_{list,rb}_node fields. This is a
temporary measure that can be removed after "collection identity"
followup. See comment added in btf_parse_fields. A linked_list BTF test
exercising the new check is added in this patch as well.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20230214004017.2534011-6-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-13 19:40:53 -08:00
Dave Marchevsky
5d92ddc3de bpf: Add callback validation to kfunc verifier logic
Some BPF helpers take a callback function which the helper calls. For
each helper that takes such a callback, there's a special call to
__check_func_call with a callback-state-setting callback that sets up
verifier bpf_func_state for the callback's frame.

kfuncs don't have any of this infrastructure yet, so let's add it in
this patch, following existing helper pattern as much as possible. To
validate functionality of this added plumbing, this patch adds
callback handling for the bpf_rbtree_add kfunc and hopes to lay
groundwork for future graph datastructure callbacks.

In the "general plumbing" category we have:

  * check_kfunc_call doing callback verification right before clearing
    CALLER_SAVED_REGS, exactly like check_helper_call
  * recognition of func_ptr BTF types in kfunc args as
    KF_ARG_PTR_TO_CALLBACK + propagation of subprogno for this arg type

In the "rbtree_add / graph datastructure-specific plumbing" category:

  * Since bpf_rbtree_add must be called while the spin_lock associated
    with the tree is held, don't complain when callback's func_state
    doesn't unlock it by frame exit
  * Mark rbtree_add callback's args with ref_set_non_owning
    to prevent rbtree api functions from being called in the callback.
    Semantically this makes sense, as less() takes no ownership of its
    args when determining which comes first.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20230214004017.2534011-5-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-13 19:40:53 -08:00