linux

iv/linux

Author	SHA1	Message	Date
Martin KaFai Lau	4e1ea33292	bpftool: Support dumping a map with btf_vmlinux_value_type_id This patch makes bpftool support dumping a map's value properly when the map's value type is a type of the running kernel's btf. (i.e. map_info.btf_vmlinux_value_type_id is set instead of map_info.btf_value_type_id). The first usecase is for the BPF_MAP_TYPE_STRUCT_OPS. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200115230044.1103008-1-kafai@fb.com	2020-01-15 15:23:27 -08:00
Martin KaFai Lau	84c72ceee9	bpftool: Add struct_ops map name This patch adds BPF_MAP_TYPE_STRUCT_OPS to "struct_ops" name mapping so that "bpftool map show" can print the "struct_ops" map type properly. [root@arch-fb-vm1 bpf]# ~/devshare/fb-kernel/linux/tools/bpf/bpftool/bpftool map show id 8 8: struct_ops name dctcp flags 0x0 key 4B value 256B max_entries 1 memlock 4096B btf_id 7 Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200115230037.1102674-1-kafai@fb.com	2020-01-15 15:23:27 -08:00
Martin KaFai Lau	fb2426ad00	libbpf: Expose bpf_find_kernel_btf as a LIBBPF_API This patch exposes bpf_find_kernel_btf() as a LIBBPF_API. It will be used in 'bpftool map dump' in a following patch to dump a map with btf_vmlinux_value_type_id set. bpf_find_kernel_btf() is renamed to libbpf_find_kernel_btf() and moved to btf.c. As <linux/kernel.h> is included, some of the max/min type casting needs to be fixed. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200115230031.1102305-1-kafai@fb.com	2020-01-15 15:23:27 -08:00
Martin KaFai Lau	188a486619	bpftool: Fix missing BTF output for json during map dump The btf availability check is only done for plain text output. It causes the whole BTF output went missing when json_output is used. This patch simplifies the logic a little by avoiding passing "int btf" to map_dump(). For plain text output, the btf_wtr is only created when the map has BTF (i.e. info->btf_id != 0). The nullness of "json_writer_t *wtr" in map_dump() alone can decide if dumping BTF output is needed. As long as wtr is not NULL, map_dump() will print out the BTF-described data whenever a map has BTF available (i.e. info->btf_id != 0) regardless of json or plain-text output. In do_dump(), the "int btf" is also renamed to "int do_plain_btf". Fixes: `99f9863a0c` ("bpftool: Match maps by name") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Cc: Paul Chaignon <paul.chaignon@orange.com> Link: https://lore.kernel.org/bpf/20200115230025.1101828-1-kafai@fb.com	2020-01-15 15:23:27 -08:00
Martin KaFai Lau	d7de72674a	bpftool: Fix a leak of btf object When testing a map has btf or not, maps_have_btf() tests it by actually getting a btf_fd from sys_bpf(BPF_BTF_GET_FD_BY_ID). However, it forgot to btf__free() it. In maps_have_btf() stage, there is no need to test it by really calling sys_bpf(BPF_BTF_GET_FD_BY_ID). Testing non zero info.btf_id is good enough. Also, the err_close case is unnecessary, and also causes double close() because the calling func do_dump() will close() all fds again. Fixes: `99f9863a0c` ("bpftool: Match maps by name") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Cc: Paul Chaignon <paul.chaignon@orange.com> Link: https://lore.kernel.org/bpf/20200115230019.1101352-1-kafai@fb.com	2020-01-15 15:23:27 -08:00
Alexei Starovoitov	990bca1fc8	Merge branch 'bpf-batch-ops' Brian Vazquez says: ==================== This patch series introduce batch ops that can be added to bpf maps to lookup/lookup_and_delete/update/delete more than 1 element at the time, this is specially useful when syscall overhead is a problem and in case of hmap it will provide a reliable way of traversing them. The implementation inclues a generic approach that could potentially be used by any bpf map and adds it to arraymap, it also includes the specific implementation of hashmaps which are traversed using buckets instead of keys. The bpf syscall subcommands introduced are: BPF_MAP_LOOKUP_BATCH BPF_MAP_LOOKUP_AND_DELETE_BATCH BPF_MAP_UPDATE_BATCH BPF_MAP_DELETE_BATCH The UAPI attribute is: struct { /* struct used by BPF_MAP__BATCH commands / __aligned_u64 in_batch; /* start batch, * NULL to start from beginning / __aligned_u64 out_batch; / output: next start batch / __aligned_u64 keys; __aligned_u64 values; __u32 count; / input/output: * input: # of key/value * elements * output: # of filled elements */ __u32 map_fd; __u64 elem_flags; __u64 flags; } batch; in_batch and out_batch are only used for lookup and lookup_and_delete since those are the only two operations that attempt to traverse the map. update/delete batch ops should provide the keys/values that user wants to modify. Here are the previous discussions on the batch processing: - https://lore.kernel.org/bpf/20190724165803.87470-1-brianvv@google.com/ - https://lore.kernel.org/bpf/20190829064502.2750303-1-yhs@fb.com/ - https://lore.kernel.org/bpf/20190906225434.3635421-1-yhs@fb.com/ Changelog sinve v4: - Remove unnecessary checks from libbpf API (Andrii Nakryiko) - Move DECLARE_LIBBPF_OPTS with all var declarations (Andrii Nakryiko) - Change bucket internal buffer size to 5 entries (Yonghong Song) - Fix some minor bugs in hashtab batch ops implementation (Yonghong Song) Changelog sinve v3: - Do not use copy_to_user inside atomic region (Yonghong Song) - Use _opts approach on libbpf APIs (Andrii Nakryiko) - Drop generic_map_lookup_and_delete_batch support - Free malloc-ed memory in tests (Yonghong Song) - Reverse christmas tree (Yonghong Song) - Add acked labels Changelog sinve v2: - Add generic batch support for lpm_trie and test it (Yonghong Song) - Use define MAP_LOOKUP_RETRIES for retries (John Fastabend) - Return errors directly and remove labels (Yonghong Song) - Insert new API functions into libbpf alphabetically (Yonghong Song) - Change hlist_nulls_for_each_entry_rcu to hlist_nulls_for_each_entry_safe in htab batch ops (Yonghong Song) Changelog since v1: - Fix SOB ordering and remove Co-authored-by tag (Alexei Starovoitov) Changelog since RFC: - Change batch to in_batch and out_batch to support more flexible opaque values to iterate the bpf maps. - Remove update/delete specific batch ops for htab and use the generic implementations instead. ==================== Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2020-01-15 14:07:47 -08:00
Brian Vazquez	f0fac2cec2	selftests/bpf: Add batch ops testing to array bpf map Tested bpf_map_lookup_batch() and bpf_map_update_batch() functionality. $ ./test_maps ... test_array_map_batch_ops:PASS ... Signed-off-by: Brian Vazquez <brianvv@google.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200115184308.162644-10-brianvv@google.com	2020-01-15 14:00:35 -08:00
Yonghong Song	30ff3c5913	selftests/bpf: Add batch ops testing for htab and htab_percpu map Tested bpf_map_lookup_batch(), bpf_map_lookup_and_delete_batch(), bpf_map_update_batch(), and bpf_map_delete_batch() functionality. $ ./test_maps ... test_htab_map_batch_ops:PASS test_htab_percpu_map_batch_ops:PASS ... Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Brian Vazquez <brianvv@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200115184308.162644-9-brianvv@google.com	2020-01-15 14:00:35 -08:00
Yonghong Song	2ab3d86ea1	libbpf: Add libbpf support to batch ops Added four libbpf API functions to support map batch operations: . int bpf_map_delete_batch( ... ) . int bpf_map_lookup_batch( ... ) . int bpf_map_lookup_and_delete_batch( ... ) . int bpf_map_update_batch( ... ) Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200115184308.162644-8-brianvv@google.com	2020-01-15 14:00:35 -08:00
Yonghong Song	a1e3a3b8ba	tools/bpf: Sync uapi header bpf.h sync uapi header include/uapi/linux/bpf.h to tools/include/uapi/linux/bpf.h Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200115184308.162644-7-brianvv@google.com	2020-01-15 14:00:35 -08:00
Yonghong Song	057996380a	bpf: Add batch ops to all htab bpf map htab can't use generic batch support due some problematic behaviours inherent to the data structre, i.e. while iterating the bpf map a concurrent program might delete the next entry that batch was about to use, in that case there's no easy solution to retrieve the next entry, the issue has been discussed multiple times (see [1] and [2]). The only way hmap can be traversed without the problem previously exposed is by making sure that the map is traversing entire buckets. This commit implements those strict requirements for hmap, the implementation follows the same interaction that generic support with some exceptions: - If keys/values buffer are not big enough to traverse a bucket, ENOSPC will be returned. - out_batch contains the value of the next bucket in the iteration, not the next key, but this is transparent for the user since the user should never use out_batch for other than bpf batch syscalls. This commits implements BPF_MAP_LOOKUP_BATCH and adds support for new command BPF_MAP_LOOKUP_AND_DELETE_BATCH. Note that for update/delete batch ops it is possible to use the generic implementations. [1] https://lore.kernel.org/bpf/20190724165803.87470-1-brianvv@google.com/ [2] https://lore.kernel.org/bpf/20190906225434.3635421-1-yhs@fb.com/ Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Brian Vazquez <brianvv@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200115184308.162644-6-brianvv@google.com	2020-01-15 14:00:35 -08:00
Brian Vazquez	c60f2d2861	bpf: Add lookup and update batch ops to arraymap This adds the generic batch ops functionality to bpf arraymap, note that since deletion is not a valid operation for arraymap, only batch and lookup are added. Signed-off-by: Brian Vazquez <brianvv@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200115184308.162644-5-brianvv@google.com	2020-01-15 14:00:35 -08:00
Brian Vazquez	aa2e93b8e5	bpf: Add generic support for update and delete batch ops This commit adds generic support for update and delete batch ops that can be used for almost all the bpf maps. These commands share the same UAPI attr that lookup and lookup_and_delete batch ops use and the syscall commands are: BPF_MAP_UPDATE_BATCH BPF_MAP_DELETE_BATCH The main difference between update/delete and lookup batch ops is that for update/delete keys/values must be specified for userspace and because of that, neither in_batch nor out_batch are used. Suggested-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Brian Vazquez <brianvv@google.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200115184308.162644-4-brianvv@google.com	2020-01-15 14:00:35 -08:00
Brian Vazquez	cb4d03ab49	bpf: Add generic support for lookup batch op This commit introduces generic support for the bpf_map_lookup_batch. This implementation can be used by almost all the bpf maps since its core implementation is relying on the existing map_get_next_key and map_lookup_elem. The bpf syscall subcommand introduced is: BPF_MAP_LOOKUP_BATCH The UAPI attribute is: struct { /* struct used by BPF_MAP__BATCH commands / __aligned_u64 in_batch; /* start batch, * NULL to start from beginning / __aligned_u64 out_batch; / output: next start batch / __aligned_u64 keys; __aligned_u64 values; __u32 count; / input/output: * input: # of key/value * elements * output: # of filled elements / __u32 map_fd; __u64 elem_flags; __u64 flags; } batch; in_batch/out_batch are opaque values use to communicate between user/kernel space, in_batch/out_batch must be of key_size length. To start iterating from the beginning in_batch must be null, count is the # of key/value elements to retrieve. Note that the 'keys' buffer must be a buffer of key_size count size and the 'values' buffer must be value_size * count, where value_size must be aligned to 8 bytes by userspace if it's dealing with percpu maps. 'count' will contain the number of keys/values successfully retrieved. Note that 'count' is an input/output variable and it can contain a lower value after a call. If there's no more entries to retrieve, ENOENT will be returned. If error is ENOENT, count might be > 0 in case it copied some values but there were no more entries to retrieve. Note that if the return code is an error and not -EFAULT, count indicates the number of elements successfully processed. Suggested-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Brian Vazquez <brianvv@google.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200115184308.162644-3-brianvv@google.com	2020-01-15 14:00:35 -08:00
Brian Vazquez	15c14a3dca	bpf: Add bpf_map_{value_size, update_value, map_copy_value} functions This commit moves reusable code from map_lookup_elem and map_update_elem to avoid code duplication in kernel/bpf/syscall.c. Signed-off-by: Brian Vazquez <brianvv@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200115184308.162644-2-brianvv@google.com	2020-01-15 14:00:34 -08:00
Eelco Chaudron	83e4b88be1	selftests/bpf: Add a test for attaching a bpf fentry/fexit trace to an XDP program Add a test that will attach a FENTRY and FEXIT program to the XDP test program. It will also verify data from the XDP context on FENTRY and verifies the return code on exit. Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/157909410480.47481.11202505690938004673.stgit@xdp-tutorial	2020-01-15 13:22:25 -08:00
Andrii Nakryiko	9173cac3b6	libbpf: Support .text sub-calls relocations The LLVM patch https://reviews.llvm.org/D72197 makes LLVM emit function call relocations within the same section. This includes a default .text section, which contains any BPF sub-programs. This wasn't the case before and so libbpf was able to get a way with slightly simpler handling of subprogram call relocations. This patch adds support for .text section relocations. It needs to ensure correct order of relocations, so does two passes: - first, relocate .text instructions, if there are any relocations in it; - then process all the other programs and copy over patched .text instructions for all sub-program calls. v1->v2: - break early once .text program is processed. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Cc: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200115190856.2391325-1-andriin@fb.com	2020-01-15 11:51:48 -08:00
Alexei Starovoitov	5640a771d7	Merge branch 'bpf_send_signal_thread' Yonghong Song says: ==================== Commit `8b401f9ed2` ("bpf: implement bpf_send_signal() helper") added helper bpf_send_signal() which permits bpf program to send a signal to the current process. The signal may send to any thread of the process. This patch implemented a new helper bpf_send_signal_thread() to send a signal to the thread corresponding to the kernel current task. This helper can simplify user space code if the thread context of bpf sending signal is needed in user space. Please see Patch #1 for details of use case and kernel implementation. Patch #2 added some bpf self tests for the new helper. Changelogs: v2 -> v3: - More simplification for skeleton codes by removing not-needed mmap code and redundantly created tracepoint link. v1 -> v2: - More description for the difference between bpf_send_signal() and bpf_send_signal_thread() in the uapi header bpf.h. - Use skeleton and mmap for send_signal test. ==================== Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2020-01-15 11:46:32 -08:00
Yonghong Song	ab8b7f0cb3	tools/bpf: Add self tests for bpf_send_signal_thread() The test_progs send_signal() is amended to test bpf_send_signal_thread() as well. $ ./test_progs -n 40 #40/1 send_signal_tracepoint:OK #40/2 send_signal_perf:OK #40/3 send_signal_nmi:OK #40/4 send_signal_tracepoint_thread:OK #40/5 send_signal_perf_thread:OK #40/6 send_signal_nmi_thread:OK #40 send_signal:OK Summary: 1/6 PASSED, 0 SKIPPED, 0 FAILED Also took this opportunity to rewrite the send_signal test using skeleton framework and array mmap to make code simpler and more readable. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200115035003.602425-1-yhs@fb.com	2020-01-15 11:44:51 -08:00
Yonghong Song	8482941f09	bpf: Add bpf_send_signal_thread() helper Commit `8b401f9ed2` ("bpf: implement bpf_send_signal() helper") added helper bpf_send_signal() which permits bpf program to send a signal to the current process. The signal may be delivered to any threads in the process. We found a use case where sending the signal to the current thread is more preferable. - A bpf program will collect the stack trace and then send signal to the user application. - The user application will add some thread specific information to the just collected stack trace for later analysis. If bpf_send_signal() is used, user application will need to check whether the thread receiving the signal matches the thread collecting the stack by checking thread id. If not, it will need to send signal to another thread through pthread_kill(). This patch proposed a new helper bpf_send_signal_thread(), which sends the signal to the thread corresponding to the current kernel task. This way, user space is guaranteed that bpf_program execution context and user space signal handling context are the same thread. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200115035002.602336-1-yhs@fb.com	2020-01-15 11:44:51 -08:00
Magnus Karlsson	d3a56931f9	xsk: Support allocations of large umems When registering a umem area that is sufficiently large (>1G on an x86), kmalloc cannot be used to allocate one of the internal data structures, as the size requested gets too large. Use kvmalloc instead that falls back on vmalloc if the allocation is too large for kmalloc. Also add accounting for this structure as it is triggered by a user space action (the XDP_UMEM_REG setsockopt) and it is by far the largest structure of kernel allocated memory in xsk. Reported-by: Ryan Goodfellow <rgoodfel@isi.edu> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Link: https://lore.kernel.org/bpf/1578995365-7050-1-git-send-email-magnus.karlsson@intel.com	2020-01-15 11:41:52 -08:00
Li RongQing	0a29275b63	bpf: Return -EBADRQC for invalid map type in __bpf_tx_xdp_map A negative value should be returned if map->map_type is invalid although that is impossible now, but if we run into such situation in future, then xdpbuff could be leaked. Daniel Borkmann suggested: -EBADRQC should be returned to stay consistent with generic XDP for the tracepoint output and not to be confused with -EOPNOTSUPP from other locations like dev_map_enqueue() when ndo_xdp_xmit is missing and such. Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/1578618277-18085-1-git-send-email-lirongqing@baidu.com	2020-01-14 20:32:24 +01:00
Martin KaFai Lau	3b4130418f	bpf: Fix seq_show for BPF_MAP_TYPE_STRUCT_OPS Instead of using bpf_struct_ops_map_lookup_elem() which is not implemented, bpf_struct_ops_map_seq_show_elem() should also use bpf_struct_ops_map_sys_lookup_elem() which does an inplace update to the value. The change allocates a value to pass to bpf_struct_ops_map_sys_lookup_elem(). [root@arch-fb-vm1 bpf]# cat /sys/fs/bpf/dctcp {{{1}},BPF_STRUCT_OPS_STATE_INUSE,{{00000000df93eebc,00000000df93eebc},0,2, ... Fixes: `85d33df357` ("bpf: Introduce BPF_MAP_TYPE_STRUCT_OPS") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200114072647.3188298-1-kafai@fb.com	2020-01-14 09:54:31 -08:00
Alexei Starovoitov	6dd42aa17f	Merge branch 'runqslower' Andrii Nakryiko says: ==================== Based on recent BPF CO-RE, tp_btf, and BPF skeleton changes, re-implement BCC-based runqslower tool as a portable pre-compiled BPF CO-RE-based tool. Make sure it's built as part of selftests to ensure it doesn't bit rot. As part of this patch set, augment `format c` output of `bpftool btf dump` sub-command with applying `preserve_access_index` attribute to all structs and unions. This makes all such structs and unions automatically relocatable under BPF CO-RE, which improves user experience of writing TRACING programs with direct kernel memory read access. Also, further clean up selftest/bpf Makefile output and make it conforming to libbpf and bpftool succinct output format. v1->v2: - build in-tree bpftool for runqslower (Yonghong); - drop `format core` and augment `format c` instead (Alexei); - move runqslower under tools/bpf (Daniel). ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2020-01-14 09:23:20 -08:00
Andrii Nakryiko	3a0d3092a4	selftests/bpf: Build runqslower from selftests Ensure runqslower tool is built as part of selftests to prevent it from bit rotting. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200113073143.1779940-7-andriin@fb.com	2020-01-14 09:23:12 -08:00
Andrii Nakryiko	9c01546d26	tools/bpf: Add runqslower tool to tools/bpf Convert one of BCC tools (runqslower [0]) to BPF CO-RE + libbpf. It matches its BCC-based counterpart 1-to-1, supporting all the same parameters and functionality. runqslower tool utilizes BPF skeleton, auto-generated from BPF object file, as well as memory-mapped interface to global (read-only, in this case) data. Its Makefile also ensures auto-generation of "relocatable" vmlinux.h, which is necessary for BTF-typed raw tracepoints with direct memory access. [0] `11bf5d02c8/tools/runqslower.py` Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200113073143.1779940-6-andriin@fb.com	2020-01-13 17:48:13 -08:00
Andrii Nakryiko	1cf5b23988	bpftool: Apply preserve_access_index attribute to all types in BTF dump This patch makes structs and unions, emitted through BTF dump, automatically CO-RE-relocatable (unless disabled with `#define BPF_NO_PRESERVE_ACCESS_INDEX`, specified before including generated header file). This effectivaly turns usual bpf_probe_read() call into equivalent of bpf_core_read(), by automatically applying builtin_preserve_access_index to any field accesses of types in generated C types header. This is especially useful for tp_btf/fentry/fexit BPF program types. They allow direct memory access, so BPF C code just uses straightfoward a->b->c access pattern to read data from kernel. But without kernel structs marked as CO-RE relocatable through preserve_access_index attribute, one has to enclose all the data reads into a special __builtin_preserve_access_index code block, like so: __builtin_preserve_access_index(({ x = p->pid; /* where p is struct task_struct , for example / })); This is very inconvenient and obscures the logic quite a bit. By marking all auto-generated types with preserve_access_index attribute the above code is reduced to just a clean and natural `x = p->pid;`. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200113073143.1779940-5-andriin@fb.com	2020-01-13 17:48:13 -08:00
Andrii Nakryiko	2cc51d34d9	selftests/bpf: Conform selftests/bpf Makefile output to libbpf and bpftool Bring selftest/bpf's Makefile output to the same format used by libbpf and bpftool: 2 spaces of padding on the left + 8-character left-aligned build step identifier. Also, hide feature detection output by default. Can be enabled back by setting V=1. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200113073143.1779940-4-andriin@fb.com	2020-01-13 17:48:13 -08:00
Andrii Nakryiko	292e1d73b1	libbpf: Clean up bpf_helper_defs.h generation output bpf_helpers_doc.py script, used to generate bpf_helper_defs.h, unconditionally emits one informational message to stderr. Remove it and preserve stderr to contain only relevant errors. Also make sure script invocations command is muted by default in libbpf's Makefile. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200113073143.1779940-3-andriin@fb.com	2020-01-13 17:48:13 -08:00
Andrii Nakryiko	533420a415	tools: Sync uapi/linux/if_link.h Sync uapi/linux/if_link.h into tools to avoid out of sync warnings during libbpf build. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200113073143.1779940-2-andriin@fb.com	2020-01-13 17:48:12 -08:00
Andrii Nakryiko	ac065870d9	selftests/bpf: Add BPF_PROG, BPF_KPROBE, and BPF_KRETPROBE macros Streamline BPF_TRACE_x macro by moving out return type and section attribute definition out of macro itself. That makes those function look in source code similar to other BPF programs. Additionally, simplify its usage by determining number of arguments automatically (so just single BPF_TRACE vs a family of BPF_TRACE_1, BPF_TRACE_2, etc). Also, allow more natural function argument syntax without commas inbetween argument type and name. Given this helper is useful not only for tracing tp_btf/fenty/fexit programs, but could be used for LSM programs and others following the same pattern, rename BPF_TRACE macro into more generic BPF_PROG. Existing BPF_TRACE_x usages in selftests are converted to new BPF_PROG macro. Following the same pattern, define BPF_KPROBE and BPF_KRETPROBE macros for nicer usage of kprobe/kretprobe arguments, respectively. BPF_KRETPROBE, adopts same convention used by fexit programs, that last defined argument is probed function's return result. v4->v5: - fix test_overhead test (__set_task_comm is void) (Alexei); v3->v4: - rebased and fixed one more BPF_TRACE_x occurence (Alexei); v2->v3: - rename to shorter and as generic BPF_PROG (Alexei); v1->v2: - verified GCC handles pragmas as expected; - added descriptions to macros; - converted new STRUCT_OPS selftest to BPF_HANDLER (worked as expected); - added original context as 'ctx' parameter, for cases where it has to be passed into BPF helpers. This might cause an accidental naming collision, unfortunately, but at least it's easy to work around. Fortunately, this situation produces quite legible compilation error: progs/bpf_dctcp.c:46:6: error: redefinition of 'ctx' with a different type: 'int' vs 'unsigned long long ' int ctx = 123; ^ progs/bpf_dctcp.c:42:6: note: previous definition is here void BPF_HANDLER(dctcp_init, struct sock sk) ^ ./bpf_trace_helpers.h:58:32: note: expanded from macro 'BPF_HANDLER' ____##name(unsigned long long *ctx, ##args) Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200110211634.1614739-1-andriin@fb.com	2020-01-10 14:02:07 -08:00
Andrii Nakryiko	1d1a3bcffe	libbpf: Poison kernel-only integer types It's been a recurring issue with types like u32 slipping into libbpf source code accidentally. This is not detected during builds inside kernel source tree, but becomes a compilation error in libbpf's Github repo. Libbpf is supposed to use only __{s,u}{8,16,32,64} typedefs, so poison {s,u}{8,16,32,64} explicitly in every .c file. Doing that in a bit more centralized way, e.g., inside libbpf_internal.h breaks selftests, which are both using kernel u32 and libbpf_internal.h. This patch also fixes a new u32 occurence in libbpf.c, added recently. Fixes: `590a008882` ("bpf: libbpf: Add STRUCT_OPS support") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200110181916.271446-1-andriin@fb.com	2020-01-10 10:38:00 -08:00
Daniel Borkmann	7a2d070f91	Merge branch 'bpf-global-funcs' Alexei Starovoitov says: ==================== Introduce static vs global functions and function by function verification. This is another step toward dynamic re-linking (or replacement) of global functions. See patch 2 for details. v2->v3: - cleaned up a check spotted by Song. - rebased and dropped patch 2 that was trying to improve BTF based on ELF. - added one more unit test for scalar return value from global func. v1->v2: - addressed review comments from Song, Andrii, Yonghong - fixed memory leak in error path - added modified ctx check - added more tests in patch 7 ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2020-01-10 17:20:19 +01:00
Alexei Starovoitov	360301a6c2	selftests/bpf: Add unit tests for global functions test_global_func[12] - check 512 stack limit. test_global_func[34] - check 8 frame call chain limit. test_global_func5 - check that non-ctx pointer cannot be passed into a function that expects context. test_global_func6 - check that ctx pointer is unmodified. test_global_func7 - check that global function returns scalar. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20200110064124.1760511-7-ast@kernel.org	2020-01-10 17:20:07 +01:00
Alexei Starovoitov	e528d1c012	selftests/bpf: Modify a test to check global functions Make two static functions in test_xdp_noinline.c global: before: processed 2790 insns after: processed 2598 insns Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20200110064124.1760511-6-ast@kernel.org	2020-01-10 17:20:07 +01:00
Alexei Starovoitov	6db2d81a46	selftests/bpf: Add a test for a large global function test results: pyperf50 with always_inlined the same function five times: processed 46378 insns pyperf50 with global function: processed 6102 insns Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20200110064124.1760511-5-ast@kernel.org	2020-01-10 17:20:07 +01:00
Alexei Starovoitov	7608e4db6d	selftests/bpf: Add fexit-to-skb test for global funcs Add simple fexit prog type to skb prog type test when subprogram is a global function. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20200110064124.1760511-4-ast@kernel.org	2020-01-10 17:20:07 +01:00
Alexei Starovoitov	51c39bb1d5	bpf: Introduce function-by-function verification New llvm and old llvm with libbpf help produce BTF that distinguish global and static functions. Unlike arguments of static function the arguments of global functions cannot be removed or optimized away by llvm. The compiler has to use exactly the arguments specified in a function prototype. The argument type information allows the verifier validate each global function independently. For now only supported argument types are pointer to context and scalars. In the future pointers to structures, sizes, pointer to packet data can be supported as well. Consider the following example: static int f1(int ...) { ... } int f3(int b); int f2(int a) { f1(a) + f3(a); } int f3(int b) { ... } int main(...) { f1(...) + f2(...) + f3(...); } The verifier will start its safety checks from the first global function f2(). It will recursively descend into f1() because it's static. Then it will check that arguments match for the f3() invocation inside f2(). It will not descend into f3(). It will finish f2() that has to be successfully verified for all possible values of 'a'. Then it will proceed with f3(). That function also has to be safe for all possible values of 'b'. Then it will start subprog 0 (which is main() function). It will recursively descend into f1() and will skip full check of f2() and f3(), since they are global. The order of processing global functions doesn't affect safety, since all global functions must be proven safe based on their arguments only. Such function by function verification can drastically improve speed of the verification and reduce complexity. Note that the stack limit of 512 still applies to the call chain regardless whether functions were static or global. The nested level of 8 also still applies. The same recursion prevention checks are in place as well. The type information and static/global kind is preserved after the verification hence in the above example global function f2() and f3() can be replaced later by equivalent functions with the same types that are loaded and verified later without affecting safety of this main() program. Such replacement (re-linking) of global functions is a subject of future patches. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20200110064124.1760511-3-ast@kernel.org	2020-01-10 17:20:07 +01:00
Alexei Starovoitov	2d3eb67f64	libbpf: Sanitize global functions In case the kernel doesn't support BTF_FUNC_GLOBAL sanitize BTF produced by the compiler for global functions. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20200110064124.1760511-2-ast@kernel.org	2020-01-10 17:20:07 +01:00
Alexei Starovoitov	f41aa387a7	Merge branch 'selftest-makefile-cleanup' Andrii Nakryiko says: ==================== Fix issues with bpf_helper_defs.h usage in selftests/bpf. As part of that, fix the way clean up is performed for libbpf and selftests/bpf. Some for Makefile output clean ups as well. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2020-01-09 21:56:16 -08:00
Andrii Nakryiko	965b9fee28	selftests/bpf: Further clean up Makefile output Further clean up Makefile output: - hide "entering directory" messages; - silvence sub-Make command echoing; - succinct MKDIR messages. Also remove few test binaries that are not produced anymore from .gitignore. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200110051716.1591485-4-andriin@fb.com	2020-01-09 21:55:08 -08:00
Andrii Nakryiko	6910d7d386	selftests/bpf: Ensure bpf_helper_defs.h are taken from selftests dir Reorder includes search path to ensure $(OUTPUT) and $(CURDIR) go before libbpf's directory. Also fix bpf_helpers.h to include bpf_helper_defs.h in such a way as to leverage includes search path. This allows selftests to not use libbpf's local and potentially stale bpf_helper_defs.h. It's important because selftests/bpf's Makefile only re-generates bpf_helper_defs.h in seltests' output directory, not the one in libbpf's directory. Also force regeneration of bpf_helper_defs.h when libbpf.a is updated to reduce staleness. Fixes: `fa633a0f89` ("libbpf: Fix build on read-only filesystems") Reported-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200110051716.1591485-3-andriin@fb.com	2020-01-09 21:55:08 -08:00
Andrii Nakryiko	2031af28a4	libbpf,selftests/bpf: Fix clean targets Libbpf's clean target should clean out generated files in $(OUTPUT) directory and not make assumption that $(OUTPUT) directory is current working directory. Selftest's Makefile should delegate cleaning of libbpf-generated files to libbpf's Makefile. This ensures more robust clean up. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200110051716.1591485-2-andriin@fb.com	2020-01-09 21:55:08 -08:00
Andrii Nakryiko	492ab0205f	libbpf: Make bpf_map order and indices stable Currently, libbpf re-sorts bpf_map structs after all the maps are added and initialized, which might change their relative order and invalidate any bpf_map pointer or index taken before that. This is inconvenient and error-prone. For instance, it can cause .kconfig map index to point to a wrong map. Furthermore, libbpf itself doesn't rely on any specific ordering of bpf_maps, so it's just an unnecessary complication right now. This patch drops sorting of maps and makes their relative positions fixed. If efficient index is ever needed, it's better to have a separate array of pointers as a search index, instead of reordering bpf_map struct in-place. This will be less error-prone and will allow multiple independent orderings, if necessary (e.g., either by section index or by name). Fixes: `166750bc1d` ("libbpf: Support libbpf-provided extern variables") Reported-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200110034247.1220142-1-andriin@fb.com	2020-01-09 21:17:01 -08:00
Andrey Ignatov	f5bfcd953d	bpf: Document BPF_F_QUERY_EFFECTIVE flag Document BPF_F_QUERY_EFFECTIVE flag, mostly to clarify how it affects attach_flags what may not be obvious and what may lead to confision. Specifically attach_flags is returned only for target_fd but if programs are inherited from an ancestor cgroup then returned attach_flags for current cgroup may be confusing. For example, two effective programs of same attach_type can be returned but w/o BPF_F_ALLOW_MULTI in attach_flags. Simple repro: # bpftool c s /sys/fs/cgroup/path/to/task ID AttachType AttachFlags Name # bpftool c s /sys/fs/cgroup/path/to/task effective ID AttachType AttachFlags Name 95043 ingress tw_ipt_ingress 95048 ingress tw_ingress Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20200108014006.938363-1-rdna@fb.com	2020-01-09 09:40:06 -08:00
Alexei Starovoitov	417759f7d4	Merge branch 'tcp-bpf-cc' Martin Lau says: ==================== This series introduces BPF STRUCT_OPS. It is an infra to allow implementing some specific kernel's function pointers in BPF. The first use case included in this series is to implement TCP congestion control algorithm in BPF (i.e. implement struct tcp_congestion_ops in BPF). There has been attempt to move the TCP CC to the user space (e.g. CCP in TCP). The common arguments are faster turn around, get away from long-tail kernel versions in production...etc, which are legit points. BPF has been the continuous effort to join both kernel and userspace upsides together (e.g. XDP to gain the performance advantage without bypassing the kernel). The recent BPF advancements (in particular BTF-aware verifier, BPF trampoline, BPF CO-RE...) made implementing kernel struct ops (e.g. tcp cc) possible in BPF. The idea is to allow implementing tcp_congestion_ops in bpf. It allows a faster turnaround for testing algorithm in the production while leveraging the existing (and continue growing) BPF feature/framework instead of building one specifically for userspace TCP CC. Please see individual patch for details. The bpftool support will be posted in follow-up patches. v4: - Expose tcp_ca_find() to tcp.h in patch 7. It is used to check the same bpf-tcp-cc does not exist to guarantee the register() will succeed. - set_memory_ro() and then set_memory_x() only after all trampolines are written to the image in patch 6. (Daniel) spinlock is replaced by mutex because set_memory_* requires sleepable context. v3: - Fix kbuild error by considering CONFIG_BPF_SYSCALL (kbuild) - Support anonymous bitfield in patch 4 (Andrii, Yonghong) - Push boundary safety check to a specific arch's trampoline function (in patch 6) (Yonghong). Reuse the WANR_ON_ONCE check in arch_prepare_bpf_trampoline() in x86. - Check module field is 0 in udata in patch 6 (Yonghong) - Check zero holes in patch 6 (Andrii) - s/_btf_vmlinux/btf/ in patch 5 and 7 (Andrii) - s/check_xxx/is_xxx/ in patch 7 (Andrii) - Use "struct_ops/" convention in patch 11 (Andrii) - Use the skel instead of bpf_object in patch 11 (Andrii) - libbpf: Decide BPF_PROG_TYPE_STRUCT_OPS at open phase by using find_sec_def() - libbpf: Avoid a debug message at open phase (Andrii) - libbpf: Add bpf_program__(is\|set)_struct_ops() for consistency (Andrii) - libbpf: Add "struct_ops" to section_defs (Andrii) - libbpf: Some code shuffling in init_kern_struct_ops() (Andrii) - libbpf: A few safety checks (Andrii) v2: - Dropped cubic for now. They will be reposted once there are more clarity in "jiffies" on both bpf side (about the helper) and tcp_cubic side (some of jiffies usages are being replaced by tp->tcp_mstamp) - Remove unnecssary check on bitfield support from btf_struct_access() (Yonghong) - BTF_TYPE_EMIT macro (Yonghong, Andrii) - value_name's length check to avoid an unlikely type match during truncation case (Yonghong) - BUILD_BUG_ON to ensure no trampoline-image overrun in the future (Yonghong) - Simplify get_next_key() (Yonghong) - Added comment to explain how to check mandatory func ptr in net/ipv4/bpf_tcp_ca.c (Yonghong) - Rename "__bpf_" to "bpf_struct_ops_" for value prefix (Andrii) - Add comment to highlight the bpf_dctcp.c is not necessarily the same as tcp_dctcp.c. (Alexei, Eric) - libbpf: Renmae "struct_ops" to ".struct_ops" for elf sec (Andrii) - libbpf: Expose struct_ops as a bpf_map (Andrii) - libbpf: Support multiple struct_ops in SEC(".struct_ops") (Andrii) - libbpf: Add bpf_map__attach_struct_ops() (Andrii) ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2020-01-09 09:15:25 -08:00
Martin KaFai Lau	09903869f6	bpf: Add bpf_dctcp example This patch adds a bpf_dctcp example. It currently does not do no-ECN fallback but the same could be done through the cgrp2-bpf. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200109003517.3856825-1-kafai@fb.com	2020-01-09 08:46:18 -08:00
Martin KaFai Lau	590a008882	bpf: libbpf: Add STRUCT_OPS support This patch adds BPF STRUCT_OPS support to libbpf. The only sec_name convention is SEC(".struct_ops") to identify the struct_ops implemented in BPF, e.g. To implement a tcp_congestion_ops: SEC(".struct_ops") struct tcp_congestion_ops dctcp = { .init = (void )dctcp_init, / <-- a bpf_prog / / ... some more func prts ... */ .name = "bpf_dctcp", }; Each struct_ops is defined as a global variable under SEC(".struct_ops") as above. libbpf creates a map for each variable and the variable name is the map's name. Multiple struct_ops is supported under SEC(".struct_ops"). In the bpf_object__open phase, libbpf will look for the SEC(".struct_ops") section and find out what is the btf-type the struct_ops is implementing. Note that the btf-type here is referring to a type in the bpf_prog.o's btf. A "struct bpf_map" is added by bpf_object__add_map() as other maps do. It will then collect (through SHT_REL) where are the bpf progs that the func ptrs are referring to. No btf_vmlinux is needed in the open phase. In the bpf_object__load phase, the map-fields, which depend on the btf_vmlinux, are initialized (in bpf_map__init_kern_struct_ops()). It will also set the prog->type, prog->attach_btf_id, and prog->expected_attach_type. Thus, the prog's properties do not rely on its section name. [ Currently, the bpf_prog's btf-type ==> btf_vmlinux's btf-type matching process is as simple as: member-name match + btf-kind match + size match. If these matching conditions fail, libbpf will reject. The current targeting support is "struct tcp_congestion_ops" which most of its members are function pointers. The member ordering of the bpf_prog's btf-type can be different from the btf_vmlinux's btf-type. ] Then, all obj->maps are created as usual (in bpf_object__create_maps()). Once the maps are created and prog's properties are all set, the libbpf will proceed to load all the progs. bpf_map__attach_struct_ops() is added to register a struct_ops map to a kernel subsystem. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200109003514.3856730-1-kafai@fb.com	2020-01-09 08:46:18 -08:00
Martin KaFai Lau	17328d618c	bpf: Synch uapi bpf.h to tools/ This patch sync uapi bpf.h to tools/ Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200109003512.3856559-1-kafai@fb.com	2020-01-09 08:46:18 -08:00
Martin KaFai Lau	206057fe02	bpf: Add BPF_FUNC_tcp_send_ack helper Add a helper to send out a tcp-ack. It will be used in the later bpf_dctcp implementation that requires to send out an ack when the CE state changed. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200109004551.3900448-1-kafai@fb.com	2020-01-09 08:46:18 -08:00

1 2 3 4 5 ...

888775 Commits