linux


Author	SHA1	Message	Date
Ilya Leoshkevich	4e9b4a6883	s390/bpf: Use relative long branches Currently the maximum JITed code size is limited to 64k, because the JIT can emit only relative short branches, whose range is limited to 64k in both directions. Teach the JIT to use relative long branches. There are no compare+branch relative long instructions, so using relative long branches consumes more space due to having to emit an explicit comparison instruction. Therefore do this only when a relative short branch is not enough. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191118180340.68373-2-iii@linux.ibm.com	2019-11-18 19:51:16 -08:00
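The branch-selection rule above can be sketched as a range check. This is a hypothetical helper, not the s390 JIT's code: it assumes relative offsets count signed halfwords from the branch instruction, so a 16-bit field reaches about +/-64 KiB and a 32-bit field about +/-4 GiB. The instruction sizes are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: s390 relative branches encode the target as a signed
 * count of halfwords (2-byte units). A 16-bit "relative short" field
 * therefore covers -65536..+65534 bytes. */
static bool fits_rel16(long byte_off)
{
	if (byte_off % 2 != 0)
		return false;          /* offsets must be halfword-aligned */
	long halfwords = byte_off / 2;
	return halfwords >= -32768 && halfwords <= 32767;
}

/* Assumed instruction sizes, for illustration only:
 * compare-and-branch relative: 6 bytes,
 * register compare: 4 bytes,
 * branch relative long: 6 bytes. */
enum { CMP_AND_BRANCH_LEN = 6, CMP_LEN = 4, BRANCH_LONG_LEN = 6 };

/* Bytes needed for a conditional compare+branch to a target byte_off away. */
static size_t cond_branch_len(long byte_off)
{
	if (fits_rel16(byte_off))
		return CMP_AND_BRANCH_LEN;      /* one fused instruction */
	return CMP_LEN + BRANCH_LONG_LEN;   /* explicit compare + long branch */
}
```

Falling back to the longer two-instruction form only when the short form cannot reach keeps the common case compact, which is the trade-off the commit describes.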
Colin Ian King	a25ecd9d1e	bpf: Fix memory leak on object 'data' The error return path taken when the bpf_fentry_test* tests fail does not kfree 'data'. Fix this by adding the missing kfree. Addresses-Coverity: ("Resource leak") Fixes: faeb2dce084a ("bpf: Add kernel test functions for fentry testing") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191118114059.37287-1-colin.king@canonical.com	2019-11-18 19:32:59 -08:00
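The commit fixes the leak by adding the missing kfree. A common way to make this class of bug harder to reintroduce is a single cleanup label that every post-allocation exit passes through. The sketch below is a userspace illustration with hypothetical names (`run_checks`, `test_run`) and an error code chosen for illustration. It is not the kernel's code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the bpf_fentry_test* checks. */
static int run_checks(int should_fail)
{
	return should_fail ? -1 : 0;
}

/* Every exit after a successful allocation goes through 'out', so 'data'
 * is freed whether the checks pass or fail. */
static int test_run(size_t size, int should_fail)
{
	int ret;
	void *data = malloc(size);

	if (!data)
		return -ENOMEM;
	memset(data, 0, size);

	if (run_checks(should_fail) != 0) {
		ret = -EFAULT;   /* error code chosen for illustration */
		goto out;        /* a bare 'return' here would leak 'data' */
	}
	ret = 0;
out:
	free(data);
	return ret;
}
```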
Yonghong Song	2ea2612b98	selftests, bpf: Workaround an alu32 sub-register spilling issue Currently, with the latest llvm trunk, selftest test_progs fails on object file test_seg6_loop.o with the following verifier error:
    infinite loop detected at insn 76
The byte code sequence looks like below; note that alu32 has been turned off by default for better generated code in general:
      48:       w3 = 100
      49:       *(u32 *)(r10 - 68) = r3
      ...
      ;         if (tlv.type == SR6_TLV_PADDING) {
      76:       if w3 == 5 goto -18 <LBB0_19>
      ...
      85:       r1 = *(u32 *)(r10 - 68)
      ;         for (int i = 0; i < 100; i++) {
      86:       w1 += -1
      87:       if w1 == 0 goto +5 <LBB0_20>
      88:       *(u32 *)(r10 - 68) = r1
The main reason for the verification failure is the partial spill at r10 - 68 of the induction variable "i". The current verifier only handles spills of 8-byte values. The above 4-byte value spill to the stack is treated as STACK_MISC and its content is not saved. For the above example:
      w3 = 100
        R3_w=inv100 fp-64_w=inv1086626730498
      *(u32 *)(r10 - 68) = r3
        R3_w=inv100 fp-64_w=inv1086626730498
      ...
      r1 = *(u32 *)(r10 - 68)
        R1_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff))
        fp-64=inv1086626730498
To resolve this issue, either the verifier needs to be extended to track sub-registers in spilling, or llvm needs to be enhanced to prevent sub-register spilling in the register allocation phase. The former will increase verifier complexity and the latter will need some llvm "hacking". Let us work around this issue by declaring the induction variable as "long" type so spilling will happen at the non sub-register level. We can revisit this later if sub-register spilling causes similar or other verification issues. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20191117214036.1309510-1-yhs@fb.com	2019-11-18 21:37:00 +01:00
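The reasoning above (only full 8-byte spills preserve register state) can be shown with a toy model. This is a hypothetical simplification, not the verifier's real data structures. A slot remembers the spilled register only for an 8-byte store. A 4-byte store marks it as miscellaneous data, so the value reloaded for "i" becomes an unknown scalar, as the R1 line in the log above shows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model (hypothetical names) of one 8-byte stack slot. */
enum slot_kind { SLOT_INVALID, SLOT_SPILL, SLOT_MISC };

struct reg_state {
	bool known;        /* is the exact value known? */
	uint64_t value;
};

struct stack_slot {
	enum slot_kind kind;
	struct reg_state spilled;  /* meaningful only for SLOT_SPILL */
};

/* Store a register of 'size' bytes into the slot. */
static void toy_spill(struct stack_slot *s, struct reg_state r, int size)
{
	if (size == 8) {
		s->kind = SLOT_SPILL;  /* full spill: register state preserved */
		s->spilled = r;
	} else {
		s->kind = SLOT_MISC;   /* partial spill: contents forgotten */
	}
}

/* Load 'size' bytes from the slot back into a register. */
static struct reg_state toy_fill(const struct stack_slot *s, int size)
{
	struct reg_state unknown = { false, 0 };

	if (s->kind == SLOT_SPILL && size == 8)
		return s->spilled;
	return unknown;            /* anything else reloads as unknown */
}
```

Declaring the induction variable as `long`, as the commit does, turns the 4-byte spill into an 8-byte one, which corresponds to the `size == 8` branch here.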
Jiri Benc	3b054b7133	selftests, bpf: Fix test_tc_tunnel hanging When run_kselftests.sh is run, it hangs after test_tc_tunnel.sh. The reason is test_tc_tunnel.sh ensures the server ('nc -l') is run all the time, starting it again every time it is expected to terminate. The exception is the final client_connect: the server is not started anymore, which ensures no process is kept running after the test is finished. For a sit test, though, the script is terminated prematurely without the final client_connect and the 'nc' process keeps running. This in turn causes the run_one function in kselftest/runner.sh to hang forever, waiting for the runaway process to finish. Ensure a remaining server is terminated on cleanup. Fixes: f6ad6accaa99 ("selftests/bpf: expand test_tc_tunnel with SIT encap") Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/bpf/60919291657a9ee89c708d8aababc28ebe1420be.1573821780.git.jbenc@redhat.com	2019-11-18 21:31:49 +01:00
Jiri Benc	56bf877a50	selftests, bpf: xdping is not meant to be run standalone The actual test to run is test_xdping.sh, which is already in TEST_PROGS. The xdping program alone is not runnable with 'make run_tests'; it immediately fails due to missing arguments. Move xdping to TEST_GEN_PROGS_EXTENDED in order to be built but not run. Fixes: cd5385029f1d ("selftests/bpf: measure RTT from xdp using xdping") Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alan Maguire <alan.maguire@oracle.com> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/4365c81198f62521344c2215909634407184387e.1573821726.git.jbenc@redhat.com	2019-11-18 21:31:45 +01:00
Daniel Borkmann	b97e12e594	Merge branch 'bpf-array-mmap' Andrii Nakryiko says: ==================== This patch set adds the ability to memory-map BPF array maps (single- and multi-element). The primary use case is memory-mapping the BPF array maps that libbpf creates implicitly to back global data variables. This allows for much better usability and avoids syscalls for reading or updating data entirely. Due to memory-mapping requirements, a BPF array map that is supposed to be memory-mapped has to be created with the special BPF_F_MMAPABLE attribute, which triggers a slightly different memory allocation strategy internally. See patch 1 for details. Libbpf is extended to detect kernel support for this flag and, if supported, will specify it for all global data maps automatically. Patch #1 refactors bpf_map_inc() and converts bpf_map's refcnt to atomic64_t to make refcounting never fail. Patch #2 does similar refactoring for bpf_prog_add()/bpf_prog_inc(). v5->v6: - add back uref counting (Daniel); v4->v5: - change bpf_prog's refcnt to atomic64_t (Daniel); v3->v4: - add mmap's open() callback to fix refcounting (Johannes); - switch to remap_vmalloc_pages() instead of custom fault handler (Johannes); - converted bpf_map's refcnt/usercnt into atomic64_t; - provide default bpf_map_default_vmops handling open/close properly; v2->v3: - change allocation strategy to avoid extra pointer dereference (Jakub); v1->v2: - fix map lookup code generation for BPF_F_MMAPABLE case; - prevent BPF_F_MMAPABLE flag for all but plain array map type; - centralize ref-counting in generic bpf_map_mmap(); - don't use uref counting (Alexei); - use vfree() directly; - print flags with %x (Song); - extend tests to verify bpf_map_{lookup,update}_elem() logic as well. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-11-18 11:42:11 +01:00
Andrii Nakryiko	5051b38452	selftests/bpf: Add BPF_TYPE_MAP_ARRAY mmap() tests Add selftests validating mmap()-ing BPF array maps: both single-element and multi-element ones. Check that plain bpf_map_update_elem() and bpf_map_lookup_elem() work correctly with memory-mapped array. Also convert CO-RE relocation tests to use memory-mapped views of global data. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20191117172806.2195367-6-andriin@fb.com	2019-11-18 11:42:00 +01:00
Andrii Nakryiko	7fe74b4362	libbpf: Make global data internal arrays mmap()-able, if possible Add detection of BPF_F_MMAPABLE flag support for arrays and add it as an extra flag to internal global data maps, if supported by kernel. This allows users to memory-map global data and use it without BPF map operations, greatly simplifying user experience. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20191117172806.2195367-5-andriin@fb.com	2019-11-18 11:41:59 +01:00
Andrii Nakryiko	fc9702273e	bpf: Add mmap() support for BPF_MAP_TYPE_ARRAY Add the ability to memory-map the contents of a BPF array map. This is extremely useful for working with BPF global data from userspace programs. It makes it possible to avoid the typical bpf_map_{lookup,update}_elem operations, improving both performance and usability. There had to be special considerations for map freezing, to avoid having a writable memory view into a frozen map. To solve this issue, map freezing and mmap-ing now happen under a mutex: - if the map is already frozen, no writable mapping is allowed; - if the map has writable memory mappings active (accounted in map->writecnt), map freezing will keep failing with -EBUSY; - once the number of writable memory mappings drops to zero, map freezing can be performed again. Only non-per-CPU plain arrays are supported right now. Maps with spinlocks can't be memory mapped either. For a BPF_F_MMAPABLE array, memory allocation has to be done through vmalloc() to be mmap()'able. We also need to make sure that array data memory is page-sized and page-aligned, so we over-allocate memory in such a way that struct bpf_array is at the end of a single page of memory with array->value being aligned with the start of the second page. On deallocation we need to accommodate this memory arrangement to free vmalloc()'ed memory correctly. One important consideration concerns how the memory-mapping subsystem functions. The memory-mapping subsystem provides a few optional callbacks, among them open() and close(). close() is called for each memory region that is unmapped, so that users can decrease their reference counters and free up resources, if necessary. open() is almost symmetrical: it's called for each memory region that is being mapped, except the very first one. So bpf_map_mmap does the initial refcnt bump, while open() does any extra ones after that. Thus the number of close() calls is equal to the number of open() calls plus one. 
Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Link: https://lore.kernel.org/bpf/20191117172806.2195367-4-andriin@fb.com	2019-11-18 11:41:59 +01:00
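The freeze/mmap rules and the open()/close() counting described above can be sketched as a single-threaded model. All names are hypothetical. The kernel performs the freeze and mmap checks under a mutex, which this sketch omits, and the `-EPERM` returned for a writable mapping of a frozen map is an assumption. The usage in the test shows one initial mapping plus one split (one open()) being released by two close() calls.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Single-threaded model; names are hypothetical, not the kernel's. */
struct model_map {
	bool frozen;
	long writecnt;  /* active writable memory mappings */
	long refcnt;    /* references held on behalf of mappings */
};

/* Initial mmap(): the only step that may fail. */
static int model_mmap(struct model_map *m, bool writable)
{
	if (writable && m->frozen)
		return -EPERM;  /* error code is an assumption */
	if (writable)
		m->writecnt++;
	m->refcnt++;            /* initial reference bump */
	return 0;
}

/* open() callback: each additional region (e.g. after a split). Cannot fail. */
static void model_vm_open(struct model_map *m, bool writable)
{
	if (writable)
		m->writecnt++;
	m->refcnt++;
}

/* close() callback: once per region, including the first one. */
static void model_vm_close(struct model_map *m, bool writable)
{
	if (writable)
		m->writecnt--;
	m->refcnt--;
}

static int model_freeze(struct model_map *m)
{
	if (m->writecnt > 0)
		return -EBUSY;  /* writable mappings still active */
	m->frozen = true;
	return 0;
}
```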
Andrii Nakryiko	85192dbf4d	bpf: Convert bpf_prog refcnt to atomic64_t Similarly to bpf_map's refcnt/usercnt, convert bpf_prog's refcnt to atomic64 and remove the artificial 32k limit. This makes bpf_prog's refcounting non-failing, simplifying the logic of users of bpf_prog_add/bpf_prog_inc. Validated compilation by running an allyesconfig kernel build. Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191117172806.2195367-3-andriin@fb.com	2019-11-18 11:41:59 +01:00
Andrii Nakryiko	1e0bd5a091	bpf: Switch bpf_map ref counter to atomic64_t so bpf_map_inc() never fails 92117d8443bc ("bpf: fix refcnt overflow") turned refcounting of bpf_map into a potentially failing operation, when the refcount reaches the BPF_MAX_REFCNT limit (32k). Due to using a 32-bit counter, it's possible in practice to overflow the refcounter and make it wrap around to 0, causing an erroneous map free while there are still references to it, and thus use-after-free problems. But having failing refcounting operations is problematic in some cases. One example is the mmap() interface. After establishing the initial memory-mapping, the user is allowed to arbitrarily map/remap/unmap parts of the mapped memory, arbitrarily splitting it into multiple non-contiguous regions. All of this happens without any control from the users of the mmap subsystem. Rather, the mmap subsystem sends notifications to the original creator of the memory mapping through open/close callbacks, which are optionally specified during initial memory mapping creation. These callbacks are used to maintain an accurate refcount for bpf_map (see next patch in this series). The problem is that the open() callback is not supposed to fail, because the memory-mapped resource is already set up and properly referenced. This poses a problem for using memory-mapping with BPF maps. One solution to this is to maintain a separate refcount for just memory-mappings and do a single bpf_map_inc/bpf_map_put when it goes from/to zero, respectively. There are similar use cases in current work on tcp-bpf, necessitating an extra counter as well. This seems like a rather unfortunate and ugly solution that doesn't scale well to various new use cases. Another approach to solve this is to use the non-failing refcount_t type, which uses a 32-bit counter internally, but, once reaching the overflow state at UINT_MAX, stays there. This ultimately causes a memory leak, but prevents use-after-free. 
But given that refcounting is not the most performance-critical operation with BPF maps (it's not used from running BPF program code), we can also just switch to a 64-bit counter that can't overflow in practice, potentially disadvantaging 32-bit platforms a tiny bit. This simplifies semantics and frees the scenarios described above from worrying about a failing refcount increment operation. In terms of struct bpf_map size, we are still good and use the same amount of space:
BEFORE (3 cache lines, 8 bytes of padding at the end):
struct bpf_map {
        const struct bpf_map_ops * ops __attribute__((__aligned__(64))); /*     0     8 */
        struct bpf_map *           inner_map_meta;       /*     8     8 */
        void *                     security;             /*    16     8 */
        enum bpf_map_type          map_type;             /*    24     4 */
        u32                        key_size;             /*    28     4 */
        u32                        value_size;           /*    32     4 */
        u32                        max_entries;          /*    36     4 */
        u32                        map_flags;            /*    40     4 */
        int                        spin_lock_off;        /*    44     4 */
        u32                        id;                   /*    48     4 */
        int                        numa_node;            /*    52     4 */
        u32                        btf_key_type_id;      /*    56     4 */
        u32                        btf_value_type_id;    /*    60     4 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        struct btf *               btf;                  /*    64     8 */
        struct bpf_map_memory      memory;               /*    72    16 */
        bool                       unpriv_array;         /*    88     1 */
        bool                       frozen;               /*    89     1 */
        /* XXX 38 bytes hole, try to pack */
        /* --- cacheline 2 boundary (128 bytes) --- */
        atomic_t                   refcnt __attribute__((__aligned__(64))); /*   128     4 */
        atomic_t                   usercnt;              /*   132     4 */
        struct work_struct         work;                 /*   136    32 */
        char                       name[16];             /*   168    16 */
        /* size: 192, cachelines: 3, members: 21 */
        /* sum members: 146, holes: 1, sum holes: 38 */
        /* padding: 8 */
        /* forced alignments: 2, forced holes: 1, sum forced holes: 38 */
} __attribute__((__aligned__(64)));
AFTER (same 3 cache lines, no extra padding now):
struct bpf_map {
        const struct bpf_map_ops * ops __attribute__((__aligned__(64))); /*     0     8 */
        struct bpf_map *           inner_map_meta;       /*     8     8 */
        void *                     security;             /*    16     8 */
        enum bpf_map_type          map_type;             /*    24     4 */
        u32                        key_size;             /*    28     4 */
        u32                        value_size;           /*    32     4 */
        u32                        max_entries;          /*    36     4 */
        u32                        map_flags;            /*    40     4 */
        int                        spin_lock_off;        /*    44     4 */
        u32                        id;                   /*    48     4 */
        int                        numa_node;            /*    52     4 */
        u32                        btf_key_type_id;      /*    56     4 */
        u32                        btf_value_type_id;    /*    60     4 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        struct btf *               btf;                  /*    64     8 */
        struct bpf_map_memory      memory;               /*    72    16 */
        bool                       unpriv_array;         /*    88     1 */
        bool                       frozen;               /*    89     1 */
        /* XXX 38 bytes hole, try to pack */
        /* --- cacheline 2 boundary (128 bytes) --- */
        atomic64_t                 refcnt __attribute__((__aligned__(64))); /*   128     8 */
        atomic64_t                 usercnt;              /*   136     8 */
        struct work_struct         work;                 /*   144    32 */
        char                       name[16];             /*   176    16 */
        /* size: 192, cachelines: 3, members: 21 */
        /* sum members: 154, holes: 1, sum holes: 38 */
        /* forced alignments: 2, forced holes: 1, sum forced holes: 38 */
} __attribute__((__aligned__(64)));
This patch, while modifying all users of bpf_map_inc, also cleans up its interface to match bpf_map_put with separate operations for bpf_map_inc and bpf_map_inc_with_uref (to match bpf_map_put and bpf_map_put_with_uref, respectively). Also, given there are no users of bpf_map_inc_not_zero specifying uref=true, remove the uref flag and default to uref=false internally. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20191117172806.2195367-2-andriin@fb.com	2019-11-18 11:41:59 +01:00
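The three counter behaviours weighed in this commit (wrapping 32-bit, saturating 32-bit, and 64-bit) can be compared with plain integer models. These are hypothetical, non-atomic sketches for illustration. The saturation point follows the commit's description rather than the exact kernel refcount_t implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Plain 32-bit counter: wraps to 0 after UINT32_MAX, so a later put()
 * could free an object that is still referenced (use-after-free). */
static uint32_t inc_wrapping32(uint32_t c)
{
	return c + 1;  /* unsigned arithmetic wraps modulo 2^32 */
}

/* Saturating 32-bit counter, per the description above: once it reaches
 * the overflow state it stays there (a leak instead of use-after-free). */
static uint32_t inc_saturating32(uint32_t c)
{
	return c == UINT32_MAX ? c : c + 1;
}

/* 64-bit counter: even at one increment per nanosecond, reaching 2^63
 * would take roughly 292 years, so overflow is not a practical concern. */
static int64_t inc64(int64_t c)
{
	return c + 1;
}
```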
Daniel Borkmann	2893c996d8	Merge branch 'bpf-trampoline' Alexei Starovoitov says: ==================== Introduce BPF trampoline that works as a bridge between kernel functions, BPF programs and other BPF programs. The first use case is fentry/fexit BPF programs that are roughly equivalent to kprobe/kretprobe. Unlike k[ret]probe there is practically zero overhead to call a set of BPF programs before or after a kernel function. The second use case is heavily influenced by pain points in XDP development. BPF trampoline allows attaching similar fentry/fexit BPF programs to any networking BPF program. It's now possible to see packets on input and output of any XDP, TC, lwt, cgroup programs without disturbing them. This greatly helps BPF-based network troubleshooting. The third use case of BPF trampoline will be explored in the follow-up patches. The BPF trampoline will be used to dynamically link BPF programs. It's a more generic mechanism than the arrays and linked lists of programs used in tracing, networking, cgroups. In many cases it can be used as a replacement for bpf_tail_call-based program chaining. See [1] for long term design discussion. v3 -> v4: - Included Peter's "x86/alternatives: Teach text_poke_bp() to emulate instructions" as a first patch. If it changes between now and merge window, I'll rebase to the newer version. The patch is necessary to do s/text_poke/text_poke_bp/ in patch 3 to fix the race. - In patch 4 fixed bpf_trampoline creation race spotted by Andrii. - Added patch 15 that annotates prog->kern bpf context types. It made patches 16 and 17 cleaner and more generic. - Addressed Andrii's feedback in other patches. v2 -> v3: - Addressed Song's and Andrii's comments - Fixed a few minor bugs discovered while testing - Added one more libbpf patch v1 -> v2: - Addressed Andrii's comments - Added more tests for fentry/fexit to kernel functions, including a stress test for the maximum number of progs per trampoline.
- Fixed a race in btf_resolve_helper_id() - Added a patch to compare BTF types of function arguments with actual types. - Added support for attaching BPF program to another BPF program via trampoline - Converted to use text_poke() API. That's the only viable mechanism to implement BPF-to-BPF attach. BPF-to-kernel attach can be refactored to use register_ftrace_direct() whenever it's available. [1] https://lore.kernel.org/bpf/20191112025112.bhzmrrh2pr76ssnh@ast-mbp.dhcp.thefacebook.com/ ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-11-15 23:49:34 +01:00
Alexei Starovoitov	d6f39601ec	selftests/bpf: Add a test for attaching BPF prog to another BPF prog and subprog Add a test that attaches one FEXIT program to main sched_cls networking program and two other FEXIT programs to subprograms. All three tracing programs access return values and skb->len of networking program and subprograms. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20191114185720.1641606-21-ast@kernel.org	2019-11-15 23:46:09 +01:00
Alexei Starovoitov	4c0963243c	selftests/bpf: Extend test_pkt_access test The test_pkt_access.o is used by multiple tests. Fix its section name so that program type can be automatically detected by libbpf and make it call other subprograms with skb argument. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20191114185720.1641606-20-ast@kernel.org	2019-11-15 23:45:50 +01:00
Alexei Starovoitov	e7bf94dbb8	libbpf: Add support for attaching BPF programs to other BPF programs Extend libbpf api to pass attach_prog_fd into bpf_object__open. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20191114185720.1641606-19-ast@kernel.org	2019-11-15 23:45:37 +01:00
Alexei Starovoitov	5b92a28aae	bpf: Support attaching tracing BPF program to other BPF programs Allow FENTRY/FEXIT BPF programs to attach to other BPF programs of any type, including their subprograms. This feature allows snooping on input and output packets in XDP and TC programs, including their return values. In order to do that, the verifier needs to track types not only of vmlinux, but of other BPF programs as well. The verifier also needs to translate uapi/linux/bpf.h types used by networking programs into kernel internal BTF types used by FENTRY/FEXIT BPF programs. In some cases LLVM optimizations can remove arguments from BPF subprograms without adjusting the BTF info that the LLVM backend knows. When BTF info disagrees with the actual types that the verifier sees, the BPF trampoline has to fall back to being conservative and treat all arguments as u64. The FENTRY/FEXIT program can still attach to such subprograms, but it won't be able to recognize pointer types like 'struct sk_buff *' and it won't be able to pass them to bpf_skb_output() for dumping packets to user space. The FENTRY/FEXIT program would need to use bpf_probe_read_kernel() instead. The BPF_PROG_LOAD command is extended with the attach_prog_fd field. When it's set to zero, attach_btf_id is one of the vmlinux BTF type ids. When attach_prog_fd points to a previously loaded BPF program, attach_btf_id is the BTF type id of its main function or one of its subprograms. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20191114185720.1641606-18-ast@kernel.org	2019-11-15 23:45:24 +01:00
Alexei Starovoitov	8c1b6e69dc	bpf: Compare BTF types of functions arguments with actual types Make the verifier check that BTF types of function arguments match actual types passed into top-level BPF program and into BPF-to-BPF calls. If types match, such BPF programs and sub-programs will have full support of BPF trampoline. If types mismatch, the trampoline has to be conservative. It has to save/restore five program arguments and assume 64-bit scalars. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20191114185720.1641606-17-ast@kernel.org	2019-11-15 23:45:02 +01:00
Alexei Starovoitov	91cc1a9974	bpf: Annotate context types Annotate BPF program context types with program-side type and kernel-side type. This type information is used by the verifier. btf_get_prog_ctx_type() is used in the later patches to verify that the BTF type of ctx in a BPF program matches the kernel's expected ctx type. For example, the XDP program type is: BPF_PROG_TYPE(BPF_PROG_TYPE_XDP, xdp, struct xdp_md, struct xdp_buff) That means that an XDP program should be written as: int xdp_prog(struct xdp_md *ctx) { ... } Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20191114185720.1641606-16-ast@kernel.org	2019-11-15 23:44:48 +01:00
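A four-argument annotation like BPF_PROG_TYPE() is an X-macro: one list expanded several ways. The sketch below is a hypothetical, self-contained illustration of that pattern. The XDP entry mirrors the example above. The SCHED_CLS entry and all `MODEL_*` names are illustrative assumptions, not the kernel's bpf_types.h. It generates an enum and a table mapping each program type to its program-side and kernel-side ctx type names.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical list: (enum name, short name, program ctx, kernel ctx). */
#define MODEL_PROG_TYPES(X)                                                     \
	X(MODEL_PROG_TYPE_SCHED_CLS, tc_cls_act, struct __sk_buff, struct sk_buff) \
	X(MODEL_PROG_TYPE_XDP, xdp, struct xdp_md, struct xdp_buff)

/* Expansion 1: an enum of program types. */
#define AS_ENUM(type, name, prog_ctx, kern_ctx) type,
enum model_prog_type { MODEL_PROG_TYPES(AS_ENUM) MODEL_PROG_TYPE_MAX };
#undef AS_ENUM

/* Expansion 2: ctx type names per program type. The ctx types are only
 * stringified, so they need not be declared here. */
struct ctx_types {
	const char *name;
	const char *prog_ctx;  /* type the BPF program declares */
	const char *kern_ctx;  /* type the kernel actually passes */
};

#define AS_ENTRY(type, name, prog_ctx, kern_ctx) \
	[type] = { #name, #prog_ctx, #kern_ctx },
static const struct ctx_types model_ctx_table[] = { MODEL_PROG_TYPES(AS_ENTRY) };
#undef AS_ENTRY
```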
Alexei Starovoitov	9cc31b3a09	bpf: Fix race in btf_resolve_helper_id() btf_resolve_helper_id() caching logic is a bit racy, since under root the verifier can verify several programs in parallel. Fix it with READ/WRITE_ONCE. Fix the type as well, since error is also recorded. Fixes: a7658e1a4164 ("bpf: Check types of arguments passed into helpers") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20191114185720.1641606-15-ast@kernel.org	2019-11-15 23:44:20 +01:00
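The caching fix can be sketched in userspace with C11 relaxed atomics, which are roughly analogous to READ_ONCE()/WRITE_ONCE(). Racing resolvers compute the same answer and one store simply overwrites the other. The cache is a signed `int` because a negative error is cached too, matching the "fix the type" remark. All names and the id/error values are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical cache: 0 = not resolved yet, >0 = id, <0 = cached error. */
static _Atomic int cached_id;
static int slow_calls;  /* single-threaded instrumentation for the example */

static int resolve_slow(int ok)
{
	slow_calls++;
	return ok ? 42 : -EINVAL;  /* illustrative id and error */
}

static int resolve_id(int ok)
{
	int id = atomic_load_explicit(&cached_id, memory_order_relaxed);

	if (id)
		return id;  /* cached id or cached error */
	id = resolve_slow(ok);
	/* Concurrent resolvers store the same value; last store wins. */
	atomic_store_explicit(&cached_id, id, memory_order_relaxed);
	return id;
}
```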
Alexei Starovoitov	9fd4a39dc7	bpf: Reserve space for BPF trampoline in BPF programs BPF trampoline can be made to work with the existing 5 bytes of BPF program prologue, but let's add 5 bytes of NOPs to the beginning of every JITed BPF program to make the BPF trampoline's job easier. They can be removed in the future. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20191114185720.1641606-14-ast@kernel.org	2019-11-15 23:44:06 +01:00
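Reserving patchable space amounts to emitting a 5-byte NOP at the start of the image, the same length as an x86 `call rel32`. The sketch below uses a common x86 5-byte NOP encoding (`nopl 0x0(%rax,%rax,1)`). The emitter function is hypothetical and is not the x86 BPF JIT's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A common x86 5-byte NOP encoding: nopl 0x0(%rax,%rax,1). */
static const uint8_t nop5[5] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };

/* Hypothetical prologue emitter: reserve 5 bytes at the start of the JITed
 * image so they can later be patched into a 5-byte call. Returns the
 * number of bytes written. */
static size_t emit_reserved_nop(uint8_t *image)
{
	memcpy(image, nop5, sizeof(nop5));
	return sizeof(nop5);
}
```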
Alexei Starovoitov	e76d776e9c	selftests/bpf: Add stress test for maximum number of progs Add stress test for maximum number of attached BPF programs per BPF trampoline. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20191114185720.1641606-13-ast@kernel.org	2019-11-15 23:43:53 +01:00
Alexei Starovoitov	510312882c	selftests/bpf: Add combined fentry/fexit test Add a combined fentry/fexit test. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20191114185720.1641606-12-ast@kernel.org	2019-11-15 23:43:41 +01:00
Alexei Starovoitov	d3b0856e59	selftests/bpf: Add fexit tests for BPF trampoline Add fexit tests for BPF trampoline that checks kernel functions with up to 6 arguments of different sizes and their return values. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20191114185720.1641606-11-ast@kernel.org	2019-11-15 23:43:28 +01:00
Alexei Starovoitov	11d1e2eeff	selftests/bpf: Add test for BPF trampoline Add sanity test for BPF trampoline that checks kernel functions with up to 6 arguments of different sizes. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20191114185720.1641606-10-ast@kernel.org	2019-11-15 23:43:15 +01:00
Alexei Starovoitov	faeb2dce08	bpf: Add kernel test functions for fentry testing Add a few kernel functions with various numbers of arguments, argument types and sizes for BPF trampoline testing to cover different calling conventions. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20191114185720.1641606-9-ast@kernel.org	2019-11-15 23:43:01 +01:00
Alexei Starovoitov	e41074d39d	selftest/bpf: Simple test for fentry/fexit Add simple test for fentry and fexit programs around eth_type_trans. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20191114185720.1641606-8-ast@kernel.org	2019-11-15 23:42:46 +01:00
Alexei Starovoitov	b8c54ea455	libbpf: Add support to attach to fentry/fexit tracing progs Teach libbpf to recognize tracing programs types and attach them to fentry/fexit. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20191114185720.1641606-7-ast@kernel.org	2019-11-15 23:42:31 +01:00
Alexei Starovoitov	1442e2871b	libbpf: Introduce btf__find_by_name_kind() Introduce btf__find_by_name_kind() helper to search BTF by name and kind, since name alone can be ambiguous. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20191114185720.1641606-6-ast@kernel.org	2019-11-15 23:42:14 +01:00
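Why name alone is ambiguous: a BTF blob can hold, for example, both `struct foo` and a typedef named `foo`. The toy lookup below does not use libbpf. Its table and kind values are hypothetical, but like BTF it reserves id 0 for void and returns the first id matching both name and kind, or `-ENOENT`.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Toy type table; kinds and entries are hypothetical. */
enum toy_kind { TOY_KIND_VOID, TOY_KIND_STRUCT, TOY_KIND_TYPEDEF, TOY_KIND_FUNC };

struct toy_type {
	const char *name;
	enum toy_kind kind;
};

static const struct toy_type toy_types[] = {
	{ "", TOY_KIND_VOID },                /* id 0: reserved for void */
	{ "foo", TOY_KIND_STRUCT },           /* id 1: struct foo */
	{ "foo", TOY_KIND_TYPEDEF },          /* id 2: typedef ... foo */
	{ "eth_type_trans", TOY_KIND_FUNC },  /* id 3 */
};

/* Return the id of the first type matching name and kind, or -ENOENT. */
static int toy_find_by_name_kind(const char *name, enum toy_kind kind)
{
	int n = (int)(sizeof(toy_types) / sizeof(toy_types[0]));

	for (int id = 1; id < n; id++) {
		if (toy_types[id].kind == kind &&
		    strcmp(toy_types[id].name, name) == 0)
			return id;
	}
	return -ENOENT;
}
```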
Alexei Starovoitov	fec56f5890	bpf: Introduce BPF trampoline Introduce BPF trampoline concept to allow kernel code to call into BPF programs with practically zero overhead. The trampoline generation logic is architecture dependent. It converts the native calling convention into the BPF calling convention. BPF ISA is 64-bit (even on 32-bit architectures). The registers R1 to R5 are used to pass arguments into BPF functions. The main BPF program accepts only a single argument "ctx" in R1, whereas the CPU native calling convention is different. x86-64 passes the first 6 arguments in registers and the rest on the stack. x86-32 passes the first 3 arguments in registers. sparc64 passes the first 6 in registers. And so on. The trampolines between BPF and kernel already exist. BPF_CALL_x macros in include/linux/filter.h statically compile trampolines from BPF into kernel helpers. They convert up to five u64 arguments into kernel C pointers and integers. On 64-bit architectures these BPF_to_kernel trampolines are nops. On 32-bit architectures they're meaningful. The opposite job, kernel_to_BPF trampolines, is done by CAST_TO_U64 macros and __bpf_trace_##call() shim functions in include/trace/bpf_probe.h. They convert kernel function arguments into an array of u64s that the BPF program consumes via the R1=ctx pointer. This patch set is doing the same job as __bpf_trace_##call() static trampolines, but dynamically for any kernel function. There are ~22k global kernel functions that are attachable via nop at function entry. The function arguments and types are described in BTF. The job of the btf_distill_func_proto() function is to extract useful information from BTF into a "function model" that architecture dependent trampoline generators will use to generate assembly code to cast kernel function arguments into an array of u64s. For example, the kernel function eth_type_trans has two pointers. They will be cast to u64 and stored into the stack of the generated trampoline. 
The pointer to that stack space will be passed into the BPF program in R1. On x86-64 such a generated trampoline will consume 16 bytes of stack and two stores of %rdi and %rsi into stack. The verifier will make sure that only two u64s are accessed read-only by the BPF program. The verifier will also recognize the precise type of the pointers being accessed and will not allow typecasting of the pointer to a different type within the BPF program. The tracing use case in the datacenter demonstrated that certain key kernel functions (like tcp_retransmit_skb) have 2 or more kprobes that are always active. Other functions have both kprobe and kretprobe. So it is essential to keep both kernel code and BPF programs executing at maximum speed. Hence the BPF trampoline is regenerated every time a new program is attached or detached to maintain maximum performance. To avoid the high cost of retpoline, the attached BPF programs are called directly. __bpf_prog_enter/exit() are used to support per-program execution stats. In the future this logic will be optimized further by adding support for bpf_stats_enabled_key inside generated assembly code. Introduction of preemptible and sleepable BPF programs will completely remove the need to call __bpf_prog_enter/exit(). Detach of a BPF program from the trampoline should not fail. To avoid memory allocation in the detach path, half of the page is used as a reserve and flipped after each attach/detach. 2k bytes is enough to call 40+ BPF programs directly, which is enough for BPF tracing use cases. This limit can be increased in the future. BPF_TRACE_FENTRY programs have access to raw kernel function arguments while BPF_TRACE_FEXIT programs have access to the kernel return value as well. Often a kprobe BPF program remembers function arguments in a map while a kretprobe fetches arguments from a map and analyzes them together with the return value. BPF_TRACE_FEXIT accelerates this typical use case. 
Recursion prevention for kprobe BPF programs is done via the per-cpu bpf_prog_active counter. In practice that turned out to be a mistake. It caused programs to randomly skip execution. The tracing tools missed results they were looking for. Hence the BPF trampoline doesn't provide built-in recursion prevention. That is the job of the BPF program itself and will be addressed in follow-up patches. In the future, the BPF trampoline is intended to be used beyond tracing and fentry/fexit use cases, for example to remove the retpoline cost from XDP programs. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20191114185720.1641606-5-ast@kernel.org	2019-11-15 23:41:51 +01:00
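To make the kernel_to_BPF direction concrete, here is a minimal user-space sketch in C of what a generated trampoline does for a two-pointer function such as eth_type_trans. It is not the kernel's generated x86-64 code: the struct names, trampoline_two_ptrs() and example_prog() are hypothetical, and a plain C function stands in for the attached BPF program.

```c
#include <stdint.h>

/* Hypothetical stand-in for an attached BPF_TRACE_FENTRY program: like a
 * BPF program, it receives a single ctx pointer to an array of u64s. */
typedef int (*fake_bpf_prog_t)(const uint64_t *ctx);

/* Hypothetical argument types for a two-pointer "kernel function",
 * shaped like eth_type_trans(skb, dev). */
struct fake_skb { int len; };
struct fake_dev { int ifindex; };

/* Hand-written equivalent of a generated trampoline for a two-pointer
 * function: cast each native argument to u64, store both in a 16-byte
 * stack array, and pass the array's address as ctx (R1). */
static int trampoline_two_ptrs(fake_bpf_prog_t prog,
                               struct fake_skb *skb, struct fake_dev *dev)
{
    uint64_t args[2];                   /* 2 * 8 = 16 bytes of stack */
    args[0] = (uint64_t)(uintptr_t)skb; /* first argument (%rdi on x86-64) */
    args[1] = (uint64_t)(uintptr_t)dev; /* second argument (%rsi on x86-64) */
    return prog(args);
}

/* Example "program": reads both u64 slots back as their original types. */
static int example_prog(const uint64_t *ctx)
{
    const struct fake_skb *skb = (const struct fake_skb *)(uintptr_t)ctx[0];
    const struct fake_dev *dev = (const struct fake_dev *)(uintptr_t)ctx[1];
    return skb->len + dev->ifindex;
}
```

For example, with skb.len == 40 and dev.ifindex == 2, trampoline_two_ptrs(example_prog, &skb, &dev) returns 42. In the real design the verifier, not the program, guarantees that only those two u64 slots are read and that the pointers keep their BTF types.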
Alexei Starovoitov	5964b2000f	bpf: Add bpf_arch_text_poke() helper Add bpf_arch_text_poke() helper that is used by BPF trampoline logic to patch nops/calls in kernel text into calls into BPF trampoline and to patch calls/nops inside BPF programs too. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20191114185720.1641606-4-ast@kernel.org	2019-11-15 23:41:28 +01:00
Alexei Starovoitov	3b2744e665	bpf: Refactor x86 JIT into helpers Refactor x86 JITing of LDX, STX, CALL instructions into separate helper functions. No functional changes in LDX and STX helpers. There is a minor change in CALL helper. It will populate target address correctly on the first pass of JIT instead of second pass. That won't reduce total number of JIT passes though. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20191114185720.1641606-3-ast@kernel.org	2019-11-15 23:41:06 +01:00
Peter Zijlstra	c3d6324f84	x86/alternatives: Teach text_poke_bp() to emulate instructions In preparation for static_call and variable size jump_label support, teach text_poke_bp() to emulate instructions, namely: JMP32, JMP8, CALL, NOP2, NOP_ATOMIC5, INT3 The current text_poke_bp() takes a @handler argument which is used as a jump target when the temporary INT3 is hit by a different CPU. When patching CALL instructions, this doesn't work because we'd miss the PUSH of the return address. Instead, teach poke_int3_handler() to emulate an instruction, typically the instruction we're patching in. This fits almost all text_poke_bp() users, except arch_unoptimize_kprobe() which restores random text, and for that site we have to build an explicit emulate instruction. Tested-by: Alexei Starovoitov <ast@kernel.org> Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20191111132457.529086974@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org> (cherry picked from commit 8c7eebc10687af45ac8e40ad1bac0cf7893dba9f) Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-11-15 14:07:01 -08:00
Mao Wenan	808c9f7ebf	bpf, doc: Change right arguments for JIT example code The example code for the x86_64 JIT uses the wrong arguments when calling function bar(). Signed-off-by: Mao Wenan <maowenan@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191114034351.162740-1-maowenan@huawei.com	2019-11-15 22:36:35 +01:00
Andre Guedes	b313332980	samples/bpf: Add missing option to xdpsock usage Commit 743e568c1586 (samples/bpf: Add a "force" flag to XDP samples) introduced the '-F' option but missed adding it to the usage() and the 'long_option' array. Fixes: 743e568c1586 (samples/bpf: Add a "force" flag to XDP samples) Signed-off-by: Andre Guedes <andre.guedes@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191114162847.221770-2-andre.guedes@intel.com	2019-11-15 22:32:10 +01:00
Andre Guedes	110b2263db	samples/bpf: Remove duplicate option from xdpsock The '-f' option is shown twice in the usage(). This patch removes the outdated version. Signed-off-by: Andre Guedes <andre.guedes@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191114162847.221770-1-andre.guedes@intel.com	2019-11-15 22:29:17 +01:00
Ilya Leoshkevich	fcf3513139	s390/bpf: Make sure JIT passes do not increase code size The upcoming s390 branch length extension patches rely on "passes do not increase code size" property in order to consistently choose between short and long branches. Currently this property does not hold between the first and the second passes for register save/restore sequences, as well as various code fragments that depend on SEEN_* flags. Generate the code during the first pass conservatively: assume register save/restore sequences have the maximum possible length, and that all SEEN_* flags are set. Also refuse to JIT if this happens anyway (e.g. due to a bug), as this might lead to verifier bypass once long branches are introduced. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191114151820.53222-1-iii@linux.ibm.com	2019-11-15 22:25:54 +01:00
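The "passes do not increase code size" invariant can be sketched as a simple check over per-pass code sizes. This is a hypothetical model in C, not the s390 JIT: check_jit_passes() and the size array are illustrative names, and a real JIT obtains each size by re-emitting the whole program on every pass.

```c
#include <stddef.h>

/* Hypothetical model of a multi-pass JIT: pass_sizes[i] is the code size
 * produced by pass i. The first pass is generated conservatively (maximum
 * length register save/restore sequences, all SEEN_* flags assumed set),
 * so later passes may only shrink or keep the size. */
static int check_jit_passes(const size_t *pass_sizes, size_t npasses,
                            size_t *final_size)
{
    for (size_t i = 1; i < npasses; i++) {
        /* Growth could invalidate short/long branch choices that were made
         * from earlier offsets, so refuse to JIT instead of emitting code. */
        if (pass_sizes[i] > pass_sizes[i - 1])
            return -1;
    }
    *final_size = npasses ? pass_sizes[npasses - 1] : 0;
    return 0;
}
```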
Ilya Leoshkevich	b7b3fc8dd9	bpf: Support doubleword alignment in bpf_jit_binary_alloc Currently passing alignment greater than 4 to bpf_jit_binary_alloc does not work: in such cases it silently aligns only to 4 bytes. On s390, in order to load a constant from memory in a large (>512k) BPF program, one must use lgrl instruction, whose memory operand must be aligned on an 8-byte boundary. This patch makes it possible to request 8-byte alignment from bpf_jit_binary_alloc, and also makes it issue a warning when an unsupported alignment is requested. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191115123722.58462-1-iii@linux.ibm.com	2019-11-15 22:25:00 +01:00
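A small sketch of why silently aligning to 4 bytes is not enough for an 8-byte requirement such as the lgrl memory operand. align_up() and lgrl_operand_ok() are hypothetical helpers written for illustration, not functions from bpf_jit_binary_alloc.

```c
#include <stdint.h>

/* Round addr up to the next multiple of align (align must be a power of 2). */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/* Returns 1 if addr meets the 8-byte alignment an lgrl operand needs. */
static int lgrl_operand_ok(uintptr_t addr)
{
    return (addr & 7) == 0;
}
```

An address such as 0x1004 is already 4-byte aligned, so align_up(0x1004, 4) leaves it unchanged, yet it is not a valid lgrl operand; align_up(0x1004, 8) moves it to 0x1008.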
Anders Roxell	e47a179997	bpf, testing: Add missing object file to TEST_FILES When kselftests are installed to their own directory and test_lwt_ip_encap.sh is run, it complains that test_lwt_ip_encap.o can't be found. Likewise, test_tc_edt.sh complains that test_tc_edt.o can't be found. $ ./test_lwt_ip_encap.sh starting egress IPv4 encap test Error opening object test_lwt_ip_encap.o: No such file or directory Object hashing failed! Cannot initialize ELF context! Failed to parse eBPF program: Invalid argument Rework to add test_lwt_ip_encap.o and test_tc_edt.o to TEST_FILES so the object files get installed when installing kselftest. Fixes: 74b5a5968fe8 ("selftests/bpf: Replace test_progs and test_maps w/ general rule") Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20191111161728.8854-1-anders.roxell@linaro.org	2019-11-11 22:35:23 +01:00
Yonghong Song	b7a0d65d80	bpf, testing: Workaround a verifier failure for test_progs With the latest llvm compiler, running test_progs hits the following verifier failure for test_sysctl_loop1.o: libbpf: load bpf program failed: Permission denied libbpf: -- BEGIN DUMP LOG --- libbpf: invalid indirect read from stack var_off (0x0; 0xff)+196 size 7 ... libbpf: -- END LOG -- libbpf: failed to load program 'cgroup/sysctl' libbpf: failed to load object 'test_sysctl_loop1.o' The related bytecode looks like this: 0000000000000308 LBB0_8: 97: r4 = r10 98: r4 += -288 99: r4 += r7 100: w8 &= 255 101: r1 = r10 102: r1 += -488 103: r1 += r8 104: r2 = 7 105: r3 = 0 106: call 106 107: w1 = w0 108: w1 += -1 109: if w1 > 6 goto -24 <LBB0_5> 110: w0 += w8 111: r7 += 8 112: w8 = w0 113: if r7 != 224 goto -17 <LBB0_8> And the source code: for (i = 0; i < ARRAY_SIZE(tcp_mem); ++i) { ret = bpf_strtoul(value + off, MAX_ULONG_STR_LEN, 0, tcp_mem + i); if (ret <= 0 || ret > MAX_ULONG_STR_LEN) return 0; off += ret & MAX_ULONG_STR_LEN; } The current verifier is not able to conclude that register w0 before the '+' at insn 110 has a range of 1 to 7, and thinks it is from 0 to 255. This leads to a more conservative range for w8 at insn 112, and a later verifier complaint. Let us work around this issue until we find a compiler and/or verifier solution. The workaround in this patch is to make the variable 'ret' volatile, which forces a reload and then an '&' operation, ensuring a better value range.
With this patch, I got the following bytecode for the loop: 0000000000000328 LBB0_9: 101: r4 = r10 102: r4 += -288 103: r4 += r7 104: w8 &= 255 105: r1 = r10 106: r1 += -488 107: r1 += r8 108: r2 = 7 109: r3 = 0 110: call 106 111: *(u32 *)(r10 - 64) = r0 112: r1 = *(u32 *)(r10 - 64) 113: if w1 s< 1 goto -28 <LBB0_5> 114: r1 = *(u32 *)(r10 - 64) 115: if w1 s> 7 goto -30 <LBB0_5> 116: r1 = *(u32 *)(r10 - 64) 117: w1 &= 7 118: w1 += w8 119: r7 += 8 120: w8 = w1 121: if r7 != 224 goto -21 <LBB0_9> Insn 117 does the '&' operation and we get a more precise value range for 'w8' at insn 120. The test then passes: #3/17 test_sysctl_loop1.o:OK Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20191107170045.2503480-1-yhs@fb.com	2019-11-11 14:03:10 +01:00
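The effect of the '&' at insn 117 can be modeled with the verifier's tristate numbers (tnums), which the log prints as var_off=(value; mask). The sketch below follows the AND rule of tnum_and() in kernel/bpf/tnum.c, simplified to user-space C; tnum_unknown32() and tnum_umax() are hypothetical helpers, and the real verifier additionally tracks separate min/max bounds alongside the tnum.

```c
#include <stdint.h>

/* Tristate number: bits set in mask are unknown; all other bits are known
 * and equal to the matching bits of value. */
struct tnum { uint64_t value; uint64_t mask; };

static struct tnum tnum_const(uint64_t v) { return (struct tnum){ v, 0 }; }

/* Fully unknown 32-bit subregister, e.g. 'ret' reloaded from the stack. */
static struct tnum tnum_unknown32(void)
{
    return (struct tnum){ 0, 0xffffffffULL };
}

/* AND of two tnums: a result bit can be 1 only if it may be 1 in both. */
static struct tnum tnum_and(struct tnum a, struct tnum b)
{
    uint64_t alpha = a.value | a.mask;
    uint64_t beta = b.value | b.mask;
    uint64_t v = a.value & b.value;
    return (struct tnum){ v, alpha & beta & ~v };
}

/* Largest unsigned value consistent with the tnum. */
static uint64_t tnum_umax(struct tnum t) { return t.value | t.mask; }
```

ANDing an unknown 32-bit value with 7 yields var_off=(0x0; 0x7), so the value is known to be at most 7, whereas the '&= 255' on w8 only bounds it by 255.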
Alexei Starovoitov	0d2ec5b51d	Merge branch 'share-umem' Magnus Karlsson says: ==================== This patch set extends libbpf and the xdpsock sample program to demonstrate the shared umem mode (XDP_SHARED_UMEM) as well as Rx-only and Tx-only sockets. This is so that users have an example to use as a blueprint, and also so that these modes are exercised more frequently. Note that the user needs to supply an XDP program with the XDP_SHARED_UMEM mode that distributes the packets over the sockets according to some policy. There is an example supplied with the xdpsock program, but, unlike when XDP_SHARED_UMEM is not used, there is no default one in libbpf. The reason for this is that I felt that supplying one that would work for all users in this mode is futile. There are just tons of ways to distribute packets, so whatever I come up with and build into libbpf would be wrong in most cases. This patch has been applied against commit 30ee348c1267 ("Merge branch 'bpf-libbpf-fixes'") Structure of the patch set: Patch 1: Adds shared umem support to libbpf Patch 2: Shared umem support and example XDP program added to xdpsock sample Patch 3: Adds Rx-only and Tx-only support to libbpf Patch 4: Uses Rx-only sockets for rxdrop and Tx-only sockets for txpush in the xdpsock sample Patch 5: Add documentation entries for these two features ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-11-10 19:30:51 -08:00
Magnus Karlsson	57afa8b0cf	xsk: Extend documentation for Rx|Tx-only sockets and shared umems Add more documentation about the new Rx-only and Tx-only sockets in libbpf and also how libbpf can now support shared umems. Also fix two passages in the text that could be improved. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Tested-by: William Tu <u9012063@gmail.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Link: https://lore.kernel.org/bpf/1573148860-30254-6-git-send-email-magnus.karlsson@intel.com	2019-11-10 19:30:46 -08:00
Magnus Karlsson	661842c46d	samples/bpf: Use Rx-only and Tx-only sockets in xdpsock Use Rx-only sockets for the rxdrop sample and Tx-only sockets for the txpush sample in the xdpsock application. This is so that these socket types are exercised and showcased too. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Tested-by: William Tu <u9012063@gmail.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Link: https://lore.kernel.org/bpf/1573148860-30254-5-git-send-email-magnus.karlsson@intel.com	2019-11-10 19:30:46 -08:00
Magnus Karlsson	a68977d269	libbpf: Allow for creating Rx or Tx only AF_XDP sockets The libbpf AF_XDP code is extended to allow for the creation of Rx only or Tx only sockets. Previously it returned an error if the socket was not initialized for both Rx and Tx. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Tested-by: William Tu <u9012063@gmail.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Link: https://lore.kernel.org/bpf/1573148860-30254-4-git-send-email-magnus.karlsson@intel.com	2019-11-10 19:30:46 -08:00
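A hedged sketch of the relaxed argument check in C: a socket now needs at least one of the two rings rather than both. struct fake_ring and check_xsk_rings() are hypothetical, and -EINVAL is an assumed error code, not necessarily what libbpf returns.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical ring handle standing in for libbpf's ring structures. */
struct fake_ring { unsigned int size; };

/* Accept Rx-only, Tx-only, or Rx+Tx sockets; reject a socket with neither.
 * Before the change, both rings were required. */
static int check_xsk_rings(const struct fake_ring *rx,
                           const struct fake_ring *tx)
{
    if (!rx && !tx)
        return -EINVAL; /* assumed error code for illustration */
    return 0;
}
```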
Magnus Karlsson	2e5d72c15f	samples/bpf: Add XDP_SHARED_UMEM support to xdpsock Add support for the XDP_SHARED_UMEM mode to the xdpsock sample application. As libbpf does not have a built in XDP program for this mode, we use an explicitly loaded XDP program. This also serves as an example on how to write your own XDP program that can route to an AF_XDP socket. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Tested-by: William Tu <u9012063@gmail.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Link: https://lore.kernel.org/bpf/1573148860-30254-3-git-send-email-magnus.karlsson@intel.com	2019-11-10 19:30:45 -08:00
Magnus Karlsson	cbf07409d0	libbpf: Support XDP_SHARED_UMEM with external XDP program Add support in libbpf to create multiple sockets that share a single umem. Note that an external XDP program needs to be supplied that routes the incoming traffic to the desired sockets. So you need to supply the libbpf_flag XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD and load your own XDP program. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Tested-by: William Tu <u9012063@gmail.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Link: https://lore.kernel.org/bpf/1573148860-30254-2-git-send-email-magnus.karlsson@intel.com	2019-11-10 19:30:45 -08:00
Alexei Starovoitov	472aeb386e	Merge branch 'map-pinning' Toke Høiland-Jørgensen says: ==================== This series fixes a few bugs in libbpf that I discovered while playing around with the new auto-pinning code, and writing the first utility in xdp-tools[0]: - If object loading fails, libbpf does not clean up the pinnings created by the auto-pinning mechanism. - EPERM is not propagated to the caller on program load - Netlink functions write error messages directly to stderr In addition, libbpf currently only has a somewhat limited getter function for XDP link info, which makes it impossible to discover whether an attached program is in SKB mode or not. So the last patch in the series adds a new getter for XDP link info which returns all the information returned via netlink (and which can be extended later). Finally, add a getter for BPF program size, which can be used by the caller to estimate the amount of locked memory needed to load a program. A selftest is added for the pinning change, while the other features were tested in the xdp-filter tool from the xdp-tools repo. The 'new-libbpf-features' branch contains the commits that make use of the new XDP getter and the corrected EPERM error code. [0] https://github.com/xdp-project/xdp-tools Changelog: v4: - Don't do any size checks on struct xdp_info, just copy (and/or zero) whatever size the caller supplied. v3: - Pass through all kernel error codes on program load (instead of just EPERM). - No new bpf_object__unload() variant, just do the loop at the caller - Don't reject struct xdp_info sizes that are bigger than what we expect. - Add a comment noting that bpf_program__size() returns the size in bytes v2: - Keep function names in libbpf.map sorted properly ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-11-10 19:26:35 -08:00
Toke Høiland-Jørgensen	1a734efe06	libbpf: Add getter for program size This adds a new getter for the BPF program size (in bytes). This is useful for a caller that is trying to predict how much memory will be locked by loading a BPF object into the kernel. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/157333185272.88376.10996937115395724683.stgit@toke.dk	2019-11-10 19:26:30 -08:00
Toke Høiland-Jørgensen	473f4e133a	libbpf: Add bpf_get_link_xdp_info() function to get more XDP information Currently, libbpf only provides a function to get a single ID for the XDP program attached to the interface. However, it can be useful to get the full set of program IDs attached, along with the attachment mode, in one go. Add a new getter function to support this, using an extendible structure to carry the information. Express the old bpf_get_link_id() function in terms of the new function. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/157333185164.88376.7520653040667637246.stgit@toke.dk	2019-11-10 19:26:30 -08:00
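The v4 changelog in the 'map-pinning' merge says the new getter does no size checks on struct xdp_info and just copies (and/or zeroes) whatever size the caller supplied. A minimal sketch of that extensible-struct pattern in C, assuming a hypothetical struct fake_xdp_info and helper fill_info() rather than libbpf's actual definitions:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical info structure; newer versions may only append fields. */
struct fake_xdp_info {
    unsigned int prog_id;
    unsigned int drv_prog_id;
    unsigned int hw_prog_id;
    unsigned int skb_prog_id;
    unsigned char attach_mode;
};

/* Copy min(size, sizeof(*src)) bytes into the caller's buffer and zero any
 * remaining bytes of a larger (newer) caller struct. No size is rejected. */
static void fill_info(void *dst, size_t size, const struct fake_xdp_info *src)
{
    size_t n = size < sizeof(*src) ? size : sizeof(*src);
    memcpy(dst, src, n);
    if (size > n)
        memset((char *)dst + n, 0, size - n);
}
```

An older caller with a smaller struct receives only the leading fields it knows about, while a newer caller with a larger struct sees the extra fields as zero.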
Toke Høiland-Jørgensen	b6e99b010e	libbpf: Use pr_warn() when printing netlink errors The netlink functions were using fprintf(stderr, ) directly to print out error messages, instead of going through the usual logging macros. This makes it impossible for the calling application to silence or redirect those error messages. Fix this by switching to pr_warn() in nlattr.c and netlink.c. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/157333185055.88376.15999360127117901443.stgit@toke.dk	2019-11-10 19:26:30 -08:00
Toke Høiland-Jørgensen	4f33ddb4e3	libbpf: Propagate EPERM to caller on program load When loading an eBPF program, libbpf overrides the return code for EPERM errors instead of returning it to the caller. This makes it hard to figure out what went wrong on load. In particular, EPERM is returned when the system rlimit is too low to lock the memory required for the BPF program. Previously, this was somewhat obscured because the rlimit error would be hit on map creation (which does return it correctly). However, since maps can now be reused, object load can proceed all the way to loading programs without hitting the error; propagating it even in this case makes it possible for the caller to react appropriately (and, e.g., attempt to raise the rlimit before retrying). Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/157333184946.88376.11768171652794234561.stgit@toke.dk	2019-11-10 19:26:30 -08:00
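Propagating the error lets a caller tell EPERM apart from other load failures and, as the commit suggests, raise the rlimit before retrying. The sketch below shows that caller-side pattern with hypothetical callbacks (load_with_retry(), fake_load() and fake_raise() are illustrative names); in a real loader the raise step would typically call setrlimit(RLIMIT_MEMLOCK, ...), which is left out here so the example needs no privileges.

```c
#include <errno.h>

/* Hypothetical callbacks: load returns 0 or a negative errno; raise_limit
 * tries to raise the locked-memory limit and returns 0 on success. */
typedef int (*load_fn_t)(void *ctx);
typedef int (*raise_fn_t)(void *ctx);

/* Retry once on -EPERM after raising the limit; other errors pass through. */
static int load_with_retry(load_fn_t load, raise_fn_t raise_limit, void *ctx)
{
    int err = load(ctx);
    if (err == -EPERM && raise_limit(ctx) == 0)
        err = load(ctx);
    return err;
}

/* Fake loader for illustration: fails with -EPERM until the limit is raised. */
struct fake_state { int limit_raised; int loads; };

static int fake_load(void *ctx)
{
    struct fake_state *s = ctx;
    s->loads++;
    return s->limit_raised ? 0 : -EPERM;
}

static int fake_raise(void *ctx)
{
    struct fake_state *s = ctx;
    s->limit_raised = 1;
    return 0;
}
```

With a fresh struct fake_state, load_with_retry(fake_load, fake_raise, &st) succeeds on the second attempt. If libbpf masked EPERM with a different code, the caller could not make this decision.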


873557 Commits