linux

iv/linux

Author	SHA1	Message	Date
Yonghong Song	fce557bcef	bpf: Make btf_sock_ids global tcp and udp bpf_iter can reuse some socket ids in btf_sock_ids, so make it global. I put the extern definition in btf_ids.h as a central place so it can be easily discovered by developers. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200720163402.1393427-1-yhs@fb.com	2020-07-21 13:26:26 -07:00
Yonghong Song	0f12e584b2	bpf: Add BTF_ID_LIST_GLOBAL in btf_ids.h Existing BTF_ID_LIST used a local static variable to store btf_ids. This patch provided a new macro BTF_ID_LIST_GLOBAL to store btf_ids in a global variable which can be shared among multiple files. The existing BTF_ID_LIST is still retained. Two reasons. First, BTF_ID_LIST is also used to build btf_ids for helper arguments which typically is an array of 5. Since typically different helpers have different signature, it makes little sense to share them. Second, some current computed btf_ids are indeed local. If later those btf_ids are shared between different files, they can use BTF_ID_LIST_GLOBAL then. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/bpf/20200720163401.1393159-1-yhs@fb.com	2020-07-21 13:26:26 -07:00
Yonghong Song	d8dfe5bfe8	tools/bpf: Sync btf_ids.h to tools Sync kernel header btf_ids.h to tools directory. Also define macro CONFIG_DEBUG_INFO_BTF before including btf_ids.h in prog_tests/resolve_btfids.c since non-stub definitions for BTF_ID_LIST etc. macros are defined under CONFIG_DEBUG_INFO_BTF. This prevented test_progs from failing. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200720163359.1393079-1-yhs@fb.com	2020-07-21 13:26:26 -07:00
Yonghong Song	bc4f0548f6	bpf: Compute bpf_skc_to_() helper socket btf ids at build time Currently, socket types (struct tcp_sock, udp_sock, etc.) used by bpf_skc_to_() helpers are computed when vmlinux_btf is first built in the kernel. Commit 5a2798ab32ba ("bpf: Add BTF_ID_LIST/BTF_ID/BTF_ID_UNUSED macros") implemented a mechanism to compute btf_ids at kernel build time which can simplify kernel implementation and reduce runtime overhead by removing in-kernel btf_id calculation. This patch did exactly this, removing in-kernel btf_id computation and utilizing build-time btf_id computation. If CONFIG_DEBUG_INFO_BTF is not defined, BTF_ID_LIST will define an array with size of 5, which is not enough for btf_sock_ids. So define its own static array if CONFIG_DEBUG_INFO_BTF is not defined. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200720163358.1393023-1-yhs@fb.com	2020-07-21 13:26:26 -07:00
Ilya Leoshkevich	e4d9c23207	samples/bpf, selftests/bpf: Use bpf_probe_read_kernel A handful of samples and selftests fail to build on s390, because after commit 0ebeea8ca8a4 ("bpf: Restrict bpf_probe_read{, str}() only to archs where they work") bpf_probe_read is not available anymore. Fix by using bpf_probe_read_kernel. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200720114806.88823-1-iii@linux.ibm.com	2020-07-21 13:26:26 -07:00
Ilya Leoshkevich	6bd557275a	selftests/bpf: Fix test_lwt_seg6local.sh hangs OpenBSD netcat (Debian patchlevel 1.195-2) does not seem to react to SIGINT for whatever reason, causing prefix.pl to hang after test_lwt_seg6local.sh exits due to netcat inheriting test_lwt_seg6local.sh's file descriptors. Fix by using SIGTERM instead. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200720101810.84299-1-iii@linux.ibm.com	2020-07-21 13:26:26 -07:00
Alexei Starovoitov	495436c1f9	Merge branch 'compressed-JITed-insn' Luke Nelson says: ==================== his patch series enables using compressed riscv (RVC) instructions in the rv64 BPF JIT. RVC is a standard riscv extension that adds a set of compressed, 2-byte instructions that can replace some regular 4-byte instructions for improved code density. This series first modifies the JIT to support using 2-byte instructions (e.g., in jump offset computations), then adds RVC encoding and helper functions, and finally uses the helper functions to optimize the rv64 JIT. I used our formal verification framework, Serval, to verify the correctness of the RVC encodings and their uses in the rv64 JIT. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra RFC -> v1: - From Björn Töpel: * Changed RVOFF macro to static inline "ninsns_rvoff". * Changed return type of rvc_ functions from u32 to u16. * Changed sizeof(u16) to sizeof(ctx->insns). Factored unsigned immediate checks into helper functions (is_8b_uint, etc.) * Changed to use IS_ENABLED instead of #ifdef to check if RVC is enabled. * Changed type of immediate arguments to rvc_* encoding to u32 to avoid issues from promotion of u16 to signed int. * Cleaned up RVC checks in emit_{addi,slli,srli,srai}. + Wrapped lines at 100 instead of 80 columns for increased clarity. + Move !imm checks into each branch instead of checking separately. + Strengthed checks for c.{slli,srli,srai} to check that imm < XLEN. Otherwise, imm could be non-zero but the lower XLEN bits could all be zero, leading to invalid RVC encoding. * Changed emit_imm to sign-extend the 12-bit value in "lower" + The immediate checks for emit_{addiw,li,addi} use signed comparisons, so this enables the RVC variants to be used more often (e.g., if val == -1, then lower should be -1 as opposed to 4095). ==================== Reviewed-by: Björn Töpel <bjorn.topel@gmail.com> Acked-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2020-07-21 13:26:26 -07:00
YueHaibing	956fcfcd35	tools/bpftool: Fix error handing in do_skeleton() Fix pass 0 to PTR_ERR, also dump more err info using libbpf_strerror. Fixes: 5dc7a8b21144 ("bpftool, selftests/bpf: Embed object file inside skeleton") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20200717123059.29624-1-yuehaibing@huawei.com	2020-07-21 13:26:25 -07:00
Luke Nelson	18a4d8c97b	bpf, riscv: Use compressed instructions in the rv64 JIT This patch uses the RVC support and encodings from bpf_jit.h to optimize the rv64 jit. The optimizations work by replacing emit(rv_X(...)) with a call to a helper function emit_X, which will emit a compressed version of the instruction when possible, and when RVC is enabled. The JIT continues to pass all tests in lib/test_bpf.c, and introduces no new failures to test_verifier; both with and without RVC being enabled. Most changes are straightforward replacements of emit(rv_X(...), ctx) with emit_X(..., ctx), with the following exceptions bearing mention; * Change emit_imm to sign-extend the value in "lower", since the checks for RVC (and the instructions themselves) treat the value as signed. Otherwise, small negative immediates will not be recognized as encodable using an RVC instruction. For example, without this change, emit_imm(rd, -1, ctx) would cause lower to become 4095, which is not a 6b int even though a "c.li rd, -1" instruction suffices. * For {BPF_MOV,BPF_ADD} BPF_X, drop using addiw,addw in the 32-bit cases since the values are zero-extended into the upper 32 bits in the following instructions anyways, and the addition commutes with zero-extension. (BPF_SUB BPF_X must still use subw since subtraction does not commute with zero-extension.) This patch avoids optimizing branches and jumps to use RVC instructions since surrounding code often makes assumptions about the sizes of emitted instructions. Optimizing these will require changing these functions (e.g., emit_branch) to dynamically compute jump offsets. The following are examples of the JITed code for the verifier selftest "direct packet read test#3 for CGROUP_SKB OK", without and with RVC enabled, respectively. The former uses 178 bytes, and the latter uses 112, for a ~37% reduction in code size for this example. Without RVC: 0: 02000813 addi a6,zero,32 4: fd010113 addi sp,sp,-48 8: 02813423 sd s0,40(sp) c: 02913023 sd s1,32(sp) 10: 01213c23 sd s2,24(sp) 14: 01313823 sd s3,16(sp) 18: 01413423 sd s4,8(sp) 1c: 03010413 addi s0,sp,48 20: 03056683 lwu a3,48(a0) 24: 02069693 slli a3,a3,0x20 28: 0206d693 srli a3,a3,0x20 2c: 03456703 lwu a4,52(a0) 30: 02071713 slli a4,a4,0x20 34: 02075713 srli a4,a4,0x20 38: 03856483 lwu s1,56(a0) 3c: 02049493 slli s1,s1,0x20 40: 0204d493 srli s1,s1,0x20 44: 03c56903 lwu s2,60(a0) 48: 02091913 slli s2,s2,0x20 4c: 02095913 srli s2,s2,0x20 50: 04056983 lwu s3,64(a0) 54: 02099993 slli s3,s3,0x20 58: 0209d993 srli s3,s3,0x20 5c: 09056a03 lwu s4,144(a0) 60: 020a1a13 slli s4,s4,0x20 64: 020a5a13 srli s4,s4,0x20 68: 00900313 addi t1,zero,9 6c: 006a7463 bgeu s4,t1,0x74 70: 00000a13 addi s4,zero,0 74: 02d52823 sw a3,48(a0) 78: 02e52a23 sw a4,52(a0) 7c: 02952c23 sw s1,56(a0) 80: 03252e23 sw s2,60(a0) 84: 05352023 sw s3,64(a0) 88: 00000793 addi a5,zero,0 8c: 02813403 ld s0,40(sp) 90: 02013483 ld s1,32(sp) 94: 01813903 ld s2,24(sp) 98: 01013983 ld s3,16(sp) 9c: 00813a03 ld s4,8(sp) a0: 03010113 addi sp,sp,48 a4: 00078513 addi a0,a5,0 a8: 00008067 jalr zero,0(ra) With RVC: 0: 02000813 addi a6,zero,32 4: 7179 c.addi16sp sp,-48 6: f422 c.sdsp s0,40(sp) 8: f026 c.sdsp s1,32(sp) a: ec4a c.sdsp s2,24(sp) c: e84e c.sdsp s3,16(sp) e: e452 c.sdsp s4,8(sp) 10: 1800 c.addi4spn s0,sp,48 12: 03056683 lwu a3,48(a0) 16: 1682 c.slli a3,0x20 18: 9281 c.srli a3,0x20 1a: 03456703 lwu a4,52(a0) 1e: 1702 c.slli a4,0x20 20: 9301 c.srli a4,0x20 22: 03856483 lwu s1,56(a0) 26: 1482 c.slli s1,0x20 28: 9081 c.srli s1,0x20 2a: 03c56903 lwu s2,60(a0) 2e: 1902 c.slli s2,0x20 30: 02095913 srli s2,s2,0x20 34: 04056983 lwu s3,64(a0) 38: 1982 c.slli s3,0x20 3a: 0209d993 srli s3,s3,0x20 3e: 09056a03 lwu s4,144(a0) 42: 1a02 c.slli s4,0x20 44: 020a5a13 srli s4,s4,0x20 48: 4325 c.li t1,9 4a: 006a7363 bgeu s4,t1,0x50 4e: 4a01 c.li s4,0 50: d914 c.sw a3,48(a0) 52: d958 c.sw a4,52(a0) 54: dd04 c.sw s1,56(a0) 56: 03252e23 sw s2,60(a0) 5a: 05352023 sw s3,64(a0) 5e: 4781 c.li a5,0 60: 7422 c.ldsp s0,40(sp) 62: 7482 c.ldsp s1,32(sp) 64: 6962 c.ldsp s2,24(sp) 66: 69c2 c.ldsp s3,16(sp) 68: 6a22 c.ldsp s4,8(sp) 6a: 6145 c.addi16sp sp,48 6c: 853e c.mv a0,a5 6e: 8082 c.jr ra Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com	2020-07-21 13:26:25 -07:00
Ian Rogers	da7a35062b	libbpf bpf_helpers: Use __builtin_offsetof for offsetof The non-builtin route for offsetof has a dependency on size_t from stdlib.h/stdint.h that is undeclared and may break targets. The offsetof macro in bpf_helpers may disable the same macro in other headers that have a #ifdef offsetof guard. Rather than add additional dependencies improve the offsetof macro declared here to use the builtin that is available since llvm 3.7 (the first with a BPF backend). Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200720061741.1514673-1-irogers@google.com	2020-07-21 13:26:25 -07:00
Luke Nelson	804ec72c68	bpf, riscv: Add encodings for compressed instructions This patch adds functions for encoding and emitting compressed riscv (RVC) instructions to the BPF JIT. Some regular riscv instructions can be compressed into an RVC instruction if the instruction fields meet some requirements. For example, "add rd, rs1, rs2" can be compressed into "c.add rd, rs2" when rd == rs1. To make using RVC encodings simpler, this patch also adds helper functions that selectively emit either a regular instruction or a compressed instruction if possible. For example, emit_add will produce a "c.add" if possible and regular "add" otherwise. Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200721025241.8077-3-luke.r.nels@gmail.com	2020-07-21 13:26:25 -07:00
Ilya Leoshkevich	94ad428df5	s390/bpf: Use bpf_skip() in bpf_jit_prologue() Now that we have bpf_skip() for emitting nops, use it in bpf_jit_prologue() in order to reduce code duplication. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200717165326.6786-6-iii@linux.ibm.com	2020-07-21 13:26:25 -07:00
Luke Nelson	bfabff3cb0	bpf, riscv: Modify JIT ctx to support compressed instructions This patch makes the necessary changes to struct rv_jit_context and to bpf_int_jit_compile to support compressed riscv (RVC) instructions in the BPF JIT. It changes the JIT image to be u16 instead of u32, since RVC instructions are 2 bytes as opposed to 4. It also changes ctx->offset and ctx->ninsns to refer to 2-byte instructions rather than 4-byte ones. The riscv PC is required to be 16-bit aligned with or without RVC, so this is sufficient to refer to any valid riscv offset. The code for computing jump offsets in bytes is updated accordingly, and factored into a new "ninsns_rvoff" function to simplify the code. Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200721025241.8077-2-luke.r.nels@gmail.com	2020-07-21 13:26:25 -07:00
Ilya Leoshkevich	1491b73311	s390/bpf: Tolerate not converging code shrinking "BPF_MAXINSNS: Maximum possible literals" unnecessarily falls back to the interpreter because of failing sanity check in bpf_set_addr. The problem is that there are a lot of branches that can be shrunk, and doing so opens up the possibility to shrink even more. This process does not converge after 3 passes, causing code offsets to change during the codegen pass, which must never happen. Fix by inserting nops during codegen pass in order to preserve code offets. Fixes: 4e9b4a6883dd ("s390/bpf: Use relative long branches") Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200717165326.6786-5-iii@linux.ibm.com	2020-07-21 13:26:25 -07:00
Ilya Leoshkevich	5fa6974471	s390/bpf: Use brcl for jumping to exit_ip if necessary "BPF_MAXINSNS: Maximum possible literals" test causes panic with bpf_jit_harden = 2. The reason is that BPF_JMP \| BPF_EXIT is always emitted as brc, however, after removal of JITed image size limitations, brcl might be required. Fix by using brcl when necessary. Fixes: 4e9b4a6883dd ("s390/bpf: Use relative long branches") Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200717165326.6786-4-iii@linux.ibm.com	2020-07-21 13:26:25 -07:00
Ilya Leoshkevich	7477d43be5	s390/bpf: Fix sign extension in branch_ku Both signed and unsigned variants of BPF_JMP \| BPF_K require sign-extending the immediate. JIT emits cgfi for the signed case, which is correct, and clgfi for the unsigned case, which is not correct: clgfi zero-extends the immediate. s390 does not provide an instruction that does sign-extension and unsigned comparison at the same time. Therefore, fix by first loading the sign-extended immediate into work register REG_1 and proceeding as if it's BPF_X. Fixes: 4e9b4a6883dd ("s390/bpf: Use relative long branches") Reported-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Tested-by: Seth Forshee <seth.forshee@canonical.com> Link: https://lore.kernel.org/bpf/20200717165326.6786-3-iii@linux.ibm.com	2020-07-21 13:26:24 -07:00
Ilya Leoshkevich	2ea4859807	selftests: bpf: test_kmod.sh: Fix running out of srctree When running out of srctree, relative path to lib/test_bpf.ko is different than when running in srctree. Check $building_out_of_srctree environment variable and use a different relative path if needed. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200717165326.6786-2-iii@linux.ibm.com	2020-07-21 13:26:24 -07:00
Lorenzo Bianconi	c576b9c77b	bpf: cpumap: Fix possible rcpu kthread hung Fix the following cpumap kthread hung. The issue is currently occurring when __cpu_map_load_bpf_program fails (e.g if the bpf prog has not BPF_XDP_CPUMAP as expected_attach_type) $./test_progs -n 101 101/1 cpumap_with_progs:OK 101 xdp_cpumap_attach:OK Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED [ 369.996478] INFO: task cpumap/0/map:7:205 blocked for more than 122 seconds. [ 369.998463] Not tainted 5.8.0-rc4-01472-ge57892f50a07 #212 [ 370.000102] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 370.001918] cpumap/0/map:7 D 0 205 2 0x00004000 [ 370.003228] Call Trace: [ 370.003930] __schedule+0x5c7/0xf50 [ 370.004901] ? io_schedule_timeout+0xb0/0xb0 [ 370.005934] ? static_obj+0x31/0x80 [ 370.006788] ? mark_held_locks+0x24/0x90 [ 370.007752] ? cpu_map_bpf_prog_run_xdp+0x6c0/0x6c0 [ 370.008930] schedule+0x6f/0x160 [ 370.009728] schedule_preempt_disabled+0x14/0x20 [ 370.010829] kthread+0x17b/0x240 [ 370.011433] ? kthread_create_worker_on_cpu+0xd0/0xd0 [ 370.011944] ret_from_fork+0x1f/0x30 [ 370.012348] Showing all locks held in the system: [ 370.013025] 1 lock held by khungtaskd/33: [ 370.013432] #0: ffffffff82b24720 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x28/0x1c3 [ 370.014461] ============================================= Fixes: 9216477449f3 ("bpf: cpumap: Add the possibility to attach an eBPF program to cpumap") Reported-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Tested-by: Jakub Sitnicki <jakub@cloudflare.com> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/e54f2aabf959f298939e5507b09c48f8c2e380be.1595170625.git.lorenzo@kernel.org	2020-07-21 09:15:28 -07:00
Jakub Sitnicki	343ead287d	bpf, netns: Fix build without CONFIG_INET When CONFIG_NET is set but CONFIG_INET isn't, build fails with: ld: kernel/bpf/net_namespace.o: in function `netns_bpf_attach_type_unneed': kernel/bpf/net_namespace.c:32: undefined reference to `bpf_sk_lookup_enabled' ld: kernel/bpf/net_namespace.o: in function `netns_bpf_attach_type_need': kernel/bpf/net_namespace.c:43: undefined reference to `bpf_sk_lookup_enabled' This is because without CONFIG_INET bpf_sk_lookup_enabled symbol is not available. Wrap references to bpf_sk_lookup_enabled with preprocessor conditionals. Fixes: 1559b4aa1db4 ("inet: Run SK_LOOKUP BPF program on socket lookup") Reported-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Link: https://lore.kernel.org/bpf/20200721100716.720477-1-jakub@cloudflare.com	2020-07-21 09:12:34 -07:00
Alexei Starovoitov	e57892f50a	Merge branch 'bpf-socket-lookup' Jakub Sitnicki says: ==================== Changelog ========= v4 -> v5: - Enforce BPF prog return value to be SK_DROP or SK_PASS. (Andrii) - Simplify prog runners now that only SK_DROP/PASS can be returned. - Enable bpf_perf_event_output from the start. (Andrii) - Drop patch "selftests/bpf: Rename test_sk_lookup_kern.c to test_ref_track_kern.c" - Remove tests for narrow loads from context at an offset wider in size than target field, while we are discussing how to fix it: https://lore.kernel.org/bpf/20200710173123.427983-1-jakub@cloudflare.com/ - Rebase onto recent bpf-next (bfdfa51702de) - Other minor changes called out in per-patch changelogs, see patches: 2, 4, 6, 13-15 - Carried over Andrii's Acks where nothing changed. v3 -> v4: - Reduce BPF prog return codes to SK_DROP/SK_PASS (Lorenz) - Default to drop on illegal return value from BPF prog (Lorenz) - Extend bpf_sk_assign to accept NULL socket pointer. - Switch to saner return values and add docs for new prog_array API (Andrii) - Add support for narrow loads from BPF context fields (Yonghong) - Fix broken build when IPv6 is compiled as a module (kernel test robot) - Fix null/wild-ptr-deref on BPF context access - Rebase to recent bpf-next (eef8a42d6ce0) - Other minor changes called out in per-patch changelogs, see patches 1-2, 4, 6, 8, 10-12, 14, 16 v2 -> v3: - Switch to link-based program attachment - Support for multi-prog attachment - Ability to skip reuseport socket selection - Code on RX path is guarded by a static key - struct in6_addr's are no longer copied into BPF prog context - BPF prog context is initialized as late as possible - Changes called out in patches 1-2, 4, 6, 8, 10-14, 16 - Patches dropped: 01/17 flow_dissector: Extract attach/detach/query helpers 03/17 inet: Store layer 4 protocol in inet_hashinfo 08/17 udp: Store layer 4 protocol in udp_table v1 -> v2: - Changes called out in patches 2, 13-15, 17 - Rebase to recent bpf-next (b4563facdcae) RFCv2 -> v1: - Switch to fetching a socket from a map and selecting a socket with bpf_sk_assign, instead of having a dedicated helper that does both. - Run reuseport logic on sockets selected by BPF sk_lookup. - Allow BPF sk_lookup to fail the lookup with no match. - Go back to having just 2 hash table lookups in UDP. RFCv1 -> RFCv2: - Make socket lookup redirection map-based. BPF program now uses a dedicated helper and a SOCKARRAY map to select the socket to redirect to. A consequence of this change is that bpf_inet_lookup context is now read-only. - Look for connected UDP sockets before allowing redirection from BPF. This makes connected UDP socket work as expected in the presence of inet_lookup prog. - Share the code for BPF_PROG_{ATTACH,DETACH,QUERY} with flow_dissector, the only other per-netns BPF prog type. Overview ======== This series proposes a new BPF program type named BPF_PROG_TYPE_SK_LOOKUP, or BPF sk_lookup for short. BPF sk_lookup program runs when transport layer is looking up a listening socket for a new connection request (TCP), or when looking up an unconnected socket for a packet (UDP). This serves as a mechanism to overcome the limits of what bind() API allows to express. Two use-cases driving this work are: (1) steer packets destined to an IP range, fixed port to a single socket 192.0.2.0/24, port 80 -> NGINX socket (2) steer packets destined to an IP address, any port to a single socket 198.51.100.1, any port -> L7 proxy socket In its context, program receives information about the packet that triggered the socket lookup. Namely IP version, L4 protocol identifier, and address 4-tuple. To select a socket BPF program fetches it from a map holding socket references, like SOCKMAP or SOCKHASH, calls bpf_sk_assign(ctx, sk, ...) helper to record the selection, and returns SK_PASS code. Transport layer then uses the selected socket as a result of socket lookup. Alternatively, program can also fail the lookup (SK_DROP), or let the lookup continue as usual (SK_PASS without selecting a socket). This lets the user match packets with listening (TCP) or receiving (UDP) sockets freely at the last possible point on the receive path, where we know that packets are destined for local delivery after undergoing policing, filtering, and routing. Program is attached to a network namespace, similar to BPF flow_dissector. We add a new attach type, BPF_SK_LOOKUP, for this. Multiple programs can be attached at the same time, in which case their return values are aggregated according the rules outlined in patch #4 description. Series structure ================ Patches are organized as so: 1: enables multiple link-based prog attachments for bpf-netns 2: introduces sk_lookup program type 3-4: hook up the program to run on ipv4/tcp socket lookup 5-6: hook up the program to run on ipv6/tcp socket lookup 7-8: hook up the program to run on ipv4/udp socket lookup 9-10: hook up the program to run on ipv6/udp socket lookup 11-13: libbpf & bpftool support for sk_lookup 14-15: verifier and selftests for sk_lookup Patches are also available on GH: https://github.com/jsitnicki/linux/commits/bpf-inet-lookup-v5 Follow-up work ============== I'll follow up with below items, which IMHO don't block the review: - benchmark results for udp6 small packet flood scenario, - user docs for new BPF prog type, Documentation/bpf/prog_sk_lookup.rst, - timeout for accept() in tests after extending network_helper.[ch]. Thanks to the reviewers for their feedback to this patch series: Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andriin@fb.com> Cc: Lorenz Bauer <lmb@cloudflare.com> Cc: Marek Majkowski <marek@cloudflare.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Yonghong Song <yhs@fb.com> -jkbs [RFCv1] https://lore.kernel.org/bpf/20190618130050.8344-1-jakub@cloudflare.com/ [RFCv2] https://lore.kernel.org/bpf/20190828072250.29828-1-jakub@cloudflare.com/ [v1] https://lore.kernel.org/bpf/20200511185218.1422406-18-jakub@cloudflare.com/ [v2] https://lore.kernel.org/bpf/20200506125514.1020829-1-jakub@cloudflare.com/ [v3] https://lore.kernel.org/bpf/20200702092416.11961-1-jakub@cloudflare.com/ [v4] https://lore.kernel.org/bpf/20200713174654.642628-1-jakub@cloudflare.com/ ==================== Reviewed-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2020-07-17 20:19:32 -07:00
Jakub Sitnicki	0ab5539f85	selftests/bpf: Tests for BPF_SK_LOOKUP attach point Add tests to test_progs that exercise: - attaching/detaching/querying programs to BPF_SK_LOOKUP hook, - redirecting socket lookup to a socket selected by BPF program, - failing a socket lookup on BPF program's request, - error scenarios for selecting a socket from BPF program, - accessing BPF program context, - attaching and running multiple BPF programs. Run log: bash-5.0# ./test_progs -n 70 #70/1 query lookup prog:OK #70/2 TCP IPv4 redir port:OK #70/3 TCP IPv4 redir addr:OK #70/4 TCP IPv4 redir with reuseport:OK #70/5 TCP IPv4 redir skip reuseport:OK #70/6 TCP IPv6 redir port:OK #70/7 TCP IPv6 redir addr:OK #70/8 TCP IPv4->IPv6 redir port:OK #70/9 TCP IPv6 redir with reuseport:OK #70/10 TCP IPv6 redir skip reuseport:OK #70/11 UDP IPv4 redir port:OK #70/12 UDP IPv4 redir addr:OK #70/13 UDP IPv4 redir with reuseport:OK #70/14 UDP IPv4 redir skip reuseport:OK #70/15 UDP IPv6 redir port:OK #70/16 UDP IPv6 redir addr:OK #70/17 UDP IPv4->IPv6 redir port:OK #70/18 UDP IPv6 redir and reuseport:OK #70/19 UDP IPv6 redir skip reuseport:OK #70/20 TCP IPv4 drop on lookup:OK #70/21 TCP IPv6 drop on lookup:OK #70/22 UDP IPv4 drop on lookup:OK #70/23 UDP IPv6 drop on lookup:OK #70/24 TCP IPv4 drop on reuseport:OK #70/25 TCP IPv6 drop on reuseport:OK #70/26 UDP IPv4 drop on reuseport:OK #70/27 TCP IPv6 drop on reuseport:OK #70/28 sk_assign returns EEXIST:OK #70/29 sk_assign honors F_REPLACE:OK #70/30 sk_assign accepts NULL socket:OK #70/31 access ctx->sk:OK #70/32 narrow access to ctx v4:OK #70/33 narrow access to ctx v6:OK #70/34 sk_assign rejects TCP established:OK #70/35 sk_assign rejects UDP connected:OK #70/36 multi prog - pass, pass:OK #70/37 multi prog - drop, drop:OK #70/38 multi prog - pass, drop:OK #70/39 multi prog - drop, pass:OK #70/40 multi prog - pass, redir:OK #70/41 multi prog - redir, pass:OK #70/42 multi prog - drop, redir:OK #70/43 multi prog - redir, drop:OK #70/44 multi prog - redir, redir:OK #70 sk_lookup:OK Summary: 1/44 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200717103536.397595-16-jakub@cloudflare.com	2020-07-17 20:18:17 -07:00
Jakub Sitnicki	f7726cbea4	selftests/bpf: Add verifier tests for bpf_sk_lookup context access Exercise verifier access checks for bpf_sk_lookup context fields. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200717103536.397595-15-jakub@cloudflare.com	2020-07-17 20:18:17 -07:00
Jakub Sitnicki	93a3545d81	tools/bpftool: Add name mappings for SK_LOOKUP prog and attach type Make bpftool show human-friendly identifiers for newly introduced program and attach type, BPF_PROG_TYPE_SK_LOOKUP and BPF_SK_LOOKUP, respectively. Also, add the new prog type bash-completion, man page and help message. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200717103536.397595-14-jakub@cloudflare.com	2020-07-17 20:18:17 -07:00
Jakub Sitnicki	499dd29d90	libbpf: Add support for SK_LOOKUP program type Make libbpf aware of the newly added program type, and assign it a section name. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200717103536.397595-13-jakub@cloudflare.com	2020-07-17 20:18:17 -07:00
Jakub Sitnicki	a352b32ae9	bpf: Sync linux/bpf.h to tools/ Newly added program, context type and helper is used by tests in a subsequent patch. Synchronize the header file. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200717103536.397595-12-jakub@cloudflare.com	2020-07-17 20:18:17 -07:00
Jakub Sitnicki	6d4201b138	udp6: Run SK_LOOKUP BPF program on socket lookup Same as for udp4, let BPF program override the socket lookup result, by selecting a receiving socket of its choice or failing the lookup, if no connected UDP socket matched packet 4-tuple. Suggested-by: Marek Majkowski <marek@cloudflare.com> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200717103536.397595-11-jakub@cloudflare.com	2020-07-17 20:18:17 -07:00
Jakub Sitnicki	2a08748cd3	udp6: Extract helper for selecting socket from reuseport group Prepare for calling into reuseport from __udp6_lib_lookup as well. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200717103536.397595-10-jakub@cloudflare.com	2020-07-17 20:18:17 -07:00
Jakub Sitnicki	72f7e9440e	udp: Run SK_LOOKUP BPF program on socket lookup Following INET/TCP socket lookup changes, modify UDP socket lookup to let BPF program select a receiving socket before searching for a socket by destination address and port as usual. Lookup of connected sockets that match packet 4-tuple is unaffected by this change. BPF program runs, and potentially overrides the lookup result, only if a 4-tuple match was not found. Suggested-by: Marek Majkowski <marek@cloudflare.com> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200717103536.397595-9-jakub@cloudflare.com	2020-07-17 20:18:17 -07:00
Jakub Sitnicki	7629c73a14	udp: Extract helper for selecting socket from reuseport group Prepare for calling into reuseport from __udp4_lib_lookup as well. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200717103536.397595-8-jakub@cloudflare.com	2020-07-17 20:18:17 -07:00
Jakub Sitnicki	1122702f02	inet6: Run SK_LOOKUP BPF program on socket lookup Following ipv4 stack changes, run a BPF program attached to netns before looking up a listening socket. Program can return a listening socket to use as result of socket lookup, fail the lookup, or take no action. Suggested-by: Marek Majkowski <marek@cloudflare.com> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200717103536.397595-7-jakub@cloudflare.com	2020-07-17 20:18:17 -07:00
Jakub Sitnicki	5df6531292	inet6: Extract helper for selecting socket from reuseport group Prepare for calling into reuseport from inet6_lookup_listener as well. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200717103536.397595-6-jakub@cloudflare.com	2020-07-17 20:18:17 -07:00
Jakub Sitnicki	1559b4aa1d	inet: Run SK_LOOKUP BPF program on socket lookup Run a BPF program before looking up a listening socket on the receive path. Program selects a listening socket to yield as result of socket lookup by calling bpf_sk_assign() helper and returning SK_PASS code. Program can revert its decision by assigning a NULL socket with bpf_sk_assign(). Alternatively, BPF program can also fail the lookup by returning with SK_DROP, or let the lookup continue as usual with SK_PASS on return, when no socket has been selected with bpf_sk_assign(). This lets the user match packets with listening sockets freely at the last possible point on the receive path, where we know that packets are destined for local delivery after undergoing policing, filtering, and routing. With BPF code selecting the socket, directing packets destined to an IP range or to a port range to a single socket becomes possible. In case multiple programs are attached, they are run in series in the order in which they were attached. The end result is determined from return codes of all the programs according to following rules: 1. If any program returned SK_PASS and selected a valid socket, the socket is used as result of socket lookup. 2. If more than one program returned SK_PASS and selected a socket, last selection takes effect. 3. If any program returned SK_DROP, and no program returned SK_PASS and selected a socket, socket lookup fails with -ECONNREFUSED. 4. If all programs returned SK_PASS and none of them selected a socket, socket lookup continues to htable-based lookup. Suggested-by: Marek Majkowski <marek@cloudflare.com> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200717103536.397595-5-jakub@cloudflare.com	2020-07-17 20:18:16 -07:00
Jakub Sitnicki	80b373f74f	inet: Extract helper for selecting socket from reuseport group Prepare for calling into reuseport from __inet_lookup_listener as well. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200717103536.397595-4-jakub@cloudflare.com	2020-07-17 20:18:16 -07:00
Jakub Sitnicki	e9ddbb7707	bpf: Introduce SK_LOOKUP program type with a dedicated attach point Add a new program type BPF_PROG_TYPE_SK_LOOKUP with a dedicated attach type BPF_SK_LOOKUP. The new program kind is to be invoked by the transport layer when looking up a listening socket for a new connection request for connection oriented protocols, or when looking up an unconnected socket for a packet for connection-less protocols. When called, SK_LOOKUP BPF program can select a socket that will receive the packet. This serves as a mechanism to overcome the limits of what bind() API allows to express. Two use-cases driving this work are: (1) steer packets destined to an IP range, on fixed port to a socket 192.0.2.0/24, port 80 -> NGINX socket (2) steer packets destined to an IP address, on any port to a socket 198.51.100.1, any port -> L7 proxy socket In its run-time context program receives information about the packet that triggered the socket lookup. Namely IP version, L4 protocol identifier, and address 4-tuple. Context can be further extended to include ingress interface identifier. To select a socket BPF program fetches it from a map holding socket references, like SOCKMAP or SOCKHASH, and calls bpf_sk_assign(ctx, sk, ...) helper to record the selection. Transport layer then uses the selected socket as a result of socket lookup. In its basic form, SK_LOOKUP acts as a filter and hence must return either SK_PASS or SK_DROP. If the program returns with SK_PASS, transport should look for a socket to receive the packet, or use the one selected by the program if available, while SK_DROP informs the transport layer that the lookup should fail. This patch only enables the user to attach an SK_LOOKUP program to a network namespace. Subsequent patches hook it up to run on local delivery path in ipv4 and ipv6 stacks. Suggested-by: Marek Majkowski <marek@cloudflare.com> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200717103536.397595-3-jakub@cloudflare.com	2020-07-17 20:18:16 -07:00
Jakub Sitnicki	ce3aa9cc51	bpf, netns: Handle multiple link attachments Extend the BPF netns link callbacks to rebuild (grow/shrink) or update the prog_array at given position when link gets attached/updated/released. This let's us lift the limit of having just one link attached for the new attach type introduced by subsequent patch. No functional changes intended. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200717103536.397595-2-jakub@cloudflare.com	2020-07-17 20:18:16 -07:00
Randy Dunlap	bfdfa51702	bpf: Drop duplicated words in uapi helper comments Drop doubled words "will" and "attach". Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/6b9f71ae-4f8e-0259-2c5d-187ddaefe6eb@infradead.org	2020-07-16 21:00:09 +02:00
Stanislav Fomichev	e81e7a5337	selftests/bpf: Fix possible hang in sockopt_inherit Andrii reported that sockopt_inherit occasionally hangs up on 5.5 kernel [0]. This can happen if server_thread runs faster than the main thread. In that case, pthread_cond_wait will wait forever because pthread_cond_signal was executed before the main thread was blocking. Let's move pthread_mutex_lock up a bit to make sure server_thread runs strictly after the main thread goes to sleep. (Not sure why this is 5.5 specific, maybe scheduling is less deterministic? But I was able to confirm that it does indeed happen in a VM.) [0] https://lore.kernel.org/bpf/CAEf4BzY0-bVNHmCkMFPgObs=isUAyg-dFzGDY7QWYkmm7rmTSg@mail.gmail.com/ Reported-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200715224107.3591967-1-sdf@google.com	2020-07-16 20:57:09 +02:00
Seth Forshee	de40a8abf0	bpf: revert "test_bpf: Flag tests that cannot be jited on s390" This reverts commit 3203c9010060 ("test_bpf: flag tests that cannot be jited on s390"). The s390 bpf JIT previously had a restriction on the maximum program size, which required some tests in test_bpf to be flagged as expected failures. The program size limitation has been removed, and the tests now pass, so these tests should no longer be flagged. Fixes: d1242b10ff03 ("s390/bpf: Remove JITed image size limitations") Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/bpf/20200716143931.330122-1-seth.forshee@canonical.com	2020-07-16 20:52:43 +02:00
Lorenzo Bianconi	0550012502	selftest: Add tests for XDP programs in CPUMAP entries Similar to what have been done for DEVMAP, introduce tests to verify ability to add a XDP program to an entry in a CPUMAP. Verify CPUMAP programs can not be attached to devices as a normal XDP program, and only programs with BPF_XDP_CPUMAP attach type can be loaded in a CPUMAP. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/9c632fcea5382ea7b4578bd06b6eddf382c3550b.1594734381.git.lorenzo@kernel.org	2020-07-16 17:00:32 +02:00
Lorenzo Bianconi	ce4dade7f1	samples/bpf: xdp_redirect_cpu: Load a eBPF program on cpumap Extend xdp_redirect_cpu_{usr,kern}.c adding the possibility to load a XDP program on cpumap entries. The following options have been added: - mprog-name: cpumap entry program name - mprog-filename: cpumap entry program filename - redirect-device: output interface if the cpumap program performs a XDP_REDIRECT to an egress interface - redirect-map: bpf map used to perform XDP_REDIRECT to an egress interface - mprog-disable: disable loading XDP program on cpumap entries Add xdp_pass, xdp_drop, xdp_redirect stats accounting Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/aa5a9a281b9dac425620fdabe82670ffb6bbdb92.1594734381.git.lorenzo@kernel.org	2020-07-16 17:00:32 +02:00
Lorenzo Bianconi	4be556cf5a	libbpf: Add SEC name for xdp programs attached to CPUMAP As for DEVMAP, support SEC("xdp_cpumap/") as a short cut for loading the program with type BPF_PROG_TYPE_XDP and expected attach type BPF_XDP_CPUMAP. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/33174c41993a6d860d9c7c1f280a2477ee39ed11.1594734381.git.lorenzo@kernel.org	2020-07-16 17:00:32 +02:00
Lorenzo Bianconi	28b1520ebf	bpf: cpumap: Implement XDP_REDIRECT for eBPF programs attached to map entries Introduce XDP_REDIRECT support for eBPF programs attached to cpumap entries. This patch has been tested on Marvell ESPRESSObin using a modified version of xdp_redirect_cpu sample in order to attach a XDP program to CPUMAP entries to perform a redirect on the mvneta interface. In particular the following scenario has been tested: rq (cpu0) --> mvneta - XDP_REDIRECT (cpu0) --> CPUMAP - XDP_REDIRECT (cpu1) --> mvneta $./xdp_redirect_cpu -p xdp_cpu_map0 -d eth0 -c 1 -e xdp_redirect \ -f xdp_redirect_kern.o -m tx_port -r eth0 tx: 285.2 Kpps rx: 285.2 Kpps Attaching a simple XDP program on eth0 to perform XDP_TX gives comparable results: tx: 288.4 Kpps rx: 288.4 Kpps Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/bpf/2cf8373a731867af302b00c4ff16c122630c4980.1594734381.git.lorenzo@kernel.org	2020-07-16 17:00:32 +02:00
Lorenzo Bianconi	9216477449	bpf: cpumap: Add the possibility to attach an eBPF program to cpumap Introduce the capability to attach an eBPF program to cpumap entries. The idea behind this feature is to add the possibility to define on which CPU run the eBPF program if the underlying hw does not support RSS. Current supported verdicts are XDP_DROP and XDP_PASS. This patch has been tested on Marvell ESPRESSObin using xdp_redirect_cpu sample available in the kernel tree to identify possible performance regressions. Results show there are no observable differences in packet-per-second: $./xdp_redirect_cpu --progname xdp_cpu_map0 --dev eth0 --cpu 1 rx: 354.8 Kpps rx: 356.0 Kpps rx: 356.8 Kpps rx: 356.3 Kpps rx: 356.6 Kpps rx: 356.6 Kpps rx: 356.7 Kpps rx: 355.8 Kpps rx: 356.8 Kpps rx: 356.8 Kpps Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/bpf/5c9febdf903d810b3415732e5cd98491d7d9067a.1594734381.git.lorenzo@kernel.org	2020-07-16 17:00:32 +02:00
Lorenzo Bianconi	644bfe51fa	cpumap: Formalize map value as a named struct As it has been already done for devmap, introduce 'struct bpf_cpumap_val' to formalize the expected values that can be passed in for a CPUMAP. Update cpumap code to use the struct. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/bpf/754f950674665dae6139c061d28c1d982aaf4170.1594734381.git.lorenzo@kernel.org	2020-07-16 17:00:32 +02:00
Lorenzo Bianconi	a4e76f1bda	samples/bpf: xdp_redirect_cpu_user: Do not update bpf maps in option loop Do not update xdp_redirect_cpu maps running while option loop but defer it after all available options have been parsed. This is a preliminary patch to pass the program name we want to attach to the map entries as a user option Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/bpf/95dc46286fd2c609042948e04bb7ae1f5b425538.1594734381.git.lorenzo@kernel.org	2020-07-16 17:00:31 +02:00
David Ahern	daa5cdc3fd	net: Refactor xdp_convert_buff_to_frame Move the guts of xdp_convert_buff_to_frame to a new helper, xdp_update_frame_from_buff so it can be reused removing code duplication Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com> Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David Ahern <dsahern@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/bpf/90a68c283d7ebeb48924934c9b7ac79492300472.1594734381.git.lorenzo@kernel.org	2020-07-16 17:00:31 +02:00
Jesper Dangaard Brouer	9b74ebb2b0	cpumap: Use non-locked version __ptr_ring_consume_batched Commit 77361825bb01 ("bpf: cpumap use ptr_ring_consume_batched") changed away from using single frame ptr_ring dequeue (__ptr_ring_consume) to consume a batched, but it uses a locked version, which as the comment explain isn't needed. Change to use the non-locked version __ptr_ring_consume_batched. Fixes: 77361825bb01 ("bpf: cpumap use ptr_ring_consume_batched") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/a9c7d06f9a009e282209f0c8c7b2c5d9b9ad60b9.1594734381.git.lorenzo@kernel.org	2020-07-16 17:00:31 +02:00
Randy Dunlap	59632b220f	net: ipv6: drop duplicate word in comment Drop the doubled word "by" in a comment. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-07-15 20:34:11 -07:00
Randy Dunlap	d86f9868bd	net: sctp: drop duplicate words in comments Drop doubled words in several comments. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-07-15 20:34:11 -07:00
Randy Dunlap	4b48b0a3aa	net: ip6_fib.h: drop duplicate word in comment Drop doubled word "the" in a comment. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-07-15 20:34:11 -07:00

1 2 3 4 5 ...

935409 Commits