Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2021-02-16

The following pull-request contains BPF updates for your *net-next* tree.

There's a small merge conflict between 7eeba1706e ("tcp: Add receive timestamp
support for receive zerocopy.") from net-next tree and 9cacf81f81 ("bpf: Remove
extra lock_sock for TCP_ZEROCOPY_RECEIVE") from bpf-next tree. Resolve as follows:

  [...]
                lock_sock(sk);
                err = tcp_zerocopy_receive(sk, &zc, &tss);
                err = BPF_CGROUP_RUN_PROG_GETSOCKOPT_KERN(sk, level, optname,
                                                          &zc, &len, err);
                release_sock(sk);
  [...]

We've added 116 non-merge commits during the last 27 day(s) which contain
a total of 156 files changed, 5662 insertions(+), 1489 deletions(-).

The main changes are:

1) Add support for pointers to types with known size among global function
   args to overcome the limit on max # of allowed args, from Dmitrii Banshchikov.

2) Add bpf_iter for task_vma which can be used to generate information similar
   to /proc/pid/maps (a minimal iterator sketch follows after this list), from Song Liu.

3) Enable bpf_{g,s}etsockopt() from all sock_addr related program hooks. Allow
   rewriting bind user ports from BPF side below the ip_unprivileged_port_start
   range, both from Stanislav Fomichev.

4) Prevent recursion on fentry/fexit & sleepable programs and allow map-in-map
   as well as per-cpu maps for the latter, from Alexei Starovoitov.

5) Add selftest script to run BPF CI locally. Also enable BPF ringbuffer
   for sleepable programs, both from KP Singh.

6) Extend verifier to enable variable offset read/write access to the BPF
   program stack, from Andrei Matei.

7) Improve tc & XDP MTU handling and add a new bpf_check_mtu() helper to
   query device MTU from programs, from Jesper Dangaard Brouer.

8) Allow the bpf_get_socket_cookie() helper to also be called from [sleepable] BPF
   tracing programs, from Florent Revest.

9) Extend x86 JIT to pad JMPs with NOPs to help the image converge when
   otherwise too many passes are required, from Gary Lin.

10) Verifier fixes on atomics with BPF_FETCH as well as function-by-function
    verification both related to zero-extension handling, from Ilya Leoshkevich.

11) Better kernel build integration of resolve_btfids tool, from Jiri Olsa.

12) Batch of AF_XDP selftest cleanups and small performance improvement
    for libbpf's xsk map redirect for newer kernels, from Björn Töpel.

13) Follow-up BPF doc and verifier improvements around atomics with
    BPF_FETCH, from Brendan Jackman.

14) Permit zero-sized data sections e.g. if ELF .rodata section contains
    read-only data from local variables, from Yonghong Song.

15) veth driver skb bulk-allocation for ndo_xdp_xmit, from Lorenzo Bianconi.
====================
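To make item 2) above concrete, here is a minimal, hedged sketch of a task_vma
iterator program. It is not taken from this pull request; the SEC() name and the
context fields (meta, task, vma) follow the bpf_iter convention described by the
series, while the vmlinux.h include and the output format are assumptions.

	/* Hedged sketch for item 2): print one line per VMA, roughly like
	 * /proc/pid/maps. Assumes a bpftool-generated vmlinux.h and libbpf's
	 * bpf_helpers.h. Build with: clang -O2 -g -target bpf -c task_vma.bpf.c
	 */
	#include "vmlinux.h"
	#include <bpf/bpf_helpers.h>

	char LICENSE[] SEC("license") = "GPL";

	SEC("iter/task_vma")
	int dump_task_vma(struct bpf_iter__task_vma *ctx)
	{
		struct seq_file *seq = ctx->meta->seq;
		struct vm_area_struct *vma = ctx->vma;
		struct task_struct *task = ctx->task;
		static const char fmt[] = "%d: %08llx-%08llx\n";
		__u64 data[3];

		if (!vma || !task)
			return 0;

		data[0] = task->pid;
		data[1] = vma->vm_start;
		data[2] = vma->vm_end;
		/* bpf_seq_printf() writes into the seq_file read from user space */
		bpf_seq_printf(seq, fmt, sizeof(fmt), data, sizeof(data));
		return 0;
	}

With libbpf, the attached iterator link can be pinned and then simply read like a
file, which is what makes it a programmable /proc/pid/maps replacement.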

Signed-off-by: David S. Miller <davem@davemloft.net>
commit b8af417e4d
Author: David S. Miller <davem@davemloft.net>
Date:   2021-02-16 13:14:06 -08:00

156 changed files with 5684 additions and 1511 deletions


@@ -208,6 +208,12 @@ data structures and compile with kernel internal headers. Both of these
 kernel internals are subject to change and can break with newer kernels
 such that the program needs to be adapted accordingly.
 
+Q: Are tracepoints part of the stable ABI?
+------------------------------------------
+A: NO. Tracepoints are tied to internal implementation details hence they are
+subject to change and can break with newer kernels. BPF programs need to change
+accordingly when this happens.
+
 Q: How much stack space a BPF program uses?
 -------------------------------------------
 A: Currently all program types are limited to 512 bytes of stack


@@ -501,16 +501,19 @@ All LLVM releases can be found at: http://releases.llvm.org/
 Q: Got it, so how do I build LLVM manually anyway?
 --------------------------------------------------
 
-A: You need cmake and gcc-c++ as build requisites for LLVM. Once you have
-that set up, proceed with building the latest LLVM and clang version
+A: We recommend that developers who want the fastest incremental builds
+use the Ninja build system, you can find it in your system's package
+manager, usually the package is ninja or ninja-build.
+
+You need ninja, cmake and gcc-c++ as build requisites for LLVM. Once you
+have that set up, proceed with building the latest LLVM and clang version
 from the git repositories::
 
   $ git clone https://github.com/llvm/llvm-project.git
-  $ mkdir -p llvm-project/llvm/build/install
+  $ mkdir -p llvm-project/llvm/build
   $ cd llvm-project/llvm/build
   $ cmake .. -G "Ninja" -DLLVM_TARGETS_TO_BUILD="BPF;X86" \
              -DLLVM_ENABLE_PROJECTS="clang"    \
-             -DBUILD_SHARED_LIBS=OFF           \
             -DCMAKE_BUILD_TYPE=Release        \
             -DLLVM_BUILD_RUNTIME=OFF
  $ ninja


@@ -1048,12 +1048,12 @@ Unlike classic BPF instruction set, eBPF has generic load/store operations::
 Where size is one of: BPF_B or BPF_H or BPF_W or BPF_DW.
 
 It also includes atomic operations, which use the immediate field for extra
-encoding.
+encoding::
 
   .imm = BPF_ADD, .code = BPF_ATOMIC | BPF_W  | BPF_STX: lock xadd *(u32 *)(dst_reg + off16) += src_reg
   .imm = BPF_ADD, .code = BPF_ATOMIC | BPF_DW | BPF_STX: lock xadd *(u64 *)(dst_reg + off16) += src_reg
 
-The basic atomic operations supported are:
+The basic atomic operations supported are::
 
     BPF_ADD
     BPF_AND
@@ -1066,33 +1066,35 @@ memory location addresed by ``dst_reg + off`` is atomically modified, with
 immediate, then these operations also overwrite ``src_reg`` with the
 value that was in memory before it was modified.
 
-The more special operations are:
+The more special operations are::
 
     BPF_XCHG
 
 This atomically exchanges ``src_reg`` with the value addressed by ``dst_reg +
-off``.
+off``. ::
 
     BPF_CMPXCHG
 
 This atomically compares the value addressed by ``dst_reg + off`` with
-``R0``. If they match it is replaced with ``src_reg``, The value that was there
-before is loaded back to ``R0``.
+``R0``. If they match it is replaced with ``src_reg``. In either case, the
+value that was there before is zero-extended and loaded back to ``R0``.
 
 Note that 1 and 2 byte atomic operations are not supported.
 
-Except ``BPF_ADD`` _without_ ``BPF_FETCH`` (for legacy reasons), all 4 byte
-atomic operations require alu32 mode. Clang enables this mode by default in
-architecture v3 (``-mcpu=v3``). For older versions it can be enabled with
+Clang can generate atomic instructions by default when ``-mcpu=v3`` is
+enabled. If a lower version for ``-mcpu`` is set, the only atomic instruction
+Clang can generate is ``BPF_ADD`` *without* ``BPF_FETCH``. If you need to enable
+the atomics features, while keeping a lower ``-mcpu`` version, you can use
 ``-Xclang -target-feature -Xclang +alu32``.
 
-You may encounter BPF_XADD - this is a legacy name for BPF_ATOMIC, referring to
-the exclusive-add operation encoded when the immediate field is zero.
+You may encounter ``BPF_XADD`` - this is a legacy name for ``BPF_ATOMIC``,
+referring to the exclusive-add operation encoded when the immediate field is
+zero.
 
-eBPF has one 16-byte instruction: BPF_LD | BPF_DW | BPF_IMM which consists
+eBPF has one 16-byte instruction: ``BPF_LD | BPF_DW | BPF_IMM`` which consists
 of two consecutive ``struct bpf_insn`` 8-byte blocks and interpreted as single
 instruction that loads 64-bit immediate value into a dst_reg.
-Classic BPF has similar instruction: BPF_LD | BPF_W | BPF_IMM which loads
+Classic BPF has similar instruction: ``BPF_LD | BPF_W | BPF_IMM`` which loads
 32-bit immediate value into a register.
 
 eBPF verifier

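As a hedged illustration of the atomics documentation above (not part of the
patch), the following BPF C fragment relies on the __sync_* builtins that the
LLVM BPF backend lowers to BPF_ATOMIC instructions when built with
``clang -O2 -target bpf -mcpu=v3``; the tracepoint section and the global
counters are assumptions made for the sketch.

	#include <linux/bpf.h>
	#include <bpf/bpf_helpers.h>

	/* globals end up in .data/.bss and are shared across CPUs */
	__u64 counter;
	__u64 first_seen_ns;

	SEC("tracepoint/syscalls/sys_enter_getpid")
	int count_getpid(void *ctx)
	{
		/* result unused: lowered to BPF_ATOMIC | BPF_ADD (no BPF_FETCH) */
		__sync_fetch_and_add(&counter, 1);

		/* lowered to BPF_ATOMIC | BPF_CMPXCHG; the old value comes back in R0 */
		__sync_val_compare_and_swap(&first_seen_ns, 0, bpf_ktime_get_ns());

		return 0;
	}

	char LICENSE[] SEC("license") = "GPL";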

@@ -1082,6 +1082,17 @@ ifdef CONFIG_STACK_VALIDATION
 endif
 endif
 
+PHONY += resolve_btfids_clean
+
+resolve_btfids_O = $(abspath $(objtree))/tools/bpf/resolve_btfids
+
+# tools/bpf/resolve_btfids directory might not exist
+# in output directory, skip its clean in that case
+resolve_btfids_clean:
+ifneq ($(wildcard $(resolve_btfids_O)),)
+	$(Q)$(MAKE) -sC $(srctree)/tools/bpf/resolve_btfids O=$(resolve_btfids_O) clean
+endif
+
 ifdef CONFIG_BPF
 ifdef CONFIG_DEBUG_INFO_BTF
 ifeq ($(has_libelf),1)
@@ -1491,7 +1502,7 @@ vmlinuxclean:
 	$(Q)$(CONFIG_SHELL) $(srctree)/scripts/link-vmlinux.sh clean
 	$(Q)$(if $(ARCH_POSTLINK), $(MAKE) -f $(ARCH_POSTLINK) clean)
 
-clean: archclean vmlinuxclean
+clean: archclean vmlinuxclean resolve_btfids_clean
 
 # mrproper - Delete all generated files, including .config
 #


@@ -869,8 +869,31 @@ static void detect_reg_usage(struct bpf_insn *insn, int insn_cnt,
 	}
 }
 
+static int emit_nops(u8 **pprog, int len)
+{
+	u8 *prog = *pprog;
+	int i, noplen, cnt = 0;
+
+	while (len > 0) {
+		noplen = len;
+
+		if (noplen > ASM_NOP_MAX)
+			noplen = ASM_NOP_MAX;
+
+		for (i = 0; i < noplen; i++)
+			EMIT1(ideal_nops[noplen][i]);
+
+		len -= noplen;
+	}
+
+	*pprog = prog;
+
+	return cnt;
+}
+
+#define INSN_SZ_DIFF (((addrs[i] - addrs[i - 1]) - (prog - temp)))
+
 static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
-		  int oldproglen, struct jit_context *ctx)
+		  int oldproglen, struct jit_context *ctx, bool jmp_padding)
 {
 	bool tail_call_reachable = bpf_prog->aux->tail_call_reachable;
 	struct bpf_insn *insn = bpf_prog->insnsi;
@@ -880,7 +903,7 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
 	bool seen_exit = false;
 	u8 temp[BPF_MAX_INSN_SIZE + BPF_INSN_SAFETY];
 	int i, cnt = 0, excnt = 0;
-	int proglen = 0;
+	int ilen, proglen = 0;
 	u8 *prog = temp;
 	int err;
@@ -894,17 +917,24 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
 			      bpf_prog_was_classic(bpf_prog), tail_call_reachable,
 			      bpf_prog->aux->func_idx != 0);
 	push_callee_regs(&prog, callee_regs_used);
-	addrs[0] = prog - temp;
+
+	ilen = prog - temp;
+	if (image)
+		memcpy(image + proglen, temp, ilen);
+	proglen += ilen;
+	addrs[0] = proglen;
+	prog = temp;
 
 	for (i = 1; i <= insn_cnt; i++, insn++) {
 		const s32 imm32 = insn->imm;
 		u32 dst_reg = insn->dst_reg;
 		u32 src_reg = insn->src_reg;
 		u8 b2 = 0, b3 = 0;
+		u8 *start_of_ldx;
 		s64 jmp_offset;
 		u8 jmp_cond;
-		int ilen;
 		u8 *func;
+		int nops;
 
 		switch (insn->code) {
 			/* ALU */
@@ -1249,12 +1279,30 @@ st:			if (is_imm8(insn->off))
 		case BPF_LDX | BPF_PROBE_MEM | BPF_W:
 		case BPF_LDX | BPF_MEM | BPF_DW:
 		case BPF_LDX | BPF_PROBE_MEM | BPF_DW:
+			if (BPF_MODE(insn->code) == BPF_PROBE_MEM) {
+				/* test src_reg, src_reg */
+				maybe_emit_mod(&prog, src_reg, src_reg, true); /* always 1 byte */
+				EMIT2(0x85, add_2reg(0xC0, src_reg, src_reg));
+				/* jne start_of_ldx */
+				EMIT2(X86_JNE, 0);
+				/* xor dst_reg, dst_reg */
+				emit_mov_imm32(&prog, false, dst_reg, 0);
+				/* jmp byte_after_ldx */
+				EMIT2(0xEB, 0);
+
+				/* populate jmp_offset for JNE above */
+				temp[4] = prog - temp - 5 /* sizeof(test + jne) */;
+				start_of_ldx = prog;
+			}
 			emit_ldx(&prog, BPF_SIZE(insn->code), dst_reg, src_reg, insn->off);
 			if (BPF_MODE(insn->code) == BPF_PROBE_MEM) {
 				struct exception_table_entry *ex;
 				u8 *_insn = image + proglen;
 				s64 delta;
 
+				/* populate jmp_offset for JMP above */
+				start_of_ldx[-1] = prog - start_of_ldx;
+
 				if (!bpf_prog->aux->extable)
 					break;
@@ -1502,6 +1550,30 @@ emit_cond_jmp:		/* Convert BPF opcode to x86 */
 			}
 			jmp_offset = addrs[i + insn->off] - addrs[i];
 			if (is_imm8(jmp_offset)) {
+				if (jmp_padding) {
+					/* To keep the jmp_offset valid, the extra bytes are
+					 * padded before the jump insn, so we substract the
+					 * 2 bytes of jmp_cond insn from INSN_SZ_DIFF.
+					 *
+					 * If the previous pass already emits an imm8
+					 * jmp_cond, then this BPF insn won't shrink, so
+					 * "nops" is 0.
+					 *
+					 * On the other hand, if the previous pass emits an
+					 * imm32 jmp_cond, the extra 4 bytes(*) is padded to
+					 * keep the image from shrinking further.
+					 *
+					 * (*) imm32 jmp_cond is 6 bytes, and imm8 jmp_cond
+					 *     is 2 bytes, so the size difference is 4 bytes.
+					 */
+					nops = INSN_SZ_DIFF - 2;
+					if (nops != 0 && nops != 4) {
+						pr_err("unexpected jmp_cond padding: %d bytes\n",
+						       nops);
+						return -EFAULT;
+					}
+					cnt += emit_nops(&prog, nops);
+				}
 				EMIT2(jmp_cond, jmp_offset);
 			} else if (is_simm32(jmp_offset)) {
 				EMIT2_off32(0x0F, jmp_cond + 0x10, jmp_offset);
@@ -1524,11 +1596,55 @@ emit_cond_jmp:		/* Convert BPF opcode to x86 */
 			else
 				jmp_offset = addrs[i + insn->off] - addrs[i];
 
-			if (!jmp_offset)
-				/* Optimize out nop jumps */
+			if (!jmp_offset) {
+				/*
+				 * If jmp_padding is enabled, the extra nops will
+				 * be inserted. Otherwise, optimize out nop jumps.
+				 */
+				if (jmp_padding) {
+					/* There are 3 possible conditions.
+					 * (1) This BPF_JA is already optimized out in
+					 *     the previous run, so there is no need
+					 *     to pad any extra byte (0 byte).
+					 * (2) The previous pass emits an imm8 jmp,
+					 *     so we pad 2 bytes to match the previous
+					 *     insn size.
+					 * (3) Similarly, the previous pass emits an
+					 *     imm32 jmp, and 5 bytes is padded.
+					 */
+					nops = INSN_SZ_DIFF;
+					if (nops != 0 && nops != 2 && nops != 5) {
+						pr_err("unexpected nop jump padding: %d bytes\n",
+						       nops);
+						return -EFAULT;
+					}
+					cnt += emit_nops(&prog, nops);
+				}
 				break;
+			}
 emit_jmp:
 			if (is_imm8(jmp_offset)) {
+				if (jmp_padding) {
+					/* To avoid breaking jmp_offset, the extra bytes
+					 * are padded before the actual jmp insn, so
+					 * 2 bytes is substracted from INSN_SZ_DIFF.
+					 *
+					 * If the previous pass already emits an imm8
+					 * jmp, there is nothing to pad (0 byte).
+					 *
+					 * If it emits an imm32 jmp (5 bytes) previously
+					 * and now an imm8 jmp (2 bytes), then we pad
+					 * (5 - 2 = 3) bytes to stop the image from
+					 * shrinking further.
+					 */
+					nops = INSN_SZ_DIFF - 2;
+					if (nops != 0 && nops != 3) {
+						pr_err("unexpected jump padding: %d bytes\n",
+						       nops);
+						return -EFAULT;
+					}
+					cnt += emit_nops(&prog, INSN_SZ_DIFF - 2);
+				}
 				EMIT2(0xEB, jmp_offset);
 			} else if (is_simm32(jmp_offset)) {
 				EMIT1_off32(0xE9, jmp_offset);
@@ -1624,17 +1740,25 @@ static int invoke_bpf_prog(const struct btf_func_model *m, u8 **pprog,
 			   struct bpf_prog *p, int stack_size, bool mod_ret)
 {
 	u8 *prog = *pprog;
+	u8 *jmp_insn;
 	int cnt = 0;
 
-	if (p->aux->sleepable) {
-		if (emit_call(&prog, __bpf_prog_enter_sleepable, prog))
+	/* arg1: mov rdi, progs[i] */
+	emit_mov_imm64(&prog, BPF_REG_1, (long) p >> 32, (u32) (long) p);
+	if (emit_call(&prog,
+		      p->aux->sleepable ? __bpf_prog_enter_sleepable :
+		      __bpf_prog_enter, prog))
 			return -EINVAL;
-	} else {
-		if (emit_call(&prog, __bpf_prog_enter, prog))
-			return -EINVAL;
-		/* remember prog start time returned by __bpf_prog_enter */
-		emit_mov_reg(&prog, true, BPF_REG_6, BPF_REG_0);
-	}
+	/* remember prog start time returned by __bpf_prog_enter */
+	emit_mov_reg(&prog, true, BPF_REG_6, BPF_REG_0);
+
+	/* if (__bpf_prog_enter*(prog) == 0)
+	 *	goto skip_exec_of_prog;
+	 */
+	EMIT3(0x48, 0x85, 0xC0);  /* test rax,rax */
+	/* emit 2 nops that will be replaced with JE insn */
+	jmp_insn = prog;
+	emit_nops(&prog, 2);
 
 	/* arg1: lea rdi, [rbp - stack_size] */
 	EMIT4(0x48, 0x8D, 0x7D, -stack_size);
@@ -1654,43 +1778,23 @@ static int invoke_bpf_prog(const struct btf_func_model *m, u8 **pprog,
 	if (mod_ret)
 		emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_0, -8);
 
-	if (p->aux->sleepable) {
-		if (emit_call(&prog, __bpf_prog_exit_sleepable, prog))
+	/* replace 2 nops with JE insn, since jmp target is known */
+	jmp_insn[0] = X86_JE;
+	jmp_insn[1] = prog - jmp_insn - 2;
+
+	/* arg1: mov rdi, progs[i] */
+	emit_mov_imm64(&prog, BPF_REG_1, (long) p >> 32, (u32) (long) p);
+	/* arg2: mov rsi, rbx <- start time in nsec */
+	emit_mov_reg(&prog, true, BPF_REG_2, BPF_REG_6);
+	if (emit_call(&prog,
+		      p->aux->sleepable ? __bpf_prog_exit_sleepable :
+		      __bpf_prog_exit, prog))
 			return -EINVAL;
-	} else {
-		/* arg1: mov rdi, progs[i] */
-		emit_mov_imm64(&prog, BPF_REG_1, (long) p >> 32,
-			       (u32) (long) p);
-		/* arg2: mov rsi, rbx <- start time in nsec */
-		emit_mov_reg(&prog, true, BPF_REG_2, BPF_REG_6);
-		if (emit_call(&prog, __bpf_prog_exit, prog))
-			return -EINVAL;
-	}
 
 	*pprog = prog;
 	return 0;
 }
 
-static void emit_nops(u8 **pprog, unsigned int len)
-{
-	unsigned int i, noplen;
-	u8 *prog = *pprog;
-	int cnt = 0;
-
-	while (len > 0) {
-		noplen = len;
-
-		if (noplen > ASM_NOP_MAX)
-			noplen = ASM_NOP_MAX;
-
-		for (i = 0; i < noplen; i++)
-			EMIT1(ideal_nops[noplen][i]);
-
-		len -= noplen;
-	}
-
-	*pprog = prog;
-}
-
 static void emit_align(u8 **pprog, u32 align)
 {
 	u8 *target, *prog = *pprog;
@@ -2065,6 +2169,9 @@ struct x64_jit_data {
 	struct jit_context ctx;
 };
 
+#define MAX_PASSES 20
+#define PADDING_PASSES (MAX_PASSES - 5)
+
 struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 {
 	struct bpf_binary_header *header = NULL;
@@ -2074,6 +2181,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 	struct jit_context ctx = {};
 	bool tmp_blinded = false;
 	bool extra_pass = false;
+	bool padding = false;
 	u8 *image = NULL;
 	int *addrs;
 	int pass;
@@ -2110,6 +2218,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 		image = jit_data->image;
 		header = jit_data->header;
 		extra_pass = true;
+		padding = true;
 		goto skip_init_addrs;
 	}
 	addrs = kmalloc_array(prog->len + 1, sizeof(*addrs), GFP_KERNEL);
@@ -2135,8 +2244,10 @@ skip_init_addrs:
 	 * may converge on the last pass. In such case do one more
 	 * pass to emit the final image.
 	 */
-	for (pass = 0; pass < 20 || image; pass++) {
-		proglen = do_jit(prog, addrs, image, oldproglen, &ctx);
+	for (pass = 0; pass < MAX_PASSES || image; pass++) {
+		if (!padding && pass >= PADDING_PASSES)
+			padding = true;
+		proglen = do_jit(prog, addrs, image, oldproglen, &ctx, padding);
 		if (proglen <= 0) {
 out_image:
 			image = NULL;


@@ -35,6 +35,7 @@
 #define VETH_XDP_HEADROOM	(XDP_PACKET_HEADROOM + NET_IP_ALIGN)
 #define VETH_XDP_TX_BULK_SIZE	16
+#define VETH_XDP_BATCH		16
 
 struct veth_stats {
 	u64	rx_drops;
@@ -562,20 +563,13 @@ static int veth_xdp_tx(struct veth_rq *rq, struct xdp_buff *xdp,
 	return 0;
 }
 
-static struct sk_buff *veth_xdp_rcv_one(struct veth_rq *rq,
+static struct xdp_frame *veth_xdp_rcv_one(struct veth_rq *rq,
 					struct xdp_frame *frame,
 					struct veth_xdp_tx_bq *bq,
 					struct veth_stats *stats)
 {
-	void *hard_start = frame->data - frame->headroom;
-	int len = frame->len, delta = 0;
 	struct xdp_frame orig_frame;
 	struct bpf_prog *xdp_prog;
-	unsigned int headroom;
-	struct sk_buff *skb;
-
-	/* bpf_xdp_adjust_head() assures BPF cannot access xdp_frame area */
-	hard_start -= sizeof(struct xdp_frame);
 
 	rcu_read_lock();
 	xdp_prog = rcu_dereference(rq->xdp_prog);
@@ -590,8 +584,8 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_rq *rq,
 		switch (act) {
 		case XDP_PASS:
-			delta = frame->data - xdp.data;
-			len = xdp.data_end - xdp.data;
+			if (xdp_update_frame_from_buff(&xdp, frame))
+				goto err_xdp;
 			break;
 		case XDP_TX:
 			orig_frame = *frame;
@@ -629,19 +623,7 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_rq *rq,
 	}
 	rcu_read_unlock();
 
-	headroom = sizeof(struct xdp_frame) + frame->headroom - delta;
-	skb = veth_build_skb(hard_start, headroom, len, frame->frame_sz);
-	if (!skb) {
-		xdp_return_frame(frame);
-		stats->rx_drops++;
-		goto err;
-	}
-
-	xdp_release_frame(frame);
-	xdp_scrub_frame(frame);
-	skb->protocol = eth_type_trans(skb, rq->dev);
-err:
-	return skb;
+	return frame;
 err_xdp:
 	rcu_read_unlock();
 	xdp_return_frame(frame);
@@ -649,6 +631,37 @@ xdp_xmit:
 	return NULL;
 }
 
+/* frames array contains VETH_XDP_BATCH at most */
+static void veth_xdp_rcv_bulk_skb(struct veth_rq *rq, void **frames,
+				  int n_xdpf, struct veth_xdp_tx_bq *bq,
+				  struct veth_stats *stats)
+{
+	void *skbs[VETH_XDP_BATCH];
+	int i;
+
+	if (xdp_alloc_skb_bulk(skbs, n_xdpf,
+			       GFP_ATOMIC | __GFP_ZERO) < 0) {
+		for (i = 0; i < n_xdpf; i++)
+			xdp_return_frame(frames[i]);
+		stats->rx_drops += n_xdpf;
+
+		return;
+	}
+
+	for (i = 0; i < n_xdpf; i++) {
+		struct sk_buff *skb = skbs[i];
+
+		skb = __xdp_build_skb_from_frame(frames[i], skb,
+						 rq->dev);
+		if (!skb) {
+			xdp_return_frame(frames[i]);
+			stats->rx_drops++;
+			continue;
+		}
+		napi_gro_receive(&rq->xdp_napi, skb);
+	}
+}
+
 static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
 					struct sk_buff *skb,
 					struct veth_xdp_tx_bq *bq,
@@ -796,32 +809,45 @@ static int veth_xdp_rcv(struct veth_rq *rq, int budget,
 			struct veth_xdp_tx_bq *bq,
 			struct veth_stats *stats)
 {
-	int i, done = 0;
+	int i, done = 0, n_xdpf = 0;
+	void *xdpf[VETH_XDP_BATCH];
 
 	for (i = 0; i < budget; i++) {
 		void *ptr = __ptr_ring_consume(&rq->xdp_ring);
-		struct sk_buff *skb;
 
 		if (!ptr)
 			break;
 
 		if (veth_is_xdp_frame(ptr)) {
+			/* ndo_xdp_xmit */
 			struct xdp_frame *frame = veth_ptr_to_xdp(ptr);
 
 			stats->xdp_bytes += frame->len;
-			skb = veth_xdp_rcv_one(rq, frame, bq, stats);
+			frame = veth_xdp_rcv_one(rq, frame, bq, stats);
+			if (frame) {
+				/* XDP_PASS */
+				xdpf[n_xdpf++] = frame;
+				if (n_xdpf == VETH_XDP_BATCH) {
+					veth_xdp_rcv_bulk_skb(rq, xdpf, n_xdpf,
+							      bq, stats);
+					n_xdpf = 0;
+				}
+			}
 		} else {
-			skb = ptr;
+			/* ndo_start_xmit */
+			struct sk_buff *skb = ptr;
+
 			stats->xdp_bytes += skb->len;
 			skb = veth_xdp_rcv_skb(rq, skb, bq, stats);
+			if (skb)
+				napi_gro_receive(&rq->xdp_napi, skb);
 		}
-
-		if (skb)
-			napi_gro_receive(&rq->xdp_napi, skb);
 		done++;
 	}
 
+	if (n_xdpf)
+		veth_xdp_rcv_bulk_skb(rq, xdpf, n_xdpf, bq, stats);
+
 	u64_stats_update_begin(&rq->stats.syncp);
 	rq->stats.vs.xdp_redirect += stats->xdp_redirect;
 	rq->stats.vs.xdp_bytes += stats->xdp_bytes;


@@ -23,8 +23,8 @@ struct ctl_table_header;
 
 #ifdef CONFIG_CGROUP_BPF
 
-extern struct static_key_false cgroup_bpf_enabled_key;
-#define cgroup_bpf_enabled static_branch_unlikely(&cgroup_bpf_enabled_key)
+extern struct static_key_false cgroup_bpf_enabled_key[MAX_BPF_ATTACH_TYPE];
+#define cgroup_bpf_enabled(type) static_branch_unlikely(&cgroup_bpf_enabled_key[type])
 
 DECLARE_PER_CPU(struct bpf_cgroup_storage*,
 		bpf_cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE]);
@@ -125,7 +125,8 @@ int __cgroup_bpf_run_filter_sk(struct sock *sk,
 int __cgroup_bpf_run_filter_sock_addr(struct sock *sk,
 				      struct sockaddr *uaddr,
 				      enum bpf_attach_type type,
-				      void *t_ctx);
+				      void *t_ctx,
+				      u32 *flags);
 
 int __cgroup_bpf_run_filter_sock_ops(struct sock *sk,
 				     struct bpf_sock_ops_kern *sock_ops,
@@ -147,6 +148,10 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
 				       int __user *optlen, int max_optlen,
 				       int retval);
 
+int __cgroup_bpf_run_filter_getsockopt_kern(struct sock *sk, int level,
+					    int optname, void *optval,
+					    int *optlen, int retval);
+
 static inline enum bpf_cgroup_storage_type cgroup_storage_type(
 	struct bpf_map *map)
 {
@@ -185,7 +190,7 @@ int bpf_percpu_cgroup_storage_update(struct bpf_map *map, void *key,
 #define BPF_CGROUP_RUN_PROG_INET_INGRESS(sk, skb)			      \
 ({									      \
 	int __ret = 0;							      \
-	if (cgroup_bpf_enabled)						      \
+	if (cgroup_bpf_enabled(BPF_CGROUP_INET_INGRESS))		      \
 		__ret = __cgroup_bpf_run_filter_skb(sk, skb,		      \
 						    BPF_CGROUP_INET_INGRESS); \
 									      \
@@ -195,7 +200,7 @@ int bpf_percpu_cgroup_storage_update(struct bpf_map *map, void *key,
 #define BPF_CGROUP_RUN_PROG_INET_EGRESS(sk, skb)			       \
 ({									       \
 	int __ret = 0;							       \
-	if (cgroup_bpf_enabled && sk && sk == skb->sk) {		       \
+	if (cgroup_bpf_enabled(BPF_CGROUP_INET_EGRESS) && sk && sk == skb->sk) { \
 		typeof(sk) __sk = sk_to_full_sk(sk);			       \
 		if (sk_fullsock(__sk))					       \
 			__ret = __cgroup_bpf_run_filter_skb(__sk, skb,	       \
@@ -207,7 +212,7 @@ int bpf_percpu_cgroup_storage_update(struct bpf_map *map, void *key,
 #define BPF_CGROUP_RUN_SK_PROG(sk, type)				       \
 ({									       \
 	int __ret = 0;							       \
-	if (cgroup_bpf_enabled) {					       \
+	if (cgroup_bpf_enabled(type)) {					       \
 		__ret = __cgroup_bpf_run_filter_sk(sk, type);		       \
 	}								       \
 	__ret;								       \
@@ -227,33 +232,53 @@ int bpf_percpu_cgroup_storage_update(struct bpf_map *map, void *key,
 #define BPF_CGROUP_RUN_SA_PROG(sk, uaddr, type)				       \
 ({									       \
+	u32 __unused_flags;						       \
 	int __ret = 0;							       \
-	if (cgroup_bpf_enabled)						       \
+	if (cgroup_bpf_enabled(type))					       \
 		__ret = __cgroup_bpf_run_filter_sock_addr(sk, uaddr, type,    \
-							  NULL);	       \
+							  NULL,		       \
+							  &__unused_flags);    \
 	__ret;								       \
 })
 
 #define BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, type, t_ctx)		       \
 ({									       \
+	u32 __unused_flags;						       \
 	int __ret = 0;							       \
-	if (cgroup_bpf_enabled) {					       \
+	if (cgroup_bpf_enabled(type)) {					       \
 		lock_sock(sk);						       \
 		__ret = __cgroup_bpf_run_filter_sock_addr(sk, uaddr, type,    \
-							  t_ctx);	       \
+							  t_ctx,	       \
+							  &__unused_flags);    \
 		release_sock(sk);					       \
 	}								       \
 	__ret;								       \
 })
 
-#define BPF_CGROUP_RUN_PROG_INET4_BIND_LOCK(sk, uaddr)			       \
-	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, BPF_CGROUP_INET4_BIND, NULL)
+/* BPF_CGROUP_INET4_BIND and BPF_CGROUP_INET6_BIND can return extra flags
+ * via upper bits of return code. The only flag that is supported
+ * (at bit position 0) is to indicate CAP_NET_BIND_SERVICE capability check
+ * should be bypassed (BPF_RET_BIND_NO_CAP_NET_BIND_SERVICE).
+ */
+#define BPF_CGROUP_RUN_PROG_INET_BIND_LOCK(sk, uaddr, type, bind_flags)	       \
+({									       \
+	u32 __flags = 0;						       \
+	int __ret = 0;							       \
+	if (cgroup_bpf_enabled(type))	{				       \
+		lock_sock(sk);						       \
+		__ret = __cgroup_bpf_run_filter_sock_addr(sk, uaddr, type,    \
+							  NULL, &__flags);     \
+		release_sock(sk);					       \
+		if (__flags & BPF_RET_BIND_NO_CAP_NET_BIND_SERVICE)	       \
+			*bind_flags |= BIND_NO_CAP_NET_BIND_SERVICE;	       \
+	}								       \
+	__ret;								       \
+})
 
-#define BPF_CGROUP_RUN_PROG_INET6_BIND_LOCK(sk, uaddr)			       \
-	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, BPF_CGROUP_INET6_BIND, NULL)
-
-#define BPF_CGROUP_PRE_CONNECT_ENABLED(sk) (cgroup_bpf_enabled && \
-					    sk->sk_prot->pre_connect)
+#define BPF_CGROUP_PRE_CONNECT_ENABLED(sk)				       \
+	((cgroup_bpf_enabled(BPF_CGROUP_INET4_CONNECT) ||		       \
+	  cgroup_bpf_enabled(BPF_CGROUP_INET6_CONNECT)) &&		       \
+	 (sk)->sk_prot->pre_connect)
 
 #define BPF_CGROUP_RUN_PROG_INET4_CONNECT(sk, uaddr)			       \
 	BPF_CGROUP_RUN_SA_PROG(sk, uaddr, BPF_CGROUP_INET4_CONNECT)
@@ -297,7 +322,7 @@ int bpf_percpu_cgroup_storage_update(struct bpf_map *map, void *key,
 #define BPF_CGROUP_RUN_PROG_SOCK_OPS_SK(sock_ops, sk)			\
 ({									\
 	int __ret = 0;							\
-	if (cgroup_bpf_enabled)						\
+	if (cgroup_bpf_enabled(BPF_CGROUP_SOCK_OPS))			\
 		__ret = __cgroup_bpf_run_filter_sock_ops(sk,		\
 							 sock_ops,	\
 							 BPF_CGROUP_SOCK_OPS); \
@@ -307,7 +332,7 @@ int bpf_percpu_cgroup_storage_update(struct bpf_map *map, void *key,
 #define BPF_CGROUP_RUN_PROG_SOCK_OPS(sock_ops)				       \
 ({									       \
 	int __ret = 0;							       \
-	if (cgroup_bpf_enabled && (sock_ops)->sk) {			       \
+	if (cgroup_bpf_enabled(BPF_CGROUP_SOCK_OPS) && (sock_ops)->sk) {       \
 		typeof(sk) __sk = sk_to_full_sk((sock_ops)->sk);	       \
 		if (__sk && sk_fullsock(__sk))				       \
 			__ret = __cgroup_bpf_run_filter_sock_ops(__sk,	       \
@@ -320,7 +345,7 @@ int bpf_percpu_cgroup_storage_update(struct bpf_map *map, void *key,
 #define BPF_CGROUP_RUN_PROG_DEVICE_CGROUP(type, major, minor, access)	      \
 ({									      \
 	int __ret = 0;							      \
-	if (cgroup_bpf_enabled)						      \
+	if (cgroup_bpf_enabled(BPF_CGROUP_DEVICE))			      \
 		__ret = __cgroup_bpf_check_dev_permission(type, major, minor, \
 							  access,	      \
 							  BPF_CGROUP_DEVICE); \
@@ -332,7 +357,7 @@ int bpf_percpu_cgroup_storage_update(struct bpf_map *map, void *key,
 #define BPF_CGROUP_RUN_PROG_SYSCTL(head, table, write, buf, count, pos)       \
 ({									      \
 	int __ret = 0;							      \
-	if (cgroup_bpf_enabled)						      \
+	if (cgroup_bpf_enabled(BPF_CGROUP_SYSCTL))			      \
 		__ret = __cgroup_bpf_run_filter_sysctl(head, table, write,    \
 						       buf, count, pos,       \
 						       BPF_CGROUP_SYSCTL);    \
@@ -343,7 +368,7 @@ int bpf_percpu_cgroup_storage_update(struct bpf_map *map, void *key,
 				       kernel_optval)			      \
 ({									      \
 	int __ret = 0;							      \
-	if (cgroup_bpf_enabled)						      \
+	if (cgroup_bpf_enabled(BPF_CGROUP_SETSOCKOPT))			      \
 		__ret = __cgroup_bpf_run_filter_setsockopt(sock, level,       \
 							   optname, optval,   \
 							   optlen,	      \
@@ -354,7 +379,7 @@ int bpf_percpu_cgroup_storage_update(struct bpf_map *map, void *key,
 #define BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN(optlen)			      \
 ({									      \
 	int __ret = 0;							      \
-	if (cgroup_bpf_enabled)						      \
+	if (cgroup_bpf_enabled(BPF_CGROUP_GETSOCKOPT))			      \
 		get_user(__ret, optlen);				      \
 	__ret;								      \
 })
@@ -363,11 +388,24 @@ int bpf_percpu_cgroup_storage_update(struct bpf_map *map, void *key,
 				       max_optlen, retval)		      \
 ({									      \
 	int __ret = retval;						      \
-	if (cgroup_bpf_enabled)						      \
-		__ret = __cgroup_bpf_run_filter_getsockopt(sock, level,       \
-							   optname, optval,   \
-							   optlen, max_optlen, \
-							   retval);	      \
+	if (cgroup_bpf_enabled(BPF_CGROUP_GETSOCKOPT))			      \
+		if (!(sock)->sk_prot->bpf_bypass_getsockopt ||		      \
+		    !INDIRECT_CALL_INET_1((sock)->sk_prot->bpf_bypass_getsockopt, \
+					tcp_bpf_bypass_getsockopt,	      \
+					level, optname))		      \
+			__ret = __cgroup_bpf_run_filter_getsockopt(	      \
+				sock, level, optname, optval, optlen,	      \
+				max_optlen, retval);			      \
+	__ret;								      \
+})
+
+#define BPF_CGROUP_RUN_PROG_GETSOCKOPT_KERN(sock, level, optname, optval,     \
+					    optlen, retval)		      \
+({									      \
+	int __ret = retval;						      \
+	if (cgroup_bpf_enabled(BPF_CGROUP_GETSOCKOPT))			      \
+		__ret = __cgroup_bpf_run_filter_getsockopt_kern(	      \
+			sock, level, optname, optval, optlen, retval);	      \
 	__ret;								      \
 })
@@ -427,15 +465,14 @@ static inline int bpf_percpu_cgroup_storage_update(struct bpf_map *map,
 	return 0;
 }
 
-#define cgroup_bpf_enabled (0)
+#define cgroup_bpf_enabled(type) (0)
 #define BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, type, t_ctx) ({ 0; })
 #define BPF_CGROUP_PRE_CONNECT_ENABLED(sk) (0)
 #define BPF_CGROUP_RUN_PROG_INET_INGRESS(sk,skb) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_INET_EGRESS(sk,skb) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_INET_SOCK(sk) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_INET_SOCK_RELEASE(sk) ({ 0; })
-#define BPF_CGROUP_RUN_PROG_INET4_BIND_LOCK(sk, uaddr) ({ 0; })
-#define BPF_CGROUP_RUN_PROG_INET6_BIND_LOCK(sk, uaddr) ({ 0; })
+#define BPF_CGROUP_RUN_PROG_INET_BIND_LOCK(sk, uaddr, type, flags) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_INET4_POST_BIND(sk) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_INET6_POST_BIND(sk) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_INET4_CONNECT(sk, uaddr) ({ 0; })
@@ -452,6 +489,8 @@ static inline int bpf_percpu_cgroup_storage_update(struct bpf_map *map,
 #define BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN(optlen) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_GETSOCKOPT(sock, level, optname, optval, \
 				       optlen, max_optlen, retval) ({ retval; })
+#define BPF_CGROUP_RUN_PROG_GETSOCKOPT_KERN(sock, level, optname, optval, \
+					    optlen, retval) ({ retval; })
 #define BPF_CGROUP_RUN_PROG_SETSOCKOPT(sock, level, optname, optval, optlen, \
 				       kernel_optval) ({ 0; })

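For context on how a program actually requests BPF_RET_BIND_NO_CAP_NET_BIND_SERVICE:
the flag extraction in BPF_PROG_RUN_ARRAY_FLAGS() (see the next file) keeps bit 0 of
the program's return value as the allow/deny bit and turns the remaining bits into
flags, so a cgroup/bind4 program signals the capability bypass by also setting bit 1,
i.e. returning 3. The sketch below is hedged: the port numbers are arbitrary and the
program is not taken from the selftests in this series.

	#include <linux/bpf.h>
	#include <bpf/bpf_helpers.h>
	#include <bpf/bpf_endian.h>

	SEC("cgroup/bind4")
	int bind_v4_prog(struct bpf_sock_addr *ctx)
	{
		/* example: redirect binds aimed at 8080 to the privileged port 111 */
		if (ctx->user_port == bpf_htons(8080))
			ctx->user_port = bpf_htons(111);

		/* bit 0: allow the bind; bit 1: skip the CAP_NET_BIND_SERVICE check */
		return 3;
	}

	char LICENSE[] SEC("license") = "GPL";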

@@ -14,7 +14,6 @@
 #include <linux/numa.h>
 #include <linux/mm_types.h>
 #include <linux/wait.h>
-#include <linux/u64_stats_sync.h>
 #include <linux/refcount.h>
 #include <linux/mutex.h>
 #include <linux/module.h>
@@ -507,12 +506,6 @@ enum bpf_cgroup_storage_type {
  */
 #define MAX_BPF_FUNC_ARGS 12
 
-struct bpf_prog_stats {
-	u64 cnt;
-	u64 nsecs;
-	struct u64_stats_sync syncp;
-} __aligned(2 * sizeof(u64));
-
 struct btf_func_model {
 	u8 ret_size;
 	u8 nr_args;
@@ -536,7 +529,7 @@ struct btf_func_model {
 /* Each call __bpf_prog_enter + call bpf_func + call __bpf_prog_exit is ~50
  * bytes on x86.  Pick a number to fit into BPF_IMAGE_SIZE / 2
  */
-#define BPF_MAX_TRAMP_PROGS 40
+#define BPF_MAX_TRAMP_PROGS 38
 
 struct bpf_tramp_progs {
 	struct bpf_prog *progs[BPF_MAX_TRAMP_PROGS];
@@ -568,10 +561,10 @@ int arch_prepare_bpf_trampoline(void *image, void *image_end,
 				struct bpf_tramp_progs *tprogs,
 				void *orig_call);
 /* these two functions are called from generated trampoline */
-u64 notrace __bpf_prog_enter(void);
+u64 notrace __bpf_prog_enter(struct bpf_prog *prog);
 void notrace __bpf_prog_exit(struct bpf_prog *prog, u64 start);
-void notrace __bpf_prog_enter_sleepable(void);
-void notrace __bpf_prog_exit_sleepable(void);
+u64 notrace __bpf_prog_enter_sleepable(struct bpf_prog *prog);
+void notrace __bpf_prog_exit_sleepable(struct bpf_prog *prog, u64 start);
 
 struct bpf_ksym {
 	unsigned long		 start;
@@ -845,7 +838,6 @@ struct bpf_prog_aux {
 	u32 linfo_idx;
 	u32 num_exentries;
 	struct exception_table_entry *extable;
-	struct bpf_prog_stats __percpu *stats;
 	union {
 		struct work_struct work;
 		struct rcu_head	rcu;
@@ -1073,6 +1065,34 @@ int bpf_prog_array_copy(struct bpf_prog_array *old_array,
 			struct bpf_prog *include_prog,
 			struct bpf_prog_array **new_array);
 
+/* BPF program asks to bypass CAP_NET_BIND_SERVICE in bind. */
+#define BPF_RET_BIND_NO_CAP_NET_BIND_SERVICE	(1 << 0)
+/* BPF program asks to set CN on the packet. */
+#define BPF_RET_SET_CN				(1 << 0)
+
+#define BPF_PROG_RUN_ARRAY_FLAGS(array, ctx, func, ret_flags)		\
+	({								\
+		struct bpf_prog_array_item *_item;			\
+		struct bpf_prog *_prog;					\
+		struct bpf_prog_array *_array;				\
+		u32 _ret = 1;						\
+		u32 func_ret;						\
+		migrate_disable();					\
+		rcu_read_lock();					\
+		_array = rcu_dereference(array);			\
+		_item = &_array->items[0];				\
+		while ((_prog = READ_ONCE(_item->prog))) {		\
+			bpf_cgroup_storage_set(_item->cgroup_storage);	\
+			func_ret = func(_prog, ctx);			\
+			_ret &= (func_ret & 1);				\
+			*(ret_flags) |= (func_ret >> 1);		\
+			_item++;					\
+		}							\
+		rcu_read_unlock();					\
+		migrate_enable();					\
+		_ret;							\
+	 })
+
 #define __BPF_PROG_RUN_ARRAY(array, ctx, func, check_non_null)	\
 	({						\
 		struct bpf_prog_array_item *_item;	\
@@ -1120,25 +1140,11 @@ _out:	\
  */
 #define BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY(array, ctx, func)		\
 	({						\
-		struct bpf_prog_array_item *_item;	\
-		struct bpf_prog *_prog;			\
-		struct bpf_prog_array *_array;		\
-		u32 ret;				\
-		u32 _ret = 1;				\
-		u32 _cn = 0;				\
-		migrate_disable();			\
-		rcu_read_lock();			\
-		_array = rcu_dereference(array);	\
-		_item = &_array->items[0];		\
-		while ((_prog = READ_ONCE(_item->prog))) {		\
-			bpf_cgroup_storage_set(_item->cgroup_storage);	\
-			ret = func(_prog, ctx);		\
-			_ret &= (ret & 1);		\
-			_cn |= (ret & 2);		\
-			_item++;			\
-		}					\
-		rcu_read_unlock();			\
-		migrate_enable();			\
+		u32 _flags = 0;				\
+		bool _cn;				\
+		u32 _ret;				\
+		_ret = BPF_PROG_RUN_ARRAY_FLAGS(array, ctx, func, &_flags); \
+		_cn = _flags & BPF_RET_SET_CN;		\
 		if (_ret)				\
 			_ret = (_cn ? NET_XMIT_CN : NET_XMIT_SUCCESS);	\
 		else					\
@@ -1276,6 +1282,11 @@ static inline bool bpf_allow_ptr_leaks(void)
 	return perfmon_capable();
 }
 
+static inline bool bpf_allow_uninit_stack(void)
+{
+	return perfmon_capable();
+}
+
 static inline bool bpf_allow_ptr_to_map_access(void)
 {
 	return perfmon_capable();
@@ -1874,6 +1885,7 @@ extern const struct bpf_func_proto bpf_per_cpu_ptr_proto;
 extern const struct bpf_func_proto bpf_this_cpu_ptr_proto;
 extern const struct bpf_func_proto bpf_ktime_get_coarse_ns_proto;
 extern const struct bpf_func_proto bpf_sock_from_file_proto;
+extern const struct bpf_func_proto bpf_get_socket_ptr_cookie_proto;
 
 const struct bpf_func_proto *bpf_tracing_func_proto(
 	enum bpf_func_id func_id, const struct bpf_prog *prog);


@@ -195,7 +195,7 @@ struct bpf_func_state {
 	 * 0 = main function, 1 = first callee.
 	 */
 	u32 frameno;
-	/* subprog number == index within subprog_stack_depth
+	/* subprog number == index within subprog_info
 	 * zero == main subprog
 	 */
 	u32 subprogno;
@@ -404,6 +404,7 @@ struct bpf_verifier_env {
 	u32 used_btf_cnt;		/* number of used BTF objects */
 	u32 id_gen;			/* used to generate unique reg IDs */
 	bool allow_ptr_leaks;
+	bool allow_uninit_stack;
 	bool allow_ptr_to_map_access;
 	bool bpf_capable;
 	bool bypass_spec_v1;
@@ -470,6 +471,8 @@ bpf_prog_offload_remove_insns(struct bpf_verifier_env *env, u32 off, u32 cnt);
 
 int check_ctx_reg(struct bpf_verifier_env *env,
 		  const struct bpf_reg_state *reg, int regno);
+int check_mem_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
+		  u32 regno, u32 mem_size);
 
 /* this lives here instead of in bpf.h because it needs to dereference tgt_prog */
 static inline u64 bpf_trampoline_compute_key(const struct bpf_prog *tgt_prog,


@@ -22,6 +22,7 @@
 #include <linux/vmalloc.h>
 #include <linux/sockptr.h>
 #include <crypto/sha1.h>
+#include <linux/u64_stats_sync.h>
 
 #include <net/sch_generic.h>
 
@@ -539,6 +540,13 @@ struct bpf_binary_header {
 	u8 image[] __aligned(BPF_IMAGE_ALIGNMENT);
 };
 
+struct bpf_prog_stats {
+	u64 cnt;
+	u64 nsecs;
+	u64 misses;
+	struct u64_stats_sync syncp;
+} __aligned(2 * sizeof(u64));
+
 struct bpf_prog {
 	u16			pages;		/* Number of allocated pages */
 	u16			jited:1,	/* Is our filter JIT'ed? */
@@ -557,10 +565,12 @@ struct bpf_prog {
 	u32			len;		/* Number of filter blocks */
 	u32			jited_len;	/* Size of jited insns in bytes */
 	u8			tag[BPF_TAG_SIZE];
-	struct bpf_prog_aux	*aux;		/* Auxiliary fields */
-	struct sock_fprog_kern	*orig_prog;	/* Original BPF program */
+	struct bpf_prog_stats __percpu *stats;
+	int __percpu		*active;
 	unsigned int		(*bpf_func)(const void *ctx,
 					    const struct bpf_insn *insn);
+	struct bpf_prog_aux	*aux;		/* Auxiliary fields */
+	struct sock_fprog_kern	*orig_prog;	/* Original BPF program */
 	/* Instructions for interpreter */
 	struct sock_filter	insns[0];
 	struct bpf_insn		insnsi[];
@@ -581,7 +591,7 @@ DECLARE_STATIC_KEY_FALSE(bpf_stats_enabled_key);
 		struct bpf_prog_stats *__stats;				\
 		u64 __start = sched_clock();				\
 		__ret = dfunc(ctx, (prog)->insnsi, (prog)->bpf_func);	\
-		__stats = this_cpu_ptr(prog->aux->stats);		\
+		__stats = this_cpu_ptr(prog->stats);			\
 		u64_stats_update_begin(&__stats->syncp);		\
 		__stats->cnt++;						\
 		__stats->nsecs += sched_clock() - __start;		\
@@ -1298,6 +1308,11 @@ struct bpf_sysctl_kern {
 	u64 tmp_reg;
 };
 
+#define BPF_SOCKOPT_KERN_BUF_SIZE	32
+struct bpf_sockopt_buf {
+	u8		data[BPF_SOCKOPT_KERN_BUF_SIZE];
+};
+
 struct bpf_sockopt_kern {
 	struct sock	*sk;
 	u8		*optval;


@@ -62,4 +62,10 @@
 #define INDIRECT_CALL_INET(f, f2, f1, ...) f(__VA_ARGS__)
 #endif
 
+#if IS_ENABLED(CONFIG_INET)
+#define INDIRECT_CALL_INET_1(f, f1, ...) INDIRECT_CALL_1(f, f1, __VA_ARGS__)
+#else
+#define INDIRECT_CALL_INET_1(f, f1, ...) f(__VA_ARGS__)
+#endif
+
 #endif


@@ -3931,14 +3931,42 @@ int xdp_umem_query(struct net_device *dev, u16 queue_id);
 int __dev_forward_skb(struct net_device *dev, struct sk_buff *skb);
 int dev_forward_skb(struct net_device *dev, struct sk_buff *skb);
+int dev_forward_skb_nomtu(struct net_device *dev, struct sk_buff *skb);
 bool is_skb_forwardable(const struct net_device *dev,
 			const struct sk_buff *skb);
 
+static __always_inline bool __is_skb_forwardable(const struct net_device *dev,
+						 const struct sk_buff *skb,
+						 const bool check_mtu)
+{
+	const u32 vlan_hdr_len = 4; /* VLAN_HLEN */
+	unsigned int len;
+
+	if (!(dev->flags & IFF_UP))
+		return false;
+
+	if (!check_mtu)
+		return true;
+
+	len = dev->mtu + dev->hard_header_len + vlan_hdr_len;
+	if (skb->len <= len)
+		return true;
+
+	/* if TSO is enabled, we don't care about the length as the packet
+	 * could be forwarded without being segmented before
+	 */
+	if (skb_is_gso(skb))
+		return true;
+
+	return false;
+}
+
 static __always_inline int ____dev_forward_skb(struct net_device *dev,
-					       struct sk_buff *skb)
+					       struct sk_buff *skb,
+					       const bool check_mtu)
 {
 	if (skb_orphan_frags(skb, GFP_ATOMIC) ||
-	    unlikely(!is_skb_forwardable(dev, skb))) {
+	    unlikely(!__is_skb_forwardable(dev, skb, check_mtu))) {
 		atomic_long_inc(&dev->rx_dropped);
 		kfree_skb(skb);
 		return NET_RX_DROP;


@@ -390,7 +390,6 @@ static inline struct sk_psock *sk_psock_get(struct sock *sk)
 }
 
 void sk_psock_stop(struct sock *sk, struct sk_psock *psock);
-void sk_psock_destroy(struct rcu_head *rcu);
 void sk_psock_drop(struct sock *sk, struct sk_psock *psock);
 
 static inline void sk_psock_put(struct sock *sk, struct sk_psock *psock)


@@ -41,6 +41,8 @@ int inet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len);
 #define BIND_WITH_LOCK			(1 << 1)
 /* Called from BPF program. */
 #define BIND_FROM_BPF			(1 << 2)
+/* Skip CAP_NET_BIND_SERVICE check. */
+#define BIND_NO_CAP_NET_BIND_SERVICE	(1 << 3)
 int __inet_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len,
 		u32 flags);
 int inet_getname(struct socket *sock, struct sockaddr *uaddr,


@@ -1174,6 +1174,8 @@ struct proto {
 
 	int			(*backlog_rcv) (struct sock *sk,
 						struct sk_buff *skb);
+	bool			(*bpf_bypass_getsockopt)(int level,
+							 int optname);
 
 	void		(*release_cb)(struct sock *sk);


@@ -403,6 +403,7 @@ __poll_t tcp_poll(struct file *file, struct socket *sock,
 		      struct poll_table_struct *wait);
 int tcp_getsockopt(struct sock *sk, int level, int optname,
 		   char __user *optval, int __user *optlen);
+bool tcp_bpf_bypass_getsockopt(int level, int optname);
 int tcp_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval,
 		   unsigned int optlen);
 void tcp_set_keepalive(struct sock *sk, int val);


@@ -164,6 +164,12 @@ void xdp_warn(const char *msg, const char *func, const int line);
 #define XDP_WARN(msg) xdp_warn(msg, __func__, __LINE__)
 
 struct xdp_frame *xdp_convert_zc_to_xdp_frame(struct xdp_buff *xdp);
+struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
+					   struct sk_buff *skb,
+					   struct net_device *dev);
+struct sk_buff *xdp_build_skb_from_frame(struct xdp_frame *xdpf,
+					 struct net_device *dev);
+int xdp_alloc_skb_bulk(void **skbs, int n_skb, gfp_t gfp);
 
 static inline
 void xdp_convert_frame_to_buff(struct xdp_frame *frame, struct xdp_buff *xdp)


@@ -55,8 +55,7 @@
 /* tracepoints with more than 12 arguments will hit build error */
 #define CAST_TO_U64(...) CONCATENATE(__CAST, COUNT_ARGS(__VA_ARGS__))(__VA_ARGS__)
 
-#undef DECLARE_EVENT_CLASS
-#define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print)	\
+#define __BPF_DECLARE_TRACE(call, proto, args)				\
 static notrace void							\
 __bpf_trace_##call(void *__data, proto)					\
 {									\
@@ -64,6 +63,10 @@ __bpf_trace_##call(void *__data, proto)				\
 	CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(prog, CAST_TO_U64(args));	\
 }
 
+#undef DECLARE_EVENT_CLASS
+#define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print)	\
+	__BPF_DECLARE_TRACE(call, PARAMS(proto), PARAMS(args))
+
 /*
  * This part is compiled out, it is only here as a build time check
  * to make sure that if the tracepoint handling changes, the
@@ -111,6 +114,11 @@ __DEFINE_EVENT(template, call, PARAMS(proto), PARAMS(args), size)
 #define DEFINE_EVENT_PRINT(template, name, proto, args, print)	\
 	DEFINE_EVENT(template, name, PARAMS(proto), PARAMS(args))
 
+#undef DECLARE_TRACE
+#define DECLARE_TRACE(call, proto, args)				\
+	__BPF_DECLARE_TRACE(call, PARAMS(proto), PARAMS(args))		\
+	__DEFINE_EVENT(call, call, PARAMS(proto), PARAMS(args), 0)
+
 #include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
 
 #undef DEFINE_EVENT_WRITABLE


@@ -1656,22 +1656,30 @@ union bpf_attr {
 *		networking traffic statistics as it provides a global socket
 *		identifier that can be assumed unique.
 *	Return
-*		A 8-byte long non-decreasing number on success, or 0 if the
-*		socket field is missing inside *skb*.
+*		A 8-byte long unique number on success, or 0 if the socket
+*		field is missing inside *skb*.
 *
 * u64 bpf_get_socket_cookie(struct bpf_sock_addr *ctx)
 *	Description
 *		Equivalent to bpf_get_socket_cookie() helper that accepts
 *		*skb*, but gets socket from **struct bpf_sock_addr** context.
 *	Return
-*		A 8-byte long non-decreasing number.
+*		A 8-byte long unique number.
 *
 * u64 bpf_get_socket_cookie(struct bpf_sock_ops *ctx)
 *	Description
 *		Equivalent to **bpf_get_socket_cookie**\ () helper that accepts
 *		*skb*, but gets socket from **struct bpf_sock_ops** context.
 *	Return
-*		A 8-byte long non-decreasing number.
+*		A 8-byte long unique number.
+*
+* u64 bpf_get_socket_cookie(struct sock *sk)
+*	Description
+*		Equivalent to **bpf_get_socket_cookie**\ () helper that accepts
+*		*sk*, but gets socket from a BTF **struct sock**. This helper
+*		also works for sleepable programs.
+*	Return
+*		A 8-byte long unique number or 0 if *sk* is NULL.
 *
 * u32 bpf_get_socket_uid(struct sk_buff *skb)
 *	Return
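As a hedged illustration of the new struct sock variant documented above (not part
of the diff), a BTF-enabled tracing program can read the cookie directly from a
kernel socket pointer; the attach point tcp_connect() is just an example function.

	#include "vmlinux.h"
	#include <bpf/bpf_helpers.h>
	#include <bpf/bpf_tracing.h>

	SEC("fentry/tcp_connect")
	int BPF_PROG(trace_connect, struct sock *sk)
	{
		/* same cookie user space sees via SO_COOKIE */
		__u64 cookie = bpf_get_socket_cookie(sk);

		bpf_printk("tcp_connect cookie=%llu", cookie);
		return 0;
	}

	char LICENSE[] SEC("license") = "GPL";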
@ -2231,6 +2239,9 @@ union bpf_attr {
* * > 0 one of **BPF_FIB_LKUP_RET_** codes explaining why the * * > 0 one of **BPF_FIB_LKUP_RET_** codes explaining why the
* packet is not forwarded or needs assist from full stack * packet is not forwarded or needs assist from full stack
* *
* If lookup fails with BPF_FIB_LKUP_RET_FRAG_NEEDED, then the MTU
* was exceeded and output params->mtu_result contains the MTU.
*
* long bpf_sock_hash_update(struct bpf_sock_ops *skops, struct bpf_map *map, void *key, u64 flags) * long bpf_sock_hash_update(struct bpf_sock_ops *skops, struct bpf_map *map, void *key, u64 flags)
* Description * Description
* Add an entry to, or update a sockhash *map* referencing sockets. * Add an entry to, or update a sockhash *map* referencing sockets.
@ -3836,6 +3847,69 @@ union bpf_attr {
* Return * Return
* A pointer to a struct socket on success or NULL if the file is * A pointer to a struct socket on success or NULL if the file is
* not a socket. * not a socket.
*
* long bpf_check_mtu(void *ctx, u32 ifindex, u32 *mtu_len, s32 len_diff, u64 flags)
* Description
* Check ctx packet size against exceeding MTU of net device (based
* on *ifindex*). This helper will likely be used in combination
* with helpers that adjust/change the packet size.
*
* The argument *len_diff* can be used for querying with a planned
* size change. This allows checking the MTU prior to changing the
* packet ctx. Providing a *len_diff* adjustment that is larger than
* the actual packet size (resulting in a negative packet size) will
* in principle not exceed the MTU, which is why it is not considered
* a failure. Other BPF helpers are needed to perform the planned
* size change; the responsibility for catching a negative packet
* size therefore belongs to those helpers.
*
* Specifying *ifindex* zero means the MTU check is performed
* against the current net device. This is practical if the helper
* isn't used prior to a redirect.
*
* The Linux kernel route table can configure MTUs on a more
* specific per route level, which is not provided by this helper.
* For route level MTU checks use the **bpf_fib_lookup**\ ()
* helper.
*
* *ctx* is either **struct xdp_md** for XDP programs or
* **struct sk_buff** for tc cls_act programs.
*
* The *flags* argument can be a combination of one or more of the
* following values:
*
* **BPF_MTU_CHK_SEGS**
* This flag only works for *ctx* **struct sk_buff**.
* If the packet context contains extra packet segment buffers
* (often known as a GSO skb), then the MTU check is harder to
* perform at this point, because in the transmit path it is
* possible for the skb to get re-segmented (depending on net
* device features). This could still be an MTU violation, so
* this flag enables checking the MTU against the segments,
* with a different violation return code to tell it apart.
* This check cannot be combined with *len_diff*.
*
* On return the *mtu_len* pointer contains the MTU value of the net
* device. Remember that the net device's configured MTU is the L3
* size, which is what is returned here, while XDP and TX lengths
* operate at L2. The helper takes this into account for you, but
* keep it in mind when using the MTU value in your BPF code. On
* input, *mtu_len* must be a valid pointer, initialized to zero,
* otherwise the verifier will reject the BPF program.
*
* Return
* * 0 on success, and populate MTU value in *mtu_len* pointer.
*
* * < 0 if any input argument is invalid (*mtu_len* not updated)
*
* MTU violations return positive values, but also populate the MTU
* value in the *mtu_len* pointer, as this can be needed for
* implementing PMTU handling:
*
* * **BPF_MTU_CHK_RET_FRAG_NEEDED**
* * **BPF_MTU_CHK_RET_SEGS_TOOBIG**
*
*/ */
#define __BPF_FUNC_MAPPER(FN) \ #define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \ FN(unspec), \
@ -4001,6 +4075,7 @@ union bpf_attr {
FN(ktime_get_coarse_ns), \ FN(ktime_get_coarse_ns), \
FN(ima_inode_hash), \ FN(ima_inode_hash), \
FN(sock_from_file), \ FN(sock_from_file), \
FN(check_mtu), \
/* */ /* */
/* integer value in 'imm' field of BPF_CALL instruction selects which helper /* integer value in 'imm' field of BPF_CALL instruction selects which helper
@ -4501,6 +4576,7 @@ struct bpf_prog_info {
__aligned_u64 prog_tags; __aligned_u64 prog_tags;
__u64 run_time_ns; __u64 run_time_ns;
__u64 run_cnt; __u64 run_cnt;
__u64 recursion_misses;
} __attribute__((aligned(8))); } __attribute__((aligned(8)));
struct bpf_map_info { struct bpf_map_info {
@ -4981,9 +5057,13 @@ struct bpf_fib_lookup {
__be16 sport; __be16 sport;
__be16 dport; __be16 dport;
/* total length of packet from network header - used for MTU check */ union { /* used for MTU check */
__u16 tot_len; /* input to lookup */
__u16 tot_len; /* L3 length from network hdr (iph->tot_len) */
/* output: MTU value */
__u16 mtu_result;
};
/* input: L3 device index for lookup /* input: L3 device index for lookup
* output: device index from FIB lookup * output: device index from FIB lookup
*/ */
@ -5029,6 +5109,17 @@ struct bpf_redir_neigh {
}; };
}; };
/* bpf_check_mtu flags */
enum bpf_check_mtu_flags {
BPF_MTU_CHK_SEGS = (1U << 0),
};
enum bpf_check_mtu_ret {
BPF_MTU_CHK_RET_SUCCESS, /* check and lookup successful */
BPF_MTU_CHK_RET_FRAG_NEEDED, /* fragmentation required to fwd */
BPF_MTU_CHK_RET_SEGS_TOOBIG, /* GSO re-segmentation needed to fwd */
};
enum bpf_task_fd_type { enum bpf_task_fd_type {
BPF_FD_TYPE_RAW_TRACEPOINT, /* tp name */ BPF_FD_TYPE_RAW_TRACEPOINT, /* tp name */
BPF_FD_TYPE_TRACEPOINT, /* tp name */ BPF_FD_TYPE_TRACEPOINT, /* tp name */
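As a rough illustration of the bpf_check_mtu() contract documented above, a tc program might query the current device MTU before growing the packet. This is a minimal sketch, assuming headers generated from a kernel with this series; the encapsulation size and section name are arbitrary choices, not part of the series:

  #include <linux/bpf.h>
  #include <linux/pkt_cls.h>
  #include <bpf/bpf_helpers.h>

  SEC("classifier")
  int tc_mtu_guard(struct __sk_buff *skb)
  {
          __u32 mtu_len = 0;            /* must be a valid, zero-initialized pointer */
          const __s32 encap_len = 64;   /* planned size increase (made-up value) */

          /* ifindex 0: check against the current net device; flags 0: no GSO check */
          if (bpf_check_mtu(skb, 0, &mtu_len, encap_len, 0))
                  return TC_ACT_SHOT;   /* < 0 error or BPF_MTU_CHK_RET_* violation */

          /* safe to grow the packet here, e.g. with bpf_skb_adjust_room() */
          return TC_ACT_OK;
  }

  char _license[] SEC("license") = "GPL";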

View File

@ -287,7 +287,7 @@ int bpf_iter_reg_target(const struct bpf_iter_reg *reg_info)
{ {
struct bpf_iter_target_info *tinfo; struct bpf_iter_target_info *tinfo;
tinfo = kmalloc(sizeof(*tinfo), GFP_KERNEL); tinfo = kzalloc(sizeof(*tinfo), GFP_KERNEL);
if (!tinfo) if (!tinfo)
return -ENOMEM; return -ENOMEM;

View File

@ -502,13 +502,14 @@ struct bpf_lru_node *bpf_lru_pop_free(struct bpf_lru *lru, u32 hash)
static void bpf_common_lru_push_free(struct bpf_lru *lru, static void bpf_common_lru_push_free(struct bpf_lru *lru,
struct bpf_lru_node *node) struct bpf_lru_node *node)
{ {
u8 node_type = READ_ONCE(node->type);
unsigned long flags; unsigned long flags;
if (WARN_ON_ONCE(node->type == BPF_LRU_LIST_T_FREE) || if (WARN_ON_ONCE(node_type == BPF_LRU_LIST_T_FREE) ||
WARN_ON_ONCE(node->type == BPF_LRU_LOCAL_LIST_T_FREE)) WARN_ON_ONCE(node_type == BPF_LRU_LOCAL_LIST_T_FREE))
return; return;
if (node->type == BPF_LRU_LOCAL_LIST_T_PENDING) { if (node_type == BPF_LRU_LOCAL_LIST_T_PENDING) {
struct bpf_lru_locallist *loc_l; struct bpf_lru_locallist *loc_l;
loc_l = per_cpu_ptr(lru->common_lru.local_list, node->cpu); loc_l = per_cpu_ptr(lru->common_lru.local_list, node->cpu);

View File

@ -3540,11 +3540,6 @@ static s32 btf_datasec_check_meta(struct btf_verifier_env *env,
return -EINVAL; return -EINVAL;
} }
if (!btf_type_vlen(t)) {
btf_verifier_log_type(env, t, "vlen == 0");
return -EINVAL;
}
if (!t->size) { if (!t->size) {
btf_verifier_log_type(env, t, "size == 0"); btf_verifier_log_type(env, t, "size == 0");
return -EINVAL; return -EINVAL;
@ -5296,15 +5291,16 @@ int btf_check_type_match(struct bpf_verifier_log *log, const struct bpf_prog *pr
* Only PTR_TO_CTX and SCALAR_VALUE states are recognized. * Only PTR_TO_CTX and SCALAR_VALUE states are recognized.
*/ */
int btf_check_func_arg_match(struct bpf_verifier_env *env, int subprog, int btf_check_func_arg_match(struct bpf_verifier_env *env, int subprog,
struct bpf_reg_state *reg) struct bpf_reg_state *regs)
{ {
struct bpf_verifier_log *log = &env->log; struct bpf_verifier_log *log = &env->log;
struct bpf_prog *prog = env->prog; struct bpf_prog *prog = env->prog;
struct btf *btf = prog->aux->btf; struct btf *btf = prog->aux->btf;
const struct btf_param *args; const struct btf_param *args;
const struct btf_type *t; const struct btf_type *t, *ref_t;
u32 i, nargs, btf_id; u32 i, nargs, btf_id, type_size;
const char *tname; const char *tname;
bool is_global;
if (!prog->aux->func_info) if (!prog->aux->func_info)
return -EINVAL; return -EINVAL;
@ -5338,38 +5334,57 @@ int btf_check_func_arg_match(struct bpf_verifier_env *env, int subprog,
bpf_log(log, "Function %s has %d > 5 args\n", tname, nargs); bpf_log(log, "Function %s has %d > 5 args\n", tname, nargs);
goto out; goto out;
} }
is_global = prog->aux->func_info_aux[subprog].linkage == BTF_FUNC_GLOBAL;
/* check that BTF function arguments match actual types that the /* check that BTF function arguments match actual types that the
* verifier sees. * verifier sees.
*/ */
for (i = 0; i < nargs; i++) { for (i = 0; i < nargs; i++) {
struct bpf_reg_state *reg = &regs[i + 1];
t = btf_type_by_id(btf, args[i].type); t = btf_type_by_id(btf, args[i].type);
while (btf_type_is_modifier(t)) while (btf_type_is_modifier(t))
t = btf_type_by_id(btf, t->type); t = btf_type_by_id(btf, t->type);
if (btf_type_is_int(t) || btf_type_is_enum(t)) { if (btf_type_is_int(t) || btf_type_is_enum(t)) {
if (reg[i + 1].type == SCALAR_VALUE) if (reg->type == SCALAR_VALUE)
continue; continue;
bpf_log(log, "R%d is not a scalar\n", i + 1); bpf_log(log, "R%d is not a scalar\n", i + 1);
goto out; goto out;
} }
if (btf_type_is_ptr(t)) { if (btf_type_is_ptr(t)) {
if (reg[i + 1].type == SCALAR_VALUE) {
bpf_log(log, "R%d is not a pointer\n", i + 1);
goto out;
}
/* If function expects ctx type in BTF check that caller /* If function expects ctx type in BTF check that caller
* is passing PTR_TO_CTX. * is passing PTR_TO_CTX.
*/ */
if (btf_get_prog_ctx_type(log, btf, t, prog->type, i)) { if (btf_get_prog_ctx_type(log, btf, t, prog->type, i)) {
if (reg[i + 1].type != PTR_TO_CTX) { if (reg->type != PTR_TO_CTX) {
bpf_log(log, bpf_log(log,
"arg#%d expected pointer to ctx, but got %s\n", "arg#%d expected pointer to ctx, but got %s\n",
i, btf_kind_str[BTF_INFO_KIND(t->info)]); i, btf_kind_str[BTF_INFO_KIND(t->info)]);
goto out; goto out;
} }
if (check_ctx_reg(env, &reg[i + 1], i + 1)) if (check_ctx_reg(env, reg, i + 1))
goto out; goto out;
continue; continue;
} }
if (!is_global)
goto out;
t = btf_type_skip_modifiers(btf, t->type, NULL);
ref_t = btf_resolve_size(btf, t, &type_size);
if (IS_ERR(ref_t)) {
bpf_log(log,
"arg#%d reference type('%s %s') size cannot be determined: %ld\n",
i, btf_type_str(t), btf_name_by_offset(btf, t->name_off),
PTR_ERR(ref_t));
goto out;
}
if (check_mem_reg(env, reg, i + 1, type_size))
goto out;
continue;
} }
bpf_log(log, "Unrecognized arg#%d type %s\n", bpf_log(log, "Unrecognized arg#%d type %s\n",
i, btf_kind_str[BTF_INFO_KIND(t->info)]); i, btf_kind_str[BTF_INFO_KIND(t->info)]);
@ -5393,14 +5408,14 @@ out:
* (either PTR_TO_CTX or SCALAR_VALUE). * (either PTR_TO_CTX or SCALAR_VALUE).
*/ */
int btf_prepare_func_args(struct bpf_verifier_env *env, int subprog, int btf_prepare_func_args(struct bpf_verifier_env *env, int subprog,
struct bpf_reg_state *reg) struct bpf_reg_state *regs)
{ {
struct bpf_verifier_log *log = &env->log; struct bpf_verifier_log *log = &env->log;
struct bpf_prog *prog = env->prog; struct bpf_prog *prog = env->prog;
enum bpf_prog_type prog_type = prog->type; enum bpf_prog_type prog_type = prog->type;
struct btf *btf = prog->aux->btf; struct btf *btf = prog->aux->btf;
const struct btf_param *args; const struct btf_param *args;
const struct btf_type *t; const struct btf_type *t, *ref_t;
u32 i, nargs, btf_id; u32 i, nargs, btf_id;
const char *tname; const char *tname;
@ -5464,16 +5479,35 @@ int btf_prepare_func_args(struct bpf_verifier_env *env, int subprog,
* Only PTR_TO_CTX and SCALAR are supported atm. * Only PTR_TO_CTX and SCALAR are supported atm.
*/ */
for (i = 0; i < nargs; i++) { for (i = 0; i < nargs; i++) {
struct bpf_reg_state *reg = &regs[i + 1];
t = btf_type_by_id(btf, args[i].type); t = btf_type_by_id(btf, args[i].type);
while (btf_type_is_modifier(t)) while (btf_type_is_modifier(t))
t = btf_type_by_id(btf, t->type); t = btf_type_by_id(btf, t->type);
if (btf_type_is_int(t) || btf_type_is_enum(t)) { if (btf_type_is_int(t) || btf_type_is_enum(t)) {
reg[i + 1].type = SCALAR_VALUE; reg->type = SCALAR_VALUE;
continue; continue;
} }
if (btf_type_is_ptr(t) && if (btf_type_is_ptr(t)) {
btf_get_prog_ctx_type(log, btf, t, prog_type, i)) { if (btf_get_prog_ctx_type(log, btf, t, prog_type, i)) {
reg[i + 1].type = PTR_TO_CTX; reg->type = PTR_TO_CTX;
continue;
}
t = btf_type_skip_modifiers(btf, t->type, NULL);
ref_t = btf_resolve_size(btf, t, &reg->mem_size);
if (IS_ERR(ref_t)) {
bpf_log(log,
"arg#%d reference type('%s %s') size cannot be determined: %ld\n",
i, btf_type_str(t), btf_name_by_offset(btf, t->name_off),
PTR_ERR(ref_t));
return -EINVAL;
}
reg->type = PTR_TO_MEM_OR_NULL;
reg->id = ++env->id_gen;
continue; continue;
} }
bpf_log(log, "Arg#%d type %s in %s() is not supported yet.\n", bpf_log(log, "Arg#%d type %s in %s() is not supported yet.\n",

View File

@ -19,7 +19,7 @@
#include "../cgroup/cgroup-internal.h" #include "../cgroup/cgroup-internal.h"
DEFINE_STATIC_KEY_FALSE(cgroup_bpf_enabled_key); DEFINE_STATIC_KEY_ARRAY_FALSE(cgroup_bpf_enabled_key, MAX_BPF_ATTACH_TYPE);
EXPORT_SYMBOL(cgroup_bpf_enabled_key); EXPORT_SYMBOL(cgroup_bpf_enabled_key);
void cgroup_bpf_offline(struct cgroup *cgrp) void cgroup_bpf_offline(struct cgroup *cgrp)
@ -128,7 +128,7 @@ static void cgroup_bpf_release(struct work_struct *work)
if (pl->link) if (pl->link)
bpf_cgroup_link_auto_detach(pl->link); bpf_cgroup_link_auto_detach(pl->link);
kfree(pl); kfree(pl);
static_branch_dec(&cgroup_bpf_enabled_key); static_branch_dec(&cgroup_bpf_enabled_key[type]);
} }
old_array = rcu_dereference_protected( old_array = rcu_dereference_protected(
cgrp->bpf.effective[type], cgrp->bpf.effective[type],
@ -499,7 +499,7 @@ int __cgroup_bpf_attach(struct cgroup *cgrp,
if (old_prog) if (old_prog)
bpf_prog_put(old_prog); bpf_prog_put(old_prog);
else else
static_branch_inc(&cgroup_bpf_enabled_key); static_branch_inc(&cgroup_bpf_enabled_key[type]);
bpf_cgroup_storages_link(new_storage, cgrp, type); bpf_cgroup_storages_link(new_storage, cgrp, type);
return 0; return 0;
@ -698,7 +698,7 @@ int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
cgrp->bpf.flags[type] = 0; cgrp->bpf.flags[type] = 0;
if (old_prog) if (old_prog)
bpf_prog_put(old_prog); bpf_prog_put(old_prog);
static_branch_dec(&cgroup_bpf_enabled_key); static_branch_dec(&cgroup_bpf_enabled_key[type]);
return 0; return 0;
cleanup: cleanup:
@ -1055,6 +1055,8 @@ EXPORT_SYMBOL(__cgroup_bpf_run_filter_sk);
* @uaddr: sockaddr struct provided by user * @uaddr: sockaddr struct provided by user
* @type: The type of program to be exectuted * @type: The type of program to be exectuted
* @t_ctx: Pointer to attach type specific context * @t_ctx: Pointer to attach type specific context
* @flags: Pointer to u32 which contains higher bits of BPF program
* return value (OR'ed together).
* *
* socket is expected to be of type INET or INET6. * socket is expected to be of type INET or INET6.
* *
@ -1064,7 +1066,8 @@ EXPORT_SYMBOL(__cgroup_bpf_run_filter_sk);
int __cgroup_bpf_run_filter_sock_addr(struct sock *sk, int __cgroup_bpf_run_filter_sock_addr(struct sock *sk,
struct sockaddr *uaddr, struct sockaddr *uaddr,
enum bpf_attach_type type, enum bpf_attach_type type,
void *t_ctx) void *t_ctx,
u32 *flags)
{ {
struct bpf_sock_addr_kern ctx = { struct bpf_sock_addr_kern ctx = {
.sk = sk, .sk = sk,
@ -1087,7 +1090,8 @@ int __cgroup_bpf_run_filter_sock_addr(struct sock *sk,
} }
cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data); cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
ret = BPF_PROG_RUN_ARRAY(cgrp->bpf.effective[type], &ctx, BPF_PROG_RUN); ret = BPF_PROG_RUN_ARRAY_FLAGS(cgrp->bpf.effective[type], &ctx,
BPF_PROG_RUN, flags);
return ret == 1 ? 0 : -EPERM; return ret == 1 ? 0 : -EPERM;
} }
@ -1298,7 +1302,8 @@ static bool __cgroup_bpf_prog_array_is_empty(struct cgroup *cgrp,
return empty; return empty;
} }
static int sockopt_alloc_buf(struct bpf_sockopt_kern *ctx, int max_optlen) static int sockopt_alloc_buf(struct bpf_sockopt_kern *ctx, int max_optlen,
struct bpf_sockopt_buf *buf)
{ {
if (unlikely(max_optlen < 0)) if (unlikely(max_optlen < 0))
return -EINVAL; return -EINVAL;
@ -1310,6 +1315,15 @@ static int sockopt_alloc_buf(struct bpf_sockopt_kern *ctx, int max_optlen)
max_optlen = PAGE_SIZE; max_optlen = PAGE_SIZE;
} }
if (max_optlen <= sizeof(buf->data)) {
/* When the optval fits into BPF_SOCKOPT_KERN_BUF_SIZE
* bytes avoid the cost of kzalloc.
*/
ctx->optval = buf->data;
ctx->optval_end = ctx->optval + max_optlen;
return max_optlen;
}
ctx->optval = kzalloc(max_optlen, GFP_USER); ctx->optval = kzalloc(max_optlen, GFP_USER);
if (!ctx->optval) if (!ctx->optval)
return -ENOMEM; return -ENOMEM;
@ -1319,16 +1333,26 @@ static int sockopt_alloc_buf(struct bpf_sockopt_kern *ctx, int max_optlen)
return max_optlen; return max_optlen;
} }
static void sockopt_free_buf(struct bpf_sockopt_kern *ctx) static void sockopt_free_buf(struct bpf_sockopt_kern *ctx,
struct bpf_sockopt_buf *buf)
{ {
if (ctx->optval == buf->data)
return;
kfree(ctx->optval); kfree(ctx->optval);
} }
static bool sockopt_buf_allocated(struct bpf_sockopt_kern *ctx,
struct bpf_sockopt_buf *buf)
{
return ctx->optval != buf->data;
}
int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level, int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
int *optname, char __user *optval, int *optname, char __user *optval,
int *optlen, char **kernel_optval) int *optlen, char **kernel_optval)
{ {
struct cgroup *cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data); struct cgroup *cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
struct bpf_sockopt_buf buf = {};
struct bpf_sockopt_kern ctx = { struct bpf_sockopt_kern ctx = {
.sk = sk, .sk = sk,
.level = *level, .level = *level,
@ -1340,8 +1364,7 @@ int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
* attached to the hook so we don't waste time allocating * attached to the hook so we don't waste time allocating
* memory and locking the socket. * memory and locking the socket.
*/ */
if (!cgroup_bpf_enabled || if (__cgroup_bpf_prog_array_is_empty(cgrp, BPF_CGROUP_SETSOCKOPT))
__cgroup_bpf_prog_array_is_empty(cgrp, BPF_CGROUP_SETSOCKOPT))
return 0; return 0;
/* Allocate a bit more than the initial user buffer for /* Allocate a bit more than the initial user buffer for
@ -1350,7 +1373,7 @@ int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
*/ */
max_optlen = max_t(int, 16, *optlen); max_optlen = max_t(int, 16, *optlen);
max_optlen = sockopt_alloc_buf(&ctx, max_optlen); max_optlen = sockopt_alloc_buf(&ctx, max_optlen, &buf);
if (max_optlen < 0) if (max_optlen < 0)
return max_optlen; return max_optlen;
@ -1390,14 +1413,31 @@ int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
*/ */
if (ctx.optlen != 0) { if (ctx.optlen != 0) {
*optlen = ctx.optlen; *optlen = ctx.optlen;
*kernel_optval = ctx.optval; /* We've used bpf_sockopt_kern->buf as an intermediary
* storage, but the BPF program indicates that we need
* to pass this data to the kernel setsockopt handler.
* No way to export on-stack buf, have to allocate a
* new buffer.
*/
if (!sockopt_buf_allocated(&ctx, &buf)) {
void *p = kmalloc(ctx.optlen, GFP_USER);
if (!p) {
ret = -ENOMEM;
goto out;
}
memcpy(p, ctx.optval, ctx.optlen);
*kernel_optval = p;
} else {
*kernel_optval = ctx.optval;
}
/* export and don't free sockopt buf */ /* export and don't free sockopt buf */
return 0; return 0;
} }
} }
out: out:
sockopt_free_buf(&ctx); sockopt_free_buf(&ctx, &buf);
return ret; return ret;
} }
@ -1407,6 +1447,7 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
int retval) int retval)
{ {
struct cgroup *cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data); struct cgroup *cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
struct bpf_sockopt_buf buf = {};
struct bpf_sockopt_kern ctx = { struct bpf_sockopt_kern ctx = {
.sk = sk, .sk = sk,
.level = level, .level = level,
@ -1419,13 +1460,12 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
* attached to the hook so we don't waste time allocating * attached to the hook so we don't waste time allocating
* memory and locking the socket. * memory and locking the socket.
*/ */
if (!cgroup_bpf_enabled || if (__cgroup_bpf_prog_array_is_empty(cgrp, BPF_CGROUP_GETSOCKOPT))
__cgroup_bpf_prog_array_is_empty(cgrp, BPF_CGROUP_GETSOCKOPT))
return retval; return retval;
ctx.optlen = max_optlen; ctx.optlen = max_optlen;
max_optlen = sockopt_alloc_buf(&ctx, max_optlen); max_optlen = sockopt_alloc_buf(&ctx, max_optlen, &buf);
if (max_optlen < 0) if (max_optlen < 0)
return max_optlen; return max_optlen;
@ -1488,9 +1528,55 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
ret = ctx.retval; ret = ctx.retval;
out: out:
sockopt_free_buf(&ctx); sockopt_free_buf(&ctx, &buf);
return ret; return ret;
} }
int __cgroup_bpf_run_filter_getsockopt_kern(struct sock *sk, int level,
int optname, void *optval,
int *optlen, int retval)
{
struct cgroup *cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
struct bpf_sockopt_kern ctx = {
.sk = sk,
.level = level,
.optname = optname,
.retval = retval,
.optlen = *optlen,
.optval = optval,
.optval_end = optval + *optlen,
};
int ret;
/* Note that __cgroup_bpf_run_filter_getsockopt doesn't copy
* user data back into the BPF buffer when retval != 0. This is
* done as an optimization to avoid extra copy, assuming
* kernel won't populate the data in case of an error.
* Here we always pass the data and memset() should
* be called if that data shouldn't be "exported".
*/
ret = BPF_PROG_RUN_ARRAY(cgrp->bpf.effective[BPF_CGROUP_GETSOCKOPT],
&ctx, BPF_PROG_RUN);
if (!ret)
return -EPERM;
if (ctx.optlen > *optlen)
return -EFAULT;
/* BPF programs only allowed to set retval to 0, not some
* arbitrary value.
*/
if (ctx.retval != 0 && ctx.retval != retval)
return -EFAULT;
/* BPF programs can shrink the buffer, export the modifications.
*/
if (ctx.optlen != 0)
*optlen = ctx.optlen;
return ctx.retval;
}
#endif #endif
static ssize_t sysctl_cpy_dir(const struct ctl_dir *dir, char **bufp, static ssize_t sysctl_cpy_dir(const struct ctl_dir *dir, char **bufp,
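For context, the kernel-side paths above are exercised by ordinary cgroup sockopt programs; a pass-through getsockopt program is the simplest case, and with the on-stack buffer added above small optvals no longer pay for a kzalloc. Illustrative sketch only, not taken from this series:

  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  SEC("cgroup/getsockopt")
  int getsockopt_passthrough(struct bpf_sockopt *ctx)
  {
          /* Returning 1 keeps the kernel-provided optval and retval
           * untouched; returning 0 would reject the getsockopt() call.
           */
          return 1;
  }

  char _license[] SEC("license") = "GPL";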

View File

@ -91,6 +91,12 @@ struct bpf_prog *bpf_prog_alloc_no_stats(unsigned int size, gfp_t gfp_extra_flag
vfree(fp); vfree(fp);
return NULL; return NULL;
} }
fp->active = alloc_percpu_gfp(int, GFP_KERNEL_ACCOUNT | gfp_extra_flags);
if (!fp->active) {
vfree(fp);
kfree(aux);
return NULL;
}
fp->pages = size / PAGE_SIZE; fp->pages = size / PAGE_SIZE;
fp->aux = aux; fp->aux = aux;
@ -114,8 +120,9 @@ struct bpf_prog *bpf_prog_alloc(unsigned int size, gfp_t gfp_extra_flags)
if (!prog) if (!prog)
return NULL; return NULL;
prog->aux->stats = alloc_percpu_gfp(struct bpf_prog_stats, gfp_flags); prog->stats = alloc_percpu_gfp(struct bpf_prog_stats, gfp_flags);
if (!prog->aux->stats) { if (!prog->stats) {
free_percpu(prog->active);
kfree(prog->aux); kfree(prog->aux);
vfree(prog); vfree(prog);
return NULL; return NULL;
@ -124,7 +131,7 @@ struct bpf_prog *bpf_prog_alloc(unsigned int size, gfp_t gfp_extra_flags)
for_each_possible_cpu(cpu) { for_each_possible_cpu(cpu) {
struct bpf_prog_stats *pstats; struct bpf_prog_stats *pstats;
pstats = per_cpu_ptr(prog->aux->stats, cpu); pstats = per_cpu_ptr(prog->stats, cpu);
u64_stats_init(&pstats->syncp); u64_stats_init(&pstats->syncp);
} }
return prog; return prog;
@ -238,6 +245,8 @@ struct bpf_prog *bpf_prog_realloc(struct bpf_prog *fp_old, unsigned int size,
* reallocated structure. * reallocated structure.
*/ */
fp_old->aux = NULL; fp_old->aux = NULL;
fp_old->stats = NULL;
fp_old->active = NULL;
__bpf_prog_free(fp_old); __bpf_prog_free(fp_old);
} }
@ -249,10 +258,11 @@ void __bpf_prog_free(struct bpf_prog *fp)
if (fp->aux) { if (fp->aux) {
mutex_destroy(&fp->aux->used_maps_mutex); mutex_destroy(&fp->aux->used_maps_mutex);
mutex_destroy(&fp->aux->dst_mutex); mutex_destroy(&fp->aux->dst_mutex);
free_percpu(fp->aux->stats);
kfree(fp->aux->poke_tab); kfree(fp->aux->poke_tab);
kfree(fp->aux); kfree(fp->aux);
} }
free_percpu(fp->stats);
free_percpu(fp->active);
vfree(fp); vfree(fp);
} }

View File

@ -141,49 +141,6 @@ static void cpu_map_kthread_stop(struct work_struct *work)
kthread_stop(rcpu->kthread); kthread_stop(rcpu->kthread);
} }
static struct sk_buff *cpu_map_build_skb(struct xdp_frame *xdpf,
struct sk_buff *skb)
{
unsigned int hard_start_headroom;
unsigned int frame_size;
void *pkt_data_start;
/* Part of headroom was reserved to xdpf */
hard_start_headroom = sizeof(struct xdp_frame) + xdpf->headroom;
/* Memory size backing xdp_frame data already have reserved
* room for build_skb to place skb_shared_info in tailroom.
*/
frame_size = xdpf->frame_sz;
pkt_data_start = xdpf->data - hard_start_headroom;
skb = build_skb_around(skb, pkt_data_start, frame_size);
if (unlikely(!skb))
return NULL;
skb_reserve(skb, hard_start_headroom);
__skb_put(skb, xdpf->len);
if (xdpf->metasize)
skb_metadata_set(skb, xdpf->metasize);
/* Essential SKB info: protocol and skb->dev */
skb->protocol = eth_type_trans(skb, xdpf->dev_rx);
/* Optional SKB info, currently missing:
* - HW checksum info (skb->ip_summed)
* - HW RX hash (skb_set_hash)
* - RX ring dev queue index (skb_record_rx_queue)
*/
/* Until page_pool get SKB return path, release DMA here */
xdp_release_frame(xdpf);
/* Allow SKB to reuse area used by xdp_frame */
xdp_scrub_frame(xdpf);
return skb;
}
static void __cpu_map_ring_cleanup(struct ptr_ring *ring) static void __cpu_map_ring_cleanup(struct ptr_ring *ring)
{ {
/* The tear-down procedure should have made sure that queue is /* The tear-down procedure should have made sure that queue is
@ -350,7 +307,8 @@ static int cpu_map_kthread_run(void *data)
struct sk_buff *skb = skbs[i]; struct sk_buff *skb = skbs[i];
int ret; int ret;
skb = cpu_map_build_skb(xdpf, skb); skb = __xdp_build_skb_from_frame(xdpf, skb,
xdpf->dev_rx);
if (!skb) { if (!skb) {
xdp_return_frame(xdpf); xdp_return_frame(xdpf);
continue; continue;

View File

@ -802,9 +802,7 @@ static int dev_map_notification(struct notifier_block *notifier,
break; break;
/* will be freed in free_netdev() */ /* will be freed in free_netdev() */
netdev->xdp_bulkq = netdev->xdp_bulkq = alloc_percpu(struct xdp_dev_bulk_queue);
__alloc_percpu_gfp(sizeof(struct xdp_dev_bulk_queue),
sizeof(void *), GFP_ATOMIC);
if (!netdev->xdp_bulkq) if (!netdev->xdp_bulkq)
return NOTIFY_BAD; return NOTIFY_BAD;

View File

@ -161,7 +161,7 @@ void print_bpf_insn(const struct bpf_insn_cbs *cbs,
insn->dst_reg, insn->dst_reg,
insn->off, insn->src_reg); insn->off, insn->src_reg);
else if (BPF_MODE(insn->code) == BPF_ATOMIC && else if (BPF_MODE(insn->code) == BPF_ATOMIC &&
(insn->imm == BPF_ADD || insn->imm == BPF_ADD || (insn->imm == BPF_ADD || insn->imm == BPF_AND ||
insn->imm == BPF_OR || insn->imm == BPF_XOR)) { insn->imm == BPF_OR || insn->imm == BPF_XOR)) {
verbose(cbs->private_data, "(%02x) lock *(%s *)(r%d %+d) %s r%d\n", verbose(cbs->private_data, "(%02x) lock *(%s *)(r%d %+d) %s r%d\n",
insn->code, insn->code,

View File

@ -1148,7 +1148,7 @@ static int __htab_percpu_map_update_elem(struct bpf_map *map, void *key,
/* unknown flags */ /* unknown flags */
return -EINVAL; return -EINVAL;
WARN_ON_ONCE(!rcu_read_lock_held()); WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held());
key_size = map->key_size; key_size = map->key_size;
@ -1202,7 +1202,7 @@ static int __htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key,
/* unknown flags */ /* unknown flags */
return -EINVAL; return -EINVAL;
WARN_ON_ONCE(!rcu_read_lock_held()); WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held());
key_size = map->key_size; key_size = map->key_size;

View File

@ -720,14 +720,6 @@ bpf_base_func_proto(enum bpf_func_id func_id)
return &bpf_spin_lock_proto; return &bpf_spin_lock_proto;
case BPF_FUNC_spin_unlock: case BPF_FUNC_spin_unlock:
return &bpf_spin_unlock_proto; return &bpf_spin_unlock_proto;
case BPF_FUNC_trace_printk:
if (!perfmon_capable())
return NULL;
return bpf_get_trace_printk_proto();
case BPF_FUNC_snprintf_btf:
if (!perfmon_capable())
return NULL;
return &bpf_snprintf_btf_proto;
case BPF_FUNC_jiffies64: case BPF_FUNC_jiffies64:
return &bpf_jiffies64_proto; return &bpf_jiffies64_proto;
case BPF_FUNC_per_cpu_ptr: case BPF_FUNC_per_cpu_ptr:
@ -742,6 +734,8 @@ bpf_base_func_proto(enum bpf_func_id func_id)
return NULL; return NULL;
switch (func_id) { switch (func_id) {
case BPF_FUNC_trace_printk:
return bpf_get_trace_printk_proto();
case BPF_FUNC_get_current_task: case BPF_FUNC_get_current_task:
return &bpf_get_current_task_proto; return &bpf_get_current_task_proto;
case BPF_FUNC_probe_read_user: case BPF_FUNC_probe_read_user:
@ -752,6 +746,8 @@ bpf_base_func_proto(enum bpf_func_id func_id)
return &bpf_probe_read_user_str_proto; return &bpf_probe_read_user_str_proto;
case BPF_FUNC_probe_read_kernel_str: case BPF_FUNC_probe_read_kernel_str:
return &bpf_probe_read_kernel_str_proto; return &bpf_probe_read_kernel_str_proto;
case BPF_FUNC_snprintf_btf:
return &bpf_snprintf_btf_proto;
default: default:
return NULL; return NULL;
} }

View File

@ -1731,25 +1731,28 @@ static int bpf_prog_release(struct inode *inode, struct file *filp)
static void bpf_prog_get_stats(const struct bpf_prog *prog, static void bpf_prog_get_stats(const struct bpf_prog *prog,
struct bpf_prog_stats *stats) struct bpf_prog_stats *stats)
{ {
u64 nsecs = 0, cnt = 0; u64 nsecs = 0, cnt = 0, misses = 0;
int cpu; int cpu;
for_each_possible_cpu(cpu) { for_each_possible_cpu(cpu) {
const struct bpf_prog_stats *st; const struct bpf_prog_stats *st;
unsigned int start; unsigned int start;
u64 tnsecs, tcnt; u64 tnsecs, tcnt, tmisses;
st = per_cpu_ptr(prog->aux->stats, cpu); st = per_cpu_ptr(prog->stats, cpu);
do { do {
start = u64_stats_fetch_begin_irq(&st->syncp); start = u64_stats_fetch_begin_irq(&st->syncp);
tnsecs = st->nsecs; tnsecs = st->nsecs;
tcnt = st->cnt; tcnt = st->cnt;
tmisses = st->misses;
} while (u64_stats_fetch_retry_irq(&st->syncp, start)); } while (u64_stats_fetch_retry_irq(&st->syncp, start));
nsecs += tnsecs; nsecs += tnsecs;
cnt += tcnt; cnt += tcnt;
misses += tmisses;
} }
stats->nsecs = nsecs; stats->nsecs = nsecs;
stats->cnt = cnt; stats->cnt = cnt;
stats->misses = misses;
} }
#ifdef CONFIG_PROC_FS #ifdef CONFIG_PROC_FS
@ -1768,14 +1771,16 @@ static void bpf_prog_show_fdinfo(struct seq_file *m, struct file *filp)
"memlock:\t%llu\n" "memlock:\t%llu\n"
"prog_id:\t%u\n" "prog_id:\t%u\n"
"run_time_ns:\t%llu\n" "run_time_ns:\t%llu\n"
"run_cnt:\t%llu\n", "run_cnt:\t%llu\n"
"recursion_misses:\t%llu\n",
prog->type, prog->type,
prog->jited, prog->jited,
prog_tag, prog_tag,
prog->pages * 1ULL << PAGE_SHIFT, prog->pages * 1ULL << PAGE_SHIFT,
prog->aux->id, prog->aux->id,
stats.nsecs, stats.nsecs,
stats.cnt); stats.cnt,
stats.misses);
} }
#endif #endif
@ -3438,6 +3443,7 @@ static int bpf_prog_get_info_by_fd(struct file *file,
bpf_prog_get_stats(prog, &stats); bpf_prog_get_stats(prog, &stats);
info.run_time_ns = stats.nsecs; info.run_time_ns = stats.nsecs;
info.run_cnt = stats.cnt; info.run_cnt = stats.cnt;
info.recursion_misses = stats.misses;
if (!bpf_capable()) { if (!bpf_capable()) {
info.jited_prog_len = 0; info.jited_prog_len = 0;
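User space can read the new counter through libbpf's bpf_obj_get_info_by_fd(); a minimal sketch, with error handling trimmed and prog_fd assumed to be the fd of a loaded program:

  #include <stdio.h>
  #include <bpf/bpf.h>

  static int print_prog_stats(int prog_fd)
  {
          struct bpf_prog_info info = {};
          __u32 info_len = sizeof(info);

          if (bpf_obj_get_info_by_fd(prog_fd, &info, &info_len))
                  return -1;

          printf("run_cnt=%llu recursion_misses=%llu\n",
                 (unsigned long long)info.run_cnt,
                 (unsigned long long)info.recursion_misses);
          return 0;
  }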

View File

@ -286,9 +286,248 @@ static const struct seq_operations task_file_seq_ops = {
.show = task_file_seq_show, .show = task_file_seq_show,
}; };
struct bpf_iter_seq_task_vma_info {
/* The first field must be struct bpf_iter_seq_task_common.
* This is assumed by the {init, fini}_seq_pidns() callback functions.
*/
struct bpf_iter_seq_task_common common;
struct task_struct *task;
struct vm_area_struct *vma;
u32 tid;
unsigned long prev_vm_start;
unsigned long prev_vm_end;
};
enum bpf_task_vma_iter_find_op {
task_vma_iter_first_vma, /* use mm->mmap */
task_vma_iter_next_vma, /* use curr_vma->vm_next */
task_vma_iter_find_vma, /* use find_vma() to find next vma */
};
static struct vm_area_struct *
task_vma_seq_get_next(struct bpf_iter_seq_task_vma_info *info)
{
struct pid_namespace *ns = info->common.ns;
enum bpf_task_vma_iter_find_op op;
struct vm_area_struct *curr_vma;
struct task_struct *curr_task;
u32 curr_tid = info->tid;
/* If this function returns a non-NULL vma, it holds a reference to
* the task_struct and holds a read lock on vma->mm->mmap_lock.
* If this function returns NULL, it does not hold any reference or
* lock.
*/
if (info->task) {
curr_task = info->task;
curr_vma = info->vma;
/* In case of lock contention, drop mmap_lock to unblock
* the writer.
*
* After relock, call find_vma(mm, prev_vm_end - 1) to find the
* new vma to process.
*
* +------+------+-----------+
* | VMA1 | VMA2 | VMA3 |
* +------+------+-----------+
* | | | |
* 4k 8k 16k 400k
*
* For example, curr_vma == VMA2. Before unlock, we set
*
* prev_vm_start = 8k
* prev_vm_end = 16k
*
* There are a few cases:
*
* 1) VMA2 is freed, but VMA3 exists.
*
* find_vma() will return VMA3, just process VMA3.
*
* 2) VMA2 still exists.
*
* find_vma() will return VMA2, process VMA2->next.
*
* 3) no more vma in this mm.
*
* Process the next task.
*
* 4) find_vma() returns a different vma, VMA2'.
*
* 4.1) If VMA2 covers same range as VMA2', skip VMA2',
* because we already covered the range;
* 4.2) VMA2 and VMA2' cover different ranges, process
* VMA2'.
*/
if (mmap_lock_is_contended(curr_task->mm)) {
info->prev_vm_start = curr_vma->vm_start;
info->prev_vm_end = curr_vma->vm_end;
op = task_vma_iter_find_vma;
mmap_read_unlock(curr_task->mm);
if (mmap_read_lock_killable(curr_task->mm))
goto finish;
} else {
op = task_vma_iter_next_vma;
}
} else {
again:
curr_task = task_seq_get_next(ns, &curr_tid, true);
if (!curr_task) {
info->tid = curr_tid + 1;
goto finish;
}
if (curr_tid != info->tid) {
info->tid = curr_tid;
/* new task, process the first vma */
op = task_vma_iter_first_vma;
} else {
/* Found the same tid, which means user space has
* finished the data in the previous buffer and is
* reading more. We dropped mmap_lock before returning
* to user space, so it is necessary to use find_vma()
* to find the next vma to process.
*/
op = task_vma_iter_find_vma;
}
if (!curr_task->mm)
goto next_task;
if (mmap_read_lock_killable(curr_task->mm))
goto finish;
}
switch (op) {
case task_vma_iter_first_vma:
curr_vma = curr_task->mm->mmap;
break;
case task_vma_iter_next_vma:
curr_vma = curr_vma->vm_next;
break;
case task_vma_iter_find_vma:
/* We dropped mmap_lock so it is necessary to use find_vma
* to find the next vma. This is similar to the mechanism
* in show_smaps_rollup().
*/
curr_vma = find_vma(curr_task->mm, info->prev_vm_end - 1);
/* case 1) and 4.2) above just use curr_vma */
/* check for case 2) or case 4.1) above */
if (curr_vma &&
curr_vma->vm_start == info->prev_vm_start &&
curr_vma->vm_end == info->prev_vm_end)
curr_vma = curr_vma->vm_next;
break;
}
if (!curr_vma) {
/* case 3) above, or case 2) 4.1) with vma->next == NULL */
mmap_read_unlock(curr_task->mm);
goto next_task;
}
info->task = curr_task;
info->vma = curr_vma;
return curr_vma;
next_task:
put_task_struct(curr_task);
info->task = NULL;
curr_tid++;
goto again;
finish:
if (curr_task)
put_task_struct(curr_task);
info->task = NULL;
info->vma = NULL;
return NULL;
}
static void *task_vma_seq_start(struct seq_file *seq, loff_t *pos)
{
struct bpf_iter_seq_task_vma_info *info = seq->private;
struct vm_area_struct *vma;
vma = task_vma_seq_get_next(info);
if (vma && *pos == 0)
++*pos;
return vma;
}
static void *task_vma_seq_next(struct seq_file *seq, void *v, loff_t *pos)
{
struct bpf_iter_seq_task_vma_info *info = seq->private;
++*pos;
return task_vma_seq_get_next(info);
}
struct bpf_iter__task_vma {
__bpf_md_ptr(struct bpf_iter_meta *, meta);
__bpf_md_ptr(struct task_struct *, task);
__bpf_md_ptr(struct vm_area_struct *, vma);
};
DEFINE_BPF_ITER_FUNC(task_vma, struct bpf_iter_meta *meta,
struct task_struct *task, struct vm_area_struct *vma)
static int __task_vma_seq_show(struct seq_file *seq, bool in_stop)
{
struct bpf_iter_seq_task_vma_info *info = seq->private;
struct bpf_iter__task_vma ctx;
struct bpf_iter_meta meta;
struct bpf_prog *prog;
meta.seq = seq;
prog = bpf_iter_get_info(&meta, in_stop);
if (!prog)
return 0;
ctx.meta = &meta;
ctx.task = info->task;
ctx.vma = info->vma;
return bpf_iter_run_prog(prog, &ctx);
}
static int task_vma_seq_show(struct seq_file *seq, void *v)
{
return __task_vma_seq_show(seq, false);
}
static void task_vma_seq_stop(struct seq_file *seq, void *v)
{
struct bpf_iter_seq_task_vma_info *info = seq->private;
if (!v) {
(void)__task_vma_seq_show(seq, true);
} else {
/* info->vma has not been seen by the BPF program. If the
* user space reads more, task_vma_seq_get_next should
* return this vma again. Set prev_vm_start to ~0UL,
* so that we don't skip the vma returned by the next
* find_vma() (case task_vma_iter_find_vma in
* task_vma_seq_get_next()).
*/
info->prev_vm_start = ~0UL;
info->prev_vm_end = info->vma->vm_end;
mmap_read_unlock(info->task->mm);
put_task_struct(info->task);
info->task = NULL;
}
}
static const struct seq_operations task_vma_seq_ops = {
.start = task_vma_seq_start,
.next = task_vma_seq_next,
.stop = task_vma_seq_stop,
.show = task_vma_seq_show,
};
BTF_ID_LIST(btf_task_file_ids) BTF_ID_LIST(btf_task_file_ids)
BTF_ID(struct, task_struct) BTF_ID(struct, task_struct)
BTF_ID(struct, file) BTF_ID(struct, file)
BTF_ID(struct, vm_area_struct)
static const struct bpf_iter_seq_info task_seq_info = { static const struct bpf_iter_seq_info task_seq_info = {
.seq_ops = &task_seq_ops, .seq_ops = &task_seq_ops,
@ -328,6 +567,26 @@ static struct bpf_iter_reg task_file_reg_info = {
.seq_info = &task_file_seq_info, .seq_info = &task_file_seq_info,
}; };
static const struct bpf_iter_seq_info task_vma_seq_info = {
.seq_ops = &task_vma_seq_ops,
.init_seq_private = init_seq_pidns,
.fini_seq_private = fini_seq_pidns,
.seq_priv_size = sizeof(struct bpf_iter_seq_task_vma_info),
};
static struct bpf_iter_reg task_vma_reg_info = {
.target = "task_vma",
.feature = BPF_ITER_RESCHED,
.ctx_arg_info_size = 2,
.ctx_arg_info = {
{ offsetof(struct bpf_iter__task_vma, task),
PTR_TO_BTF_ID_OR_NULL },
{ offsetof(struct bpf_iter__task_vma, vma),
PTR_TO_BTF_ID_OR_NULL },
},
.seq_info = &task_vma_seq_info,
};
static int __init task_iter_init(void) static int __init task_iter_init(void)
{ {
int ret; int ret;
@ -339,6 +598,12 @@ static int __init task_iter_init(void)
task_file_reg_info.ctx_arg_info[0].btf_id = btf_task_file_ids[0]; task_file_reg_info.ctx_arg_info[0].btf_id = btf_task_file_ids[0];
task_file_reg_info.ctx_arg_info[1].btf_id = btf_task_file_ids[1]; task_file_reg_info.ctx_arg_info[1].btf_id = btf_task_file_ids[1];
return bpf_iter_reg_target(&task_file_reg_info); ret = bpf_iter_reg_target(&task_file_reg_info);
if (ret)
return ret;
task_vma_reg_info.ctx_arg_info[0].btf_id = btf_task_file_ids[0];
task_vma_reg_info.ctx_arg_info[1].btf_id = btf_task_file_ids[2];
return bpf_iter_reg_target(&task_vma_reg_info);
} }
late_initcall(task_iter_init); late_initcall(task_iter_init);
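A sketch of a BPF program using the new task_vma iterator to emit /proc/pid/maps-style lines, assuming vmlinux.h generated from a kernel with this series; the output format is arbitrary:

  #include "vmlinux.h"
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_tracing.h>

  SEC("iter/task_vma")
  int dump_task_vmas(struct bpf_iter__task_vma *ctx)
  {
          struct seq_file *seq = ctx->meta->seq;
          struct vm_area_struct *vma = ctx->vma;
          struct task_struct *task = ctx->task;

          if (!vma || !task)      /* both are PTR_TO_BTF_ID_OR_NULL */
                  return 0;

          BPF_SEQ_PRINTF(seq, "%d %08llx-%08llx\n", task->pid,
                         (__u64)vma->vm_start, (__u64)vma->vm_end);
          return 0;
  }

  char _license[] SEC("license") = "GPL";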

View File

@ -381,55 +381,100 @@ out:
mutex_unlock(&trampoline_mutex); mutex_unlock(&trampoline_mutex);
} }
/* The logic is similar to BPF_PROG_RUN, but with an explicit #define NO_START_TIME 1
* rcu_read_lock() and migrate_disable() which are required static u64 notrace bpf_prog_start_time(void)
* for the trampoline. The macro is split into
* call _bpf_prog_enter
* call prog->bpf_func
* call __bpf_prog_exit
*/
u64 notrace __bpf_prog_enter(void)
__acquires(RCU)
{ {
u64 start = 0; u64 start = NO_START_TIME;
rcu_read_lock(); if (static_branch_unlikely(&bpf_stats_enabled_key)) {
migrate_disable();
if (static_branch_unlikely(&bpf_stats_enabled_key))
start = sched_clock(); start = sched_clock();
if (unlikely(!start))
start = NO_START_TIME;
}
return start; return start;
} }
void notrace __bpf_prog_exit(struct bpf_prog *prog, u64 start) static void notrace inc_misses_counter(struct bpf_prog *prog)
__releases(RCU) {
struct bpf_prog_stats *stats;
stats = this_cpu_ptr(prog->stats);
u64_stats_update_begin(&stats->syncp);
stats->misses++;
u64_stats_update_end(&stats->syncp);
}
/* The logic is similar to BPF_PROG_RUN, but with an explicit
* rcu_read_lock() and migrate_disable() which are required
* for the trampoline. The macro is split into
* call __bpf_prog_enter
* call prog->bpf_func
* call __bpf_prog_exit
*
* __bpf_prog_enter returns:
* 0 - skip execution of the bpf prog
* 1 - execute bpf prog
* [2..MAX_U64] - execute bpf prog and record execution time.
* This is start time.
*/
u64 notrace __bpf_prog_enter(struct bpf_prog *prog)
__acquires(RCU)
{
rcu_read_lock();
migrate_disable();
if (unlikely(__this_cpu_inc_return(*(prog->active)) != 1)) {
inc_misses_counter(prog);
return 0;
}
return bpf_prog_start_time();
}
static void notrace update_prog_stats(struct bpf_prog *prog,
u64 start)
{ {
struct bpf_prog_stats *stats; struct bpf_prog_stats *stats;
if (static_branch_unlikely(&bpf_stats_enabled_key) && if (static_branch_unlikely(&bpf_stats_enabled_key) &&
/* static_key could be enabled in __bpf_prog_enter /* static_key could be enabled in __bpf_prog_enter*
* and disabled in __bpf_prog_exit. * and disabled in __bpf_prog_exit*.
* And vice versa. * And vice versa.
* Hence check that 'start' is not zero. * Hence check that 'start' is valid.
*/ */
start) { start > NO_START_TIME) {
stats = this_cpu_ptr(prog->aux->stats); stats = this_cpu_ptr(prog->stats);
u64_stats_update_begin(&stats->syncp); u64_stats_update_begin(&stats->syncp);
stats->cnt++; stats->cnt++;
stats->nsecs += sched_clock() - start; stats->nsecs += sched_clock() - start;
u64_stats_update_end(&stats->syncp); u64_stats_update_end(&stats->syncp);
} }
}
void notrace __bpf_prog_exit(struct bpf_prog *prog, u64 start)
__releases(RCU)
{
update_prog_stats(prog, start);
__this_cpu_dec(*(prog->active));
migrate_enable(); migrate_enable();
rcu_read_unlock(); rcu_read_unlock();
} }
void notrace __bpf_prog_enter_sleepable(void) u64 notrace __bpf_prog_enter_sleepable(struct bpf_prog *prog)
{ {
rcu_read_lock_trace(); rcu_read_lock_trace();
migrate_disable();
might_fault(); might_fault();
if (unlikely(__this_cpu_inc_return(*(prog->active)) != 1)) {
inc_misses_counter(prog);
return 0;
}
return bpf_prog_start_time();
} }
void notrace __bpf_prog_exit_sleepable(void) void notrace __bpf_prog_exit_sleepable(struct bpf_prog *prog, u64 start)
{ {
update_prog_stats(prog, start);
__this_cpu_dec(*(prog->active));
migrate_enable();
rcu_read_unlock_trace(); rcu_read_unlock_trace();
} }

File diff suppressed because it is too large

View File

@ -1188,6 +1188,10 @@ BTF_SET_END(btf_allowlist_d_path)
static bool bpf_d_path_allowed(const struct bpf_prog *prog) static bool bpf_d_path_allowed(const struct bpf_prog *prog)
{ {
if (prog->type == BPF_PROG_TYPE_TRACING &&
prog->expected_attach_type == BPF_TRACE_ITER)
return true;
if (prog->type == BPF_PROG_TYPE_LSM) if (prog->type == BPF_PROG_TYPE_LSM)
return bpf_lsm_is_sleepable_hook(prog->aux->attach_btf_id); return bpf_lsm_is_sleepable_hook(prog->aux->attach_btf_id);
@ -1757,6 +1761,8 @@ tracing_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
return &bpf_sk_storage_delete_tracing_proto; return &bpf_sk_storage_delete_tracing_proto;
case BPF_FUNC_sock_from_file: case BPF_FUNC_sock_from_file:
return &bpf_sock_from_file_proto; return &bpf_sock_from_file_proto;
case BPF_FUNC_get_socket_cookie:
return &bpf_get_socket_ptr_cookie_proto;
#endif #endif
case BPF_FUNC_seq_printf: case BPF_FUNC_seq_printf:
return prog->expected_attach_type == BPF_TRACE_ITER ? return prog->expected_attach_type == BPF_TRACE_ITER ?

View File

@ -345,7 +345,7 @@ static int __bpf_fill_ja(struct bpf_test *self, unsigned int len,
static int bpf_fill_maxinsns11(struct bpf_test *self) static int bpf_fill_maxinsns11(struct bpf_test *self)
{ {
/* Hits 70 passes on x86_64, so cannot get JITed there. */ /* Hits 70 passes on x86_64 and triggers NOPs padding. */
return __bpf_fill_ja(self, BPF_MAXINSNS, 68); return __bpf_fill_ja(self, BPF_MAXINSNS, 68);
} }
@ -5318,15 +5318,10 @@ static struct bpf_test tests[] = {
{ {
"BPF_MAXINSNS: Jump, gap, jump, ...", "BPF_MAXINSNS: Jump, gap, jump, ...",
{ }, { },
#if defined(CONFIG_BPF_JIT_ALWAYS_ON) && defined(CONFIG_X86)
CLASSIC | FLAG_NO_DATA | FLAG_EXPECTED_FAIL,
#else
CLASSIC | FLAG_NO_DATA, CLASSIC | FLAG_NO_DATA,
#endif
{ }, { },
{ { 0, 0xababcbac } }, { { 0, 0xababcbac } },
.fill_helper = bpf_fill_maxinsns11, .fill_helper = bpf_fill_maxinsns11,
.expected_errcode = -ENOTSUPP,
}, },
{ {
"BPF_MAXINSNS: jump over MSH", "BPF_MAXINSNS: jump over MSH",

View File

@ -2217,28 +2217,14 @@ static inline void net_timestamp_set(struct sk_buff *skb)
bool is_skb_forwardable(const struct net_device *dev, const struct sk_buff *skb) bool is_skb_forwardable(const struct net_device *dev, const struct sk_buff *skb)
{ {
unsigned int len; return __is_skb_forwardable(dev, skb, true);
if (!(dev->flags & IFF_UP))
return false;
len = dev->mtu + dev->hard_header_len + VLAN_HLEN;
if (skb->len <= len)
return true;
/* if TSO is enabled, we don't care about the length as the packet
* could be forwarded without being segmented before
*/
if (skb_is_gso(skb))
return true;
return false;
} }
EXPORT_SYMBOL_GPL(is_skb_forwardable); EXPORT_SYMBOL_GPL(is_skb_forwardable);
int __dev_forward_skb(struct net_device *dev, struct sk_buff *skb) static int __dev_forward_skb2(struct net_device *dev, struct sk_buff *skb,
bool check_mtu)
{ {
int ret = ____dev_forward_skb(dev, skb); int ret = ____dev_forward_skb(dev, skb, check_mtu);
if (likely(!ret)) { if (likely(!ret)) {
skb->protocol = eth_type_trans(skb, dev); skb->protocol = eth_type_trans(skb, dev);
@ -2247,6 +2233,11 @@ int __dev_forward_skb(struct net_device *dev, struct sk_buff *skb)
return ret; return ret;
} }
int __dev_forward_skb(struct net_device *dev, struct sk_buff *skb)
{
return __dev_forward_skb2(dev, skb, true);
}
EXPORT_SYMBOL_GPL(__dev_forward_skb); EXPORT_SYMBOL_GPL(__dev_forward_skb);
/** /**
@ -2273,6 +2264,11 @@ int dev_forward_skb(struct net_device *dev, struct sk_buff *skb)
} }
EXPORT_SYMBOL_GPL(dev_forward_skb); EXPORT_SYMBOL_GPL(dev_forward_skb);
int dev_forward_skb_nomtu(struct net_device *dev, struct sk_buff *skb)
{
return __dev_forward_skb2(dev, skb, false) ?: netif_rx_internal(skb);
}
static inline int deliver_skb(struct sk_buff *skb, static inline int deliver_skb(struct sk_buff *skb,
struct packet_type *pt_prev, struct packet_type *pt_prev,
struct net_device *orig_dev) struct net_device *orig_dev)

View File

@ -2083,13 +2083,13 @@ static const struct bpf_func_proto bpf_csum_level_proto = {
static inline int __bpf_rx_skb(struct net_device *dev, struct sk_buff *skb) static inline int __bpf_rx_skb(struct net_device *dev, struct sk_buff *skb)
{ {
return dev_forward_skb(dev, skb); return dev_forward_skb_nomtu(dev, skb);
} }
static inline int __bpf_rx_skb_no_mac(struct net_device *dev, static inline int __bpf_rx_skb_no_mac(struct net_device *dev,
struct sk_buff *skb) struct sk_buff *skb)
{ {
int ret = ____dev_forward_skb(dev, skb); int ret = ____dev_forward_skb(dev, skb, false);
if (likely(!ret)) { if (likely(!ret)) {
skb->dev = dev; skb->dev = dev;
@ -2480,7 +2480,7 @@ int skb_do_redirect(struct sk_buff *skb)
goto out_drop; goto out_drop;
dev = ops->ndo_get_peer_dev(dev); dev = ops->ndo_get_peer_dev(dev);
if (unlikely(!dev || if (unlikely(!dev ||
!is_skb_forwardable(dev, skb) || !(dev->flags & IFF_UP) ||
net_eq(net, dev_net(dev)))) net_eq(net, dev_net(dev))))
goto out_drop; goto out_drop;
skb->dev = dev; skb->dev = dev;
@ -3552,11 +3552,7 @@ static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff,
return 0; return 0;
} }
static u32 __bpf_skb_max_len(const struct sk_buff *skb) #define BPF_SKB_MAX_LEN SKB_MAX_ALLOC
{
return skb->dev ? skb->dev->mtu + skb->dev->hard_header_len :
SKB_MAX_ALLOC;
}
BPF_CALL_4(sk_skb_adjust_room, struct sk_buff *, skb, s32, len_diff, BPF_CALL_4(sk_skb_adjust_room, struct sk_buff *, skb, s32, len_diff,
u32, mode, u64, flags) u32, mode, u64, flags)
@ -3605,7 +3601,7 @@ BPF_CALL_4(bpf_skb_adjust_room, struct sk_buff *, skb, s32, len_diff,
{ {
u32 len_cur, len_diff_abs = abs(len_diff); u32 len_cur, len_diff_abs = abs(len_diff);
u32 len_min = bpf_skb_net_base_len(skb); u32 len_min = bpf_skb_net_base_len(skb);
u32 len_max = __bpf_skb_max_len(skb); u32 len_max = BPF_SKB_MAX_LEN;
__be16 proto = skb->protocol; __be16 proto = skb->protocol;
bool shrink = len_diff < 0; bool shrink = len_diff < 0;
u32 off; u32 off;
@ -3688,7 +3684,7 @@ static int bpf_skb_trim_rcsum(struct sk_buff *skb, unsigned int new_len)
static inline int __bpf_skb_change_tail(struct sk_buff *skb, u32 new_len, static inline int __bpf_skb_change_tail(struct sk_buff *skb, u32 new_len,
u64 flags) u64 flags)
{ {
u32 max_len = __bpf_skb_max_len(skb); u32 max_len = BPF_SKB_MAX_LEN;
u32 min_len = __bpf_skb_min_len(skb); u32 min_len = __bpf_skb_min_len(skb);
int ret; int ret;
@ -3764,7 +3760,7 @@ static const struct bpf_func_proto sk_skb_change_tail_proto = {
static inline int __bpf_skb_change_head(struct sk_buff *skb, u32 head_room, static inline int __bpf_skb_change_head(struct sk_buff *skb, u32 head_room,
u64 flags) u64 flags)
{ {
u32 max_len = __bpf_skb_max_len(skb); u32 max_len = BPF_SKB_MAX_LEN;
u32 new_len = skb->len + head_room; u32 new_len = skb->len + head_room;
int ret; int ret;
@ -4631,6 +4627,18 @@ static const struct bpf_func_proto bpf_get_socket_cookie_sock_proto = {
.arg1_type = ARG_PTR_TO_CTX, .arg1_type = ARG_PTR_TO_CTX,
}; };
BPF_CALL_1(bpf_get_socket_ptr_cookie, struct sock *, sk)
{
return sk ? sock_gen_cookie(sk) : 0;
}
const struct bpf_func_proto bpf_get_socket_ptr_cookie_proto = {
.func = bpf_get_socket_ptr_cookie,
.gpl_only = false,
.ret_type = RET_INTEGER,
.arg1_type = ARG_PTR_TO_BTF_ID_SOCK_COMMON,
};
BPF_CALL_1(bpf_get_socket_cookie_sock_ops, struct bpf_sock_ops_kern *, ctx) BPF_CALL_1(bpf_get_socket_cookie_sock_ops, struct bpf_sock_ops_kern *, ctx)
{ {
return __sock_gen_cookie(ctx->sk); return __sock_gen_cookie(ctx->sk);
@ -5291,12 +5299,14 @@ static const struct bpf_func_proto bpf_skb_get_xfrm_state_proto = {
#if IS_ENABLED(CONFIG_INET) || IS_ENABLED(CONFIG_IPV6) #if IS_ENABLED(CONFIG_INET) || IS_ENABLED(CONFIG_IPV6)
static int bpf_fib_set_fwd_params(struct bpf_fib_lookup *params, static int bpf_fib_set_fwd_params(struct bpf_fib_lookup *params,
const struct neighbour *neigh, const struct neighbour *neigh,
const struct net_device *dev) const struct net_device *dev, u32 mtu)
{ {
memcpy(params->dmac, neigh->ha, ETH_ALEN); memcpy(params->dmac, neigh->ha, ETH_ALEN);
memcpy(params->smac, dev->dev_addr, ETH_ALEN); memcpy(params->smac, dev->dev_addr, ETH_ALEN);
params->h_vlan_TCI = 0; params->h_vlan_TCI = 0;
params->h_vlan_proto = 0; params->h_vlan_proto = 0;
if (mtu)
params->mtu_result = mtu; /* union with tot_len */
return 0; return 0;
} }
@ -5312,8 +5322,8 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
struct net_device *dev; struct net_device *dev;
struct fib_result res; struct fib_result res;
struct flowi4 fl4; struct flowi4 fl4;
u32 mtu = 0;
int err; int err;
u32 mtu;
dev = dev_get_by_index_rcu(net, params->ifindex); dev = dev_get_by_index_rcu(net, params->ifindex);
if (unlikely(!dev)) if (unlikely(!dev))
@ -5380,8 +5390,10 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
if (check_mtu) { if (check_mtu) {
mtu = ip_mtu_from_fib_result(&res, params->ipv4_dst); mtu = ip_mtu_from_fib_result(&res, params->ipv4_dst);
if (params->tot_len > mtu) if (params->tot_len > mtu) {
params->mtu_result = mtu; /* union with tot_len */
return BPF_FIB_LKUP_RET_FRAG_NEEDED; return BPF_FIB_LKUP_RET_FRAG_NEEDED;
}
} }
nhc = res.nhc; nhc = res.nhc;
@ -5415,7 +5427,7 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
if (!neigh) if (!neigh)
return BPF_FIB_LKUP_RET_NO_NEIGH; return BPF_FIB_LKUP_RET_NO_NEIGH;
return bpf_fib_set_fwd_params(params, neigh, dev); return bpf_fib_set_fwd_params(params, neigh, dev, mtu);
} }
#endif #endif
@ -5432,7 +5444,7 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
struct flowi6 fl6; struct flowi6 fl6;
int strict = 0; int strict = 0;
int oif, err; int oif, err;
u32 mtu; u32 mtu = 0;
/* link local addresses are never forwarded */ /* link local addresses are never forwarded */
if (rt6_need_strict(dst) || rt6_need_strict(src)) if (rt6_need_strict(dst) || rt6_need_strict(src))
@ -5507,8 +5519,10 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
if (check_mtu) { if (check_mtu) {
mtu = ipv6_stub->ip6_mtu_from_fib6(&res, dst, src); mtu = ipv6_stub->ip6_mtu_from_fib6(&res, dst, src);
if (params->tot_len > mtu) if (params->tot_len > mtu) {
params->mtu_result = mtu; /* union with tot_len */
return BPF_FIB_LKUP_RET_FRAG_NEEDED; return BPF_FIB_LKUP_RET_FRAG_NEEDED;
}
} }
if (res.nh->fib_nh_lws) if (res.nh->fib_nh_lws)
@ -5528,7 +5542,7 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
if (!neigh) if (!neigh)
return BPF_FIB_LKUP_RET_NO_NEIGH; return BPF_FIB_LKUP_RET_NO_NEIGH;
return bpf_fib_set_fwd_params(params, neigh, dev); return bpf_fib_set_fwd_params(params, neigh, dev, mtu);
} }
#endif #endif
@ -5571,6 +5585,7 @@ BPF_CALL_4(bpf_skb_fib_lookup, struct sk_buff *, skb,
{ {
struct net *net = dev_net(skb->dev); struct net *net = dev_net(skb->dev);
int rc = -EAFNOSUPPORT; int rc = -EAFNOSUPPORT;
bool check_mtu = false;
if (plen < sizeof(*params)) if (plen < sizeof(*params))
return -EINVAL; return -EINVAL;
@ -5578,25 +5593,33 @@ BPF_CALL_4(bpf_skb_fib_lookup, struct sk_buff *, skb,
if (flags & ~(BPF_FIB_LOOKUP_DIRECT | BPF_FIB_LOOKUP_OUTPUT)) if (flags & ~(BPF_FIB_LOOKUP_DIRECT | BPF_FIB_LOOKUP_OUTPUT))
return -EINVAL; return -EINVAL;
if (params->tot_len)
check_mtu = true;
switch (params->family) { switch (params->family) {
#if IS_ENABLED(CONFIG_INET) #if IS_ENABLED(CONFIG_INET)
case AF_INET: case AF_INET:
rc = bpf_ipv4_fib_lookup(net, params, flags, false); rc = bpf_ipv4_fib_lookup(net, params, flags, check_mtu);
break; break;
#endif #endif
#if IS_ENABLED(CONFIG_IPV6) #if IS_ENABLED(CONFIG_IPV6)
case AF_INET6: case AF_INET6:
rc = bpf_ipv6_fib_lookup(net, params, flags, false); rc = bpf_ipv6_fib_lookup(net, params, flags, check_mtu);
break; break;
#endif #endif
} }
if (!rc) { if (rc == BPF_FIB_LKUP_RET_SUCCESS && !check_mtu) {
struct net_device *dev; struct net_device *dev;
/* When tot_len isn't provided by the user, check the skb
* against the MTU of the net_device found by the FIB lookup
*/
dev = dev_get_by_index_rcu(net, params->ifindex); dev = dev_get_by_index_rcu(net, params->ifindex);
if (!is_skb_forwardable(dev, skb)) if (!is_skb_forwardable(dev, skb))
rc = BPF_FIB_LKUP_RET_FRAG_NEEDED; rc = BPF_FIB_LKUP_RET_FRAG_NEEDED;
params->mtu_result = dev->mtu; /* union with tot_len */
} }
return rc; return rc;
@ -5612,6 +5635,116 @@ static const struct bpf_func_proto bpf_skb_fib_lookup_proto = {
.arg4_type = ARG_ANYTHING, .arg4_type = ARG_ANYTHING,
}; };
static struct net_device *__dev_via_ifindex(struct net_device *dev_curr,
u32 ifindex)
{
struct net *netns = dev_net(dev_curr);
/* Non-redirect use-cases can use ifindex=0 and save ifindex lookup */
if (ifindex == 0)
return dev_curr;
return dev_get_by_index_rcu(netns, ifindex);
}
BPF_CALL_5(bpf_skb_check_mtu, struct sk_buff *, skb,
u32, ifindex, u32 *, mtu_len, s32, len_diff, u64, flags)
{
int ret = BPF_MTU_CHK_RET_FRAG_NEEDED;
struct net_device *dev = skb->dev;
int skb_len, dev_len;
int mtu;
if (unlikely(flags & ~(BPF_MTU_CHK_SEGS)))
return -EINVAL;
if (unlikely(flags & BPF_MTU_CHK_SEGS && len_diff))
return -EINVAL;
dev = __dev_via_ifindex(dev, ifindex);
if (unlikely(!dev))
return -ENODEV;
mtu = READ_ONCE(dev->mtu);
dev_len = mtu + dev->hard_header_len;
skb_len = skb->len + len_diff; /* minus result pass check */
if (skb_len <= dev_len) {
ret = BPF_MTU_CHK_RET_SUCCESS;
goto out;
}
/* At this point, skb->len exceed MTU, but as it include length of all
* segments, it can still be below MTU. The SKB can possibly get
* re-segmented in transmit path (see validate_xmit_skb). Thus, user
* must choose if segs are to be MTU checked.
*/
if (skb_is_gso(skb)) {
ret = BPF_MTU_CHK_RET_SUCCESS;
if (flags & BPF_MTU_CHK_SEGS &&
!skb_gso_validate_network_len(skb, mtu))
ret = BPF_MTU_CHK_RET_SEGS_TOOBIG;
}
out:
/* BPF verifier guarantees valid pointer */
*mtu_len = mtu;
return ret;
}
BPF_CALL_5(bpf_xdp_check_mtu, struct xdp_buff *, xdp,
u32, ifindex, u32 *, mtu_len, s32, len_diff, u64, flags)
{
struct net_device *dev = xdp->rxq->dev;
int xdp_len = xdp->data_end - xdp->data;
int ret = BPF_MTU_CHK_RET_SUCCESS;
int mtu, dev_len;
/* XDP variant doesn't support multi-buffer segment check (yet) */
if (unlikely(flags))
return -EINVAL;
dev = __dev_via_ifindex(dev, ifindex);
if (unlikely(!dev))
return -ENODEV;
mtu = READ_ONCE(dev->mtu);
/* Add L2-header as dev MTU is L3 size */
dev_len = mtu + dev->hard_header_len;
xdp_len += len_diff; /* a negative result still passes the check */
if (xdp_len > dev_len)
ret = BPF_MTU_CHK_RET_FRAG_NEEDED;
/* BPF verifier guarantees valid pointer */
*mtu_len = mtu;
return ret;
}
static const struct bpf_func_proto bpf_skb_check_mtu_proto = {
.func = bpf_skb_check_mtu,
.gpl_only = true,
.ret_type = RET_INTEGER,
.arg1_type = ARG_PTR_TO_CTX,
.arg2_type = ARG_ANYTHING,
.arg3_type = ARG_PTR_TO_INT,
.arg4_type = ARG_ANYTHING,
.arg5_type = ARG_ANYTHING,
};
static const struct bpf_func_proto bpf_xdp_check_mtu_proto = {
.func = bpf_xdp_check_mtu,
.gpl_only = true,
.ret_type = RET_INTEGER,
.arg1_type = ARG_PTR_TO_CTX,
.arg2_type = ARG_ANYTHING,
.arg3_type = ARG_PTR_TO_INT,
.arg4_type = ARG_ANYTHING,
.arg5_type = ARG_ANYTHING,
};
#if IS_ENABLED(CONFIG_IPV6_SEG6_BPF) #if IS_ENABLED(CONFIG_IPV6_SEG6_BPF)
static int bpf_push_seg6_encap(struct sk_buff *skb, u32 type, void *hdr, u32 len) static int bpf_push_seg6_encap(struct sk_buff *skb, u32 type, void *hdr, u32 len)
{ {
@ -7021,6 +7154,14 @@ sock_addr_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
case BPF_CGROUP_INET6_BIND: case BPF_CGROUP_INET6_BIND:
case BPF_CGROUP_INET4_CONNECT: case BPF_CGROUP_INET4_CONNECT:
case BPF_CGROUP_INET6_CONNECT: case BPF_CGROUP_INET6_CONNECT:
case BPF_CGROUP_UDP4_RECVMSG:
case BPF_CGROUP_UDP6_RECVMSG:
case BPF_CGROUP_UDP4_SENDMSG:
case BPF_CGROUP_UDP6_SENDMSG:
case BPF_CGROUP_INET4_GETPEERNAME:
case BPF_CGROUP_INET6_GETPEERNAME:
case BPF_CGROUP_INET4_GETSOCKNAME:
case BPF_CGROUP_INET6_GETSOCKNAME:
return &bpf_sock_addr_setsockopt_proto; return &bpf_sock_addr_setsockopt_proto;
default: default:
return NULL; return NULL;
@ -7031,6 +7172,14 @@ sock_addr_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
case BPF_CGROUP_INET6_BIND: case BPF_CGROUP_INET6_BIND:
case BPF_CGROUP_INET4_CONNECT: case BPF_CGROUP_INET4_CONNECT:
case BPF_CGROUP_INET6_CONNECT: case BPF_CGROUP_INET6_CONNECT:
case BPF_CGROUP_UDP4_RECVMSG:
case BPF_CGROUP_UDP6_RECVMSG:
case BPF_CGROUP_UDP4_SENDMSG:
case BPF_CGROUP_UDP6_SENDMSG:
case BPF_CGROUP_INET4_GETPEERNAME:
case BPF_CGROUP_INET6_GETPEERNAME:
case BPF_CGROUP_INET4_GETSOCKNAME:
case BPF_CGROUP_INET6_GETSOCKNAME:
return &bpf_sock_addr_getsockopt_proto; return &bpf_sock_addr_getsockopt_proto;
default: default:
return NULL; return NULL;
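These added cases mean the sock_addr hooks for UDP sendmsg/recvmsg and get{peer,sock}name can now call bpf_setsockopt()/bpf_getsockopt(), as bind/connect already could. A hypothetical sketch (socket option constants defined locally to keep it self-contained; values as in asm-generic/socket.h):

  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  #ifndef SOL_SOCKET
  #define SOL_SOCKET 1
  #endif
  #ifndef SO_RCVBUF
  #define SO_RCVBUF 8
  #endif

  SEC("cgroup/recvmsg4")
  int bump_rcvbuf(struct bpf_sock_addr *ctx)
  {
      int val = 1 << 20;

      /* The return value is ignored in this sketch; a real program
       * would check it before deciding what to do.
       */
      bpf_setsockopt(ctx, SOL_SOCKET, SO_RCVBUF, &val, sizeof(val));
      return 1;  /* allow the recvmsg to proceed */
  }

  char _license[] SEC("license") = "GPL";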
@ -7181,6 +7330,8 @@ tc_cls_act_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
return &bpf_get_socket_uid_proto; return &bpf_get_socket_uid_proto;
case BPF_FUNC_fib_lookup: case BPF_FUNC_fib_lookup:
return &bpf_skb_fib_lookup_proto; return &bpf_skb_fib_lookup_proto;
case BPF_FUNC_check_mtu:
return &bpf_skb_check_mtu_proto;
case BPF_FUNC_sk_fullsock: case BPF_FUNC_sk_fullsock:
return &bpf_sk_fullsock_proto; return &bpf_sk_fullsock_proto;
case BPF_FUNC_sk_storage_get: case BPF_FUNC_sk_storage_get:
@ -7250,6 +7401,8 @@ xdp_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
return &bpf_xdp_adjust_tail_proto; return &bpf_xdp_adjust_tail_proto;
case BPF_FUNC_fib_lookup: case BPF_FUNC_fib_lookup:
return &bpf_xdp_fib_lookup_proto; return &bpf_xdp_fib_lookup_proto;
case BPF_FUNC_check_mtu:
return &bpf_xdp_check_mtu_proto;
#ifdef CONFIG_INET #ifdef CONFIG_INET
case BPF_FUNC_sk_lookup_udp: case BPF_FUNC_sk_lookup_udp:
return &bpf_xdp_sk_lookup_udp_proto; return &bpf_xdp_sk_lookup_udp_proto;

View File

@ -669,14 +669,13 @@ static void sk_psock_destroy_deferred(struct work_struct *gc)
kfree(psock); kfree(psock);
} }
void sk_psock_destroy(struct rcu_head *rcu) static void sk_psock_destroy(struct rcu_head *rcu)
{ {
struct sk_psock *psock = container_of(rcu, struct sk_psock, rcu); struct sk_psock *psock = container_of(rcu, struct sk_psock, rcu);
INIT_WORK(&psock->gc, sk_psock_destroy_deferred); INIT_WORK(&psock->gc, sk_psock_destroy_deferred);
schedule_work(&psock->gc); schedule_work(&psock->gc);
} }
EXPORT_SYMBOL_GPL(sk_psock_destroy);
void sk_psock_drop(struct sock *sk, struct sk_psock *psock) void sk_psock_drop(struct sock *sk, struct sk_psock *psock)
{ {

View File

@ -513,3 +513,73 @@ void xdp_warn(const char *msg, const char *func, const int line)
WARN(1, "XDP_WARN: %s(line:%d): %s\n", func, line, msg); WARN(1, "XDP_WARN: %s(line:%d): %s\n", func, line, msg);
}; };
EXPORT_SYMBOL_GPL(xdp_warn); EXPORT_SYMBOL_GPL(xdp_warn);
int xdp_alloc_skb_bulk(void **skbs, int n_skb, gfp_t gfp)
{
n_skb = kmem_cache_alloc_bulk(skbuff_head_cache, gfp,
n_skb, skbs);
if (unlikely(!n_skb))
return -ENOMEM;
return 0;
}
EXPORT_SYMBOL_GPL(xdp_alloc_skb_bulk);
struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
struct sk_buff *skb,
struct net_device *dev)
{
unsigned int headroom, frame_size;
void *hard_start;
/* Part of headroom was reserved to xdpf */
headroom = sizeof(*xdpf) + xdpf->headroom;
/* Memory size backing xdp_frame data already has reserved
* room for build_skb to place skb_shared_info in tailroom.
*/
frame_size = xdpf->frame_sz;
hard_start = xdpf->data - headroom;
skb = build_skb_around(skb, hard_start, frame_size);
if (unlikely(!skb))
return NULL;
skb_reserve(skb, headroom);
__skb_put(skb, xdpf->len);
if (xdpf->metasize)
skb_metadata_set(skb, xdpf->metasize);
/* Essential SKB info: protocol and skb->dev */
skb->protocol = eth_type_trans(skb, dev);
/* Optional SKB info, currently missing:
* - HW checksum info (skb->ip_summed)
* - HW RX hash (skb_set_hash)
* - RX ring dev queue index (skb_record_rx_queue)
*/
/* Until page_pool get SKB return path, release DMA here */
xdp_release_frame(xdpf);
/* Allow SKB to reuse area used by xdp_frame */
xdp_scrub_frame(xdpf);
return skb;
}
EXPORT_SYMBOL_GPL(__xdp_build_skb_from_frame);
struct sk_buff *xdp_build_skb_from_frame(struct xdp_frame *xdpf,
struct net_device *dev)
{
struct sk_buff *skb;
skb = kmem_cache_alloc(skbuff_head_cache, GFP_ATOMIC);
if (unlikely(!skb))
return NULL;
memset(skb, 0, offsetof(struct sk_buff, tail));
return __xdp_build_skb_from_frame(xdpf, skb, dev);
}
EXPORT_SYMBOL_GPL(xdp_build_skb_from_frame);
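A rough sketch (not from this diff) of the intended caller pattern for the two new exports, e.g. a driver's ndo_xdp_xmit/NAPI path such as the veth changes elsewhere in this pull; frames[], n, dev and napi are assumed to exist and n is assumed to fit the local array:

  void *skbs[16];
  struct sk_buff *skb;
  int i;

  /* __GFP_ZERO because __xdp_build_skb_from_frame() expects a zeroed
   * skb head, mirroring the memset() in xdp_build_skb_from_frame().
   */
  if (xdp_alloc_skb_bulk(skbs, n, GFP_ATOMIC | __GFP_ZERO | __GFP_NOWARN))
      return -ENOMEM;

  for (i = 0; i < n; i++) {
      skb = __xdp_build_skb_from_frame(frames[i], skbs[i], dev);
      if (!skb)
          continue;
      napi_gro_receive(napi, skb);
  }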

View File

@ -438,6 +438,7 @@ EXPORT_SYMBOL(inet_release);
int inet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len) int inet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
{ {
struct sock *sk = sock->sk; struct sock *sk = sock->sk;
u32 flags = BIND_WITH_LOCK;
int err; int err;
/* If the socket has its own bind function then use it. (RAW) */ /* If the socket has its own bind function then use it. (RAW) */
@ -450,11 +451,12 @@ int inet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
/* BPF prog is run before any checks are done so that if the prog /* BPF prog is run before any checks are done so that if the prog
* changes context in a wrong way it will be caught. * changes context in a wrong way it will be caught.
*/ */
err = BPF_CGROUP_RUN_PROG_INET4_BIND_LOCK(sk, uaddr); err = BPF_CGROUP_RUN_PROG_INET_BIND_LOCK(sk, uaddr,
BPF_CGROUP_INET4_BIND, &flags);
if (err) if (err)
return err; return err;
return __inet_bind(sk, uaddr, addr_len, BIND_WITH_LOCK); return __inet_bind(sk, uaddr, addr_len, flags);
} }
EXPORT_SYMBOL(inet_bind); EXPORT_SYMBOL(inet_bind);
@ -499,7 +501,8 @@ int __inet_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len,
snum = ntohs(addr->sin_port); snum = ntohs(addr->sin_port);
err = -EACCES; err = -EACCES;
if (snum && inet_port_requires_bind_service(net, snum) && if (!(flags & BIND_NO_CAP_NET_BIND_SERVICE) &&
snum && inet_port_requires_bind_service(net, snum) &&
!ns_capable(net->user_ns, CAP_NET_BIND_SERVICE)) !ns_capable(net->user_ns, CAP_NET_BIND_SERVICE))
goto out; goto out;
@ -777,18 +780,19 @@ int inet_getname(struct socket *sock, struct sockaddr *uaddr,
return -ENOTCONN; return -ENOTCONN;
sin->sin_port = inet->inet_dport; sin->sin_port = inet->inet_dport;
sin->sin_addr.s_addr = inet->inet_daddr; sin->sin_addr.s_addr = inet->inet_daddr;
BPF_CGROUP_RUN_SA_PROG_LOCK(sk, (struct sockaddr *)sin,
BPF_CGROUP_INET4_GETPEERNAME,
NULL);
} else { } else {
__be32 addr = inet->inet_rcv_saddr; __be32 addr = inet->inet_rcv_saddr;
if (!addr) if (!addr)
addr = inet->inet_saddr; addr = inet->inet_saddr;
sin->sin_port = inet->inet_sport; sin->sin_port = inet->inet_sport;
sin->sin_addr.s_addr = addr; sin->sin_addr.s_addr = addr;
}
if (cgroup_bpf_enabled)
BPF_CGROUP_RUN_SA_PROG_LOCK(sk, (struct sockaddr *)sin, BPF_CGROUP_RUN_SA_PROG_LOCK(sk, (struct sockaddr *)sin,
peer ? BPF_CGROUP_INET4_GETPEERNAME : BPF_CGROUP_INET4_GETSOCKNAME,
BPF_CGROUP_INET4_GETSOCKNAME,
NULL); NULL);
}
memset(sin->sin_zero, 0, sizeof(sin->sin_zero)); memset(sin->sin_zero, 0, sizeof(sin->sin_zero));
return sizeof(*sin); return sizeof(*sin);
} }
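For reference, the get{peer,sock}name hooks invoked above let a cgroup program rewrite the address reported back to userspace; a made-up example (addresses and ports are purely illustrative):

  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_endian.h>

  SEC("cgroup/getpeername4")
  int report_original_peer(struct bpf_sock_addr *ctx)
  {
      if (ctx->user_ip4 == bpf_htonl(0x7f000001) &&  /* 127.0.0.1 */
          ctx->user_port == bpf_htons(4040)) {
          ctx->user_ip4 = bpf_htonl(0x0a000001);     /* 10.0.0.1 */
          ctx->user_port = bpf_htons(443);
      }
      return 1;
  }

  char _license[] SEC("license") = "GPL";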

View File

@ -4162,6 +4162,8 @@ static int do_tcp_getsockopt(struct sock *sk, int level,
return -EINVAL; return -EINVAL;
lock_sock(sk); lock_sock(sk);
err = tcp_zerocopy_receive(sk, &zc, &tss); err = tcp_zerocopy_receive(sk, &zc, &tss);
err = BPF_CGROUP_RUN_PROG_GETSOCKOPT_KERN(sk, level, optname,
&zc, &len, err);
release_sock(sk); release_sock(sk);
if (len >= offsetofend(struct tcp_zerocopy_receive, msg_flags)) if (len >= offsetofend(struct tcp_zerocopy_receive, msg_flags))
goto zerocopy_rcv_cmsg; goto zerocopy_rcv_cmsg;
@ -4208,6 +4210,18 @@ zerocopy_rcv_out:
return 0; return 0;
} }
bool tcp_bpf_bypass_getsockopt(int level, int optname)
{
/* TCP do_tcp_getsockopt has an optimized getsockopt implementation
* to avoid taking the extra socket lock for TCP_ZEROCOPY_RECEIVE.
*/
if (level == SOL_TCP && optname == TCP_ZEROCOPY_RECEIVE)
return true;
return false;
}
EXPORT_SYMBOL(tcp_bpf_bypass_getsockopt);
int tcp_getsockopt(struct sock *sk, int level, int optname, char __user *optval, int tcp_getsockopt(struct sock *sk, int level, int optname, char __user *optval,
int __user *optlen) int __user *optlen)
{ {
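A minimal, assumed cgroup getsockopt program for context; with the bypass above, TCP_ZEROCOPY_RECEIVE reaches such a program through the optimized in-kernel call in do_tcp_getsockopt() instead of the generic, lock-taking path:

  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  SEC("cgroup/getsockopt")
  int getsockopt_passthrough(struct bpf_sockopt *ctx)
  {
      /* Pass every option through unchanged, including
       * TCP_ZEROCOPY_RECEIVE; returning 1 keeps the kernel's
       * optval and retval as-is.
       */
      return 1;
  }

  char _license[] SEC("license") = "GPL";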

View File

@ -2796,6 +2796,7 @@ struct proto tcp_prot = {
.shutdown = tcp_shutdown, .shutdown = tcp_shutdown,
.setsockopt = tcp_setsockopt, .setsockopt = tcp_setsockopt,
.getsockopt = tcp_getsockopt, .getsockopt = tcp_getsockopt,
.bpf_bypass_getsockopt = tcp_bpf_bypass_getsockopt,
.keepalive = tcp_set_keepalive, .keepalive = tcp_set_keepalive,
.recvmsg = tcp_recvmsg, .recvmsg = tcp_recvmsg,
.sendmsg = tcp_sendmsg, .sendmsg = tcp_sendmsg,

View File

@ -1130,7 +1130,7 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
rcu_read_unlock(); rcu_read_unlock();
} }
if (cgroup_bpf_enabled && !connected) { if (cgroup_bpf_enabled(BPF_CGROUP_UDP4_SENDMSG) && !connected) {
err = BPF_CGROUP_RUN_PROG_UDP4_SENDMSG_LOCK(sk, err = BPF_CGROUP_RUN_PROG_UDP4_SENDMSG_LOCK(sk,
(struct sockaddr *)usin, &ipc.addr); (struct sockaddr *)usin, &ipc.addr);
if (err) if (err)
@ -1864,9 +1864,8 @@ try_again:
memset(sin->sin_zero, 0, sizeof(sin->sin_zero)); memset(sin->sin_zero, 0, sizeof(sin->sin_zero));
*addr_len = sizeof(*sin); *addr_len = sizeof(*sin);
if (cgroup_bpf_enabled) BPF_CGROUP_RUN_PROG_UDP4_RECVMSG_LOCK(sk,
BPF_CGROUP_RUN_PROG_UDP4_RECVMSG_LOCK(sk, (struct sockaddr *)sin);
(struct sockaddr *)sin);
} }
if (udp_sk(sk)->gro_enabled) if (udp_sk(sk)->gro_enabled)

View File

@ -295,7 +295,8 @@ static int __inet6_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len,
return -EINVAL; return -EINVAL;
snum = ntohs(addr->sin6_port); snum = ntohs(addr->sin6_port);
if (snum && inet_port_requires_bind_service(net, snum) && if (!(flags & BIND_NO_CAP_NET_BIND_SERVICE) &&
snum && inet_port_requires_bind_service(net, snum) &&
!ns_capable(net->user_ns, CAP_NET_BIND_SERVICE)) !ns_capable(net->user_ns, CAP_NET_BIND_SERVICE))
return -EACCES; return -EACCES;
@ -439,6 +440,7 @@ out_unlock:
int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len) int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
{ {
struct sock *sk = sock->sk; struct sock *sk = sock->sk;
u32 flags = BIND_WITH_LOCK;
int err = 0; int err = 0;
/* If the socket has its own bind function then use it. */ /* If the socket has its own bind function then use it. */
@ -451,11 +453,12 @@ int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
/* BPF prog is run before any checks are done so that if the prog /* BPF prog is run before any checks are done so that if the prog
* changes context in a wrong way it will be caught. * changes context in a wrong way it will be caught.
*/ */
err = BPF_CGROUP_RUN_PROG_INET6_BIND_LOCK(sk, uaddr); err = BPF_CGROUP_RUN_PROG_INET_BIND_LOCK(sk, uaddr,
BPF_CGROUP_INET6_BIND, &flags);
if (err) if (err)
return err; return err;
return __inet6_bind(sk, uaddr, addr_len, BIND_WITH_LOCK); return __inet6_bind(sk, uaddr, addr_len, flags);
} }
EXPORT_SYMBOL(inet6_bind); EXPORT_SYMBOL(inet6_bind);
@ -527,18 +530,19 @@ int inet6_getname(struct socket *sock, struct sockaddr *uaddr,
sin->sin6_addr = sk->sk_v6_daddr; sin->sin6_addr = sk->sk_v6_daddr;
if (np->sndflow) if (np->sndflow)
sin->sin6_flowinfo = np->flow_label; sin->sin6_flowinfo = np->flow_label;
BPF_CGROUP_RUN_SA_PROG_LOCK(sk, (struct sockaddr *)sin,
BPF_CGROUP_INET6_GETPEERNAME,
NULL);
} else { } else {
if (ipv6_addr_any(&sk->sk_v6_rcv_saddr)) if (ipv6_addr_any(&sk->sk_v6_rcv_saddr))
sin->sin6_addr = np->saddr; sin->sin6_addr = np->saddr;
else else
sin->sin6_addr = sk->sk_v6_rcv_saddr; sin->sin6_addr = sk->sk_v6_rcv_saddr;
sin->sin6_port = inet->inet_sport; sin->sin6_port = inet->inet_sport;
}
if (cgroup_bpf_enabled)
BPF_CGROUP_RUN_SA_PROG_LOCK(sk, (struct sockaddr *)sin, BPF_CGROUP_RUN_SA_PROG_LOCK(sk, (struct sockaddr *)sin,
peer ? BPF_CGROUP_INET6_GETPEERNAME : BPF_CGROUP_INET6_GETSOCKNAME,
BPF_CGROUP_INET6_GETSOCKNAME,
NULL); NULL);
}
sin->sin6_scope_id = ipv6_iface_scope_id(&sin->sin6_addr, sin->sin6_scope_id = ipv6_iface_scope_id(&sin->sin6_addr,
sk->sk_bound_dev_if); sk->sk_bound_dev_if);
return sizeof(*sin); return sizeof(*sin);

View File

@ -2124,6 +2124,7 @@ struct proto tcpv6_prot = {
.shutdown = tcp_shutdown, .shutdown = tcp_shutdown,
.setsockopt = tcp_setsockopt, .setsockopt = tcp_setsockopt,
.getsockopt = tcp_getsockopt, .getsockopt = tcp_getsockopt,
.bpf_bypass_getsockopt = tcp_bpf_bypass_getsockopt,
.keepalive = tcp_set_keepalive, .keepalive = tcp_set_keepalive,
.recvmsg = tcp_recvmsg, .recvmsg = tcp_recvmsg,
.sendmsg = tcp_sendmsg, .sendmsg = tcp_sendmsg,

View File

@ -409,9 +409,8 @@ try_again:
} }
*addr_len = sizeof(*sin6); *addr_len = sizeof(*sin6);
if (cgroup_bpf_enabled) BPF_CGROUP_RUN_PROG_UDP6_RECVMSG_LOCK(sk,
BPF_CGROUP_RUN_PROG_UDP6_RECVMSG_LOCK(sk, (struct sockaddr *)sin6);
(struct sockaddr *)sin6);
} }
if (udp_sk(sk)->gro_enabled) if (udp_sk(sk)->gro_enabled)
@ -1462,7 +1461,7 @@ do_udp_sendmsg:
fl6.saddr = np->saddr; fl6.saddr = np->saddr;
fl6.fl6_sport = inet->inet_sport; fl6.fl6_sport = inet->inet_sport;
if (cgroup_bpf_enabled && !connected) { if (cgroup_bpf_enabled(BPF_CGROUP_UDP6_SENDMSG) && !connected) {
err = BPF_CGROUP_RUN_PROG_UDP6_SENDMSG_LOCK(sk, err = BPF_CGROUP_RUN_PROG_UDP6_SENDMSG_LOCK(sk,
(struct sockaddr *)sin6, &fl6.saddr); (struct sockaddr *)sin6, &fl6.saddr);
if (err) if (err)

View File

@ -2126,6 +2126,9 @@ SYSCALL_DEFINE5(setsockopt, int, fd, int, level, int, optname,
return __sys_setsockopt(fd, level, optname, optval, optlen); return __sys_setsockopt(fd, level, optname, optval, optlen);
} }
INDIRECT_CALLABLE_DECLARE(bool tcp_bpf_bypass_getsockopt(int level,
int optname));
/* /*
* Get a socket option. Because we don't know the option lengths we have * Get a socket option. Because we don't know the option lengths we have
* to pass a user mode parameter for the protocols to sort out. * to pass a user mode parameter for the protocols to sort out.

View File

@ -184,12 +184,13 @@ static void xsk_copy_xdp(struct xdp_buff *to, struct xdp_buff *from, u32 len)
memcpy(to_buf, from_buf, len + metalen); memcpy(to_buf, from_buf, len + metalen);
} }
static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len, static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
bool explicit_free)
{ {
struct xdp_buff *xsk_xdp; struct xdp_buff *xsk_xdp;
int err; int err;
u32 len;
len = xdp->data_end - xdp->data;
if (len > xsk_pool_get_rx_frame_size(xs->pool)) { if (len > xsk_pool_get_rx_frame_size(xs->pool)) {
xs->rx_dropped++; xs->rx_dropped++;
return -ENOSPC; return -ENOSPC;
@ -207,8 +208,6 @@ static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len,
xsk_buff_free(xsk_xdp); xsk_buff_free(xsk_xdp);
return err; return err;
} }
if (explicit_free)
xdp_return_buff(xdp);
return 0; return 0;
} }
@ -230,11 +229,8 @@ static bool xsk_is_bound(struct xdp_sock *xs)
return false; return false;
} }
static int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, static int xsk_rcv_check(struct xdp_sock *xs, struct xdp_buff *xdp)
bool explicit_free)
{ {
u32 len;
if (!xsk_is_bound(xs)) if (!xsk_is_bound(xs))
return -EINVAL; return -EINVAL;
@ -242,11 +238,7 @@ static int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp,
return -EINVAL; return -EINVAL;
sk_mark_napi_id_once_xdp(&xs->sk, xdp); sk_mark_napi_id_once_xdp(&xs->sk, xdp);
len = xdp->data_end - xdp->data; return 0;
return xdp->rxq->mem.type == MEM_TYPE_XSK_BUFF_POOL ?
__xsk_rcv_zc(xs, xdp, len) :
__xsk_rcv(xs, xdp, len, explicit_free);
} }
static void xsk_flush(struct xdp_sock *xs) static void xsk_flush(struct xdp_sock *xs)
@ -261,18 +253,41 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
int err; int err;
spin_lock_bh(&xs->rx_lock); spin_lock_bh(&xs->rx_lock);
err = xsk_rcv(xs, xdp, false); err = xsk_rcv_check(xs, xdp);
xsk_flush(xs); if (!err) {
err = __xsk_rcv(xs, xdp);
xsk_flush(xs);
}
spin_unlock_bh(&xs->rx_lock); spin_unlock_bh(&xs->rx_lock);
return err; return err;
} }
static int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
{
int err;
u32 len;
err = xsk_rcv_check(xs, xdp);
if (err)
return err;
if (xdp->rxq->mem.type == MEM_TYPE_XSK_BUFF_POOL) {
len = xdp->data_end - xdp->data;
return __xsk_rcv_zc(xs, xdp, len);
}
err = __xsk_rcv(xs, xdp);
if (!err)
xdp_return_buff(xdp);
return err;
}
int __xsk_map_redirect(struct xdp_sock *xs, struct xdp_buff *xdp) int __xsk_map_redirect(struct xdp_sock *xs, struct xdp_buff *xdp)
{ {
struct list_head *flush_list = this_cpu_ptr(&xskmap_flush_list); struct list_head *flush_list = this_cpu_ptr(&xskmap_flush_list);
int err; int err;
err = xsk_rcv(xs, xdp, true); err = xsk_rcv(xs, xdp);
if (err) if (err)
return err; return err;

View File

@ -119,8 +119,8 @@ static void xp_disable_drv_zc(struct xsk_buff_pool *pool)
} }
} }
static int __xp_assign_dev(struct xsk_buff_pool *pool, int xp_assign_dev(struct xsk_buff_pool *pool,
struct net_device *netdev, u16 queue_id, u16 flags) struct net_device *netdev, u16 queue_id, u16 flags)
{ {
bool force_zc, force_copy; bool force_zc, force_copy;
struct netdev_bpf bpf; struct netdev_bpf bpf;
@ -191,12 +191,6 @@ err_unreg_pool:
return err; return err;
} }
int xp_assign_dev(struct xsk_buff_pool *pool, struct net_device *dev,
u16 queue_id, u16 flags)
{
return __xp_assign_dev(pool, dev, queue_id, flags);
}
int xp_assign_dev_shared(struct xsk_buff_pool *pool, struct xdp_umem *umem, int xp_assign_dev_shared(struct xsk_buff_pool *pool, struct xdp_umem *umem,
struct net_device *dev, u16 queue_id) struct net_device *dev, u16 queue_id)
{ {
@ -210,7 +204,7 @@ int xp_assign_dev_shared(struct xsk_buff_pool *pool, struct xdp_umem *umem,
if (pool->uses_need_wakeup) if (pool->uses_need_wakeup)
flags |= XDP_USE_NEED_WAKEUP; flags |= XDP_USE_NEED_WAKEUP;
return __xp_assign_dev(pool, dev, queue_id, flags); return xp_assign_dev(pool, dev, queue_id, flags);
} }
void xp_clear_dev(struct xsk_buff_pool *pool) void xp_clear_dev(struct xsk_buff_pool *pool)

View File

@ -183,6 +183,14 @@ BPF_EXTRA_CFLAGS := $(ARM_ARCH_SELECTOR)
TPROGS_CFLAGS += $(ARM_ARCH_SELECTOR) TPROGS_CFLAGS += $(ARM_ARCH_SELECTOR)
endif endif
ifeq ($(ARCH), mips)
TPROGS_CFLAGS += -D__SANE_USERSPACE_TYPES__
ifdef CONFIG_MACH_LOONGSON64
BPF_EXTRA_CFLAGS += -I$(srctree)/arch/mips/include/asm/mach-loongson64
BPF_EXTRA_CFLAGS += -I$(srctree)/arch/mips/include/asm/mach-generic
endif
endif
TPROGS_CFLAGS += -Wall -O2 TPROGS_CFLAGS += -Wall -O2
TPROGS_CFLAGS += -Wmissing-prototypes TPROGS_CFLAGS += -Wmissing-prototypes
TPROGS_CFLAGS += -Wstrict-prototypes TPROGS_CFLAGS += -Wstrict-prototypes
@ -208,7 +216,7 @@ TPROGLDLIBS_xdpsock += -pthread -lcap
TPROGLDLIBS_xsk_fwd += -pthread TPROGLDLIBS_xsk_fwd += -pthread
# Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on cmdline: # Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on cmdline:
# make M=samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang # make M=samples/bpf LLC=~/git/llvm-project/llvm/build/bin/llc CLANG=~/git/llvm-project/llvm/build/bin/clang
LLC ?= llc LLC ?= llc
CLANG ?= clang CLANG ?= clang
OPT ?= opt OPT ?= opt

View File

@ -62,20 +62,26 @@ To generate a smaller llc binary one can use::
-DLLVM_TARGETS_TO_BUILD="BPF" -DLLVM_TARGETS_TO_BUILD="BPF"
Quick sniplet for manually compiling LLVM and clang We recommend that developers who want the fastest incremental builds
(build dependencies are cmake and gcc-c++):: use the Ninja build system, you can find it in your system's package
manager, usually the package is ninja or ninja-build.
$ git clone http://llvm.org/git/llvm.git Quick sniplet for manually compiling LLVM and clang
$ cd llvm/tools (build dependencies are ninja, cmake and gcc-c++)::
$ git clone --depth 1 http://llvm.org/git/clang.git
$ cd ..; mkdir build; cd build $ git clone https://github.com/llvm/llvm-project.git
$ cmake .. -DLLVM_TARGETS_TO_BUILD="BPF;X86" $ mkdir -p llvm-project/llvm/build
$ make -j $(getconf _NPROCESSORS_ONLN) $ cd llvm-project/llvm/build
$ cmake .. -G "Ninja" -DLLVM_TARGETS_TO_BUILD="BPF;X86" \
-DLLVM_ENABLE_PROJECTS="clang" \
-DCMAKE_BUILD_TYPE=Release \
-DLLVM_BUILD_RUNTIME=OFF
$ ninja
It is also possible to point make to the newly compiled 'llc' or It is also possible to point make to the newly compiled 'llc' or
'clang' command via redefining LLC or CLANG on the make command line:: 'clang' command via redefining LLC or CLANG on the make command line::
make M=samples/bpf LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang make M=samples/bpf LLC=~/git/llvm-project/llvm/build/bin/llc CLANG=~/git/llvm-project/llvm/build/bin/clang
Cross compiling samples Cross compiling samples
----------------------- -----------------------

View File

@ -134,15 +134,31 @@ struct bpf_insn;
.off = OFF, \ .off = OFF, \
.imm = 0 }) .imm = 0 })
/* Atomic memory add, *(uint *)(dst_reg + off16) += src_reg */ /*
* Atomic operations:
*
* BPF_ADD *(uint *) (dst_reg + off16) += src_reg
* BPF_AND *(uint *) (dst_reg + off16) &= src_reg
* BPF_OR *(uint *) (dst_reg + off16) |= src_reg
* BPF_XOR *(uint *) (dst_reg + off16) ^= src_reg
* BPF_ADD | BPF_FETCH src_reg = atomic_fetch_add(dst_reg + off16, src_reg);
* BPF_AND | BPF_FETCH src_reg = atomic_fetch_and(dst_reg + off16, src_reg);
* BPF_OR | BPF_FETCH src_reg = atomic_fetch_or(dst_reg + off16, src_reg);
* BPF_XOR | BPF_FETCH src_reg = atomic_fetch_xor(dst_reg + off16, src_reg);
* BPF_XCHG src_reg = atomic_xchg(dst_reg + off16, src_reg)
* BPF_CMPXCHG r0 = atomic_cmpxchg(dst_reg + off16, r0, src_reg)
*/
#define BPF_STX_XADD(SIZE, DST, SRC, OFF) \ #define BPF_ATOMIC_OP(SIZE, OP, DST, SRC, OFF) \
((struct bpf_insn) { \ ((struct bpf_insn) { \
.code = BPF_STX | BPF_SIZE(SIZE) | BPF_ATOMIC, \ .code = BPF_STX | BPF_SIZE(SIZE) | BPF_ATOMIC, \
.dst_reg = DST, \ .dst_reg = DST, \
.src_reg = SRC, \ .src_reg = SRC, \
.off = OFF, \ .off = OFF, \
.imm = BPF_ADD }) .imm = OP })
/* Legacy alias */
#define BPF_STX_XADD(SIZE, DST, SRC, OFF) BPF_ATOMIC_OP(SIZE, BPF_ADD, DST, SRC, OFF)
/* Memory store, *(uint *) (dst_reg + off16) = imm32 */ /* Memory store, *(uint *) (dst_reg + off16) = imm32 */
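A quick, hand-assembled example of the new macro (a sketch using BPF_ATOMIC_OP together with the BPF_MOV64_IMM/BPF_STX_MEM/BPF_EXIT_INSN macros from this header):

  /* r0 = 1; *(u64 *)(fp - 8) = r0;
   * r0 = atomic_fetch_add((u64 *)(fp - 8), r0); exit.
   * Per the BPF_ADD | BPF_FETCH semantics documented above, the
   * program returns the old value (1) in r0.
   */
  struct bpf_insn prog[] = {
      BPF_MOV64_IMM(BPF_REG_0, 1),
      BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -8),
      BPF_ATOMIC_OP(BPF_DW, BPF_ADD | BPF_FETCH, BPF_REG_10, BPF_REG_0, -8),
      BPF_EXIT_INSN(),
  };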

View File

@ -313,7 +313,7 @@ int main(int argc, char *argv[])
print_table(); print_table();
printf("\n"); printf("\n");
sleep(1); sleep(1);
}; }
} else if (cfg_test_cookie) { } else if (cfg_test_cookie) {
udp_client(); udp_client();
} }

View File

@ -19,12 +19,22 @@
#include <linux/ipv6.h> #include <linux/ipv6.h>
#include <bpf/bpf_helpers.h> #include <bpf/bpf_helpers.h>
/* The 2nd xdp prog on egress does not support skb mode, so we define two
* maps, tx_port_general and tx_port_native.
*/
struct { struct {
__uint(type, BPF_MAP_TYPE_DEVMAP); __uint(type, BPF_MAP_TYPE_DEVMAP);
__uint(key_size, sizeof(int)); __uint(key_size, sizeof(int));
__uint(value_size, sizeof(int)); __uint(value_size, sizeof(int));
__uint(max_entries, 100); __uint(max_entries, 100);
} tx_port SEC(".maps"); } tx_port_general SEC(".maps");
struct {
__uint(type, BPF_MAP_TYPE_DEVMAP);
__uint(key_size, sizeof(int));
__uint(value_size, sizeof(struct bpf_devmap_val));
__uint(max_entries, 100);
} tx_port_native SEC(".maps");
/* Count RX packets, as XDP bpf_prog doesn't get direct TX-success /* Count RX packets, as XDP bpf_prog doesn't get direct TX-success
* feedback. Redirect TX errors can be caught via a tracepoint. * feedback. Redirect TX errors can be caught via a tracepoint.
@ -36,6 +46,14 @@ struct {
__uint(max_entries, 1); __uint(max_entries, 1);
} rxcnt SEC(".maps"); } rxcnt SEC(".maps");
/* map to store egress interface mac address */
struct {
__uint(type, BPF_MAP_TYPE_ARRAY);
__type(key, u32);
__type(value, __be64);
__uint(max_entries, 1);
} tx_mac SEC(".maps");
static void swap_src_dst_mac(void *data) static void swap_src_dst_mac(void *data)
{ {
unsigned short *p = data; unsigned short *p = data;
@ -52,17 +70,16 @@ static void swap_src_dst_mac(void *data)
p[5] = dst[2]; p[5] = dst[2];
} }
SEC("xdp_redirect_map") static __always_inline int xdp_redirect_map(struct xdp_md *ctx, void *redirect_map)
int xdp_redirect_map_prog(struct xdp_md *ctx)
{ {
void *data_end = (void *)(long)ctx->data_end; void *data_end = (void *)(long)ctx->data_end;
void *data = (void *)(long)ctx->data; void *data = (void *)(long)ctx->data;
struct ethhdr *eth = data; struct ethhdr *eth = data;
int rc = XDP_DROP; int rc = XDP_DROP;
int vport, port = 0, m = 0;
long *value; long *value;
u32 key = 0; u32 key = 0;
u64 nh_off; u64 nh_off;
int vport;
nh_off = sizeof(*eth); nh_off = sizeof(*eth);
if (data + nh_off > data_end) if (data + nh_off > data_end)
@ -79,7 +96,40 @@ int xdp_redirect_map_prog(struct xdp_md *ctx)
swap_src_dst_mac(data); swap_src_dst_mac(data);
/* send packet out physical port */ /* send packet out physical port */
return bpf_redirect_map(&tx_port, vport, 0); return bpf_redirect_map(redirect_map, vport, 0);
}
SEC("xdp_redirect_general")
int xdp_redirect_map_general(struct xdp_md *ctx)
{
return xdp_redirect_map(ctx, &tx_port_general);
}
SEC("xdp_redirect_native")
int xdp_redirect_map_native(struct xdp_md *ctx)
{
return xdp_redirect_map(ctx, &tx_port_native);
}
SEC("xdp_devmap/map_prog")
int xdp_redirect_map_egress(struct xdp_md *ctx)
{
void *data_end = (void *)(long)ctx->data_end;
void *data = (void *)(long)ctx->data;
struct ethhdr *eth = data;
__be64 *mac;
u32 key = 0;
u64 nh_off;
nh_off = sizeof(*eth);
if (data + nh_off > data_end)
return XDP_DROP;
mac = bpf_map_lookup_elem(&tx_mac, &key);
if (mac)
__builtin_memcpy(eth->h_source, mac, ETH_ALEN);
return XDP_PASS;
} }
/* Redirect require an XDP bpf_prog loaded on the TX device */ /* Redirect require an XDP bpf_prog loaded on the TX device */

View File

@ -14,6 +14,10 @@
#include <unistd.h> #include <unistd.h>
#include <libgen.h> #include <libgen.h>
#include <sys/resource.h> #include <sys/resource.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include "bpf_util.h" #include "bpf_util.h"
#include <bpf/bpf.h> #include <bpf/bpf.h>
@ -22,6 +26,7 @@
static int ifindex_in; static int ifindex_in;
static int ifindex_out; static int ifindex_out;
static bool ifindex_out_xdp_dummy_attached = true; static bool ifindex_out_xdp_dummy_attached = true;
static bool xdp_devmap_attached;
static __u32 prog_id; static __u32 prog_id;
static __u32 dummy_prog_id; static __u32 dummy_prog_id;
@ -83,6 +88,32 @@ static void poll_stats(int interval, int ifindex)
} }
} }
static int get_mac_addr(unsigned int ifindex_out, void *mac_addr)
{
char ifname[IF_NAMESIZE];
struct ifreq ifr;
int fd, ret = -1;
fd = socket(AF_INET, SOCK_DGRAM, 0);
if (fd < 0)
return ret;
if (!if_indextoname(ifindex_out, ifname))
goto err_out;
strcpy(ifr.ifr_name, ifname);
if (ioctl(fd, SIOCGIFHWADDR, &ifr) != 0)
goto err_out;
memcpy(mac_addr, ifr.ifr_hwaddr.sa_data, 6 * sizeof(char));
ret = 0;
err_out:
close(fd);
return ret;
}
static void usage(const char *prog) static void usage(const char *prog)
{ {
fprintf(stderr, fprintf(stderr,
@ -90,24 +121,26 @@ static void usage(const char *prog)
"OPTS:\n" "OPTS:\n"
" -S use skb-mode\n" " -S use skb-mode\n"
" -N enforce native mode\n" " -N enforce native mode\n"
" -F force loading prog\n", " -F force loading prog\n"
" -X load xdp program on egress\n",
prog); prog);
} }
int main(int argc, char **argv) int main(int argc, char **argv)
{ {
struct bpf_prog_load_attr prog_load_attr = { struct bpf_prog_load_attr prog_load_attr = {
.prog_type = BPF_PROG_TYPE_XDP, .prog_type = BPF_PROG_TYPE_UNSPEC,
}; };
struct bpf_program *prog, *dummy_prog; struct bpf_program *prog, *dummy_prog, *devmap_prog;
int prog_fd, dummy_prog_fd, devmap_prog_fd = 0;
int tx_port_map_fd, tx_mac_map_fd;
struct bpf_devmap_val devmap_val;
struct bpf_prog_info info = {}; struct bpf_prog_info info = {};
__u32 info_len = sizeof(info); __u32 info_len = sizeof(info);
int prog_fd, dummy_prog_fd; const char *optstr = "FSNX";
const char *optstr = "FSN";
struct bpf_object *obj; struct bpf_object *obj;
int ret, opt, key = 0; int ret, opt, key = 0;
char filename[256]; char filename[256];
int tx_port_map_fd;
while ((opt = getopt(argc, argv, optstr)) != -1) { while ((opt = getopt(argc, argv, optstr)) != -1) {
switch (opt) { switch (opt) {
@ -120,14 +153,21 @@ int main(int argc, char **argv)
case 'F': case 'F':
xdp_flags &= ~XDP_FLAGS_UPDATE_IF_NOEXIST; xdp_flags &= ~XDP_FLAGS_UPDATE_IF_NOEXIST;
break; break;
case 'X':
xdp_devmap_attached = true;
break;
default: default:
usage(basename(argv[0])); usage(basename(argv[0]));
return 1; return 1;
} }
} }
if (!(xdp_flags & XDP_FLAGS_SKB_MODE)) if (!(xdp_flags & XDP_FLAGS_SKB_MODE)) {
xdp_flags |= XDP_FLAGS_DRV_MODE; xdp_flags |= XDP_FLAGS_DRV_MODE;
} else if (xdp_devmap_attached) {
printf("Load xdp program on egress with SKB mode not supported yet\n");
return 1;
}
if (optind == argc) { if (optind == argc) {
printf("usage: %s <IFNAME|IFINDEX>_IN <IFNAME|IFINDEX>_OUT\n", argv[0]); printf("usage: %s <IFNAME|IFINDEX>_IN <IFNAME|IFINDEX>_OUT\n", argv[0]);
@ -150,24 +190,28 @@ int main(int argc, char **argv)
if (bpf_prog_load_xattr(&prog_load_attr, &obj, &prog_fd)) if (bpf_prog_load_xattr(&prog_load_attr, &obj, &prog_fd))
return 1; return 1;
prog = bpf_program__next(NULL, obj); if (xdp_flags & XDP_FLAGS_SKB_MODE) {
dummy_prog = bpf_program__next(prog, obj); prog = bpf_object__find_program_by_name(obj, "xdp_redirect_map_general");
if (!prog || !dummy_prog) { tx_port_map_fd = bpf_object__find_map_fd_by_name(obj, "tx_port_general");
printf("finding a prog in obj file failed\n"); } else {
return 1; prog = bpf_object__find_program_by_name(obj, "xdp_redirect_map_native");
tx_port_map_fd = bpf_object__find_map_fd_by_name(obj, "tx_port_native");
} }
/* bpf_prog_load_xattr gives us the pointer to first prog's fd, dummy_prog = bpf_object__find_program_by_name(obj, "xdp_redirect_dummy_prog");
* so we're missing only the fd for dummy prog if (!prog || dummy_prog < 0 || tx_port_map_fd < 0) {
*/ printf("finding prog/dummy_prog/tx_port_map in obj file failed\n");
goto out;
}
prog_fd = bpf_program__fd(prog);
dummy_prog_fd = bpf_program__fd(dummy_prog); dummy_prog_fd = bpf_program__fd(dummy_prog);
if (prog_fd < 0 || dummy_prog_fd < 0) { if (prog_fd < 0 || dummy_prog_fd < 0 || tx_port_map_fd < 0) {
printf("bpf_prog_load_xattr: %s\n", strerror(errno)); printf("bpf_prog_load_xattr: %s\n", strerror(errno));
return 1; return 1;
} }
tx_port_map_fd = bpf_object__find_map_fd_by_name(obj, "tx_port"); tx_mac_map_fd = bpf_object__find_map_fd_by_name(obj, "tx_mac");
rxcnt_map_fd = bpf_object__find_map_fd_by_name(obj, "rxcnt"); rxcnt_map_fd = bpf_object__find_map_fd_by_name(obj, "rxcnt");
if (tx_port_map_fd < 0 || rxcnt_map_fd < 0) { if (tx_mac_map_fd < 0 || rxcnt_map_fd < 0) {
printf("bpf_object__find_map_fd_by_name failed\n"); printf("bpf_object__find_map_fd_by_name failed\n");
return 1; return 1;
} }
@ -199,11 +243,39 @@ int main(int argc, char **argv)
} }
dummy_prog_id = info.id; dummy_prog_id = info.id;
/* Load 2nd xdp prog on egress. */
if (xdp_devmap_attached) {
unsigned char mac_addr[6];
devmap_prog = bpf_object__find_program_by_name(obj, "xdp_redirect_map_egress");
if (!devmap_prog) {
printf("finding devmap_prog in obj file failed\n");
goto out;
}
devmap_prog_fd = bpf_program__fd(devmap_prog);
if (devmap_prog_fd < 0) {
printf("finding devmap_prog fd failed\n");
goto out;
}
if (get_mac_addr(ifindex_out, mac_addr) < 0) {
printf("get interface %d mac failed\n", ifindex_out);
goto out;
}
ret = bpf_map_update_elem(tx_mac_map_fd, &key, mac_addr, 0);
if (ret) {
perror("bpf_update_elem tx_mac_map_fd");
goto out;
}
}
signal(SIGINT, int_exit); signal(SIGINT, int_exit);
signal(SIGTERM, int_exit); signal(SIGTERM, int_exit);
/* populate virtual to physical port map */ devmap_val.ifindex = ifindex_out;
ret = bpf_map_update_elem(tx_port_map_fd, &key, &ifindex_out, 0); devmap_val.bpf_prog.fd = devmap_prog_fd;
ret = bpf_map_update_elem(tx_port_map_fd, &key, &devmap_val, 0);
if (ret) { if (ret) {
perror("bpf_update_elem"); perror("bpf_update_elem");
goto out; goto out;

View File

@ -890,7 +890,7 @@ static int bpf_run_stepping(struct sock_filter *f, uint16_t bpf_len,
bool stop = false; bool stop = false;
int i = 1; int i = 1;
while (bpf_curr.Rs == false && stop == false) { while (!bpf_curr.Rs && !stop) {
bpf_safe_regs(); bpf_safe_regs();
if (i++ == next) if (i++ == next)

View File

@ -75,8 +75,6 @@ endif
INSTALL ?= install INSTALL ?= install
RM ?= rm -f RM ?= rm -f
CLANG ?= clang
LLVM_STRIP ?= llvm-strip
FEATURE_USER = .bpftool FEATURE_USER = .bpftool
FEATURE_TESTS = libbfd disassembler-four-args reallocarray zlib libcap \ FEATURE_TESTS = libbfd disassembler-four-args reallocarray zlib libcap \

View File

@ -368,6 +368,8 @@ static void print_prog_header_json(struct bpf_prog_info *info)
jsonw_uint_field(json_wtr, "run_time_ns", info->run_time_ns); jsonw_uint_field(json_wtr, "run_time_ns", info->run_time_ns);
jsonw_uint_field(json_wtr, "run_cnt", info->run_cnt); jsonw_uint_field(json_wtr, "run_cnt", info->run_cnt);
} }
if (info->recursion_misses)
jsonw_uint_field(json_wtr, "recursion_misses", info->recursion_misses);
} }
static void print_prog_json(struct bpf_prog_info *info, int fd) static void print_prog_json(struct bpf_prog_info *info, int fd)
@ -446,6 +448,8 @@ static void print_prog_header_plain(struct bpf_prog_info *info)
if (info->run_time_ns) if (info->run_time_ns)
printf(" run_time_ns %lld run_cnt %lld", printf(" run_time_ns %lld run_cnt %lld",
info->run_time_ns, info->run_cnt); info->run_time_ns, info->run_cnt);
if (info->recursion_misses)
printf(" recursion_misses %lld", info->recursion_misses);
printf("\n"); printf("\n");
} }

View File

@ -1,4 +1,3 @@
/FEATURE-DUMP.libbpf
/bpf_helper_defs.h
/fixdep /fixdep
/resolve_btfids /resolve_btfids
/libbpf/

View File

@ -2,11 +2,7 @@
include ../../scripts/Makefile.include include ../../scripts/Makefile.include
include ../../scripts/Makefile.arch include ../../scripts/Makefile.arch
ifeq ($(srctree),) srctree := $(abspath $(CURDIR)/../../../)
srctree := $(patsubst %/,%,$(dir $(CURDIR)))
srctree := $(patsubst %/,%,$(dir $(srctree)))
srctree := $(patsubst %/,%,$(dir $(srctree)))
endif
ifeq ($(V),1) ifeq ($(V),1)
Q = Q =
@ -22,28 +18,29 @@ AR = $(HOSTAR)
CC = $(HOSTCC) CC = $(HOSTCC)
LD = $(HOSTLD) LD = $(HOSTLD)
ARCH = $(HOSTARCH) ARCH = $(HOSTARCH)
RM ?= rm
OUTPUT ?= $(srctree)/tools/bpf/resolve_btfids/ OUTPUT ?= $(srctree)/tools/bpf/resolve_btfids/
LIBBPF_SRC := $(srctree)/tools/lib/bpf/ LIBBPF_SRC := $(srctree)/tools/lib/bpf/
SUBCMD_SRC := $(srctree)/tools/lib/subcmd/ SUBCMD_SRC := $(srctree)/tools/lib/subcmd/
BPFOBJ := $(OUTPUT)/libbpf.a BPFOBJ := $(OUTPUT)/libbpf/libbpf.a
SUBCMDOBJ := $(OUTPUT)/libsubcmd.a SUBCMDOBJ := $(OUTPUT)/libsubcmd/libsubcmd.a
BINARY := $(OUTPUT)/resolve_btfids BINARY := $(OUTPUT)/resolve_btfids
BINARY_IN := $(BINARY)-in.o BINARY_IN := $(BINARY)-in.o
all: $(BINARY) all: $(BINARY)
$(OUTPUT): $(OUTPUT) $(OUTPUT)/libbpf $(OUTPUT)/libsubcmd:
$(call msg,MKDIR,,$@) $(call msg,MKDIR,,$@)
$(Q)mkdir -p $(OUTPUT) $(Q)mkdir -p $(@)
$(SUBCMDOBJ): fixdep FORCE $(SUBCMDOBJ): fixdep FORCE | $(OUTPUT)/libsubcmd
$(Q)$(MAKE) -C $(SUBCMD_SRC) OUTPUT=$(OUTPUT) $(Q)$(MAKE) -C $(SUBCMD_SRC) OUTPUT=$(abspath $(dir $@))/ $(abspath $@)
$(BPFOBJ): $(wildcard $(LIBBPF_SRC)/*.[ch] $(LIBBPF_SRC)/Makefile) | $(OUTPUT) $(BPFOBJ): $(wildcard $(LIBBPF_SRC)/*.[ch] $(LIBBPF_SRC)/Makefile) | $(OUTPUT)/libbpf
$(Q)$(MAKE) $(submake_extras) -C $(LIBBPF_SRC) OUTPUT=$(abspath $(dir $@))/ $(abspath $@) $(Q)$(MAKE) $(submake_extras) -C $(LIBBPF_SRC) OUTPUT=$(abspath $(dir $@))/ $(abspath $@)
CFLAGS := -g \ CFLAGS := -g \
@ -57,24 +54,27 @@ LIBS = -lelf -lz
export srctree OUTPUT CFLAGS Q export srctree OUTPUT CFLAGS Q
include $(srctree)/tools/build/Makefile.include include $(srctree)/tools/build/Makefile.include
$(BINARY_IN): fixdep FORCE $(BINARY_IN): fixdep FORCE | $(OUTPUT)
$(Q)$(MAKE) $(build)=resolve_btfids $(Q)$(MAKE) $(build)=resolve_btfids
$(BINARY): $(BPFOBJ) $(SUBCMDOBJ) $(BINARY_IN) $(BINARY): $(BPFOBJ) $(SUBCMDOBJ) $(BINARY_IN)
$(call msg,LINK,$@) $(call msg,LINK,$@)
$(Q)$(CC) $(BINARY_IN) $(LDFLAGS) -o $@ $(BPFOBJ) $(SUBCMDOBJ) $(LIBS) $(Q)$(CC) $(BINARY_IN) $(LDFLAGS) -o $@ $(BPFOBJ) $(SUBCMDOBJ) $(LIBS)
libsubcmd-clean: clean_objects := $(wildcard $(OUTPUT)/*.o \
$(Q)$(MAKE) -C $(SUBCMD_SRC) OUTPUT=$(OUTPUT) clean $(OUTPUT)/.*.o.cmd \
$(OUTPUT)/.*.o.d \
$(OUTPUT)/libbpf \
$(OUTPUT)/libsubcmd \
$(OUTPUT)/resolve_btfids)
libbpf-clean: ifneq ($(clean_objects),)
$(Q)$(MAKE) -C $(LIBBPF_SRC) OUTPUT=$(OUTPUT) clean clean: fixdep-clean
clean: libsubcmd-clean libbpf-clean fixdep-clean
$(call msg,CLEAN,$(BINARY)) $(call msg,CLEAN,$(BINARY))
$(Q)$(RM) -f $(BINARY); \ $(Q)$(RM) -rf $(clean_objects)
$(RM) -rf $(if $(OUTPUT),$(OUTPUT),.)/feature; \ else
find $(if $(OUTPUT),$(OUTPUT),.) -name \*.o -or -name \*.o.cmd -or -name \*.o.d | xargs $(RM) clean:
endif
tags: tags:
$(call msg,GEN,,tags) $(call msg,GEN,,tags)

View File

@ -3,9 +3,6 @@ include ../../scripts/Makefile.include
OUTPUT ?= $(abspath .output)/ OUTPUT ?= $(abspath .output)/
CLANG ?= clang
LLC ?= llc
LLVM_STRIP ?= llvm-strip
BPFTOOL_OUTPUT := $(OUTPUT)bpftool/ BPFTOOL_OUTPUT := $(OUTPUT)bpftool/
DEFAULT_BPFTOOL := $(BPFTOOL_OUTPUT)bpftool DEFAULT_BPFTOOL := $(BPFTOOL_OUTPUT)bpftool
BPFTOOL ?= $(DEFAULT_BPFTOOL) BPFTOOL ?= $(DEFAULT_BPFTOOL)

View File

@ -1,4 +1,6 @@
# SPDX-License-Identifier: GPL-2.0 # SPDX-License-Identifier: GPL-2.0
include ../../scripts/Makefile.include
FILES= \ FILES= \
test-all.bin \ test-all.bin \
test-backtrace.bin \ test-backtrace.bin \
@ -76,8 +78,6 @@ FILES= \
FILES := $(addprefix $(OUTPUT),$(FILES)) FILES := $(addprefix $(OUTPUT),$(FILES))
PKG_CONFIG ?= $(CROSS_COMPILE)pkg-config PKG_CONFIG ?= $(CROSS_COMPILE)pkg-config
LLVM_CONFIG ?= llvm-config
CLANG ?= clang
all: $(FILES) all: $(FILES)

View File

@ -6,7 +6,10 @@
#include <stddef.h> #include <stddef.h>
#include <stdint.h> #include <stdint.h>
#ifndef __SANE_USERSPACE_TYPES__
#define __SANE_USERSPACE_TYPES__ /* For PPC64, to get LL64 types */ #define __SANE_USERSPACE_TYPES__ /* For PPC64, to get LL64 types */
#endif
#include <asm/types.h> #include <asm/types.h>
#include <asm/posix_types.h> #include <asm/posix_types.h>

View File

@ -1656,22 +1656,30 @@ union bpf_attr {
* networking traffic statistics as it provides a global socket * networking traffic statistics as it provides a global socket
* identifier that can be assumed unique. * identifier that can be assumed unique.
* Return * Return
* A 8-byte long non-decreasing number on success, or 0 if the * A 8-byte long unique number on success, or 0 if the socket
* socket field is missing inside *skb*. * field is missing inside *skb*.
* *
* u64 bpf_get_socket_cookie(struct bpf_sock_addr *ctx) * u64 bpf_get_socket_cookie(struct bpf_sock_addr *ctx)
* Description * Description
* Equivalent to bpf_get_socket_cookie() helper that accepts * Equivalent to bpf_get_socket_cookie() helper that accepts
* *skb*, but gets socket from **struct bpf_sock_addr** context. * *skb*, but gets socket from **struct bpf_sock_addr** context.
* Return * Return
* A 8-byte long non-decreasing number. * A 8-byte long unique number.
* *
* u64 bpf_get_socket_cookie(struct bpf_sock_ops *ctx) * u64 bpf_get_socket_cookie(struct bpf_sock_ops *ctx)
* Description * Description
* Equivalent to **bpf_get_socket_cookie**\ () helper that accepts * Equivalent to **bpf_get_socket_cookie**\ () helper that accepts
* *skb*, but gets socket from **struct bpf_sock_ops** context. * *skb*, but gets socket from **struct bpf_sock_ops** context.
* Return * Return
* A 8-byte long non-decreasing number. * A 8-byte long unique number.
*
* u64 bpf_get_socket_cookie(struct sock *sk)
* Description
* Equivalent to **bpf_get_socket_cookie**\ () helper that accepts
* *sk*, but gets socket from a BTF **struct sock**. This helper
* also works for sleepable programs.
* Return
* An 8-byte long unique number, or 0 if *sk* is NULL.
* *
* u32 bpf_get_socket_uid(struct sk_buff *skb) * u32 bpf_get_socket_uid(struct sk_buff *skb)
* Return * Return
@ -2231,6 +2239,9 @@ union bpf_attr {
* * > 0 one of **BPF_FIB_LKUP_RET_** codes explaining why the * * > 0 one of **BPF_FIB_LKUP_RET_** codes explaining why the
* packet is not forwarded or needs assist from full stack * packet is not forwarded or needs assist from full stack
* *
* If lookup fails with BPF_FIB_LKUP_RET_FRAG_NEEDED, then the MTU
* was exceeded and output params->mtu_result contains the MTU.
*
* long bpf_sock_hash_update(struct bpf_sock_ops *skops, struct bpf_map *map, void *key, u64 flags) * long bpf_sock_hash_update(struct bpf_sock_ops *skops, struct bpf_map *map, void *key, u64 flags)
* Description * Description
* Add an entry to, or update a sockhash *map* referencing sockets. * Add an entry to, or update a sockhash *map* referencing sockets.
@ -3836,6 +3847,69 @@ union bpf_attr {
* Return * Return
* A pointer to a struct socket on success or NULL if the file is * A pointer to a struct socket on success or NULL if the file is
* not a socket. * not a socket.
*
* long bpf_check_mtu(void *ctx, u32 ifindex, u32 *mtu_len, s32 len_diff, u64 flags)
* Description
* Check whether the ctx packet size exceeds the MTU of the net device (based
* on *ifindex*). This helper will likely be used in combination
* with helpers that adjust/change the packet size.
*
* The argument *len_diff* can be used to query with a planned
* size change. This allows checking the MTU prior to changing the
* packet ctx. Providing a *len_diff* adjustment that is larger than
* the actual packet size (resulting in a negative packet size) will
* in principle not exceed the MTU, which is why it is not considered
* a failure. Other BPF helpers are needed to perform the planned
* size change; the responsibility for catching a negative packet
* size therefore belongs in those helpers.
*
* Specifying *ifindex* zero means the MTU check is performed
* against the current net device. This is practical if the helper
* isn't used prior to a redirect.
*
* The Linux kernel route table can configure MTUs on a more
* specific per route level, which is not provided by this helper.
* For route level MTU checks use the **bpf_fib_lookup**\ ()
* helper.
*
* *ctx* is either **struct xdp_md** for XDP programs or
* **struct sk_buff** for tc cls_act programs.
*
* The *flags* argument can be a combination of one or more of the
* following values:
*
* **BPF_MTU_CHK_SEGS**
* This flag only works for *ctx* **struct sk_buff**.
* If the packet context contains extra packet segment buffers
* (often known as a GSO skb), then the MTU check is harder to
* perform at this point, because in the transmit path it is
* possible for the skb to get re-segmented
* (depending on net device features). This could still be
* an MTU violation, so this flag enables performing the MTU
* check against segments, with a different violation
* return code to tell it apart. This check cannot use *len_diff*.
*
* On return the *mtu_len* pointer contains the MTU value of the net
* device. Remember that the net device's configured MTU is the L3
* size, which is what is returned here, while XDP and TX lengths
* operate at L2. The helper takes this into account for you, but
* keep it in mind when using the MTU value in your BPF code. On
* input *mtu_len* must be a valid pointer, initialized to zero,
* else the verifier will reject the BPF program.
*
* Return
* * 0 on success, and populate MTU value in *mtu_len* pointer.
*
* * < 0 if any input argument is invalid (*mtu_len* not updated)
*
* MTU violations return positive values, but also populate the MTU
* value in the *mtu_len* pointer, as this can be needed for
* implementing PMTU handling:
*
* * **BPF_MTU_CHK_RET_FRAG_NEEDED**
* * **BPF_MTU_CHK_RET_SEGS_TOOBIG**
*
*/ */
#define __BPF_FUNC_MAPPER(FN) \ #define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \ FN(unspec), \
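To make the helper documentation above concrete, a hypothetical tc (cls_act) program exercising BPF_MTU_CHK_SEGS; note that bpf_fib_lookup() now also reports the violated MTU via params->mtu_result (see the struct change further down):

  #include <linux/bpf.h>
  #include <linux/pkt_cls.h>
  #include <bpf/bpf_helpers.h>

  SEC("classifier")
  int tc_mtu_check(struct __sk_buff *skb)
  {
      __u32 mtu_len = 0;  /* must be zero-initialized, see above */
      int ret;

      /* ifindex 0 checks against the current device; BPF_MTU_CHK_SEGS
       * must not be combined with a non-zero len_diff.
       */
      ret = bpf_check_mtu(skb, 0, &mtu_len, 0, BPF_MTU_CHK_SEGS);
      if (ret == BPF_MTU_CHK_RET_FRAG_NEEDED ||
          ret == BPF_MTU_CHK_RET_SEGS_TOOBIG)
          return TC_ACT_SHOT;  /* mtu_len holds the L3 MTU that was hit */

      return TC_ACT_OK;
  }

  char _license[] SEC("license") = "GPL";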
@ -4001,6 +4075,7 @@ union bpf_attr {
FN(ktime_get_coarse_ns), \ FN(ktime_get_coarse_ns), \
FN(ima_inode_hash), \ FN(ima_inode_hash), \
FN(sock_from_file), \ FN(sock_from_file), \
FN(check_mtu), \
/* */ /* */
/* integer value in 'imm' field of BPF_CALL instruction selects which helper /* integer value in 'imm' field of BPF_CALL instruction selects which helper
@ -4501,6 +4576,7 @@ struct bpf_prog_info {
__aligned_u64 prog_tags; __aligned_u64 prog_tags;
__u64 run_time_ns; __u64 run_time_ns;
__u64 run_cnt; __u64 run_cnt;
__u64 recursion_misses;
} __attribute__((aligned(8))); } __attribute__((aligned(8)));
struct bpf_map_info { struct bpf_map_info {
@ -4981,9 +5057,13 @@ struct bpf_fib_lookup {
__be16 sport; __be16 sport;
__be16 dport; __be16 dport;
/* total length of packet from network header - used for MTU check */ union { /* used for MTU check */
__u16 tot_len; /* input to lookup */
__u16 tot_len; /* L3 length from network hdr (iph->tot_len) */
/* output: MTU value */
__u16 mtu_result;
};
/* input: L3 device index for lookup /* input: L3 device index for lookup
* output: device index from FIB lookup * output: device index from FIB lookup
*/ */
@ -5029,6 +5109,17 @@ struct bpf_redir_neigh {
}; };
}; };
/* bpf_check_mtu flags*/
enum bpf_check_mtu_flags {
BPF_MTU_CHK_SEGS = (1U << 0),
};
enum bpf_check_mtu_ret {
BPF_MTU_CHK_RET_SUCCESS, /* check and lookup successful */
BPF_MTU_CHK_RET_FRAG_NEEDED, /* fragmentation required to fwd */
BPF_MTU_CHK_RET_SEGS_TOOBIG, /* GSO re-segmentation needed to fwd */
};
enum bpf_task_fd_type { enum bpf_task_fd_type {
BPF_FD_TYPE_RAW_TRACEPOINT, /* tp name */ BPF_FD_TYPE_RAW_TRACEPOINT, /* tp name */
BPF_FD_TYPE_TRACEPOINT, /* tp name */ BPF_FD_TYPE_TRACEPOINT, /* tp name */

View File

@ -13,6 +13,7 @@
struct bpf_perf_event_data { struct bpf_perf_event_data {
bpf_user_pt_regs_t regs; bpf_user_pt_regs_t regs;
__u64 sample_period; __u64 sample_period;
__u64 addr;
}; };
#endif /* _UAPI__LINUX_BPF_PERF_EVENT_H__ */ #endif /* _UAPI__LINUX_BPF_PERF_EVENT_H__ */
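An assumed example of a perf_event program reading the newly exposed field (whether a given event fills it depends on the perf sample type, e.g. breakpoint or memory events):

  #include <linux/bpf.h>
  #include <linux/bpf_perf_event.h>
  #include <bpf/bpf_helpers.h>

  SEC("perf_event")
  int on_sample(struct bpf_perf_event_data *ctx)
  {
      __u64 addr = ctx->addr;

      bpf_printk("sample addr %llx period %llu", addr, ctx->sample_period);
      return 0;
  }

  char _license[] SEC("license") = "GPL";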

View File

@ -0,0 +1,357 @@
/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
/*
* INET An implementation of the TCP/IP protocol suite for the LINUX
* operating system. INET is implemented using the BSD Socket
* interface as the means of communication with the user level.
*
* Definitions for the TCP protocol.
*
* Version: @(#)tcp.h 1.0.2 04/28/93
*
* Author: Fred N. van Kempen, <waltje@uWalt.NL.Mugnet.ORG>
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version
* 2 of the License, or (at your option) any later version.
*/
#ifndef _UAPI_LINUX_TCP_H
#define _UAPI_LINUX_TCP_H
#include <linux/types.h>
#include <asm/byteorder.h>
#include <linux/socket.h>
struct tcphdr {
__be16 source;
__be16 dest;
__be32 seq;
__be32 ack_seq;
#if defined(__LITTLE_ENDIAN_BITFIELD)
__u16 res1:4,
doff:4,
fin:1,
syn:1,
rst:1,
psh:1,
ack:1,
urg:1,
ece:1,
cwr:1;
#elif defined(__BIG_ENDIAN_BITFIELD)
__u16 doff:4,
res1:4,
cwr:1,
ece:1,
urg:1,
ack:1,
psh:1,
rst:1,
syn:1,
fin:1;
#else
#error "Adjust your <asm/byteorder.h> defines"
#endif
__be16 window;
__sum16 check;
__be16 urg_ptr;
};
/*
* The union cast uses a gcc extension to avoid aliasing problems
* (union is compatible to any of its members)
* This means this part of the code is -fstrict-aliasing safe now.
*/
union tcp_word_hdr {
struct tcphdr hdr;
__be32 words[5];
};
#define tcp_flag_word(tp) ( ((union tcp_word_hdr *)(tp))->words [3])
enum {
TCP_FLAG_CWR = __constant_cpu_to_be32(0x00800000),
TCP_FLAG_ECE = __constant_cpu_to_be32(0x00400000),
TCP_FLAG_URG = __constant_cpu_to_be32(0x00200000),
TCP_FLAG_ACK = __constant_cpu_to_be32(0x00100000),
TCP_FLAG_PSH = __constant_cpu_to_be32(0x00080000),
TCP_FLAG_RST = __constant_cpu_to_be32(0x00040000),
TCP_FLAG_SYN = __constant_cpu_to_be32(0x00020000),
TCP_FLAG_FIN = __constant_cpu_to_be32(0x00010000),
TCP_RESERVED_BITS = __constant_cpu_to_be32(0x0F000000),
TCP_DATA_OFFSET = __constant_cpu_to_be32(0xF0000000)
};
/*
* TCP general constants
*/
#define TCP_MSS_DEFAULT 536U /* IPv4 (RFC1122, RFC2581) */
#define TCP_MSS_DESIRED 1220U /* IPv6 (tunneled), EDNS0 (RFC3226) */
/* TCP socket options */
#define TCP_NODELAY 1 /* Turn off Nagle's algorithm. */
#define TCP_MAXSEG 2 /* Limit MSS */
#define TCP_CORK 3 /* Never send partially complete segments */
#define TCP_KEEPIDLE 4 /* Start keepalives after this period */
#define TCP_KEEPINTVL 5 /* Interval between keepalives */
#define TCP_KEEPCNT 6 /* Number of keepalives before death */
#define TCP_SYNCNT 7 /* Number of SYN retransmits */
#define TCP_LINGER2 8 /* Life time of orphaned FIN-WAIT-2 state */
#define TCP_DEFER_ACCEPT 9 /* Wake up listener only when data arrive */
#define TCP_WINDOW_CLAMP 10 /* Bound advertised window */
#define TCP_INFO 11 /* Information about this connection. */
#define TCP_QUICKACK 12 /* Block/reenable quick acks */
#define TCP_CONGESTION 13 /* Congestion control algorithm */
#define TCP_MD5SIG 14 /* TCP MD5 Signature (RFC2385) */
#define TCP_THIN_LINEAR_TIMEOUTS 16 /* Use linear timeouts for thin streams*/
#define TCP_THIN_DUPACK 17 /* Fast retrans. after 1 dupack */
#define TCP_USER_TIMEOUT 18 /* How long for loss retry before timeout */
#define TCP_REPAIR 19 /* TCP sock is under repair right now */
#define TCP_REPAIR_QUEUE 20
#define TCP_QUEUE_SEQ 21
#define TCP_REPAIR_OPTIONS 22
#define TCP_FASTOPEN 23 /* Enable FastOpen on listeners */
#define TCP_TIMESTAMP 24
#define TCP_NOTSENT_LOWAT 25 /* limit number of unsent bytes in write queue */
#define TCP_CC_INFO 26 /* Get Congestion Control (optional) info */
#define TCP_SAVE_SYN 27 /* Record SYN headers for new connections */
#define TCP_SAVED_SYN 28 /* Get SYN headers recorded for connection */
#define TCP_REPAIR_WINDOW 29 /* Get/set window parameters */
#define TCP_FASTOPEN_CONNECT 30 /* Attempt FastOpen with connect */
#define TCP_ULP 31 /* Attach a ULP to a TCP connection */
#define TCP_MD5SIG_EXT 32 /* TCP MD5 Signature with extensions */
#define TCP_FASTOPEN_KEY 33 /* Set the key for Fast Open (cookie) */
#define TCP_FASTOPEN_NO_COOKIE 34 /* Enable TFO without a TFO cookie */
#define TCP_ZEROCOPY_RECEIVE 35
#define TCP_INQ 36 /* Notify bytes available to read as a cmsg on read */
#define TCP_CM_INQ TCP_INQ
#define TCP_TX_DELAY 37 /* delay outgoing packets by XX usec */
#define TCP_REPAIR_ON 1
#define TCP_REPAIR_OFF 0
#define TCP_REPAIR_OFF_NO_WP -1 /* Turn off without window probes */
struct tcp_repair_opt {
__u32 opt_code;
__u32 opt_val;
};
struct tcp_repair_window {
__u32 snd_wl1;
__u32 snd_wnd;
__u32 max_window;
__u32 rcv_wnd;
__u32 rcv_wup;
};
enum {
TCP_NO_QUEUE,
TCP_RECV_QUEUE,
TCP_SEND_QUEUE,
TCP_QUEUES_NR,
};
/* why fastopen failed from client perspective */
enum tcp_fastopen_client_fail {
TFO_STATUS_UNSPEC, /* catch-all */
TFO_COOKIE_UNAVAILABLE, /* if not in TFO_CLIENT_NO_COOKIE mode */
TFO_DATA_NOT_ACKED, /* SYN-ACK did not ack SYN data */
TFO_SYN_RETRANSMITTED, /* SYN-ACK did not ack SYN data after timeout */
};
/* for TCP_INFO socket option */
#define TCPI_OPT_TIMESTAMPS 1
#define TCPI_OPT_SACK 2
#define TCPI_OPT_WSCALE 4
#define TCPI_OPT_ECN 8 /* ECN was negotiated at TCP session init */
#define TCPI_OPT_ECN_SEEN 16 /* we received at least one packet with ECT */
#define TCPI_OPT_SYN_DATA 32 /* SYN-ACK acked data in SYN sent or rcvd */
/*
* Sender's congestion state indicating normal or abnormal situations
* in the last round of packets sent. The state is driven by the ACK
* information and timer events.
*/
enum tcp_ca_state {
/*
* Nothing bad has been observed recently.
* No apparent reordering, packet loss, or ECN marks.
*/
TCP_CA_Open = 0,
#define TCPF_CA_Open (1<<TCP_CA_Open)
/*
* The sender enters disordered state when it has received DUPACKs or
* SACKs in the last round of packets sent. This could be due to packet
* loss or reordering but needs further information to confirm packets
* have been lost.
*/
TCP_CA_Disorder = 1,
#define TCPF_CA_Disorder (1<<TCP_CA_Disorder)
/*
* The sender enters Congestion Window Reduction (CWR) state when it
* has received ACKs with ECN-ECE marks, or has experienced congestion
* or packet discard on the sender host (e.g. qdisc).
*/
TCP_CA_CWR = 2,
#define TCPF_CA_CWR (1<<TCP_CA_CWR)
/*
* The sender is in fast recovery and retransmitting lost packets,
* typically triggered by ACK events.
*/
TCP_CA_Recovery = 3,
#define TCPF_CA_Recovery (1<<TCP_CA_Recovery)
/*
* The sender is in loss recovery triggered by retransmission timeout.
*/
TCP_CA_Loss = 4
#define TCPF_CA_Loss (1<<TCP_CA_Loss)
};
struct tcp_info {
__u8 tcpi_state;
__u8 tcpi_ca_state;
__u8 tcpi_retransmits;
__u8 tcpi_probes;
__u8 tcpi_backoff;
__u8 tcpi_options;
__u8 tcpi_snd_wscale : 4, tcpi_rcv_wscale : 4;
__u8 tcpi_delivery_rate_app_limited:1, tcpi_fastopen_client_fail:2;
__u32 tcpi_rto;
__u32 tcpi_ato;
__u32 tcpi_snd_mss;
__u32 tcpi_rcv_mss;
__u32 tcpi_unacked;
__u32 tcpi_sacked;
__u32 tcpi_lost;
__u32 tcpi_retrans;
__u32 tcpi_fackets;
/* Times. */
__u32 tcpi_last_data_sent;
__u32 tcpi_last_ack_sent; /* Not remembered, sorry. */
__u32 tcpi_last_data_recv;
__u32 tcpi_last_ack_recv;
/* Metrics. */
__u32 tcpi_pmtu;
__u32 tcpi_rcv_ssthresh;
__u32 tcpi_rtt;
__u32 tcpi_rttvar;
__u32 tcpi_snd_ssthresh;
__u32 tcpi_snd_cwnd;
__u32 tcpi_advmss;
__u32 tcpi_reordering;
__u32 tcpi_rcv_rtt;
__u32 tcpi_rcv_space;
__u32 tcpi_total_retrans;
__u64 tcpi_pacing_rate;
__u64 tcpi_max_pacing_rate;
__u64 tcpi_bytes_acked; /* RFC4898 tcpEStatsAppHCThruOctetsAcked */
__u64 tcpi_bytes_received; /* RFC4898 tcpEStatsAppHCThruOctetsReceived */
__u32 tcpi_segs_out; /* RFC4898 tcpEStatsPerfSegsOut */
__u32 tcpi_segs_in; /* RFC4898 tcpEStatsPerfSegsIn */
__u32 tcpi_notsent_bytes;
__u32 tcpi_min_rtt;
__u32 tcpi_data_segs_in; /* RFC4898 tcpEStatsDataSegsIn */
__u32 tcpi_data_segs_out; /* RFC4898 tcpEStatsDataSegsOut */
__u64 tcpi_delivery_rate;
__u64 tcpi_busy_time; /* Time (usec) busy sending data */
__u64 tcpi_rwnd_limited; /* Time (usec) limited by receive window */
__u64 tcpi_sndbuf_limited; /* Time (usec) limited by send buffer */
__u32 tcpi_delivered;
__u32 tcpi_delivered_ce;
__u64 tcpi_bytes_sent; /* RFC4898 tcpEStatsPerfHCDataOctetsOut */
__u64 tcpi_bytes_retrans; /* RFC4898 tcpEStatsPerfOctetsRetrans */
__u32 tcpi_dsack_dups; /* RFC4898 tcpEStatsStackDSACKDups */
__u32 tcpi_reord_seen; /* reordering events seen */
__u32 tcpi_rcv_ooopack; /* Out-of-order packets received */
__u32 tcpi_snd_wnd; /* peer's advertised receive window after
* scaling (bytes)
*/
};
/* netlink attributes types for SCM_TIMESTAMPING_OPT_STATS */
enum {
TCP_NLA_PAD,
TCP_NLA_BUSY, /* Time (usec) busy sending data */
TCP_NLA_RWND_LIMITED, /* Time (usec) limited by receive window */
TCP_NLA_SNDBUF_LIMITED, /* Time (usec) limited by send buffer */
TCP_NLA_DATA_SEGS_OUT, /* Data pkts sent including retransmission */
TCP_NLA_TOTAL_RETRANS, /* Data pkts retransmitted */
TCP_NLA_PACING_RATE, /* Pacing rate in bytes per second */
TCP_NLA_DELIVERY_RATE, /* Delivery rate in bytes per second */
TCP_NLA_SND_CWND, /* Sending congestion window */
TCP_NLA_REORDERING, /* Reordering metric */
TCP_NLA_MIN_RTT, /* minimum RTT */
TCP_NLA_RECUR_RETRANS, /* Recurring retransmits for the current pkt */
TCP_NLA_DELIVERY_RATE_APP_LMT, /* delivery rate application limited ? */
TCP_NLA_SNDQ_SIZE, /* Data (bytes) pending in send queue */
TCP_NLA_CA_STATE, /* ca_state of socket */
TCP_NLA_SND_SSTHRESH, /* Slow start size threshold */
TCP_NLA_DELIVERED, /* Data pkts delivered incl. out-of-order */
TCP_NLA_DELIVERED_CE, /* Like above but only ones w/ CE marks */
TCP_NLA_BYTES_SENT, /* Data bytes sent including retransmission */
TCP_NLA_BYTES_RETRANS, /* Data bytes retransmitted */
TCP_NLA_DSACK_DUPS, /* DSACK blocks received */
TCP_NLA_REORD_SEEN, /* reordering events seen */
TCP_NLA_SRTT, /* smoothed RTT in usecs */
TCP_NLA_TIMEOUT_REHASH, /* Timeout-triggered rehash attempts */
TCP_NLA_BYTES_NOTSENT, /* Bytes in write queue not yet sent */
TCP_NLA_EDT, /* Earliest departure time (CLOCK_MONOTONIC) */
};
/* for TCP_MD5SIG socket option */
#define TCP_MD5SIG_MAXKEYLEN 80
/* tcp_md5sig extension flags for TCP_MD5SIG_EXT */
#define TCP_MD5SIG_FLAG_PREFIX 0x1 /* address prefix length */
#define TCP_MD5SIG_FLAG_IFINDEX 0x2 /* ifindex set */
struct tcp_md5sig {
struct __kernel_sockaddr_storage tcpm_addr; /* address associated */
__u8 tcpm_flags; /* extension flags */
__u8 tcpm_prefixlen; /* address prefix */
__u16 tcpm_keylen; /* key length */
int tcpm_ifindex; /* device index for scope */
__u8 tcpm_key[TCP_MD5SIG_MAXKEYLEN]; /* key (binary) */
};
/* INET_DIAG_MD5SIG */
struct tcp_diag_md5sig {
__u8 tcpm_family;
__u8 tcpm_prefixlen;
__u16 tcpm_keylen;
__be32 tcpm_addr[4];
__u8 tcpm_key[TCP_MD5SIG_MAXKEYLEN];
};
/* setsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE, ...) */
#define TCP_RECEIVE_ZEROCOPY_FLAG_TLB_CLEAN_HINT 0x1
struct tcp_zerocopy_receive {
__u64 address; /* in: address of mapping */
__u32 length; /* in/out: number of bytes to map/mapped */
__u32 recv_skip_hint; /* out: amount of bytes to skip */
__u32 inq; /* out: amount of bytes in read queue */
__s32 err; /* out: socket error */
__u64 copybuf_address; /* in: copybuf address (small reads) */
__s32 copybuf_len; /* in/out: copybuf bytes avail/used or error */
__u32 flags; /* in: flags */
};
#endif /* _UAPI_LINUX_TCP_H */

View File

@ -1,7 +1,6 @@
# SPDX-License-Identifier: GPL-2.0-only
libbpf_version.h
libbpf.pc
FEATURE-DUMP.libbpf
libbpf.so.*
TAGS
tags

View File

@ -58,28 +58,7 @@ ifndef VERBOSE
VERBOSE = 0
endif
FEATURE_USER = .libbpf
FEATURE_TESTS = libelf zlib bpf
FEATURE_DISPLAY = libelf zlib bpf
INCLUDES = -I. -I$(srctree)/tools/include -I$(srctree)/tools/include/uapi
FEATURE_CHECK_CFLAGS-bpf = $(INCLUDES)
check_feat := 1
NON_CHECK_FEAT_TARGETS := clean TAGS tags cscope help
ifdef MAKECMDGOALS
ifeq ($(filter-out $(NON_CHECK_FEAT_TARGETS),$(MAKECMDGOALS)),)
check_feat := 0
endif
endif
ifeq ($(check_feat),1)
ifeq ($(FEATURES_DUMP),)
include $(srctree)/tools/build/Makefile.feature
else
include $(FEATURES_DUMP)
endif
endif
export prefix libdir src obj
@ -157,7 +136,7 @@ all: fixdep
all_cmd: $(CMD_TARGETS) check
$(BPF_IN_SHARED): force elfdep zdep bpfdep $(BPF_HELPER_DEFS)
$(BPF_IN_SHARED): force $(BPF_HELPER_DEFS)
@(test -f ../../include/uapi/linux/bpf.h -a -f ../../../include/uapi/linux/bpf.h && ( \
(diff -B ../../include/uapi/linux/bpf.h ../../../include/uapi/linux/bpf.h >/dev/null) || \
echo "Warning: Kernel ABI header at 'tools/include/uapi/linux/bpf.h' differs from latest version at 'include/uapi/linux/bpf.h'" >&2 )) || true
@ -175,7 +154,7 @@ $(BPF_IN_SHARED): force elfdep zdep bpfdep $(BPF_HELPER_DEFS)
echo "Warning: Kernel ABI header at 'tools/include/uapi/linux/if_xdp.h' differs from latest version at 'include/uapi/linux/if_xdp.h'" >&2 )) || true
$(Q)$(MAKE) $(build)=libbpf OUTPUT=$(SHARED_OBJDIR) CFLAGS="$(CFLAGS) $(SHLIB_FLAGS)"
$(BPF_IN_STATIC): force elfdep zdep bpfdep $(BPF_HELPER_DEFS)
$(BPF_IN_STATIC): force $(BPF_HELPER_DEFS)
$(Q)$(MAKE) $(build)=libbpf OUTPUT=$(STATIC_OBJDIR)
$(BPF_HELPER_DEFS): $(srctree)/tools/include/uapi/linux/bpf.h
@ -264,34 +243,16 @@ install_pkgconfig: $(PC_FILE)
install: install_lib install_pkgconfig install_headers
### Cleaning rules
config-clean:
$(call QUIET_CLEAN, feature-detect)
$(Q)$(MAKE) -C $(srctree)/tools/build/feature/ clean >/dev/null
clean: config-clean
clean:
$(call QUIET_CLEAN, libbpf) $(RM) -rf $(CMD_TARGETS) \
*~ .*.d .*.cmd LIBBPF-CFLAGS $(BPF_HELPER_DEFS) \
$(SHARED_OBJDIR) $(STATIC_OBJDIR) \
$(addprefix $(OUTPUT), \
*.o *.a *.so *.so.$(LIBBPF_MAJOR_VERSION) *.pc)
$(call QUIET_CLEAN, core-gen) $(RM) $(OUTPUT)FEATURE-DUMP.libbpf
PHONY += force cscope tags
PHONY += force elfdep zdep bpfdep cscope tags
force:
elfdep:
@if [ "$(feature-libelf)" != "1" ]; then echo "No libelf found"; exit 1 ; fi
zdep:
@if [ "$(feature-zlib)" != "1" ]; then echo "No zlib found"; exit 1 ; fi
bpfdep:
@if [ "$(feature-bpf)" != "1" ]; then echo "BPF API too old"; exit 1 ; fi
cscope:
ls *.c *.h > cscope.files
cscope -b -q -I $(srctree)/include -f cscope.out

View File

@ -858,6 +858,7 @@ static struct btf *btf_parse_elf(const char *path, struct btf *base_btf,
Elf_Scn *scn = NULL;
Elf *elf = NULL;
GElf_Ehdr ehdr;
size_t shstrndx;
if (elf_version(EV_CURRENT) == EV_NONE) {
pr_warn("failed to init libelf for %s\n", path);
@ -882,7 +883,14 @@ static struct btf *btf_parse_elf(const char *path, struct btf *base_btf,
pr_warn("failed to get EHDR from %s\n", path);
goto done;
}
if (!elf_rawdata(elf_getscn(elf, ehdr.e_shstrndx), NULL)) {
if (elf_getshdrstrndx(elf, &shstrndx)) {
pr_warn("failed to get section names section index for %s\n",
path);
goto done;
}
if (!elf_rawdata(elf_getscn(elf, shstrndx), NULL)) {
pr_warn("failed to get e_shstrndx from %s\n", path);
goto done;
}
@ -897,7 +905,7 @@ static struct btf *btf_parse_elf(const char *path, struct btf *base_btf,
idx, path);
goto done;
}
name = elf_strptr(elf, ehdr.e_shstrndx, sh.sh_name);
name = elf_strptr(elf, shstrndx, sh.sh_name);
if (!name) {
pr_warn("failed to get section(%d) name from %s\n",
idx, path);

View File

@ -884,24 +884,24 @@ static int bpf_map__init_kern_struct_ops(struct bpf_map *map,
if (btf_is_ptr(mtype)) {
struct bpf_program *prog;
mtype = skip_mods_and_typedefs(btf, mtype->type, &mtype_id);
prog = st_ops->progs[i];
if (!prog)
continue;
kern_mtype = skip_mods_and_typedefs(kern_btf,
kern_mtype->type,
&kern_mtype_id);
if (!btf_is_func_proto(mtype) ||
!btf_is_func_proto(kern_mtype)) {
pr_warn("struct_ops init_kern %s: non func ptr %s is not supported\n",
/* mtype->type must be a func_proto which was
* guaranteed in bpf_object__collect_st_ops_relos(),
* so only check kern_mtype for func_proto here.
*/
if (!btf_is_func_proto(kern_mtype)) {
pr_warn("struct_ops init_kern %s: kernel member %s is not a func ptr\n",
map->name, mname);
return -ENOTSUP;
}
prog = st_ops->progs[i];
if (!prog) {
pr_debug("struct_ops init_kern %s: func ptr %s is not set\n",
map->name, mname);
continue;
}
prog->attach_btf_id = kern_type_id;
prog->expected_attach_type = kern_member_idx;

View File

@ -46,6 +46,11 @@
#define PF_XDP AF_XDP
#endif
enum xsk_prog {
XSK_PROG_FALLBACK,
XSK_PROG_REDIRECT_FLAGS,
};
struct xsk_umem {
struct xsk_ring_prod *fill_save;
struct xsk_ring_cons *comp_save;
@ -351,6 +356,54 @@ int xsk_umem__create_v0_0_2(struct xsk_umem **umem_ptr, void *umem_area,
COMPAT_VERSION(xsk_umem__create_v0_0_2, xsk_umem__create, LIBBPF_0.0.2)
DEFAULT_VERSION(xsk_umem__create_v0_0_4, xsk_umem__create, LIBBPF_0.0.4)
static enum xsk_prog get_xsk_prog(void)
{
enum xsk_prog detected = XSK_PROG_FALLBACK;
struct bpf_load_program_attr prog_attr;
struct bpf_create_map_attr map_attr;
__u32 size_out, retval, duration;
char data_in = 0, data_out;
struct bpf_insn insns[] = {
BPF_LD_MAP_FD(BPF_REG_1, 0),
BPF_MOV64_IMM(BPF_REG_2, 0),
BPF_MOV64_IMM(BPF_REG_3, XDP_PASS),
BPF_EMIT_CALL(BPF_FUNC_redirect_map),
BPF_EXIT_INSN(),
};
int prog_fd, map_fd, ret;
memset(&map_attr, 0, sizeof(map_attr));
map_attr.map_type = BPF_MAP_TYPE_XSKMAP;
map_attr.key_size = sizeof(int);
map_attr.value_size = sizeof(int);
map_attr.max_entries = 1;
map_fd = bpf_create_map_xattr(&map_attr);
if (map_fd < 0)
return detected;
insns[0].imm = map_fd;
memset(&prog_attr, 0, sizeof(prog_attr));
prog_attr.prog_type = BPF_PROG_TYPE_XDP;
prog_attr.insns = insns;
prog_attr.insns_cnt = ARRAY_SIZE(insns);
prog_attr.license = "GPL";
prog_fd = bpf_load_program_xattr(&prog_attr, NULL, 0);
if (prog_fd < 0) {
close(map_fd);
return detected;
}
ret = bpf_prog_test_run(prog_fd, 0, &data_in, 1, &data_out, &size_out, &retval, &duration);
if (!ret && retval == XDP_PASS)
detected = XSK_PROG_REDIRECT_FLAGS;
close(prog_fd);
close(map_fd);
return detected;
}
static int xsk_load_xdp_prog(struct xsk_socket *xsk)
{
static const int log_buf_size = 16 * 1024;
@ -358,7 +411,7 @@ static int xsk_load_xdp_prog(struct xsk_socket *xsk)
char log_buf[log_buf_size];
int err, prog_fd;
/* This is the C-program:
/* This is the fallback C-program:
* SEC("xdp_sock") int xdp_sock_prog(struct xdp_md *ctx)
* {
* int ret, index = ctx->rx_queue_index;
@ -414,9 +467,31 @@ static int xsk_load_xdp_prog(struct xsk_socket *xsk)
/* The jumps are to this instruction */
BPF_EXIT_INSN(),
};
size_t insns_cnt = sizeof(prog) / sizeof(struct bpf_insn);
prog_fd = bpf_load_program(BPF_PROG_TYPE_XDP, prog, insns_cnt,
/* This is the post-5.3 kernel C-program:
* SEC("xdp_sock") int xdp_sock_prog(struct xdp_md *ctx)
* {
* return bpf_redirect_map(&xsks_map, ctx->rx_queue_index, XDP_PASS);
* }
*/
struct bpf_insn prog_redirect_flags[] = {
/* r2 = *(u32 *)(r1 + 16) */
BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, 16),
/* r1 = xskmap[] */
BPF_LD_MAP_FD(BPF_REG_1, ctx->xsks_map_fd),
/* r3 = XDP_PASS */
BPF_MOV64_IMM(BPF_REG_3, 2),
/* call bpf_redirect_map */
BPF_EMIT_CALL(BPF_FUNC_redirect_map),
BPF_EXIT_INSN(),
};
size_t insns_cnt[] = {sizeof(prog) / sizeof(struct bpf_insn),
sizeof(prog_redirect_flags) / sizeof(struct bpf_insn),
};
struct bpf_insn *progs[] = {prog, prog_redirect_flags};
enum xsk_prog option = get_xsk_prog();
prog_fd = bpf_load_program(BPF_PROG_TYPE_XDP, progs[option], insns_cnt[option],
"LGPL-2.1 or BSD-2-Clause", 0, log_buf, "LGPL-2.1 or BSD-2-Clause", 0, log_buf,
log_buf_size);
if (prog_fd < 0) {
@ -442,7 +517,7 @@ static int xsk_get_max_queues(struct xsk_socket *xsk)
struct ifreq ifr = {};
int fd, err, ret;
fd = socket(AF_INET, SOCK_DGRAM, 0);
fd = socket(AF_LOCAL, SOCK_DGRAM, 0);
if (fd < 0)
return -errno;

View File

@ -176,7 +176,6 @@ endef
LD += $(EXTRA_LDFLAGS)
PKG_CONFIG = $(CROSS_COMPILE)pkg-config
LLVM_CONFIG ?= llvm-config
RM = rm -f
LN = ln -f

View File

@ -69,6 +69,13 @@ HOSTCC ?= gcc
HOSTLD ?= ld
endif
# Some tools require Clang, LLC and/or LLVM utils
CLANG ?= clang
LLC ?= llc
LLVM_CONFIG ?= llvm-config
LLVM_OBJCOPY ?= llvm-objcopy
LLVM_STRIP ?= llvm-strip
ifeq ($(CC_NO_CLANG), 1)
EXTRA_WARNINGS += -Wstrict-aliasing=3
endif

View File

@ -17,7 +17,6 @@ test_sockmap
test_lirc_mode2_user
get_cgroup_id_user
test_skb_cgroup_id_user
test_socket_cookie
test_cgroup_storage
test_flow_dissector
flow_dissector_load
@ -26,7 +25,6 @@ test_tcpnotify_user
test_libbpf
test_tcp_check_syncookie_user
test_sysctl
test_current_pid_tgid_new_ns
xdping
test_cpp
*.skel.h

View File

@ -19,8 +19,6 @@ ifneq ($(wildcard $(GENHDR)),)
GENFLAGS := -DHAVE_GENHDR
endif
CLANG ?= clang
LLVM_OBJCOPY ?= llvm-objcopy
BPF_GCC ?= $(shell command -v bpf-gcc;)
SAN_CFLAGS ?=
CFLAGS += -g -rdynamic -Wall -O2 $(GENFLAGS) $(SAN_CFLAGS) \
@ -33,11 +31,10 @@ LDLIBS += -lcap -lelf -lz -lrt -lpthread
# Order correspond to 'make run_tests' order
TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test_progs \
test_verifier_log test_dev_cgroup \
test_sock test_sockmap get_cgroup_id_user test_socket_cookie \
test_sock test_sockmap get_cgroup_id_user \
test_cgroup_storage \
test_netcnt test_tcpnotify_user test_sysctl \
test_progs-no_alu32 \
test_progs-no_alu32
test_current_pid_tgid_new_ns
# Also test bpf-gcc, if present
ifneq ($(BPF_GCC),)
@ -188,7 +185,6 @@ $(OUTPUT)/test_dev_cgroup: cgroup_helpers.c
$(OUTPUT)/test_skb_cgroup_id_user: cgroup_helpers.c
$(OUTPUT)/test_sock: cgroup_helpers.c
$(OUTPUT)/test_sock_addr: cgroup_helpers.c
$(OUTPUT)/test_socket_cookie: cgroup_helpers.c
$(OUTPUT)/test_sockmap: cgroup_helpers.c
$(OUTPUT)/test_tcpnotify_user: cgroup_helpers.c trace_helpers.c
$(OUTPUT)/get_cgroup_id_user: cgroup_helpers.c

View File

@ -6,6 +6,30 @@ General instructions on running selftests can be found in
__ /Documentation/bpf/bpf_devel_QA.rst#q-how-to-run-bpf-selftests
=========================
Running Selftests in a VM
=========================
It's now possible to run the selftests using ``tools/testing/selftests/bpf/vmtest.sh``.
The script tries to ensure that the tests are run with the same environment as they
would be run post-submit in the CI used by the Maintainers.
This script downloads a suitable Kconfig and VM userspace image from the system used by
the CI. It builds the kernel (without overwriting your existing Kconfig), recompiles the
bpf selftests, runs them (by default ``tools/testing/selftests/bpf/test_progs``) and
saves the resulting output (by default in ``~/.bpf_selftests``).
For more information about using the script, run:
.. code-block:: console
$ tools/testing/selftests/bpf/vmtest.sh -h
.. note:: The script uses pahole and clang from the host environment. If you
want to use a different pahole or LLVM toolchain, adjust the `PATH` environment
variable at the beginning of the script.
.. note:: The script currently only supports x86_64.
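For example, a plain run (a minimal sketch based on the description above,
assuming an x86_64 host with clang and pahole available) boots the CI image,
runs ``test_progs`` and leaves the log under ``~/.bpf_selftests``:
.. code-block:: console
$ tools/testing/selftests/bpf/vmtest.sh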
Additional information about selftest failures is
documented here.

View File

@ -319,7 +319,7 @@ static void ringbuf_custom_process_ring(struct ringbuf_custom *r)
smp_store_release(r->consumer_pos, cons_pos);
else
break;
};
}
}
static void *ringbuf_custom_consumer(void *input) static void *ringbuf_custom_consumer(void *input)

View File

@ -0,0 +1,21 @@
/* SPDX-License-Identifier: GPL-2.0 */
#include <sys/socket.h>
#include <bpf/bpf_helpers.h>
int get_set_sk_priority(void *ctx)
{
int prio;
/* Verify that context allows calling bpf_getsockopt and
* bpf_setsockopt by reading and writing back socket
* priority.
*/
if (bpf_getsockopt(ctx, SOL_SOCKET, SO_PRIORITY, &prio, sizeof(prio)))
return 0;
if (bpf_setsockopt(ctx, SOL_SOCKET, SO_PRIORITY, &prio, sizeof(prio)))
return 0;
return 1;
}

View File

@ -177,6 +177,7 @@ struct tcp_congestion_ops {
* after all the ca_state processing. (optional)
*/
void (*cong_control)(struct sock *sk, const struct rate_sample *rs);
void *owner;
};
#define min(a, b) ((a) < (b) ? (a) : (b))

View File

@ -28,6 +28,12 @@ TRACE_EVENT(bpf_testmod_test_read,
__entry->pid, __entry->comm, __entry->off, __entry->len)
);
/* A bare tracepoint with no event associated with it */
DECLARE_TRACE(bpf_testmod_test_write_bare,
TP_PROTO(struct task_struct *task, struct bpf_testmod_test_write_ctx *ctx),
TP_ARGS(task, ctx)
);
#endif /* _BPF_TESTMOD_EVENTS_H */
#undef TRACE_INCLUDE_PATH

View File

@ -31,9 +31,28 @@ bpf_testmod_test_read(struct file *file, struct kobject *kobj,
EXPORT_SYMBOL(bpf_testmod_test_read);
ALLOW_ERROR_INJECTION(bpf_testmod_test_read, ERRNO);
noinline ssize_t
bpf_testmod_test_write(struct file *file, struct kobject *kobj,
struct bin_attribute *bin_attr,
char *buf, loff_t off, size_t len)
{
struct bpf_testmod_test_write_ctx ctx = {
.buf = buf,
.off = off,
.len = len,
};
trace_bpf_testmod_test_write_bare(current, &ctx);
return -EIO; /* always fail */
}
EXPORT_SYMBOL(bpf_testmod_test_write);
ALLOW_ERROR_INJECTION(bpf_testmod_test_write, ERRNO);
static struct bin_attribute bin_attr_bpf_testmod_file __ro_after_init = {
.attr = { .name = "bpf_testmod", .mode = 0444, },
.attr = { .name = "bpf_testmod", .mode = 0666, },
.read = bpf_testmod_test_read,
.write = bpf_testmod_test_write,
};
static int bpf_testmod_init(void)

View File

@ -11,4 +11,10 @@ struct bpf_testmod_test_read_ctx {
size_t len;
};
struct bpf_testmod_test_write_ctx {
char *buf;
loff_t off;
size_t len;
};
#endif /* _BPF_TESTMOD_H */

View File

@ -0,0 +1,17 @@
// SPDX-License-Identifier: GPL-2.0
#include <test_progs.h>
#include "atomic_bounds.skel.h"
void test_atomic_bounds(void)
{
struct atomic_bounds *skel;
__u32 duration = 0;
skel = atomic_bounds__open_and_load();
if (CHECK(!skel, "skel_load", "couldn't load program\n"))
return;
atomic_bounds__destroy(skel);
}

View File

@ -0,0 +1,109 @@
// SPDX-License-Identifier: GPL-2.0
#include <test_progs.h>
#include "bind_perm.skel.h"
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/capability.h>
static int duration;
void try_bind(int family, int port, int expected_errno)
{
struct sockaddr_storage addr = {};
struct sockaddr_in6 *sin6;
struct sockaddr_in *sin;
int fd = -1;
fd = socket(family, SOCK_STREAM, 0);
if (CHECK(fd < 0, "fd", "errno %d", errno))
goto close_socket;
if (family == AF_INET) {
sin = (struct sockaddr_in *)&addr;
sin->sin_family = family;
sin->sin_port = htons(port);
} else {
sin6 = (struct sockaddr_in6 *)&addr;
sin6->sin6_family = family;
sin6->sin6_port = htons(port);
}
errno = 0;
bind(fd, (struct sockaddr *)&addr, sizeof(addr));
ASSERT_EQ(errno, expected_errno, "bind");
close_socket:
if (fd >= 0)
close(fd);
}
bool cap_net_bind_service(cap_flag_value_t flag)
{
const cap_value_t cap_net_bind_service = CAP_NET_BIND_SERVICE;
cap_flag_value_t original_value;
bool was_effective = false;
cap_t caps;
caps = cap_get_proc();
if (CHECK(!caps, "cap_get_proc", "errno %d", errno))
goto free_caps;
if (CHECK(cap_get_flag(caps, CAP_NET_BIND_SERVICE, CAP_EFFECTIVE,
&original_value),
"cap_get_flag", "errno %d", errno))
goto free_caps;
was_effective = (original_value == CAP_SET);
if (CHECK(cap_set_flag(caps, CAP_EFFECTIVE, 1, &cap_net_bind_service,
flag),
"cap_set_flag", "errno %d", errno))
goto free_caps;
if (CHECK(cap_set_proc(caps), "cap_set_proc", "errno %d", errno))
goto free_caps;
free_caps:
CHECK(cap_free(caps), "cap_free", "errno %d", errno);
return was_effective;
}
void test_bind_perm(void)
{
bool cap_was_effective;
struct bind_perm *skel;
int cgroup_fd;
cgroup_fd = test__join_cgroup("/bind_perm");
if (CHECK(cgroup_fd < 0, "cg-join", "errno %d", errno))
return;
skel = bind_perm__open_and_load();
if (!ASSERT_OK_PTR(skel, "skel"))
goto close_cgroup_fd;
skel->links.bind_v4_prog = bpf_program__attach_cgroup(skel->progs.bind_v4_prog, cgroup_fd);
if (!ASSERT_OK_PTR(skel, "bind_v4_prog"))
goto close_skeleton;
skel->links.bind_v6_prog = bpf_program__attach_cgroup(skel->progs.bind_v6_prog, cgroup_fd);
if (!ASSERT_OK_PTR(skel, "bind_v6_prog"))
goto close_skeleton;
cap_was_effective = cap_net_bind_service(CAP_CLEAR);
try_bind(AF_INET, 110, EACCES);
try_bind(AF_INET6, 110, EACCES);
try_bind(AF_INET, 111, 0);
try_bind(AF_INET6, 111, 0);
if (cap_was_effective)
cap_net_bind_service(CAP_SET);
close_skeleton:
bind_perm__destroy(skel);
close_cgroup_fd:
close(cgroup_fd);
}

View File

@ -7,6 +7,7 @@
#include "bpf_iter_task.skel.h"
#include "bpf_iter_task_stack.skel.h"
#include "bpf_iter_task_file.skel.h"
#include "bpf_iter_task_vma.skel.h"
#include "bpf_iter_task_btf.skel.h"
#include "bpf_iter_tcp4.skel.h"
#include "bpf_iter_tcp6.skel.h"
@ -64,6 +65,22 @@ free_link:
bpf_link__destroy(link);
}
static int read_fd_into_buffer(int fd, char *buf, int size)
{
int bufleft = size;
int len;
do {
len = read(fd, buf, bufleft);
if (len > 0) {
buf += len;
bufleft -= len;
}
} while (len > 0);
return len < 0 ? len : size - bufleft;
}
static void test_ipv6_route(void)
{
struct bpf_iter_ipv6_route *skel;
@ -177,7 +194,7 @@ static int do_btf_read(struct bpf_iter_task_btf *skel)
{
struct bpf_program *prog = skel->progs.dump_task_struct;
struct bpf_iter_task_btf__bss *bss = skel->bss;
int iter_fd = -1, len = 0, bufleft = TASKBUFSZ;
int iter_fd = -1, err;
struct bpf_link *link;
char *buf = taskbuf;
int ret = 0;
@ -190,14 +207,7 @@ static int do_btf_read(struct bpf_iter_task_btf *skel)
if (CHECK(iter_fd < 0, "create_iter", "create_iter failed\n"))
goto free_link;
do {
len = read(iter_fd, buf, bufleft);
if (len > 0) {
buf += len;
bufleft -= len;
}
} while (len > 0);
err = read_fd_into_buffer(iter_fd, buf, TASKBUFSZ);
if (bss->skip) {
printf("%s:SKIP:no __builtin_btf_type_id\n", __func__);
ret = 1;
@ -205,7 +215,7 @@ static int do_btf_read(struct bpf_iter_task_btf *skel)
goto free_link;
}
if (CHECK(len < 0, "read", "read failed: %s\n", strerror(errno)))
if (CHECK(err < 0, "read", "read failed: %s\n", strerror(errno)))
goto free_link;
CHECK(strstr(taskbuf, "(struct task_struct)") == NULL,
@ -1133,6 +1143,92 @@ static void test_buf_neg_offset(void)
bpf_iter_test_kern6__destroy(skel);
}
#define CMP_BUFFER_SIZE 1024
static char task_vma_output[CMP_BUFFER_SIZE];
static char proc_maps_output[CMP_BUFFER_SIZE];
/* remove ' ' and '\t' from str, and only keep the first line */
static void str_strip_first_line(char *str)
{
char *dst = str, *src = str;
do {
if (*src == ' ' || *src == '\t')
src++;
else
*(dst++) = *(src++);
} while (*src != '\0' && *src != '\n');
*dst = '\0';
}
#define min(a, b) ((a) < (b) ? (a) : (b))
static void test_task_vma(void)
{
int err, iter_fd = -1, proc_maps_fd = -1;
struct bpf_iter_task_vma *skel;
int len, read_size = 4;
char maps_path[64];
skel = bpf_iter_task_vma__open();
if (CHECK(!skel, "bpf_iter_task_vma__open", "skeleton open failed\n"))
return;
skel->bss->pid = getpid();
err = bpf_iter_task_vma__load(skel);
if (CHECK(err, "bpf_iter_task_vma__load", "skeleton load failed\n"))
goto out;
skel->links.proc_maps = bpf_program__attach_iter(
skel->progs.proc_maps, NULL);
if (CHECK(IS_ERR(skel->links.proc_maps), "bpf_program__attach_iter",
"attach iterator failed\n")) {
skel->links.proc_maps = NULL;
goto out;
}
iter_fd = bpf_iter_create(bpf_link__fd(skel->links.proc_maps));
if (CHECK(iter_fd < 0, "create_iter", "create_iter failed\n"))
goto out;
/* Read CMP_BUFFER_SIZE (1kB) from bpf_iter. Read in small chunks
* to trigger seq_file corner cases. The expected output is much
* longer than 1kB, so the while loop will terminate.
*/
len = 0;
while (len < CMP_BUFFER_SIZE) {
err = read_fd_into_buffer(iter_fd, task_vma_output + len,
min(read_size, CMP_BUFFER_SIZE - len));
if (CHECK(err < 0, "read_iter_fd", "read_iter_fd failed\n"))
goto out;
len += err;
}
/* read CMP_BUFFER_SIZE (1kB) from /proc/pid/maps */
snprintf(maps_path, 64, "/proc/%u/maps", skel->bss->pid);
proc_maps_fd = open(maps_path, O_RDONLY);
if (CHECK(proc_maps_fd < 0, "open_proc_maps", "open_proc_maps failed\n"))
goto out;
err = read_fd_into_buffer(proc_maps_fd, proc_maps_output, CMP_BUFFER_SIZE);
if (CHECK(err < 0, "read_prog_maps_fd", "read_prog_maps_fd failed\n"))
goto out;
/* strip and compare the first line of the two files */
str_strip_first_line(task_vma_output);
str_strip_first_line(proc_maps_output);
CHECK(strcmp(task_vma_output, proc_maps_output), "compare_output",
"found mismatch\n");
out:
close(proc_maps_fd);
close(iter_fd);
bpf_iter_task_vma__destroy(skel);
}
void test_bpf_iter(void) void test_bpf_iter(void)
{ {
if (test__start_subtest("btf_id_or_null")) if (test__start_subtest("btf_id_or_null"))
@ -1149,6 +1245,8 @@ void test_bpf_iter(void)
test_task_stack();
if (test__start_subtest("task_file"))
test_task_file();
if (test__start_subtest("task_vma"))
test_task_vma();
if (test__start_subtest("task_btf"))
test_task_btf();
if (test__start_subtest("tcp4"))

View File

@ -2,6 +2,7 @@
/* Copyright (c) 2019 Facebook */
#include <linux/err.h>
#include <netinet/tcp.h>
#include <test_progs.h>
#include "bpf_dctcp.skel.h"
#include "bpf_cubic.skel.h"

View File

@ -914,7 +914,7 @@ static struct btf_raw_test raw_tests[] = {
.err_str = "Member exceeds struct_size",
},
/* Test member exeeds the size of struct
/* Test member exceeds the size of struct
*
* struct A {
* int m;
@ -948,7 +948,7 @@ static struct btf_raw_test raw_tests[] = {
.err_str = "Member exceeds struct_size",
},
/* Test member exeeds the size of struct
/* Test member exceeds the size of struct
*
* struct A {
* int m;
@ -3509,6 +3509,27 @@ static struct btf_raw_test raw_tests[] = {
.value_type_id = 3 /* arr_t */,
.max_entries = 4,
},
/*
* elf .rodata section size 4 and btf .rodata section vlen 0.
*/
{
.descr = "datasec: vlen == 0",
.raw_types = {
/* int */
BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), /* [1] */
/* .rodata section */
BTF_TYPE_ENC(NAME_NTH(1), BTF_INFO_ENC(BTF_KIND_DATASEC, 0, 0), 4),
/* [2] */
BTF_END_RAW,
},
BTF_STR_SEC("\0.rodata"),
.map_type = BPF_MAP_TYPE_ARRAY,
.key_size = sizeof(int),
.value_size = sizeof(int),
.key_type_id = 1,
.value_type_id = 1,
.max_entries = 1,
},
}; /* struct btf_raw_test raw_tests[] */

View File

@ -0,0 +1,216 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2020 Jesper Dangaard Brouer */
#include <linux/if_link.h> /* before test_progs.h, avoid bpf_util.h redefines */
#include <test_progs.h>
#include "test_check_mtu.skel.h"
#include "network_helpers.h"
#include <stdlib.h>
#include <inttypes.h>
#define IFINDEX_LO 1
static __u32 duration; /* Hint: needed for CHECK macro */
static int read_mtu_device_lo(void)
{
const char *filename = "/sys/class/net/lo/mtu";
char buf[11] = {};
int value, n, fd;
fd = open(filename, 0, O_RDONLY);
if (fd == -1)
return -1;
n = read(fd, buf, sizeof(buf));
close(fd);
if (n == -1)
return -2;
value = strtoimax(buf, NULL, 10);
if (errno == ERANGE)
return -3;
return value;
}
static void test_check_mtu_xdp_attach(void)
{
struct bpf_link_info link_info;
__u32 link_info_len = sizeof(link_info);
struct test_check_mtu *skel;
struct bpf_program *prog;
struct bpf_link *link;
int err = 0;
int fd;
skel = test_check_mtu__open_and_load();
if (CHECK(!skel, "open and load skel", "failed"))
return; /* Exit if e.g. helper unknown to kernel */
prog = skel->progs.xdp_use_helper_basic;
link = bpf_program__attach_xdp(prog, IFINDEX_LO);
if (CHECK(IS_ERR(link), "link_attach", "failed: %ld\n", PTR_ERR(link)))
goto out;
skel->links.xdp_use_helper_basic = link;
memset(&link_info, 0, sizeof(link_info));
fd = bpf_link__fd(link);
err = bpf_obj_get_info_by_fd(fd, &link_info, &link_info_len);
if (CHECK(err, "link_info", "failed: %d\n", err))
goto out;
CHECK(link_info.type != BPF_LINK_TYPE_XDP, "link_type",
"got %u != exp %u\n", link_info.type, BPF_LINK_TYPE_XDP);
CHECK(link_info.xdp.ifindex != IFINDEX_LO, "link_ifindex",
"got %u != exp %u\n", link_info.xdp.ifindex, IFINDEX_LO);
err = bpf_link__detach(link);
CHECK(err, "link_detach", "failed %d\n", err);
out:
test_check_mtu__destroy(skel);
}
static void test_check_mtu_run_xdp(struct test_check_mtu *skel,
struct bpf_program *prog,
__u32 mtu_expect)
{
const char *prog_name = bpf_program__name(prog);
int retval_expect = XDP_PASS;
__u32 mtu_result = 0;
char buf[256] = {};
int err;
struct bpf_prog_test_run_attr tattr = {
.repeat = 1,
.data_in = &pkt_v4,
.data_size_in = sizeof(pkt_v4),
.data_out = buf,
.data_size_out = sizeof(buf),
.prog_fd = bpf_program__fd(prog),
};
err = bpf_prog_test_run_xattr(&tattr);
CHECK_ATTR(err != 0, "bpf_prog_test_run",
"prog_name:%s (err %d errno %d retval %d)\n",
prog_name, err, errno, tattr.retval);
CHECK(tattr.retval != retval_expect, "retval",
"progname:%s unexpected retval=%d expected=%d\n",
prog_name, tattr.retval, retval_expect);
/* Extract MTU that BPF-prog got */
mtu_result = skel->bss->global_bpf_mtu_xdp;
ASSERT_EQ(mtu_result, mtu_expect, "MTU-compare-user");
}
static void test_check_mtu_xdp(__u32 mtu, __u32 ifindex)
{
struct test_check_mtu *skel;
int err;
skel = test_check_mtu__open();
if (CHECK(!skel, "skel_open", "failed"))
return;
/* Update "constants" in BPF-prog *BEFORE* libbpf load */
skel->rodata->GLOBAL_USER_MTU = mtu;
skel->rodata->GLOBAL_USER_IFINDEX = ifindex;
err = test_check_mtu__load(skel);
if (CHECK(err, "skel_load", "failed: %d\n", err))
goto cleanup;
test_check_mtu_run_xdp(skel, skel->progs.xdp_use_helper, mtu);
test_check_mtu_run_xdp(skel, skel->progs.xdp_exceed_mtu, mtu);
test_check_mtu_run_xdp(skel, skel->progs.xdp_minus_delta, mtu);
cleanup:
test_check_mtu__destroy(skel);
}
static void test_check_mtu_run_tc(struct test_check_mtu *skel,
struct bpf_program *prog,
__u32 mtu_expect)
{
const char *prog_name = bpf_program__name(prog);
int retval_expect = BPF_OK;
__u32 mtu_result = 0;
char buf[256] = {};
int err;
struct bpf_prog_test_run_attr tattr = {
.repeat = 1,
.data_in = &pkt_v4,
.data_size_in = sizeof(pkt_v4),
.data_out = buf,
.data_size_out = sizeof(buf),
.prog_fd = bpf_program__fd(prog),
};
err = bpf_prog_test_run_xattr(&tattr);
CHECK_ATTR(err != 0, "bpf_prog_test_run",
"prog_name:%s (err %d errno %d retval %d)\n",
prog_name, err, errno, tattr.retval);
CHECK(tattr.retval != retval_expect, "retval",
"progname:%s unexpected retval=%d expected=%d\n",
prog_name, tattr.retval, retval_expect);
/* Extract MTU that BPF-prog got */
mtu_result = skel->bss->global_bpf_mtu_tc;
ASSERT_EQ(mtu_result, mtu_expect, "MTU-compare-user");
}
static void test_check_mtu_tc(__u32 mtu, __u32 ifindex)
{
struct test_check_mtu *skel;
int err;
skel = test_check_mtu__open();
if (CHECK(!skel, "skel_open", "failed"))
return;
/* Update "constants" in BPF-prog *BEFORE* libbpf load */
skel->rodata->GLOBAL_USER_MTU = mtu;
skel->rodata->GLOBAL_USER_IFINDEX = ifindex;
err = test_check_mtu__load(skel);
if (CHECK(err, "skel_load", "failed: %d\n", err))
goto cleanup;
test_check_mtu_run_tc(skel, skel->progs.tc_use_helper, mtu);
test_check_mtu_run_tc(skel, skel->progs.tc_exceed_mtu, mtu);
test_check_mtu_run_tc(skel, skel->progs.tc_exceed_mtu_da, mtu);
test_check_mtu_run_tc(skel, skel->progs.tc_minus_delta, mtu);
cleanup:
test_check_mtu__destroy(skel);
}
void test_check_mtu(void)
{
__u32 mtu_lo;
if (test__start_subtest("bpf_check_mtu XDP-attach"))
test_check_mtu_xdp_attach();
mtu_lo = read_mtu_device_lo();
if (CHECK(mtu_lo < 0, "reading MTU value", "failed (err:%d)", mtu_lo))
return;
if (test__start_subtest("bpf_check_mtu XDP-run"))
test_check_mtu_xdp(mtu_lo, 0);
if (test__start_subtest("bpf_check_mtu XDP-run ifindex-lookup"))
test_check_mtu_xdp(mtu_lo, IFINDEX_LO);
if (test__start_subtest("bpf_check_mtu TC-run"))
test_check_mtu_tc(mtu_lo, 0);
if (test__start_subtest("bpf_check_mtu TC-run ifindex-lookup"))
test_check_mtu_tc(mtu_lo, IFINDEX_LO);
}

View File

@ -7,6 +7,7 @@
#include <string.h>
#include <linux/pkt_cls.h>
#include <netinet/tcp.h>
#include <test_progs.h>

View File

@ -2,8 +2,8 @@
/* Copyright (c) 2019 Facebook */
#include <test_progs.h>
/* x86-64 fits 55 JITed and 43 interpreted progs into half page */
#define CNT 40
/* that's kernel internal BPF_MAX_TRAMP_PROGS define */
#define CNT 38
void test_fexit_stress(void)
{

View File

@ -0,0 +1,60 @@
// SPDX-License-Identifier: GPL-2.0
#include "test_progs.h"
#include "network_helpers.h"
static __u32 duration;
static void test_global_func_args0(struct bpf_object *obj)
{
int err, i, map_fd, actual_value;
const char *map_name = "values";
map_fd = bpf_find_map(__func__, obj, map_name);
if (CHECK(map_fd < 0, "bpf_find_map", "cannot find BPF map %s: %s\n",
map_name, strerror(errno)))
return;
struct {
const char *descr;
int expected_value;
} tests[] = {
{"passing NULL pointer", 0},
{"returning value", 1},
{"reading local variable", 100 },
{"writing local variable", 101 },
{"reading global variable", 42 },
{"writing global variable", 43 },
{"writing to pointer-to-pointer", 1 },
};
for (i = 0; i < ARRAY_SIZE(tests); ++i) {
const int expected_value = tests[i].expected_value;
err = bpf_map_lookup_elem(map_fd, &i, &actual_value);
CHECK(err || actual_value != expected_value, tests[i].descr,
"err %d result %d expected %d\n", err, actual_value, expected_value);
}
}
void test_global_func_args(void)
{
const char *file = "./test_global_func_args.o";
__u32 retval;
struct bpf_object *obj;
int err, prog_fd;
err = bpf_prog_load(file, BPF_PROG_TYPE_CGROUP_SKB, &obj, &prog_fd);
if (CHECK(err, "load program", "error %d loading %s\n", err, file))
return;
err = bpf_prog_test_run(prog_fd, 1, &pkt_v4, sizeof(pkt_v4),
NULL, NULL, &retval, &duration);
CHECK(err || retval, "pass global func args run",
"err %d errno %d retval %d duration %d\n",
err, errno, retval, duration);
test_global_func_args0(obj);
bpf_object__close(obj);
}

View File

@ -21,9 +21,34 @@ static int trigger_module_test_read(int read_sz)
return 0;
}
static int trigger_module_test_write(int write_sz)
{
int fd, err;
char *buf = malloc(write_sz);
if (!buf)
return -ENOMEM;
memset(buf, 'a', write_sz);
buf[write_sz-1] = '\0';
fd = open("/sys/kernel/bpf_testmod", O_WRONLY);
err = -errno;
if (CHECK(fd < 0, "testmod_file_open", "failed: %d\n", err)) {
free(buf);
return err;
}
write(fd, buf, write_sz);
close(fd);
free(buf);
return 0;
}
void test_module_attach(void)
{
const int READ_SZ = 456;
const int WRITE_SZ = 457;
struct test_module_attach* skel;
struct test_module_attach__bss *bss;
int err;
@ -48,8 +73,10 @@ void test_module_attach(void)
/* trigger tracepoint */
ASSERT_OK(trigger_module_test_read(READ_SZ), "trigger_read");
ASSERT_OK(trigger_module_test_write(WRITE_SZ), "trigger_write");
ASSERT_EQ(bss->raw_tp_read_sz, READ_SZ, "raw_tp");
ASSERT_EQ(bss->raw_tp_bare_write_sz, WRITE_SZ, "raw_tp_bare");
ASSERT_EQ(bss->tp_btf_read_sz, READ_SZ, "tp_btf");
ASSERT_EQ(bss->fentry_read_sz, READ_SZ, "fentry");
ASSERT_EQ(bss->fentry_manual_read_sz, READ_SZ, "fentry_manual");

View File

@ -1,85 +1,87 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2020 Carlos Neira cneirabustos@gmail.com */
#define _GNU_SOURCE
#include <test_progs.h>
#include "test_ns_current_pid_tgid.skel.h"
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <sched.h>
#include <sys/wait.h>
#include <sys/mount.h>
#include <sys/fcntl.h>
struct bss {
__u64 dev;
__u64 ino;
__u64 pid_tgid;
__u64 user_pid_tgid;
};
#define STACK_SIZE (1024 * 1024)
static char child_stack[STACK_SIZE];
static int test_current_pid_tgid(void *args)
{
struct test_ns_current_pid_tgid__bss *bss;
struct test_ns_current_pid_tgid *skel;
int err = -1, duration = 0;
pid_t tgid, pid;
struct stat st;
skel = test_ns_current_pid_tgid__open_and_load();
if (CHECK(!skel, "skel_open_load", "failed to load skeleton\n"))
goto cleanup;
pid = syscall(SYS_gettid);
tgid = getpid();
err = stat("/proc/self/ns/pid", &st);
if (CHECK(err, "stat", "failed /proc/self/ns/pid: %d\n", err))
goto cleanup;
bss = skel->bss;
bss->dev = st.st_dev;
bss->ino = st.st_ino;
bss->user_pid = 0;
bss->user_tgid = 0;
err = test_ns_current_pid_tgid__attach(skel);
if (CHECK(err, "skel_attach", "skeleton attach failed: %d\n", err))
goto cleanup;
/* trigger tracepoint */
usleep(1);
ASSERT_EQ(bss->user_pid, pid, "pid");
ASSERT_EQ(bss->user_tgid, tgid, "tgid");
err = 0;
cleanup:
test_ns_current_pid_tgid__destroy(skel);
return err;
}
static void test_ns_current_pid_tgid_new_ns(void)
{
int wstatus, duration = 0;
pid_t cpid;
/* Create a process in a new namespace, this process
* will be the init process of this new namespace hence will be pid 1.
*/
cpid = clone(test_current_pid_tgid, child_stack + STACK_SIZE,
CLONE_NEWPID | SIGCHLD, NULL);
if (CHECK(cpid == -1, "clone", strerror(errno)))
return;
if (CHECK(waitpid(cpid, &wstatus, 0) == -1, "waitpid", strerror(errno)))
return;
if (CHECK(WEXITSTATUS(wstatus) != 0, "newns_pidtgid", "failed"))
return;
}
void test_ns_current_pid_tgid(void)
{
if (test__start_subtest("ns_current_pid_tgid_root_ns"))
test_current_pid_tgid(NULL);
if (test__start_subtest("ns_current_pid_tgid_new_ns"))
test_ns_current_pid_tgid_new_ns();
const char *probe_name = "raw_tracepoint/sys_enter";
const char *file = "test_ns_current_pid_tgid.o";
int err, key = 0, duration = 0;
struct bpf_link *link = NULL;
struct bpf_program *prog;
struct bpf_map *bss_map;
struct bpf_object *obj;
struct bss bss;
struct stat st;
__u64 id;
obj = bpf_object__open_file(file, NULL);
if (CHECK(IS_ERR(obj), "obj_open", "err %ld\n", PTR_ERR(obj)))
return;
err = bpf_object__load(obj);
if (CHECK(err, "obj_load", "err %d errno %d\n", err, errno))
goto cleanup;
bss_map = bpf_object__find_map_by_name(obj, "test_ns_.bss");
if (CHECK(!bss_map, "find_bss_map", "failed\n"))
goto cleanup;
prog = bpf_object__find_program_by_title(obj, probe_name);
if (CHECK(!prog, "find_prog", "prog '%s' not found\n",
probe_name))
goto cleanup;
memset(&bss, 0, sizeof(bss));
pid_t tid = syscall(SYS_gettid);
pid_t pid = getpid();
id = (__u64) tid << 32 | pid;
bss.user_pid_tgid = id;
if (CHECK_FAIL(stat("/proc/self/ns/pid", &st))) {
perror("Failed to stat /proc/self/ns/pid");
goto cleanup;
}
bss.dev = st.st_dev;
bss.ino = st.st_ino;
err = bpf_map_update_elem(bpf_map__fd(bss_map), &key, &bss, 0);
if (CHECK(err, "setting_bss", "failed to set bss : %d\n", err))
goto cleanup;
link = bpf_program__attach_raw_tracepoint(prog, "sys_enter");
if (CHECK(IS_ERR(link), "attach_raw_tp", "err %ld\n",
PTR_ERR(link))) {
link = NULL;
goto cleanup;
}
/* trigger some syscalls */
usleep(1);
err = bpf_map_lookup_elem(bpf_map__fd(bss_map), &key, &bss);
if (CHECK(err, "set_bss", "failed to get bss : %d\n", err))
goto cleanup;
if (CHECK(id != bss.pid_tgid, "Compare user pid/tgid vs. bpf pid/tgid",
"User pid/tgid %llu BPF pid/tgid %llu\n", id, bss.pid_tgid))
goto cleanup;
cleanup:
bpf_link__destroy(link);
bpf_object__close(obj);
}

View File

@ -0,0 +1,41 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2021 Facebook */
#include <test_progs.h>
#include "recursion.skel.h"
void test_recursion(void)
{
struct bpf_prog_info prog_info = {};
__u32 prog_info_len = sizeof(prog_info);
struct recursion *skel;
int key = 0;
int err;
skel = recursion__open_and_load();
if (!ASSERT_OK_PTR(skel, "skel_open_and_load"))
return;
err = recursion__attach(skel);
if (!ASSERT_OK(err, "skel_attach"))
goto out;
ASSERT_EQ(skel->bss->pass1, 0, "pass1 == 0");
bpf_map_lookup_elem(bpf_map__fd(skel->maps.hash1), &key, 0);
ASSERT_EQ(skel->bss->pass1, 1, "pass1 == 1");
bpf_map_lookup_elem(bpf_map__fd(skel->maps.hash1), &key, 0);
ASSERT_EQ(skel->bss->pass1, 2, "pass1 == 2");
ASSERT_EQ(skel->bss->pass2, 0, "pass2 == 0");
bpf_map_lookup_elem(bpf_map__fd(skel->maps.hash2), &key, 0);
ASSERT_EQ(skel->bss->pass2, 1, "pass2 == 1");
bpf_map_lookup_elem(bpf_map__fd(skel->maps.hash2), &key, 0);
ASSERT_EQ(skel->bss->pass2, 2, "pass2 == 2");
err = bpf_obj_get_info_by_fd(bpf_program__fd(skel->progs.on_lookup),
&prog_info, &prog_info_len);
if (!ASSERT_OK(err, "get_prog_info"))
goto out;
ASSERT_EQ(prog_info.recursion_misses, 2, "recursion_misses");
out:
recursion__destroy(skel);
}

View File

@ -0,0 +1,76 @@
// SPDX-License-Identifier: GPL-2.0
// Copyright (c) 2020 Google LLC.
// Copyright (c) 2018 Facebook
#include <test_progs.h>
#include "socket_cookie_prog.skel.h"
#include "network_helpers.h"
static int duration;
struct socket_cookie {
__u64 cookie_key;
__u32 cookie_value;
};
void test_socket_cookie(void)
{
int server_fd = 0, client_fd = 0, cgroup_fd = 0, err = 0;
socklen_t addr_len = sizeof(struct sockaddr_in6);
struct socket_cookie_prog *skel;
__u32 cookie_expected_value;
struct sockaddr_in6 addr;
struct socket_cookie val;
skel = socket_cookie_prog__open_and_load();
if (!ASSERT_OK_PTR(skel, "skel_open"))
return;
cgroup_fd = test__join_cgroup("/socket_cookie");
if (CHECK(cgroup_fd < 0, "join_cgroup", "cgroup creation failed\n"))
goto out;
skel->links.set_cookie = bpf_program__attach_cgroup(
skel->progs.set_cookie, cgroup_fd);
if (!ASSERT_OK_PTR(skel->links.set_cookie, "prog_attach"))
goto close_cgroup_fd;
skel->links.update_cookie_sockops = bpf_program__attach_cgroup(
skel->progs.update_cookie_sockops, cgroup_fd);
if (!ASSERT_OK_PTR(skel->links.update_cookie_sockops, "prog_attach"))
goto close_cgroup_fd;
skel->links.update_cookie_tracing = bpf_program__attach(
skel->progs.update_cookie_tracing);
if (!ASSERT_OK_PTR(skel->links.update_cookie_tracing, "prog_attach"))
goto close_cgroup_fd;
server_fd = start_server(AF_INET6, SOCK_STREAM, "::1", 0, 0);
if (CHECK(server_fd < 0, "start_server", "errno %d\n", errno))
goto close_cgroup_fd;
client_fd = connect_to_fd(server_fd, 0);
if (CHECK(client_fd < 0, "connect_to_fd", "errno %d\n", errno))
goto close_server_fd;
err = bpf_map_lookup_elem(bpf_map__fd(skel->maps.socket_cookies),
&client_fd, &val);
if (!ASSERT_OK(err, "map_lookup(socket_cookies)"))
goto close_client_fd;
err = getsockname(client_fd, (struct sockaddr *)&addr, &addr_len);
if (!ASSERT_OK(err, "getsockname"))
goto close_client_fd;
cookie_expected_value = (ntohs(addr.sin6_port) << 8) | 0xFF;
ASSERT_EQ(val.cookie_value, cookie_expected_value, "cookie_value");
close_client_fd:
close(client_fd);
close_server_fd:
close(server_fd);
close_cgroup_fd:
close(cgroup_fd);
out:
socket_cookie_prog__destroy(skel);
}

View File

@ -1,6 +1,7 @@
// SPDX-License-Identifier: GPL-2.0
// Copyright (c) 2020 Cloudflare
#include <error.h>
#include <netinet/tcp.h>
#include "test_progs.h"
#include "test_skmsg_load_helpers.skel.h"

View File

@ -2,6 +2,12 @@
#include <test_progs.h>
#include "cgroup_helpers.h"
#include <linux/tcp.h>
#ifndef SOL_TCP
#define SOL_TCP IPPROTO_TCP
#endif
#define SOL_CUSTOM 0xdeadbeef
static int getsetsockopt(void)
@ -11,6 +17,7 @@ static int getsetsockopt(void)
char u8[4];
__u32 u32;
char cc[16]; /* TCP_CA_NAME_MAX */
struct tcp_zerocopy_receive zc;
} buf = {};
socklen_t optlen;
char *big_buf = NULL;
@ -154,6 +161,27 @@ static int getsetsockopt(void)
goto err;
}
/* TCP_ZEROCOPY_RECEIVE triggers */
memset(&buf, 0, sizeof(buf));
optlen = sizeof(buf.zc);
err = getsockopt(fd, SOL_TCP, TCP_ZEROCOPY_RECEIVE, &buf, &optlen);
if (err) {
log_err("Unexpected getsockopt(TCP_ZEROCOPY_RECEIVE) err=%d errno=%d",
err, errno);
goto err;
}
memset(&buf, 0, sizeof(buf));
buf.zc.address = 12345; /* rejected by BPF */
optlen = sizeof(buf.zc);
errno = 0;
err = getsockopt(fd, SOL_TCP, TCP_ZEROCOPY_RECEIVE, &buf, &optlen);
if (errno != EPERM) {
log_err("Unexpected getsockopt(TCP_ZEROCOPY_RECEIVE) err=%d errno=%d",
err, errno);
goto err;
}
free(big_buf);
close(fd);
return 0;

View File

@ -0,0 +1,35 @@
// SPDX-License-Identifier: GPL-2.0
#include <test_progs.h>
#include "test_stack_var_off.skel.h"
/* Test reads and writes to the stack performed with offsets that are not
* statically known.
*/
void test_stack_var_off(void)
{
int duration = 0;
struct test_stack_var_off *skel;
skel = test_stack_var_off__open_and_load();
if (CHECK(!skel, "skel_open", "failed to open skeleton\n"))
return;
/* Give pid to bpf prog so it doesn't trigger for anyone else. */
skel->bss->test_pid = getpid();
/* Initialize the probe's input. */
skel->bss->input[0] = 2;
skel->bss->input[1] = 42; /* This will be returned in probe_res. */
if (!ASSERT_OK(test_stack_var_off__attach(skel), "skel_attach"))
goto cleanup;
/* Trigger probe. */
usleep(1);
if (CHECK(skel->bss->probe_res != 42, "check_probe_res",
"wrong probe res: %d\n", skel->bss->probe_res))
goto cleanup;
cleanup:
test_stack_var_off__destroy(skel);
}

View File

@ -61,6 +61,14 @@ void test_test_global_funcs(void)
{ "test_global_func6.o" , "modified ctx ptr R2" },
{ "test_global_func7.o" , "foo() doesn't return scalar" },
{ "test_global_func8.o" },
{ "test_global_func9.o" },
{ "test_global_func10.o", "invalid indirect read from stack" },
{ "test_global_func11.o", "Caller passes invalid args into func#1" },
{ "test_global_func12.o", "invalid mem access 'mem_or_null'" },
{ "test_global_func13.o", "Caller passes invalid args into func#1" },
{ "test_global_func14.o", "reference type('FWD S') size cannot be determined" },
{ "test_global_func15.o", "At program exit the register R0 has value" },
{ "test_global_func16.o", "invalid indirect read from stack" },
};
libbpf_print_fn_t old_print_fn = NULL;
int err, i, duration = 0;

View File

@ -9,6 +9,7 @@
#include <unistd.h>
#include <sys/wait.h>
#include <test_progs.h>
#include <linux/ring_buffer.h>
#include "ima.skel.h"
@ -31,9 +32,18 @@ static int run_measured_process(const char *measured_dir, u32 *monitored_pid)
return -EINVAL;
}
static u64 ima_hash_from_bpf;
static int process_sample(void *ctx, void *data, size_t len)
{
ima_hash_from_bpf = *((u64 *)data);
return 0;
}
void test_test_ima(void)
{
char measured_dir_template[] = "/tmp/ima_measuredXXXXXX";
struct ring_buffer *ringbuf;
const char *measured_dir;
char cmd[256];
@ -44,6 +54,11 @@ void test_test_ima(void)
if (CHECK(!skel, "skel_load", "skeleton failed\n"))
goto close_prog;
ringbuf = ring_buffer__new(bpf_map__fd(skel->maps.ringbuf),
process_sample, NULL, NULL);
if (!ASSERT_OK_PTR(ringbuf, "ringbuf"))
goto close_prog;
err = ima__attach(skel);
if (CHECK(err, "attach", "attach failed: %d\n", err))
goto close_prog;
@ -60,11 +75,9 @@ void test_test_ima(void)
if (CHECK(err, "run_measured_process", "err = %d\n", err))
goto close_clean;
CHECK(skel->data->ima_hash_ret < 0, "ima_hash_ret",
"ima_hash_ret = %ld\n", skel->data->ima_hash_ret);
CHECK(skel->bss->ima_hash == 0, "ima_hash",
"ima_hash = %lu\n", skel->bss->ima_hash);
err = ring_buffer__consume(ringbuf);
ASSERT_EQ(err, 1, "num_samples_or_err");
ASSERT_NEQ(ima_hash_from_bpf, 0, "ima_hash");
close_clean:
snprintf(cmd, sizeof(cmd), "./ima_setup.sh cleanup %s", measured_dir);

Some files were not shown because too many files have changed in this diff