IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Enhance partial_stack_load_preserves_zeros subtest with detailed
precision propagation log checks. We know expect fp-16 to be spilled,
initially imprecise, zero const register, which is later marked as
precise even when partial stack slot load is performed, even if it's not
a register fill (!).
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231205184248.1502704-10-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Validate that 1-, 2-, and 4-byte loads from stack slots not aligned on
8-byte boundary still preserve zero, when loading from all-STACK_ZERO
sub-slots, or when stack sub-slots are covered by spilled register with
known constant zero value.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231205184248.1502704-8-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Similar to special handling of STACK_ZERO, when reading 1/2/4 bytes from
stack from slot that has register spilled into it and that register has
a constant value zero, preserve that zero and mark spilled register as
precise for that. This makes spilled const zero register and STACK_ZERO
cases equivalent in their behavior.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231205184248.1502704-7-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add tests validating that STACK_ZERO slots are preserved when slot is
partially overwritten with subregister spill.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231205184248.1502704-6-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Instead of always forcing STACK_ZERO slots to STACK_MISC, preserve it in
situations where this is possible. E.g., when spilling register as
1/2/4-byte subslots on the stack, all the remaining bytes in the stack
slot do not automatically become unknown. If we knew they contained
zeroes, we can preserve those STACK_ZERO markers.
Add a helper mark_stack_slot_misc(), similar to scrub_spilled_slot(),
but that doesn't overwrite either STACK_INVALID nor STACK_ZERO. Note
that we need to take into account possibility of being in unprivileged
mode, in which case STACK_INVALID is forced to STACK_MISC for correctness,
as treating STACK_INVALID as equivalent STACK_MISC is only enabled in
privileged mode.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231205184248.1502704-5-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
When register is spilled onto a stack as a 1/2/4-byte register, we set
slot_type[BPF_REG_SIZE - 1] (plus potentially few more below it,
depending on actual spill size). So to check if some stack slot has
spilled register we need to consult slot_type[7], not slot_type[0].
To avoid the need to remember and double-check this in the future, just
use is_spilled_reg() helper.
Fixes: 27113c59b6 ("bpf: Check the other end of slot_type for STACK_SPILL")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231205184248.1502704-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add a new selftests that validates precision tracking for stack access
instruction, using both r10-based and non-r10-based accesses. For
non-r10 ones we also make sure to have non-zero var_off to validate that
final stack offset is tracked properly in instruction history
information inside verifier.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231205184248.1502704-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Use instruction (jump) history to record instructions that performed
register spill/fill to/from stack, regardless if this was done through
read-only r10 register, or any other register after copying r10 into it
*and* potentially adjusting offset.
To make this work reliably, we push extra per-instruction flags into
instruction history, encoding stack slot index (spi) and stack frame
number in extra 10 bit flags we take away from prev_idx in instruction
history. We don't touch idx field for maximum performance, as it's
checked most frequently during backtracking.
This change removes basically the last remaining practical limitation of
precision backtracking logic in BPF verifier. It fixes known
deficiencies, but also opens up new opportunities to reduce number of
verified states, explored in the subsequent patches.
There are only three differences in selftests' BPF object files
according to veristat, all in the positive direction (less states).
File Program Insns (A) Insns (B) Insns (DIFF) States (A) States (B) States (DIFF)
-------------------------------------- ------------- --------- --------- ------------- ---------- ---------- -------------
test_cls_redirect_dynptr.bpf.linked3.o cls_redirect 2987 2864 -123 (-4.12%) 240 231 -9 (-3.75%)
xdp_synproxy_kern.bpf.linked3.o syncookie_tc 82848 82661 -187 (-0.23%) 5107 5073 -34 (-0.67%)
xdp_synproxy_kern.bpf.linked3.o syncookie_xdp 85116 84964 -152 (-0.18%) 5162 5130 -32 (-0.62%)
Note, I avoided renaming jmp_history to more generic insn_hist to
minimize number of lines changed and potential merge conflicts between
bpf and bpf-next trees.
Notice also cur_hist_entry pointer reset to NULL at the beginning of
instruction verification loop. This pointer avoids the problem of
relying on last jump history entry's insn_idx to determine whether we
already have entry for current instruction or not. It can happen that we
added jump history entry because current instruction is_jmp_point(), but
also we need to add instruction flags for stack access. In this case, we
don't want to entries, so we need to reuse last added entry, if it is
present.
Relying on insn_idx comparison has the same ambiguity problem as the one
that was fixed recently in [0], so we avoid that.
[0] https://patchwork.kernel.org/project/netdevbpf/patch/20231110002638.4168352-3-andrii@kernel.org/
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Reported-by: Tao Lyu <tao.lyu@epfl.ch>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231205184248.1502704-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
xdp_metadata test is flaky sometimes:
verify_xsk_metadata:FAIL:rx_hash_type unexpected rx_hash_type: actual 8 != expected 0
Where 8 means XDP_RSS_TYPE_L4_ANY and is exported from veth driver only when
'skb->l4_hash' condition is met. This makes me think that the program is
triggering again for some other packet.
Let's have a filter, similar to xdp_hw_metadata, where we trigger XDP kfuncs
only for UDP packets destined to port 8080.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20231204174423.3460052-1-sdf@google.com
There was some confusion amongst Meta sched_ext folks regarding whether
stashing bpf_rb_root - the tree itself, rather than a single node - was
supported. This patch adds a small test which demonstrates this
functionality: a local kptr with rb_root is created, a node is created
and added to the tree, then the tree is kptr_xchg'd into a mapval.
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20231204211722.571346-1-davemarchevsky@fb.com
Hou Tao says:
====================
bpf: Fix the release of inner map
From: Hou Tao <houtao1@huawei.com>
Hi,
The patchset aims to fix the release of inner map in map array or map
htab. The release of inner map is different with normal map. For normal
map, the map is released after the bpf program which uses the map is
destroyed, because the bpf program tracks the used maps. However bpf
program can not track the used inner map because these inner map may be
updated or deleted dynamically, and for now the ref-counter of inner map
is decreased after the inner map is remove from outer map, so the inner
map may be freed before the bpf program, which is accessing the inner
map, exits and there will be use-after-free problem as demonstrated by
patch #6.
The patchset fixes the problem by deferring the release of inner map.
The freeing of inner map is deferred according to the sleepable
attributes of the bpf programs which own the outer map. Patch #1 fixes
the warning when running the newly-added selftest under interpreter
mode. Patch #2 adds more parameters to .map_fd_put_ptr() to prepare for
the fix. Patch #3 fixes the incorrect value of need_defer when freeing
the fd array. Patch #4 fixes the potential use-after-free problem by
using call_rcu_tasks_trace() and call_rcu() to wait for one tasks trace
RCU GP and one RCU GP unconditionally. Patch #5 optimizes the free of
inner map by removing the unnecessary RCU GP waiting. Patch #6 adds a
selftest to demonstrate the potential use-after-free problem. Patch #7
updates a selftest to update outer map in syscall bpf program.
Please see individual patches for more details. And comments are always
welcome.
Change Log:
v5:
* patch #3: rename fd_array_map_delete_elem_with_deferred_free() to
__fd_array_map_delete_elem() (Alexei)
* patch #5: use atomic64_t instead of atomic_t to prevent potential
overflow (Alexei)
* patch #7: use ptr_to_u64() helper instead of force casting to initialize
pointers in bpf_attr (Alexei)
v4: https://lore.kernel.org/bpf/20231130140120.1736235-1-houtao@huaweicloud.com
* patch #2: don't use "deferred", use "need_defer" uniformly
* patch #3: newly-added, fix the incorrect value of need_defer during
fd array free.
* patch #4: doesn't consider the case in which bpf map is not used by
any bpf program and only use sleepable_refcnt to remove
unnecessary tasks trace RCU GP (Alexei)
* patch #4: remove memory barriers added due to cautiousness (Alexei)
v3: https://lore.kernel.org/bpf/20231124113033.503338-1-houtao@huaweicloud.com
* multiple variable renamings (Martin)
* define BPF_MAP_RCU_GP/BPF_MAP_RCU_TT_GP as bit (Martin)
* use call_rcu() and its variants instead of synchronize_rcu() (Martin)
* remove unnecessary mask in bpf_map_free_deferred() (Martin)
* place atomic_or() and the related smp_mb() together (Martin)
* add patch #6 to demonstrate that updating outer map in syscall
program is dead-lock free (Alexei)
* update comments about the memory barrier in bpf_map_fd_put_ptr()
* update commit message for patch #3 and #4 to describe more details
v2: https://lore.kernel.org/bpf/20231113123324.3914612-1-houtao@huaweicloud.com
* defer the invocation of ops->map_free() instead of bpf_map_put() (Martin)
* update selftest to make it being reproducible under JIT mode (Martin)
* remove unnecessary preparatory patches
v1: https://lore.kernel.org/bpf/20231107140702.1891778-1-houtao@huaweicloud.com
====================
Link: https://lore.kernel.org/r/20231204140425.1480317-1-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Syscall program is running with rcu_read_lock_trace being held, so if
bpf_map_update_elem() or bpf_map_delete_elem() invokes
synchronize_rcu_tasks_trace() when operating on an outer map, there will
be dead-lock, so add a test to guarantee that it is dead-lock free.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20231204140425.1480317-8-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add test cases to test the race between the destroy of inner map due to
map-in-map update and the access of inner map in bpf program. The
following 4 combinations are added:
(1) array map in map array + bpf program
(2) array map in map array + sleepable bpf program
(3) array map in map htab + bpf program
(4) array map in map htab + sleepable bpf program
Before applying the fixes, when running `./test_prog -a map_in_map`, the
following error was reported:
==================================================================
BUG: KASAN: slab-use-after-free in array_map_update_elem+0x48/0x3e0
Read of size 4 at addr ffff888114f33824 by task test_progs/1858
CPU: 1 PID: 1858 Comm: test_progs Tainted: G O 6.6.0+ #7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ......
Call Trace:
<TASK>
dump_stack_lvl+0x4a/0x90
print_report+0xd2/0x620
kasan_report+0xd1/0x110
__asan_load4+0x81/0xa0
array_map_update_elem+0x48/0x3e0
bpf_prog_be94a9f26772f5b7_access_map_in_array+0xe6/0xf6
trace_call_bpf+0x1aa/0x580
kprobe_perf_func+0xdd/0x430
kprobe_dispatcher+0xa0/0xb0
kprobe_ftrace_handler+0x18b/0x2e0
0xffffffffc02280f7
RIP: 0010:__x64_sys_getpgid+0x1/0x30
......
</TASK>
Allocated by task 1857:
kasan_save_stack+0x26/0x50
kasan_set_track+0x25/0x40
kasan_save_alloc_info+0x1e/0x30
__kasan_kmalloc+0x98/0xa0
__kmalloc_node+0x6a/0x150
__bpf_map_area_alloc+0x141/0x170
bpf_map_area_alloc+0x10/0x20
array_map_alloc+0x11f/0x310
map_create+0x28a/0xb40
__sys_bpf+0x753/0x37c0
__x64_sys_bpf+0x44/0x60
do_syscall_64+0x36/0xb0
entry_SYSCALL_64_after_hwframe+0x6e/0x76
Freed by task 11:
kasan_save_stack+0x26/0x50
kasan_set_track+0x25/0x40
kasan_save_free_info+0x2b/0x50
__kasan_slab_free+0x113/0x190
slab_free_freelist_hook+0xd7/0x1e0
__kmem_cache_free+0x170/0x260
kfree+0x9b/0x160
kvfree+0x2d/0x40
bpf_map_area_free+0xe/0x20
array_map_free+0x120/0x2c0
bpf_map_free_deferred+0xd7/0x1e0
process_one_work+0x462/0x990
worker_thread+0x370/0x670
kthread+0x1b0/0x200
ret_from_fork+0x3a/0x70
ret_from_fork_asm+0x1b/0x30
Last potentially related work creation:
kasan_save_stack+0x26/0x50
__kasan_record_aux_stack+0x94/0xb0
kasan_record_aux_stack_noalloc+0xb/0x20
__queue_work+0x331/0x950
queue_work_on+0x75/0x80
bpf_map_put+0xfa/0x160
bpf_map_fd_put_ptr+0xe/0x20
bpf_fd_array_map_update_elem+0x174/0x1b0
bpf_map_update_value+0x2b7/0x4a0
__sys_bpf+0x2551/0x37c0
__x64_sys_bpf+0x44/0x60
do_syscall_64+0x36/0xb0
entry_SYSCALL_64_after_hwframe+0x6e/0x76
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20231204140425.1480317-7-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
When removing the inner map from the outer map, the inner map will be
freed after one RCU grace period and one RCU tasks trace grace
period, so it is certain that the bpf program, which may access the
inner map, has exited before the inner map is freed.
However there is no need to wait for one RCU tasks trace grace period if
the outer map is only accessed by non-sleepable program. So adding
sleepable_refcnt in bpf_map and increasing sleepable_refcnt when adding
the outer map into env->used_maps for sleepable program. Although the
max number of bpf program is INT_MAX - 1, the number of bpf programs
which are being loaded may be greater than INT_MAX, so using atomic64_t
instead of atomic_t for sleepable_refcnt. When removing the inner map
from the outer map, using sleepable_refcnt to decide whether or not a
RCU tasks trace grace period is needed before freeing the inner map.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20231204140425.1480317-6-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
When updating or deleting an inner map in map array or map htab, the map
may still be accessed by non-sleepable program or sleepable program.
However bpf_map_fd_put_ptr() decreases the ref-counter of the inner map
directly through bpf_map_put(), if the ref-counter is the last one
(which is true for most cases), the inner map will be freed by
ops->map_free() in a kworker. But for now, most .map_free() callbacks
don't use synchronize_rcu() or its variants to wait for the elapse of a
RCU grace period, so after the invocation of ops->map_free completes,
the bpf program which is accessing the inner map may incur
use-after-free problem.
Fix the free of inner map by invoking bpf_map_free_deferred() after both
one RCU grace period and one tasks trace RCU grace period if the inner
map has been removed from the outer map before. The deferment is
accomplished by using call_rcu() or call_rcu_tasks_trace() when
releasing the last ref-counter of bpf map. The newly-added rcu_head
field in bpf_map shares the same storage space with work field to
reduce the size of bpf_map.
Fixes: bba1dc0b55 ("bpf: Remove redundant synchronize_rcu.")
Fixes: 638e4b825d ("bpf: Allows per-cpu maps and map-in-map in sleepable programs")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20231204140425.1480317-5-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Both map deletion operation, map release and map free operation use
fd_array_map_delete_elem() to remove the element from fd array and
need_defer is always true in fd_array_map_delete_elem(). For the map
deletion operation and map release operation, need_defer=true is
necessary, because the bpf program, which accesses the element in fd
array, may still alive. However for map free operation, it is certain
that the bpf program which owns the fd array has already been exited, so
setting need_defer as false is appropriate for map free operation.
So fix it by adding need_defer parameter to bpf_fd_array_map_clear() and
adding a new helper __fd_array_map_delete_elem() to handle the map
deletion, map release and map free operations correspondingly.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20231204140425.1480317-4-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
map is the pointer of outer map, and need_defer needs some explanation.
need_defer tells the implementation to defer the reference release of
the passed element and ensure that the element is still alive before
the bpf program, which may manipulate it, exits.
The following three cases will invoke map_fd_put_ptr() and different
need_defer values will be passed to these callers:
1) release the reference of the old element in the map during map update
or map deletion. The release must be deferred, otherwise the bpf
program may incur use-after-free problem, so need_defer needs to be
true.
2) release the reference of the to-be-added element in the error path of
map update. The to-be-added element is not visible to any bpf
program, so it is OK to pass false for need_defer parameter.
3) release the references of all elements in the map during map release.
Any bpf program which has access to the map must have been exited and
released, so need_defer=false will be OK.
These two parameters will be used by the following patches to fix the
potential use-after-free problem for map-in-map.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20231204140425.1480317-3-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Andrii Nakryiko says:
====================
BPF verifier retval logic fixes
This patch set fixes BPF verifier logic around validating and enforcing return
values for BPF programs that have specific range of expected return values.
Both sync and async callbacks have similar logic and are fixes as well.
A few tests are added that would fail without the fixes in this patch set.
Also, while at it, we update retval checking logic to use smin/smax range
instead of tnum, avoiding future potential issues if expected range cannot be
represented precisely by tnum (e.g., [0, 2] is not representable by tnum and
is treated as [0, 3]).
There is a little bit of refactoring to unify async callback and program exit
logic to avoid duplication of checks as much as possible.
v4->v5:
- fix timer_bad_ret test on no-alu32 flavor (CI);
v3->v4:
- add back bpf_func_state rearrangement patch;
- simplified patch #4 as suggested (Shung-Hsi);
v2->v3:
- more carefullly switch from umin/umax to smin/smax;
v1->v2:
- drop tnum from retval checks (Eduard);
- use smin/smax instead of umin/umax (Alexei).
====================
Link: https://lore.kernel.org/r/20231202175705.885270-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Emit tnum representation as just a constant if all bits are known.
Use decimal-vs-hex logic to determine exact format of emitted
constant value, just like it's done for register range values.
For that move tnum_strn() to kernel/bpf/log.c to reuse decimal-vs-hex
determination logic and constants.
Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231202175705.885270-12-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add one more subtest to global_func15 selftest to validate that
verifier properly marks r0 as precise and avoids erroneous state pruning
of the branch that has return value outside of expected [0, 1] value.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231202175705.885270-11-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Adjust timer/timer_ret_1 test to validate more carefully verifier logic
of enforcing async callback return value. This test will pass only if
return result is marked precise and read.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231202175705.885270-10-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Given we enforce a valid range for program and async callback return
value, we must mark R0 as precise to avoid incorrect state pruning.
Fixes: b5dc0163d8 ("bpf: precise scalar_value tracking")
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231202175705.885270-9-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Use common logic to verify program return values and async callback
return values. This allows to avoid duplication of any extra steps
necessary, like precision marking, which will be added in the next
patch.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231202175705.885270-8-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Similarly to subprog/callback logic, enforce return value of BPF program
using more precise smin/smax range.
We need to adjust a bunch of tests due to a changed format of an error
message.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231202175705.885270-7-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
BPF verifier expects callback subprogs to return values from specified
range (typically [0, 1]). This requires that r0 at exit is both precise
(because we rely on specific value range) and is marked as read
(otherwise state comparison will ignore such register as unimportant).
Add a simple test that validates that all these conditions are enforced.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231202175705.885270-6-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Instead of relying on potentially imprecise tnum representation of
expected return value range for callbacks and subprogs, validate that
smin/smax range satisfy exact expected range of return values.
E.g., if callback would need to return [0, 2] range, tnum can't
represent this precisely and instead will allow [0, 3] range. By
checking smin/smax range, we can make sure that subprog/callback indeed
returns only valid [0, 2] range.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231202175705.885270-5-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Given verifier checks actual value, r0 has to be precise, so we need to
propagate precision properly. r0 also has to be marked as read,
otherwise subsequent state comparisons will ignore such register as
unimportant and precision won't really help here.
Fixes: 69c087ba62 ("bpf: Add bpf_for_each_map_elem() helper")
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231202175705.885270-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Song Liu says:
====================
bpf: File verification with LSM and fsverity
Changes v14 => v15:
1. Fix selftest build without CONFIG_FS_VERITY. (Alexei)
2. Add Acked-by from KP.
Changes v13 => v14:
1. Add "static" for bpf_fs_kfunc_set.
2. Add Acked-by from Christian Brauner.
Changes v12 => v13:
1. Only keep 4/9 through 9/9 of v12, as the first 3 patches already
applied;
2. Use new macro __bpf_kfunc_[start|end]_defs().
Changes v11 => v12:
1. Fix typo (data_ptr => sig_ptr) in bpf_get_file_xattr().
Changes v10 => v11:
1. Let __bpf_dynptr_data() return const void *. (Andrii)
2. Optimize code to reuse output from __bpf_dynptr_size(). (Andrii)
3. Add __diag_ignore_all("-Wmissing-declarations") for kfunc definition.
4. Fix an off indentation. (Andrii)
Changes v9 => v10:
1. Remove WARN_ON_ONCE() from check_reg_const_str. (Alexei)
Changes v8 => v9:
1. Fix test_progs kfunc_dynptr_param/dynptr_data_null.
Changes v7 => v8:
1. Do not use bpf_dynptr_slice* in the kernel. Add __bpf_dynptr_data* and
use them in ther kernel. (Andrii)
Changes v6 => v7:
1. Change "__const_str" annotation to "__str". (Alexei, Andrii)
2. Add KF_TRUSTED_ARGS flag for both new kfuncs. (KP)
3. Only allow bpf_get_file_xattr() to read xattr with "user." prefix.
4. Add Acked-by from Eric Biggers.
Changes v5 => v6:
1. Let fsverity_init_bpf() return void. (Eric Biggers)
2. Sort things in alphabetic orders. (Eric Biggers)
Changes v4 => v5:
1. Revise commit logs. (Alexei)
Changes v3 => v4:
1. Fix error reported by CI.
2. Update comments of bpf_dynptr_slice* that they may return error pointer.
Changes v2 => v3:
1. Rebase and resolve conflicts.
Changes v1 => v2:
1. Let bpf_get_file_xattr() use const string for arg "name". (Alexei)
2. Add recursion prevention with allowlist. (Alexei)
3. Let bpf_get_file_xattr() use __vfs_getxattr() to avoid recursion,
as vfs_getxattr() calls into other LSM hooks.
4. Do not use dynptr->data directly, use helper insteadd. (Andrii)
5. Fixes with bpf_get_fsverity_digest. (Eric Biggers)
6. Add documentation. (Eric Biggers)
7. Fix some compile warnings. (kernel test robot)
This set enables file verification with BPF LSM and fsverity.
In this solution, fsverity is used to provide reliable and efficient hash
of files; and BPF LSM is used to implement signature verification (against
asymmetric keys), and to enforce access control.
This solution can be used to implement access control in complicated cases.
For example: only signed python binary and signed python script and access
special files/devices/ports.
Thanks,
Song
====================
Link: https://lore.kernel.org/r/20231129234417.856536-1-song@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This selftests shows a proof of concept method to use BPF LSM to enforce
file signature. This test is added to verify_pkcs7_sig, so that some
existing logic can be reused.
This file signature method uses fsverity, which provides reliable and
efficient hash (known as digest) of the file. The file digest is signed
with asymmetic key, and the signature is stored in xattr. At the run time,
BPF LSM reads file digest and the signature, and then checks them against
the public key.
Note that this solution does NOT require FS_VERITY_BUILTIN_SIGNATURES.
fsverity is only used to provide file digest. The signature verification
and access control is all implemented in BPF LSM.
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20231129234417.856536-7-song@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add selftests for two new filesystem kfuncs:
1. bpf_get_file_xattr
2. bpf_get_fsverity_digest
These tests simply make sure the two kfuncs work. Another selftest will be
added to demonstrate how to use these kfuncs to verify file signature.
CONFIG_FS_VERITY is added to selftests config. However, this is not
sufficient to guarantee bpf_get_fsverity_digest works. This is because
fsverity need to be enabled at file system level (for example, with tune2fs
on ext4). If local file system doesn't have this feature enabled, just skip
the test.
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20231129234417.856536-6-song@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add a brief introduction for file system kfuncs:
bpf_get_file_xattr()
bpf_get_fsverity_digest()
The documentation highlights the strategy to avoid recursions of these
kfuncs.
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20231129234417.856536-4-song@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
fsverity provides fast and reliable hash of files, namely fsverity_digest.
The digest can be used by security solutions to verify file contents.
Add new kfunc bpf_get_fsverity_digest() so that we can access fsverity from
BPF LSM programs. This kfunc is added to fs/verity/measure.c because some
data structure used in the function is private to fsverity
(fs/verity/fsverity_private.h).
To avoid recursion, bpf_get_fsverity_digest is only allowed in BPF LSM
programs.
Signed-off-by: Song Liu <song@kernel.org>
Acked-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20231129234417.856536-3-song@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
It is common practice for security solutions to store tags/labels in
xattrs. To implement similar functionalities in BPF LSM, add new kfunc
bpf_get_file_xattr().
The first use case of bpf_get_file_xattr() is to implement file
verifications with asymmetric keys. Specificially, security applications
could use fsverity for file hashes and use xattr to store file signatures.
(kfunc for fsverity hash will be added in a separate commit.)
Currently, only xattrs with "user." prefix can be read with kfunc
bpf_get_file_xattr(). As use cases evolve, we may add a dedicated prefix
for bpf_get_file_xattr().
To avoid recursion, bpf_get_file_xattr can be only called from LSM hooks.
Signed-off-by: Song Liu <song@kernel.org>
Acked-by: Christian Brauner <brauner@kernel.org>
Acked-by: KP Singh <kpsingh@kernel.org>
Link: https://lore.kernel.org/r/20231129234417.856536-2-song@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
xdp_synproxy_kern.c is a BPF program that generates SYN cookies on
allowed TCP ports and sends SYNACKs to clients, accelerating synproxy
iptables module.
Fix the bitmask operation when checking the status of an existing
conntrack entry within tcp_lookup() function. Do not AND with the bit
position number, but with the bitmask value to check whether the entry
found has the IPS_CONFIRMED flag set.
Fixes: fb5cd0ce70 ("selftests/bpf: Add selftests for raw syncookie helpers")
Signed-off-by: Jeroen van Ingen Schenau <jeroen.vaningenschenau@novoserve.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Minh Le Hoang <minh.lehoang@novoserve.com>
Link: https://lore.kernel.org/xdp-newbies/CAAi1gX7owA+Tcxq-titC-h-KPM7Ri-6ZhTNMhrnPq5gmYYwKow@mail.gmail.com/T/#u
Link: https://lore.kernel.org/bpf/20231130120353.3084-1-jeroen.vaningenschenau@novoserve.com
Set backpressure watermark for hardware RX queues. Backpressure
gets triggered when the available buffers of a hardware RX queue
falls below the set watermark. This backpressure will propagate
to packet processing pipeline in the OCTEON card, so that the host
receives fewer packets and prevents packet dropping at host.
Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set the error code if octep_ctrl_net_get_mtu() fails. Currently the code
returns success.
Fixes: 0a5f8534e3 ("octeon_ep: get max rx packet length from firmware")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Sathesh B Edara <sedara@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pedro Tammela says:
====================
selftests: tc-testing: more tdc updates
Follow-up on a feedback from Jakub and random cleanups from related
net/sched patches
====================
Link: https://lore.kernel.org/r/20231129222424.910148-1-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Remove this generic file and move the tests to their appropriate files
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20231129222424.910148-5-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
All tests in this file pertain to flower, so name it appropriately
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20231129222424.910148-4-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tests using DEV2 should not be run in a dedicated net namespace,
and in parallel, as this device cannot be shared.
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20231129222424.910148-2-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
To increase the chances of people finding the rendered docs
add a link to specs.rst and index.rst.
Add a label in the generated index.rst and while at it adjust
the title a little bit.
Reviewed-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20231129041427.2763074-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>