IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZpGVmAAKCRDbK58LschI
gxB4AQCgquQis63yqTI36j4iXBT+TuxHEBNoQBSLyzYdrLS1dgD/S5DRJDA+3LD+
394hn/VtB1qvX5vaqjsov4UIwSMyxA0=
=OhSn
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2024-07-12
We've added 23 non-merge commits during the last 3 day(s) which contain
a total of 18 files changed, 234 insertions(+), 243 deletions(-).
The main changes are:
1) Improve BPF verifier by utilizing overflow.h helpers to check
for overflows, from Shung-Hsi Yu.
2) Fix NULL pointer dereference in resolve_prog_type() for BPF_PROG_TYPE_EXT
when attr->attach_prog_fd was not specified, from Tengda Wu.
3) Fix arm64 BPF JIT when generating code for BPF trampolines with
BPF_TRAMP_F_CALL_ORIG which corrupted upper address bits,
from Puranjay Mohan.
4) Remove test_run callback from lwt_seg6local_prog_ops which never worked
in the first place and caused syzbot reports,
from Sebastian Andrzej Siewior.
5) Relax BPF verifier to accept non-zero offset on KF_TRUSTED_ARGS/
/KF_RCU-typed BPF kfuncs, from Matt Bobrowski.
6) Fix a long standing bug in libbpf with regards to handling of BPF
skeleton's forward and backward compatibility, from Andrii Nakryiko.
7) Annotate btf_{seq,snprintf}_show functions with __printf,
from Alan Maguire.
8) BPF selftest improvements to reuse common network helpers in sk_lookup
test and dropping the open-coded inetaddr_len() and make_socket() ones,
from Geliang Tang.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (23 commits)
selftests/bpf: Test for null-pointer-deref bugfix in resolve_prog_type()
bpf: Fix null pointer dereference in resolve_prog_type() for BPF_PROG_TYPE_EXT
selftests/bpf: DENYLIST.aarch64: Skip fexit_sleep again
bpf: use check_sub_overflow() to check for subtraction overflows
bpf: use check_add_overflow() to check for addition overflows
bpf: fix overflow check in adjust_jmp_off()
bpf: Eliminate remaining "make W=1" warnings in kernel/bpf/btf.o
bpf: annotate BTF show functions with __printf
bpf, arm64: Fix trampoline for BPF_TRAMP_F_CALL_ORIG
selftests/bpf: Close obj in error path in xdp_adjust_tail
selftests/bpf: Null checks for links in bpf_tcp_ca
selftests/bpf: Use connect_fd_to_fd in sk_lookup
selftests/bpf: Use start_server_addr in sk_lookup
selftests/bpf: Use start_server_str in sk_lookup
selftests/bpf: Close fd in error path in drop_on_reuseport
selftests/bpf: Add ASSERT_OK_FD macro
selftests/bpf: Add backlog for network_helper_opts
selftests/bpf: fix compilation failure when CONFIG_NF_FLOW_TABLE=m
bpf: Remove tst_run from lwt_seg6local_prog_ops.
bpf: relax zero fixed offset constraint on KF_TRUSTED_ARGS/KF_RCU
...
====================
Link: https://patch.msgid.link/20240712212448.5378-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
adjust_jmp_off() incorrectly used the insn->imm field for all overflow check,
which is incorrect as that should only be done or the BPF_JMP32 | BPF_JA case,
not the general jump instruction case. Fix it by using insn->off for overflow
check in the general case.
Fixes: 5337ac4c9b80 ("bpf: Fix the corner case with may_goto and jump to the 1st insn.")
Signed-off-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Link: https://lore.kernel.org/r/20240712080127.136608-2-shung-hsi.yu@suse.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
As reported by Mirsad [1] we still see format warnings in kernel/bpf/btf.o
at W=1 warning level:
CC kernel/bpf/btf.o
./kernel/bpf/btf.c: In function ‘btf_type_seq_show_flags’:
./kernel/bpf/btf.c:7553:21: warning: assignment left-hand side might be a candidate for a format attribute [-Wsuggest-attribute=format]
7553 | sseq.showfn = btf_seq_show;
| ^
./kernel/bpf/btf.c: In function ‘btf_type_snprintf_show’:
./kernel/bpf/btf.c:7604:31: warning: assignment left-hand side might be a candidate for a format attribute [-Wsuggest-attribute=format]
7604 | ssnprintf.show.showfn = btf_snprintf_show;
| ^
Combined with CONFIG_WERROR=y these can halt the build.
The fix (annotating the structure field with __printf())
suggested by Mirsad resolves these. Apologies I missed this last time.
No other W=1 warnings were observed in kernel/bpf after this fix.
[1] https://lore.kernel.org/bpf/92c9d047-f058-400c-9c7d-81d4dc1ef71b@gmail.com/
Fixes: b3470da314fd ("bpf: annotate BTF show functions with __printf")
Reported-by: Mirsad Todorovac <mtodorovac69@gmail.com>
Suggested-by: Mirsad Todorovac <mtodorovac69@gmail.com>
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240712092859.1390960-1-alan.maguire@oracle.com
-Werror=suggest-attribute=format warns about two functions
in kernel/bpf/btf.c [1]; add __printf() annotations to silence
these warnings since for CONFIG_WERROR=y they will trigger
build failures.
[1] https://lore.kernel.org/bpf/a8b20c72-6631-4404-9e1f-0410642d7d20@gmail.com/
Fixes: 31d0bc81637d ("bpf: Move to generic BTF show support, apply it to seq files/strings")
Reported-by: Mirsad Todorovac <mtodorovac69@gmail.com>
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Tested-by: Mirsad Todorovac <mtodorovac69@yahoo.com>
Link: https://lore.kernel.org/r/20240711182321.963667-1-alan.maguire@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cross-merge networking fixes after downstream PR.
Conflicts:
net/sched/act_ct.c
26488172b029 ("net/sched: Fix UAF when resolving a clash")
3abbd7ed8b76 ("act_ct: prepare for stolen verdict coming from conntrack and nat engine")
No adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, the same case as previous patch (two timer callbacks trying
to cancel each other) can be invoked through bpf_map_update_elem as
well, or more precisely, freeing map elements containing timers. Since
this relies on hrtimer_cancel as well, it is prone to the same deadlock
situation as the previous patch.
It would be sufficient to use hrtimer_try_to_cancel to fix this problem,
as the timer cannot be enqueued after async_cancel_and_free. Once
async_cancel_and_free has been done, the timer must be reinitialized
before it can be armed again. The callback running in parallel trying to
arm the timer will fail, and freeing bpf_hrtimer without waiting is
sufficient (given kfree_rcu), and bpf_timer_cb will return
HRTIMER_NORESTART, preventing the timer from being rearmed again.
However, there exists a UAF scenario where the callback arms the timer
before entering this function, such that if cancellation fails (due to
timer callback invoking this routine, or the target timer callback
running concurrently). In such a case, if the timer expiration is
significantly far in the future, the RCU grace period expiration
happening before it will free the bpf_hrtimer state and along with it
the struct hrtimer, that is enqueued.
Hence, it is clear cancellation needs to occur after
async_cancel_and_free, and yet it cannot be done inline due to deadlock
issues. We thus modify bpf_timer_cancel_and_free to defer work to the
global workqueue, adding a work_struct alongside rcu_head (both used at
_different_ points of time, so can share space).
Update existing code comments to reflect the new state of affairs.
Fixes: b00628b1c7d5 ("bpf: Introduce bpf timers.")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20240709185440.1104957-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Given a schedule:
timer1 cb timer2 cb
bpf_timer_cancel(timer2); bpf_timer_cancel(timer1);
Both bpf_timer_cancel calls would wait for the other callback to finish
executing, introducing a lockup.
Add an atomic_t count named 'cancelling' in bpf_hrtimer. This keeps
track of all in-flight cancellation requests for a given BPF timer.
Whenever cancelling a BPF timer, we must check if we have outstanding
cancellation requests, and if so, we must fail the operation with an
error (-EDEADLK) since cancellation is synchronous and waits for the
callback to finish executing. This implies that we can enter a deadlock
situation involving two or more timer callbacks executing in parallel
and attempting to cancel one another.
Note that we avoid incrementing the cancelling counter for the target
timer (the one being cancelled) if bpf_timer_cancel is not invoked from
a callback, to avoid spurious errors. The whole point of detecting
cur->cancelling and returning -EDEADLK is to not enter a busy wait loop
(which may or may not lead to a lockup). This does not apply in case the
caller is in a non-callback context, the other side can continue to
cancel as it sees fit without running into errors.
Background on prior attempts:
Earlier versions of this patch used a bool 'cancelling' bit and used the
following pattern under timer->lock to publish cancellation status.
lock(t->lock);
t->cancelling = true;
mb();
if (cur->cancelling)
return -EDEADLK;
unlock(t->lock);
hrtimer_cancel(t->timer);
t->cancelling = false;
The store outside the critical section could overwrite a parallel
requests t->cancelling assignment to true, to ensure the parallely
executing callback observes its cancellation status.
It would be necessary to clear this cancelling bit once hrtimer_cancel
is done, but lack of serialization introduced races. Another option was
explored where bpf_timer_start would clear the bit when (re)starting the
timer under timer->lock. This would ensure serialized access to the
cancelling bit, but may allow it to be cleared before in-flight
hrtimer_cancel has finished executing, such that lockups can occur
again.
Thus, we choose an atomic counter to keep track of all outstanding
cancellation requests and use it to prevent lockups in case callbacks
attempt to cancel each other while executing in parallel.
Reported-by: Dohyun Kim <dohyunkim@google.com>
Reported-by: Neel Natu <neelnatu@google.com>
Fixes: b00628b1c7d5 ("bpf: Introduce bpf timers.")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20240709185440.1104957-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The original function call passed size of smap->bucket before the number of
buckets which raises the error 'calloc-transposed-args' on compilation.
Vlastimil Babka added:
The order of parameters can be traced back all the way to 6ac99e8f23d4
("bpf: Introduce bpf sk local storage") accross several refactorings,
and that's why the commit is used as a Fixes: tag.
In v6.10-rc1, a different commit 2c321f3f70bc ("mm: change inlined
allocation helpers to account at the call site") however exposed the
order of args in a way that gcc-14 has enough visibility to start
warning about it, because (in !CONFIG_MEMCG case) bpf_map_kvcalloc is
then a macro alias for kvcalloc instead of a static inline wrapper.
To sum up the warning happens when the following conditions are all met:
- gcc-14 is used (didn't see it with gcc-13)
- commit 2c321f3f70bc is present
- CONFIG_MEMCG is not enabled in .config
- CONFIG_WERROR turns this from a compiler warning to error
Fixes: 6ac99e8f23d4 ("bpf: Introduce bpf sk local storage")
Reviewed-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Christian Kujau <lists@nerdbynature.de>
Signed-off-by: Mohammad Shehar Yaar Tausif <sheharyaar48@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Link: https://lore.kernel.org/r/20240710100521.15061-2-vbabka@suse.cz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Currently, BPF kfuncs which accept trusted pointer arguments
i.e. those flagged as KF_TRUSTED_ARGS, KF_RCU, or KF_RELEASE, all
require an original/unmodified trusted pointer argument to be supplied
to them. By original/unmodified, it means that the backing register
holding the trusted pointer argument that is to be supplied to the BPF
kfunc must have its fixed offset set to zero, or else the BPF verifier
will outright reject the BPF program load. However, this zero fixed
offset constraint that is currently enforced by the BPF verifier onto
BPF kfuncs specifically flagged to accept KF_TRUSTED_ARGS or KF_RCU
trusted pointer arguments is rather unnecessary, and can limit their
usability in practice. Specifically, it completely eliminates the
possibility of constructing a derived trusted pointer from an original
trusted pointer. To put it simply, a derived pointer is a pointer
which points to one of the nested member fields of the object being
pointed to by the original trusted pointer.
This patch relaxes the zero fixed offset constraint that is enforced
upon BPF kfuncs which specifically accept KF_TRUSTED_ARGS, or KF_RCU
arguments. Although, the zero fixed offset constraint technically also
applies to BPF kfuncs accepting KF_RELEASE arguments, relaxing this
constraint for such BPF kfuncs has subtle and unwanted
side-effects. This was discovered by experimenting a little further
with an initial version of this patch series [0]. The primary issue
with relaxing the zero fixed offset constraint on BPF kfuncs accepting
KF_RELEASE arguments is that it'd would open up the opportunity for
BPF programs to supply both trusted pointers and derived trusted
pointers to them. For KF_RELEASE BPF kfuncs specifically, this could
be problematic as resources associated with the backing pointer could
be released by the backing BPF kfunc and cause instabilities for the
rest of the kernel.
With this new fixed offset semantic in-place for BPF kfuncs accepting
KF_TRUSTED_ARGS and KF_RCU arguments, we now have more flexibility
when it comes to the BPF kfuncs that we're able to introduce moving
forward.
Early discussions covering the possibility of relaxing the zero fixed
offset constraint can be found using the link below. This will provide
more context on where all this has stemmed from [1].
Notably, pre-existing tests have been updated such that they provide
coverage for the updated zero fixed offset
functionality. Specifically, the nested offset test was converted from
a negative to positive test as it was already designed to assert zero
fixed offset semantics of a KF_TRUSTED_ARGS BPF kfunc.
[0] https://lore.kernel.org/bpf/ZnA9ndnXKtHOuYMe@google.com/
[1] https://lore.kernel.org/bpf/ZhkbrM55MKQ0KeIV@google.com/
Signed-off-by: Matt Bobrowski <mattbobrowski@google.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20240709210939.1544011-1-mattbobrowski@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZoxN0AAKCRDbK58LschI
g0c5AQDa3ZV9gfbN42y1zSDoM1uOgO60fb+ydxyOYh8l3+OiQQD/fLfpTY3gBFSY
9yi/pZhw/QdNzQskHNIBrHFGtJbMxgs=
=p1Zz
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2024-07-08
The following pull-request contains BPF updates for your *net-next* tree.
We've added 102 non-merge commits during the last 28 day(s) which contain
a total of 127 files changed, 4606 insertions(+), 980 deletions(-).
The main changes are:
1) Support resilient split BTF which cuts down on duplication and makes BTF
as compact as possible wrt BTF from modules, from Alan Maguire & Eduard Zingerman.
2) Add support for dumping kfunc prototypes from BTF which enables both detecting
as well as dumping compilable prototypes for kfuncs, from Daniel Xu.
3) Batch of s390x BPF JIT improvements to add support for BPF arena and to implement
support for BPF exceptions, from Ilya Leoshkevich.
4) Batch of riscv64 BPF JIT improvements in particular to add 12-argument support
for BPF trampolines and to utilize bpf_prog_pack for the latter, from Pu Lehui.
5) Extend BPF test infrastructure to add a CHECKSUM_COMPLETE validation option
for skbs and add coverage along with it, from Vadim Fedorenko.
6) Inline bpf_get_current_task/_btf() helpers in the arm64 BPF JIT which gives
a small 1% performance improvement in micro-benchmarks, from Puranjay Mohan.
7) Extend the BPF verifier to track the delta between linked registers in order
to better deal with recent LLVM code optimizations, from Alexei Starovoitov.
8) Fix bpf_wq_set_callback_impl() kfunc signature where the third argument should
have been a pointer to the map value, from Benjamin Tissoires.
9) Extend BPF selftests to add regular expression support for test output matching
and adjust some of the selftest when compiled under gcc, from Cupertino Miranda.
10) Simplify task_file_seq_get_next() and remove an unnecessary loop which always
iterates exactly once anyway, from Dan Carpenter.
11) Add the capability to offload the netfilter flowtable in XDP layer through
kfuncs, from Florian Westphal & Lorenzo Bianconi.
12) Various cleanups in networking helpers in BPF selftests to shave off a few
lines of open-coded functions on client/server handling, from Geliang Tang.
13) Properly propagate prog->aux->tail_call_reachable out of BPF verifier, so
that x86 JIT does not need to implement detection, from Leon Hwang.
14) Fix BPF verifier to add a missing check_func_arg_reg_off() to prevent an
out-of-bounds memory access for dynpointers, from Matt Bobrowski.
15) Fix bpf_session_cookie() kfunc to return __u64 instead of long pointer as
it might lead to problems on 32-bit archs, from Jiri Olsa.
16) Enhance traffic validation and dynamic batch size support in xsk selftests,
from Tushar Vyavahare.
bpf-next-for-netdev
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (102 commits)
selftests/bpf: DENYLIST.aarch64: Remove fexit_sleep
selftests/bpf: amend for wrong bpf_wq_set_callback_impl signature
bpf: helpers: fix bpf_wq_set_callback_impl signature
libbpf: Add NULL checks to bpf_object__{prev_map,next_map}
selftests/bpf: Remove exceptions tests from DENYLIST.s390x
s390/bpf: Implement exceptions
s390/bpf: Change seen_reg to a mask
bpf: Remove unnecessary loop in task_file_seq_get_next()
riscv, bpf: Optimize stack usage of trampoline
bpf, devmap: Add .map_alloc_check
selftests/bpf: Remove arena tests from DENYLIST.s390x
selftests/bpf: Add UAF tests for arena atomics
selftests/bpf: Introduce __arena_global
s390/bpf: Support arena atomics
s390/bpf: Enable arena
s390/bpf: Support address space cast instruction
s390/bpf: Support BPF_PROBE_MEM32
s390/bpf: Land on the next JITed instruction after exception
s390/bpf: Introduce pre- and post- probe functions
s390/bpf: Get rid of get_probe_mem_regno()
...
====================
Link: https://patch.msgid.link/20240708221438.10974-1-daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
I realized this while having a map containing both a struct bpf_timer and
a struct bpf_wq: the third argument provided to the bpf_wq callback is
not the struct bpf_wq pointer itself, but the pointer to the value in
the map.
Which means that the users need to double cast the provided "value" as
this is not a struct bpf_wq *.
This is a change of API, but there doesn't seem to be much users of bpf_wq
right now, so we should be able to go with this right now.
Fixes: 81f1d7a583fa ("bpf: wq: add bpf_wq_set_callback_impl")
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
Link: https://lore.kernel.org/r/20240708-fix-wq-v2-1-667e5c9fbd99@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
After commit 0ede61d8589c ("file: convert to SLAB_TYPESAFE_BY_RCU") this
loop always iterates exactly one time. Delete the for statement and pull
the code in a tab.
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/ZoWJF51D4zWb6f5t@stanley.mountain
When mm_update_owner_next() is racing with swapoff (try_to_unuse()) or
/proc or ptrace or page migration (get_task_mm()), it is impossible to
find an appropriate task_struct in the loop whose mm_struct is the same as
the target mm_struct.
If the above race condition is combined with the stress-ng-zombie and
stress-ng-dup tests, such a long loop can easily cause a Hard Lockup in
write_lock_irq() for tasklist_lock.
Recognize this situation in advance and exit early.
Link: https://lkml.kernel.org/r/20240620122123.3877432-1-alexjlzheng@tencent.com
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tycho Andersen <tandersen@netflix.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Use the .map_allock_check callback to perform allocation checks before
allocating memory for the devmap.
Signed-off-by: Florian Lehner <dev@der-flo.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240615101158.57889-1-dev@der-flo.net
Zero-extending results of atomic probe operations fails with:
verifier bug. zext_dst is set, but no reg is defined
The problem is that insn_def_regno() handles BPF_ATOMICs, but not
BPF_PROBE_ATOMICs. Fix by adding the missing condition.
Fixes: d503a04f8bc0 ("bpf: Add support for certain atomics in bpf_arena to x86 JIT")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240701234304.14336-2-iii@linux.ibm.com
The bpf_net_ctx_get_.*_flush_list() are used at the top of the function.
This means the variable is always assigned even if unused. By moving the
function to where it is used, it is possible to delay the initialisation
until it is unavoidable.
Not sure how much this gains in reality but by looking at bq_enqueue()
(in devmap.c) gcc pushes one register less to the stack. \o/.
Move flush list retrieval to where it is used.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Every NIC driver utilizing XDP should invoke xdp_do_flush() after
processing all packages. With the introduction of the bpf_net_context
logic the flush lists (for dev, CPU-map and xsk) are lazy initialized
only if used. However xdp_do_flush() tries to flush all three of them so
all three lists are always initialized and the likely empty lists are
"iterated".
Without the usage of XDP but with CONFIG_DEBUG_NET the lists are also
initialized due to xdp_do_check_flushed().
Jakub suggest to utilize the hints in bpf_net_context and avoid invoking
the flush function. This will also avoiding initializing the lists which
are otherwise unused.
Introduce bpf_net_ctx_get_all_used_flush_lists() to return the
individual list if not-empty. Use the logic in xdp_do_flush() and
xdp_do_check_flushed(). Remove the not needed .*_check_flush().
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
There is no clone() invocation within a bpf_net_ctx_…() block. Therefore
the task_struct::bpf_net_context has always to be NULL and an explicit
initialisation is not required.
Remove the NULL assignment in the clone() path.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
For trampoline using bpf_prog_pack, we need to generate a rw_image
buffer with size of (image_end - image). For regular trampoline, we use
the precise image size generated by arch_bpf_trampoline_size to allocate
rw_image. But for struct_ops trampoline, we allocate rw_image directly
using close to PAGE_SIZE size. We do not need to allocate for that much,
as the patch size is usually much smaller than PAGE_SIZE. Let's use
precise image size for it too.
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Björn Töpel <bjorn@rivosinc.com> #riscv
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/bpf/20240622030437.3973492-2-pulehui@huaweicloud.com
Here are a bunch of fixes/reverts for 6.10-rc6. Include in here are:
- revert the bunch of tty/serial/console changes that landed in -rc1
that didn't quite work properly yet. Everyone agreed to just revert
them for now and will work on making them better for a future
release instead of trying to quick fix the existing changes this
late in the release cycle
- 8250 driver port count bugfix
- Other tiny serial port bugfixes for reported issues
All of these have been in linux-next this week with no reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZoFmvg8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ymziACgvoDTxuDHHfPOd6h/1qrHqYpFK1YAn2IDMJGj
Ng4/I/gwnkJeeHQC5JSn
=g9o4
-----END PGP SIGNATURE-----
Merge tag 'tty-6.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty / serial / console fixes from Greg KH:
"Here are a bunch of fixes/reverts for 6.10-rc6. Include in here are:
- revert the bunch of tty/serial/console changes that landed in -rc1
that didn't quite work properly yet.
Everyone agreed to just revert them for now and will work on making
them better for a future release instead of trying to quick fix the
existing changes this late in the release cycle
- 8250 driver port count bugfix
- Other tiny serial port bugfixes for reported issues
All of these have been in linux-next this week with no reported
issues"
* tag 'tty-6.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
Revert "printk: Save console options for add_preferred_console_match()"
Revert "printk: Don't try to parse DEVNAME:0.0 console options"
Revert "printk: Flag register_console() if console is set on command line"
Revert "serial: core: Add support for DEVNAME:0.0 style naming for kernel console"
Revert "serial: core: Handle serial console options"
Revert "serial: 8250: Add preferred console in serial8250_isa_init_ports()"
Revert "Documentation: kernel-parameters: Add DEVNAME:0.0 format for serial ports"
Revert "serial: 8250: Fix add preferred console for serial8250_isa_init_ports()"
Revert "serial: core: Fix ifdef for serial base console functions"
serial: bcm63xx-uart: fix tx after conversion to uart_port_tx_limited()
serial: core: introduce uart_port_tx_limited_flags()
Revert "serial: core: only stop transmit when HW fifo is empty"
serial: imx: set receiver level before starting uart
tty: mcf: MCF54418 has 10 UARTS
serial: 8250_omap: Implementation of Errata i2310
tty: serial: 8250: Fix port count mismatch with the device
- Make sure CPU hotplug dynamic prepare states are actually executed
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmaBF4QACgkQEsHwGGHe
VUq3JA//UOmjzHAdcRnGNnh6h1dMKyQW4KH98eiQMXaSuvDeBOCAd6Y4tq6YF/Om
AIxHgLlhOY5O1OSVJZhtxf/lALkolCAEIAkIvGvn6EpPjog5UtNoIf6XAzwLzMn3
O8WVASO2fkypaAYBY+tUEQoLY6CAfkxogV0lzNA8HGMr6Yf/YWueiK2GO63z9Bgt
n0h0362xqACMdUbFnPGrX2wpMDA+WuhHwl8Z1Z1TB0rprYiA/tFCMLcVkT3Fezjh
hx7sYMwBM8cunMya8p9ucd4kBUJROrfNo4SfHWfG0lsitW/cflTgRXOfLp4GFLvp
z0OI9oeSHQyRATOU9yiXrWcbO8M3rFRw4/YcdRZ+5mlydJWDM00DZPqPcuxs4R3Q
nH3gE82CvzWchLU5InHwYhi5oqwNUq1N8mz2bN4T9Yjtaj7zArSLqjqIafhxpJqV
9DllV9gGroAUawlRSgo5dpl2XvPcbr9Sx8bIJqwn36esuBb2qZwL6pOtVJIBr88O
QWamnvUH6NnIqweUUR9lRRjO5WjR3Xf2ECpEt5rNnqHXLn92usNaphEhBDo3tvrG
+O3pjNER3sTEgF43yYpDX0gMZmHuXfmN+fT6QDcDGk764As+/UawIHStyI3nustI
R7gM6SUx8Fv3883LuzZtQ7KNLuhPvLxf8YD2I626HpTtLA9tn5k=
=qGvT
-----END PGP SIGNATURE-----
Merge tag 'smp_urgent_for_v6.10_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull smp fixes from Borislav Petkov:
- Fix "nosmp" and "maxcpus=0" after the parallel CPU bringup work went
in and broke them
- Make sure CPU hotplug dynamic prepare states are actually executed
* tag 'smp_urgent_for_v6.10_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
cpu: Fix broken cmdline "nosmp" and "maxcpus=0"
cpu/hotplug: Fix dynstate assignment in __cpuhp_setup_state_cpuslocked()
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmaBFDwACgkQEsHwGGHe
VUqPnw//aRU3MsjXkoBKmK98O7M+6qoHL2rFRGcvw1o0GxzVB4gODgE2mZWeirG7
JLRp/lVX4xhR85NSBpKlmsnnkC8UCDnpXLRpO24ZTdlc84xEyJGsN0gHqJPjpm9M
GkBLRPOwDiSEBzL++6IyR/m3f88WDucQJXVyFa/LQIkSiFdzPBbLwX4otuIieD19
6niyXlqQQ+iAyvkDIH7tNELrOHxivPpH3+QQEfAdtE7TWamv5dkQpu9Kbf811vQb
DUsaD4E2+kQUY9ulevvz9OnsGpyhd3m30PUOHKdsrUfaE9bM/RTBDpnQ1dR3lPFD
kEb4OXsrcM0z++eIUUTBMpRATVjxl17nSgkDg5S6GLTq/Om4KQP33Co7iXE5D4sI
ephbA9jlnHAOtaNh/C1/95pIBidMBHw5HE63XcHJGei1x1pRtFx1apI9UezGGc9H
IwRzpKR2UorojCcJedZFiXGt54nJL9UUg7d7sybiVurlKOxIxnaB7cfg1MgeG9ke
yUGj6ElXvEAoEmnaf7ScAQnQ5VmkyJYTE8PUlR8h8dumQ3tyBEHanOUxqkQAlZ2P
TzVqNeCymh8XGChKKs9pHHUeySkQKYMBOZhEGwSGte0kw8JLJuEsTFud+vcONkda
4MqkH73ebPUdsH5pBNDX7eeDFLvrbpwPNh9u3wQtAnpMGLTKH4k=
=et0y
-----END PGP SIGNATURE-----
Merge tag 'timers_urgent_for_v6.10_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Borislav Petkov:
- Warn when an hrtimer doesn't get a callback supplied
* tag 'timers_urgent_for_v6.10_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
hrtimer: Prevent queuing of hrtimer without a function callback
Cross-merge networking fixes after downstream PR.
No conflicts.
Adjacent changes:
e3f02f32a050 ("ionic: fix kernel panic due to multi-buffer handling")
d9c04209990b ("ionic: Mark error paths in the data path as unlikely")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
These are some bugfixes for system call ABI issues I found while
working on a cleanup series. None of these are urgent since these
bugs have gone unnoticed for many years, but I think we probably
want to backport them all to stable kernels, so it makes sense
to have the fixes included as early as possible.
One more fix addresses a compile-time warning in kallsyms that was
uncovered by a patch I did to enable additional warnings in 6.10. I had
mistakenly thought that this fix was already merged through the module
tree, but as Geert pointed out it was still missing.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmZ9iRQACgkQYKtH/8kJ
UicHIxAA0ej8dMJ3znHovc/CQYkZMpb88bxLlqLotOYuOItEzvR6wd7vnu4cPeZf
nHguBiP9RAnzCZhL3F7AS3p8NNJ+P1OZo+sj6tZOANO955mzj1VQ5p2fbSRw+WI3
4Oc1HKvP6UMhHGjU3wHY0+Odd5bpoepN9/fnoiQcHPzq0LbUFM8e4D9KGr51I7fV
r7tuDMy9xykEfs6umuDu9wOXih3JkpV9eSmefmjvzgxG3hKLdsvTbWVsVmnKXhZm
xdFiTROOmiNvttfkQh0ruBd0drBl8aVhzCKPqIe0vQqS9rBmcf9WTkcJzpihq/fI
BA3QjVQFvmHeXs+viaLZf4r/y0qabaTPRBMQxZyEFE0QgtwfxT4/ZnNEbH2s3pIC
Pcm0JltLlHLbZs7V63drL6txCoFVndiPXdEBTBsqBwnuDHXCj/tvDcO3tuVTfYoz
9G8TTOsYNEDLYmn8AmzzhJOh75gp6O6A2ui3TtcD9KFNaoTQqqzPJWp8IoxBfxcb
3+rzRWQvXAhfSRBIaejv1quo2ZxoZk3KO3i+ysRITTUF1MLz7b0/Yy/8r74CqmOu
8Iw2Q0BaFPtj1x+VjneQnL++iYWYPEh+ZBEg7AD/z6QHwMLz33SyHlD+/RgRkthV
J/L9xUBs5HagWJxRYkVc+l0LOVclTqVJieKD2AWONZ5OFRB+CCI=
=ieQy
-----END PGP SIGNATURE-----
Merge tag 'asm-generic-fixes-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic fixes from Arnd Bergmann:
"These are some bugfixes for system call ABI issues I found while
working on a cleanup series. None of these are urgent since these bugs
have gone unnoticed for many years, but I think we probably want to
backport them all to stable kernels, so it makes sense to have the
fixes included as early as possible.
One more fix addresses a compile-time warning in kallsyms that was
uncovered by a patch I did to enable additional warnings in 6.10. I
had mistakenly thought that this fix was already merged through the
module tree, but as Geert pointed out it was still missing"
* tag 'asm-generic-fixes-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
kallsyms: rework symbol lookup return codes
linux/syscalls.h: add missing __user annotations
syscalls: mmap(): use unsigned offset type consistently
s390: remove native mmap2() syscall
hexagon: fix fadvise64_64 calling conventions
csky, hexagon: fix broken sys_sync_file_range
sh: rework sync_file_range ABI
powerpc: restore some missing spu syscalls
parisc: use generic sys_fanotify_mark implementation
parisc: use correct compat recv/recvfrom syscalls
sparc: fix compat recv/recvfrom syscalls
sparc: fix old compat_sys_select()
syscalls: fix compat_sys_io_pgetevents_time64 usage
ftruncate: pass a signed offset
Current release - regressions:
- core: add softirq safety to netdev_rename_lock
- tcp: fix tcp_rcv_fastopen_synack() to enter TCP_CA_Loss for failed TFO
- batman-adv: fix RCU race at module unload time
Current release - new code bugs:
Previous releases - regressions:
- openvswitch: get related ct labels from its master if it is not confirmed
- eth: bonding: fix incorrect software timestamping report
- eth: mlxsw: fix memory corruptions on spectrum-4 systems
- eth: ionic: use dev_consume_skb_any outside of napi
Previous releases - always broken:
- netfilter: fully validate NFT_DATA_VALUE on store to data registers
- unix: several fixes for OoB data
- tcp: fix race for duplicate reqsk on identical SYN
- bpf:
- fix may_goto with negative offset.
- fix the corner case with may_goto and jump to the 1st insn.
- fix overrunning reservations in ringbuf
- can:
- j1939: recover socket queue on CAN bus error during BAM transmission
- mcp251xfd: fix infinite loop when xmit fails
- dsa: microchip: monitor potential faults in half-duplex mode
- eth: vxlan: pull inner IP header in vxlan_xmit_one()
- eth: ionic: fix kernel panic due to multi-buffer handling
Misc:
- selftest: unix tests refactor and a lot of new cases added
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmZ9ZlQSHHBhYmVuaUBy
ZWRoYXQuY29tAAoJECkkeY3MjxOkawoQAKLTWHswqM790uaAAgqP6jGuC4/waRS8
MowEt5rHlwdMXcHhLrDSrLQoDJAZRsWmjniIgbsaeX+HtY4HXfF0tfDMPKiws3vx
Z51qVj7zYjdT7IoZ7Yc8Zlwmt2kVgO4ba6gSigQSORQO9Qq/WNSb0q8BM6cDaYXT
cXC7ikPeMlLnxKxsFRpZ3CUD06dI/aJFp/pefPEm7/X/EbROlSs5y+2GshPdp5t7
tzOUsLHs6ORVq/6jg2nRHH+0D+LMuQG0Z0yCMmYerJMJNtRIxyW6tTYeAsWXeyn3
UN3gaoQ/SIURDrNRZvHsaVDNO/u4rbYtFLoK7S5uPffPWqsGJY59FcH+xYFukFCD
P5Lca4kKBr8xOahsRfSiO0uFbwQfQAauzNiz9Ue39n1hj+ZhZ/CliBLhUeoBl6Y6
jSsxq+/8CZCQ7beek96cyLx83skAcWAU5BEC9xOVlOTuTL91Gxr9UzSx/FqLI34h
Smgw9ZUPzJgvFLgB/OBQ/WYne9LfJ5RYQHZoAXObiozO3TX7NgBUfa0e1T9dLE3F
TalysSO3/goiZNK5a/UNJcj3fAcSEs4M2z9UIK790i3P3GuRigs1sJEtTUqyowWk
aaTFmWCXE0wdoshJjux3syh3Vk6phJWpOlMLYjy0v5s0BF/ZOfDaKQT/dGsvV1HE
AFGpKpybizNV
=BYgZ
-----END PGP SIGNATURE-----
Merge tag 'net-6.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from can, bpf and netfilter.
There are a bunch of regressions addressed here, but hopefully nothing
spectacular. We are still waiting the driver fix from Intel, mentioned
by Jakub in the previous networking pull.
Current release - regressions:
- core: add softirq safety to netdev_rename_lock
- tcp: fix tcp_rcv_fastopen_synack() to enter TCP_CA_Loss for failed
TFO
- batman-adv: fix RCU race at module unload time
Previous releases - regressions:
- openvswitch: get related ct labels from its master if it is not
confirmed
- eth: bonding: fix incorrect software timestamping report
- eth: mlxsw: fix memory corruptions on spectrum-4 systems
- eth: ionic: use dev_consume_skb_any outside of napi
Previous releases - always broken:
- netfilter: fully validate NFT_DATA_VALUE on store to data registers
- unix: several fixes for OoB data
- tcp: fix race for duplicate reqsk on identical SYN
- bpf:
- fix may_goto with negative offset
- fix the corner case with may_goto and jump to the 1st insn
- fix overrunning reservations in ringbuf
- can:
- j1939: recover socket queue on CAN bus error during BAM
transmission
- mcp251xfd: fix infinite loop when xmit fails
- dsa: microchip: monitor potential faults in half-duplex mode
- eth: vxlan: pull inner IP header in vxlan_xmit_one()
- eth: ionic: fix kernel panic due to multi-buffer handling
Misc:
- selftest: unix tests refactor and a lot of new cases added"
* tag 'net-6.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (61 commits)
net: mana: Fix possible double free in error handling path
selftest: af_unix: Check SIOCATMARK after every send()/recv() in msg_oob.c.
af_unix: Fix wrong ioctl(SIOCATMARK) when consumed OOB skb is at the head.
selftest: af_unix: Check EPOLLPRI after every send()/recv() in msg_oob.c
selftest: af_unix: Check SIGURG after every send() in msg_oob.c
selftest: af_unix: Add SO_OOBINLINE test cases in msg_oob.c
af_unix: Don't stop recv() at consumed ex-OOB skb.
selftest: af_unix: Add non-TCP-compliant test cases in msg_oob.c.
af_unix: Don't stop recv(MSG_DONTWAIT) if consumed OOB skb is at the head.
af_unix: Stop recv(MSG_PEEK) at consumed OOB skb.
selftest: af_unix: Add msg_oob.c.
selftest: af_unix: Remove test_unix_oob.c.
tracing/net_sched: NULL pointer dereference in perf_trace_qdisc_reset()
netfilter: nf_tables: fully validate NFT_DATA_VALUE on store to data registers
net: usb: qmi_wwan: add Telit FN912 compositions
tcp: fix tcp_rcv_fastopen_synack() to enter TCP_CA_Loss for failed TFO
ionic: use dev_consume_skb_any outside of napi
net: dsa: microchip: fix wrong register write when masking interrupt
Fix race for duplicate reqsk on identical SYN
ibmvnic: Add tx check to prevent skb leak
...
Building with W=1 in some configurations produces a false positive
warning for kallsyms:
kernel/kallsyms.c: In function '__sprint_symbol.isra':
kernel/kallsyms.c:503:17: error: 'strcpy' source argument is the same as destination [-Werror=restrict]
503 | strcpy(buffer, name);
| ^~~~~~~~~~~~~~~~~~~~
This originally showed up while building with -O3, but later started
happening in other configurations as well, depending on inlining
decisions. The underlying issue is that the local 'name' variable is
always initialized to the be the same as 'buffer' in the called functions
that fill the buffer, which gcc notices while inlining, though it could
see that the address check always skips the copy.
The calling conventions here are rather unusual, as all of the internal
lookup functions (bpf_address_lookup, ftrace_mod_address_lookup,
ftrace_func_address_lookup, module_address_lookup and
kallsyms_lookup_buildid) already use the provided buffer and either return
the address of that buffer to indicate success, or NULL for failure,
but the callers are written to also expect an arbitrary other buffer
to be returned.
Rework the calling conventions to return the length of the filled buffer
instead of its address, which is simpler and easier to follow as well
as avoiding the warning. Leave only the kallsyms_lookup() calling conventions
unchanged, since that is called from 16 different functions and
adapting this would be a much bigger change.
Link: https://lore.kernel.org/lkml/20200107214042.855757-1-arnd@arndb.de/
Link: https://lore.kernel.org/lkml/20240326130647.7bfb1d92@gandalf.local.home/
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Two patches to fix kworker name formatting.
-----BEGIN PGP SIGNATURE-----
iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCZnyHUw4cdGpAa2VybmVs
Lm9yZwAKCRCxYfJx3gVYGV8AAQCi2flF10nHfE+nPsji8Z2FWtDKIzemsKgmxSnW
VvThlAD+MeK29fbz0uZzFa65TqIeckf7DGoGQpKnMzC7+FeJWAc=
=6POk
-----END PGP SIGNATURE-----
Merge tag 'wq-for-6.10-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue fixes from Tejun Heo:
"Two patches to fix kworker name formatting"
* tag 'wq-for-6.10-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: Increase worker desc's length to 32
workqueue: Refactor worker ID formatting and make wq_worker_comm() use full ID string
Currently, it's possible to pass in a modified CONST_PTR_TO_DYNPTR to
a global function as an argument. The adverse effects of this is that
BPF helpers can continue to make use of this modified
CONST_PTR_TO_DYNPTR from within the context of the global function,
which can unintentionally result in out-of-bounds memory accesses and
therefore compromise overall system stability i.e.
[ 244.157771] BUG: KASAN: slab-out-of-bounds in bpf_dynptr_data+0x137/0x140
[ 244.161345] Read of size 8 at addr ffff88810914be68 by task test_progs/302
[ 244.167151] CPU: 0 PID: 302 Comm: test_progs Tainted: G O E 6.10.0-rc3-00131-g66b586715063 #533
[ 244.174318] Call Trace:
[ 244.175787] <TASK>
[ 244.177356] dump_stack_lvl+0x66/0xa0
[ 244.179531] print_report+0xce/0x670
[ 244.182314] ? __virt_addr_valid+0x200/0x3e0
[ 244.184908] kasan_report+0xd7/0x110
[ 244.187408] ? bpf_dynptr_data+0x137/0x140
[ 244.189714] ? bpf_dynptr_data+0x137/0x140
[ 244.192020] bpf_dynptr_data+0x137/0x140
[ 244.194264] bpf_prog_b02a02fdd2bdc5fa_global_call_bpf_dynptr_data+0x22/0x26
[ 244.198044] bpf_prog_b0fe7b9d7dc3abde_callback_adjust_bpf_dynptr_reg_off+0x1f/0x23
[ 244.202136] bpf_user_ringbuf_drain+0x2c7/0x570
[ 244.204744] ? 0xffffffffc0009e58
[ 244.206593] ? __pfx_bpf_user_ringbuf_drain+0x10/0x10
[ 244.209795] bpf_prog_33ab33f6a804ba2d_user_ringbuf_callback_const_ptr_to_dynptr_reg_off+0x47/0x4b
[ 244.215922] bpf_trampoline_6442502480+0x43/0xe3
[ 244.218691] __x64_sys_prlimit64+0x9/0xf0
[ 244.220912] do_syscall_64+0xc1/0x1d0
[ 244.223043] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 244.226458] RIP: 0033:0x7ffa3eb8f059
[ 244.228582] Code: 08 89 e8 5b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8f 1d 0d 00 f7 d8 64 89 01 48
[ 244.241307] RSP: 002b:00007ffa3e9c6eb8 EFLAGS: 00000206 ORIG_RAX: 000000000000012e
[ 244.246474] RAX: ffffffffffffffda RBX: 00007ffa3e9c7cdc RCX: 00007ffa3eb8f059
[ 244.250478] RDX: 00007ffa3eb162b4 RSI: 0000000000000000 RDI: 00007ffa3e9c7fb0
[ 244.255396] RBP: 00007ffa3e9c6ed0 R08: 00007ffa3e9c76c0 R09: 0000000000000000
[ 244.260195] R10: 0000000000000000 R11: 0000000000000206 R12: ffffffffffffff80
[ 244.264201] R13: 000000000000001c R14: 00007ffc5d6b4260 R15: 00007ffa3e1c7000
[ 244.268303] </TASK>
Add a check_func_arg_reg_off() to the path in which the BPF verifier
verifies the arguments of global function arguments, specifically
those which take an argument of type ARG_PTR_TO_DYNPTR |
MEM_RDONLY. Also, process_dynptr_func() doesn't appear to perform any
explicit and strict type matching on the supplied register type, so
let's also enforce that a register either type PTR_TO_STACK or
CONST_PTR_TO_DYNPTR is by the caller.
Reported-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Matt Bobrowski <mattbobrowski@google.com>
Link: https://lore.kernel.org/r/20240625062857.92760-1-mattbobrowski@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The hrtimer function callback must not be NULL. It has to be specified by
the call side but it is not validated by the hrtimer code. When a hrtimer
is queued without a function callback, the kernel crashes with a null
pointer dereference when trying to execute the callback in __run_hrtimer().
Introduce a validation before queuing the hrtimer in
hrtimer_start_range_ns().
[anna-maria: Rephrase commit message]
Signed-off-by: Phil Chang <phil.chang@mediatek.com>
Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Using sys_io_pgetevents() as the entry point for compat mode tasks
works almost correctly, but misses the sign extension for the min_nr
and nr arguments.
This was addressed on parisc by switching to
compat_sys_io_pgetevents_time64() in commit 6431e92fc827 ("parisc:
io_pgetevents_time64() needs compat syscall in 32-bit compat mode"),
as well as by using more sophisticated system call wrappers on x86 and
s390. However, arm64, mips, powerpc, sparc and riscv still have the
same bug.
Change all of them over to use compat_sys_io_pgetevents_time64()
like parisc already does. This was clearly the intention when the
function was originally added, but it got hooked up incorrectly in
the tables.
Cc: stable@vger.kernel.org
Fixes: 48166e6ea47d ("y2038: add 64-bit time_t syscalls to all 32-bit architectures")
Acked-by: Heiko Carstens <hca@linux.ibm.com> # s390
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
This reverts commit f03e8c1060f86c23eb49bafee99d9fcbd1c1bd77.
Let's roll back all of the serial core and printk console changes that
went into 6.10-rc1 as there still are problems with them that need to be
sorted out.
Link: https://lore.kernel.org/r/ZnpRozsdw6zbjqze@tlindgre-MOBL1
Reported-by: Petr Mladek <pmladek@suse.com>
Reported-by: Tony Lindgren <tony@atomide.com>
Cc: Jiri Slaby <jirislaby@kernel.org>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts commit 8a831c584e6e80cf68f79893dc395c16cdf47dc8.
Let's roll back all of the serial core and printk console changes that
went into 6.10-rc1 as there still are problems with them that need to be
sorted out.
Link: https://lore.kernel.org/r/ZnpRozsdw6zbjqze@tlindgre-MOBL1
Reported-by: Petr Mladek <pmladek@suse.com>
Reported-by: Tony Lindgren <tony@atomide.com>
Cc: Jiri Slaby <jirislaby@kernel.org>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts commit b73c9cbe4f1fc02645228aa575998dd54067f8ef.
Let's roll back all of the serial core and printk console changes that
went into 6.10-rc1 as there still are problems with them that need to be
sorted out.
Link: https://lore.kernel.org/r/ZnpRozsdw6zbjqze@tlindgre-MOBL1
Reported-by: Petr Mladek <pmladek@suse.com>
Reported-by: Tony Lindgren <tony@atomide.com>
Cc: Jiri Slaby <jirislaby@kernel.org>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZnlmXgAKCRDbK58LschI
g2ovAP9iynwwFEjMSxHjQVXSq1J1PMqF4966vmy30RCKJMMN/QD/SRsRRKcfsPis
BzKOdsOVbWlDl2CUqvBrPZGT6laKoQc=
=6/0V
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2024-06-24
We've added 12 non-merge commits during the last 10 day(s) which contain
a total of 10 files changed, 412 insertions(+), 16 deletions(-).
The main changes are:
1) Fix a BPF verifier issue validating may_goto with a negative offset,
from Alexei Starovoitov.
2) Fix a BPF verifier validation bug with may_goto combined with jump to
the first instruction, also from Alexei Starovoitov.
3) Fix a bug with overrunning reservations in BPF ring buffer,
from Daniel Borkmann.
4) Fix a bug in BPF verifier due to missing proper var_off setting related
to movsx instruction, from Yonghong Song.
5) Silence unnecessary syzkaller-triggered warning in __xdp_reg_mem_model(),
from Daniil Dulov.
* tag 'for-netdev' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
xdp: Remove WARN() from __xdp_reg_mem_model()
selftests/bpf: Add tests for may_goto with negative offset.
bpf: Fix may_goto with negative offset.
selftests/bpf: Add more ring buffer test coverage
bpf: Fix overrunning reservations in ringbuf
selftests/bpf: Tests with may_goto and jumps to the 1st insn
bpf: Fix the corner case with may_goto and jump to the 1st insn.
bpf: Update BPF LSM maintainer list
bpf: Fix remap of arena.
selftests/bpf: Add a few tests to cover
bpf: Add missed var_off setting in coerce_subreg_to_size_sx()
bpf: Add missed var_off setting in set_sext32_default_val()
====================
Link: https://patch.msgid.link/20240624124330.8401-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The per-CPU flush lists, which are accessed from within the NAPI callback
(xdp_do_flush() for instance), are per-CPU. There are subject to the
same problem as struct bpf_redirect_info.
Add the per-CPU lists cpu_map_flush_list, dev_map_flush_list and
xskmap_map_flush_list to struct bpf_net_context. Add wrappers for the
access. The lists initialized on first usage (similar to
bpf_net_ctx_get_ri()).
Cc: "Björn Töpel" <bjorn@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@intel.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Song Liu <song@kernel.org>
Cc: Stanislav Fomichev <sdf@google.com>
Cc: Yonghong Song <yonghong.song@linux.dev>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://patch.msgid.link/20240620132727.660738-16-bigeasy@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The XDP redirect process is two staged:
- bpf_prog_run_xdp() is invoked to run a eBPF program which inspects the
packet and makes decisions. While doing that, the per-CPU variable
bpf_redirect_info is used.
- Afterwards xdp_do_redirect() is invoked and accesses bpf_redirect_info
and it may also access other per-CPU variables like xskmap_flush_list.
At the very end of the NAPI callback, xdp_do_flush() is invoked which
does not access bpf_redirect_info but will touch the individual per-CPU
lists.
The per-CPU variables are only used in the NAPI callback hence disabling
bottom halves is the only protection mechanism. Users from preemptible
context (like cpu_map_kthread_run()) explicitly disable bottom halves
for protections reasons.
Without locking in local_bh_disable() on PREEMPT_RT this data structure
requires explicit locking.
PREEMPT_RT has forced-threaded interrupts enabled and every
NAPI-callback runs in a thread. If each thread has its own data
structure then locking can be avoided.
Create a struct bpf_net_context which contains struct bpf_redirect_info.
Define the variable on stack, use bpf_net_ctx_set() to save a pointer to
it, bpf_net_ctx_clear() removes it again.
The bpf_net_ctx_set() may nest. For instance a function can be used from
within NET_RX_SOFTIRQ/ net_rx_action which uses bpf_net_ctx_set() and
NET_TX_SOFTIRQ which does not. Therefore only the first invocations
updates the pointer.
Use bpf_net_ctx_get_ri() as a wrapper to retrieve the current struct
bpf_redirect_info. The returned data structure is zero initialized to
ensure nothing is leaked from stack. This is done on first usage of the
struct. bpf_net_ctx_set() sets bpf_redirect_info::kern_flags to 0 to
note that initialisation is required. First invocation of
bpf_net_ctx_get_ri() will memset() the data structure and update
bpf_redirect_info::kern_flags.
bpf_redirect_info::nh is excluded from memset because it is only used
once BPF_F_NEIGH is set which also sets the nh member. The kern_flags is
moved past nh to exclude it from memset.
The pointer to bpf_net_context is saved task's task_struct. Using
always the bpf_net_context approach has the advantage that there is
almost zero differences between PREEMPT_RT and non-PREEMPT_RT builds.
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Song Liu <song@kernel.org>
Cc: Stanislav Fomichev <sdf@google.com>
Cc: Yonghong Song <yonghong.song@linux.dev>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://patch.msgid.link/20240620132727.660738-15-bigeasy@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add local_lock_nested_bh() locking. It is based on local_lock_t and the
naming follows the preempt_disable_nested() example.
For !PREEMPT_RT + !LOCKDEP it is a per-CPU annotation for locking
assumptions based on local_bh_disable(). The macro is optimized away
during compilation.
For !PREEMPT_RT + LOCKDEP the local_lock_nested_bh() is reduced to
the usual lock-acquire plus lockdep_assert_in_softirq() - ensuring that
BH is disabled.
For PREEMPT_RT local_lock_nested_bh() acquires the specified per-CPU
lock. It does not disable CPU migration because it relies on
local_bh_disable() disabling CPU migration.
With LOCKDEP it performans the usual lockdep checks as with !PREEMPT_RT.
Due to include hell the softirq check has been moved spinlock.c.
The intention is to use this locking in places where locking of a per-CPU
variable relies on BH being disabled. Instead of treating disabled
bottom halves as a big per-CPU lock, PREEMPT_RT can use this to reduce
the locking scope to what actually needs protecting.
A side effect is that it also documents the protection scope of the
per-CPU variables.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://patch.msgid.link/20240620132727.660738-3-bigeasy@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Zac's syzbot crafted a bpf prog that exposed two bugs in may_goto.
The 1st bug is the way may_goto is patched. When offset is negative
it should be patched differently.
The 2nd bug is in the verifier:
when current state may_goto_depth is equal to visited state may_goto_depth
it means there is an actual infinite loop. It's not correct to prune
exploration of the program at this point.
Note, that this check doesn't limit the program to only one may_goto insn,
since 2nd and any further may_goto will increment may_goto_depth only
in the queued state pushed for future exploration. The current state
will have may_goto_depth == 0 regardless of number of may_goto insns
and the verifier has to explore the program until bpf_exit.
Fixes: 011832b97b31 ("bpf: Introduce may_goto instruction")
Reported-by: Zac Ecob <zacecob@protonmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Closes: https://lore.kernel.org/bpf/CAADnVQL-15aNp04-cyHRn47Yv61NXfYyhopyZtUyxNojUZUXpA@mail.gmail.com/
Link: https://lore.kernel.org/bpf/20240619235355.85031-1-alexei.starovoitov@gmail.com
After the rework of "Parallel CPU bringup", the cmdline "nosmp" and
"maxcpus=0" parameters are not working anymore. These parameters set
setup_max_cpus to zero and that's handed to bringup_nonboot_cpus().
The code there does a decrement before checking for zero, which brings it
into the negative space and brings up all CPUs.
Add a zero check at the beginning of the function to prevent this.
[ tglx: Massaged change log ]
Fixes: 18415f33e2ac4ab382 ("cpu/hotplug: Allow "parallel" bringup up to CPUHP_BP_KICK_AP_STATE")
Fixes: 06c6796e0304234da6 ("cpu/hotplug: Fix off by one in cpuhp_bringup_mask()")
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240618081336.3996825-1-chenhuacai@loongson.cn
Share relocation implementation with the kernel. As part of this,
we also need the type/string iteration functions so also share
btf_iter.c file. Relocation code in kernel and userspace is identical
save for the impementation of the reparenting of split BTF to the
relocated base BTF and retrieval of the BTF header from "struct btf";
these small functions need separate user-space and kernel implementations
for the separate "struct btf"s they operate upon.
One other wrinkle on the kernel side is we have to map .BTF.ids in
modules as they were generated with the type ids used at BTF encoding
time. btf_relocate() optionally returns an array mapping from old BTF
ids to relocated ids, so we use that to fix up these references where
needed for kfuncs.
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20240620091733.1967885-5-alan.maguire@oracle.com
...as this will allow split BTF modules with a base BTF
representation (rather than the full vmlinux BTF at time of
BTF encoding) to resolve their references to kernel types in a
way that is more resilient to small changes in kernel types.
This will allow modules that are not built every time the kernel
is to provide more resilient BTF, rather than have it invalidated
every time BTF ids for core kernel types change.
Fields are ordered to avoid holes in struct module.
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240620091733.1967885-3-alan.maguire@oracle.com
The BPF ring buffer internally is implemented as a power-of-2 sized circular
buffer, with two logical and ever-increasing counters: consumer_pos is the
consumer counter to show which logical position the consumer consumed the
data, and producer_pos which is the producer counter denoting the amount of
data reserved by all producers.
Each time a record is reserved, the producer that "owns" the record will
successfully advance producer counter. In user space each time a record is
read, the consumer of the data advanced the consumer counter once it finished
processing. Both counters are stored in separate pages so that from user
space, the producer counter is read-only and the consumer counter is read-write.
One aspect that simplifies and thus speeds up the implementation of both
producers and consumers is how the data area is mapped twice contiguously
back-to-back in the virtual memory, allowing to not take any special measures
for samples that have to wrap around at the end of the circular buffer data
area, because the next page after the last data page would be first data page
again, and thus the sample will still appear completely contiguous in virtual
memory.
Each record has a struct bpf_ringbuf_hdr { u32 len; u32 pg_off; } header for
book-keeping the length and offset, and is inaccessible to the BPF program.
Helpers like bpf_ringbuf_reserve() return `(void *)hdr + BPF_RINGBUF_HDR_SZ`
for the BPF program to use. Bing-Jhong and Muhammad reported that it is however
possible to make a second allocated memory chunk overlapping with the first
chunk and as a result, the BPF program is now able to edit first chunk's
header.
For example, consider the creation of a BPF_MAP_TYPE_RINGBUF map with size
of 0x4000. Next, the consumer_pos is modified to 0x3000 /before/ a call to
bpf_ringbuf_reserve() is made. This will allocate a chunk A, which is in
[0x0,0x3008], and the BPF program is able to edit [0x8,0x3008]. Now, lets
allocate a chunk B with size 0x3000. This will succeed because consumer_pos
was edited ahead of time to pass the `new_prod_pos - cons_pos > rb->mask`
check. Chunk B will be in range [0x3008,0x6010], and the BPF program is able
to edit [0x3010,0x6010]. Due to the ring buffer memory layout mentioned
earlier, the ranges [0x0,0x4000] and [0x4000,0x8000] point to the same data
pages. This means that chunk B at [0x4000,0x4008] is chunk A's header.
bpf_ringbuf_submit() / bpf_ringbuf_discard() use the header's pg_off to then
locate the bpf_ringbuf itself via bpf_ringbuf_restore_from_rec(). Once chunk
B modified chunk A's header, then bpf_ringbuf_commit() refers to the wrong
page and could cause a crash.
Fix it by calculating the oldest pending_pos and check whether the range
from the oldest outstanding record to the newest would span beyond the ring
buffer size. If that is the case, then reject the request. We've tested with
the ring buffer benchmark in BPF selftests (./benchs/run_bench_ringbufs.sh)
before/after the fix and while it seems a bit slower on some benchmarks, it
is still not significantly enough to matter.
Fixes: 457f44363a88 ("bpf: Implement BPF ring buffer and verifier support for it")
Reported-by: Bing-Jhong Billy Jheng <billy@starlabs.sg>
Reported-by: Muhammad Ramdhan <ramdhan@starlabs.sg>
Co-developed-by: Bing-Jhong Billy Jheng <billy@starlabs.sg>
Co-developed-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Bing-Jhong Billy Jheng <billy@starlabs.sg>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240621140828.18238-1-daniel@iogearbox.net
When the following program is processed by the verifier:
L1: may_goto L2
goto L1
L2: w0 = 0
exit
the may_goto insn is first converted to:
L1: r11 = *(u64 *)(r10 -8)
if r11 == 0x0 goto L2
r11 -= 1
*(u64 *)(r10 -8) = r11
goto L1
L2: w0 = 0
exit
then later as the last step the verifier inserts:
*(u64 *)(r10 -8) = BPF_MAX_LOOPS
as the first insn of the program to initialize loop count.
When the first insn happens to be a branch target of some jmp the
bpf_patch_insn_data() logic will produce:
L1: *(u64 *)(r10 -8) = BPF_MAX_LOOPS
r11 = *(u64 *)(r10 -8)
if r11 == 0x0 goto L2
r11 -= 1
*(u64 *)(r10 -8) = r11
goto L1
L2: w0 = 0
exit
because instruction patching adjusts all jmps and calls, but for this
particular corner case it's incorrect and the L1 label should be one
instruction down, like:
*(u64 *)(r10 -8) = BPF_MAX_LOOPS
L1: r11 = *(u64 *)(r10 -8)
if r11 == 0x0 goto L2
r11 -= 1
*(u64 *)(r10 -8) = r11
goto L1
L2: w0 = 0
exit
and that's what this patch is fixing.
After bpf_patch_insn_data() call adjust_jmp_off() to adjust all jmps
that point to newly insert BPF_ST insn to point to insn after.
Note that bpf_patch_insn_data() cannot easily be changed to accommodate
this logic, since jumps that point before or after a sequence of patched
instructions have to be adjusted with the full length of the patch.
Conceptually it's somewhat similar to "insert" of instructions between other
instructions with weird semantics. Like "insert" before 1st insn would require
adjustment of CALL insns to point to newly inserted 1st insn, but not an
adjustment JMP insns that point to 1st, yet still adjusting JMP insns that
cross over 1st insn (point to insn before or insn after), hence use simple
adjust_jmp_off() logic to fix this corner case. Ideally bpf_patch_insn_data()
would have an auxiliary info to say where 'the start of newly inserted patch
is', but it would be too complex for backport.
Fixes: 011832b97b31 ("bpf: Introduce may_goto instruction")
Reported-by: Zac Ecob <zacecob@protonmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Closes: https://lore.kernel.org/bpf/CAADnVQJ_WWx8w4b=6Gc2EpzAjgv+6A0ridnMz2TvS2egj4r3Gw@mail.gmail.com/
Link: https://lore.kernel.org/bpf/20240619011859.79334-1-alexei.starovoitov@gmail.com
The new generic LSM hook security_file_post_open() was recently added
to the LSM framework in commit 8f46ff5767b0b ("security: Introduce
file_post_open hook"). Let's proactively add this generic LSM hook to
the sleepable_lsm_hooks BTF ID set, because I can't see there being
any strong reasons not to, and it's only a matter of time before
someone else comes around and asks for it to be there.
security_file_post_open() is inherently sleepable as it's purposely
situated in the kernel that allows LSMs to directly read out the
contents of the backing file if need be. Additionally, it's called
directly after security_file_open(), and that LSM hook in itself
already exists in the sleepable_lsm_hooks BTF ID set.
Signed-off-by: Matt Bobrowski <mattbobrowski@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240618192923.379852-1-mattbobrowski@google.com
This reverts [1] and changes return value for bpf_session_cookie
in bpf selftests. Having long * might lead to problems on 32-bit
architectures.
Fixes: 2b8dd87332cd ("bpf: Make bpf_session_cookie() kfunc return long *")
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240619081624.1620152-1-jolsa@kernel.org
This new_n is defined in the start of this function.
Its value is overwritten by `new_n = min(n, log->len_total);`
a couple lines before my change,
rendering the shadow declaration unnecessary.
Signed-off-by: Rafael Passos <rafael@rcpassos.me>
Link: https://lore.kernel.org/r/20240615022641.210320-4-rafael@rcpassos.me
Signed-off-by: Alexei Starovoitov <ast@kernel.org>