Kui-Feng Lee
1611603537 bpf: Create argument information for nullable arguments.
Collect argument information from the type information of stub functions to
mark arguments of BPF struct_ops programs with PTR_MAYBE_NULL if they are
nullable.  A nullable argument is annotated by suffixing "__nullable" to
the argument name of the stub function.

For nullable arguments, this patch sets a struct bpf_ctx_arg_aux to label
their reg_type with PTR_TO_BTF_ID | PTR_TRUSTED | PTR_MAYBE_NULL. This
makes the verifier check that programs handle the pointer properly: a
program should check whether the pointer is NULL before accessing the
pointed-to memory.

The implementer of a struct_ops type should annotate the arguments that can
be null. The implementer should define an (empty) stub function as a
placeholder for each defined operator. The name of a stub function should
follow the pattern "<st_op_type>__<operator name>". For example, for
test_maybe_null of struct bpf_testmod_ops, its stub function name should
be "bpf_testmod_ops__test_maybe_null". You mark an argument nullable by
suffixing the argument name with "__nullable" in the stub function.
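
As an illustration (a minimal sketch of the convention, not the exact
bpf_testmod sources):

  /* struct_ops type with an operator whose second argument may be NULL */
  struct bpf_testmod_ops {
          int (*test_maybe_null)(int dummy, struct task_struct *task);
  };

  /* Stub named "<st_op_type>__<operator name>"; the "__nullable" suffix on
   * the argument name marks it as PTR_MAYBE_NULL for the verifier, so a
   * struct_ops program implementing this operator must NULL-check "task"
   * before dereferencing it.
   */
  static int bpf_testmod_ops__test_maybe_null(int dummy,
                                              struct task_struct *task__nullable)
  {
          return 0;
  }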

Since we already have stub functions for kCFI, we just reuse them with the
naming convention mentioned earlier. Stub functions following this naming
convention are only required if there are nullable arguments to annotate;
for operators without nullable arguments, stub functions are not necessary
for the purpose of this patch.

This patch will prepare a list of struct bpf_ctx_arg_aux, aka arg_info, for
each member field of a struct_ops type.  "arg_info" will be assigned to
"prog->aux->ctx_arg_info" of BPF struct_ops programs in
check_struct_ops_btf_id() so that it can be used by btf_ctx_access() later
to set reg_type properly for the verifier.

Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Link: https://lore.kernel.org/r/20240209023750.1153905-4-thinker.li@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-02-13 15:16:44 -08:00
Kui-Feng Lee
6115a0aeef bpf: Move __kfunc_param_match_suffix() to btf.c.
Move __kfunc_param_match_suffix() to btf.c and rename it as
btf_param_match_suffix(). It can be reused by bpf_struct_ops later.

Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Link: https://lore.kernel.org/r/20240209023750.1153905-3-thinker.li@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-02-13 15:16:44 -08:00
Kui-Feng Lee
77c0208e19 bpf: add btf pointer to struct bpf_ctx_arg_aux.
Enable providers of context argument information to use types defined in a
module instead of only those in the kernel (btf_vmlinux).

Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Link: https://lore.kernel.org/r/20240209023750.1153905-2-thinker.li@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-02-13 15:16:44 -08:00
Dave Thaler
dc8543b597 bpf, docs: Update ISA document title
* Use "Instruction Set Architecture (ISA)" instead of "Instruction Set
  Specification"
* Remove version number

As previously discussed on the mailing list at
https://mailarchive.ietf.org/arch/msg/bpf/SEpn3OL9TabNRn-4rDX9A6XVbjM/

Signed-off-by: Dave Thaler <dthaler1968@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20240208221449.12274-1-dthaler1968@gmail.com
2024-02-13 23:14:15 +01:00
Cupertino Miranda
12bbcf8e84 libbpf: Add support to GCC in CORE macro definitions
Due to internal differences between LLVM and GCC, the current
implementation of the CO-RE macros does not fit the GCC parser, which
optimizes those expressions away even before they become accessible to the
BPF backend.

As examples, the following would be optimized out with the original
definitions:
  - As enums are converted to their integer representation during
  parsing, the IR would not know how to distinguish an integer
  constant from an actual enum value.
  - Types need to be kept as temporary variables, as the existing type
  casts of the 0 address (as expanded for LLVM) are optimized away by
  the GCC C parser, never really reaching GCC's IR.

Although the macros appear to add extra complexity, the expanded code is
removed very early in the compilation process, so it does not really affect
the quality of the generated assembly.
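
For illustration only (a hedged sketch with made-up macro names, not the
actual bpf_core_read.h definitions), the temporary-variable point boils
down to something like:

  /* clang-oriented expansion: the relocation is taken through a cast of
   * the 0 address, which GCC's C front end folds away before the BPF
   * backend ever sees it.
   */
  #define FIELD_BYTE_OFFSET_LLVM(type, field)                            \
          __builtin_preserve_field_info(((typeof(type) *)0)->field,      \
                                        BPF_FIELD_BYTE_OFFSET)

  /* GCC-friendly variant: keep the type alive in a temporary so the
   * expression survives into GCC's IR.
   */
  #define FIELD_BYTE_OFFSET_GCC(type, field)                             \
          ({                                                             \
                  typeof(type) ___t;                                     \
                  __builtin_preserve_field_info(___t.field,              \
                                                BPF_FIELD_BYTE_OFFSET);  \
          })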

Signed-off-by: Cupertino Miranda <cupertino.miranda@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240213173543.1397708-1-cupertino.miranda@oracle.com
2024-02-13 11:28:12 -08:00
Jose E. Marchesi
52dbd67dff bpf: Abstract loop unrolling pragmas in BPF selftests
[Changes from V1:
- Avoid conflict by rebasing with latest master.]

Some BPF tests use loop unrolling compiler pragmas that are clang
specific and not supported by GCC.  These pragmas, along with their
GCC equivalents, are:

  #pragma clang loop unroll_count(N)
  #pragma GCC unroll N

  #pragma clang loop unroll(full)
  #pragma GCC unroll 65534

  #pragma clang loop unroll(disable)
  #pragma GCC unroll 1

  #pragma unroll [aka #pragma clang loop unroll(enable)]
  There is no GCC equivalence to this pragma.  It enables unrolling on
  loops that the compiler would not ordinarily unroll even with
  -O2|-funroll-loops, but it is not equivalent to full unrolling
  either.

This patch adds a new header progs/bpf_compiler.h that defines the
following macros, which correspond to each pair of compiler-specific
pragmas above:

  __pragma_loop_unroll_count(N)
  __pragma_loop_unroll_full
  __pragma_loop_no_unroll
  __pragma_loop_unroll

The selftests using loop unrolling pragmas are then changed to include
the header and use these macros in place of the explicit pragmas.
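
A minimal sketch of how such a header can dispatch on the compiler (the
real progs/bpf_compiler.h may differ in details):

  #define __PRAGMA(x)     _Pragma(#x)

  #ifdef __clang__
  #define __pragma_loop_unroll_count(N)   __PRAGMA(clang loop unroll_count(N))
  #define __pragma_loop_unroll_full       __PRAGMA(clang loop unroll(full))
  #define __pragma_loop_no_unroll         __PRAGMA(clang loop unroll(disable))
  #define __pragma_loop_unroll            __PRAGMA(clang loop unroll(enable))
  #else
  #define __pragma_loop_unroll_count(N)   __PRAGMA(GCC unroll N)
  #define __pragma_loop_unroll_full       __PRAGMA(GCC unroll 65534)
  #define __pragma_loop_no_unroll         __PRAGMA(GCC unroll 1)
  #define __pragma_loop_unroll            /* no direct GCC equivalent */
  #endif

  /* usage in a selftest (fragment): */
  __pragma_loop_unroll_count(4)
  for (i = 0; i < 4; i++)
          sum += arr[i];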

Tested in bpf-next master.
No regressions.

Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20240208203612.29611-1-jose.marchesi@oracle.com
2024-02-13 11:17:30 -08:00
Yonghong Song
fc1c9e40da selftests/bpf: Ensure fentry prog cannot attach to bpf_spin_{lock,unlock}()
Add two tests to ensure fentry programs cannot attach to
bpf_spin_{lock,unlock}() helpers. The tracing_failure.c files
can be used in the future for other tracing failure cases.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240207070107.335341-1-yonghong.song@linux.dev
2024-02-13 11:11:25 -08:00
Yonghong Song
178c54666f bpf: Mark bpf_spin_{lock,unlock}() helpers with notrace correctly
Currently, tracing of the bpf_spin_{lock,unlock}() helpers is supposed to
be disallowed. This is to prevent deadlock in the following scenario:
  - there is a prog (prog-A) calling bpf_spin_{lock,unlock}().
  - there is a tracing program (prog-B), e.g., fentry, attached
    to bpf_spin_lock() and/or bpf_spin_unlock().
  - prog-B calls bpf_spin_{lock,unlock}().
In such a scenario, when prog-A calls bpf_spin_{lock,unlock}(),
a deadlock will happen.

The related source code is below in kernel/bpf/helpers.c:
  notrace BPF_CALL_1(bpf_spin_lock, struct bpf_spin_lock *, lock)
  notrace BPF_CALL_1(bpf_spin_unlock, struct bpf_spin_lock *, lock)
The notrace attribute is supposed to prevent fentry progs from attaching to
bpf_spin_{lock,unlock}().

But this is actually not the case, and an fentry prog can successfully
attach to bpf_spin_lock(). Siddharth Chintamaneni reported
the issue in [1]. The following is the macro definition for
the above BPF_CALL_1:
  #define BPF_CALL_x(x, name, ...)                                               \
        static __always_inline                                                 \
        u64 ____##name(__BPF_MAP(x, __BPF_DECL_ARGS, __BPF_V, __VA_ARGS__));   \
        typedef u64 (*btf_##name)(__BPF_MAP(x, __BPF_DECL_ARGS, __BPF_V, __VA_ARGS__)); \
        u64 name(__BPF_REG(x, __BPF_DECL_REGS, __BPF_N, __VA_ARGS__));         \
        u64 name(__BPF_REG(x, __BPF_DECL_REGS, __BPF_N, __VA_ARGS__))          \
        {                                                                      \
                return ((btf_##name)____##name)(__BPF_MAP(x,__BPF_CAST,__BPF_N,__VA_ARGS__));\
        }                                                                      \
        static __always_inline                                                 \
        u64 ____##name(__BPF_MAP(x, __BPF_DECL_ARGS, __BPF_V, __VA_ARGS__))

  #define BPF_CALL_1(name, ...)   BPF_CALL_x(1, name, __VA_ARGS__)

The notrace attribute is actually applied to the static always_inline function
____bpf_spin_{lock,unlock}(). The actual callback function
bpf_spin_{lock,unlock}() is not marked with notrace, which allows
fentry progs to attach to the two helpers and may cause the
above mentioned deadlock. Siddharth Chintamaneni
has a reproducer in [2].

To fix the issue, a new macro NOTRACE_BPF_CALL_1 is introduced which
adds the notrace attribute to the original callback function instead of
the hidden always_inline one; this fixes the problem.
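
A hedged sketch of the shape of the fix (simplified relative to the actual
include/linux/filter.h macros):

  /* Variant of BPF_CALL_x that puts "notrace" on the externally visible
   * callback, not on the inlined ____name() helper.
   */
  #define NOTRACE_BPF_CALL_x(x, name, ...)                                       \
        static __always_inline                                                 \
        u64 ____##name(__BPF_MAP(x, __BPF_DECL_ARGS, __BPF_V, __VA_ARGS__));   \
        typedef u64 (*btf_##name)(__BPF_MAP(x, __BPF_DECL_ARGS, __BPF_V, __VA_ARGS__)); \
        notrace u64 name(__BPF_REG(x, __BPF_DECL_REGS, __BPF_N, __VA_ARGS__)); \
        notrace u64 name(__BPF_REG(x, __BPF_DECL_REGS, __BPF_N, __VA_ARGS__))  \
        {                                                                      \
                return ((btf_##name)____##name)(__BPF_MAP(x,__BPF_CAST,__BPF_N,__VA_ARGS__));\
        }                                                                      \
        static __always_inline                                                 \
        u64 ____##name(__BPF_MAP(x, __BPF_DECL_ARGS, __BPF_V, __VA_ARGS__))

  #define NOTRACE_BPF_CALL_1(name, ...)   NOTRACE_BPF_CALL_x(1, name, __VA_ARGS__)

  /* kernel/bpf/helpers.c then switches from
   *   notrace BPF_CALL_1(bpf_spin_lock, struct bpf_spin_lock *, lock)
   * to
   *   NOTRACE_BPF_CALL_1(bpf_spin_lock, struct bpf_spin_lock *, lock)
   */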

  [1] https://lore.kernel.org/bpf/CAE5sdEigPnoGrzN8WU7Tx-h-iFuMZgW06qp0KHWtpvoXxf1OAQ@mail.gmail.com/
  [2] https://lore.kernel.org/bpf/CAE5sdEg6yUc_Jz50AnUXEEUh6O73yQ1Z6NV2srJnef0ZrQkZew@mail.gmail.com/

Fixes: d83525ca62cf ("bpf: introduce bpf_spin_lock")
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20240207070102.335167-1-yonghong.song@linux.dev
2024-02-13 11:11:25 -08:00
Daniel Xu
5b268d1ebc bpf: Have bpf_rdonly_cast() take a const pointer
Since 20d59ee55172 ("libbpf: add bpf_core_cast() macro"), libbpf is now
exporting a const arg version of bpf_rdonly_cast(). This causes the
following conflicting type error when generating kfunc prototypes from
BTF:

In file included from skeleton/pid_iter.bpf.c:5:
/home/dxu/dev/linux/tools/bpf/bpftool/bootstrap/libbpf/include/bpf/bpf_core_read.h:297:14: error: conflicting types for 'bpf_rdonly_cast'
extern void *bpf_rdonly_cast(const void *obj__ign, __u32 btf_id__k) __ksym __weak;
             ^
./vmlinux.h:135625:14: note: previous declaration is here
extern void *bpf_rdonly_cast(void *obj__ign, u32 btf_id__k) __weak __ksym;

This is because the kernel defines bpf_rdonly_cast() with a non-const arg.
Since a const arg is more permissive and thus backwards compatible, change
the kernel definition as well to avoid the conflicting type error.

Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/dfd3823f11ffd2d4c838e961d61ec9ae8a646773.1707080349.git.dxu@dxuuu.xyz
2024-02-13 11:05:26 -08:00
Marco Elver
68bc61c26c bpf: Allow compiler to inline most of bpf_local_storage_lookup()
In various performance profiles of kernels with BPF programs attached,
bpf_local_storage_lookup() appears as a significant portion of CPU
cycles spent. To enable the compiler to generate more optimal code, turn
bpf_local_storage_lookup() into a static inline function, where only the
cache insertion code path is outlined.

Notably, outlining cache insertion helps avoid bloating callers by
duplicating the setup of calls to raw_spin_lock_irqsave() and
raw_spin_unlock_irqrestore() (on architectures which do not inline
spin_lock/unlock, such as x86), which would cause the compiler to produce
worse code by deciding to outline otherwise inlinable functions. The call
overhead is neutral, because we make 2 calls either way: either calling
raw_spin_lock_irqsave() and raw_spin_unlock_irqrestore(); or calling
__bpf_local_storage_insert_cache(), which calls raw_spin_lock_irqsave(),
followed by a tail call to raw_spin_unlock_irqrestore() where the compiler
can perform TCO and (in optimized uninstrumented builds) turn it into a
plain jump. The call to __bpf_local_storage_insert_cache() can be elided
entirely if cacheit_lockit is a false constant expression.
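
A hedged sketch of the resulting shape (simplified; helper and field names
follow kernel/bpf/bpf_local_storage.c but details may differ):

  /* Outlined slow path: take the storage lock and publish the element in
   * the per-map cache slot.  Ends in a tail call to the unlock, which the
   * compiler can turn into a plain jump.  bpf_local_storage_lookup()
   * itself becomes a static inline whose fast path checks the cache slot
   * and whose cold path walks the list, calling this helper only when
   * cacheit_lockit is true.
   */
  void __bpf_local_storage_insert_cache(struct bpf_local_storage *local_storage,
                                        struct bpf_local_storage_map *smap,
                                        struct bpf_local_storage_elem *selem)
  {
          unsigned long flags;

          raw_spin_lock_irqsave(&local_storage->lock, flags);
          rcu_assign_pointer(local_storage->cache[smap->cache_idx],
                             SDATA(selem));
          raw_spin_unlock_irqrestore(&local_storage->lock, flags);
  }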

Based on results from './benchs/run_bench_local_storage.sh' (21 trials,
reboot between each trial; x86 defconfig + BPF, clang 16) this produces
improvements in throughput and latency in the majority of cases, with an
average (geomean) improvement of 8%:

+---- Hashmap Control --------------------
|
| + num keys: 10
| :                                         <before>             | <after>
| +-+ hashmap (control) sequential get    +----------------------+----------------------
|   +- hits throughput                    | 14.789 M ops/s       | 14.745 M ops/s (  ~  )
|   +- hits latency                       | 67.679 ns/op         | 67.879 ns/op   (  ~  )
|   +- important_hits throughput          | 14.789 M ops/s       | 14.745 M ops/s (  ~  )
|
| + num keys: 1000
| :                                         <before>             | <after>
| +-+ hashmap (control) sequential get    +----------------------+----------------------
|   +- hits throughput                    | 12.233 M ops/s       | 12.170 M ops/s (  ~  )
|   +- hits latency                       | 81.754 ns/op         | 82.185 ns/op   (  ~  )
|   +- important_hits throughput          | 12.233 M ops/s       | 12.170 M ops/s (  ~  )
|
| + num keys: 10000
| :                                         <before>             | <after>
| +-+ hashmap (control) sequential get    +----------------------+----------------------
|   +- hits throughput                    | 7.220 M ops/s        | 7.204 M ops/s  (  ~  )
|   +- hits latency                       | 138.522 ns/op        | 138.842 ns/op  (  ~  )
|   +- important_hits throughput          | 7.220 M ops/s        | 7.204 M ops/s  (  ~  )
|
| + num keys: 100000
| :                                         <before>             | <after>
| +-+ hashmap (control) sequential get    +----------------------+----------------------
|   +- hits throughput                    | 5.061 M ops/s        | 5.165 M ops/s  (+2.1%)
|   +- hits latency                       | 198.483 ns/op        | 194.270 ns/op  (-2.1%)
|   +- important_hits throughput          | 5.061 M ops/s        | 5.165 M ops/s  (+2.1%)
|
| + num keys: 4194304
| :                                         <before>             | <after>
| +-+ hashmap (control) sequential get    +----------------------+----------------------
|   +- hits throughput                    | 2.864 M ops/s        | 2.882 M ops/s  (  ~  )
|   +- hits latency                       | 365.220 ns/op        | 361.418 ns/op  (-1.0%)
|   +- important_hits throughput          | 2.864 M ops/s        | 2.882 M ops/s  (  ~  )
|
+---- Local Storage ----------------------
|
| + num_maps: 1
| :                                         <before>             | <after>
| +-+ local_storage cache sequential get  +----------------------+----------------------
|   +- hits throughput                    | 33.005 M ops/s       | 39.068 M ops/s (+18.4%)
|   +- hits latency                       | 30.300 ns/op         | 25.598 ns/op   (-15.5%)
|   +- important_hits throughput          | 33.005 M ops/s       | 39.068 M ops/s (+18.4%)
| :
| :                                         <before>             | <after>
| +-+ local_storage cache interleaved get +----------------------+----------------------
|   +- hits throughput                    | 37.151 M ops/s       | 44.926 M ops/s (+20.9%)
|   +- hits latency                       | 26.919 ns/op         | 22.259 ns/op   (-17.3%)
|   +- important_hits throughput          | 37.151 M ops/s       | 44.926 M ops/s (+20.9%)
|
| + num_maps: 10
| :                                         <before>             | <after>
| +-+ local_storage cache sequential get  +----------------------+----------------------
|   +- hits throughput                    | 32.288 M ops/s       | 38.099 M ops/s (+18.0%)
|   +- hits latency                       | 30.972 ns/op         | 26.248 ns/op   (-15.3%)
|   +- important_hits throughput          | 3.229 M ops/s        | 3.810 M ops/s  (+18.0%)
| :
| :                                         <before>             | <after>
| +-+ local_storage cache interleaved get +----------------------+----------------------
|   +- hits throughput                    | 34.473 M ops/s       | 41.145 M ops/s (+19.4%)
|   +- hits latency                       | 29.010 ns/op         | 24.307 ns/op   (-16.2%)
|   +- important_hits throughput          | 12.312 M ops/s       | 14.695 M ops/s (+19.4%)
|
| + num_maps: 16
| :                                         <before>             | <after>
| +-+ local_storage cache sequential get  +----------------------+----------------------
|   +- hits throughput                    | 32.524 M ops/s       | 38.341 M ops/s (+17.9%)
|   +- hits latency                       | 30.748 ns/op         | 26.083 ns/op   (-15.2%)
|   +- important_hits throughput          | 2.033 M ops/s        | 2.396 M ops/s  (+17.9%)
| :
| :                                         <before>             | <after>
| +-+ local_storage cache interleaved get +----------------------+----------------------
|   +- hits throughput                    | 34.575 M ops/s       | 41.338 M ops/s (+19.6%)
|   +- hits latency                       | 28.925 ns/op         | 24.193 ns/op   (-16.4%)
|   +- important_hits throughput          | 11.001 M ops/s       | 13.153 M ops/s (+19.6%)
|
| + num_maps: 17
| :                                         <before>             | <after>
| +-+ local_storage cache sequential get  +----------------------+----------------------
|   +- hits throughput                    | 28.861 M ops/s       | 32.756 M ops/s (+13.5%)
|   +- hits latency                       | 34.649 ns/op         | 30.530 ns/op   (-11.9%)
|   +- important_hits throughput          | 1.700 M ops/s        | 1.929 M ops/s  (+13.5%)
| :
| :                                         <before>             | <after>
| +-+ local_storage cache interleaved get +----------------------+----------------------
|   +- hits throughput                    | 31.529 M ops/s       | 36.110 M ops/s (+14.5%)
|   +- hits latency                       | 31.719 ns/op         | 27.697 ns/op   (-12.7%)
|   +- important_hits throughput          | 9.598 M ops/s        | 10.993 M ops/s (+14.5%)
|
| + num_maps: 24
| :                                         <before>             | <after>
| +-+ local_storage cache sequential get  +----------------------+----------------------
|   +- hits throughput                    | 18.602 M ops/s       | 19.937 M ops/s (+7.2%)
|   +- hits latency                       | 53.767 ns/op         | 50.166 ns/op   (-6.7%)
|   +- important_hits throughput          | 0.776 M ops/s        | 0.831 M ops/s  (+7.2%)
| :
| :                                         <before>             | <after>
| +-+ local_storage cache interleaved get +----------------------+----------------------
|   +- hits throughput                    | 21.718 M ops/s       | 23.332 M ops/s (+7.4%)
|   +- hits latency                       | 46.047 ns/op         | 42.865 ns/op   (-6.9%)
|   +- important_hits throughput          | 6.110 M ops/s        | 6.564 M ops/s  (+7.4%)
|
| + num_maps: 32
| :                                         <before>             | <after>
| +-+ local_storage cache sequential get  +----------------------+----------------------
|   +- hits throughput                    | 14.118 M ops/s       | 14.626 M ops/s (+3.6%)
|   +- hits latency                       | 70.856 ns/op         | 68.381 ns/op   (-3.5%)
|   +- important_hits throughput          | 0.442 M ops/s        | 0.458 M ops/s  (+3.6%)
| :
| :                                         <before>             | <after>
| +-+ local_storage cache interleaved get +----------------------+----------------------
|   +- hits throughput                    | 17.111 M ops/s       | 17.906 M ops/s (+4.6%)
|   +- hits latency                       | 58.451 ns/op         | 55.865 ns/op   (-4.4%)
|   +- important_hits throughput          | 4.776 M ops/s        | 4.998 M ops/s  (+4.6%)
|
| + num_maps: 100
| :                                         <before>             | <after>
| +-+ local_storage cache sequential get  +----------------------+----------------------
|   +- hits throughput                    | 5.281 M ops/s        | 5.528 M ops/s  (+4.7%)
|   +- hits latency                       | 192.398 ns/op        | 183.059 ns/op  (-4.9%)
|   +- important_hits throughput          | 0.053 M ops/s        | 0.055 M ops/s  (+4.9%)
| :
| :                                         <before>             | <after>
| +-+ local_storage cache interleaved get +----------------------+----------------------
|   +- hits throughput                    | 6.265 M ops/s        | 6.498 M ops/s  (+3.7%)
|   +- hits latency                       | 161.436 ns/op        | 152.877 ns/op  (-5.3%)
|   +- important_hits throughput          | 1.636 M ops/s        | 1.697 M ops/s  (+3.7%)
|
| + num_maps: 1000
| :                                         <before>             | <after>
| +-+ local_storage cache sequential get  +----------------------+----------------------
|   +- hits throughput                    | 0.355 M ops/s        | 0.354 M ops/s  (  ~  )
|   +- hits latency                       | 2826.538 ns/op       | 2827.139 ns/op (  ~  )
|   +- important_hits throughput          | 0.000 M ops/s        | 0.000 M ops/s  (  ~  )
| :
| :                                         <before>             | <after>
| +-+ local_storage cache interleaved get +----------------------+----------------------
|   +- hits throughput                    | 0.404 M ops/s        | 0.403 M ops/s  (  ~  )
|   +- hits latency                       | 2481.190 ns/op       | 2487.555 ns/op (  ~  )
|   +- important_hits throughput          | 0.102 M ops/s        | 0.101 M ops/s  (  ~  )

The on_lookup test in {cgrp,task}_ls_recursion.c is removed
because bpf_local_storage_lookup() is no longer traceable,
and adding a tracepoint would make the compiler generate worse
code: https://lore.kernel.org/bpf/ZcJmok64Xqv6l4ZS@elver.google.com/

Signed-off-by: Marco Elver <elver@google.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240207122626.3508658-1-elver@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-02-11 14:06:24 -08:00
Martin KaFai Lau
a7170d81e0 Merge branch 'bpf, btf: Add DEBUG_INFO_BTF checks for __register_bpf_struct_ops'
Geliang Tang says:

====================
bpf: Add DEBUG_INFO_BTF checks for __register_bpf_struct_ops

This patch set avoids a module loading failure when a module
tries to register a struct_ops and has its btf section
stripped. This then works similarly to module kfunc registration in
commit 3de4d22cc9ac ("bpf, btf: Warn but return no error for NULL btf from __register_btf_kfunc_id_set()").

v5:
 - drop CONFIG_MODULE_ALLOW_BTF_MISMATCH check as Martin suggested.

v4:
 - add a new patch to fix error checks for btf_get_module_btf.
 - rename the helper to check_btf_kconfigs.

v3:
 - fix this build error:
kernel/bpf/btf.c:7750:11: error: incomplete definition of type 'struct module'

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202402040934.Fph0XeEo-lkp@intel.com/

v2:
 - add register_check_missing_btf helper as Jiri suggested.
====================

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-02-08 11:40:18 -08:00
Geliang Tang
947e56f82f bpf, btf: Check btf for register_bpf_struct_ops
Similar to the handling in the functions __register_btf_kfunc_id_set()
and register_btf_id_dtor_kfuncs(), this patch uses the newly added
helper check_btf_kconfigs() to handle a module with its btf section
stripped.

While at it, the patch also adds the missing IS_ERR() check to fix
commit f6be98d19985 ("bpf, net: switch to dynamic registration")
Fixes: f6be98d19985 ("bpf, net: switch to dynamic registration")
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Link: https://lore.kernel.org/r/69082b9835463fe36f9e354bddf2d0a97df39c2b.1707373307.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-02-08 11:37:49 -08:00
Geliang Tang
9e60b0e025 bpf, btf: Add check_btf_kconfigs helper
This patch extracts the duplicate code on the error path taken when
btf_get_module_btf() returns NULL from the functions
__register_btf_kfunc_id_set() and register_btf_id_dtor_kfuncs() into a new
helper named check_btf_kconfigs(), which checks CONFIG_DEBUG_INFO_BTF and
CONFIG_DEBUG_INFO_BTF_MODULES.
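
A hedged sketch of what such a helper can look like (the actual
kernel/bpf/btf.c code may differ slightly):

  static int check_btf_kconfigs(const struct module *module, const char *feature)
  {
          if (!module && IS_ENABLED(CONFIG_DEBUG_INFO_BTF)) {
                  pr_err("missing vmlinux BTF, cannot register %s\n", feature);
                  return -ENOENT;
          }
          if (module && IS_ENABLED(CONFIG_DEBUG_INFO_BTF_MODULES))
                  pr_warn("missing module BTF, cannot register %s\n", feature);
          return 0;
  }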

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/fa5537fc55f1e4d0bfd686598c81b7ab9dbd82b7.1707373307.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-02-08 11:22:56 -08:00
Geliang Tang
b9a395f0f7 bpf, btf: Fix return value of register_btf_id_dtor_kfuncs
As with __register_btf_kfunc_id_set(), to let modules with a stripped
btf section load, this patch also changes the return value of
register_btf_id_dtor_kfuncs() from -ENOENT to 0 when btf is NULL.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Link: https://lore.kernel.org/r/eab65586d7fb0e72f2707d3747c7d4a5d60c823f.1707373307.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-02-08 11:05:07 -08:00
Masahiro Yamada
e55dad12ab bpf: Merge two CONFIG_BPF entries
'config BPF' exists in both init/Kconfig and kernel/bpf/Kconfig.

Commit b24abcff918a ("bpf, kconfig: Add consolidated menu entry for bpf
with core options") added the second one to kernel/bpf/Kconfig instead
of moving the existing one.

Merge them together.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20240204075634.32969-1-masahiroy@kernel.org
2024-02-07 16:38:20 -08:00
Yafang Shao
ba6a6abb3b selftests/bpf: Mark cpumask kfunc declarations as __weak
After the series "Annotate kfuncs in .BTF_ids section"[0], kfunc
declarations can be generated by bpftool. Let's mark the existing cpumask
kfunc declarations __weak so they don't conflict with definitions that will
eventually come from vmlinux.h.

[0]. https://lore.kernel.org/all/cover.1706491398.git.dxu@dxuuu.xyz
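
For example (illustrative declarations in the style of the selftests'
cpumask header, not necessarily the exact lines):

  /* now tolerant of a duplicate definition coming from a generated vmlinux.h */
  struct bpf_cpumask *bpf_cpumask_create(void) __ksym __weak;
  void bpf_cpumask_release(struct bpf_cpumask *cpumask) __ksym __weak;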

Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/bpf/20240206081416.26242-5-laoar.shao@gmail.com
2024-02-07 16:14:19 -08:00
Yafang Shao
a2bff65cfc selftests/bpf: Fix error checking for cpumask_success__load()
We should verify the return value of cpumask_success__load().

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240206081416.26242-4-laoar.shao@gmail.com
2024-02-07 16:14:19 -08:00
Andrii Nakryiko
abae1ac523 Merge branch 'tools-resolve_btfids-fix-cross-compilation-to-non-host-endianness'
Viktor Malik says:

====================
tools/resolve_btfids: fix cross-compilation to non-host endianness

The .BTF_ids section is pre-filled with zeroed BTF ID entries during the
build and afterwards patched by resolve_btfids with correct values.
Since resolve_btfids always writes in host-native endianness, it relies
on libelf to do the translation when the target ELF is cross-compiled to
a different endianness (this was introduced in commit 61e8aeda9398
("bpf: Fix libelf endian handling in resolv_btfids")).

Unfortunately, the translation will corrupt the flags fields of SET8
entries because these were written during vmlinux compilation and are in
the correct endianness already. This will lead to numerous selftests
failures such as:

    $ sudo ./test_verifier 502 502
    #502/p sleepable fentry accept FAIL
    Failed to load prog 'Invalid argument'!
    bpf_fentry_test1 is not sleepable
    verification time 34 usec
    stack depth 0
    processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
    Summary: 0 PASSED, 0 SKIPPED, 1 FAILED

Since it's not possible to instruct libelf to translate just certain
values, let's manually bswap the flags (both global and entry flags) in
resolve_btfids when needed, so that libelf then translates everything
correctly.

The first patch of the series refactors resolve_btfids by using types
from btf_ids.h instead of accessing the BTF ID data using magic offsets.
Acked-by: Jiri Olsa <jolsa@kernel.org>
---
Changes in v4:
- remove unnecessary vars and pointer casts (suggested by Daniel Xu)

Changes in v3:
- add byte swap of global 'flags' field in btf_id_set8 (suggested by
  Jiri Olsa)
- cleaner refactoring of sets_patch (suggested by Jiri Olsa)
- add compile-time assertion that IDs are at the beginning of pairs
  struct in btf_id_set8 (suggested by Daniel Borkmann)

Changes in v2:
- use type defs from btf_ids.h (suggested by Andrii Nakryiko)
====================

Link: https://lore.kernel.org/r/cover.1707223196.git.vmalik@redhat.com
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-02-07 15:56:18 -08:00
Viktor Malik
903fad4394 tools/resolve_btfids: Fix cross-compilation to non-host endianness
The .BTF_ids section is pre-filled with zeroed BTF ID entries during the
build and afterwards patched by resolve_btfids with correct values.
Since resolve_btfids always writes in host-native endianness, it relies
on libelf to do the translation when the target ELF is cross-compiled to
a different endianness (this was introduced in commit 61e8aeda9398
("bpf: Fix libelf endian handling in resolv_btfids")).

Unfortunately, the translation will corrupt the flags fields of SET8
entries because these were written during vmlinux compilation and are in
the correct endianness already. This will lead to numerous selftests
failures such as:

    $ sudo ./test_verifier 502 502
    #502/p sleepable fentry accept FAIL
    Failed to load prog 'Invalid argument'!
    bpf_fentry_test1 is not sleepable
    verification time 34 usec
    stack depth 0
    processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
    Summary: 0 PASSED, 0 SKIPPED, 1 FAILED

Since it's not possible to instruct libelf to translate just certain
values, let's manually bswap the flags (both global and entry flags) in
resolve_btfids when needed, so that libelf then translates everything
correctly.
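
A hedged sketch of the approach (simplified; `need_bswap` is an
illustrative condition and bswap_32() comes from <byteswap.h>):

  /* The SET8 flags were written in target endianness at vmlinux build
   * time.  Pre-swap them on a cross-endian host so that libelf's
   * whole-section translation leaves every field correct.
   */
  if (need_bswap) {
          set8->flags = bswap_32(set8->flags);
          for (i = 0; i < set8->cnt; i++)
                  set8->pairs[i].flags = bswap_32(set8->pairs[i].flags);
  }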

Fixes: ef2c6f370a63 ("tools/resolve_btfids: Add support for 8-byte BTF sets")
Signed-off-by: Viktor Malik <vmalik@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/7b6bff690919555574ce0f13d2a5996cacf7bf69.1707223196.git.vmalik@redhat.com
2024-02-07 15:56:18 -08:00
Viktor Malik
9707ac4fe2 tools/resolve_btfids: Refactor set sorting with types from btf_ids.h
Instead of using magic offsets to access BTF ID set data, leverage types
from btf_ids.h (btf_id_set and btf_id_set8) which define the actual
layout of the data. Thanks to this change, set sorting should also
continue working if the layout changes.

This requires to sync the definition of 'struct btf_id_set8' from
include/linux/btf_ids.h to tools/include/linux/btf_ids.h. We don't sync
the rest of the file at the moment, b/c that would require to also sync
multiple dependent headers and we don't need any other defs from
btf_ids.h.
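
For reference, the synced definition is roughly:

  struct btf_id_set8 {
          u32 cnt;
          u32 flags;
          struct {
                  u32 id;
                  u32 flags;
          } pairs[];
  };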

Signed-off-by: Viktor Malik <vmalik@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/bpf/ff7f062ddf6a00815fda3087957c4ce667f50532.1707223196.git.vmalik@redhat.com
2024-02-07 15:56:18 -08:00
Toke Høiland-Jørgensen
92a871ab9f libbpf: Use OPTS_SET() macro in bpf_xdp_query()
When the feature_flags and xdp_zc_max_segs fields were added to the libbpf
bpf_xdp_query_opts, the code writing them did not use the OPTS_SET() macro.
This causes libbpf to write to those fields unconditionally, which means
that programs compiled against an older version of libbpf (with a smaller
bpf_xdp_query_opts struct) will have their stack corrupted by libbpf
writing out of bounds.

The patch adding the feature_flags field has an early bail-out if the
feature_flags field is not part of the opts struct (via the OPTS_HAS()
macro), but the patch adding xdp_zc_max_segs does not. For consistency,
this fix simply changes the assignments to both fields to use the
OPTS_SET() macro.
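
A hedged sketch of the fixed assignments in bpf_xdp_query() (the local
`md` struct name is illustrative):

  /* only write back fields that exist in the caller's (possibly older
   * and smaller) bpf_xdp_query_opts struct
   */
  OPTS_SET(opts, feature_flags, md.flags);
  OPTS_SET(opts, xdp_zc_max_segs, md.xdp_zc_max_segs);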

Fixes: 13ce2daa259a ("xsk: add new netlink attribute dedicated for ZC max frags")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240206125922.1992815-1-toke@redhat.com
2024-02-06 09:51:26 -08:00
Jose E. Marchesi
c27aa462aa bpf: Use -Wno-address-of-packed-member in some selftests
[Differences from V2:
- Remove conditionals in the source files pragmas, as the
  pragma is supported by both GCC and clang.]

Both GCC and clang implement the -Waddress-of-packed-member warning,
enabled by -Wall, which warns about taking the address of a packed
struct field when it can lead to an "unaligned" address.

This triggers the following errors (-Werror) when building three
particular BPF selftests with GCC:

  progs/test_cls_redirect.c
  986 |         if (ipv4_is_fragment((void *)&encap->ip)) {
  progs/test_cls_redirect_dynptr.c
  410 |         pkt_ipv4_checksum((void *)&encap_gre->ip);
  progs/test_cls_redirect.c
  521 |         pkt_ipv4_checksum((void *)&encap_gre->ip);
  progs/test_tc_tunnel.c
   232 |         set_ipv4_csum((void *)&h_outer.ip);

These warnings do not signal any real problem in the tests as far as I
can see.

This patch adds pragmas to these test files that inhibit the
-Waddress-of-packed-member warning.
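
The pragma form added at the top of the affected programs is of this shape
(accepted by both GCC and clang):

  /* Inhibit -Waddress-of-packed-member for this translation unit. */
  #pragma GCC diagnostic ignored "-Waddress-of-packed-member"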

Tested in bpf-next master.
No regressions.

Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20240206102330.7113-1-jose.marchesi@oracle.com
2024-02-06 09:40:25 -08:00
Dave Thaler
563918a0e3 bpf, docs: Fix typos in instructions-set.rst
* "imm32" should just be "imm"
* Add blank line to fix formatting error reported by Stephen Rothwell [0]

[0]: https://lore.kernel.org/bpf/20240206153301.4ead0bad@canb.auug.org.au/T/#u

Signed-off-by: Dave Thaler <dthaler1968@gmail.com>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20240206045146.4965-1-dthaler1968@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-02-06 07:44:59 -08:00
Andrii Nakryiko
c7dcb6c9aa selftests/bpf: mark dynptr kfuncs __weak to make them optional on old kernels
Mark dynptr kfuncs as __weak to allow
verifier_global_subprogs/arg_ctx_{perf,kprobe,raw_tp} subtests to be
loadable on old kernels. Because bpf_dynptr_from_xdp() kfunc is used
from arg_tag_dynptr BPF program in progs/verifier_global_subprogs.c
*and* is not marked as __weak, loading any subtest from
verifier_global_subprogs fails on old kernels that don't have
bpf_dynptr_from_xdp() kfunc defined. Even if the arg_tag_dynptr program
itself is not loaded, libbpf bails out on the non-weak reference to
bpf_dynptr_from_xdp() (which can't be resolved), and that reference is
shared across all programs in progs/verifier_global_subprogs.c.

So mark all dynptr-related kfuncs as __weak to unblock libbpf CI ([0]).
In the upcoming "kfunc in vmlinux.h" work we should make sure that
kfuncs are always declared __weak as well.

  [0] https://github.com/libbpf/libbpf/actions/runs/7792673215/job/21251250831?pr=776#step:4:7961
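
For illustration, the declaration style in question (a sketch, not
necessarily the exact selftest line):

  extern int bpf_dynptr_from_xdp(struct xdp_md *xdp, __u64 flags,
                                 struct bpf_dynptr *ptr__uninit) __ksym __weak;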

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240206004008.1541513-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-02-05 20:02:38 -08:00
Andrii Nakryiko
d7bc416aa5 libbpf: fix return value for PERF_EVENT __arg_ctx type fix up check
If PERF_EVENT program has __arg_ctx argument with matching
architecture-specific pt_regs/user_pt_regs/user_regs_struct pointer
type, libbpf should still perform type rewrite for old kernels, but not
emit the warning. Fix a copy/paste from kernel code where 0 is meant to
signify the "no error" condition. For libbpf we need to return "true" to
proceed with the type rewrite (which for a PERF_EVENT program will be
the canonical `struct bpf_perf_event_data *` type).

Fixes: 9eea8fafe33e ("libbpf: fix __arg_ctx type enforcement for perf_event programs")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240206002243.1439450-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-02-05 20:01:46 -08:00
Alexei Starovoitov
6146fae67b Merge branch 'xsk-support-redirect-to-any-socket-bound-to-the-same-umem'
Magnus Karlsson says:

====================
xsk: support redirect to any socket bound to the same umem

This patch set adds support for directing a packet to any socket bound
to the same umem. This makes it possible to use the XDP program to
select what socket the packet should be received on. The user can
populate the XSKMAP with various sockets and as long as they share the
same umem, the XDP program can pick any one of them.

The implementation is straight-forward. Instead of testing that the
incoming packet is targeting the same device and queue id as the
socket is bound to, just check that the umem the packet was received
on is the same as the socket we want it to be received on. This
guarantees that the redirect is legal as it is already in the correct
umem.

Patch #1 implements the feature and patch #2 adds documentation.

Thanks: Magnus
====================

Link: https://lore.kernel.org/r/20240205123553.22180-1-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-02-05 20:01:16 -08:00
Magnus Karlsson
968595a936 xsk: document ability to redirect to any socket bound to the same umem
Document the ability to redirect to any socket bound to the same umem.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20240205123553.22180-3-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-02-05 20:01:15 -08:00
Magnus Karlsson
2863d665ea xsk: support redirect to any socket bound to the same umem
Add support for directing a packet to any socket bound to the same
umem. This makes it possible to use the XDP program to select what
socket the packet should be received on. The user can populate the
XSKMAP with various sockets and as long as they share the same umem,
the XDP program can pick any one of them.
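
A hedged sketch of the relaxed check in xsk_rcv_check() (simplified; the
actual net/xdp/xsk.c code may differ in detail):

  /* Before: the packet had to arrive on the exact device/queue the
   * socket is bound to:
   *
   *     if (xs->dev != dev || xs->queue_id != qid)
   *             return -EINVAL;
   *
   * After: any socket sharing the umem of the receiving queue's pool is
   * an acceptable target.
   */
  if (!dev->_rx[qid].pool || xs->umem != dev->_rx[qid].pool->umem)
          return -EINVAL;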

Suggested-by: Yuval El-Hanany <yuvale@radware.com>
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20240205123553.22180-2-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-02-05 20:01:15 -08:00
Alexei Starovoitov
20a286c1a3 Merge branch 'transfer-rcu-lock-state-across-subprog-calls'
Kumar Kartikeya Dwivedi says:

====================
Transfer RCU lock state across subprog calls

David suggested during the discussion in [0] that we should handle RCU
locks in a similar fashion to spin locks where the verifier understands
when a lock held in a caller is released in callee, or lock taken in
callee is released in a caller, or the callee is called within a lock
critical section. This set extends the same semantics to RCU read locks
and adds a few selftests to verify correct behavior. This issue has also
come up for sched-ext programs.

This would now allow static subprog calls to be made without errors
within RCU read sections, for subprogs to release RCU locks of callers
and return to them, or for subprogs to take RCU lock which is later
released in the caller.

  [0]: https://lore.kernel.org/bpf/20240204120206.796412-1-memxor@gmail.com

Changelog:
----------
v1 -> v2:
v1: https://lore.kernel.org/bpf/20240204230231.1013964-1-memxor@gmail.com

 * Add tests for global subprog behaviour (Yafang)
 * Add Acks, Tested-by (Yonghong, Yafang)
====================

Link: https://lore.kernel.org/r/20240205055646.1112186-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-02-05 20:00:14 -08:00
Kumar Kartikeya Dwivedi
8be6a0147a selftests/bpf: Add tests for RCU lock transfer between subprogs
Add selftests covering the following cases:
- A static or global subprog called from within a RCU read section works
- A static subprog taking an RCU read lock which is released in caller works
- A static subprog releasing the caller's RCU read lock works

Global subprogs that leave the lock in an imbalanced state will not
work, as they are verified separately, so ensure those cases fail as
well.
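
A hedged illustration of one covered pattern (simplified; not the actual
selftest source, with includes and the license declaration omitted):

  /* kfunc declarations as used from BPF programs */
  extern void bpf_rcu_read_lock(void) __ksym;
  extern void bpf_rcu_read_unlock(void) __ksym;

  /* static subprog that releases the RCU read lock taken by its caller */
  static __noinline int subprog_unlock(void)
  {
          bpf_rcu_read_unlock();
          return 0;
  }

  SEC("fentry/bpf_fentry_test1")
  int BPF_PROG(lock_in_caller_unlock_in_subprog)
  {
          bpf_rcu_read_lock();
          return subprog_unlock();
  }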

Acked-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20240205055646.1112186-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-02-05 20:00:14 -08:00
Kumar Kartikeya Dwivedi
6fceea0fa5 bpf: Transfer RCU lock state between subprog calls
Allow transferring an imbalanced RCU lock state between subprog calls
during verification. This allows patterns where a subprog call returns
with an RCU lock held, or a subprog call releases an RCU lock held by
the caller. Currently, the verifier would end up complaining if the RCU
lock is not released when processing an exit from a subprog, which is
non-ideal if its execution is supposed to be enclosed in an RCU read
section of the caller.

Instead, simply check whether we are processing an exit from frame #0
and do not complain about an active RCU lock otherwise. We only need to
update the check when processing the BPF_EXIT insn, as copy_verifier_state
is already set up to do the right thing.

Suggested-by: David Vernet <void@manifault.com>
Tested-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20240205055646.1112186-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-02-05 20:00:14 -08:00
Alexei Starovoitov
8244ab509f Merge branch 'enable-static-subprog-calls-in-spin-lock-critical-sections'
Kumar Kartikeya Dwivedi says:

====================
Enable static subprog calls in spin lock critical sections

This set allows a BPF program to make a call to a static subprog within
a bpf_spin_lock critical section. This problem has been hit in sched-ext
and ghOSt [0] as well, and is mostly an annoyance that is worked around
by inlining the static subprog into the critical section.

In case of sched-ext, there are a lot of other helper/kfunc calls that
need to be allow listed for the support to be complete, but a separate
follow up will deal with that.

Unlike static subprogs, global subprogs cannot be allowed yet as the
verifier will not explore their body when encountering a call
instruction for them. Therefore, we would need an alternative approach
(some sort of function summarization to ensure a lock is never taken
from a global subprog and all its callees).

 [0]: https://lore.kernel.org/bpf/bd173bf2-dea6-3e0e-4176-4a9256a9a056@google.com

Changelog:
----------
v1 -> v2
v1: https://lore.kernel.org/bpf/20240204120206.796412-1-memxor@gmail.com

 * Indicate global function call in verifier error string (Yonghong, David)
 * Add Acks from Yonghong, David
====================

Link: https://lore.kernel.org/r/20240204222349.938118-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-02-05 19:58:47 -08:00
Kumar Kartikeya Dwivedi
e8699c4ff8 selftests/bpf: Add test for static subprog call in lock cs
Add selftests for static subprog calls within bpf_spin_lock critical
section, and ensure we still reject global subprog calls. Also test the
case where a subprog call will unlock the caller's held lock, or the
caller will unlock a lock taken by a subprog call, ensuring correct
transfer of lock state across frames on exit.

Acked-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: David Vernet <void@manifault.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20240204222349.938118-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-02-05 19:58:47 -08:00
Kumar Kartikeya Dwivedi
a44b1334aa bpf: Allow calling static subprogs while holding a bpf_spin_lock
Currently, calling any helpers, kfuncs, or subprogs except the graph
data structure (lists, rbtrees) API kfuncs while holding a bpf_spin_lock
is not allowed. One of the original motivations of this decision was to
force the BPF programmer's hand into keeping the bpf_spin_lock critical
section small, and to ensure the execution time of the program does not
increase due to lock waiting times. In addition to this, some of the
helpers and kfuncs may be unsafe to call while holding a bpf_spin_lock.

However, when it comes to subprog calls, at least for static subprogs,
the verifier is able to explore their instructions during verification.
Therefore, it is similar in effect to having the same code inlined into
the critical section. Hence, not allowing static subprog calls in the
bpf_spin_lock critical section is mostly an annoyance that needs to be
worked around, without providing any tangible benefit.

Unlike static subprog calls, global subprog calls are not safe to permit
within the critical section, as the verifier does not explore them
during verification, therefore whether the same lock will be taken
again, or unlocked, cannot be ascertained.

Therefore, allow calling static subprogs within a bpf_spin_lock critical
section, and only reject it in case the subprog linkage is global.
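
A hedged sketch of the newly allowed shape (names are illustrative, not
the actual selftest):

  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  struct val_elem {
          struct bpf_spin_lock lock;
          int data;
  };

  struct {
          __uint(type, BPF_MAP_TYPE_ARRAY);
          __uint(max_entries, 1);
          __type(key, int);
          __type(value, struct val_elem);
  } vals SEC(".maps");

  /* static subprog: the verifier walks its body, so calling it from
   * inside the critical section is equivalent to inlining it
   */
  static __noinline int set_data(struct val_elem *e, int v)
  {
          e->data = v;
          return 0;
  }

  SEC("tc")
  int prog(struct __sk_buff *ctx)
  {
          struct val_elem *e;
          int key = 0;

          e = bpf_map_lookup_elem(&vals, &key);
          if (!e)
                  return 0;
          bpf_spin_lock(&e->lock);
          set_data(e, 1);  /* allowed now; a global subprog here is still rejected */
          bpf_spin_unlock(&e->lock);
          return 0;
  }

  char LICENSE[] SEC("license") = "GPL";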

Acked-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: David Vernet <void@manifault.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20240204222349.938118-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-02-05 19:58:47 -08:00
Dave Thaler
2d9a925d0f bpf, docs: Expand set of initial conformance groups
This patch attempts to update the ISA specification according
to the latest mailing list discussion about conformance groups,
in a way that is intended to be consistent with IANA registry
processes and IETF 118 WG meeting discussion.

It does the following:
* Split basic into base32 and base64 for 32-bit vs 64-bit base
  instructions
* Split division/multiplication/modulo instructions out of base groups
* Split atomic instructions out of base groups

There may be additional changes as discussion continues,
but there seems to be consensus on the principles above.

v1->v2: fixed typo pointed out by David Vernet

v2->v3: Moved multiplication to same groups as division/modulo

Signed-off-by: Dave Thaler <dthaler1968@gmail.com>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20240202221110.3872-1-dthaler1968@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-02-05 14:44:45 -08:00
Yonghong Song
e7f3187317 selftests/bpf: Fix flaky selftest lwt_redirect/lwt_reroute
Recently, when running './test_progs -j', I occasionally hit the
following errors:

  test_lwt_redirect:PASS:pthread_create 0 nsec
  test_lwt_redirect_run:FAIL:netns_create unexpected error: 256 (errno 0)
  #142/2   lwt_redirect/lwt_redirect_normal_nomac:FAIL
  #142     lwt_redirect:FAIL
  test_lwt_reroute:PASS:pthread_create 0 nsec
  test_lwt_reroute_run:FAIL:netns_create unexpected error: 256 (errno 0)
  test_lwt_reroute:PASS:pthread_join 0 nsec
  #143/2   lwt_reroute/lwt_reroute_qdisc_dropped:FAIL
  #143     lwt_reroute:FAIL

The netns_create() definition looks like below:

  #define NETNS "ns_lwt"
  static inline int netns_create(void)
  {
        return system("ip netns add " NETNS);
  }

One possibility is that both lwt_redirect and lwt_reroute create
netns with the same name "ns_lwt" which may cause conflict. I tried
the following example:
  $ sudo ip netns add abc
  $ echo $?
  0
  $ sudo ip netns add abc
  Cannot create namespace file "/var/run/netns/abc": File exists
  $ echo $?
  1
  $

The return code from the above netns_create() is 256, i.e. system()
reporting an exit status of 1 from 'ip netns add ns_lwt', which matches
the 'sudo ip netns add abc' example above.

This patch uses different netns names for the two tests to avoid the
'ip netns add <name>' failure.

I ran './test_progs -j' 10 times and all succeeded with
lwt_redirect/lwt_reroute tests.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20240205052914.1742687-1-yonghong.song@linux.dev
2024-02-05 11:13:36 -08:00
Kui-Feng Lee
169e650069 selftests/bpf: Suppress warning message of an unused variable.
"r" is used to receive the return value of test_2 in bpf_testmod.c, but it
is not actually used. So, we remove "r" and change the return type to
"void".

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202401300557.z5vzn8FM-lkp@intel.com/
Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240204061204.1864529-1-thinker.li@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-02-05 10:52:38 -08:00
Yonghong Song
7e428638bd selftests/bpf: Fix flaky test ptr_untrusted
Somehow recently I frequently hit the following test failure
with either ./test_progs or ./test_progs-cpuv4:
  serial_test_ptr_untrusted:PASS:skel_open 0 nsec
  serial_test_ptr_untrusted:PASS:lsm_attach 0 nsec
  serial_test_ptr_untrusted:PASS:raw_tp_attach 0 nsec
  serial_test_ptr_untrusted:FAIL:cmp_tp_name unexpected cmp_tp_name: actual -115 != expected 0
  #182     ptr_untrusted:FAIL

Further investigation found that the failure is due to
  bpf_probe_read_user_str()
where reading the user-level string attr->raw_tracepoint.name
does not succeed, most likely because the string itself is still
on disk and not yet populated into memory.

One solution is to do a printf() call on the string before doing the bpf
syscall, which would force raw_tracepoint.name into memory.
But I think a more robust solution is to use bpf_copy_from_user(),
which is usable in sleepable programs and can tolerate page faults;
the fix here uses the latter approach.
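
A hedged sketch of the more robust read (variable names are illustrative;
the helper is only available to sleepable programs, e.g. an lsm.s hook):

  /* bpf_copy_from_user() can tolerate a page fault on the user string,
   * unlike bpf_probe_read_user_str().
   */
  bpf_copy_from_user(tp_name, sizeof(tp_name),
                     (const void *)attr->raw_tracepoint.name);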

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240204194452.2785936-1-yonghong.song@linux.dev
2024-02-05 10:48:41 -08:00
Kui-Feng Lee
df9705eaa0 bpf: Remove an unnecessary check.
The "i" here is always equal to "btf_type_vlen(t)" since
the "for_each_member()" loop never breaks.

Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240203055119.2235598-1-thinker.li@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-02-05 10:25:08 -08:00
Alexei Starovoitov
2a79690eae Merge branch 'two-small-fixes-for-global-subprog-tagging'
Andrii Nakryiko says:

====================
Two small fixes for global subprog tagging

Fix a bug with passing trusted PTR_TO_BTF_ID_OR_NULL register into global
subprog that expects `__arg_trusted __arg_nullable` arguments, which was
discovered when adopting production BPF application.

Also fix annoying warnings that are irrelevant for static subprogs, which are
just an artifact of using btf_prepare_func_args() for both static and global
subprogs.
====================

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20240202190529.2374377-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-02-02 18:09:06 -08:00
Andrii Nakryiko
1eb986746a bpf: don't emit warnings intended for global subprogs for static subprogs
When btf_prepare_func_args() was generalized to handle both static and
global subprogs, a few warnings/errors that are meant only for global
subprog cases started to be emitted for static subprogs, where they are
sort of expected and irrelevant.

Stop polluting verifier logs with irrelevant scary-looking messages.

Fixes: e26080d0da87 ("bpf: prepare btf_prepare_func_args() for handling static subprogs")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240202190529.2374377-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-02-02 18:08:59 -08:00
Andrii Nakryiko
e2e70535dd selftests/bpf: add more cases for __arg_trusted __arg_nullable args
Add extra layer of global functions to ensure that passing around
(trusted) PTR_TO_BTF_ID_OR_NULL registers works as expected. We also
extend the trusted_task_arg_nullable subtest to check three possible valid
arguments: known NULL, known non-NULL, and maybe-NULL cases.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240202190529.2374377-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-02-02 18:08:58 -08:00
Andrii Nakryiko
8f13c34087 bpf: handle trusted PTR_TO_BTF_ID_OR_NULL in argument check logic
Add PTR_TRUSTED | PTR_MAYBE_NULL modifiers for PTR_TO_BTF_ID to
check_reg_type() to support passing trusted nullable PTR_TO_BTF_ID
registers into global functions accepting `__arg_trusted __arg_nullable`
arguments. This hasn't been caught earlier because tests were either
passing known non-NULL PTR_TO_BTF_ID registers or known NULL (SCALAR)
registers.

When utilizing this functionality in a complicated real-world BPF
application that passes around PTR_TO_BTF_ID_OR_NULL, it became apparent
that the verifier rejects a valid case because check_reg_type() doesn't
handle it explicitly. The existing check_reg_type() logic is already
anticipating this combination, so we just need to explicitly list this
combo in the switch statement.
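
A hedged sketch of the kind of addition described (simplified; the exact
surrounding cases in verifier.c may differ):

  case PTR_TO_BTF_ID | PTR_TRUSTED:
  case PTR_TO_BTF_ID | PTR_TRUSTED | PTR_MAYBE_NULL:  /* newly listed combo */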

Fixes: e2b3c4ff5d18 ("bpf: add __arg_trusted global func arg tag")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240202190529.2374377-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-02-02 18:08:58 -08:00
Shung-Hsi Yu
a68b50f47b selftests/bpf: trace_helpers.c: do not use poisoned type
After commit c698eaebdf47 ("selftests/bpf: trace_helpers.c: Optimize
kallsyms cache") trace_helpers.c now includes libbpf_internal.h, and
thus can no longer use the u32 type (among others) since those are poisoned
in libbpf_internal.h. Replace u32 with __u32 to fix the following error
when building trace_helpers.c on powerpc:

  error: attempt to use poisoned "u32"

Fixes: c698eaebdf47 ("selftests/bpf: trace_helpers.c: Optimize kallsyms cache")
Signed-off-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20240202095559.12900-1-shung-hsi.yu@suse.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-02-02 14:14:03 -08:00
Andrii Nakryiko
6fb3f72702 Merge branch 'improvements-for-tracking-scalars-in-the-bpf-verifier'
Maxim Mikityanskiy says:

====================
Improvements for tracking scalars in the BPF verifier

From: Maxim Mikityanskiy <maxim@isovalent.com>

The goal of this series is to extend the verifier's capabilities of
tracking scalars when they are spilled to stack, especially when the
spill or fill is narrowing. It also contains a fix by Eduard for
infinite loop detection and a state pruning optimization by Eduard that
compensates for a verification complexity regression introduced by
tracking unbounded scalars. These improvements reduce the surface of
false rejections that I saw while working on Cilium codebase.

Patches 1-9 of the original series were previously applied in v2.

Patches 1-2 (Maxim): Support the case when boundary checks are first
performed after the register was spilled to the stack.

Patches 3-4 (Maxim): Support narrowing fills.

Patches 5-6 (Eduard): Optimization for state pruning in stacksafe() to
mitigate the verification complexity regression.

veristat -e file,prog,states -f '!states_diff<50' -f '!states_pct<10' -f '!states_a<10' -f '!states_b<10' -C ...

 * Without patch 5:

File                  Program   States (A)  States (B)  States    (DIFF)
--------------------  --------  ----------  ----------  ----------------
pyperf100.bpf.o       on_event        4878        6528   +1650 (+33.83%)
pyperf180.bpf.o       on_event        6936       11032   +4096 (+59.05%)
pyperf600.bpf.o       on_event       22271       39455  +17184 (+77.16%)
pyperf600_iter.bpf.o  on_event         400         490     +90 (+22.50%)
strobemeta.bpf.o      on_event        4895       14028  +9133 (+186.58%)

 * With patch 5:

File                     Program        States (A)  States (B)  States   (DIFF)
-----------------------  -------------  ----------  ----------  ---------------
bpf_xdp.o                tail_lb_ipv4         2770        2224   -546 (-19.71%)
pyperf100.bpf.o          on_event             4878        5848   +970 (+19.89%)
pyperf180.bpf.o          on_event             6936        8868  +1932 (+27.85%)
pyperf600.bpf.o          on_event            22271       29656  +7385 (+33.16%)
pyperf600_iter.bpf.o     on_event              400         450    +50 (+12.50%)
xdp_synproxy_kern.bpf.o  syncookie_tc          280         226    -54 (-19.29%)
xdp_synproxy_kern.bpf.o  syncookie_xdp         302         228    -74 (-24.50%)

v2 changes:

Fixed comments in patch 1, moved endianness checks to header files in
patch 12 where possible, added Eduard's ACKs.

v3 changes:

Maxim: Removed __is_scalar_unbounded altogether, addressed Andrii's
comments.

Eduard: Patch #5 (#14 in v2) changed significantly:
- Logical changes:
  - Handling of STACK_{MISC,ZERO} mix turned out to be incorrect:
    a mix of MISC and ZERO in the old state is not equivalent to e.g.
    just MISC in the current state, because the verifier could have
    deduced zero scalars from ZERO slots in the old state for some loads.
  - There is no reason to limit the change only to cases when
    old or current stack is a spill of unbounded scalar,
    it is valid to compare any 64-bit scalar spill with fake
    register impersonating MISC.
  - STACK_ZERO vs spilled zero case was dropped,
    after recent changes for zero handling by Andrii and Yonghong
    it is hard (impossible?) to conjure all ZERO slots for an spi.
    => the case does not make any difference in veristat results.
- Use global static variable for unbound_reg (Andrii)
- Code shuffling to remove duplication in stacksafe() (Andrii)
====================

Link: https://lore.kernel.org/r/20240127175237.526726-1-maxtram95@gmail.com
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-02-02 13:22:15 -08:00
Eduard Zingerman
73a28d9d00 selftests/bpf: States pruning checks for scalar vs STACK_MISC
Check that stacksafe() compares spilled scalars with STACK_MISC.
The following combinations are explored:
- old spill of imprecise scalar is equivalent to cur STACK_{MISC,INVALID}
  (plus error in unpriv mode);
- old spill of precise scalar is not equivalent to cur STACK_MISC;
- old STACK_MISC is equivalent to cur scalar;
- old STACK_MISC is not equivalent to cur non-scalar.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240127175237.526726-7-maxtram95@gmail.com
2024-02-02 13:22:14 -08:00
Eduard Zingerman
6efbde200b bpf: Handle scalar spill vs all MISC in stacksafe()
When check_stack_read_fixed_off() reads a value from an spi
whose stack slots are all set to STACK_{MISC,INVALID},
the destination register is set to an unbound SCALAR_VALUE.

Exploit this fact by allowing stacksafe() to use a fake
unbound scalar register to compare 'mmmm mmmm' stack value
in old state vs spilled 64-bit scalar in current state
and vice versa.

Veristat results after this patch show some gains:

./veristat -C -e file,prog,states -f 'states_pct>10'  not-opt after
File                     Program                States   (DIFF)
-----------------------  ---------------------  ---------------
bpf_overlay.o            tail_rev_nodeport_lb4    -45 (-15.85%)
bpf_xdp.o                tail_lb_ipv4            -541 (-19.57%)
pyperf100.bpf.o          on_event                -680 (-10.42%)
pyperf180.bpf.o          on_event               -2164 (-19.62%)
pyperf600.bpf.o          on_event               -9799 (-24.84%)
strobemeta.bpf.o         on_event               -9157 (-65.28%)
xdp_synproxy_kern.bpf.o  syncookie_tc             -54 (-19.29%)
xdp_synproxy_kern.bpf.o  syncookie_xdp            -74 (-24.50%)

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240127175237.526726-6-maxtram95@gmail.com
2024-02-02 13:22:14 -08:00
Maxim Mikityanskiy
067313a85c selftests/bpf: Add test cases for narrowing fill
The previous commit allowed to preserve boundaries and track IDs of
scalars on narrowing fills. Add test cases for that pattern.

Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20240127175237.526726-5-maxtram95@gmail.com
2024-02-02 13:22:14 -08:00
Maxim Mikityanskiy
c1e6148cb4 bpf: Preserve boundaries and track scalars on narrowing fill
When the width of a fill is smaller than the width of the preceding
spill, the information about scalar boundaries can still be preserved,
as long as it's coerced to the right width (done by coerce_reg_to_size).
Even further, if the actual value fits into the fill width, the ID can
be preserved as well for further tracking of equal scalars.

Implement the above improvements, which makes narrowing fills behave the
same as narrowing spills and MOVs between registers.

Two tests are adjusted to accommodate endianness differences and to
take into account that it's now allowed to do a narrowing fill from the
least significant bits.

reg_bounds_sync is added to coerce_reg_to_size to correctly adjust
umin/umax boundaries after the var_off truncation, for example, a 64-bit
value 0xXXXXXXXX00000000, when read as a 32-bit, gets umin = 0, umax =
0xFFFFFFFF, var_off = (0x0; 0xffffffff00000000), which needs to be
synced down to umax = 0, otherwise reg_bounds_sanity_check doesn't pass.

Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240127175237.526726-4-maxtram95@gmail.com
2024-02-02 13:22:14 -08:00
Maxim Mikityanskiy
6be503cec6 selftests/bpf: Test tracking spilled unbounded scalars
The previous commit added tracking for unbounded scalars on spill. Add
the test case to check the new functionality.

Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20240127175237.526726-3-maxtram95@gmail.com
2024-02-02 13:22:14 -08:00