IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Now that this test runs in netdev CI it looks like 10s isn't enough
for debug kernels:
selftests: net/netfilter: nft_flowtable.sh
2024/05/10 20:33:08 socat[12204] E write(7, 0x563feb16a000, 8192): Broken pipe
FAIL: file mismatch for ns1 -> ns2
-rw------- 1 root root 37345280 May 10 20:32 /tmp/tmp.Am0yEHhNqI
...
Looks like socat gets zapped too quickly, so increase timeout to 1m.
Could also reduce tx file size for KSFT_MACHINE_SLOW, but its preferrable
to have same test for both debug and nondebug.
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20240511064814.561525-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Joachim kindly merged the IPv6 support in
https://github.com/troglobit/mtools/pull/2, so we can just use his
version now. A few more fixes subsequently came in for IPv6, so even
better.
Check that the deployed mtools version is 3.0 or above. Note that the
version check breaks compatibility with my fork where I didn't bump the
version, but I assume that won't be a problem.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20240510112856.1262901-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Setting LED_OFF via brightness_set should deactivate hw control, so make
sure netdev trigger rules also get cleared in that case.
This fixes unwanted restoration of the default netdev trigger rules and
matches the behaviour when using the 'netdev' trigger without any
hardware offloading.
Fixes: 71e79430117d ("net: phy: air_en8811h: Add the Airoha EN8811H PHY driver")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://lore.kernel.org/r/5ed8ea615890a91fa4df59a7ae8311bbdf63cdcf.1715248281.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEN9lkrMBJgcdVAPub1V2XiooUIOQFAmZA578ACgkQ1V2XiooU
IOS18g//Zyuv+23GcUM7+FEXrlMN658xJyWiYvKjOaZZx5ZiV0QdZc4cbfPFD44p
qZBMmVC/WVoC89SLdwH1W47KoJU0xsK3/OdqGHHeNJ69111wIQMpOLfAetS2K0mb
F+Ue2vyWg1GQDICGsCdenHX7ihtVvnJJkomxc+3ObxtLCNsb2Dsr6JM5hMVP5Bil
4UZnPsrgfWy3A8O92burlPVE1sTWDFfFUGIf8geJc4QadwkgufkzxMhXNO7xHlpG
EZ99s8FPyD3R6tRPjf4gwdjr7JjinrdrYjZDuS4d3Uv8pKlUqcx8PgXG51/unr/y
qlynLXtEc1QU6SO2jENosHAG2/LQG2zsYEiiLFCP+a1JOtOxevZQKx8MyAFW8xDX
+RQhcBpTBocIyJ/tCDoM9lp69iYTR196Ct48v6pSGMNhZcddT4K4BkUL47GEs8T3
IA5x8h5gV2Q9ECMgqSaycdUsfLNgE/6fWx0ROs/wo3tMsgWrXCSJi8RFtN1sNbIO
rfuNnQiETIFBkQxBi7um8jadxdfIHm65cjgZBCyVbNNml3JwjYvXLxCXt2G7LxC4
Sg4nZvIbqWIifoMc1aQKypvFZjzzsWtFmYCuEUVLrnpj2SFTyh5CNzNo3MlHf7LG
sRb/XubdY6e0spLzd5VDjwH5qOT3poWAccatRr5BVUarxXCD5Vs=
=RD9p
-----END PGP SIGNATURE-----
Merge tag 'nf-next-24-05-12' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for net-next:
Patch #1 skips transaction if object type provides no .update interface.
Patch #2 skips NETDEV_CHANGENAME which is unused.
Patch #3 enables conntrack to handle Multicast Router Advertisements and
Multicast Router Solicitations from the Multicast Router Discovery
protocol (RFC4286) as untracked opposed to invalid packets.
From Linus Luessing.
Patch #4 updates DCCP conntracker to mark invalid as invalid, instead of
dropping them, from Jason Xing.
Patch #5 uses NF_DROP instead of -NF_DROP since NF_DROP is 0,
also from Jason.
Patch #6 removes reference in netfilter's sysctl documentation on pickup
entries which were already removed by Florian Westphal.
Patch #7 removes check for IPS_OFFLOAD flag to disable early drop which
allows to evict entries from the conntrack table,
also from Florian.
Patches #8 to #16 updates nf_tables pipapo set backend to allocate
the datastructure copy on-demand from preparation phase,
to better deal with OOM situations where .commit step is too late
to fail. Series from Florian Westphal.
Patch #17 adds a selftest with packetdrill to cover conntrack TCP state
transitions, also from Florian.
Patch #18 use GFP_KERNEL to clone elements from control plane to avoid
quick atomic reserves exhaustion with large sets, reporter refers
to million entries magnitude.
* tag 'nf-next-24-05-12' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
netfilter: nf_tables: allow clone callbacks to sleep
selftests: netfilter: add packetdrill based conntrack tests
netfilter: nft_set_pipapo: remove dirty flag
netfilter: nft_set_pipapo: move cloning of match info to insert/removal path
netfilter: nft_set_pipapo: prepare pipapo_get helper for on-demand clone
netfilter: nft_set_pipapo: merge deactivate helper into caller
netfilter: nft_set_pipapo: prepare walk function for on-demand clone
netfilter: nft_set_pipapo: prepare destroy function for on-demand clone
netfilter: nft_set_pipapo: make pipapo_clone helper return NULL
netfilter: nft_set_pipapo: move prove_locking helper around
netfilter: conntrack: remove flowtable early-drop test
netfilter: conntrack: documentation: remove reference to non-existent sysctl
netfilter: use NF_DROP instead of -NF_DROP
netfilter: conntrack: dccp: try not to drop skb in conntrack
netfilter: conntrack: fix ct-state for ICMPv6 Multicast Router Discovery
netfilter: nf_tables: remove NETDEV_CHANGENAME from netdev chain event handler
netfilter: nf_tables: skip transaction if update object is not implemented
====================
Link: https://lore.kernel.org/r/20240512161436.168973-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
[Changes from V1:
- The __compat_break has been abandoned in favor of
a more readable can_loop macro that can be used anywhere, including
loop conditions.]
The macro list_for_each_entry is defined in bpf_arena_list.h as
follows:
#define list_for_each_entry(pos, head, member) \
for (void * ___tmp = (pos = list_entry_safe((head)->first, \
typeof(*(pos)), member), \
(void *)0); \
pos && ({ ___tmp = (void *)pos->member.next; 1; }); \
cond_break, \
pos = list_entry_safe((void __arena *)___tmp, typeof(*(pos)), member))
The macro cond_break, in turn, expands to a statement expression that
contains a `break' statement. Compound statement expressions, and the
subsequent ability of placing statements in the header of a `for'
loop, are GNU extensions.
Unfortunately, clang implements this GNU extension differently than
GCC:
- In GCC the `break' statement is bound to the containing "breakable"
context in which the defining `for' appears. If there is no such
context, GCC emits a warning: break statement without enclosing `for'
o `switch' statement.
- In clang the `break' statement is bound to the defining `for'. If
the defining `for' is itself inside some breakable construct, then
clang emits a -Wgcc-compat warning.
This patch adds a new macro can_loop to bpf_experimental, that
implements the same logic than cond_break but evaluates to a boolean
expression. The patch also changes all the current instances of usage
of cond_break withing the header of loop accordingly.
Tested in bpf-next master.
No regressions.
Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
Cc: david.faust@oracle.com
Cc: cupertino.miranda@oracle.com
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Link: https://lore.kernel.org/r/20240511212243.23477-1-jose.marchesi@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The BPF selftest global_func10 in progs/test_global_func10.c contains:
struct Small {
long x;
};
struct Big {
long x;
long y;
};
[...]
__noinline int foo(const struct Big *big)
{
if (!big)
return 0;
return bpf_get_prandom_u32() < big->y;
}
[...]
SEC("cgroup_skb/ingress")
__failure __msg("invalid indirect access to stack")
int global_func10(struct __sk_buff *skb)
{
const struct Small small = {.x = skb->len };
return foo((struct Big *)&small) ? 1 : 0;
}
GCC emits a "maybe uninitialized" warning for the code above, because
it knows `foo' accesses `big->y'.
Since the purpose of this selftest is to check that the verifier will
fail on this sort of invalid memory access, this patch just silences
the compiler warning.
Tested in bpf-next master.
No regressions.
Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
Cc: david.faust@oracle.com
Cc: cupertino.miranda@oracle.com
Cc: Yonghong Song <yonghong.song@linux.dev>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20240511212349.23549-1-jose.marchesi@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The BPF selftest test_global_func9.c performs type punning and breaks
srict-aliasing rules.
In particular, given:
int global_func9(struct __sk_buff *skb)
{
int result = 0;
[...]
{
const struct C c = {.x = skb->len, .y = skb->family };
result |= foo((const struct S *)&c);
}
}
When building with strict-aliasing enabled (the default) the
initialization of `c' gets optimized away in its entirely:
[... no initialization of `c' ...]
r1 = r10
r1 += -40
call foo
w0 |= w6
Since GCC knows that `foo' accesses s->x, we get a "maybe
uninitialized" warning.
On the other hand, when strict-aliasing is disabled GCC only optimizes
away the store to `.y':
r1 = *(u32 *) (r6+0)
*(u32 *) (r10+-40) = r1 ; This is .x = skb->len in `c'
r1 = r10
r1 += -40
call foo
w0 |= w6
In this case the warning is not emitted, because s-> is initialized.
This patch disables strict aliasing in this test when building with
GCC. clang seems to not optimize this particular code even when
strict aliasing is enabled.
Tested in bpf-next master.
Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
Cc: david.faust@oracle.com
Cc: cupertino.miranda@oracle.com
Cc: Yonghong Song <yonghong.song@linux.dev>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20240511212213.23418-1-jose.marchesi@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The strdup() function returns a pointer to a new string which is a
duplicate of the string "ifname". Memory for the new string is obtained
with malloc(), and need to be freed with free().
This patch adds this missing "free(saved_hwtstamp_ifname)" in cleanup()
to avoid a potential memory leak in xdp_hw_metadata.c.
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Link: https://lore.kernel.org/r/af9bcccb96655e82de5ce2b4510b88c9c8ed5ed0.1715417367.git.tanggeliang@kylinos.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch corrects a few warnings to allow selftests to compile for
GCC.
-- progs/cpumask_failure.c --
progs/bpf_misc.h:136:22: error: ‘cpumask’ is used uninitialized
[-Werror=uninitialized]
136 | #define __sink(expr) asm volatile("" : "+g"(expr))
| ^~~
progs/cpumask_failure.c:68:9: note: in expansion of macro ‘__sink’
68 | __sink(cpumask);
The macro __sink(cpumask) with the '+' contraint modifier forces the
the compiler to expect a read and write from cpumask. GCC detects
that cpumask is never initialized and reports an error.
This patch removes the spurious non required definitions of cpumask.
-- progs/dynptr_fail.c --
progs/dynptr_fail.c:1444:9: error: ‘ptr1’ may be used uninitialized
[-Werror=maybe-uninitialized]
1444 | bpf_dynptr_clone(&ptr1, &ptr2);
Many of the tests in the file are related to the detection of
uninitialized pointers by the verifier. GCC is able to detect possible
uninitialized values, and reports this as an error.
The patch initializes all of the previous uninitialized structs.
-- progs/test_tunnel_kern.c --
progs/test_tunnel_kern.c:590:9: error: array subscript 1 is outside
array bounds of ‘struct geneve_opt[1]’ [-Werror=array-bounds=]
590 | *(int *) &gopt.opt_data = bpf_htonl(0xdeadbeef);
| ^~~~~~~~~~~~~~~~~~~~~~~
progs/test_tunnel_kern.c:575:27: note: at offset 4 into object ‘gopt’ of
size 4
575 | struct geneve_opt gopt;
This tests accesses beyond the defined data for the struct geneve_opt
which contains as last field "u8 opt_data[0]" which clearly does not get
reserved space (in stack) in the function header. This pattern is
repeated in ip6geneve_set_tunnel and geneve_set_tunnel functions.
GCC is able to see this and emits a warning.
The patch introduces a local struct that allocates enough space to
safely allow the write to opt_data field.
-- progs/jeq_infer_not_null_fail.c --
progs/jeq_infer_not_null_fail.c:21:40: error: array subscript ‘struct
bpf_map[0]’ is partly outside array bounds of ‘struct <anonymous>[1]’
[-Werror=array-bounds=]
21 | struct bpf_map *inner_map = map->inner_map_meta;
| ^~
progs/jeq_infer_not_null_fail.c:14:3: note: object ‘m_hash’ of size 32
14 | } m_hash SEC(".maps");
This example defines m_hash in the context of the compilation unit and
casts it to struct bpf_map which is much smaller than the size of struct
bpf_map. It errors out in GCC when it attempts to access an element that
would be defined in struct bpf_map outsize of the defined limits for
m_hash.
This patch disables the warning through a GCC pragma.
This changes were tested in bpf-next master selftests without any
regressions.
Signed-off-by: Cupertino Miranda <cupertino.miranda@oracle.com>
Cc: jose.marchesi@oracle.com
Cc: david.faust@oracle.com
Cc: Yonghong Song <yonghong.song@linux.dev>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Link: https://lore.kernel.org/r/20240510183850.286661-2-cupertino.miranda@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch fixes an integer overflow warning raised by GCC in
xdp_prognum1 of progs/test_xdp_vlan.c:
GCC-BPF [test_maps] test_xdp_vlan.bpf.o
progs/test_xdp_vlan.c: In function 'xdp_prognum1':
progs/test_xdp_vlan.c:163:25: error: integer overflow in expression
'(short int)(((__builtin_constant_p((int)vlan_hdr->h_vlan_TCI)) != 0
? (int)(short unsigned int)((short int)((int)vlan_hdr->h_vlan_TCI
<< 8 >> 8) << 8 | (short int)((int)vlan_hdr->h_vlan_TCI << 0 >> 8
<< 0)) & 61440 : (int)__builtin_bswap16(vlan_hdr->h_vlan_TCI)
& 61440) << 8 >> 8) << 8' of type 'short int' results in '0' [-Werror=overflow]
163 | bpf_htons((bpf_ntohs(vlan_hdr->h_vlan_TCI) & 0xf000)
| ^~~~~~~~~
The problem lies with the expansion of the bpf_htons macro and the
expression passed into it. The bpf_htons macro (and similarly the
bpf_ntohs macro) expand to a ternary operation using either
__builtin_bswap16 or ___bpf_swab16 to swap the bytes, depending on
whether the expression is constant.
For an expression, with 'value' as a u16, like:
bpf_htons (value & 0xf000)
The entire (value & 0xf000) is 'x' in the expansion of ___bpf_swab16
and we get as one part of the expanded swab16:
((__u16)(value & 0xf000) << 8 >> 8 << 8
This will always evaluate to 0, which is intentional since this
subexpression deals with the byte guaranteed to be 0 by the mask.
However, GCC warns because the precise reason this always evaluates to 0
is an overflow. Specifically, the plain 0xf000 in the expression is a
signed 32-bit integer, which causes 'value' to also be promoted to a
signed 32-bit integer, and the combination of the 8-bit left shift and
down-cast back to __u16 results in a signed overflow (really a 'warning:
overflow in conversion from int to __u16' which is propegated up through
the rest of the expression leading to the ultimate overflow warning
above), which is a valid warning despite being the intended result of
this code.
Clang does not warn on this case, likely because it performs constant
folding later in the compilation process relative to GCC. It seems that
by the time clang does constant folding for this expression, the side of
the ternary with this overflow has already been discarded.
Fortunately, this warning is easily silenced by simply making the 0xf000
mask explicitly unsigned. This has no impact on the result.
Signed-off-by: David Faust <david.faust@oracle.com>
Cc: jose.marchesi@oracle.com
Cc: cupertino.miranda@oracle.com
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240508193512.152759-1-david.faust@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Remove the redundant ethtool.h header file from tools/include/uapi/linux.
The file is unnecessary as the system uses the kernel's
include/uapi/linux/ethtool.h directly.
Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20240508104123.434769-1-tushar.vyavahare@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Jordan Rife says:
====================
Retire progs/test_sock_addr.c
This patch series migrates remaining tests from bpf/test_sock_addr.c to
prog_tests/sock_addr.c and progs/verifier_sock_addr.c in order to fully
retire the old-style test program and expands test coverage to test
previously untested scenarios related to sockaddr hooks.
This is a continuation of the work started recently during the expansion
of prog_tests/sock_addr.c.
Link: https://lore.kernel.org/bpf/20240429214529.2644801-1-jrife@google.com/T/#u
=======
Patches
=======
* Patch 1 moves tests that check valid return values for recvmsg hooks
into progs/verifier_sock_addr.c, a new addition to the verifier test
suite.
* Patches 2-5 lay the groundwork for test migration, enabling
prog_tests/sock_addr.c to handle more test dimensions.
* Patches 6-11 move existing tests to prog_tests/sock_addr.c.
* Patch 12 removes some redundant test cases.
* Patches 14-17 expand on existing test coverage.
====================
Link: https://lore.kernel.org/r/20240510190246.3247730-1-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This expands coverage for ATTACH_REJECT tests to include connect_unix,
sendmsg_unix, recvmsg*, getsockname*, and getpeername*.
Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240510190246.3247730-18-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This expands coverage for getsockname and getpeername hooks to include
getsockname4, getsockname6, getpeername4, and getpeername6.
Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240510190246.3247730-17-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch expands test coverage for EPERM tests to include connect and
bind calls and rounds out the coverage for sendmsg by adding tests for
sendmsg_unix.
Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240510190246.3247730-16-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch expands verifier coverage for program return values to cover
bind, connect, sendmsg, getsockname, and getpeername hooks. It also
rounds out the recvmsg coverage by adding test cases for recvmsg_unix
hooks.
Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240510190246.3247730-15-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Fully remove test_sock_addr.c and test_sock_addr.sh, as test coverage
has been fully moved to prog_tests/sock_addr.c.
Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240510190246.3247730-14-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Remove these test cases completely, as the same behavior is already
covered by other sendmsg* test cases in prog_tests/sock_addr.c. This
just rewrites the destination address similar to sendmsg_v4_prog and
sendmsg_v6_prog.
Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240510190246.3247730-13-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Migrate test case from bpf/test_sock_addr.c ensuring that program
attachment fails when using an inappropriate attach type.
Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240510190246.3247730-12-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Migrates tests from progs/test_sock_addr.c ensuring that programs fail
to load when the expected attach type does not match.
Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240510190246.3247730-11-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Migrate test case from bpf/test_sock_addr.c ensuring that sendmsg
respects when sendmsg6 hooks rewrite the destination IP with the IPv6
wildcard IP, [::].
Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240510190246.3247730-10-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Migrate test case from bpf/test_sock_addr.c ensuring that sendmsg
returns -ENOTSUPP when sending to an IPv4-mapped IPv6 address to
prog_tests/sock_addr.c.
Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240510190246.3247730-9-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This set of tests checks that sendmsg calls are rejected (return -EPERM)
when the sendmsg* hook returns 0. Replace those in bpf/test_sock_addr.c
with corresponding tests in prog_tests/sock_addr.c.
Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240510190246.3247730-8-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
In preparation to move test cases from bpf/test_sock_addr.c that expect
system calls to return ENOTSUPP or EPERM, this patch propagates errno
from relevant system calls up to test_sock_addr() where the result can
be checked.
Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240510190246.3247730-6-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
In preparation to move test cases from bpf/test_sock_addr.c that expect
ATTACH_REJECT, this patch adds BPF_SKEL_FUNCS_RAW to generate load and
destroy functions that use bpf_prog_attach() to control the attach_type.
The normal load functions use bpf_program__attach_cgroup which does not
have the same degree of control over the attach type, as
bpf_program_attach_fd() calls bpf_link_create() with the attach type
extracted from prog using bpf_program__expected_attach_type(). It is
currently not possible to modify the attach type before
bpf_program__attach_cgroup() is called, since
bpf_program__set_expected_attach_type() has no effect after the program
is loaded.
Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240510190246.3247730-5-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
In preparation to move test cases from bpf/test_sock_addr.c that expect
LOAD_REJECT, this patch adds expected_attach_type and extends load_fn to
accept an expected attach type and a flag indicating whether or not
rejection is expected.
Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240510190246.3247730-4-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
In preparation to migrate tests from bpf/test_sock_addr.c to
sock_addr.c, update BPF_SKEL_FUNCS so that it generates functions
based on prog_name instead of skel_name. This allows us to differentiate
between programs in the same skeleton.
Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240510190246.3247730-3-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This set of tests check that the BPF verifier rejects programs with
invalid return codes (recvmsg4 and recvmsg6 hooks can only return 1).
This patch replaces the tests in test_sock_addr.c with
verifier_sock_addr.c, a new verifier prog_tests for sockaddr hooks, in a
step towards fully retiring test_sock_addr.c.
Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240510190246.3247730-2-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The BPF atomic operations with the BPF_FETCH modifier along with
BPF_XCHG and BPF_CMPXCHG are fully ordered but the RISC-V JIT implements
all atomic operations except BPF_CMPXCHG with relaxed ordering.
Section 8.1 of the "The RISC-V Instruction Set Manual Volume I:
Unprivileged ISA" [1], titled, "Specifying Ordering of Atomic
Instructions" says:
| To provide more efficient support for release consistency [5], each
| atomic instruction has two bits, aq and rl, used to specify additional
| memory ordering constraints as viewed by other RISC-V harts.
and
| If only the aq bit is set, the atomic memory operation is treated as
| an acquire access.
| If only the rl bit is set, the atomic memory operation is treated as a
| release access.
|
| If both the aq and rl bits are set, the atomic memory operation is
| sequentially consistent.
Fix this by setting both aq and rl bits as 1 for operations with
BPF_FETCH and BPF_XCHG.
[1] https://riscv.org/wp-content/uploads/2017/05/riscv-spec-v2.2.pdf
Fixes: dd642ccb45ec ("riscv, bpf: Implement more atomic operations for RV64")
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Reviewed-by: Pu Lehui <pulehui@huawei.com>
Link: https://lore.kernel.org/r/20240505201633.123115-1-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
BPF_ATOMIC_OP() macro documentation states that "BPF_ADD | BPF_FETCH"
should be the same as atomic_fetch_add(), which is currently not the
case on s390x: the serialization instruction "bcr 14,0" is missing.
This applies to "and", "or" and "xor" variants too.
s390x is allowed to reorder stores with subsequent fetches from
different addresses, so code relying on BPF_FETCH acting as a barrier,
for example:
stw [%r0], 1
afadd [%r1], %r2
ldxw %r3, [%r4]
may be broken. Fix it by emitting "bcr 14,0".
Note that a separate serialization instruction is not needed for
BPF_XCHG and BPF_CMPXCHG, because COMPARE AND SWAP performs
serialization itself.
Fixes: ba3b86b9cef0 ("s390/bpf: Implement new atomic ops")
Reported-by: Puranjay Mohan <puranjay12@gmail.com>
Closes: https://lore.kernel.org/bpf/mb61p34qvq3wf.fsf@kernel.org/
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20240507000557.12048-1-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Inline calls to bpf_get_smp_processor_id() helper in the JIT by emitting
a read from struct thread_info. The SP_EL0 system register holds the
pointer to the task_struct and thread_info is the first member of this
struct. We can read the cpu number from the thread_info.
Here is how the ARM64 JITed assembly changes after this commit:
ARM64 JIT
===========
BEFORE AFTER
-------- -------
int cpu = bpf_get_smp_processor_id(); int cpu = bpf_get_smp_processor_id();
mov x10, #0xfffffffffffff4d0 mrs x10, sp_el0
movk x10, #0x802b, lsl #16 ldr w7, [x10, #24]
movk x10, #0x8000, lsl #32
blr x10
add x7, x0, #0x0
Performance improvement using benchmark[1]
./benchs/run_bench_trigger.sh glob-arr-inc arr-inc hash-inc
+---------------+-------------------+-------------------+--------------+
| Name | Before | After | % change |
|---------------+-------------------+-------------------+--------------|
| glob-arr-inc | 23.380 ± 1.675M/s | 25.893 ± 0.026M/s | + 10.74% |
| arr-inc | 23.928 ± 0.034M/s | 25.213 ± 0.063M/s | + 5.37% |
| hash-inc | 12.352 ± 0.005M/s | 12.609 ± 0.013M/s | + 2.08% |
+---------------+-------------------+-------------------+--------------+
[1] https://github.com/anakryiko/linux/commit/8dec900975ef
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240502151854.9810-5-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Support an instruction for resolving absolute addresses of per-CPU
data from their per-CPU offsets. This instruction is internal-only and
users are not allowed to use them directly. They will only be used for
internal inlining optimizations for now between BPF verifier and BPF
JITs.
Since commit 7158627686f0 ("arm64: percpu: implement optimised pcpu
access using tpidr_el1"), the per-cpu offset for the CPU is stored in
the tpidr_el1/2 register of that CPU.
To support this BPF instruction in the ARM64 JIT, the following ARM64
instructions are emitted:
mov dst, src // Move src to dst, if src != dst
mrs tmp, tpidr_el1/2 // Move per-cpu offset of the current cpu in tmp.
add dst, dst, tmp // Add the per cpu offset to the dst.
To measure the performance improvement provided by this change, the
benchmark in [1] was used:
Before:
glob-arr-inc : 23.597 ± 0.012M/s
arr-inc : 23.173 ± 0.019M/s
hash-inc : 12.186 ± 0.028M/s
After:
glob-arr-inc : 23.819 ± 0.034M/s
arr-inc : 23.285 ± 0.017M/s
hash-inc : 12.419 ± 0.011M/s
[1] https://github.com/anakryiko/linux/commit/8dec900975ef
Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240502151854.9810-4-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Inline the calls to bpf_get_smp_processor_id() in the riscv bpf jit.
RISCV saves the pointer to the CPU's task_struct in the TP (thread
pointer) register. This makes it trivial to get the CPU's processor id.
As thread_info is the first member of task_struct, we can read the
processor id from TP + offsetof(struct thread_info, cpu).
RISCV64 JIT output for `call bpf_get_smp_processor_id`
======================================================
Before After
-------- -------
auipc t1,0x848c ld a5,32(tp)
jalr 604(t1)
mv a5,a0
Benchmark using [1] on Qemu.
./benchs/run_bench_trigger.sh glob-arr-inc arr-inc hash-inc
+---------------+------------------+------------------+--------------+
| Name | Before | After | % change |
|---------------+------------------+------------------+--------------|
| glob-arr-inc | 1.077 ± 0.006M/s | 1.336 ± 0.010M/s | + 24.04% |
| arr-inc | 1.078 ± 0.002M/s | 1.332 ± 0.015M/s | + 23.56% |
| hash-inc | 0.494 ± 0.004M/s | 0.653 ± 0.001M/s | + 32.18% |
+---------------+------------------+------------------+--------------+
NOTE: This benchmark includes changes from this patch and the previous
patch that implemented the per-cpu insn.
[1] https://github.com/anakryiko/linux/commit/8dec900975ef
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/r/20240502151854.9810-3-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Support an instruction for resolving absolute addresses of per-CPU
data from their per-CPU offsets. This instruction is internal-only and
users are not allowed to use them directly. They will only be used for
internal inlining optimizations for now between BPF verifier and BPF
JITs.
RISC-V uses generic per-cpu implementation where the offsets for CPUs
are kept in an array called __per_cpu_offset[cpu_number]. RISCV stores
the address of the task_struct in TP register. The first element in
task_struct is struct thread_info, and we can get the cpu number by
reading from the TP register + offsetof(struct thread_info, cpu).
Once we have the cpu number in a register we read the offset for that
cpu from address: &__per_cpu_offset + cpu_number << 3. Then we add this
offset to the destination register.
To measure the improvement from this change, the benchmark in [1] was
used on Qemu:
Before:
glob-arr-inc : 1.127 ± 0.013M/s
arr-inc : 1.121 ± 0.004M/s
hash-inc : 0.681 ± 0.052M/s
After:
glob-arr-inc : 1.138 ± 0.011M/s
arr-inc : 1.366 ± 0.006M/s
hash-inc : 0.676 ± 0.001M/s
[1] https://github.com/anakryiko/linux/commit/8dec900975ef
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/r/20240502151854.9810-2-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This will add eBPF JIT support to the 32-bit ARCv2 processors. The
implementation is qualified by running the BPF tests on a Synopsys HSDK
board with "ARC HS38 v2.1c at 500 MHz" as the 4-core CPU.
The test_bpf.ko reports 2-10 fold improvements in execution time of its
tests. For instance:
test_bpf: #33 tcpdump port 22 jited:0 704 1766 2104 PASS
test_bpf: #33 tcpdump port 22 jited:1 120 224 260 PASS
test_bpf: #141 ALU_DIV_X: 4294967295 / 4294967295 = 1 jited:0 238 PASS
test_bpf: #141 ALU_DIV_X: 4294967295 / 4294967295 = 1 jited:1 23 PASS
test_bpf: #776 JMP32_JGE_K: all ... magnitudes jited:0 2034681 PASS
test_bpf: #776 JMP32_JGE_K: all ... magnitudes jited:1 1020022 PASS
Deployment and structure
------------------------
The related codes are added to "arch/arc/net":
- bpf_jit.h -- The interface that a back-end translator must provide
- bpf_jit_core.c -- Knows how to handle the input eBPF byte stream
- bpf_jit_arcv2.c -- The back-end code that knows the translation logic
The bpf_int_jit_compile() at the end of bpf_jit_core.c is the entrance
to the whole process. Normally, the translation is done in one pass,
namely the "normal pass". In case some relocations are not known during
this pass, some data (arc_jit_data) is allocated for the next pass to
come. This possible next (and last) pass is called the "extra pass".
1. Normal pass # The necessary pass
1a. Dry run # Get the whole JIT length, epilogue offset, etc.
1b. Emit phase # Allocate memory and start emitting instructions
2. Extra pass # Only needed if there are relocations to be fixed
2a. Patch relocations
Support status
--------------
The JIT compiler supports BPF instructions up to "cpu=v4". However, it
does not yet provide support for:
- Tail calls
- Atomic operations
- 64-bit division/remainder
- BPF_PROBE_MEM* (exception table)
The result of "test_bpf" test suite on an HSDK board is:
hsdk-lnx# insmod test_bpf.ko test_suite=test_bpf
test_bpf: Summary: 863 PASSED, 186 FAILED, [851/851 JIT'ed]
All the failing test cases are due to the ones that were not JIT'ed.
Categorically, they can be represented as:
.-----------.------------.-------------.
| test type | opcodes | # of cases |
|-----------+------------+-------------|
| atomic | 0xC3, 0xDB | 149 |
| div64 | 0x37, 0x3F | 22 |
| mod64 | 0x97, 0x9F | 15 |
`-----------^------------+-------------|
| (total) 186 |
`-------------'
Setup: build config
-------------------
The following configs must be set to have a working JIT test:
CONFIG_BPF_JIT=y
CONFIG_BPF_JIT_ALWAYS_ON=y
CONFIG_TEST_BPF=m
The following options are not necessary for the tests module,
but are good to have:
CONFIG_DEBUG_INFO=y # prerequisite for below
CONFIG_DEBUG_INFO_BTF=y # so bpftool can generate vmlinux.h
CONFIG_FTRACE=y #
CONFIG_BPF_SYSCALL=y # all these options lead to
CONFIG_KPROBE_EVENTS=y # having CONFIG_BPF_EVENTS=y
CONFIG_PERF_EVENTS=y #
Some BPF programs provide data through /sys/kernel/debug:
CONFIG_DEBUG_FS=y
arc# mount -t debugfs debugfs /sys/kernel/debug
Setup: elfutils
---------------
The libdw.{so,a} library that is used by pahole for processing
the final binary must come from elfutils 0.189 or newer. The
support for ARCv2 [1] has been added since that version.
[1]
https://sourceware.org/git/?p=elfutils.git;a=commit;h=de3d46b3e7
Setup: pahole
-------------
The line below in linux/scripts/Makefile.btf must be commented out:
pahole-flags-$(call test-ge, $(pahole-ver), 121) += --btf_gen_floats
Or else, the build will fail:
$ make V=1
...
BTF .btf.vmlinux.bin.o
pahole -J --btf_gen_floats \
-j --lang_exclude=rust \
--skip_encoding_btf_inconsistent_proto \
--btf_gen_optimized .tmp_vmlinux.btf
Complex, interval and imaginary float types are not supported
Encountered error while encoding BTF.
...
BTFIDS vmlinux
./tools/bpf/resolve_btfids/resolve_btfids vmlinux
libbpf: failed to find '.BTF' ELF section in vmlinux
FAILED: load BTF from vmlinux: No data available
This is due to the fact that the ARC toolchains generate
"complex float" DIE entries in libgcc and at the moment, pahole
can't handle such entries.
Running the tests
-----------------
host$ scp /bld/linux/lib/test_bpf.ko arc:
arc # sysctl net.core.bpf_jit_enable=1
arc # insmod test_bpf.ko test_suite=test_bpf
...
test_bpf: #1048 Staggered jumps: JMP32_JSLE_X jited:1 697811 PASS
test_bpf: Summary: 863 PASSED, 186 FAILED, [851/851 JIT'ed]
Acknowledgments
---------------
- Claudiu Zissulescu for his unwavering support
- Yuriy Kolerov for testing and troubleshooting
- Vladimir Isaev for the pahole workaround
- Sergey Matyukevich for paving the road by adding the interpreter support
Signed-off-by: Shahab Vahedi <shahab@synopsys.com>
Link: https://lore.kernel.org/r/20240430145604.38592-1-list+bpf@vahedi.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2024-05-08 (most Intel drivers)
This series contains updates to i40e, iavf, ice, igb, igc, e1000e, and ixgbe
drivers.
Asbjørn Sloth Tønnesen adds checks against supported flower control flags
for i40e, iavf, ice, and igb drivers.
Michal corrects filters removed during eswitch release for ice.
Corinna Vinschen defers PTP initialization to later in probe so that
netdev log entry is initialized on igc.
Ilpo Järvinen removes a couple of unused, duplicate defines on
e1000e and ixgbe.
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
net: e1000e & ixgbe: Remove PCI_HEADER_TYPE_MFD duplicates
igc: fix a log entry using uninitialized netdev
ice: remove correct filters during eswitch release
igb: flower: validate control flags
ice: flower: validate control flags
iavf: flower: validate control flags
i40e: flower: validate control flags
====================
Link: https://lore.kernel.org/r/20240508173342.2760994-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Asbjørn Sloth Tønnesen says:
====================
net: qede: convert filter code to use extack
This series converts the filter code in the qede driver
to use NL_SET_ERR_MSG_*(extack, ...) for error handling.
Patch 1-12 converts qede_parse_flow_attr() to use extack,
along with all it's static helper functions.
qede_parse_flow_attr() is used in two places:
- qede_add_tc_flower_fltr()
- qede_flow_spec_to_rule()
In the latter call site extack is faked in the same way as
is done in mlxsw (patch 12).
While the conversion is going on, some error messages are silenced
in between patch 1-12. If wanted could squash patch 1-12 in a v3, but
I felt that it would be easier to review as 12 more trivial patches.
Patch 13 and 14, finishes up by converting qede_parse_actions(),
and ensures that extack is propagated to it, in both call contexts.
v1: https://lore.kernel.org/netdev/20240507104421.1628139-1-ast@fiberby.net/
====================
Link: https://lore.kernel.org/r/20240508143404.95901-1-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Convert DP_NOTICE/DP_INFO to NL_SET_ERR_MSG_MOD.
Keep edev around for use with QEDE_RSS_COUNT().
Only compile tested.
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240508143404.95901-15-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pass extack to qede_flow_spec_validate() when called in
qede_flow_spec_to_rule().
Pass extack to qede_parse_actions().
Not converting qede_flow_spec_validate() to use extack for
errors, as it's only called from qede_flow_spec_to_rule(),
where extack is faked into a DP_NOTICE anyway, so opting to
keep DP_VERBOSE/DP_NOTICE usage.
Only compile tested.
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240508143404.95901-14-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since qede_parse_flow_attr() now does error reporting
through extack, then give it a fake extack and extract the
error message afterwards if one was set.
The extracted error message is then passed on through
DP_NOTICE(), including messages that was earlier issued
with DP_INFO().
This fake extack approach is already used by
mlxsw_env_linecard_modules_power_mode_apply() in
drivers/net/ethernet/mellanox/mlxsw/core_env.c
Only compile tested.
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240508143404.95901-13-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Convert qede_parse_flow_attr() to take extack,
and drop the edev argument.
Convert DP_NOTICE calls to use NL_SET_ERR_MSG_* instead.
Pass extack in calls to qede_flow_parse_{tcp,udp}_v{4,6}().
In calls to qede_parse_flow_attr(), if extack is
unavailable, then use NULL for now, until a
subsequent patch makes extack available.
Only compile tested.
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240508143404.95901-12-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Define extack locally, to reduce line lengths and aid future users.
Only compile tested.
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240508143404.95901-11-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Convert qede_flow_parse_udp_v4() to take extack,
and drop the edev argument.
Pass extack in call to qede_flow_parse_v4_common().
In call to qede_flow_parse_udp_v4(), use NULL as extack
for now, until a subsequent patch makes extack available.
Only compile tested.
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240508143404.95901-10-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Convert qede_flow_parse_udp_v6() to take extack,
and drop the edev argument.
Pass extack in call to qede_flow_parse_v6_common().
In call to qede_flow_parse_udp_v6(), use NULL as extack
for now, until a subsequent patch makes extack available.
Only compile tested.
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240508143404.95901-9-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Convert qede_flow_parse_tcp_v4() to take extack,
and drop the edev argument.
Pass extack in call to qede_flow_parse_v4_common().
In call to qede_flow_parse_tcp_v4(), use NULL as extack
for now, until a subsequent patch makes extack available.
Only compile tested.
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240508143404.95901-8-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Convert qede_flow_parse_tcp_v6() to take extack,
and drop the edev argument.
Pass extack in call to qede_flow_parse_v6_common().
In call to qede_flow_parse_tcp_v6(), use NULL as extack
for now, until a subsequent patch makes extack available.
Only compile tested.
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240508143404.95901-7-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>