IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Andrii Nakryiko says:
====================
bpf-next 2021-12-10 v2
We've added 115 non-merge commits during the last 26 day(s) which contain
a total of 182 files changed, 5747 insertions(+), 2564 deletions(-).
The main changes are:
1) Various samples fixes, from Alexander Lobakin.
2) BPF CO-RE support in kernel and light skeleton, from Alexei Starovoitov.
3) A batch of new unified APIs for libbpf, logging improvements, version
querying, etc. Also a batch of old deprecations for old APIs and various
bug fixes, in preparation for libbpf 1.0, from Andrii Nakryiko.
4) BPF documentation reorganization and improvements, from Christoph Hellwig
and Dave Tucker.
5) Support for declarative initialization of BPF_MAP_TYPE_PROG_ARRAY in
libbpf, from Hengqi Chen.
6) Verifier log fixes, from Hou Tao.
7) Runtime-bounded loops support with bpf_loop() helper, from Joanne Koong.
8) Extend branch record capturing to all platforms that support it,
from Kajol Jain.
9) Light skeleton codegen improvements, from Kumar Kartikeya Dwivedi.
10) bpftool doc-generating script improvements, from Quentin Monnet.
11) Two libbpf v0.6 bug fixes, from Shuyi Cheng and Vincent Minet.
12) Deprecation warning fix for perf/bpf_counter, from Song Liu.
13) MAX_TAIL_CALL_CNT unification and MIPS build fix for libbpf,
from Tiezhu Yang.
14) BTF_KING_TYPE_TAG follow-up fixes, from Yonghong Song.
15) Selftests fixes and improvements, from Ilya Leoshkevich, Jean-Philippe
Brucker, Jiri Olsa, Maxim Mikityanskiy, Tirthendu Sarkar, Yucong Sun,
and others.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (115 commits)
libbpf: Add "bool skipped" to struct bpf_map
libbpf: Fix typo in btf__dedup@LIBBPF_0.0.2 definition
bpftool: Switch bpf_object__load_xattr() to bpf_object__load()
selftests/bpf: Remove the only use of deprecated bpf_object__load_xattr()
selftests/bpf: Add test for libbpf's custom log_buf behavior
selftests/bpf: Replace all uses of bpf_load_btf() with bpf_btf_load()
libbpf: Deprecate bpf_object__load_xattr()
libbpf: Add per-program log buffer setter and getter
libbpf: Preserve kernel error code and remove kprobe prog type guessing
libbpf: Improve logging around BPF program loading
libbpf: Allow passing user log setting through bpf_object_open_opts
libbpf: Allow passing preallocated log_buf when loading BTF into kernel
libbpf: Add OPTS-based bpf_btf_load() API
libbpf: Fix bpf_prog_load() log_buf logic for log_level 0
samples/bpf: Remove unneeded variable
bpf: Remove redundant assignment to pointer t
selftests/bpf: Fix a compilation warning
perf/bpf_counter: Use bpf_map_create instead of bpf_create_map
samples: bpf: Fix 'unknown warning group' build warning on Clang
samples: bpf: Fix xdp_sample_user.o linking with Clang
...
====================
Link: https://lore.kernel.org/r/20211210234746.2100561-1-andrii@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Switch from bpf_object__load_xattr() to bpf_object__load() and
kernel_log_level in bpf_object_open_opts.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211209193840.1248570-12-andrii@kernel.org
Add a selftest that validates that per-program and per-object log_buf
overrides work as expected. Also test same logic for low-level
bpf_prog_load() and bpf_btf_load() APIs.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211209193840.1248570-11-andrii@kernel.org
Add missing "prog '%s': " prefixes in few places and use consistently
markers for beginning and end of program load logs. Here's an example of
log output:
libbpf: prog 'handler': BPF program load failed: Permission denied
libbpf: -- BEGIN PROG LOAD LOG ---
arg#0 reference type('UNKNOWN ') size cannot be determined: -22
; out1 = in1;
0: (18) r1 = 0xffffc9000cdcc000
2: (61) r1 = *(u32 *)(r1 +0)
...
81: (63) *(u32 *)(r4 +0) = r5
R1_w=map_value(id=0,off=16,ks=4,vs=20,imm=0) R4=map_value(id=0,off=400,ks=4,vs=16,imm=0)
invalid access to map value, value_size=16 off=400 size=4
R4 min value is outside of the allowed memory range
processed 63 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
-- END PROG LOAD LOG --
libbpf: failed to load program 'handler'
libbpf: failed to load object 'test_skeleton'
The entire verifier log, including BEGIN and END markers are now always
youtput during a single print callback call. This should make it much
easier to post-process or parse it, if necessary. It's not an explicit
API guarantee, but it can be reasonably expected to stay like that.
Also __bpf_object__open is renamed to bpf_object_open() as it's always
an adventure to find the exact function that implements bpf_object's
open phase, so drop the double underscored and use internal libbpf
naming convention.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211209193840.1248570-6-andrii@kernel.org
'sys/ioctl.h' included in 'mptcp_inq.c' is duplicated.
Reported-by: ZealRobot <zealci@zte.com.cn>
Signed-off-by: Ye Guojin <ye.guojin@zte.com.cn>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Link: https://lore.kernel.org/r/20211210071424.425773-1-ye.guojin@zte.com.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The following warning is triggered when I used clang compiler
to build the selftest.
/.../prog_tests/btf_dedup_split.c:368:6: warning: variable 'btf2' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
if (!ASSERT_OK(err, "btf_dedup"))
^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/.../prog_tests/btf_dedup_split.c:424:12: note: uninitialized use occurs here
btf__free(btf2);
^~~~
/.../prog_tests/btf_dedup_split.c:368:2: note: remove the 'if' if its condition is always false
if (!ASSERT_OK(err, "btf_dedup"))
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/.../prog_tests/btf_dedup_split.c:343:25: note: initialize the variable 'btf2' to silence this warning
struct btf *btf1, *btf2;
^
= NULL
Initialize local variable btf2 = NULL and the warning is gone.
Fixes: 9a49afe6f5a5 ("selftests/bpf: Add btf_dedup case with duplicated structs within CU")
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211209050403.1770836-1-yhs@fb.com
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
1) Fix bogus compilter warning in nfnetlink_queue, from Florian Westphal.
2) Don't run conntrack on vrf with !dflt qdisc, from Nicolas Dichtel.
3) Fix nft_pipapo bucket load in AVX2 lookup routine for six 8-bit
groups, from Stefano Brivio.
4) Break rule evaluation on malformed TCP options.
5) Use socat instead of nc in selftests/netfilter/nft_zones_many.sh,
also from Florian
6) Fix KCSAN data-race in conntrack timeout updates, from Eric Dumazet.
* git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf:
netfilter: conntrack: annotate data-races around ct->timeout
selftests: netfilter: switch zone stress to socat
netfilter: nft_exthdr: break evaluation if setting TCP option fails
selftests: netfilter: Add correctness test for mac,net set type
nft_set_pipapo: Fix bucket load in AVX2 lookup routine for six 8-bit groups
vrf: don't run conntrack on vrf with !dflt qdisc
netfilter: nfnetlink_queue: silence bogus compiler warning
====================
Link: https://lore.kernel.org/r/20211209000847.102598-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Borkmann says:
====================
bpf 2021-12-08
We've added 12 non-merge commits during the last 22 day(s) which contain
a total of 29 files changed, 659 insertions(+), 80 deletions(-).
The main changes are:
1) Fix an off-by-two error in packet range markings and also add a batch of
new tests for coverage of these corner cases, from Maxim Mikityanskiy.
2) Fix a compilation issue on MIPS JIT for R10000 CPUs, from Johan Almbladh.
3) Fix two functional regressions and a build warning related to BTF kfunc
for modules, from Kumar Kartikeya Dwivedi.
4) Fix outdated code and docs regarding BPF's migrate_disable() use on non-
PREEMPT_RT kernels, from Sebastian Andrzej Siewior.
5) Add missing includes in order to be able to detangle cgroup vs bpf header
dependencies, from Jakub Kicinski.
6) Fix regression in BPF sockmap tests caused by missing detachment of progs
from sockets when they are removed from the map, from John Fastabend.
7) Fix a missing "no previous prototype" warning in x86 JIT caused by BPF
dispatcher, from Björn Töpel.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Add selftests to cover packet access corner cases
bpf: Fix the off-by-two error in range markings
treewide: Add missing includes masked by cgroup -> bpf dependency
tools/resolve_btfids: Skip unresolved symbol warning for empty BTF sets
bpf: Fix bpf_check_mod_kfunc_call for built-in modules
bpf: Make CONFIG_DEBUG_INFO_BTF depend upon CONFIG_BPF_SYSCALL
mips, bpf: Fix reference to non-existing Kconfig symbol
bpf: Make sure bpf_disable_instrumentation() is safe vs preemption.
Documentation/locking/locktypes: Update migrate_disable() bits.
bpf, sockmap: Re-evaluate proto ops when psock is removed from sockmap
bpf, sockmap: Attach map progs to psock early for feature probes
bpf, x86: Fix "no previous prototype" warning
====================
Link: https://lore.kernel.org/r/20211208155125.11826-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This commit adds BPF verifier selftests that cover all corner cases by
packet boundary checks. Specifically, 8-byte packet reads are tested at
the beginning of data and at the beginning of data_meta, using all kinds
of boundary checks (all comparison operators: <, >, <=, >=; both
permutations of operands: data + length compared to end, end compared to
data + length). For each case there are three tests:
1. Length is just enough for an 8-byte read. Length is either 7 or 8,
depending on the comparison.
2. Length is increased by 1 - should still pass the verifier. These
cases are useful, because they failed before commit 2fa7d94afc1a
("bpf: Fix the off-by-two error in range markings").
3. Length is decreased by 1 - should be rejected by the verifier.
Some existing tests are just renamed to avoid duplication.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20211207081521.41923-1-maximmi@nvidia.com
Add tests for TLSv1.2 and TLSv1.3 with AES256-GCM cipher
Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add tests for TLSv1.2 and TLSv1.3 with AES-CCM cipher.
Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
centos9 has nmap-ncat which doesn't like the '-q' option, use socat.
While at it, mark test skipped if needed tools are missing.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The existing net,mac test didn't cover the issue recently reported
by Nikita Yushchenko, where MAC addresses wouldn't match if given
as first field of a concatenated set with AVX2 and 8-bit groups,
because there's a different code path covering the lookup of six
8-bit groups (MAC addresses) if that's the first field.
Add a similar mac,net test, with MAC address and IPv4 address
swapped in the set specification.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
After the below patch, the conntrack attached to skb is set to "notrack" in
the context of vrf device, for locally generated packets.
But this is true only when the default qdisc is set to the vrf device. When
changing the qdisc, notrack is not set anymore.
In fact, there is a shortcut in the vrf driver, when the default qdisc is
set, see commit dcdd43c41e60 ("net: vrf: performance improvements for
IPv4") for more details.
This patch ensures that the behavior is always the same, whatever the qdisc
is.
To demonstrate the difference, a new test is added in conntrack_vrf.sh.
Fixes: 8c9c296adfae ("vrf: run conntrack only in context of lower/physdev for locally generated packets")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Florian Westphal <fw@strlen.de>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Check that getsockopt(IP_TOS) returns what setsockopt(IP_TOS) did set
right before.
Also check that socklen_t == 0 and -1 input values match those
of normal tcp sockets.
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
client & server use a unix socket connection to communicate
outside of the mptcp connection.
This allows the consumer to know in advance how many bytes have been
(or will be) sent by the peer.
This allows stricter checks on the bytecounts reported by TCP_INQ cmsg.
Suggested-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Do checks on the returned inq counter.
Fail on:
1. Huge value (> 1 kbyte, test case files are 1 kb)
2. last hint larger than returned bytes when read was short
3. erronenous indication of EOF.
3) happens when a hint of X bytes reads X-1 on next call
but next recvmsg returns more data (instead of EOF).
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The test for bpf_iter_task_vma assumes that the output will be longer
than 1 kB, as the comment above the loop says. Due to this assumption,
the loop becomes infinite if the output turns to be shorter than 1 kB.
The return value of read_fd_into_buffer is 0 when the end of file was
reached, and len isn't being increased any more.
This commit adds a break on EOF to handle short output correctly. For
the reference, this is the contents that I get when running test_progs
under vmtest.sh, and it's shorter than 1 kB:
00400000-00401000 r--p 00000000 fe:00 25867 /root/bpf/test_progs
00401000-00674000 r-xp 00001000 fe:00 25867 /root/bpf/test_progs
00674000-0095f000 r--p 00274000 fe:00 25867 /root/bpf/test_progs
0095f000-00983000 r--p 0055e000 fe:00 25867 /root/bpf/test_progs
00983000-00a8a000 rw-p 00582000 fe:00 25867 /root/bpf/test_progs
00a8a000-0484e000 rw-p 00000000 00:00 0
7f6c64000000-7f6c64021000 rw-p 00000000 00:00 0
7f6c64021000-7f6c68000000 ---p 00000000 00:00 0
7f6c6ac8f000-7f6c6ac90000 r--s 00000000 00:0d 8032
anon_inode:bpf-map
7f6c6ac90000-7f6c6ac91000 ---p 00000000 00:00 0
7f6c6ac91000-7f6c6b491000 rw-p 00000000 00:00 0
7f6c6b491000-7f6c6b492000 r--s 00000000 00:0d 8032
anon_inode:bpf-map
7f6c6b492000-7f6c6b493000 rw-s 00000000 00:0d 8032
anon_inode:bpf-map
7ffc1e23d000-7ffc1e25e000 rw-p 00000000 00:00 0
7ffc1e3b8000-7ffc1e3bc000 r--p 00000000 00:00 0
7ffc1e3bc000-7ffc1e3bd000 r-xp 00000000 00:00 0
7fffffffe000-7ffffffff000 --xp 00000000 00:00 0
Fixes: e8168840e16c ("selftests/bpf: Add test for bpf_iter_task_vma")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20211130181811.594220-1-maximmi@nvidia.com
The first commit cited below attempts to fix the off-by-one error that
appeared in some comparisons with an open range. Due to this error,
arithmetically equivalent pieces of code could get different verdicts
from the verifier, for example (pseudocode):
// 1. Passes the verifier:
if (data + 8 > data_end)
return early
read *(u64 *)data, i.e. [data; data+7]
// 2. Rejected by the verifier (should still pass):
if (data + 7 >= data_end)
return early
read *(u64 *)data, i.e. [data; data+7]
The attempted fix, however, shifts the range by one in a wrong
direction, so the bug not only remains, but also such piece of code
starts failing in the verifier:
// 3. Rejected by the verifier, but the check is stricter than in #1.
if (data + 8 >= data_end)
return early
read *(u64 *)data, i.e. [data; data+7]
The change performed by that fix converted an off-by-one bug into
off-by-two. The second commit cited below added the BPF selftests
written to ensure than code chunks like #3 are rejected, however,
they should be accepted.
This commit fixes the off-by-two error by adjusting new_range in the
right direction and fixes the tests by changing the range into the
one that should actually fail.
Fixes: fb2a311a31d3 ("bpf: fix off by one for range markings with L{T, E} patterns")
Fixes: b37242c773b2 ("bpf: add test cases to bpf selftests to cover all access tests")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20211130181607.593149-1-maximmi@nvidia.com
Previously, the selftest framework always treats it as *ok* even though
some of them are failed actually. That's because the script always
returns 0.
It supports PASS/FAIL/SKIP exit code now.
CC: Philip Li <philip.li@intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Li Zhijian <zhijianx.li@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
qdiscs/fq_pie requires CONFIG_NET_SCH_FQ_PIE, otherwise tc will fail
to create a fq_pie qdisc.
It fixes following issue:
# not ok 57 83be - Create FQ-PIE with invalid number of flows
# Command exited with 2, expected 0
# Error: Specified qdisc not found.
Signed-off-by: Li Zhijian <zhijianx.li@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mark the summary result as FAIL to prevent from confusing the selftest
framework if some of them are failed.
Previously, the selftest framework always treats it as *ok* even though
some of them are failed actually. That's because the script tdc.sh always
return 0.
# All test results:
#
# 1..97
# ok 1 83be - Create FQ-PIE with invalid number of flows
# ok 2 8b6e - Create RED with no flags
[...snip]
# ok 6 5f15 - Create RED with flags ECN, harddrop
# ok 7 53e8 - Create RED with flags ECN, nodrop
# ok 8 d091 - Fail to create RED with only nodrop flag
# ok 9 af8e - Create RED with flags ECN, nodrop, harddrop
# not ok 10 ce7d - Add mq Qdisc to multi-queue device (4 queues)
# Could not match regex pattern. Verify command output:
# qdisc mq 1: root
# qdisc fq_codel 0: parent 1:4 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
# qdisc fq_codel 0: parent 1:3 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
[...snip]
# ok 96 6979 - Change quantum of a strict ETS band
# ok 97 9a7d - Change ETS strict band without quantum
#
#
#
#
ok 1 selftests: tc-testing: tdc.sh <<< summary result
CC: Philip Li <philip.li@intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Li Zhijian <zhijianx.li@intel.com>
Acked-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently rp_filter tests in fib_tests.sh:fib_rp_filter_test() are
failing. ping sockets are bound to dummy1 using the "-I" option
(SO_BINDTODEVICE), but socket lookup is failing when receiving ping
replies, since the routing table thinks they belong to dummy0.
For example, suppose ping is using a SOCK_RAW socket for ICMP messages.
When receiving ping replies, in __raw_v4_lookup(), sk->sk_bound_dev_if
is 3 (dummy1), but dif (skb_rtable(skb)->rt_iif) says 2 (dummy0), so the
raw_sk_bound_dev_eq() check fails. Similar things happen in
ping_lookup() for SOCK_DGRAM sockets.
These tests used to pass due to a bug [1] in iputils, where "ping -I"
actually did not bind ICMP message sockets to device. The bug has been
fixed by iputils commit f455fee41c07 ("ping: also bind the ICMP socket
to the specific device") in 2016, which is why our rp_filter tests
started to fail. See [2] .
Fixing the tests while keeping everything in one netns turns out to be
nontrivial. Rework the tests and build the following topology:
┌─────────────────────────────┐ ┌─────────────────────────────┐
│ network namespace 1 (ns1) │ │ network namespace 2 (ns2) │
│ │ │ │
│ ┌────┐ ┌─────┐ │ │ ┌─────┐ ┌────┐ │
│ │ lo │<───>│veth1│<────────┼────┼─>│veth2│<──────────>│ lo │ │
│ └────┘ ├─────┴──────┐ │ │ ├─────┴──────┐ └────┘ │
│ │192.0.2.1/24│ │ │ │192.0.2.1/24│ │
│ └────────────┘ │ │ └────────────┘ │
└─────────────────────────────┘ └─────────────────────────────┘
Consider sending an ICMP_ECHO packet A in ns2. Both source and
destination IP addresses are 192.0.2.1, and we use strict mode rp_filter
in both ns1 and ns2:
1. A is routed to lo since its destination IP address is one of ns2's
local addresses (veth2);
2. A is redirected from lo's egress to veth2's egress using mirred;
3. A arrives at veth1's ingress in ns1;
4. A is redirected from veth1's ingress to lo's ingress, again, using
mirred;
5. In __fib_validate_source(), fib_info_nh_uses_dev() returns false,
since A was received on lo, but reverse path lookup says veth1;
6. However A is not dropped since we have relaxed this check for lo in
commit 66f8209547cc ("fib: relax source validation check for loopback
packets");
Making sure A is not dropped here in this corner case is the whole point
of having this test.
7. As A reaches the ICMP layer, an ICMP_ECHOREPLY packet, B, is
generated;
8. Similarly, B is redirected from lo's egress to veth1's egress (in
ns1), then redirected once again from veth2's ingress to lo's
ingress (in ns2), using mirred.
Also test "ping 127.0.0.1" from ns2. It does not trigger the relaxed
check in __fib_validate_source(), but just to make sure the topology
works with loopback addresses.
Tested with ping from iputils 20210722-41-gf9fb573:
$ ./fib_tests.sh -t rp_filter
IPv4 rp_filter tests
TEST: rp_filter passes local packets [ OK ]
TEST: rp_filter passes loopback packets [ OK ]
[1] https://github.com/iputils/iputils/issues/55
[2] f455fee41c
Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Fixes: adb701d6cfa4 ("selftests: add a test case for rp_filter")
Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Acked-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20211201004720.6357-1-yepeilin.cs@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Migrate all the selftests that were still using bpf_prog_load_xattr().
Few are converted to skeleton, others will use bpf_object__open_file()
API.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211201232824.3166325-7-andrii@kernel.org
xdpxceiver.c is using AF_XDP APIs that are deprecated starting from
libbpf 0.7. Until we migrate the test to libxdp or solve this issue in
some other way, mute deprecation warnings within xdpxceiver.c.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211201232824.3166325-6-andrii@kernel.org
The test_cmpxchg() and test_xchg() functions say "test_run add".
Therefore, make them say "test_run cmpxchg" and "test_run xchg",
respectively.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211201005030.GA3071525@paulmck-ThinkPad-P17-Gen-1
Add $(OUTPUT) prefix to testing_helpers.o, so it can be built out of
tree when necessary. At the moment, in addition to being built in-tree
even when out-of-tree is required, testing_helpers.o is not built with
the right recipe when cross-building.
For consistency the other helpers, cgroup_helpers and trace_helpers, can
also be passed as objects instead of source. Use *_HELPERS variable to
keep the Makefile readable.
Fixes: f87c1930ac29 ("selftests/bpf: Merge test_stub.c into testing_helpers.c")
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211201145101.823159-1-jean-philippe@linaro.org
and wireguard.
Current release - regressions:
- smc: keep smc_close_final()'s error code during active close
Current release - new code bugs:
- iwlwifi: various static checker fixes (int overflow, leaks, missing
error codes)
- rtw89: fix size of firmware header before transfer, avoid crash
- mt76: fix timestamp check in tx_status; fix pktid leak;
- mscc: ocelot: fix missing unlock on error in ocelot_hwstamp_set()
Previous releases - regressions:
- smc: fix list corruption in smc_lgr_cleanup_early
- ipv4: convert fib_num_tclassid_users to atomic_t
Previous releases - always broken:
- tls: fix authentication failure in CCM mode
- vrf: reset IPCB/IP6CB when processing outbound pkts, prevent
incorrect processing
- dsa: mv88e6xxx: fixes for various device errata
- rds: correct socket tunable error in rds_tcp_tune()
- ipv6: fix memory leak in fib6_rule_suppress
- wireguard: reset peer src endpoint when netns exits
- wireguard: improve resilience to DoS around incoming handshakes
- tcp: fix page frag corruption on page fault which involves TCP
- mpls: fix missing attributes in delete notifications
- mt7915: fix NULL pointer dereference with ad-hoc mode
Misc:
- rt2x00: be more lenient about EPROTO errors during start
- mlx4_en: update reported link modes for 1/10G
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmGo6KMACgkQMUZtbf5S
Iruq+BAAhRMTcL+X4eRIL9lIEvWEKHMKLCA/pUaQWNlSxsbEeydJWRNSc37Cs3pv
z0rYIEhfieOz8+QXS1Kq+yZwJVXjA8Jvgld2qw9V9Y5w+N15Mj8RUtG8NaUw+o4E
U8PCAbaamnbzyPdlCYcVHschd8MD0BCXm5+jAGeIyCP+KQCnhEpFZv+bvHaWzQR8
FZLYrhXTR9W0DFsrKG9+haqFwFBR3+VDqTGILhaHPE+r2o6wKQQ5yJMhd8fq0SaC
nne8zDkGuFEeW3cxj0VbhdRMyrV97eMK+P4dZ2P0Z7xcrsed9/2XJkNQNJGtuRnj
GGJV6utupJRAY+lnJNUkifqS4Wt7KirfZsSsyaKKa4plyoVgtGhiqEYFTQVLagC0
CF4Qe+3qks6rESbRu6PEFN4oWSkMEhRzdcDpg7vBDURUKcrRs9fgtNUJUCi8nKFA
A/F/K+7IHBoBZyQYZbYmnGdNsNauKbF3rUY3hwMGBfQZIr/wsql9+jhtLsmZX77m
V/L7KzT2jhhNc5gDzuLps25K3P7snKuV19qQSsY2LeuGj1x3gmWZ+ibN6ynhB+Gt
KBnfHDMTI/4aciZBIbwJmwfeRhCF8tOfw0WZdUP7FRIXukbfVuDBoznWLz4BKKgf
GSYSTNDs/PHZQo5vCQ/onvTwUK5aN6zoPNy5ih7lp9YZBYtN2TI=
=r0Jh
-----END PGP SIGNATURE-----
Merge tag 'net-5.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from wireless, and wireguard.
Mostly scattered driver changes this week, with one big clump in
mv88e6xxx. Nothing of note, really.
Current release - regressions:
- smc: keep smc_close_final()'s error code during active close
Current release - new code bugs:
- iwlwifi: various static checker fixes (int overflow, leaks, missing
error codes)
- rtw89: fix size of firmware header before transfer, avoid crash
- mt76: fix timestamp check in tx_status; fix pktid leak;
- mscc: ocelot: fix missing unlock on error in ocelot_hwstamp_set()
Previous releases - regressions:
- smc: fix list corruption in smc_lgr_cleanup_early
- ipv4: convert fib_num_tclassid_users to atomic_t
Previous releases - always broken:
- tls: fix authentication failure in CCM mode
- vrf: reset IPCB/IP6CB when processing outbound pkts, prevent
incorrect processing
- dsa: mv88e6xxx: fixes for various device errata
- rds: correct socket tunable error in rds_tcp_tune()
- ipv6: fix memory leak in fib6_rule_suppress
- wireguard: reset peer src endpoint when netns exits
- wireguard: improve resilience to DoS around incoming handshakes
- tcp: fix page frag corruption on page fault which involves TCP
- mpls: fix missing attributes in delete notifications
- mt7915: fix NULL pointer dereference with ad-hoc mode
Misc:
- rt2x00: be more lenient about EPROTO errors during start
- mlx4_en: update reported link modes for 1/10G"
* tag 'net-5.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (85 commits)
net: dsa: b53: Add SPI ID table
gro: Fix inconsistent indenting
selftests: net: Correct case name
net/rds: correct socket tunable error in rds_tcp_tune()
mctp: Don't let RTM_DELROUTE delete local routes
net/smc: Keep smc_close_final rc during active close
ibmvnic: drop bad optimization in reuse_tx_pools()
ibmvnic: drop bad optimization in reuse_rx_pools()
net/smc: fix wrong list_del in smc_lgr_cleanup_early
Fix Comment of ETH_P_802_3_MIN
ethernet: aquantia: Try MAC address from device tree
ipv4: convert fib_num_tclassid_users to atomic_t
net: avoid uninit-value from tcp_conn_request
net: annotate data-races on txq->xmit_lock_owner
octeontx2-af: Fix a memleak bug in rvu_mbox_init()
net/mlx4_en: Fix an use-after-free bug in mlx4_en_try_alloc_resources()
vrf: Reset IPCB/IP6CB when processing outbound pkts in vrf dev xmit
net: qlogic: qlcnic: Fix a NULL pointer dereference in qlcnic_83xx_add_rings()
net: dsa: mv88e6xxx: Link in pcs_get_state() if AN is bypassed
net: dsa: mv88e6xxx: Fix inband AN for 2500base-x on 88E6393X family
...
The commit 087cba799ced ("selftests/bpf: Add weak/typeless ksym test for light skeleton")
added test_ksyms_weak to light skeleton testing, but remove CO-RE access.
Revert that part of commit, since light skeleton can use CO-RE in the kernel.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211201181040.23337-17-alexei.starovoitov@gmail.com
Add a test where randmap() function is appended to three different bpf
programs. That action checks struct bpf_core_relo replication logic
and offset adjustment in gen loader part of libbpf.
Fourth bpf program has 360 CO-RE relocations from vmlinux, bpf_testmod,
and non-existing type. It tests candidate cache logic.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211201181040.23337-16-alexei.starovoitov@gmail.com
$ ./fcnal-test.sh -t help
Test names: help
Looks it intent to list the available tests but it didn't do the right
thing. I will add another option the do that in the later patch.
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ipv6_addr_bind/ipv4_addr_bind are function names. Previously, bind test
would not be run by default due to the wrong case names
Fixes: 34d0302ab861 ("selftests: Add ipv6 address bind tests to fcnal-test")
Fixes: 75b2b2b3db4c ("selftests: Add ipv4 address bind tests to fcnal-test")
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add benchmark to measure the throughput and latency of the bpf_loop
call.
Testing this on my dev machine on 1 thread, the data is as follows:
nr_loops: 10
bpf_loop - throughput: 198.519 ± 0.155 M ops/s, latency: 5.037 ns/op
nr_loops: 100
bpf_loop - throughput: 247.448 ± 0.305 M ops/s, latency: 4.041 ns/op
nr_loops: 500
bpf_loop - throughput: 260.839 ± 0.380 M ops/s, latency: 3.834 ns/op
nr_loops: 1000
bpf_loop - throughput: 262.806 ± 0.629 M ops/s, latency: 3.805 ns/op
nr_loops: 5000
bpf_loop - throughput: 264.211 ± 1.508 M ops/s, latency: 3.785 ns/op
nr_loops: 10000
bpf_loop - throughput: 265.366 ± 3.054 M ops/s, latency: 3.768 ns/op
nr_loops: 50000
bpf_loop - throughput: 235.986 ± 20.205 M ops/s, latency: 4.238 ns/op
nr_loops: 100000
bpf_loop - throughput: 264.482 ± 0.279 M ops/s, latency: 3.781 ns/op
nr_loops: 500000
bpf_loop - throughput: 309.773 ± 87.713 M ops/s, latency: 3.228 ns/op
nr_loops: 1000000
bpf_loop - throughput: 262.818 ± 4.143 M ops/s, latency: 3.805 ns/op
>From this data, we can see that the latency per loop decreases as the
number of loops increases. On this particular machine, each loop had an
overhead of about ~4 ns, and we were able to run ~250 million loops
per second.
Signed-off-by: Joanne Koong <joannekoong@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211130030622.4131246-5-joannekoong@fb.com
This patch tests bpf_loop in pyperf and strobemeta, and measures the
verifier performance of replacing the traditional for loop
with bpf_loop.
The results are as follows:
~strobemeta~
Baseline
verification time 6808200 usec
stack depth 496
processed 554252 insns (limit 1000000) max_states_per_insn 16
total_states 15878 peak_states 13489 mark_read 3110
#192 verif_scale_strobemeta:OK (unrolled loop)
Using bpf_loop
verification time 31589 usec
stack depth 96+400
processed 1513 insns (limit 1000000) max_states_per_insn 2
total_states 106 peak_states 106 mark_read 60
#193 verif_scale_strobemeta_bpf_loop:OK
~pyperf600~
Baseline
verification time 29702486 usec
stack depth 368
processed 626838 insns (limit 1000000) max_states_per_insn 7
total_states 30368 peak_states 30279 mark_read 748
#182 verif_scale_pyperf600:OK (unrolled loop)
Using bpf_loop
verification time 148488 usec
stack depth 320+40
processed 10518 insns (limit 1000000) max_states_per_insn 10
total_states 705 peak_states 517 mark_read 38
#183 verif_scale_pyperf600_bpf_loop:OK
Using the bpf_loop helper led to approximately a 99% decrease
in the verification time and in the number of instructions.
Signed-off-by: Joanne Koong <joannekoong@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211130030622.4131246-4-joannekoong@fb.com
* Fix constant sign extension affecting TCR_EL2 and preventing
running on ARMv8.7 models due to spurious bits being set
* Fix use of helpers using PSTATE early on exit by always sampling
it as soon as the exit takes place
* Move pkvm's 32bit handling into a common helper
RISC-V:
* Fix incorrect KVM_MAX_VCPUS value
* Unmap stage2 mapping when deleting/moving a memslot
x86:
* Fix and downgrade BUG_ON due to uninitialized cache
* Many APICv and MOVE_ENC_CONTEXT_FROM fixes
* Correctly emulate TLB flushes around nested vmentry/vmexit
and when the nested hypervisor uses VPID
* Prevent modifications to CPUID after the VM has run
* Other smaller bugfixes
Generic:
* Memslot handling bugfixes
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmGmHBEUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroOkGgf/RBjt1d7H6Um7tD7oA5QiIHmNY4ko
K/90OAa8h62rilxpqxkRgLNmphBc5AzcbufVXN4J1hVhw2M+u1ouDxKeHS1GEZTA
/XdNb0dwK99TpOJkIcuV/NQVIZUxkM00VbIiCoLkX06VuIc1Gie1G4bqzLhWCP8Y
ts9l/pkfafvfEmjmcjVd7gkDOnEPbT+JPDJcuo/RA7C7Z2L4+8DsFeyfWGqBP647
J6omUUxD82QRm28OVOK4V7aNALWsAdlaqHrVFAPZywQl7QTWMO0UTcKTdCCB2B4Q
QnHejFV6pFh55q3/fhe7epy9e2Sw+NOsmWKTEGPbU5nn94R8lyW1GV4ZUQ==
=Nduu
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"ARM64:
- Fix constant sign extension affecting TCR_EL2 and preventing
running on ARMv8.7 models due to spurious bits being set
- Fix use of helpers using PSTATE early on exit by always sampling it
as soon as the exit takes place
- Move pkvm's 32bit handling into a common helper
RISC-V:
- Fix incorrect KVM_MAX_VCPUS value
- Unmap stage2 mapping when deleting/moving a memslot
x86:
- Fix and downgrade BUG_ON due to uninitialized cache
- Many APICv and MOVE_ENC_CONTEXT_FROM fixes
- Correctly emulate TLB flushes around nested vmentry/vmexit and when
the nested hypervisor uses VPID
- Prevent modifications to CPUID after the VM has run
- Other smaller bugfixes
Generic:
- Memslot handling bugfixes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (44 commits)
KVM: fix avic_set_running for preemptable kernels
KVM: VMX: clear vmx_x86_ops.sync_pir_to_irr if APICv is disabled
KVM: SEV: accept signals in sev_lock_two_vms
KVM: SEV: do not take kvm->lock when destroying
KVM: SEV: Prohibit migration of a VM that has mirrors
KVM: SEV: Do COPY_ENC_CONTEXT_FROM with both VMs locked
selftests: sev_migrate_tests: add tests for KVM_CAP_VM_COPY_ENC_CONTEXT_FROM
KVM: SEV: move mirror status to destination of KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM
KVM: SEV: initialize regions_list of a mirror VM
KVM: SEV: cleanup locking for KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM
KVM: SEV: do not use list_replace_init on an empty list
KVM: x86: Use a stable condition around all VT-d PI paths
KVM: x86: check PIR even for vCPUs with disabled APICv
KVM: VMX: prepare sync_pir_to_irr for running with APICv disabled
KVM: selftests: page_table_test: fix calculation of guest_test_phys_mem
KVM: x86/mmu: Handle "default" period when selectively waking kthread
KVM: MMU: shadow nested paging does not have PKU
KVM: x86/mmu: Remove spurious TLB flushes in TDP MMU zap collapsible path
KVM: x86/mmu: Use yield-safe TDP MMU root iter in MMU notifier unmapping
KVM: X86: Use vcpu->arch.walk_mmu for kvm_mmu_invlpg()
...
VMs that mirror an encryption context rely on the owner to keep the
ASID allocated. Performing a KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM
would cause a dangling ASID:
1. copy context from A to B (gets ref to A)
2. move context from A to L (moves ASID from A to L)
3. close L (releases ASID from L, B still references it)
The right way to do the handoff instead is to create a fresh mirror VM
on the destination first:
1. copy context from A to B (gets ref to A)
[later] 2. close B (releases ref to A)
3. move context from A to L (moves ASID from A to L)
4. copy context from L to M
So, catch the situation by adding a count of how many VMs are
mirroring this one's encryption context.
Fixes: 0b020f5af092 ("KVM: SEV: Add support for SEV-ES intra host migration")
Message-Id: <20211123005036.2954379-11-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
I am putting the tests in sev_migrate_tests because the failure conditions are
very similar and some of the setup code can be reused, too.
The tests cover both successful creation of a mirror VM, and error
conditions.
Cc: Peter Gonda <pgonda@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Message-Id: <20211123005036.2954379-9-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
A kvm_page_table_test run with its default settings fails on VMX due to
memory region add failure:
> ==== Test Assertion Failure ====
> lib/kvm_util.c:952: ret == 0
> pid=10538 tid=10538 errno=17 - File exists
> 1 0x00000000004057d1: vm_userspace_mem_region_add at kvm_util.c:947
> 2 0x0000000000401ee9: pre_init_before_test at kvm_page_table_test.c:302
> 3 (inlined by) run_test at kvm_page_table_test.c:374
> 4 0x0000000000409754: for_each_guest_mode at guest_modes.c:53
> 5 0x0000000000401860: main at kvm_page_table_test.c:500
> 6 0x00007f82ae2d8554: ?? ??:0
> 7 0x0000000000401894: _start at ??:?
> KVM_SET_USER_MEMORY_REGION IOCTL failed,
> rc: -1 errno: 17
> slot: 1 flags: 0x0
> guest_phys_addr: 0xc0000000 size: 0x40000000
This is because the memory range that this test is trying to add
(0x0c0000000 - 0x100000000) conflicts with LAPIC mapping at 0x0fee00000.
Looking at the code it seems that guest_test_*phys*_mem variable gets
mistakenly overwritten with guest_test_*virt*_mem while trying to adjust
the former for alignment.
With the correct variable adjusted this test runs successfully.
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <52e487458c3172923549bbcf9dfccfbe6faea60b.1637940473.git.maciej.szmigiero@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Each peer's endpoint contains a dst_cache entry that takes a reference
to another netdev. When the containing namespace exits, we take down the
socket and prevent future sockets from being created (by setting
creating_net to NULL), which removes that potential reference on the
netns. However, it doesn't release references to the netns that a netdev
cached in dst_cache might be taking, so the netns still might fail to
exit. Since the socket is gimped anyway, we can simply clear all the
dst_caches (by way of clearing the endpoint src), which will release all
references.
However, the current dst_cache_reset function only releases those
references lazily. But it turns out that all of our usages of
wg_socket_clear_peer_endpoint_src are called from contexts that are not
exactly high-speed or bottle-necked. For example, when there's
connection difficulty, or when userspace is reconfiguring the interface.
And in particular for this patch, when the netns is exiting. So for
those cases, it makes more sense to call dst_release immediately. For
that, we add a small helper function to dst_cache.
This patch also adds a test to netns.sh from Hangbin Liu to ensure this
doesn't regress.
Tested-by: Hangbin Liu <liuhangbin@gmail.com>
Reported-by: Xiumei Mu <xmu@redhat.com>
Cc: Toke Høiland-Jørgensen <toke@redhat.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Fixes: 900575aa33a3 ("wireguard: device: avoid circular netns references")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
DEBUG_PI_LIST was renamed to DEBUG_PLIST since 8e18faeac3 ("lib/plist:
rename DEBUG_PI_LIST to DEBUG_PLIST").
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Fixes: 8e18faeac3e4 ("lib/plist: rename DEBUG_PI_LIST to DEBUG_PLIST")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>