IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Devlink fmsg (formatted message) is used by devlink health diagnose,
dump and drivers which support these devlink health callbacks.
Therefore, move devlink fmsg helpers and related code to file health.c.
Move devlink health diagnose to file health.c. No functional change in
this patch.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Move devlink health report helper and recover callback and related code
from leftover.c to health.c. No functional change in this patch.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Move devlink health get and set callbacks and related code from
leftover.c to health.c. No functional change in this patch.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
devlink_nl_health_reporter_fill() error flow calls nla_nest_end(). Fix
it to call nla_nest_cancel() instead.
Note the bug is harmless as genlmsg_cancel() cancel the entire message,
so no fixes tag added.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Move devlink health reporter create/destroy and related dev code to new
file health.c. This file shall include all callbacks and functionality
that are related to devlink health.
In addition, fix kdoc indentation and make reporter create/destroy kdoc
more clear. No functional change in this patch.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
&xdp_buff and &xdp_frame are bound in a way that
xdp_buff->data_hard_start == xdp_frame
It's always the case and e.g. xdp_convert_buff_to_frame() relies on
this.
IOW, the following:
for (u32 i = 0; i < 0xdead; i++) {
xdpf = xdp_convert_buff_to_frame(&xdp);
xdp_convert_frame_to_buff(xdpf, &xdp);
}
shouldn't ever modify @xdpf's contents or the pointer itself.
However, "live packet" code wrongly treats &xdp_frame as part of its
context placed *before* the data_hard_start. With such flow,
data_hard_start is sizeof(*xdpf) off to the right and no longer points
to the XDP frame.
Instead of replacing `sizeof(ctx)` with `offsetof(ctx, xdpf)` in several
places and praying that there are no more miscalcs left somewhere in the
code, unionize ::frm with ::data in a flex array, so that both starts
pointing to the actual data_hard_start and the XDP frame actually starts
being a part of it, i.e. a part of the headroom, not the context.
A nice side effect is that the maximum frame size for this mode gets
increased by 40 bytes, as xdp_buff::frame_sz includes everything from
data_hard_start (-> includes xdpf already) to the end of XDP/skb shared
info.
Also update %MAX_PKT_SIZE accordingly in the selftests code. Leave it
hardcoded for 64 bit && 4k pages, it can be made more flexible later on.
Minor: align `&head->data` with how `head->frm` is assigned for
consistency.
Minor #2: rename 'frm' to 'frame' in &xdp_page_head while at it for
clarity.
(was found while testing XDP traffic generator on ice, which calls
xdp_convert_frame_to_buff() for each XDP frame)
Fixes: b530e9e1063e ("bpf: Add "live packet" mode for XDP in BPF_PROG_RUN")
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://lore.kernel.org/r/20230215185440.4126672-1-aleksander.lobakin@intel.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Anton Protopopov says:
====================
Add a new benchmark for hashmap lookups and fix several typos.
In commit 3 I've patched the bench utility so that now command line options
can be reused by different benchmarks.
The benchmark itself is added in the last commit 7. I was using this benchmark
to test map lookup productivity when using a different hash function [1]. When
run with --quiet, the results can be easily plotted [2]. The results provided
by the benchmark look reasonable and match the results of my different
benchmarks (requiring to patch kernel to get actual statistics on map lookups).
Links:
[1] https://fosdem.org/2023/schedule/event/bpf_hashing/
[2] https://github.com/aspsk/bpf-bench/tree/master/hashmap-bench
Changes,
v1->v2:
- percpu_times_index[] is of wrong size (Martin)
- use base 0 for strtol (Andrii)
- just use -q without argument (Andrii)
- use less hacks when parsing arguments (Andrii)
====================
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Add a new benchmark which measures hashmap lookup operations speed. A user can
control the following parameters of the benchmark:
* key_size (max 1024): the key size to use
* max_entries: the hashmap max entries
* nr_entries: the number of entries to insert/lookup
* nr_loops: the number of loops for the benchmark
* map_flags The hashmap flags passed to BPF_MAP_CREATE
The BPF program performing the benchmarks calls two nested bpf_loop:
bpf_loop(nr_loops/nr_entries)
bpf_loop(nr_entries)
bpf_map_lookup()
So the nr_loops determines the number of actual map lookups. All lookups are
successful.
Example (the output is generated on a AMD Ryzen 9 3950X machine):
for nr_entries in `seq 4096 4096 65536`; do echo -n "$((nr_entries*100/65536))% full: "; sudo ./bench -d2 -a bpf-hashmap-lookup --key_size=4 --nr_entries=$nr_entries --max_entries=65536 --nr_loops=1000000 --map_flags=0x40 | grep cpu; done
6% full: cpu01: lookup 50.739M ± 0.018M events/sec (approximated from 32 samples of ~19ms)
12% full: cpu01: lookup 47.751M ± 0.015M events/sec (approximated from 32 samples of ~20ms)
18% full: cpu01: lookup 45.153M ± 0.013M events/sec (approximated from 32 samples of ~22ms)
25% full: cpu01: lookup 43.826M ± 0.014M events/sec (approximated from 32 samples of ~22ms)
31% full: cpu01: lookup 41.971M ± 0.012M events/sec (approximated from 32 samples of ~23ms)
37% full: cpu01: lookup 41.034M ± 0.015M events/sec (approximated from 32 samples of ~24ms)
43% full: cpu01: lookup 39.946M ± 0.012M events/sec (approximated from 32 samples of ~25ms)
50% full: cpu01: lookup 38.256M ± 0.014M events/sec (approximated from 32 samples of ~26ms)
56% full: cpu01: lookup 36.580M ± 0.018M events/sec (approximated from 32 samples of ~27ms)
62% full: cpu01: lookup 36.252M ± 0.012M events/sec (approximated from 32 samples of ~27ms)
68% full: cpu01: lookup 35.200M ± 0.012M events/sec (approximated from 32 samples of ~28ms)
75% full: cpu01: lookup 34.061M ± 0.009M events/sec (approximated from 32 samples of ~29ms)
81% full: cpu01: lookup 34.374M ± 0.010M events/sec (approximated from 32 samples of ~29ms)
87% full: cpu01: lookup 33.244M ± 0.011M events/sec (approximated from 32 samples of ~30ms)
93% full: cpu01: lookup 32.182M ± 0.013M events/sec (approximated from 32 samples of ~31ms)
100% full: cpu01: lookup 31.497M ± 0.016M events/sec (approximated from 32 samples of ~31ms)
Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230213091519.1202813-8-aspsk@isovalent.com
The bench utility will print
Setting up benchmark '<bench-name>'...
Benchmark '<bench-name>' started.
on startup to stdout. Suppress this output if --quiet option if given. This
makes it simpler to parse benchmark output by a script.
Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230213091519.1202813-7-aspsk@isovalent.com
The "local-storage-tasks-trace" benchmark has a `--quiet` option. Move it to
the list of common options, so that the main code and other benchmarks can use
(new) env.quiet variable. Patch the run_bench_local_storage_rcu_tasks_trace.sh
helper script accordingly.
Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230213091519.1202813-6-aspsk@isovalent.com
The benchs/bench_bpf_hashmap_full_update.c doesn't set a custom argp,
so it shouldn't include the <argp.h> header.
Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230213091519.1202813-5-aspsk@isovalent.com
To parse command line the bench utility uses the argp_parse() function. This
function takes as an argument a parent 'struct argp' structure which defines
common command line options and an array of children 'struct argp' structures
which defines additional command line options for particular benchmarks. This
implementation doesn't allow benchmarks to share option names, e.g., if two
benchmarks want to use, say, the --option option, then only one of them will
succeed (the first one encountered in the array). This will be convenient if
same option names could be used in different benchmarks (with the same
semantics, e.g., --nr_loops=N).
Fix this by calling the argp_parse() function twice. The first call is the same
as it was before, with all children argps, and helps to find the benchmark name
and to print a combined help message if anything is wrong. Given the name, we
can call the argp_parse the second time, but now the children array points only
to a correct benchmark thus always calling the correct parsers. (If there's no
a specific list of arguments, then only one call to argp_parse will be done.)
Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230213091519.1202813-4-aspsk@isovalent.com
The hashmap_report_final callback function defined in the
benchs/bench_bpf_hashmap_full_update.c file should be static.
Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230213091519.1202813-3-aspsk@isovalent.com
To call the bpf_hashmap_full_update benchmark, one should say:
bench bpf-hashmap-ful-update
The patch adds a missing 'l' to the benchmark name.
Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230213091519.1202813-2-aspsk@isovalent.com
Hou Tao says:
====================
From: Hou Tao <houtao1@huawei.com>
Hi,
The patchset tries to fix the hard-up problem found when checking how htab
handles element reuse in bpf memory allocator. The immediate reuse of
freed elements will reinitialize special fields (e.g., bpf_spin_lock) in
htab map value and it may corrupt lookup procedure with BFP_F_LOCK flag
which acquires bpf-spin-lock during value copying, and lead to hard-lock
as shown in patch #2. Patch #1 fixes it by using __GFP_ZERO when allocating
the object from slab and the behavior is similar with the preallocated
hash-table case. Please see individual patches for more details. And comments
are always welcome.
Regards,
Change Log:
v1:
* Use __GFP_ZERO instead of ctor to avoid retpoline overhead (from Alexei)
* Add comments for check_and_init_map_value() (from Alexei)
* split __GFP_ZERO patches out of the original patchset to unblock
the development work of others.
RFC: https://lore.kernel.org/bpf/20221230041151.1231169-1-houtao@huaweicloud.com
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The reinitialization of spin-lock in map value after immediate reuse may
corrupt lookup with BPF_F_LOCK flag and result in hard lock-up, so add
one test case to demonstrate the problem.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20230215082132.3856544-3-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Currently the freed element in bpf memory allocator may be immediately
reused, for htab map the reuse will reinitialize special fields in map
value (e.g., bpf_spin_lock), but lookup procedure may still access
these special fields, and it may lead to hard-lockup as shown below:
NMI backtrace for cpu 16
CPU: 16 PID: 2574 Comm: htab.bin Tainted: G L 6.1.0+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
RIP: 0010:queued_spin_lock_slowpath+0x283/0x2c0
......
Call Trace:
<TASK>
copy_map_value_locked+0xb7/0x170
bpf_map_copy_value+0x113/0x3c0
__sys_bpf+0x1c67/0x2780
__x64_sys_bpf+0x1c/0x20
do_syscall_64+0x30/0x60
entry_SYSCALL_64_after_hwframe+0x46/0xb0
......
</TASK>
For htab map, just like the preallocated case, these is no need to
initialize these special fields in map value again once these fields
have been initialized. For preallocated htab map, these fields are
initialized through __GFP_ZERO in bpf_map_area_alloc(), so do the
similar thing for non-preallocated htab in bpf memory allocator. And
there is no need to use __GFP_ZERO for per-cpu bpf memory allocator,
because __alloc_percpu_gfp() does it implicitly.
Fixes: 0fd7c5d43339 ("bpf: Optimize call_rcu in non-preallocated hash map.")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20230215082132.3856544-2-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
- getattr mediation of old policy
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE7cSDD705q2rFEEf7BS82cBjVw9gFAmPtOMIACgkQBS82cBjV
w9grtg//QDrI+6kQ8mbiluRVbnZim3lRz6u8w9FUh9421H82FmoWlvkuwIe8HfNi
3VuhdfsoEUY2Jaq2xUtarAQzzLKJUIGyeqXUMck0Ri8ySEtS21ZrWQ1EOOzT6lCZ
NfFD5D/CoavHa4mROSos3j/F0+xjqlp+Wmy0erH4qKPd8S1pmleT9lVxboWx5i7o
LyMn1b36KNd3abYKpJPe/r2BBmi/qw+QqMJBu/KIuD5y5ElRAfz/bSogm4Q9Ap4S
1Tcn1dILdLOKAo1jc5TRLM9CkyPQ9XwLnodhVh8cQrLG3HdDnEdw1R2k8x7QDxvy
YIpq7bI5nHsMh+1Nzbf9F8lTsovn4OGcALo17LgrosR5oeza8lvrxxrgW7OR7D0m
uURhBjnySyEYJ7/qa8XeqcY7d1p4TnA3mGsItlmacgL5o3A/01uu2P/NzukceVv0
iAjaMVmSHECZ7B8PUYcSiXs0rFAjeMZGquT7w+UQiXgLeOl2FR0bjqv6VL46vhHq
6B1tO2UJXqtm7oGYjTqYfJ2HIRT9MYUF5fUGzUPPDohk2vgrsDARpF9ZTPXs2O1K
xJHHC5eRo9pteoNl3T30U8lBudonYDeWKEIPyGCGlwbm51SeHeo/VmAny+X0Osk7
168bq9KnFmrKdofNP6NOhiMmKlytUISZ3CVbC0NoLC/jj85RS+o=
=Ei2j
-----END PGP SIGNATURE-----
Merge tag 'apparmor-v6.2-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor
Pull apparmor fix from John Johansen:
"Regression fix for getattr mediation of old policy"
* tag 'apparmor-v6.2-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor:
apparmor: Fix regression in compat permissions for getattr
- always return an ERR_PTR from nvme_pci_alloc_dev (Irvin Cote)
- add bogus ID quirk for ADATA SX6000PNP (Daniel Wagner)
- set the DMA mask earlier (Christoph Hellwig)
-----BEGIN PGP SIGNATURE-----
iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmPtILgLHGhjaEBsc3Qu
ZGUACgkQD55TZVIEUYM0gA//d2RFIIdhmFFu5iBp7S05D5CjLyQxP9jdVFQIJTjO
UTx6YIb18v9VEgxhWA8EwDQuLcW9Uj7s59Wxkt+Y6s/8PAVQxyzKyEyxTXJWG9kh
qQeTyymqOBHOScu3aWSIa7TpOlg8/Pxkn0MgmmYm3Wx+NA7+7xzxeVxs8hcoN4Hc
2mTPmfJY+DiOopyKV/awiwAG1g39gziPODh2VT1kJWn1Q7OxRMbZ9CN047b1yII8
54hYI0z8MF2T83SdVS+MADcqtWCNKrEVVrEVJS/kQ5fNKV4ojFOTf5hMR6xmo+bg
bXyxnu0HSM/Ij76Umw7xqlcF5L+Rn/MnJNebcJ+iNfsi6g+AaVwekbMBC4NVWTtp
iGI4uxH9EbWS654CikTdXPxtbQuODpYL1x/16vZqBQvmv5WJH07uVsrwG3Zoo9R+
krKNMiI7PkTUbH6IACmk/RNPfuxK0qYpTDZcvKIj6C0PBNd/0RpVi8mjeQzgFlHd
w+AZ7iu0/8HOTNpoI8Qi0NxlHBOAHFqILrNRf78YN1I8Hisa4JYILrhU1UL9wnDg
aCLUS56SGCFAjgVlYlakFq2iR8htbFh1fPhWz3Q6BV8k84LjaaHoZvVV3A0DYhdj
4+rOvI7k5r4sKIIp7jUe2xXwEyacOziY/KrI8kzBIxeio8+CykPNlbDTgIJHW79y
U5M=
=xRhD
-----END PGP SIGNATURE-----
Merge tag 'nvme-6.2-2023-02-15' of git://git.infradead.org/nvme into block-6.2
Pull NVMe fixes from Christoph:
"nvme fixes for Linux 6.2
- always return an ERR_PTR from nvme_pci_alloc_dev (Irvin Cote)
- add bogus ID quirk for ADATA SX6000PNP (Daniel Wagner)
- set the DMA mask earlier (Christoph Hellwig)"
* tag 'nvme-6.2-2023-02-15' of git://git.infradead.org/nvme:
nvme-pci: always return an ERR_PTR from nvme_pci_alloc_dev
nvme-pci: set the DMA mask earlier
nvme-pci: add bogus ID quirk for ADATA SX6000PNP
- Fix a teardown bug in the new nfs4_file hashtable
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmPqS30ACgkQM2qzM29m
f5fV2BAAgkmq0MF6nTjjVcFYSUyJvPO4jCcT5ZPqJcv2CHCWp/EkhOrBMHtli3y2
zUspjZfHpuV5oi2j7J1/PnlLBbov7oXjg9vZ2xGazo1Zkr5wQHxwyuFXdVzte2Qk
fGcCEgLA3aPiluafzWHMYeoZUqVSQCT41UaUW4iauc/byRZmNs0nwVsYlfhWzg1U
dsQsK03S357mmqhevxIbBVvTRNakxJwqALBVZcEgfjulESwgQ6gIJpo8srrAkZ4v
2k5bPVVTRhyDCv+8fOe7UqIyW7HwT5WDvMJg7C883wcP8xk7NHqg54t7NVoPNv5R
oZLkwUBr5G75OThCmrm8zDMt6zqT76Wyc29fBQ0hjyNml0HOiB131yYv7zGW6qqD
2e+jQpAKDkc6IIy8IzqxgHan9lMvtYfz7uIQjGGXb+vPTF5S7BIdWJjSEvgo/7a7
RNlou48UAF4NeZHcDESXFArcQ0MXd9rBnawxRVDqMzQ3owMyn3Nt8FRx7K/dNGKl
6qnIR+H7G65KQgQCxiqrGvvO3NssCBurv/9BRNRmVshVzI0pdn/seZff4TngvDfI
GWvEE7nMFE1jDBrN5o5ecLUVI+zTDuNY3AAnaH7afiRCBQauq+AziCfvXoLtfGw2
+FktTeZRgZQzMFA62H3eEDvy5dKDp/a0ivP1QglIid6oBRH/muw=
=pQWv
-----END PGP SIGNATURE-----
Merge tag 'nfsd-6.2-6' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fix from Chuck Lever:
- Fix a teardown bug in the new nfs4_file hashtable
* tag 'nfsd-6.2-6' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: don't destroy global nfs4_file table in per-net shutdown
Eduard Zingerman says:
====================
This patch-set is a part of preparation work for -mcpu=v4 option for
BPF C compiler (discussed in [1]). Among other things -mcpu=v4 should
enable generation of BPF_ST instruction by the compiler.
- Patches #1,2 adjust verifier to track values of constants written to
stack using BPF_ST. Currently these are tracked imprecisely, unlike
the writes using BPF_STX, e.g.:
fp[-8] = 42; currently verifier assumes that fp[-8]=mmmmmmmm
after such instruction, where m stands for "misc",
just a note that something is written at fp[-8].
r1 = 42; verifier tracks r1=42 after this instruction.
fp[-8] = r1; verifier tracks fp[-8]=42 after this instruction.
This patch makes both cases equivalent.
- Patches #3,4 adjust verifier.c:check_stack_write_fixed_off() to
preserve STACK_ZERO marks when BPF_ST writes zero. Currently these
are replaced by STACK_MISC, unlike zero writes using BPF_STX, e.g.:
... stack range [X,Y] is marked as STACK_ZERO ...
r0 = ... variable offset pointer to stack with range [X,Y] ...
fp[r0] = 0; currently verifier marks range [X,Y] as
STACK_MISC for such instructions.
r1 = 0;
fp[r0] = r1; verifier keeps STACK_ZERO marks for range [X,Y].
This patch makes both cases equivalent.
Motivating example for patch #1 could be found at [3].
Previous version of the patch-set is here [2], the changes are:
- Explicit initialization of fake register parent link is removed from
verifier.c:check_stack_write_fixed_off() as parent links are now
correctly handled by verifier.c:save_register_state().
- Original patch #1 is split in patches #1 & #3.
- Missing test case added for patch #3
verifier.c:check_stack_write_fixed_off() adjustment.
- Test cases are updated to use .prog_type = BPF_PROG_TYPE_SK_LOOKUP,
which requires return value to be in the range [0,1] (original test
cases assumed that such range is always required, which is not true).
- Original patch #3 with changes allowing BPF_ST writes to context is
withheld for now, w/o compiler support for BPF_ST it requires some
creative testing.
- Original patch #5 is removed from the patch-set. This patch
contained adjustments to expected verifier error messages in some
tests, necessary when C compiler generates BPF_ST instruction
instead of BPF_STX (changes to expected instruction indices). These
changes are not necessary yet.
[1] https://lore.kernel.org/bpf/01515302-c37d-2ee5-c950-2f556a4caad0@meta.com/
[2] https://lore.kernel.org/bpf/20221231163122.1360813-1-eddyz87@gmail.com/
[3] https://lore.kernel.org/bpf/f1e4282bf00aa21a72fc5906f8c3be1ae6c94a5e.camel@gmail.com/
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
A test case to verify that variable offset BPF_ST instruction
preserves STACK_ZERO marks when writes zeros, e.g. in the following
situation:
*(u64*)(r10 - 8) = 0 ; STACK_ZERO marks for fp[-8]
r0 = random(-7, -1) ; some random number in range of [-7, -1]
r0 += r10 ; r0 is now variable offset pointer to stack
*(u8*)(r0) = 0 ; BPF_ST writing zero, STACK_ZERO mark for
; fp[-8] should be preserved.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230214232030.1502829-5-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
BPF_STX instruction preserves STACK_ZERO marks for variable offset
writes in situations like below:
*(u64*)(r10 - 8) = 0 ; STACK_ZERO marks for fp[-8]
r0 = random(-7, -1) ; some random number in range of [-7, -1]
r0 += r10 ; r0 is now a variable offset pointer to stack
r1 = 0
*(u8*)(r0) = r1 ; BPF_STX writing zero, STACK_ZERO mark for
; fp[-8] is preserved
This commit updates verifier.c:check_stack_write_var_off() to process
BPF_ST in a similar manner, e.g. the following example:
*(u64*)(r10 - 8) = 0 ; STACK_ZERO marks for fp[-8]
r0 = random(-7, -1) ; some random number in range of [-7, -1]
r0 += r10 ; r0 is now variable offset pointer to stack
*(u8*)(r0) = 0 ; BPF_ST writing zero, STACK_ZERO mark for
; fp[-8] is preserved
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230214232030.1502829-4-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Check that verifier tracks the value of 'imm' spilled to stack by
BPF_ST_MEM instruction. Cover the following cases:
- write of non-zero constant to stack;
- write of a zero constant to stack.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230214232030.1502829-3-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
For aligned stack writes using BPF_ST instruction track stored values
in a same way BPF_STX is handled, e.g. make sure that the following
commands produce similar verifier knowledge:
fp[-8] = 42; r1 = 42;
fp[-8] = r1;
This covers two cases:
- non-null values written to stack are stored as spill of fake
registers;
- null values written to stack are stored as STACK_ZERO marks.
Previously both cases above used STACK_MISC marks instead.
Some verifier test cases relied on the old logic to obtain STACK_MISC
marks for some stack values. These test cases are updated in the same
commit to avoid failures during bisect.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230214232030.1502829-2-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Just after the fix to TASK_COMM_LEN not converted to its value in
trace_events was pulled, the kernel test robot reported that the helper
function trace_define_field_ext() added to that change was only used in
the file it was defined in but was not declared static. Make it a local
function.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCY+pQvhQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qpWMAP958Izvo22zPjlvqypLrC4wkwOrU6BG
ITApOESLGS6YMAEA3X1qVpjgXClFmRv6j+J7S6LdhUzhkOm9Sxg5Vejxzgo=
=4fmj
-----END PGP SIGNATURE-----
Merge tag 'trace-v6.2-rc7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixlet from Steven Rostedt:
"Make trace_define_field_ext() static.
Just after the fix to TASK_COMM_LEN not converted to its value in
trace_events was pulled, the kernel test robot reported that the
helper function trace_define_field_ext() added to that change was only
used in the file it was defined in but was not declared static.
Make it a local function"
* tag 'trace-v6.2-rc7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Make trace_define_field_ext() static
This fixes a regression in mediation of getattr when old policy built
under an older ABI is loaded and mapped to internal permissions.
The regression does not occur for all getattr permission requests,
only appearing if state zero is the final state in the permission
lookup. This is because despite the first state (index 0) being
guaranteed to not have permissions in both newer and older permission
formats, it may have to carry permissions that were not mediated as
part of an older policy. These backward compat permissions are
mapped here to avoid special casing the mediation code paths.
Since the mapping code already takes into account backwards compat
permission from older formats it can be applied to state 0 to fix
the regression.
Fixes: 408d53e923bd ("apparmor: compute file permissions on profile load")
Reported-by: Philip Meulengracht <the_meulengracht@hotmail.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
This documentation wasn't added in the original patch,
add it now.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: 6e4c0d0460bd ("wifi: mac80211: add a workaround for receiving non-standard mesh A-MSDU")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Get rid of gfp parameter from cfg80211_obss_color_collision_notify
routine description.
Fixes: 935ef47b16cc ("wifi: cfg80211: get rid of gfp in cfg80211_bss_color_notify")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/2da652e2cd5c7903191091ae9757718f1be802a1.1676453359.git.lorenzo@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When we have multiple interfaces receiving the same frame,
such as a multicast frame, one interface might have a sta
and the other not. In this case, link_sta would be set but
not cleared again.
Always set link_sta, so we keep an invariant that link_sta
and sta are either both set or both not set.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
There's at least one case in ieee80211_rx_for_interface()
where we might pass &((struct sta_info *)NULL)->sta to it
only to then do container_of(), and then checking the
result for NULL, but checking the result of container_of()
for NULL looks really odd.
Fix this by just passing the struct sta_info * instead.
Fixes: e66b7920aa5a ("wifi: mac80211: fix initialization of rx->link and rx->link_sta")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When a connection was established without going through
NL80211_CMD_CONNECT, the ssid was never set in the wireless_dev struct.
Now we set it in __cfg80211_connect_result() when it is not already set.
When using a userspace configuration that does not call
cfg80211_connect() (can be checked with breakpoints in the kernel),
this patch should allow `networkctl status device_name` to output the
SSID instead of null.
Cc: stable@vger.kernel.org
Reported-by: Yohan Prod'homme <kernel@zoddo.fr>
Fixes: 7b0a0e3c3a88 (wifi: cfg80211: do some rework towards MLO link APIs)
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216711
Signed-off-by: Marc Bornand <dev.mbornand@systemb.ch>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The compiler is optimizing out majority of unref_ptr read/writes, so the test
wasn't testing much. For example, one could delete '__kptr' tag from
'struct prog_test_ref_kfunc __kptr *unref_ptr;' and the test would still "pass".
Convert it to volatile stores. Confirmed by comparing bpf asm before/after.
Fixes: 2cbc469a6fc3 ("selftests/bpf: Add C tests for kptr")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230214235051.22938-1-alexei.starovoitov@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
When the BPF selftests are cross-compiled, only the a host version of
bpftool is built. This version of bpftool is used on the host-side to
generate various intermediates, e.g., skeletons.
The test runners are also using bpftool, so the Makefile will symlink
bpftool from the selftest/bpf root, where the test runners will look
the tool:
| $(Q)ln -sf $(if $2,..,.)/tools/build/bpftool/bootstrap/bpftool \
| $(OUTPUT)/$(if $2,$2/)bpftool
There are two problems for cross-compilation builds:
1. There is no native (cross-compilation target) of bpftool
2. The bootstrap/bpftool is never cross-compiled (by design)
Make sure that a native/cross-compiled version of bpftool is built,
and if CROSS_COMPILE is set, symlink the native/non-bootstrap version.
Acked-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20230214161253.183458-1-bjorn@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
In commit 7e2a9ebe8126 ("docs, bpf: Ensure IETF's BPF mailing list gets
copied for ISA doc changes"), a new MAINTAINERS entry was added for any
BPF IETF documentation updates for the ongoing standardization process.
I've been making it a point to try and review as many BPF documentation
patches as possible, and have made a committment to Alexei to
consistently review BPF standardization patches going forward. This
patch adds my name as a reviewer to the MAINTAINERS entry for the
standardization effort.
Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230214223553.78353-1-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
There exists build error when make -C tools/testing/selftests/bpf/
on LoongArch:
BINARY test_verifier
In file included from test_verifier.c:27:
tools/include/uapi/linux/bpf_perf_event.h:14:28: error: field 'regs' has incomplete type
14 | bpf_user_pt_regs_t regs;
| ^~~~
make: *** [Makefile:577: tools/testing/selftests/bpf/test_verifier] Error 1
make: Leaving directory 'tools/testing/selftests/bpf'
Add missing uapi header for LoongArch to use the following definition:
typedef struct user_pt_regs bpf_user_pt_regs_t;
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Link: https://lore.kernel.org/r/1676458867-22052-1-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Stephen Rothwell reported htmldocs warning when merging bpf-next tree,
which was the same warning as reported by kernel test robot:
Documentation/bpf/graph_ds_impl.rst:62: ERROR: Error in "code-block" directive:
maximum 1 argument(s) allowed, 12 supplied.
The error is due to Sphinx confuses node_data struct declaration with
code-block directive option.
Fix the warning by separating the code-block marker with node_data struct
declaration.
Link: https://lore.kernel.org/linux-next/20230215144505.4751d823@canb.auug.org.au/
Link: https://lore.kernel.org/linux-doc/202302151123.wUE5FYFx-lkp@intel.com/
Fixes: c31315c3aa0929 ("bpf, documentation: Add graph documentation for non-owning refs")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://lore.kernel.org/r/20230215123253.41552-3-bagasdotme@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This driver uncondictionally uses the GPIOLIB_IRQCHIP so
select it.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
The commit 1796f808e4bb ("HID: i2c-hid: acpi: Stop setting wakeup_capable")
changed the policy such that I2C touchpads may be able to wake up the
system by default if the system is configured as such.
However for some devices there is a bug, that is causing the touchpad to
instantly wake up the device again once it gets deactivated. The root cause
is still under investigation (see Link tag).
To workaround this problem for the time being, introduce a quirk for this
model that will prevent the wakeup capability for being set for GPIO 16.
Fixes: 1796f808e4bb ("HID: i2c-hid: acpi: Stop setting wakeup_capable")
Link: https://lore.kernel.org/linux-acpi/20230210164636.628462-1-wse@tuxedocomputers.com/
Signed-off-by: Werner Sembach <wse@tuxedocomputers.com>
Cc: <stable@vger.kernel.org> # v6.1+
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Since recently, the kernel is nagging about mutable irq_chips:
"not an immutable chip, please consider fixing it!"
Drop the unneeded copy, flag it as IRQCHIP_IMMUTABLE, add the new
helper functions and call the appropriate gpiolib functions.
Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Remove acpi_device declaration, as it is no longer needed.
Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
In fixed-link setup phylink_parse_fixedlink() unconditionally sets
Pause, Asym_Pause and Autoneg bits to "supported" bitmap, while MAC may
not support these.
This leads to ethtool reporting:
> Supported pause frame use: Symmetric Receive-only
> Supports auto-negotiation: Yes
regardless of what is actually supported.
Instead of unconditionally set Pause, Asym_Pause and Autoneg it is
sensible to set them according to validated "supported" bitmap, i.e. the
result of phylink_validate().
Signed-off-by: Ivan Bornyakov <i.bornyakov@metrotek.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
lianhui reports that when MPLS fails to register the sysctl table
under new location (during device rename) the old pointers won't
get overwritten and may be freed again (double free).
Handle this gracefully. The best option would be unregistering
the MPLS from the device completely on failure, but unfortunately
mpls_ifdown() can fail. So failing fully is also unreliable.
Another option is to register the new table first then only
remove old one if the new one succeeds. That requires more
code, changes order of notifications and two tables may be
visible at the same time.
sysctl point is not used in the rest of the code - set to NULL
on failures and skip unregister if already NULL.
Reported-by: lianhui tang <bluetlh@gmail.com>
Fixes: 0fae3bf018d9 ("mpls: handle device renames for per-device sysctls")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit e48c414ee61f ("[INET]: Generalise the TCP sock ID lookup routines")
commented out the definition of SOCK_REFCNT_DEBUG in 2005 and later another
commit 463c84b97f24 ("[NET]: Introduce inet_connection_sock") removed it.
Since we could track all of them through bpf and kprobe related tools
and the feature could print loads of information which might not be
that helpful even under a little bit pressure, the whole feature which
has been inactive for many years is no longer supported.
Link: https://lore.kernel.org/lkml/20230211065153.54116-1-kerneljasonxing@gmail.com/
Suggested-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Wenjia Zhang <wenjia@linux.ibm.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add steering table/rule in RDMA_TX domain, to forward all traffic
to IPsec crypto table in NIC domain.
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Add steering tables/rules to check if the decrypted traffic is RoCEv2,
if so then forward it to RDMA_RX domain.
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Add IPSec flow steering priorities in RDMA namespaces. This allows
adding tables/rules to forward RoCEv2 traffic to the IPSec crypto
tables in NIC_TX domain, and accept RoCEv2 traffic from NIC_RX domain.
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>