Dave Marchevsky says:
====================
At Meta we have a profiling daemon which periodically collects
information on many hosts. This collection usually involves grabbing
stacks (user and kernel) using perf_event BPF progs and later symbolicating
them. For user stacks we try to use BPF_F_USER_BUILD_ID and rely on
remote symbolication, but BPF_F_USER_BUILD_ID doesn't always succeed. In
those cases we must fall back to digging around in /proc/PID/maps to map
virtual addresses to (binary, offset). The /proc/PID/maps digging does not
occur synchronously with stack collection, so the process might already
be gone, in which case it won't have /proc/PID/maps and we will fail to
symbolicate.
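For reference, the /proc/PID/maps fallback boils down to finding the line whose range contains the address and computing the file offset as addr - start + pgoff. A minimal userspace sketch in C follows. It assumes the standard maps line layout (start-end perms offset dev inode [pathname]). struct map_hit and resolve_in_maps_line() are hypothetical names, and this is not the profiler's actual code.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical result type: the mapped file and the offset within it. */
struct map_hit {
	char path[256];
	unsigned long long file_off;
};

/*
 * Resolve addr against a single /proc/PID/maps line.
 * Returns 1 and fills *out if addr lies in [start, end), 0 otherwise.
 */
static int resolve_in_maps_line(const char *line, unsigned long long addr,
				struct map_hit *out)
{
	unsigned long long start, end, pgoff;
	int path_pos = -1;
	const char *path;
	size_t n;

	/* Fields: start-end perms offset dev inode [pathname] */
	if (sscanf(line, "%llx-%llx %*s %llx %*s %*s %n",
		   &start, &end, &pgoff, &path_pos) != 3)
		return 0;
	if (addr < start || addr >= end)
		return 0;

	/* Anonymous mappings have no pathname. */
	path = path_pos >= 0 ? line + path_pos : "";
	n = strcspn(path, "\n");
	if (n >= sizeof(out->path))
		n = sizeof(out->path) - 1;
	memcpy(out->path, path, n);
	out->path[n] = '\0';

	/* Offset of addr within the backing file. */
	out->file_off = addr - start + pgoff;
	return 1;
}
```

The caller reads the maps file line by line and tries each line in turn. This only works while the process, and therefore its maps file, still exists.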
This 'exited process problem' doesn't occur very often as
most of the prod services we care to profile are long-lived daemons, but
there are enough use cases to warrant a workaround: a BPF program which
can be optionally loaded at data collection time and essentially walks
/proc/PID/maps. Currently this is done by walking the vma list:
struct vm_area_struct *mmap = BPF_CORE_READ(mm, mmap);
struct vm_area_struct *mmap_next;
mmap_next = BPF_CORE_READ(mmap, vm_next); /* in a loop */
Since commit 763ecb0350 ("mm: remove the vma linked list") there's no
longer a vma linked list to walk. Walking the vma maple tree is not as
simple as hopping struct vm_area_struct->vm_next. Luckily,
commit f39af05949 ("mm: add VMA iterator"), another commit in that series,
added struct vma_iterator and the for_each_vma macro for easy vma iteration. If
similar functionality were exposed to BPF programs, it would be perfect for our
use case.
This series adds such functionality, specifically a BPF equivalent of
for_each_vma using the open-coded iterator style.
Notes:
* This approach was chosen after discussion on a previous series [0] which
attempted to solve the same problem by adding a BPF_F_VMA_NEXT flag to
bpf_find_vma.
* Unlike the task_vma bpf_iter, the open-coded iterator kfuncs here do not
drop the vma read lock between iterations. See Alexei's response in [0].
* The [vsyscall] page isn't really part of task->mm's vmas, but
/proc/PID/maps returns information about it anyway. The vma iter added
here does not do the same. See comment on selftest in patch 3.
* bpf_iter_task_vma allocates a _data struct which contains - among other
things - struct vma_iterator, using BPF allocator and keeps a pointer to
the bpf_iter_task_vma_data. This is done in order to prevent changes to
struct ma_state - which is wrapped by struct vma_iterator - from
necessitating changes to uapi struct bpf_iter_task_vma.
Changelog:
v6 -> v7: https://lore.kernel.org/bpf/20231010185944.3888849-1-davemarchevsky@fb.com/
Patch numbers correspond to their position in v6
Patch 2 ("selftests/bpf: Rename bpf_iter_task_vma.c to bpf_iter_task_vmas.c")
* Add Andrii ack
Patch 3 ("bpf: Introduce task_vma open-coded iterator kfuncs")
* Add Andrii ack
* Add missing __diag_ignore_all for -Wmissing-prototypes (Song)
Patch 4 ("selftests/bpf: Add tests for open-coded task_vma iter")
* Remove two unnecessary header includes (Andrii)
* Remove extraneous !vmas_seen check (Andrii)
New Patch ("bpf: Add BPF_KFUNC_{START,END}_defs macros")
* After talking to Andrii, this is an attempt to clean up __diag_ignore_all
spam everywhere kfuncs are defined. If nontrivial changes are needed,
let's apply the other 4 and I'll respin as a standalone patch.
v5 -> v6: https://lore.kernel.org/bpf/20231010175637.3405682-1-davemarchevsky@fb.com/
Patch 4 ("selftests/bpf: Add tests for open-coded task_vma iter")
* Remove extraneous blank line. I did this manually to the .patch file
for v5, which caused BPF CI to complain about failing to apply the
series
v4 -> v5: https://lore.kernel.org/bpf/20231002195341.2940874-1-davemarchevsky@fb.com/
Patch numbers correspond to their position in v4
New Patch ("selftests/bpf: Rename bpf_iter_task_vma.c to bpf_iter_task_vmas.c")
* Patch 2's renaming of this selftest, and associated changes in the
userspace runner, are split out into this separate commit (Andrii)
Patch 2 ("bpf: Introduce task_vma open-coded iterator kfuncs")
* Remove bpf_iter_task_vma kfuncs from libbpf's bpf_helpers.h, they'll be
added to selftests' bpf_experimental.h in selftests patch below (Andrii)
* Split bpf_iter_task_vma.c renaming into separate commit (Andrii)
Patch 3 ("selftests/bpf: Add tests for open-coded task_vma iter")
* Add bpf_iter_task_vma kfuncs to bpf_experimental.h (Andrii)
* Remove '?' from prog SEC, open_and_load the skel in one operation (Andrii)
* Ensure that fclose() always happens in test runner (Andrii)
* Use global var w/ 1000 (vm_start, vm_end) structs instead of two
MAP_TYPE_ARRAY's w/ 1k u64s each (Andrii)
v3 -> v4: https://lore.kernel.org/bpf/20230822050558.2937659-1-davemarchevsky@fb.com/
Patch 1 ("bpf: Don't explicitly emit BTF for struct btf_iter_num")
* Add Andrii ack
Patch 2 ("bpf: Introduce task_vma open-coded iterator kfuncs")
* Mark bpf_iter_task_vma_new args KF_RCU and remove now-unnecessary !task
check (Yonghong)
* Although KF_RCU is a function-level flag, in reality it only applies to
the task_struct *task parameter, as the other two params are a scalar int
and a specially-handled KF_ARG_PTR_TO_ITER
* Remove struct bpf_iter_task_vma definition from uapi headers, define in
kernel/bpf/task_iter.c instead (Andrii)
Patch 3 ("selftests/bpf: Add tests for open-coded task_vma iter")
* Use a local var when looping over vmas to track map idx. Update vmas_seen
global after done iterating. Don't start iterating or update vmas_seen if
vmas_seen global is nonzero. (Andrii)
* Move getpgid() call to correct spot - above skel detach. (Andrii)
v2 -> v3: https://lore.kernel.org/bpf/20230821173415.1970776-1-davemarchevsky@fb.com/
Patch 1 ("bpf: Don't explicitly emit BTF for struct btf_iter_num")
* Add Yonghong ack
Patch 2 ("bpf: Introduce task_vma open-coded iterator kfuncs")
* UAPI bpf header and tools/ version should match
* Add bpf_iter_task_vma_kern_data which bpf_iter_task_vma_kern points to,
bpf_mem_alloc/free it instead of just vma_iterator. (Alexei)
* Inner data ptr == NULL implies initialization failed
v1 -> v2: https://lore.kernel.org/bpf/20230810183513.684836-1-davemarchevsky@fb.com/
* Patch 1
* Now removes the unnecessary BTF_TYPE_EMIT instead of changing the
type (Yonghong)
* Patch 2
* Don't do unnecessary BTF_TYPE_EMIT (Yonghong)
* Bump task refcount to prevent ->mm reuse (Yonghong)
* Keep a pointer to vma_iterator in bpf_iter_task_vma, alloc/free
via BPF mem allocator (Yonghong, Stanislav)
* Patch 3
[0]: https://lore.kernel.org/bpf/20230801145414.418145-1-davemarchevsky@fb.com/
====================
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
The open-coded task_vma iter added earlier in this series allows for
natural iteration over a task's vmas using existing open-coded iter
infrastructure, specifically bpf_for_each.
This patch adds a test demonstrating this pattern and validating
correctness. The vma->vm_start and vma->vm_end addresses of the first
1000 vmas are recorded and compared to /proc/PID/maps output. As
expected, both see the same vmas and addresses - with the exception of
the [vsyscall] vma - which is explained in a comment in the prog_tests
program.
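The comparison the test performs can be sketched in plain C. This is a hypothetical userspace helper, not the selftest's actual code. It walks the maps text and skips the [vsyscall] line, which the BPF iterator does not report. It then checks the remaining (vm_start, vm_end) pairs against the recorded ranges, in order. The struct vm_range name and the 512-byte line limit are assumptions.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record of one vma as captured by the BPF side. */
struct vm_range {
	unsigned long long start, end;
};

/*
 * Return 1 if the first nr non-[vsyscall] lines of maps match ranges[]
 * in order, 0 otherwise.
 */
static int ranges_match_maps(const char *maps, const struct vm_range *ranges,
			     size_t nr)
{
	size_t seen = 0;
	char line[512];

	while (*maps && seen < nr) {
		size_t len = strcspn(maps, "\n");
		size_t copy = len < sizeof(line) - 1 ? len : sizeof(line) - 1;
		unsigned long long start, end;

		memcpy(line, maps, copy);
		line[copy] = '\0';
		maps += len + (maps[len] == '\n'); /* advance to next line */

		/* /proc/PID/maps lists [vsyscall]; the iterator does not. */
		if (strstr(line, "[vsyscall]"))
			continue;
		if (sscanf(line, "%llx-%llx", &start, &end) != 2)
			return 0;
		if (start != ranges[seen].start || end != ranges[seen].end)
			return 0;
		seen++;
	}
	return seen == nr;
}
```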
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20231013204426.1074286-5-davemarchevsky@fb.com
This patch adds kfuncs bpf_iter_task_vma_{new,next,destroy} which allow
creation and manipulation of struct bpf_iter_task_vma in open-coded
iterator style. BPF programs can use these kfuncs directly or through
bpf_for_each macro for natural-looking iteration of all task vmas.
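The open-coded iterator shape works as follows. The iterator is allocated on the stack, and a _new() call initializes it. Each _next() call returns an element or NULL, and _destroy() runs on every exit path.

The C example below is a userspace analogue of that shape. It iterates over a plain array rather than a task's vmas, and all names in it are hypothetical. The for_each_range() macro only loosely mirrors how libbpf's bpf_for_each() combines a for loop with __attribute__((cleanup)), and it needs GCC or Clang.

```c
#include <stddef.h>

/* Hypothetical element type standing in for a vma's [start, end) range. */
struct vma_range {
	unsigned long start, end;
};

/* Iterator state lives on the caller's stack. */
struct range_iter {
	const struct vma_range *arr;
	size_t nr, idx;
};

static int range_iter_destroy_calls; /* lets the cleanup be observed */

static int range_iter_new(struct range_iter *it, const struct vma_range *arr,
			  size_t nr)
{
	it->arr = arr;
	it->nr = nr;
	it->idx = 0;
	return 0;
}

/* Returns the next element, or NULL once the iteration is done. */
static const struct vma_range *range_iter_next(struct range_iter *it)
{
	return it->idx < it->nr ? &it->arr[it->idx++] : NULL;
}

static void range_iter_destroy(struct range_iter *it)
{
	it->arr = NULL;
	range_iter_destroy_calls++;
}

/*
 * new() runs once via the initializer of the unused pointer it_p_; the
 * cleanup attribute calls destroy() however the loop is left.
 */
#define for_each_range(cur, arr, nr)					\
	for (struct range_iter it_					\
		__attribute__((cleanup(range_iter_destroy))),		\
	     *it_p_ __attribute__((unused)) =				\
		(range_iter_new(&it_, (arr), (nr)), (void *)0);		\
	     ((cur) = range_iter_next(&it_));)
```

Because destroy() is tied to the iterator's scope, leaving the loop early with break still releases the iterator.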
The implementation borrows heavily from bpf_find_vma helper's locking -
differing only in that it holds the mmap_read lock for all iterations
while the helper only executes its provided callback on a maximum of 1
vma. Aside from locking, struct vma_iterator and vma_next do all the
heavy lifting.
A pointer to an inner data struct, struct bpf_iter_task_vma_data, is the
only field in struct bpf_iter_task_vma. This is because the inner data
struct contains a struct vma_iterator (not ptr), whose size is likely to
change under us. If bpf_iter_task_vma_kern contained vma_iterator directly,
such a change would require changing the size of the opaque
bpf_iter_task_vma struct. So it is better to allocate vma_iterator using the
BPF allocator, and since that allocation must succeed anyway, we might as well
allocate all iter fields, thereby freezing the size of struct bpf_iter_task_vma.
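The design choice above can be sketched in userspace C: a public iterator struct whose only field is a pointer to separately allocated inner data. All names are hypothetical, and calloc()/free() stand in for the BPF memory allocator. As the changelog notes, a NULL inner pointer marks a failed initialization.

```c
#include <stdlib.h>

/* Hypothetical inner state; its size may change between versions. */
struct iter_data {
	unsigned long pos;
	unsigned long state[8]; /* e.g. an embedded vma_iterator-like object */
};

/* Public, opaque handle: a single pointer, so its size never changes. */
struct public_iter {
	struct iter_data *data;
};

_Static_assert(sizeof(struct public_iter) == sizeof(void *),
	       "public iterator size stays fixed");

/* calloc() stands in for the BPF allocator; NULL data marks failure. */
static int public_iter_new(struct public_iter *it)
{
	it->data = calloc(1, sizeof(*it->data));
	return it->data ? 0 : -1;
}

static void public_iter_destroy(struct public_iter *it)
{
	free(it->data);
	it->data = NULL;
}
```

Growing struct iter_data changes only the allocation size, never sizeof(struct public_iter).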
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20231013204426.1074286-4-davemarchevsky@fb.com
Further patches in this series will add a struct bpf_iter_task_vma,
which will result in a name collision with the selftest prog renamed in
this patch. Rename the selftest to avoid the collision.
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20231013204426.1074286-3-davemarchevsky@fb.com
Commit 6018e1f407 ("bpf: implement numbers iterator") added the
BTF_TYPE_EMIT line that this patch is modifying. The struct btf_iter_num
doesn't exist, so only a forward declaration is emitted in BTF:
FWD 'btf_iter_num' fwd_kind=struct
That commit was probably hoping to ensure that struct bpf_iter_num is
emitted in vmlinux BTF. A previous version of this patch changed the
line to emit the correct type, but Yonghong confirmed that it would
definitely be emitted regardless in [0], so this patch simply removes
the line.
This isn't marked "Fixes" because the extraneous btf_iter_num FWD wasn't
causing any issues that I noticed, aside from mild confusion when I
looked through the code.
[0]: https://lore.kernel.org/bpf/25d08207-43e6-36a8-5e0f-47a913d4cda5@linux.dev/
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20231013204426.1074286-2-davemarchevsky@fb.com
linux-rt-devel tree contains a patch (b1773eac3f29c ("sched: Add support
for lazy preemption")) that adds an extra member to struct trace_entry.
This causes the offset of the args field in struct trace_event_raw_sys_enter
to be different from the one in struct syscall_trace_enter:
struct trace_event_raw_sys_enter {
struct trace_entry ent; /* 0 12 */
/* XXX last struct has 3 bytes of padding */
/* XXX 4 bytes hole, try to pack */
long int id; /* 16 8 */
long unsigned int args[6]; /* 24 48 */
/* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */
char __data[]; /* 72 0 */
/* size: 72, cachelines: 2, members: 4 */
/* sum members: 68, holes: 1, sum holes: 4 */
/* paddings: 1, sum paddings: 3 */
/* last cacheline: 8 bytes */
};
struct syscall_trace_enter {
struct trace_entry ent; /* 0 12 */
/* XXX last struct has 3 bytes of padding */
int nr; /* 12 4 */
long unsigned int args[]; /* 16 0 */
/* size: 16, cachelines: 1, members: 3 */
/* paddings: 1, sum paddings: 3 */
/* last cacheline: 16 bytes */
};
This, in turn, causes perf_event_set_bpf_prog() to fail while running the bpf
test_profiler testcase, because max_ctx_offset is calculated based on the
former struct, while off is calculated based on the latter:
if (is_tracepoint || is_syscall_tp) {
	int off = trace_event_get_offsets(event->tp_event);

	if (prog->aux->max_ctx_offset > off)
		return -EACCES;
}
What the BPF program actually gets is a pointer to struct
syscall_tp_t, defined in kernel/trace/trace_syscalls.c. This patch fixes
the problem by aligning struct syscall_tp_t with struct
syscall_trace_(enter|exit) and changing the tests to use these structs
to dereference the context.
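The layout mismatch shown in the pahole output can be reproduced with simplified stand-ins for these structs. This is an illustrative sketch, not kernel code. The entry structs approximate struct trace_entry, with and without one extra byte-sized member; the member is called extra here, which is a hypothetical name. The expected offsets assume an LP64 target such as x86_64, where long is 8 bytes and 8-byte aligned.

```c
#include <stddef.h>

/* Approximation of struct trace_entry (8 bytes, 4-byte aligned). */
struct entry_plain {
	unsigned short type;
	unsigned char flags;
	unsigned char preempt_count;
	int pid;
};

/* Same, plus one extra byte-sized member: padded to 12 bytes. */
struct entry_extra {
	unsigned short type;
	unsigned char flags;
	unsigned char preempt_count;
	int pid;
	unsigned char extra; /* hypothetical stand-in for the added member */
};

/* Layout the test program read the context with (cf. trace_event_raw_sys_enter). */
struct raw_sys_enter_plain { struct entry_plain ent; long id; unsigned long args[6]; };
struct raw_sys_enter_extra { struct entry_extra ent; long id; unsigned long args[6]; };

/* Layout off is derived from (cf. syscall_trace_enter). */
struct sys_trace_enter_plain { struct entry_plain ent; int nr; unsigned long args[]; };
struct sys_trace_enter_extra { struct entry_extra ent; int nr; unsigned long args[]; };
```

On such a target, args sits at offset 16 in both plain layouts. Once the entry grows to 12 bytes, args moves to 24 in the first layout but stays at 16 in the second, which matches the divergence in the pahole output above.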
Signed-off-by: Artem Savkov <asavkov@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/bpf/20231013054219.172920-1-asavkov@redhat.com
It was reported that there is a compiler warning on the unused variable
"sin_addr_len" in af_inet.c when CONFIG_CGROUP_BPF is not set.
This patch addresses it in the same way as the ipv6 counterpart
in inet6_getname(), i.e. by using "return sin_addr_len;"
instead of "return sizeof(*sin);".
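A minimal sketch of why the warning appears and how returning the variable silences it. RUN_ADDR_HOOK() and getname_sketch() are hypothetical stand-ins. The assumption is that, in configurations without the cgroup BPF hook, the hook macro expands without using its arguments. A length that is only ever passed to that macro is then set but never used. Returning it keeps it used in every configuration.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical stand-in for a hook macro in a configuration where the hook
 * is compiled out: it evaluates to nothing and never reads its arguments.
 */
#define RUN_ADDR_HOOK(addr, lenp) ((void)0)

static int getname_sketch(struct sockaddr_in *sin)
{
	int sin_addr_len = sizeof(*sin);

	memset(sin, 0, sizeof(*sin));
	sin->sin_family = AF_INET;

	/* With the hook compiled out, this line does not use sin_addr_len... */
	RUN_ADDR_HOOK(sin, &sin_addr_len);

	/* ...so returning it (rather than sizeof(*sin)) keeps it used. */
	return sin_addr_len;
}
```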
Fixes: fefba7d1ae ("bpf: Propagate modified uaddrlen from cgroup sockaddr programs")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/bpf/20231013185702.3993710-1-martin.lau@linux.dev
Closes: https://lore.kernel.org/bpf/20231013114007.2fb09691@canb.auug.org.au/
Reproduction environment:
A network of three Linux VMs is connected as below:
VM1<---->VM2(latest kernel 6.5.0-rc7)<---->VM3
VM1: eth0 ip: 192.168.122.207 MTU 1800
VM2: eth0 ip: 192.168.122.208, eth1 ip: 192.168.123.224 MTU 1500
VM3: eth0 ip: 192.168.123.240 MTU 1800
Reproduce:
VM1 sends 1600 bytes of UDP data to VM3 using scapy with flags='DF'.
scapy command:
send(IP(dst="192.168.123.240",flags='DF')/UDP()/str('0'*1600),count=1,
inter=1.000000)
Result:
Before IP data is sent.
----------------------------------------------------------------------
root@qemux86-64:~# cat /proc/net/snmp
Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors
ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests
OutDiscards OutNoRoutes ReasmTimeout ReasmReqdss
Ip: 1 64 6 0 2 2 0 0 2 4 0 0 0 0 0 0 0 0 0
......
root@qemux86-64:~#
----------------------------------------------------------------------
After IP data is sent.
----------------------------------------------------------------------
root@qemux86-64:~# cat /proc/net/snmp
Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors
ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests
OutDiscards OutNoRoutes ReasmTimeout ReasmReqdss
Ip: 1 64 7 0 2 2 0 0 2 5 0 0 0 0 0 0 0 1 0
......
root@qemux86-64:~#
----------------------------------------------------------------------
ForwDatagrams stays at 2; it is not incremented.
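The before/after snapshots above are read by matching column positions between the Ip: header line and the Ip: value line. A small C helper can do that lookup. snmp_field() is a hypothetical name, and it assumes each header and value line fits in 1 KiB. It expects the unwrapped lines as they appear in /proc/net/snmp; the output above is wrapped for readability.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Look up a column by name: find its index in the header line, then
 * return the token at the same index in the value line (-1 if absent).
 */
static long snmp_field(const char *header, const char *values, const char *name)
{
	char hbuf[1024], vbuf[1024];
	char *tok;
	int idx = -1, i = 0;

	/* strtok() modifies its input, so work on copies. */
	snprintf(hbuf, sizeof(hbuf), "%s", header);
	snprintf(vbuf, sizeof(vbuf), "%s", values);

	for (tok = strtok(hbuf, " \n"); tok; tok = strtok(NULL, " \n"), i++) {
		if (strcmp(tok, name) == 0) {
			idx = i;
			break;
		}
	}
	if (idx < 0)
		return -1;

	i = 0;
	for (tok = strtok(vbuf, " \n"); tok; tok = strtok(NULL, " \n"), i++) {
		if (i == idx)
			return strtol(tok, NULL, 10);
	}
	return -1;
}
```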
Issue description and patch:
ip_exceeds_mtu() in ip_forward() drops this IP datagram because the skb len
(1600, as sent by scapy) exceeds the MTU (1500 on VM2) and "DF" is set.
According to RFC 4293, Section 3.2.3 "IP Statistics Tables":
[Diagram excerpt: InForwDatagrams / OutForwDatagrams on the forwarding path, followed by the OutFragReqds and OutFragFails branches]
IPSTATS_MIB_OUTFORWDATAGRAMS should therefore be counted before the fragment
check.
The existing implementation, however, increments the counter after the
fragment check: ip_exceeds_mtu() in IPv4 and ip6_pkt_too_big() in IPv6.
This patch therefore moves the IPSTATS_MIB_OUTFORWDATAGRAMS counter to
ip_forward() for IPv4 and ip6_forward() for IPv6.
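The ordering change can be pictured with a hypothetical model of the forwarding path; this is not the kernel's ip_forward(). The forwarded-datagram counter is bumped before the "too big and DF set" check. A datagram dropped for being too big is therefore still counted as forwarded, and in this model the drop itself is recorded in a separate counter.

```c
#include <stddef.h>

/* Hypothetical counters; names only loosely follow the MIB fields. */
struct fwd_stats {
	unsigned long out_forw_datagrams;
	unsigned long frag_fails;
};

/*
 * Model of the patched ordering: count the datagram as forwarded first,
 * then apply the "too big and DF set" check. Returns 0 if the datagram
 * would be sent on, -1 if it is dropped.
 */
static int forward_sketch(struct fwd_stats *st, size_t len, size_t mtu, int df)
{
	st->out_forw_datagrams++;

	if (len > mtu && df) {
		st->frag_fails++; /* cannot fragment: drop */
		return -1;
	}
	return 0;
}
```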
Test result with patch:
Before IP data is sent.
----------------------------------------------------------------------
root@qemux86-64:~# cat /proc/net/snmp
Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors
ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests
OutDiscards OutNoRoutes ReasmTimeout ReasmReqdss
Ip: 1 64 6 0 2 2 0 0 2 4 0 0 0 0 0 0 0 0 0
......
root@qemux86-64:~#
----------------------------------------------------------------------
After IP data is sent.
----------------------------------------------------------------------
root@qemux86-64:~# cat /proc/net/snmp
Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors
ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests
OutDiscards OutNoRoutes ReasmTimeout ReasmReqdss
Ip: 1 64 7 0 2 3 0 0 2 5 0 0 0 0 0 0 0 1 0
......
root@qemux86-64:~#
----------------------------------------------------------------------
ForwDatagrams is updated from 2 to 3.
Reviewed-by: Filip Pudak <filip.pudak@windriver.com>
Signed-off-by: Heng Guo <heng.guo@windriver.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20231011015137.27262-1-heng.guo@windriver.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Leon Romanovsky says:
====================
This PR is collected from
https://lore.kernel.org/all/cover.1695296682.git.leon@kernel.org
This series from Patrisious extends mlx5 to support IPsec packet offload
in multiport devices (MPV, see [1] for more details).
These devices have single flow steering logic and two netdev interfaces,
which require extra logic to manage IPsec configurations as they are performed
on netdevs.
[1] https://lore.kernel.org/linux-rdma/20180104152544.28919-1-leon@kernel.org/
* 'mlx5-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
net/mlx5: Handle IPsec steering upon master unbind/bind
net/mlx5: Configure IPsec steering for ingress RoCEv2 MPV traffic
net/mlx5: Configure IPsec steering for egress RoCEv2 MPV traffic
net/mlx5: Add create alias flow table function to ipsec roce
net/mlx5: Implement alias object allow and create functions
net/mlx5: Add alias flow table bits
net/mlx5: Store devcom pointer inside IPsec RoCE
net/mlx5: Register mlx5e priv to devcom in MPV mode
RDMA/mlx5: Send events from IB driver about device affiliation state
net/mlx5: Introduce ifc bits for migration in a chunk mode
====================
Link: https://lore.kernel.org/r/20231002083832.19746-1-leon@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sabrina Dubroca says:
====================
net: tls: various code cleanups and improvements
This series contains multiple cleanups and simplifications for the
config code of both TLS_SW and TLS_HW.
It also modifies the chcr_ktls driver to use driver_state like all
other drivers, so that we can then make driver_state fixed size
instead of a flex array always allocated to that same fixed size. As
reported by Gustavo A. R. Silva, the way chcr_ktls misuses
driver_state irritates GCC [1].
Patches 1 and 2 are follow-ups to my previous cipher_desc series.
[1] https://lore.kernel.org/netdev/ZRvzdlvlbX4+eIln@work/
====================
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
driver_state is a flex array, but is always allocated by the tls core
to a fixed size (TLS_DRIVER_STATE_SIZE_{TX,RX}). Simplify the code by
making that size explicit so that sizeof(struct
tls_offload_context_{tx,rx}) works.
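The difference can be sketched with two hypothetical context structs. DRIVER_STATE_SIZE stands in for TLS_DRIVER_STATE_SIZE_{TX,RX}, and its value here is arbitrary. With a flexible array, sizeof() excludes the state, so each allocation must add it by hand. With an explicitly sized array, sizeof() covers the whole context.

```c
#include <stddef.h>
#include <stdlib.h>

#define DRIVER_STATE_SIZE 16 /* arbitrary; stands in for TLS_DRIVER_STATE_SIZE_{TX,RX} */

/* Before: sizeof() does not include the state. */
struct offload_ctx_flex {
	int other_field;
	unsigned char driver_state[];
};

/* After: the state size is part of the type. */
struct offload_ctx_fixed {
	int other_field;
	unsigned char driver_state[DRIVER_STATE_SIZE];
};

static struct offload_ctx_flex *alloc_flex(void)
{
	/* Every allocation site has to remember to add the state size. */
	return calloc(1, sizeof(struct offload_ctx_flex) + DRIVER_STATE_SIZE);
}

static struct offload_ctx_fixed *alloc_fixed(void)
{
	/* sizeof() alone is enough. */
	return calloc(1, sizeof(struct offload_ctx_fixed));
}
```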
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
chcr_ktls uses the space reserved in driver_state by
tls_set_device_offload, but makes up its own wrapper around
tls_offload_context_tx instead of accessing driver_state via the
__tls_driver_ctx helper.
In this driver, driver_state is only used to store a pointer to a
larger context struct allocated by the driver.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's not really needed since we end up refetching it as tls_ctx. We
can also remove the NULL check, since we have already dereferenced ctx
in do_tls_setsockopt_conf.
While at it, fix up the reverse xmas tree ordering.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's not really needed since we end up refetching it as tls_ctx. We
can also remove the NULL check, since we have already dereferenced ctx
in do_tls_setsockopt_conf.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Most values are shared. Nonce size turns out to be equal to IV size
for all offloadable ciphers.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simplify tls_set_sw_offload, and allow reuse for the tls_device code.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
TLS_MAX_IV_SIZE + TLS_MAX_SALT_SIZE is only 20B, so we don't gain much
in cipher_context's size by allocating it separately, and we can simplify
the init code a bit.
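The sketch below shows what storing the IV inline could look like. It assumes TLS_MAX_IV_SIZE = 16 and TLS_MAX_SALT_SIZE = 4, which is consistent with the 20B total above, and a salt-then-IV layout. struct cipher_ctx_sketch and cipher_ctx_set_iv() are hypothetical. Embedding the buffer removes the separate allocation, its failure path, and the matching free.

```c
#include <string.h>

#define MAX_IV_SIZE   16 /* assumed value of TLS_MAX_IV_SIZE */
#define MAX_SALT_SIZE 4  /* assumed value of TLS_MAX_SALT_SIZE */

struct cipher_ctx_sketch {
	/* Inline buffer: salt followed by IV, no separate allocation. */
	char iv[MAX_SALT_SIZE + MAX_IV_SIZE];
};

/* Copy salt and IV into the inline buffer; returns -1 if they don't fit. */
static int cipher_ctx_set_iv(struct cipher_ctx_sketch *c,
			     const char *salt, size_t salt_len,
			     const char *iv, size_t iv_len)
{
	if (salt_len > MAX_SALT_SIZE || iv_len > MAX_IV_SIZE)
		return -1;
	memcpy(c->iv, salt, salt_len);
	memcpy(c->iv + salt_len, iv, iv_len);
	return 0;
}
```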
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's defined in include/net/tls.h, so avoid using an overly generic name.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
TLS_MAX_REC_SEQ_SIZE is only 8B, so we gain nothing by using kmalloc.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
We should never reach tls_device_reencrypt, tls_enc_record, or
tls_enc_skb with a cipher_type that can't be offloaded. Replace those
checks with a DEBUG_NET_WARN_ON_ONCE, and use cipher_desc instead of
hard-coding offloadable cipher types.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
I skipped this conversion in my previous series.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is just a trivial fix for a typo in a comment, no functional
changes.
Signed-off-by: Johannes Zink <j.zink@pengutronix.de>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The file name used in the flash test was "dummy" because, at the time the test
was written, drivers were responsible for the file request, and as netdevsim
didn't do that, the name was unused. However, the file load request is
now done in devlink code and therefore the file has to exist.
Use an arbitrary file from /lib/firmware (the first one found) for this purpose.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add software timestamp capabilities to the xen-netback driver
by advertising it in its struct ethtool_ops and calling
skb_tx_timestamp before passing the buffer to the queue.
Signed-off-by: Luca Fancellu <luca.fancellu@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
`strncpy` is deprecated for use on NUL-terminated destination strings
[1] and as such we should prefer more robust and less ambiguous string
interfaces.
NUL-padding is not required as the buffer is already memset to 0:
| memset(adapter->fw_version, 0, 32);
Note that another usage of strscpy exists on the same buffer:
| strscpy((char *)adapter->fw_version, "N/A", sizeof(adapter->fw_version));
Considering the above, a suitable replacement is `strscpy` [2] due to
the fact that it guarantees NUL-termination on the destination buffer
without unnecessarily NUL-padding.
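For illustration, the strscpy() behaviour relied on here can be approximated in userspace. That behaviour is: copy at most size - 1 bytes, always NUL-terminate, never pad, and return -E2BIG on truncation. strscpy_sketch() is a hypothetical stand-in, not the kernel implementation. Unlike strncpy(), it never leaves the destination unterminated.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>

/*
 * Approximation of strscpy() semantics: copy at most size - 1 bytes,
 * always NUL-terminate (for size > 0), never pad, and return the copied
 * length or -E2BIG if src had to be truncated.
 */
static ssize_t strscpy_sketch(char *dst, const char *src, size_t size)
{
	size_t len;

	if (size == 0)
		return -E2BIG;

	/* Length of src, but never look past size bytes. */
	for (len = 0; len < size && src[len] != '\0'; len++)
		;

	if (len == size) {
		/* src does not fit: keep size - 1 bytes and terminate. */
		memcpy(dst, src, size - 1);
		dst[size - 1] = '\0';
		return -E2BIG;
	}

	memcpy(dst, src, len + 1); /* includes the terminating NUL */
	return (ssize_t)len;
}
```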
Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1]
Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2]
Link: https://github.com/KSPP/linux/issues/90
Cc: linux-hardening@vger.kernel.org
Signed-off-by: Justin Stitt <justinstitt@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use preferred device_get_match_data() instead of of_match_device() and
acpi_match_device() to get the driver match data. With this, adjust the
includes to explicitly include the correct headers.
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for
array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
While there, use struct_size() helper, instead of the open-coded
version, to calculate the size for the allocation of the whole
flexible structure, including of course, the flexible-array member.
This code was found with the help of Coccinelle, and audited and
fixed manually.
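A userspace sketch of both ideas follows, with hypothetical names. COUNTED_BY() applies the counted_by attribute only when the compiler reports support for it. struct_size_sketch() mimics the saturating behaviour of the kernel's struct_size(): on overflow the size becomes SIZE_MAX, so the allocation fails instead of being too small. The sketch relies on GCC/Clang's __builtin_*_overflow().

```c
#include <stdint.h>
#include <stdlib.h>

/* Apply counted_by only when the compiler says it supports it. */
#if defined(__has_attribute)
#  if __has_attribute(counted_by)
#    define COUNTED_BY(member) __attribute__((counted_by(member)))
#  endif
#endif
#ifndef COUNTED_BY
#  define COUNTED_BY(member)
#endif

struct sample_buf {
	size_t count;
	int items[] COUNTED_BY(count); /* items[] holds count elements */
};

/* Header size plus n elements, saturating to SIZE_MAX on overflow. */
static size_t struct_size_sketch(size_t header, size_t elem, size_t n)
{
	size_t bytes;

	if (__builtin_mul_overflow(elem, n, &bytes) ||
	    __builtin_add_overflow(bytes, header, &bytes))
		return SIZE_MAX;
	return bytes;
}

static struct sample_buf *sample_buf_alloc(size_t n)
{
	struct sample_buf *b;

	b = malloc(struct_size_sketch(sizeof(*b), sizeof(b->items[0]), n));
	if (b)
		b->count = n; /* set the counter before indexing items[] */
	return b;
}
```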
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use preferred spi_get_device_match_data() instead of of_match_device() and
spi_get_device_id() to get the driver match data. With this, adjust the
includes to explicitly include the correct headers.
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use preferred device_get_match_data() instead of of_match_device() to
get the driver match data. With this, adjust the includes to explicitly
include the correct headers.
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, clock configuration is spread throughout the driver and
partially duplicated for the STM32MP1 and STM32 MCU variants. This makes
it difficult to keep track of which clocks need to be enabled or disabled
in various scenarios.
This patch adds symmetric stm32_dwmac_clk_enable/disable() functions
that handle all clock configuration, including quirks required while
suspending or resuming. syscfg_clk and clk_eth_ck are not present on
STM32 MCUs, but it is fine to try to configure them anyway since NULL
clocks are ignored.
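The symmetric enable/disable structure can be sketched generically. This is a hypothetical model, not the stm32 driver code. Clock handles are plain structs, and a NULL handle counts as an absent clock and is silently accepted, as the text notes for NULL clocks. Enabling stops and rolls back on the first failure, and disabling walks the same list in reverse.

```c
#include <stddef.h>

/* Hypothetical clock handle; NULL means "not present on this SoC". */
struct clk_sketch {
	int enabled;
	int fail; /* force enabling to fail, for illustration */
};

static int clk_on(struct clk_sketch *c)
{
	if (!c)
		return 0; /* absent clocks are ignored */
	if (c->fail)
		return -1;
	c->enabled = 1;
	return 0;
}

static void clk_off(struct clk_sketch *c)
{
	if (c)
		c->enabled = 0;
}

/* Enable all clocks in order; on failure, undo the ones already enabled. */
static int clks_enable(struct clk_sketch **clks, size_t n)
{
	size_t i;
	int ret;

	for (i = 0; i < n; i++) {
		ret = clk_on(clks[i]);
		if (ret) {
			while (i--)
				clk_off(clks[i]);
			return ret;
		}
	}
	return 0;
}

/* Disable in reverse order, mirroring clks_enable(). */
static void clks_disable(struct clk_sketch **clks, size_t n)
{
	while (n--)
		clk_off(clks[n]);
}
```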
Signed-off-by: Ben Wolsieffer <ben.wolsieffer@hefring.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen says:
====================
Extend VXLAN driver to support FDB flushing
The merge commit 9271686937 ("Merge branch 'br-flush-filtering'") added
support for FDB flushing in bridge driver. Extend VXLAN driver to support
FDB flushing also. Add support for filtering by fields which are relevant
for VXLAN FDBs:
* Source VNI
* Nexthop ID
* 'router' flag
* Destination VNI
* Destination Port
* Destination IP
Without this set, flush for VXLAN device fails:
$ bridge fdb flush dev vx10
RTNETLINK answers: Operation not supported
With this set, such flush works with the relevant arguments, for example:
$ bridge fdb flush dev vx10 vni 5000 dst 193.2.2.1
< flush all vx10 entries with VNI 5000 and destination IP 193.2.2.1>
Some preparations are required; handle them before adding flushing support
in the VXLAN driver. See the commit messages for more details.
Patch set overview:
Patch #1 prepares flush policy to be used by VXLAN driver
Patches #2-#3 are preparations in VXLAN driver
Patch #4 adds an initial support for flushing in VXLAN driver
Patches #5-#9 add support for filtering by several attributes
Patch #10 adds a test for FDB flush with VXLAN
Patch #11 extends the test to check FDB flush with bridge
====================
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extend the test to check flushing with a bridge device, and test flushing by
device and by VID.
Add a test case for flushing with both "self" and "master" and with attributes
that are supported by only one of the drivers. This configuration is not
recommended; check it to verify that the user gets an error.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Test all the supported arguments for FDB flush. The test checks
configuration, not traffic. Note that the flag 'offloaded' is not checked
as it is not relevant when there is no hardware.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for flushing VXLAN FDB entries by destination IP. An FDB entry is
stored as {MAC, SRC_VNI} + remote. The destination IP is an attribute of
the remote. For multicast entries, the VXLAN driver stores a linked list
of remotes for a given key.
In user space, each remote is represented as a separate entry, so when
a flush is sent with a 'destination IP' filter, only the matching remotes
are flushed. If no remotes remain, the entry is destroyed.
For example, the following are stored as one entry with several remotes:
$ bridge fdb show dev vx10
00:00:00:00:00:00 dst 192.1.1.3 self permanent
00:00:00:00:00:00 dst 192.1.1.1 self permanent
00:00:00:00:00:00 dst 192.1.1.2 self permanent
00:00:00:00:00:00 dst 192.1.1.1 vni 1000 self permanent
When the user flushes by destination IP x, only the relevant remotes are
flushed:
$ bridge fdb flush dev vx10 dst 192.1.1.1
$ bridge fdb show dev vx10
00:00:00:00:00:00 dst 192.1.1.3 self permanent
00:00:00:00:00:00 dst 192.1.1.2 self permanent
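The per-remote behaviour described above can be modelled with a linked list. In this hypothetical sketch, which is not the vxlan driver's code, an entry owns a singly linked list of remotes. Flushing by destination IP unlinks and frees each matching remote, and the caller destroys the entry once no remotes are left. IPv4 addresses are plain integers for simplicity.

```c
#include <stdlib.h>

/* Hypothetical remote: one destination of an FDB entry. */
struct remote {
	unsigned int dst_ip; /* IPv4 address as a plain integer */
	struct remote *next;
};

/* Hypothetical entry keyed by {MAC, SRC_VNI}, owning its remotes. */
struct fdb_entry {
	struct remote *remotes;
};

/*
 * Unlink and free every remote whose destination IP matches.
 * Returns 1 if no remotes remain (the entry should be destroyed).
 */
static int flush_by_dst_ip(struct fdb_entry *e, unsigned int dst_ip)
{
	struct remote **pp = &e->remotes;

	while (*pp) {
		struct remote *r = *pp;

		if (r->dst_ip == dst_ip) {
			*pp = r->next;
			free(r);
		} else {
			pp = &r->next;
		}
	}
	return e->remotes == NULL;
}
```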
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for flushing VXLAN FDB entries by destination port. An FDB entry
is stored as {MAC, SRC_VNI} + remote. The destination port is an attribute
of the remote. For multicast entries, the VXLAN driver stores a linked list
of remotes for a given key.
In user space, each remote is represented as a separate entry, so when
a flush is sent with a 'destination port' filter, only the matching remotes
are flushed. If no remotes remain, the entry is destroyed.
For example, the following are stored as one entry with several remotes:
$ bridge fdb show dev vx10
00:00:00:00:00:00 dst 192.1.1.1 port 1111 vni 2000 self permanent
00:00:00:00:00:00 dst 192.1.1.1 port 1111 vni 3000 self permanent
00:00:00:00:00:00 dst 192.1.1.1 port 2222 vni 2000 self permanent
00:00:00:00:00:00 dst 192.1.1.1 vni 3000 self permanent
When the user flushes by port x, only the relevant remotes are flushed:
$ bridge fdb flush dev vx10 port 1111
$ bridge fdb show dev vx10
00:00:00:00:00:00 dst 192.1.1.1 port 2222 vni 2000 self permanent
00:00:00:00:00:00 dst 192.1.1.1 vni 3000 self permanent
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for flushing VXLAN FDB entries by destination VNI. An FDB entry is
stored as {MAC, SRC_VNI} + remote. The destination VNI is an attribute
of the remote. For multicast entries, the VXLAN driver stores a linked list
of remotes for a given key.
In user space, each remote is represented as a separate entry, so when
a flush is sent with a 'destination VNI' filter, only the matching remotes
are flushed. If no remotes remain, the entry is destroyed.
For example, the following are stored as one entry with several remotes:
$ bridge fdb show dev vx10
00:00:00:00:00:00 dst 192.1.1.1 vni 3000 self permanent
00:00:00:00:00:00 dst 192.1.1.1 vni 4000 self permanent
00:00:00:00:00:00 dst 192.1.1.1 vni 2000 self permanent
00:00:00:00:00:00 dst 192.1.1.2 vni 2000 self permanent
When the user flushes by VNI x, only the relevant remotes are flushed:
$ bridge fdb flush dev vx10 vni 2000
$ bridge fdb show dev vx10
00:00:00:00:00:00 dst 192.1.1.1 vni 3000 self permanent
00:00:00:00:00:00 dst 192.1.1.1 vni 4000 self permanent
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for flushing VXLAN FDB entries by nexthop ID.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for flushing VXLAN FDB entries by source VNI.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The merge commit 9271686937 ("Merge branch 'br-flush-filtering'")
added support for FDB flushing in the bridge driver only; the VXLAN driver does
not support such flushing. Extend the VXLAN driver to support FDB flushing.
In this commit, add support for flushing with state and flags, which are
the fields supported by the bridge driver.
Note that the bridge driver supports the 'NTF_USE' flag, but there is no point in
supporting this flag for flushing as it is ignored when flags are stored.
'NTF_STICKY' is not relevant for VXLAN driver.
'NTF_ROUTER' is not supported in bridge driver for flush as it is not
relevant for bridge, add it for VXLAN.
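A sketch of mask-based state and flag matching, assuming a descriptor that carries a value and a mask for each field. The bit values are local copies of those in <linux/neighbour.h>; struct ex_match_desc, EX_FLUSH_FLAGS_OK and the rejection of unsupported flag bits are hypothetical choices for illustration, not the driver's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of bit values from <linux/neighbour.h>. */
#define EX_NUD_NOARP		0x40
#define EX_NUD_PERMANENT	0x80
#define EX_NTF_SELF		0x02
#define EX_NTF_EXT_LEARNED	0x10
#define EX_NTF_OFFLOADED	0x20
#define EX_NTF_STICKY		0x40
#define EX_NTF_ROUTER		0x80

/* Hypothetical: flag bits this sketch accepts in a flush request.
 * NTF_USE and NTF_STICKY are deliberately absent. */
#define EX_FLUSH_FLAGS_OK \
	(EX_NTF_EXT_LEARNED | EX_NTF_OFFLOADED | EX_NTF_ROUTER)

struct ex_match_desc {
	uint16_t state, state_mask;	/* ndm_state is 16 bits wide */
	uint8_t flags, flags_mask;	/* ndm_flags is 8 bits wide */
};

/* Reject requests whose flag mask touches unsupported bits. */
static bool ex_desc_valid(const struct ex_match_desc *d)
{
	return (d->flags_mask & ~EX_FLUSH_FLAGS_OK) == 0;
}

/* Only the bits selected by each mask take part in the comparison. */
static bool ex_state_flags_match(uint16_t state, uint8_t flags,
				 const struct ex_match_desc *d)
{
	if ((state & d->state_mask) != (d->state & d->state_mask))
		return false;
	if ((flags & d->flags_mask) != (d->flags & d->flags_mask))
		return false;
	return true;
}
```

With masks, a request can say "only permanent entries" or "only entries without NTF_ROUTER" without enumerating every other bit.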
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the function vxlan_flush() does not flush the default FDB entry
(an entry with all_zeros_mac and the default VNI), as it is deleted in
vxlan_uninit(). Once this function is used for flushing FDB entries
from user space, it will also have to flush the default entry if the
other parameters match (e.g., VNI, flags).
Extend 'struct vxlan_fdb_flush_desc' to include an indication of whether
the default entry should be flushed. The default value (false) means the
default entry is flushed; adjust all the existing callers to set
'.ignore_default_entry' to true, so that the current behavior does not
change.
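A minimal sketch of how such an indication might gate the flush loop, assuming the default entry is identified by an all-zeros MAC plus the default VNI as described above. struct ex_entry, struct ex_default_desc, ex_is_default_entry() and ex_skip_entry() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified entry and descriptor. */
struct ex_entry {
	uint8_t mac[6];
	uint32_t vni;
};

struct ex_default_desc {
	/* false (the zero value) means the default entry is flushed too */
	bool ignore_default_entry;
};

/* The default entry: all-zeros MAC with the device's default VNI. */
static bool ex_is_default_entry(const struct ex_entry *f, uint32_t default_vni)
{
	static const uint8_t all_zeros_mac[6];	/* zero-initialized */

	return memcmp(f->mac, all_zeros_mac, sizeof(all_zeros_mac)) == 0 &&
	       f->vni == default_vni;
}

/* True when the flush loop must leave @f alone. */
static bool ex_skip_entry(const struct ex_entry *f, uint32_t default_vni,
			  const struct ex_default_desc *d)
{
	return d->ignore_default_entry && ex_is_default_entry(f, default_vni);
}
```

Making "flush it" the zero value means a descriptor built from user-space parameters includes the default entry unless a caller opts out.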
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function vxlan_flush() takes a boolean called 'do_all'; when it is
false, it does not flush entries with state 'NUD_PERMANENT' or
'NUD_NOARP'. The following patches will add support for FDB flush
with parameters from user space. Make the function more generic, so it
can be used later.
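One way to make such a boolean generic is to express it as a state value plus mask. The sketch below, with hypothetical names (struct ex_state_desc, ex_desc_from_do_all()), checks that the old 'do_all' predicate and a mask-based predicate agree for every 8-bit state; it is an illustration, not the patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of bit values from <linux/neighbour.h>. */
#define EX_NUD_NOARP		0x40
#define EX_NUD_PERMANENT	0x80

struct ex_state_desc {
	uint16_t state;		/* required value of the masked bits */
	uint16_t state_mask;	/* which state bits are compared */
};

/* Old-style predicate: do_all == false spares permanent/noarp entries. */
static bool ex_old_should_flush(uint16_t state, bool do_all)
{
	return do_all || !(state & (EX_NUD_PERMANENT | EX_NUD_NOARP));
}

/* Generic predicate: flush when the masked state equals desc->state. */
static bool ex_new_should_flush(uint16_t state, const struct ex_state_desc *d)
{
	return (state & d->state_mask) == d->state;
}

/* Translate the old boolean into an equivalent descriptor. */
static struct ex_state_desc ex_desc_from_do_all(bool do_all)
{
	struct ex_state_desc d = { 0, 0 };	/* empty mask: match everything */

	if (!do_all)
		d.state_mask = EX_NUD_PERMANENT | EX_NUD_NOARP;	/* both bits clear */
	return d;
}
```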
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The merge commit 9271686937 ("Merge branch 'br-flush-filtering'")
added support for FDB flushing in the bridge driver. The following patches
will extend the VXLAN driver to support FDB flushing as well. The netlink
message for bulk delete is shared between the drivers. With the existing
implementation, there is no way to prevent the user from flushing with
attributes that are not supported by a given driver. For example, once VNI
support is added, a user flushing bridge FDB entries with a VNI will not get
an error, although this attribute is not relevant for the bridge.
As preparation for supporting FDB flush in the VXLAN driver, move the policy
to be handled in the bridge driver; later, a new policy for VXLAN will be
added in the VXLAN driver. Do not pass 'vid' as part of ndo_fdb_del_bulk(),
as this field is relevant only for the bridge.
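In the kernel, such per-driver validation is expressed with netlink policy tables (struct nla_policy). The userspace sketch below only models the idea with bitmasks: the attribute IDs, the two policy masks and ex_validate() are hypothetical, chosen to show a shared message being checked against each driver's own policy.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical attribute identifiers for a shared bulk-delete message. */
enum ex_attr {
	EX_ATTR_VLAN,
	EX_ATTR_VNI,
	EX_ATTR_SRC_VNI,
	EX_ATTR_STATE_MASK,
	EX_ATTR_FLAGS_MASK,
};

#define EX_BIT(a) (1u << (a))

/* Each driver owns its own policy: the set of attributes it accepts. */
static const uint32_t ex_bridge_policy =
	EX_BIT(EX_ATTR_VLAN) | EX_BIT(EX_ATTR_STATE_MASK) |
	EX_BIT(EX_ATTR_FLAGS_MASK);
static const uint32_t ex_vxlan_policy =
	EX_BIT(EX_ATTR_VNI) | EX_BIT(EX_ATTR_SRC_VNI) |
	EX_BIT(EX_ATTR_STATE_MASK) | EX_BIT(EX_ATTR_FLAGS_MASK);

/* 0 if every attribute present in the request is allowed, else -EINVAL. */
static int ex_validate(uint32_t present, uint32_t policy)
{
	return (present & ~policy) ? -EINVAL : 0;
}
```

Keeping the policy next to each driver means a bridge flush carrying a VNI is rejected instead of being silently accepted.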
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cross-merge networking fixes after downstream PR.
No conflicts.
Adjacent changes:
kernel/bpf/verifier.c
829955981c ("bpf: Fix verifier log for async callback return values")
a923819fb2 ("bpf: Treat first argument as return value for bpf_throw")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Previous releases - regressions:
- af_packet: fix fortified memcpy() without flex array.
- tcp: fix crashes trying to free half-baked MTU probes
- xdp: fix zero-size allocation warning in xskq_create()
- can: sja1000: always restart the tx queue after an overrun
- eth: mlx5e: again mutually exclude RX-FCS and RX-port-timestamp
- eth: nfp: avoid rmmod nfp crash issues
- eth: octeontx2-pf: fix page pool frag allocation warning
Previous releases - always broken:
- mctp: perform route lookups under a RCU read-side lock
- bpf: s390: fix clobbering the caller's backchain in the trampoline
- phy: lynx-28g: cancel the CDR check work item on the remove path
- dsa: qca8k: fix qca8k driver for Turris 1.x
- eth: ravb: fix use-after-free issue in ravb_tx_timeout_work()
- eth: ixgbe: fix crash with empty VF macvlan list
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Merge tag 'net-6.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from CAN and BPF.
We have a regression in TC currently under investigation; otherwise
the things that stand out most are probably the TCP and AF_PACKET
fixes, with both issues coming from 6.5.
Previous releases - regressions:
- af_packet: fix fortified memcpy() without flex array.
- tcp: fix crashes trying to free half-baked MTU probes
- xdp: fix zero-size allocation warning in xskq_create()
- can: sja1000: always restart the tx queue after an overrun
- eth: mlx5e: again mutually exclude RX-FCS and RX-port-timestamp
- eth: nfp: avoid rmmod nfp crash issues
- eth: octeontx2-pf: fix page pool frag allocation warning
Previous releases - always broken:
- mctp: perform route lookups under a RCU read-side lock
- bpf: s390: fix clobbering the caller's backchain in the trampoline
- phy: lynx-28g: cancel the CDR check work item on the remove path
- dsa: qca8k: fix qca8k driver for Turris 1.x
- eth: ravb: fix use-after-free issue in ravb_tx_timeout_work()
- eth: ixgbe: fix crash with empty VF macvlan list"
* tag 'net-6.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (54 commits)
rswitch: Fix imbalance phy_power_off() calling
rswitch: Fix renesas_eth_sw_remove() implementation
octeontx2-pf: Fix page pool frag allocation warning
nfc: nci: assert requested protocol is valid
af_packet: Fix fortified memcpy() without flex array.
net: tcp: fix crashes trying to free half-baked MTU probes
net/smc: Fix pos miscalculation in statistics
nfp: flower: avoid rmmod nfp crash issues
net: usb: dm9601: fix uninitialized variable use in dm9601_mdio_read
ethtool: Fix mod state of verbose no_mask bitset
net: nfc: fix races in nfc_llcp_sock_get() and nfc_llcp_sock_get_sn()
mctp: perform route lookups under a RCU read-side lock
net: skbuff: fix kernel-doc typos
s390/bpf: Fix unwinding past the trampoline
s390/bpf: Fix clobbering the caller's backchain in the trampoline
net/mlx5e: Again mutually exclude RX-FCS and RX-port-timestamp
net/smc: Fix dependency of SMC on ISM
ixgbe: fix crash with empty VF macvlan list
net/mlx5e: macsec: use update_pn flag instead of PN comparation
net: phy: mscc: macsec: reject PN update requests
...
AngeloGioacchino Del Regno is stepping in as co-maintainer for the
MediaTek SoC platform and starts by sending some dts fixes for
the mt8195 platform that had been pending for a while.
On the ixp4xx platform, Krzysztof Halasa steps down as co-maintainer,
reflecting that Linus Walleij has been handling this on his own
for the past few years.
Generic RISC-V kernels are now marked as incompatible with the
RZ/Five platform that requires custom hacks both for managing
its DMA bounce buffers and for addressing low virtual memory.
Finally, there is one bugfix for the AMDTEE firmware driver
to prevent a use-after-free bug.
Merge tag 'soc-fixes-6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
"AngeloGioacchino Del Regno is stepping in as co-maintainer for the
MediaTek SoC platform and starts by sending some dts fixes for the
mt8195 platform that had been pending for a while.
On the ixp4xx platform, Krzysztof Halasa steps down as co-maintainer,
reflecting that Linus Walleij has been handling this on his own for
the past few years.
Generic RISC-V kernels are now marked as incompatible with the RZ/Five
platform that requires custom hacks both for managing its DMA bounce
buffers and for addressing low virtual memory.
Finally, there is one bugfix for the AMDTEE firmware driver to prevent
a use-after-free bug"
* tag 'soc-fixes-6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
IXP4xx MAINTAINERS entries
arm64: dts: mediatek: mt8195: Set DSU PMU status to fail
arm64: dts: mediatek: fix t-phy unit name
arm64: dts: mediatek: mt8195-demo: update and reorder reserved memory regions
arm64: dts: mediatek: mt8195-demo: fix the memory size to 8GB
MAINTAINERS: Add Angelo as MediaTek SoC co-maintainer
soc: renesas: Make ARCH_R9A07G043 (riscv version) depend on NONPORTABLE
tee: amdtee: fix use-after-free vulnerability in amdtee_close_session